Free-text prompted 3D masks

Use VoxTell in MedSeg: type the structure, inspect the mask

VoxTell is a 3D vision-language segmentation model. In MedSeg, it is useful when you want a candidate mask for a target that is easier to describe in text than to find in a fixed model menu.


Modalities: CT, MRI, PET. Availability depends on the worker.

Figure: MedSeg editor with segmented medical imaging views. VoxTell outputs become normal MedSeg masks that can be opened, corrected, measured, and exported.

Watch VoxTell in MedSeg

This silent demo shows the idea: type the target, run the model, and review the returned segmentation as a normal MedSeg mask.


What to use VoxTell for

VoxTell is best treated as a fast candidate-mask generator for research, not as a source of final, unchecked annotations.

Unusual targets

Try prompts such as "right vestibular schwannoma", "aortic arch", "aneurysm", or "portal vein" when a fixed-class model does not fit.

Exploratory cohorts

Use text prompts to quickly test whether a structure or lesion is separable before committing to manual labels or custom training.

First-pass labels

Generate a rough mask, correct it in the editor, then use the corrected result as training data for a repeatable model.
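One way to judge how much correction a first-pass mask needed is to compare it with the corrected version using the Dice overlap. The snippet below is a minimal sketch in plain Python, assuming both masks have already been exported and flattened to equal-length sequences of 0 and 1. The function name `dice` and the toy data are illustrative only; this is not a MedSeg or VoxTell API.

```python
from typing import Sequence


def dice(mask_a: Sequence[int], mask_b: Sequence[int]) -> float:
    """Dice overlap between two binary masks given as flat 0/1 sequences."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    size_a = sum(1 for v in mask_a if v)
    size_b = sum(1 for v in mask_b if v)
    if size_a + size_b == 0:
        # Both masks are empty: treat this as perfect agreement.
        return 1.0
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    return 2.0 * overlap / (size_a + size_b)


# Toy flattened masks (hypothetical data, not real VoxTell output).
rough = [0, 1, 1, 1, 0, 0, 1, 0]
corrected = [0, 1, 1, 0, 0, 1, 1, 0]
score = dice(rough, corrected)
print(f"Dice between rough and corrected mask: {score:.2f}")
```

A score near 1 suggests the rough mask needed little editing. Consistently low scores across a cohort may mean the prompt needs refining before the corrected labels are used for training.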

How VoxTell appears in MedSeg

VoxTell runs as an inference action from the project view, whereas nnInteractive lives inside the editor.

1. Select one or more series. Use a 3D CT, MRI, or PET volume. Good orientation and spacing matter.
2. Choose VoxTell from the model menu. The entry appears only when the warm VoxTell worker and checkpoint are available.
3. Enter one or more prompts. Start specific. If "lesions" is too broad, try a singular, anatomical, or location-aware phrase.
4. Review the returned mask. Open the segmentation in the editor, correct errors, then measure, export, or mark it for training.
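Because orientation and spacing matter for the input volume, it can help to check a series' geometry before running inference. The sketch below derives voxel spacing and an orientation code from a 4x4 voxel-to-world affine, assuming a RAS+ world coordinate system as used by NIfTI. The helper names and the anisotropy threshold of 5.0 are hypothetical choices for illustration, not checks that MedSeg or VoxTell are known to perform.

```python
import math
from typing import List, Sequence, Tuple

Affine = Sequence[Sequence[float]]


def voxel_spacing(affine: Affine) -> Tuple[float, ...]:
    """Voxel size in mm per array axis: the norm of each affine column."""
    return tuple(
        math.sqrt(sum(affine[row][col] ** 2 for row in range(3)))
        for col in range(3)
    )


def axis_codes(affine: Affine) -> str:
    """Orientation code such as 'RAS', assuming a RAS+ world frame."""
    labels = (("L", "R"), ("P", "A"), ("I", "S"))
    codes = []
    for col in range(3):
        direction = [affine[row][col] for row in range(3)]
        # The world axis with the largest component dominates this array axis.
        dominant = max(range(3), key=lambda row: abs(direction[row]))
        codes.append(labels[dominant][1 if direction[dominant] > 0 else 0])
    return "".join(codes)


def geometry_warnings(affine: Affine, max_anisotropy: float = 5.0) -> List[str]:
    """Return human-readable warnings; the threshold is an arbitrary example."""
    warnings = []
    spacing = voxel_spacing(affine)
    if min(spacing) <= 0:
        warnings.append("degenerate affine: zero voxel spacing")
    elif max(spacing) / min(spacing) > max_anisotropy:
        warnings.append(f"strongly anisotropic voxels: {spacing}")
    world_axis = {"L": 0, "R": 0, "P": 1, "A": 1, "I": 2, "S": 2}
    if len({world_axis[c] for c in axis_codes(affine)}) < 3:
        warnings.append("array axes do not map to three distinct world axes")
    return warnings


# Example: a thick-slice CT with 0.8 x 0.8 x 5.0 mm voxels (made-up values).
ct_affine = [
    [-0.8, 0.0, 0.0, 100.0],
    [0.0, -0.8, 0.0, 120.0],
    [0.0, 0.0, 5.0, -300.0],
    [0.0, 0.0, 0.0, 1.0],
]
print(axis_codes(ct_affine), voxel_spacing(ct_affine))
print(geometry_warnings(ct_affine))
```

In practice the affine would come from the image header, for example via a NIfTI or DICOM reader. The example hard-codes it so the sketch stays self-contained.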

Where it fits beside other MedSeg models

Need	Best starting point	Why
Common whole-body CT anatomy	TotalSegmentator	Fixed labels are usually more predictable and easier to validate.
Target visible but hard to describe precisely	nnInteractive	Clicks and scribbles give direct spatial guidance.
Target can be described naturally	VoxTell	Free text can cover uncommon anatomy or pathology without choosing a predefined class.
Same target across many cases	Custom nnU-Net	Corrected labels become a repeatable project-specific model.

Important limitations

VoxTell checkpoints are for the free/non-commercial research workflow unless separately licensed. Outputs are prompt-sensitive and should be reviewed. If the VoxTell entry is missing, the worker is offline, busy, or not enabled on that deployment.

Further reading: VoxTell GitHub, VoxTell arXiv, VoxTell model card, MedSeg Terms.