Meissa-4B: Multi-modal Medical Agentic Intelligence
Meissa-4B is a lightweight 4B-parameter medical multi-modal LLM with full agentic capability. Instead of relying on proprietary frontier models (GPT, Gemini), Meissa brings tool calling, multi-agent collaboration, and clinical simulation offline by distilling structured trajectories from frontier agent systems into a compact vision-language model.
Key Features
- 4 agentic paradigms in a single model: continuous tool calling, interleaved thinking with images, multi-agent collaboration, and multi-turn clinical simulation
- Offline deployment: runs entirely locally with vLLM, no API calls needed
- Tool calling: native <tool_call> support via Hermes format, compatible with vLLM's tool-call parser
- Thinking: built-in <think> chain-of-thought reasoning before actions
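When the model is called without a server-side tool-call parser (for example, through plain Transformers generation), the <think> and <tool_call> blocks arrive as raw text. The sketch below shows one way to separate them. It assumes the common Hermes layout, in which each <tool_call> block wraps a JSON object with "name" and "arguments" keys; the function name parse_agent_output and the sample string are illustrative and not taken from the Meissa codebase.

```python
import json
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_agent_output(text):
    """Split raw model text into reasoning, tool calls, and the remaining answer."""
    thoughts = [t.strip() for t in THINK_RE.findall(text)]
    tool_calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than crash
        if isinstance(call, dict) and "name" in call:
            tool_calls.append(
                {"name": call["name"], "arguments": call.get("arguments", {})}
            )
    # Whatever is left after removing both kinds of block is the visible answer.
    answer = TOOL_CALL_RE.sub("", THINK_RE.sub("", text)).strip()
    return {"thoughts": thoughts, "tool_calls": tool_calls, "answer": answer}


# Illustrative sample only; real model output may be laid out differently.
sample = (
    "<think>The user wants pathology labels, so I should call the classifier.</think>\n"
    '<tool_call>\n{"name": "ChestXRayClassifier", '
    '"arguments": {"image_path": "/path/to/cxr.jpg"}}\n</tool_call>'
)
parsed = parse_agent_output(sample)
print(parsed["tool_calls"])
```

When serving through vLLM with the Hermes parser enabled, this step is usually unnecessary because tool calls come back as structured fields.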
Model Details
| Property | Value |
|---|---|
| Base model | Qwen3-VL-4B-Instruct |
| Architecture | Qwen3VLForConditionalGeneration |
| Parameters | 4B |
| Precision | bfloat16 |
| Training method | LoRA SFT (rank=32, alpha=64), merged |
| Training data | 43,210 medical agentic trajectories (25,018 released as an open subset) |
| Training framework | LLaMA-Factory |
| Context length | 8,192 tokens (training) |
| Tested environment | transformers 4.57.0, vLLM 0.11.0 |
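
Two quick numbers follow from the table. Under the standard LoRA formulation the low-rank update is scaled by alpha / rank, and bfloat16 stores two bytes per parameter. The sketch below works both out. It assumes a nominal 4 billion parameters (the exact count may differ) and covers weights only, not activations or KV cache.

```python
LORA_RANK = 32
LORA_ALPHA = 64
NOMINAL_PARAMS = 4_000_000_000  # assumption: "4B" taken at face value
BYTES_PER_BF16 = 2

# Standard LoRA scales the update B @ A by alpha / rank before adding it to W.
lora_scale = LORA_ALPHA / LORA_RANK

# Weight memory only; runtime usage will be higher.
weight_gib = NOMINAL_PARAMS * BYTES_PER_BF16 / 2**30

print(f"LoRA scaling factor: {lora_scale}")
print(f"Approximate bf16 weight footprint: {weight_gib:.2f} GiB")
```

Because the adapter is already merged into the base weights, it adds no extra parameters at inference time.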
Quickstart
Load with Transformers
from transformers import AutoModelForImageTextToText, AutoProcessor
import torch
model = AutoModelForImageTextToText.from_pretrained(
"CYX1998/Meissa-4B",
torch_dtype=torch.bfloat16,
device_map="auto",
)
processor = AutoProcessor.from_pretrained("CYX1998/Meissa-4B")
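Once the model and processor are loaded, a single image question can be run through the chat template. The following is a minimal sketch: the helper names build_messages and generate_answer are made up for illustration, and the image URL in the usage comment is a placeholder rather than a real file.

```python
def build_messages(image_url, question):
    """Build a one-turn chat with an image and a text question."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]


def generate_answer(model, processor, messages, max_new_tokens=256):
    """Tokenize the chat, generate, and decode only the newly generated tokens."""
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    prompt_len = inputs["input_ids"].shape[-1]  # strip the echoed prompt
    return processor.decode(outputs[0][prompt_len:], skip_special_tokens=True)


# Usage, with `model` and `processor` loaded as shown above (placeholder URL):
# messages = build_messages("https://example.com/cxr.jpg", "Describe the findings.")
# print(generate_answer(model, processor, messages))
```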
Serve with vLLM (Recommended)
For agentic use cases, serve Meissa with vLLM to enable tool calling:
python -m vllm.entrypoints.openai.api_server \
--model CYX1998/Meissa-4B \
--port 8877 \
--max-model-len 8192 \
--gpu-memory-utilization 0.85 \
--dtype bfloat16 \
--enable-auto-tool-choice \
--tool-call-parser hermes
# Set the endpoint
export OPENAI_BASE_URL="http://127.0.0.1:8877/v1"
export OPENAI_API_KEY="dummy"
The --enable-auto-tool-choice and --tool-call-parser hermes flags are both required for tool calling.
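Model loading can take a while, so it helps to confirm the server is ready before sending requests. The sketch below polls the OpenAI-compatible /v1/models route using only the standard library. The helper names and retry settings are illustrative assumptions.

```python
import json
import time
import urllib.request


def list_served_models(base_url, timeout=5.0):
    """Return model IDs reported by an OpenAI-compatible server, or [] if unreachable."""
    url = f"{base_url.rstrip('/')}/models"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON body.
        return []
    return [entry["id"] for entry in payload.get("data", [])]


def wait_for_model(base_url, model_id, attempts=30, delay=2.0):
    """Poll until the model shows up in the server's model list or attempts run out."""
    for _ in range(attempts):
        if model_id in list_served_models(base_url):
            return True
        time.sleep(delay)
    return False


# Usage against the endpoint configured above:
# ready = wait_for_model("http://127.0.0.1:8877/v1", "CYX1998/Meissa-4B")
```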
Example: Tool Calling
from openai import OpenAI
client = OpenAI(base_url="http://127.0.0.1:8877/v1", api_key="dummy")
tools = [{
"type": "function",
"function": {
"name": "ChestXRayClassifier",
"description": "Classify pathologies in a chest X-ray image.",
"parameters": {
"type": "object",
"properties": {
"image_path": {"type": "string", "description": "Path to the chest X-ray image"}
},
"required": ["image_path"]
}
}
}]
response = client.chat.completions.create(
model="CYX1998/Meissa-4B",
messages=[{"role": "user", "content": "Analyze this chest X-ray: /path/to/cxr.jpg"}],
tools=tools,
)
print(response.choices[0].message)
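The request above returns at most one assistant turn. In an agentic setting, the caller usually executes each requested tool and feeds the result back until the model answers in plain text. The sketch below outlines such a loop. The classify_chest_xray function is a hypothetical stand-in that returns placeholder values rather than real predictions, and the helper names and step limit are assumptions.

```python
import json


def classify_chest_xray(image_path):
    """Hypothetical stand-in for a real classifier; returns a placeholder result."""
    return {"image_path": image_path, "findings": {"placeholder_label": 0.0}}


TOOL_REGISTRY = {"ChestXRayClassifier": classify_chest_xray}


def execute_tool_call(name, arguments_json, registry=TOOL_REGISTRY):
    """Run one tool call and return its result (or an error) as a JSON string."""
    if name not in registry:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments_json or "{}")
        return json.dumps(registry[name](**kwargs))
    except (json.JSONDecodeError, TypeError) as exc:
        return json.dumps({"error": str(exc)})


def run_tool_loop(client, model, messages, tools, max_steps=8):
    """Alternate between model turns and tool execution until a plain answer arrives."""
    for _ in range(max_steps):
        response = client.chat.completions.create(
            model=model, messages=messages, tools=tools
        )
        message = response.choices[0].message
        if not message.tool_calls:
            return message.content
        # Record the assistant turn so each tool result can reference its call ID.
        messages.append({
            "role": "assistant",
            "content": message.content or "",
            "tool_calls": [
                {
                    "id": call.id,
                    "type": "function",
                    "function": {
                        "name": call.function.name,
                        "arguments": call.function.arguments,
                    },
                }
                for call in message.tool_calls
            ],
        })
        for call in message.tool_calls:
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": execute_tool_call(call.function.name, call.function.arguments),
            })
    return None  # step budget exhausted without a final answer


# Usage with the `client` and `tools` defined above:
# history = [{"role": "user", "content": "Analyze this chest X-ray: /path/to/cxr.jpg"}]
# print(run_tool_loop(client, "CYX1998/Meissa-4B", history, tools))
```

Returning errors as tool messages rather than raising lets the model see the failure and decide whether to retry or stop.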
Supported Agentic Frameworks
| Framework | Description | Tools |
|---|---|---|
| I: Continuous Tool Calling | Sequential tool use for radiology analysis | 8 chest X-ray tools (classifier, report generator, VQA, segmentation, etc.) |
| II: Interleaved Thinking with Images | Iterative visual reasoning with zoom | ZoomInSubfigure, SegmentRegion, Terminate |
| III: Multi-Agent Collaboration | Multi-agent medical consultation | AssessDifficulty, RecruitExperts, ConsultExperts, FacilitateDebate |
| IV: Clinical Simulation | Multi-turn doctor-patient interaction | RequestPhysicalExam, RequestTest, Terminate |
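
Each framework exposes its tools to the model as OpenAI-style function definitions, like the ChestXRayClassifier schema in the tool-calling example. The sketch below shows a small builder for such definitions, applied to the Framework IV tool names from the table. The parameter names and descriptions are hypothetical placeholders, since the actual schemas are not given here.

```python
def make_tool(name, description, properties, required=None):
    """Build one OpenAI-style function tool definition."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                # Default: treat every listed property as required.
                "required": list(required) if required is not None else list(properties),
            },
        },
    }


# Tool names come from the table; every parameter below is a hypothetical placeholder.
clinical_simulation_tools = [
    make_tool(
        "RequestPhysicalExam",
        "Ask for the result of a physical examination.",
        {"exam": {"type": "string", "description": "Examination to perform (hypothetical field)"}},
    ),
    make_tool(
        "RequestTest",
        "Ask for the result of a laboratory or imaging test.",
        {"test": {"type": "string", "description": "Test to order (hypothetical field)"}},
    ),
    make_tool(
        "Terminate",
        "End the interaction with a final diagnosis.",
        {"diagnosis": {"type": "string", "description": "Final diagnosis (hypothetical field)"}},
    ),
]

print([tool["function"]["name"] for tool in clinical_simulation_tools])
```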
Training Data
Meissa-4B was trained on 43,210 medical agentic SFT trajectories distilled from Gemini, split across the four frameworks as follows:
| Framework | Samples | Source Datasets |
|---|---|---|
| I: Continuous Tool Calling | 4,898 | MIMIC-CXR-VQA |
| II: Interleaved Thinking | 15,211 | PathVQA, MIMIC-CXR-VQA, SLAKE, VQA-RAD |
| III: Multi-Agent Collaboration | 15,427 | MIMIC-CXR-VQA, PathVQA, MedQA, PubMedQA |
| IV: Clinical Simulation | 7,674 | MedQA, MIMIC-CXR |
The open-source subset (25,018 samples) is available at CYX1998/Meissa-SFT.
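As a quick consistency check, the per-framework counts in the table can be summed and compared with the stated totals. The snippet below uses only figures quoted above; the variable names are illustrative.

```python
samples_per_framework = {
    "I: Continuous Tool Calling": 4_898,
    "II: Interleaved Thinking": 15_211,
    "III: Multi-Agent Collaboration": 15_427,
    "IV: Clinical Simulation": 7_674,
}
OPEN_SUBSET = 25_018

total = sum(samples_per_framework.values())
shares = {name: count / total for name, count in samples_per_framework.items()}
open_fraction = OPEN_SUBSET / total

print(f"Total trajectories: {total}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"Open subset: {open_fraction:.1%} of the full set")
```

The counts sum to the stated 43,210, and the open subset covers a little under three fifths of the full training set.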
Evaluation
Meissa-4B matches or exceeds GPT-4o and Gemini-3-flash on multiple medical agentic benchmarks while being deployable offline on a single GPU. See our paper for full results.
Limitations
- Not for clinical use: This model is a research prototype and should NOT be used for real clinical decision-making.
- English only: Trained and evaluated on English medical data only.
- Domain scope: Primarily trained on radiology, pathology, and general clinical reasoning. Performance on other medical specialties may vary.
- Hallucination: Like all LLMs, Meissa may generate plausible but incorrect medical information.
Citation
@article{chen2026meissa,
title={Meissa: Multi-modal Medical Agentic Intelligence},
author={Chen, Yixiong and Bai, Xinyi and Pan, Yue and Zhou, Zongwei and Yuille, Alan},
journal={arXiv preprint arXiv:2603.09018},
year={2026}
}
License
This model is released under Apache 2.0. The base model Qwen3-VL-4B-Instruct is subject to the Qwen License.