SigLino: Vision Foundation Models (SigLIP2 + DINOv3)
Vision encoders distilled from DINOv3 and SigLIP2 (MoE & Dense). Stems from the CVPR 2026 AMoE paper.
Accepted at CVPR 2026
This work stems from the CVPR 2026 AMoE paper, which designs and applies distillation into a Mixture-of-Experts (MoE) vision architecture. We chose the name SigLino (SigLIP2 + DINOv3) for clarity.
An ultra-sparse MoE variant of SigLino with top-2-of-28 expert routing: 0.15B active parameters, 0.6B total.
Part of the SigLino model family.
```python
import torch
from PIL import Image
from transformers import AutoModel, AutoImageProcessor

model_id = "tiiuae/siglino-moe-0.15B"
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).to("cuda", dtype=torch.bfloat16)
processor = AutoImageProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("image.jpg").convert("RGB")
inputs = processor(image, return_tensors="pt").to("cuda")
inputs["pixel_values"] = inputs["pixel_values"].to(torch.bfloat16)  # match the model dtype

with torch.no_grad():
    outputs = model(**inputs)

# Feature spaces: 'siglino' (768d), 'siglip2' (1152d), 'dinov3' (1024d)
patch_features = outputs["patch_features"]["siglino"]      # (Batch, Tokens, 768)
summary_features = outputs["summary_features"]["siglip2"]  # (Batch, 1152)
```
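Summary features like these are what the kNN results below are computed from. A minimal sketch of cosine-similarity kNN classification over precomputed summary features (the `knn_predict` helper and its tensors are illustrative, not part of the released code):

```python
import torch
import torch.nn.functional as F

def knn_predict(query, gallery, labels, k=5):
    # query: (d,) feature of the test image
    # gallery: (N, d) precomputed summary features; labels: (N,) class ids
    sims = F.cosine_similarity(query.unsqueeze(0), gallery)  # (N,)
    topk = sims.topk(k).indices          # indices of the k nearest neighbours
    votes = labels[topk]                 # their class labels
    return votes.mode().values.item()    # majority vote
```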
| Property | Value |
|---|---|
| Architecture | MoE (top-2/28 experts) |
| Active Parameters | 0.15B |
| Total Parameters | 0.6B |
| Layers | 18 |
| Hidden Dim | 768 |
| Patch Size | 16×16 |
| Teachers | DINOv3, SigLIP2 |
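Top-2-of-28 routing means the gate scores all 28 experts per token but only the 2 highest-scoring experts actually run, which is why only 0.15B of the 0.6B parameters are active. A minimal sketch of this routing pattern (the `top2_route` helper is illustrative; the actual AMoE router may differ in gating and normalisation details):

```python
import torch

def top2_route(tokens, gate, experts, k=2):
    # tokens: (N, d); gate: Linear(d, num_experts); experts: list of d -> d modules
    logits = gate(tokens)                       # (N, num_experts)
    weights, idx = logits.topk(k, dim=-1)       # top-k experts per token
    weights = weights.softmax(dim=-1)           # renormalise over the chosen experts
    out = torch.zeros_like(tokens)
    for e, expert in enumerate(experts):
        mask = (idx == e)                       # (N, k): did any slot pick expert e?
        rows = mask.any(dim=-1)
        if rows.any():                          # run expert e only on its tokens
            w = (weights * mask).sum(dim=-1, keepdim=True)[rows]
            out[rows] += w * expert(tokens[rows])
    return out
```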
| Task | Metric | Score |
|---|---|---|
| kNN (ImageNet) | Acc | 85.0 |
| kNN (6-dataset avg) | Acc | 89.8 |
| Zero-shot classification (ImageNet) | Acc | 78.8 |
| Flickr30K I2T | R@1 | 92.9 |
| MSCOCO I2T | R@1 | 71.1 |
| Pascal VOC (1024) | mIoU | 88.1 |
| Cityscapes (1024) | mIoU | 63.6 |
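The Flickr30K/MSCOCO I2T R@1 numbers measure how often the top-ranked caption for an image is its ground-truth match. A minimal sketch of the metric over paired, precomputed embeddings (names and setup are illustrative, assuming row i of each matrix is a matched image–caption pair):

```python
import torch

def recall_at_1(image_feats, text_feats):
    # image_feats, text_feats: (N, d); row i of each is a matched pair
    sims = image_feats @ text_feats.T                 # (N, N) similarity matrix
    pred = sims.argmax(dim=1)                         # best caption per image
    return (pred == torch.arange(len(pred))).float().mean().item()
```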
@article{chaybouti2025amoe,
title={AMoE: Agglomerative Mixture-of-Experts Vision Foundation Models},
author={Chaybouti, Sofian and Narayan, Sanath and Dahou, Yasser and Le Khac, Phuc H. and Singh, Ankit and Huynh, Ngoc Dung and Para, Wamiq Reyaz and Kuehne, Hilde and Hacid, Hakim},
journal={arXiv preprint arXiv:2512.20157},
year={2025}
}