How to use Amshaker/Mobile-O-0.5B-iOS with MLX:
# Make sure mlx-vlm is installed
# pip install --upgrade mlx-vlm
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config
# Load the model
model, processor = load("Amshaker/Mobile-O-0.5B-iOS")
config = load_config("Amshaker/Mobile-O-0.5B-iOS")
# Prepare input
image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
prompt = "Describe this image."
# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=1
)
# Generate output
output = generate(model, processor, formatted_prompt, image)
print(output)

This repository contains the optimized MLX and CoreML model components of Mobile-O-0.5B for native iOS deployment. These components power the Mobile-O iOS app, enabling fully on-device multimodal understanding and image generation with no cloud dependency.
| Spec | Detail |
|---|---|
| ⚡ Image Generation | ~3 seconds |
| 👁️ Visual Understanding | ~0.4 seconds |
| 💾 Memory Footprint | < 2 GB |
| 📱 Compatible Devices | iPhone (A17+ / M-series) |
| Cloud Dependency | None, fully on-device |
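The latency figures above are on-device measurements. To get a rough number for the MLX path on your own hardware, a small timing helper along these lines may help. This is a minimal sketch: `time_call` is a hypothetical helper, not part of mlx-vlm, and timings taken on a Mac will not match the iPhone figures in the table.

```python
import statistics
import time


def time_call(fn, *args, repeats=3, **kwargs):
    """Call fn(*args, **kwargs) `repeats` times; return (last_result, median_seconds)."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return result, statistics.median(durations)


# Usage with the MLX snippet (model, processor, formatted_prompt, image as defined there):
# output, seconds = time_call(generate, model, processor, formatted_prompt, image)
# print(f"median latency: {seconds:.2f} s")
```

The median is reported rather than the mean because the first call often includes one-time warm-up cost.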
This repo includes optimized model components in both MLX and CoreML formats:
| Component | Format | Description |
|---|---|---|
| VLM | MLX / CoreML | FastVLM-0.5B (FastViT + Qwen2-0.5B) |
| Diffusion Decoder | MLX / CoreML | SANA-600M-512 (Linear DiT + VAE) |
| MCP | MLX / CoreML | Mobile Conditioning Projector (~2.4M params) |
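As a back-of-envelope check on how weight precision affects size, the sketch below sums the nominal parameter counts suggested by the component names and converts them to storage at a few bit widths. The counts are assumptions read off the names, not measured values. The estimate covers weights only and ignores the vision encoder, the VAE, activations, and runtime overhead, so it does not reproduce the memory footprint quoted above.

```python
# Nominal parameter counts read off the component names in the table above.
# These are assumptions for illustration, not measured sizes of the shipped files.
NOMINAL_PARAMS = {
    "Qwen2-0.5B language model": 0.5e9,
    "SANA-600M diffusion decoder": 0.6e9,
    "MCP": 2.4e6,
}


def weight_gb(n_params: float, bits_per_weight: int) -> float:
    """Storage for n_params weights at the given precision, in decimal GB."""
    return n_params * bits_per_weight / 8 / 1e9


total_params = sum(NOMINAL_PARAMS.values())
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_gb(total_params, bits):.2f} GB")
```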
The iOS app source code is in the Mobile-O-App/ directory of the Mobile-O repository:
git clone https://github.com/Amshaker/Mobile-O.git
cd Mobile-O/Mobile-O-App
Refer to the Mobile-O-App README for detailed setup instructions.
from huggingface_hub import snapshot_download
snapshot_download(
    repo_id="Amshaker/Mobile-O-0.5B-iOS",
    repo_type="model",
    local_dir="ios_models"
)
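After the download finishes, it can be useful to confirm what landed in `ios_models`. The following is a minimal sketch using only the standard library. `summarize_dir` is a hypothetical helper, and no particular file names or extensions in the repository are assumed.

```python
from collections import defaultdict
from pathlib import Path


def summarize_dir(root):
    """Return {suffix: (file_count, total_bytes)} for all files under root."""
    summary = defaultdict(lambda: [0, 0])
    for path in Path(root).rglob("*"):
        if path.is_file():
            entry = summary[path.suffix or "(none)"]
            entry[0] += 1
            entry[1] += path.stat().st_size
    return {suffix: tuple(counts) for suffix, counts in summary.items()}


if __name__ == "__main__":
    # "ios_models" matches the local_dir used in the download call above.
    for suffix, (count, size) in sorted(summarize_dir("ios_models").items()):
        print(f"{suffix:>12}  {count:4d} files  {size / 1e6:10.1f} MB")
```

Grouping by file suffix gives a quick view of how the download splits across formats without relying on a specific directory layout.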
| Resource | Link |
|---|---|
| 🤗 Mobile-O-0.5B | PyTorch Model |
| 🤗 Mobile-O-1.5B | PyTorch Model |
| 📱 iOS App Source Code | Mobile-O-App |
| 🤗 Training Datasets | Collection |
@article{shaker2026mobileo,
title={Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device},
author={Shaker, Abdelrahman and Heakl, Ahmed and Muhammad, Jaseel and Thawkar, Ritesh and Thawakar, Omkar and Li, Senmao and Cholakkal, Hisham and Reid, Ian and Xing, Eric P. and Khan, Salman and Khan, Fahad Shahbaz},
journal={arXiv preprint arXiv:2602.20161},
year={2026}
}
Released under CC BY-NC 4.0. For research purposes only.