Sargalay


Xiaomi: MiMo-V2-Omni

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capabilities (visual grounding, multi-step planning, tool use, and code execution), making it well suited to complex real-world tasks that span modalities. It has a 256K-token context window.

xiaomi/mimo-v2-omni

Context Size

262,144 tokens (256K)

Input Price

2,460 Ks/M

Output Price

12,300 Ks/M
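
As a rough illustration of how the listed prices translate into a per-request cost, here is a minimal Python sketch. It reads "Ks/M" as Ks per one million tokens and assumes every input, including audio, image, and video, is billed as tokens at the listed input rate; the page does not confirm either point. The function name `estimate_cost_ks` is hypothetical.

```python
from decimal import Decimal

# Prices as listed on this page, read as Ks per one million tokens (assumption).
INPUT_PRICE_PER_M = Decimal("2460")
OUTPUT_PRICE_PER_M = Decimal("12300")
TOKENS_PER_M = Decimal("1000000")


def estimate_cost_ks(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost in Ks of one request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_PRICE_PER_M
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_M
    )
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # 10,000 input tokens and 2,000 output tokens
    print(estimate_cost_ks(10_000, 2_000))
```

Because output tokens are listed at five times the input price, long completions dominate the cost even for short prompts.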


Architecture

Text
Audio
Image
Video

Supported Parameters

frequency_penalty
include_reasoning
max_tokens
presence_penalty
reasoning
response_format
stop
temperature
tool_choice
tools
top_p
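
One way to use this list is to check request options against it before sending a request. The sketch below is a hypothetical client-side helper (`split_parameters` is not part of any documented API) that separates options listed on this page from ones that are not. It makes no claim about how the service itself treats unlisted parameters.

```python
# Parameter names exactly as listed under "Supported Parameters" on this page.
SUPPORTED_PARAMETERS = frozenset({
    "frequency_penalty",
    "include_reasoning",
    "max_tokens",
    "presence_penalty",
    "reasoning",
    "response_format",
    "stop",
    "temperature",
    "tool_choice",
    "tools",
    "top_p",
})


def split_parameters(params: dict) -> tuple[dict, list[str]]:
    """Return (listed options, sorted names of options not listed on the page)."""
    accepted = {k: v for k, v in params.items() if k in SUPPORTED_PARAMETERS}
    unlisted = sorted(k for k in params if k not in SUPPORTED_PARAMETERS)
    return accepted, unlisted


if __name__ == "__main__":
    # "top_k" is used here only as an example of a name absent from the list.
    accepted, unlisted = split_parameters(
        {"temperature": 0.7, "top_k": 40, "max_tokens": 512}
    )
    print(accepted)
    print(unlisted)
```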

Details

Tokenizer: Other
Max Completion: 65,536 tokens
Provider Context: 262,144 tokens
Moderated: No
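
The two limits above interact when choosing a `max_tokens` value. The following Python sketch shows one way to pick a safe value, assuming the 262,144-token context covers the prompt and the completion together; that is a common convention, but this page does not state it. The helper name `clamp_max_tokens` is hypothetical.

```python
# Limits as listed under "Details" on this page.
CONTEXT_TOKENS = 262_144
MAX_COMPLETION_TOKENS = 65_536


def clamp_max_tokens(prompt_tokens: int, requested: int) -> int:
    """Cap a requested completion length by both listed limits.

    Assumes prompt and completion share the context window (not confirmed
    by the page).
    """
    if prompt_tokens < 0 or requested < 1:
        raise ValueError("prompt_tokens must be >= 0 and requested must be >= 1")
    remaining = CONTEXT_TOKENS - prompt_tokens
    if remaining < 1:
        raise ValueError("prompt leaves no room for a completion")
    return min(requested, MAX_COMPLETION_TOKENS, remaining)


if __name__ == "__main__":
    print(clamp_max_tokens(1_000, 100_000))    # limited by the completion cap
    print(clamp_max_tokens(250_000, 100_000))  # limited by the remaining context
```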