Back to all models
xiaomi
Xiaomi: MiMo-V2-Omni
MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step planning, tool use, and code execution - making it well-suited for complex real-world tasks that span modalities. 256K context window.
xiaomi/mimo-v2-omni
Context Size
262.144K
Input Price
2,460 Ks/M
Output Price
12,300 Ks/M
Architecture
Text
Audio
Image
Video
Supported Parameters
frequency_penaltyinclude_reasoningmax_tokenspresence_penaltyreasoningresponse_formatstoptemperaturetool_choicetoolstop_p
Details
TokenizerOther
Max Completion65,536 tokens
Provider Context262.144K tokens
ModeratedNo