Xiaomi: MiMo-V2-Omni

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step...

xiaomi/mimo-v2-omni

Context Size

262,144 tokens

Input Price

2,460 Ks/M

Output Price

12,300 Ks/M
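
To illustrate how the listed prices translate into a per-request cost, here is a minimal sketch. It assumes "Ks/M" means the listed currency per one million tokens; the function name and the sample token counts are hypothetical and not part of the listing.

```python
# Listed prices, ASSUMED to be per one million tokens in the currency shown ("Ks").
INPUT_PRICE_PER_M = 2_460
OUTPUT_PRICE_PER_M = 12_300


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in the listed currency (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical request: 10,000 input tokens and 2,000 output tokens.
print(estimate_cost(10_000, 2_000))  # 49.2
```

With output priced at five times the input rate, 2,000 output tokens cost as much here as 10,000 input tokens.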

Architecture

Text

Audio

Image

Video

Supported Parameters

frequency_penalty, include_reasoning, max_tokens, presence_penalty, reasoning, response_format, stop, temperature, tool_choice, tools, top_p
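
One way to use the parameter list is to check a request's options against it before sending. The sketch below assumes the run-together names on the page split into the eleven parameters shown; the helper name and the sample options (including `top_k`, which is not in the list) are hypothetical.

```python
# Parameter names as read from the listing (ASSUMED split of the fused text).
SUPPORTED_PARAMETERS = frozenset({
    "frequency_penalty", "include_reasoning", "max_tokens", "presence_penalty",
    "reasoning", "response_format", "stop", "temperature", "tool_choice",
    "tools", "top_p",
})


def unsupported_keys(options: dict) -> list[str]:
    """Return the option names that do not appear in the listed parameters."""
    return sorted(name for name in options if name not in SUPPORTED_PARAMETERS)


# Hypothetical request options; "top_k" is used only to show a rejected name.
options = {"temperature": 0.7, "top_p": 0.9, "max_tokens": 1024, "top_k": 40}
print(unsupported_keys(options))  # ['top_k']
```

Returning the offending names, rather than silently dropping them, makes it easier to spot options that would have no effect.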

Details

Tokenizer: Other

Max Completion: 65,536 tokens

Provider Context: 262,144 tokens

Moderated: No
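
The two limits above interact when choosing `max_tokens` for a long prompt. The following sketch assumes the prompt and the completion share the 262,144-token context window, which the listing does not state explicitly; the helper name and the sample prompt sizes are hypothetical.

```python
CONTEXT_SIZE = 262_144    # listed provider context, in tokens
MAX_COMPLETION = 65_536   # listed maximum completion length, in tokens


def max_output_budget(prompt_tokens: int) -> int:
    """Largest completion length that fits alongside a prompt of the given size.

    ASSUMPTION: prompt and completion tokens count against the same window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_SIZE - prompt_tokens
    return max(0, min(MAX_COMPLETION, remaining))


print(max_output_budget(100_000))  # 65536 (capped by the completion limit)
print(max_output_budget(250_000))  # 12144 (capped by the remaining context)
```

Under this assumption, the completion cap is the binding limit until the prompt exceeds 196,608 tokens (262,144 minus 65,536).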
