inception

Inception: Mercury

Mercury is the first diffusion large language model (dLLM). Applying a breakthrough discrete diffusion approach, the model runs 5-10x faster than even speed optimized models like GPT-4.1 Nano and Claude 3.5 Haiku while matching their performance. Mercury's speed enables developers to provide responsive user experiences, including with voice agents, search interfaces, and chatbots. Read more in the [blog post] (https://www.inceptionlabs.ai/blog/introducing-mercury) here.

inception/mercury

Context Size

128K

Input Price

1,537.5 Ks/M

Output Price

4,612.5 Ks/M

Architecture

Text

Supported Parameters

max_tokensresponse_formatstopstructured_outputstemperaturetool_choicetools

Details

TokenizerOther

Max Completion32,000 tokens

Provider Context128K tokens

ModeratedNo

Command Palette

Inception: Mercury

Architecture

Supported Parameters

Details