SargalaySargalay

Command Palette

Search for a command to run...

Back to all models
bytedance

ByteDance: UI-TARS 7B

UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement learning-based reasoning, enabling robust action planning and execution across virtual interfaces. This model achieves state-of-the-art results on a range of interactive and grounding benchmarks, including OSworld, WebVoyager, AndroidWorld, and ScreenSpot. It also demonstrates perfect task completion across diverse Poki games and outperforms prior models in Minecraft agent tasks. UI-TARS-1.5 supports thought decomposition during inference and shows strong scaling across variants, with the 1.5 version notably exceeding the performance of earlier 72B and 7B checkpoints.

bytedance/ui-tars-1.5-7b

Context Size

128K

Input Price

615 Ks/M

Output Price

1,230 Ks/M


Architecture

Image
Text

Supported Parameters

frequency_penaltylogit_biasmax_tokenspresence_penaltyrepetition_penaltyseedstoptemperaturetop_ktop_p

Details

TokenizerOther
Max Completion2,048 tokens
Provider Context128K tokens
ModeratedNo