Tongyi Qianwen | ThinDeep

qwen3-vl

Qwen3-VL (Alibaba Tongyi Qianwen 3 Vision-Language) is the most powerful vision-language model in the Qwen family to date.

This generation improves the model across the board: understanding and generating text, perceiving and reasoning about visual content, supporting longer context lengths, understanding spatial relationships and dynamic video, and interacting with AI agents. Qwen3-VL shows clear and significant progress in every one of these areas.

ollama run qwen3-vl
6.1GB

ollama run qwen3-vl:4b
3.3GB
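
Once a model has been pulled with one of the commands above, it can also be queried from a script. The following is a minimal sketch, assuming an Ollama server is running locally on its default port (11434) and that the model accepts images through the `images` field of the `/api/generate` endpoint. The helper names `build_payload` and `describe_image` are hypothetical and chosen only for illustration.

```python
import base64
import json
import urllib.request

# Assumption: a local Ollama server listening on its default address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt, image_bytes, model="qwen3-vl"):
    """Build the JSON body for one non-streaming request carrying one image."""
    return {
        "model": model,
        "prompt": prompt,
        # Images are sent as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(path, prompt="Describe this image.", model="qwen3-vl"):
    """Send an image file to the local server and return the reply text."""
    with open(path, "rb") as f:
        payload = build_payload(prompt, f.read(), model)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling `describe_image("photo.png", model="qwen3-vl:4b")` would target the smaller variant; `photo.png` is a placeholder path.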