Qwen3-VL (阿里巴巴通義千問 3 視覺語言) is the most powerful vision-language model in the Qwen family to date.
In this generation, there are improvements to the model in many areas: its understanding and generating text, perceiving and reasoning about visual content, supporting longer context lengths, understanding spatial relationships and dynamic videos, or interacting with AI agents — Qwen3-VL shows clear and significant progress in every area.
ollama run qwen3-vl
6.1GB
ollama run qwen3-vl:4b
3.3GB