Discover Qwen2.5 on Hugging Face and ModelScope
Visit our Hugging Face or ModelScope organization, search for checkpoints starting with Qwen2.5-, or explore the Qwen2.5 collection to find everything you need. Enjoy the latest advancements in AI-powered language modeling!
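If you would rather browse programmatically, the snippet below lists matching checkpoints with the huggingface_hub client. This is a minimal sketch, assuming the huggingface_hub package is installed; the search terms are illustrative.

```python
from huggingface_hub import list_models

# List public checkpoints under the Qwen organization whose names
# match "Qwen2.5" (search terms here are illustrative).
for model in list_models(author="Qwen", search="Qwen2.5", limit=20):
    print(model.id)
```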
Comprehensive Documentation for Qwen2.5
To get the most out of Qwen2.5, refer to our detailed documentation, available in English (EN) and Chinese (ZH). The documentation covers the following sections:
- Quickstart – Basic usage and demonstrations to get started quickly.
- Inference – Guidance for inference with Transformers, including batch inference and streaming (see the first sketch after this list).
- Run Locally – Step-by-step instructions for running Qwen2.5 locally on CPU and GPU with frameworks such as llama.cpp and Ollama (second sketch below).
- Deployment – How to deploy Qwen2.5 for large-scale inference with frameworks such as vLLM and TGI (third sketch below).
- Quantization – Practices for quantizing LLMs with GPTQ and AWQ, and for creating high-quality GGUF files (fourth sketch below).
- Training – Instructions for fine-tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) using frameworks like Axolotl and LLaMA-Factory.
- Framework Integration – How to use Qwen2.5 with Retrieval-Augmented Generation (RAG), AI Agents, and other applications.
- Benchmarking – Performance metrics on inference speed and memory usage (available for Qwen2.5).
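To make the Inference item above concrete, here is a minimal Transformers sketch, assuming the Qwen/Qwen2.5-7B-Instruct checkpoint (any Qwen2.5 instruct variant should behave the same). The TextStreamer at the end gives simple token-by-token streaming:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_name = "Qwen/Qwen2.5-7B-Instruct"  # assumed; any Qwen2.5 instruct variant

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Briefly introduce Qwen2.5."},
]
# Render the conversation with the model's built-in chat template.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# TextStreamer prints tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(input_ids, max_new_tokens=256, streamer=streamer)
```

Batch inference follows the same pattern: tokenize several rendered conversations with padding enabled and pass them to generate in one call.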
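For the Run Locally item, one convenient path is Ollama's Python client. This sketch assumes a local Ollama server is running and that the model has already been pulled; the "qwen2.5" model tag is an assumption, so check Ollama's model library for the exact name:

```python
import ollama  # pip install ollama; talks to a locally running Ollama server

# Chat with the locally served Qwen2.5 model. The "qwen2.5" tag assumes
# the model was pulled beforehand, e.g. with `ollama pull qwen2.5`.
response = ollama.chat(
    model="qwen2.5",
    messages=[{"role": "user", "content": "Summarize Qwen2.5 in one sentence."}],
)
print(response["message"]["content"])
```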
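For the Deployment item, here is an offline-batching sketch with vLLM (checkpoint name assumed, as above); vLLM also ships an OpenAI-compatible HTTP server for production serving:

```python
from vllm import LLM, SamplingParams  # pip install vllm

# Offline batched generation. Note: raw prompts skip the chat template,
# which is fine for a quick smoke test of an instruct model.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # assumed checkpoint id
params = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=256)

outputs = llm.generate(
    ["Explain what a Mixture-of-Experts model is in one paragraph."],
    params,
)
for out in outputs:
    print(out.outputs[0].text)
```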
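And for the Quantization item: prequantized Qwen2.5 checkpoints load through the same Transformers API as the full-precision ones. The repository id below is an assumption; check the Qwen2.5 collection for the released GPTQ/AWQ/GGUF variants:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Loading a prequantized checkpoint (requires the matching backend,
# e.g. optimum plus a GPTQ kernel package, to be installed).
model_name = "Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
```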
Introduction to Qwen2.5
Following the success of Qwen2, developers worldwide have utilized the model to build advanced AI applications. Thanks to valuable feedback, we have improved and refined the model significantly.
We are thrilled to introduce Qwen2.5, a more intelligent, efficient, and capable language model designed to enhance performance across multiple domains.
Key Features of Qwen2.5
- Versatile Model Sizes – Available in 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B sizes with base and instruct variants.
- Pretrained on a Massive Dataset – Up to 18 trillion tokens, ensuring extensive knowledge and adaptability.
- Improved Instruction Following – Generates long-form text (over 8K tokens), understands structured data such as tables, and produces structured outputs, especially JSON (see the structured-output sketch after this list).
- Enhanced System Prompt Handling – More resilient to varied system prompts, improving role-play execution and context setting in chatbots.
- Extended Context Window – Accepts up to 128K tokens of input and generates up to 8K tokens in a single response (see the long-context sketch below).
- Multilingual Capabilities – Supports over 29 languages, including:
🇨🇳 Chinese | 🇬🇧 English | 🇫🇷 French | 🇪🇸 Spanish | 🇵🇹 Portuguese | 🇩🇪 German | 🇮🇹 Italian | 🇷🇺 Russian | 🇯🇵 Japanese | 🇰🇷 Korean | 🇻🇳 Vietnamese | 🇹🇭 Thai | 🇦🇪 Arabic and more!
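Because the instruct variants are tuned for structured output, a common pattern is simply to ask for JSON in the system prompt and parse the reply. A minimal sketch, reusing the Transformers setup shown earlier (prompting alone does not guarantee valid JSON, hence the parse at the end):

```python
import json

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"  # assumed instruct checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "Reply with a single JSON object and nothing else."},
    {"role": "user", "content": "Extract the model name and release year from: "
                                "'Qwen2.5 was released in September 2024.'"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=64)
reply = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(json.loads(reply))  # raises if the model strays from pure JSON
```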
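On the extended context window: the Qwen2.5 model cards describe enabling YaRN rope scaling for inputs beyond 32K tokens on frameworks that support it. The sketch below follows that description; treat the exact values as something to verify against the model card for your checkpoint:

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "Qwen/Qwen2.5-7B-Instruct"  # assumed checkpoint id

# Enable YaRN rope scaling for long inputs; the values follow the Qwen2.5
# model cards and should be double-checked there for your checkpoint.
config = AutoConfig.from_pretrained(model_name)
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}
model = AutoModelForCausalLM.from_pretrained(
    model_name, config=config, torch_dtype="auto", device_map="auto"
)
```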
Latest News & Updates
September 19, 2024 – Released Qwen2.5, introducing new model sizes: 3B, 14B, and 32B for greater flexibility. Read the full announcement →
June 6, 2024 – Launched the Qwen2 series. Check out our blog →
March 28, 2024 – Released Qwen1.5-MoE-A2.7B, the first Mixture-of-Experts (MoE) model in the Qwen family, supported by Hugging Face Transformers and vLLM, with support for llama.cpp, mlx-lm, and more coming soon!
February 5, 2024 – Released the Qwen1.5 series.
Get Started with Qwen2.5 Today!
Explore the latest advancements in AI language models with Qwen2.5. Whether you're a developer, researcher, or business looking for cutting-edge AI capabilities, Qwen2.5 offers the tools and flexibility you need.
- Visit Qwen2.5 on Hugging Face
- Explore Qwen2.5 on ModelScope
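If ModelScope is the faster mirror for you, one option is to download the weights with modelscope's snapshot_download and then load them locally with Transformers. A sketch, with the repo id assumed as above:

```python
from modelscope import snapshot_download  # pip install modelscope
from transformers import AutoModelForCausalLM, AutoTokenizer

# Fetch the checkpoint from ModelScope, then load it from the local path.
local_dir = snapshot_download("Qwen/Qwen2.5-7B-Instruct")  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = AutoModelForCausalLM.from_pretrained(
    local_dir, torch_dtype="auto", device_map="auto"
)
```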