feihu.hf committed
Commit 39d9315 · Parent(s): 30c8f30

update README

Files changed (1)
  1. README.md +53 -0
README.md CHANGED
@@ -31,6 +31,59 @@ Qwen3 is the latest generation of large language models in Qwen series, offering
  - Number of Layers: 28
  - Number of Attention Heads (GQA): 16 for Q and 8 for KV
 
+ - Context Length: 32,768
+ - Quantization: q4_K_M, q5_0, q5_K_M, q6_K, q8_0
+
+ For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).
+
+ ## Quickstart
+
+ ### llama.cpp
+
+ Check out our [llama.cpp documentation](https://qwen.readthedocs.io/en/latest/run_locally/llama.cpp.html) for a more detailed usage guide.
+
+ We advise you to clone [`llama.cpp`](https://github.com/ggerganov/llama.cpp) and install it following the official guide; our examples track the latest version of llama.cpp.
+ In the following demonstration, we assume that you are running commands from the root of the `llama.cpp` repository.
+
+ ```shell
+ ./llama-cli -hf Qwen/Qwen3-1.7B:Q8_0 --jinja --color -ngl 99 -fa -sm row --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0 --presence-penalty 1.5 -c 40960 -n 32768 --no-context-shift
+ ```
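+
+ If you would rather expose the model over an OpenAI-compatible HTTP API, llama.cpp also ships `llama-server`. A minimal sketch, assuming a recent build that includes the server binary and reusing the sampling settings above:
+
+ ```shell
+ # Sketch, not part of this commit: serve Qwen3-1.7B on port 8080 with
+ # the recommended sampling defaults (adjust -ngl to your GPU's VRAM)
+ ./llama-server -hf Qwen/Qwen3-1.7B:Q8_0 --jinja -ngl 99 -fa \
+   --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0 -c 40960 --port 8080
+ ```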
+
+ ### ollama
+
+ Check out our [ollama documentation](https://qwen.readthedocs.io/en/latest/run_locally/ollama.html) for a more detailed usage guide.
+
+ You can run Qwen3 with one command:
+
+ ```shell
+ ollama run hf.co/Qwen/Qwen3-1.7B-GGUF:Q8_0
+ ```
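+
+ The other quantizations listed above can be pulled the same way; assuming the tag casing follows the Q8_0 example, the q4_K_M build would be:
+
+ ```shell
+ # Smaller download and memory footprint, at some quality cost
+ ollama run hf.co/Qwen/Qwen3-1.7B-GGUF:Q4_K_M
+ ```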
+
+ ## Switching Between Thinking and Non-Thinking Mode
+
+ You can add `/think` and `/no_think` to user prompts or system messages to switch the model's thinking mode from turn to turn. The model will follow the most recent instruction in multi-turn conversations.
+
+ Here is an example of a multi-turn conversation:
+
+ ```
+ > Who are you /no_think
+
+ <think>
+
+ </think>
+
+ I am Qwen, a large-scale language model developed by Alibaba Cloud. [...]
+
+ > How many 'r's are in 'strawberries'? /think
+
+ <think>
+ Okay, let's see. The user is asking how many times the letter 'r' appears in the word "strawberries". [...]
+ </think>
+
+ The word strawberries contains 3 instances of the letter r. [...]
+ ```
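+
+ The soft switch is plain text in the prompt, so it also works over an API. As a sketch, assuming the `llama-server` instance from the Quickstart is listening on port 8080, a non-thinking turn could be requested like this:
+
+ ```shell
+ # Append /no_think to the user message to skip the <think> block this turn
+ curl -s http://localhost:8080/v1/chat/completions \
+   -H "Content-Type: application/json" \
+   -d '{
+     "messages": [
+       {"role": "user", "content": "Who are you /no_think"}
+     ],
+     "temperature": 0.6,
+     "top_p": 0.95
+   }'
+ ```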
+
+
 
  ## Best Practices