feihu.hf committed 39d9315 (parent 30c8f30): update README

README.md, changed at @@ -31,6 +31,59 @@ (in the section beginning "Qwen3 is the latest generation of large language models in Qwen series, offering ..."):
- Number of Layers: 28
- Number of Attention Heads (GQA): 16 for Q and 8 for KV
- Context Length: 32,768
- Quantization: q4_K_M, q5_0, q5_K_M, q6_K, q8_0
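The grouped-query attention layout above (16 query heads sharing 8 KV heads) halves the KV cache relative to full multi-head attention. A back-of-the-envelope sketch; the head dimension of 128 and the fp16 cache are assumptions for illustration, not figures from this model card:

```python
# Back-of-the-envelope KV-cache size: GQA (8 KV heads) vs. a hypothetical
# full multi-head attention (16 KV heads) at the full 32,768-token context.
n_layers = 28
head_dim = 128        # assumed, not from the model card
bytes_per_elem = 2    # fp16 cache, assumed
ctx = 32_768          # context length from the model card

def kv_cache_bytes(n_kv_heads: int) -> int:
    # 2x for the separate K and V tensors, cached at every layer and position
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * ctx

print(f"MHA: {kv_cache_bytes(16) / 2**30:.1f} GiB")  # -> MHA: 7.0 GiB
print(f"GQA: {kv_cache_bytes(8) / 2**30:.1f} GiB")   # -> GQA: 3.5 GiB
```

Whatever the exact head dimension, the cache scales linearly with the KV head count, so 8 KV heads instead of 16 halves it.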

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).

## Quickstart

### llama.cpp

Check out our [llama.cpp documentation](https://qwen.readthedocs.io/en/latest/run_locally/llama.cpp.html) for a more detailed usage guide.

We advise you to clone [`llama.cpp`](https://github.com/ggerganov/llama.cpp) and install it following the official guide; we track the latest version of llama.cpp. In the following demonstration, we assume you are running commands from within the `llama.cpp` repository.

```shell
./llama-cli -hf Qwen/Qwen3-1.7B:Q8_0 --jinja --color -ngl 99 -fa -sm row --temp 0.6 --top-k 20 --top-p 0.95 --min-p 0 --presence-penalty 1.5 -c 40960 -n 32768 --no-context-shift
```
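The sampling flags in the command above (`--temp 0.6 --top-k 20 --top-p 0.95 --min-p 0`) chain standard truncation samplers. A minimal sketch of how top-k, top-p, and min-p filtering narrow a probability distribution; this illustrates the general technique, not llama.cpp's actual sampler implementation:

```python
def filter_probs(probs, top_k=20, top_p=0.95, min_p=0.0):
    """Return the token indices that survive top-k, then top-p, then min-p
    filtering. Illustrative only; llama.cpp's sampler chain differs in detail."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # top-k: keep only the k most probable tokens
    kept = order[:top_k]
    # top-p (nucleus): keep the smallest prefix whose mass reaches top_p
    total, nucleus = 0.0, []
    for i in kept:
        nucleus.append(i)
        total += probs[i]
        if total >= top_p:
            break
    # min-p: drop tokens below min_p times the top probability (0 disables it)
    floor = min_p * probs[order[0]]
    return [i for i in nucleus if probs[i] >= floor]

probs = [0.5, 0.3, 0.1, 0.06, 0.04]
print(filter_probs(probs, top_k=3, top_p=0.95))  # -> [0, 1, 2]
```

With `--min-p 0` that stage is effectively disabled, so only the top-k and nucleus cutoffs shape the distribution before temperature-scaled sampling.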

### ollama

Check out our [ollama documentation](https://qwen.readthedocs.io/en/latest/run_locally/ollama.html) for a more detailed usage guide.

You can run Qwen3 with one command:

```shell
ollama run hf.co/Qwen/Qwen3-1.7B-GGUF:Q8_0
```

## Switching Between Thinking and Non-Thinking Mode

You can add `/think` and `/no_think` to user prompts or system messages to switch the model's thinking mode from turn to turn. In multi-turn conversations, the model follows the most recent instruction.

Here is an example of a multi-turn conversation:

```
> Who are you /no_think

<think>

</think>

I am Qwen, a large-scale language model developed by Alibaba Cloud. [...]

> How many 'r's are in 'strawberries'? /think

<think>
Okay, let's see. The user is asking how many times the letter 'r' appears in the word "strawberries". [...]
</think>

The word strawberries contains 3 instances of the letter r. [...]
```
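When driving the model programmatically, the soft switch can be appended to the latest user message before each turn. A minimal sketch; the helper `with_thinking_mode` is hypothetical, and only the `/think` and `/no_think` tags come from this model card:

```python
def with_thinking_mode(messages, enable_thinking):
    """Append the /think or /no_think soft switch to the most recent
    user message. Hypothetical helper for illustration only."""
    tag = "/think" if enable_thinking else "/no_think"
    out = [dict(m) for m in messages]  # avoid mutating the caller's history
    for m in reversed(out):
        if m["role"] == "user":
            m["content"] = f'{m["content"]} {tag}'
            break
    return out

history = [{"role": "user", "content": "Who are you"}]
print(with_thinking_mode(history, False)[0]["content"])  # -> Who are you /no_think
```

Because the model honors the most recent instruction, tagging only the latest user turn is enough to flip modes mid-conversation.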

## Best Practices