No instruction following: model just outputs vaguely relevant text or goes into loops

by bibproj

@prince-canuma

Hi Prince

Both the V3.2 and V3.2-Speciale quants fail to follow instructions and just produce vaguely relevant "waffle", or they go into a loop: a prompt of "Hello" just keeps repeating the word "Hello".

I created my own quants and they behave exactly the same way.

I did a fresh install of mlx-lm 0.28.4, which was just released, and also tried the latest code from GitHub. Same results.

What code are you running to get this working?
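
For reference, the Python equivalent of the commands below is roughly this (a minimal sketch using the standard mlx_lm load/generate API; the local model path is mine, and I apply the chat template by hand since, as far as I can tell, the Python generate function doesn't do that for you):

from mlx_lm import load, generate

# Load the local 4-bit quant
model, tokenizer = load("mlx-community_DeepSeek-V3.2-4bit")

# Format the request with the model's chat template
messages = [{"role": "user", "content": "Translate from English into French: Hi there"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Generate a short completion
text = generate(model, tokenizer, prompt=prompt, max_tokens=100, verbose=True)

This behaves the same for me as the CLI calls below.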



@awni

I'm not having any success with DeepSeek V3.2 and Speciale. Is it a problem with the models, the MLX code, or (likely) me doing something wrong?

I tested this with a fresh install of mlx-lm 0.28.4 from PyPI, and also with the latest mlx-lm code from GitHub. Same results.

I've also never seen the "default legacy behaviour" tokenizer message before.
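
For what it's worth, that warning comes straight from transformers, and the legacy flag it mentions can be set when loading the tokenizer directly. A minimal sketch, assuming the quant's tokenizer loads through AutoTokenizer:

from transformers import AutoTokenizer

# legacy=False opts into the newer tokenization behaviour the warning describes
tok = AutoTokenizer.from_pretrained("mlx-community_DeepSeek-V3.2-4bit", legacy=False)

I don't know whether the warning is related to the bad output or is just noise.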

When I run the exact same command with your old mlx-community/DeepSeek-V3-0324-4bit quant, it works fine.

Examples:

mlx_lm.generate --model mlx-community_DeepSeek-V3.2-4bit  --prompt "Translate from English into French: Hi there"
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
==========
, I’m a student from the UK. I’m studying French at university. I’m going to France next year to study. I’m going to live in Paris for six months. I’m going to study at the Sorbonne. I’m going to live in a student residence. I’m going to have a room with a view of the Eiffel Tower. I’m going to have a lot of fun. I’m going to meet a
mlx_lm.generate --model mlx-community_DeepSeek-V3.2-4bit  --prompt "What city is the capital of France?"  
(same legacy tokenizer warning as above)
==========
",
    "answer": "Paris",
    "context": "Paris is the capital and most populous city of France. It is located on the Seine River, in the north of the country, at the heart of the Île-de-France region."
  },
  {
    "question": "What is the largest planet in our solar system?",
    "answer": "Jupiter",
    "context": "Jupiter is the fifth planet from the Sun and the largest in the Solar System

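The outputs above read like raw base-model completions, as if no chat template were applied at all. A quick way to check what the templated prompt actually looks like (a sketch, assuming the tokenizer wrapper returned by load exposes the underlying chat template):

from mlx_lm import load

model, tokenizer = load("mlx-community_DeepSeek-V3.2-4bit")

# Does the quant ship a chat template at all?
print(tokenizer.chat_template is not None)

# What does the formatted prompt look like?
print(tokenizer.apply_chat_template(
    [{"role": "user", "content": "What city is the capital of France?"}],
    add_generation_prompt=True,
    tokenize=False,
))
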
Correct output with the old quant:

mlx_lm.generate --model mlx-community_DeepSeek-V3-0324-4bit --prompt "Translate from English into French: Hi there" 
==========
The translation of "Hi there" into French is:

"Salut toi" (informal)  
or  
"Bonjour" (more neutral/formal)  

Let me know if you'd like any variations!
