---
library_name: transformers
base_model: andresnowak/Qwen3-0.6B-instruction-finetuned
tags:
- unsloth
- generated_from_trainer
model-index:
- name: Qwen3-0.6B-instruction-finetuned-MCQA
results: []
datasets:
- andresnowak/MNLP_MCQA_dataset
- andresnowak/MNLP_M2_mcqa_dataset
---
<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->
# Qwen3-0.6B-instruction-finetuned-MCQA
This model is a fine-tuned version of [andresnowak/Qwen3-0.6B-instruction-finetuned](https://huggingface.co/andresnowak/Qwen3-0.6B-instruction-finetuned) on the [andresnowak/MNLP_MCQA_dataset](https://huggingface.co/datasets/andresnowak/MNLP_MCQA_dataset) and [andresnowak/MNLP_M2_mcqa_dataset](https://huggingface.co/datasets/andresnowak/MNLP_M2_mcqa_dataset) datasets.
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
This model was trained with the same methodology as [MNLP_M2_mcqa_model](https://huggingface.co/andresnowak/MNLP_M2_mcqa_model): we run a single forward pass over the prompt, take the logits of the last token, and compute a cross-entropy loss over that position restricted to the four option letters of the question. The idea is to maximize the likelihood that the model prints the correct letter for each question.
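The option-restricted cross-entropy described above can be sketched as follows. This is a minimal NumPy illustration, not the actual training code; the function name, and the assumption that each option letter tokenizes to a single token, are ours:

```python
import numpy as np

def mcqa_option_loss(last_token_logits, option_token_ids, correct_idx):
    """Cross-entropy over the answer letters only.

    last_token_logits: (vocab_size,) logits for the token that follows the prompt.
    option_token_ids: vocabulary ids of the option letters (e.g. "A", "B", "C", "D"),
        assumed here to be single tokens.
    correct_idx: index into option_token_ids of the correct letter.
    """
    # Restrict the distribution to the four option tokens.
    option_logits = last_token_logits[np.asarray(option_token_ids)]
    # Numerically stable log-softmax over just those options.
    shifted = option_logits - option_logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    # Negative log-likelihood of the correct letter.
    return -log_probs[correct_idx]
```

Minimizing this loss pushes probability mass toward the correct letter relative to the other three, without penalizing how the model distributes mass over the rest of the vocabulary.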
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 32
- total_train_batch_size: 64
- optimizer: ADAMW_8BIT with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 2
### Training results
The model was evaluated on a suite of Multiple Choice Question Answering (MCQA) benchmarks (on the validation or test set of each benchmark, respectively).
NLP4Education consists only of the approximately 1,000 question-answer pairs that were provided to us.
The performance on the MCQA benchmarks is:
| Benchmark | Accuracy (Acc) | Normalized Accuracy (Acc Norm) |
| :----------------- | :------------- | :----------------------------- |
| ARC Challenge | 61.39% | 59.96% |
| ARC Easy | 79.43% | 76.51% |
| GPQA | 32.59% | 28.57% |
| Math QA | 24.69% | 24.80% |
| MCQA Evals | 41.82% | 39.22% |
| MMLU | 52.11% | 52.11% |
| MMLU Pro | 15.41% | 14.31% |
| MuSR | 51.06% | 48.41% |
| NLP4Education | 44.14% | 42.73% |
| **Overall** | **44.74%** | **42.96%** |
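For reference, the two columns follow the usual lm-evaluation-harness convention: raw accuracy picks the choice with the highest log-likelihood, while normalized accuracy divides each choice's log-likelihood by its length before taking the argmax, so longer answers are not penalized. A minimal sketch (the function name and character-level normalization are illustrative assumptions, not the exact harness implementation):

```python
def pick_answer(choice_logprobs, choice_texts, normalize=False):
    """Return the index of the best-scoring choice.

    choice_logprobs: summed log-likelihood of each choice's tokens.
    choice_texts: the choice strings (used only for length normalization).
    """
    scores = [
        lp / len(text) if normalize else lp
        for lp, text in zip(choice_logprobs, choice_texts)
    ]
    # Argmax over the (possibly normalized) scores.
    return max(range(len(scores)), key=scores.__getitem__)
```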
The tests were done with the following prompt (only MuSR used a different one, which adds `Question:` and `Narrative:` fields):
```
This question assesses challenging STEM problems as found on graduate standardized tests. Carefully evaluate the options and select the correct answer.
---
[Insert Question Here]
---
[Insert Choices Here, e.g.:
A. Option 1
B. Option 2
C. Option 3
D. Option 4]
---
Your response should include the letter and the exact text of the correct choice.
Example: B. Entropy increases.
Answer:
```
And testing was done against the target format ```[Letter]. [Text answer]```.
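Putting the template and target format together, an evaluation prompt can be assembled like this (an illustrative helper written from the template above; the function name is ours):

```python
def build_mcqa_prompt(question, choices):
    """Format a question and its four choices in the evaluation template above."""
    letters = "ABCD"
    options = "\n".join(f"{letters[i]}. {c}" for i, c in enumerate(choices))
    return (
        "This question assesses challenging STEM problems as found on "
        "graduate standardized tests. Carefully evaluate the options and "
        "select the correct answer.\n"
        "---\n"
        f"{question}\n"
        "---\n"
        f"{options}\n"
        "---\n"
        "Your response should include the letter and the exact text of the "
        "correct choice.\n"
        "Example: B. Entropy increases.\n"
        "Answer:"
    )
```

The expected completion is then the `[Letter]. [Text answer]` string for the correct choice, e.g. `C. Mass`.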
### Framework versions
- Transformers 4.51.3
- Pytorch 2.5.1+cu121
- Datasets 3.6.0
- Tokenizers 0.21.0