head_dim in config.json is incorrect?
#36 by zackangelo · opened
Shouldn't head_dim be equal to hidden_size / num_attention_heads?
Maybe I'm missing something, but that would mean head_dim == 64, right?
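For concreteness, here's the arithmetic I mean. The hidden_size and num_attention_heads values below are hypothetical, picked only so the derived head_dim comes out to 64, since the actual numbers aren't quoted in this thread:

```python
# Hypothetical config values, NOT read from the actual config.json,
# chosen so that the derived head_dim comes out to 64.
hidden_size = 2048
num_attention_heads = 32

# The conventional relationship the question assumes:
head_dim = hidden_size // num_attention_heads
print(head_dim)  # 64, not the 128 listed under head_dim in config.json
```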
I noticed the same issue. The kv_channels (or head_dim) used during training (with Megatron-Core) does NOT match the head_dim in config.json (64 vs. 128).
Megatron-Core automatically computes kv_channels as hidden_size // num_attention_heads when it's not explicitly specified:
https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/training/arguments.py#L722-L724
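The fallback at those lines amounts to something like this (a sketch of the logic, not a verbatim copy of Megatron-LM's code):

```python
# Sketch of Megatron-Core's default for kv_channels (see the linked
# arguments.py lines): when kv_channels isn't given explicitly, it is
# derived from hidden_size and num_attention_heads.
def resolve_kv_channels(kv_channels, hidden_size, num_attention_heads):
    if kv_channels is None:
        assert hidden_size % num_attention_heads == 0, \
            "hidden_size must be divisible by num_attention_heads"
        kv_channels = hidden_size // num_attention_heads
    return kv_channels

# With illustrative values matching the 64-vs-128 mismatch above:
print(resolve_kv_channels(None, 2048, 32))  # 64
```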