head_dim in config.json is incorrect?

#36
by zackangelo - opened

Shouldn't head_dim be equal to hidden_size / num_attention_heads?

Maybe I'm missing something but that would mean head_dim == 64, right?
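For anyone who wants to check, here's a quick sketch (the `config.json` path is a placeholder; point it at the model's actual config):

```python
import json

# Compare the published head_dim against the usual
# hidden_size / num_attention_heads convention.
with open("config.json") as f:
    cfg = json.load(f)

expected = cfg["hidden_size"] // cfg["num_attention_heads"]
print(f"head_dim in config:  {cfg.get('head_dim')}")
print(f"expected head_dim:   {expected}")
```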


I noticed the same issue. The kv_channels (or head_dim) used during training (with Megatron Core) DOES NOT match the head_dim in config.json (64 vs. 128).

Megatron-Core automatically computes kv_channels as hidden_size // num_attention_heads when it's not explicitly specified:
https://github.com/NVIDIA/Megatron-LM/blob/main/megatron/training/arguments.py#L722-L724
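Paraphrasing the linked logic as a standalone sketch (the example values 2048 and 32 are hypothetical, chosen only so the result lands on 64):

```python
def default_kv_channels(hidden_size: int, num_attention_heads: int) -> int:
    # Mirrors the linked Megatron-LM fallback: when kv_channels is not
    # explicitly set, it defaults to hidden_size // num_attention_heads.
    assert hidden_size % num_attention_heads == 0
    return hidden_size // num_attention_heads

# Hypothetical example values, not taken from the actual model config:
print(default_kv_channels(2048, 32))  # 64, not the 128 in config.json
```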
