Reasoning and Multilingual Performance
I've been evaluating Gemma-2-9b-it for production deployment and have some questions about its capabilities:
**Reasoning performance:** The model card mentions strong reasoning capabilities. Has anyone benchmarked it against GPT-3.5 or comparable models on complex reasoning tasks?
**Multilingual support:** English is the primary language, but the model was reportedly trained on data covering 8 languages. How much does output quality degrade for non-English prompts in production use?
**Temperature settings:** The documentation recommends a lower temperature (around 0.3) than is typical for other models. What's the rationale behind this, and how does it affect output diversity?
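To make the diversity question concrete, here's a minimal sketch of how temperature rescales next-token probabilities (the logits here are made up, not from the model): lower temperature sharpens the distribution, concentrating mass on the top token and reducing diversity.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities, scaled by temperature.

    T < 1 sharpens the distribution (less diverse sampling);
    T > 1 flattens it (more diverse sampling).
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical next-token logits
sharp = softmax_with_temperature(logits, 0.3)  # top token dominates
flat = softmax_with_temperature(logits, 1.0)   # mass more spread out
```

At T = 0.3 the top token's probability climbs well above its T = 1.0 value, which is presumably why a low default trades diversity for consistency.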
**Context window management:** With the standard 8K-token context window, what are best practices for handling longer conversations or documents?
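For reference, the naive approach I'd start from is a sliding-window trim: keep the system message and drop the oldest turns until the rest fits the budget. This is a sketch, not a tested production pattern; `count_tokens` is a stand-in for whatever tokenizer you actually use.

```python
def trim_history(messages, max_tokens, count_tokens):
    """Keep the system message plus the most recent turns that fit.

    messages: list of {"role": ..., "content": ...} dicts.
    count_tokens: callable returning the token cost of a string
                  (stand-in for a real tokenizer).
    """
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    budget = max_tokens - sum(count_tokens(m["content"]) for m in system)
    kept = []
    # Walk from newest to oldest, stopping when the budget runs out.
    for m in reversed(turns):
        cost = count_tokens(m["content"])
        if cost > budget:
            break
        kept.append(m)
        budget -= cost
    return system + list(reversed(kept))
```

Curious whether people do this, summarize dropped turns instead, or chunk long documents with overlap.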
**Fine-tuning results:** Has anyone successfully fine-tuned this model for domain-specific tasks? What were the results?
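In case it helps frame answers: for a 9B model, parameter-efficient fine-tuning (e.g., LoRA via Hugging Face's `peft` library) is the usual starting point. A config sketch of what I'd try, with illustrative hyperparameters, not values validated for this model:

```python
from peft import LoraConfig

# Illustrative LoRA config sketch; rank, alpha, and target modules
# are assumptions to tune, not recommendations from the model card.
lora_config = LoraConfig(
    r=16,                                # low-rank adapter dimension
    lora_alpha=32,                       # scaling factor
    target_modules=["q_proj", "v_proj"], # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
# The adapted model would then come from peft.get_peft_model(model, lora_config).
```

If anyone has tried full fine-tuning instead, I'd be interested in how it compared.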
Looking forward to hearing from the community about real-world deployment experiences.