---
license: mpl-2.0
language:
- si
tags:
- si
- lk
- dialog
- male
- tts
- uom
- vits
---

# SinhalaVITS-TTS-M2 - Male Voice 02

This is a [Coqui TTS](https://github.com/coqui-ai/TTS) model trained specifically for **Sinhala**, developed by **Dialog Axiata PLC** and the **Dialog – UoM Research Lab**. We trained it on a custom-recorded dataset, adapting it to a strong male voice.

---

## Features

- Model architecture: VITS
- Language: Sinhala (si-lk)
- Training Sampling Rate: 22050 Hz
- Framework: Coqui TTS

---

## Dataset

- Voice: Male (Roshan)
- Recording Sampling Rate: 44100 Hz
- No. of Clips: 1096
- Total Length: >100 mins (~2 hrs)

## Training Specs

- Hardware: NVIDIA GeForce GTX 1060 6GB GPU
- Training Time: **~100 hours**
- Global Steps: 210,000
- Batch Size: 16
- Epochs:
- Loss Convergence: Stable mel + KL losses

## Installation

You can run this model locally using the included Flask-based inference server. The server automatically uses CUDA if it is available on your system.

1. Install the requirements.

   ```bash
   pip install -r requirements.txt
   ```

2. Start the API server.

   ```bash
   python inference_M2.py
   ```

   _This starts a Flask server at http://localhost:8000._

3. Use curl or any HTTP client (such as Postman) to send Sinhala text to the server. The API endpoint is `/tts`.

   ```bash
   curl -X POST http://localhost:8000/tts \
     -H "Content-Type: application/json" \
     -d '{"text": "ආයුබෝවන්"}' \
     --output output.wav
   ```

4. The API will:

   * Convert Sinhala text → Romanized Sinhala (via romanizer.py)
   * Generate speech using the VITS model
   * Return output.wav (Sinhala voice)

## File Structure

```bash
SinhalaVITS-TTS-M2/
├── Roshan_270000.pth     # Fine-tuned VITS checkpoint
├── Roshan_config.json    # Model configuration
├── romanizer.py          # Sinhala → Roman converter
├── inference_M2.py       # Flask-based inference server
├── requirements.txt      # Required dependencies
├── LICENSE               # MPL-2.0 license
└── README.md             # This file
```

## Contributors

* Kasun Ranasinghe (Dialog-UoM Research Lab)
* Randika Silva (Dialog Axiata PLC)
* Vipula Wakkumbura (Dialog-UoM Research Lab)

## Acknowledgements

* PathNirvana (https://github.com/pathnirvana/coqui-tts) – Previous work in Sinhala TTS
* Coqui TTS – Open-source TTS framework that provided the foundation for this work
* Sinhala dataset contributor (Roshan Ranasinghe) – For providing professional-quality speech samples

## License

This model is released under the MPL-2.0 license.
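
## Example: Python Client

As an alternative to the curl call in the Installation section, the same request can be sent from Python. This is a minimal sketch using only the standard library. It assumes the server from `inference_M2.py` is running on `http://localhost:8000` and responds to `/tts` with the WAV bytes, as the curl example suggests. The names `build_tts_request` and `synthesize` are hypothetical helpers and are not part of this repository.

```python
import json
import urllib.request

# Assumption: the Flask server is listening here and accepts {"text": ...}
# as JSON on /tts, mirroring the curl example in this README.
TTS_URL = "http://localhost:8000/tts"


def build_tts_request(text: str, url: str = TTS_URL) -> urllib.request.Request:
    """Build the same POST request that the curl example sends."""
    # ensure_ascii=False keeps the Sinhala characters as UTF-8 in the payload.
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str = "output.wav", timeout: float = 60.0) -> str:
    """Send text to the server and write the returned bytes to out_path."""
    request = build_tts_request(text)
    # urlopen raises urllib.error.HTTPError on a non-2xx response.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

With the server running, calling `synthesize("ආයුබෝවන්")` should write the response to `output.wav` in the current directory. The 60-second timeout is an arbitrary choice for this sketch; adjust it if synthesis is slower on CPU.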