xlm-roberta-persian-meter
This model is trained to classify the meter (baḥr) of a hemistich of classical Persian poetry. It is fine-tuned from XLM-RoBERTa and trained on a dataset drawn from Ganjoor.
The training data, scripts, etc. can all be found on GitHub.
Performance
So far, I have evaluated this model only on a randomly selected 10% sample of
the training dataset. (See eval.py in the GitHub repo.) This is not ideal, and I
plan to carry out a more rigorous evaluation, but the results are promising:
- Accuracy: 0.9801
- F1: 0.9792
- Precision: 0.9789
- Recall: 0.9801
- Loss: 0.0996
On this sample, the model performs well on all common Persian meters. Some rare meters remain underrepresented in the training data, so results for them may be less reliable.
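For readers who want to reproduce figures of this kind, here is a minimal sketch of how accuracy and class-averaged precision, recall, and F1 can be computed from integer labels. It assumes support-weighted averaging; the reported recall being equal to the reported accuracy is consistent with that choice, but eval.py remains the authoritative source. The function name `weighted_metrics` is hypothetical and is not taken from the repo.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy plus support-weighted precision, recall, and F1.

    Assumption: metrics are averaged across meters weighted by how many
    true examples each meter has. This is an illustration, not eval.py.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    support = Counter(y_true)    # number of true examples per class
    predicted = Counter(y_pred)  # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(correct.values()) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n       # weight each class by its support
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

With this averaging scheme, weighted recall always equals accuracy, which is why a rare meter with poor recall barely moves the headline numbers. Per-class scores would be needed to see how the underrepresented meters fare.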
Training data
The training examples are hemistichs from classical Persian poems, all taken from the Ganjoor corpus, where poems are already tagged with their meters. Currently, the dataset contains 277,248 unique hemistichs, representing the following works:
- The complete ghazals of Ṣāʾib Tabrīzī
- The complete ghazals of Ḥāfiẓ
- The complete ghazals of Saʿdī
- All the ghazals in the Dīvān-i Shams of Rūmī (excepting some with obscure meters)
- The first daftar of the Maṡnavī of Rūmī
- Selections from the Shāhnāma of Firdawsī
- Four of the five poems in the Khamsa of Niẓāmī:
  - Laylī u Majnūn
  - Khusraw va Shīrīn
  - Haft paykar
  - Makhzan al-asrār
Label mapping
The model predicts the meter of a given hemistich as an integer class ID, and
each ID corresponds to the Persian name of a meter. The mapping is stored in
label_map.json, included both here and in the GitHub repo, and it is trivial to
reverse.
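As a rough illustration of the mapping described above, the sketch below loads label_map.json, normalizes it to an ID-to-name dictionary, and decodes a prediction from the Hugging Face `transformers` text-classification pipeline. Several things here are assumptions rather than facts about this repo: that label_map.json is a flat JSON object (the code accepts either name-to-ID or ID-to-name), that the pipeline may return generic labels of the form `LABEL_<n>`, and that the hosted model loads with the default pipeline settings. All function names are hypothetical.

```python
import json
import re

MODEL_ID = "katomyomachia/xlm-roberta-persian-meter"


def invert_label_map(raw):
    """Normalize a flat label map to {int id: meter name}.

    Assumption: the JSON object maps either name -> id or id -> name.
    """
    id_to_name = {}
    for key, value in raw.items():
        if isinstance(value, int):   # name -> id
            id_to_name[value] = key
        else:                        # id (JSON string key) -> name
            id_to_name[int(key)] = value
    return id_to_name


def load_id_to_name(path="label_map.json"):
    """Read label_map.json from disk and return {int id: meter name}."""
    with open(path, encoding="utf-8") as f:
        return invert_label_map(json.load(f))


def decode_label(raw_label, id_to_name):
    """Turn an integer or a 'LABEL_<n>' string into a meter name."""
    if isinstance(raw_label, int):
        return id_to_name[raw_label]
    match = re.fullmatch(r"LABEL_(\d+)", raw_label)
    if match:
        return id_to_name[int(match.group(1))]
    return raw_label  # assume it is already a readable name


def classify(hemistich, id_to_name):
    """Return (meter name, score) for one hemistich.

    Requires the `transformers` package and downloads the model on first use.
    """
    from transformers import pipeline  # imported lazily; heavy dependency

    classifier = pipeline("text-classification", model=MODEL_ID)
    result = classifier(hemistich)[0]
    return decode_label(result["label"], id_to_name), result["score"]
```

A call might look like `classify("الا یا ایها الساقی ادر کأسا و ناولها", load_id_to_name())`, with label_map.json downloaded next to the script. For many hemistichs, building the pipeline once and reusing it would avoid reloading the model on every call.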
Roadmap
- Improve this readme/model card, adding e.g. a usage example.
- Construct an evaluation dataset entirely separate from the training set.
- Continue to improve the model by adding more training data, adjusting the training parameters, etc.
- Deploy an inference server and a web front end, so that a user could paste a few lines of a classical Persian poem and have the predicted meter returned.
Model: katomyomachia/xlm-roberta-persian-meter
Base model: FacebookAI/xlm-roberta-base