xlm-roberta-persian-meter

This model classifies the meter (baḥr) of a hemistich of classical Persian poetry. It is fine-tuned from XLM-RoBERTa (FacebookAI/xlm-roberta-base) on a dataset drawn from Ganjoor.

The training data, scripts, etc. can all be found on GitHub.

Performance

So far, I have evaluated this model only on a randomly selected 10% sample of the training dataset (see eval.py in the GitHub repo). Because this sample comes from the same data used for training, the figures below may overstate performance on unseen text. I plan to carry out a more rigorous evaluation; in the meantime, the results are promising:

Accuracy: 0.9801
F1: 0.9792
Precision: 0.9789
Recall: 0.9801
Loss: 0.0996
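
The reported recall is identical to the accuracy, which is what support-weighted averaging across classes produces. The averaging mode is not stated here, so the following is only a minimal sketch of how such figures could be computed, assuming weighted averaging; eval.py may do this differently. The function name `weighted_scores` and the toy labels are hypothetical.

```python
from collections import Counter

def weighted_scores(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1.

    Assumption: weighted averaging, i.e. each class contributes in
    proportion to how often it occurs in y_true.
    """
    n = len(y_true)
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / n

    precision = recall = f1 = 0.0
    for label, count in support.items():
        true_pos = sum(t == label and p == label for t, p in pairs)
        predicted = sum(p == label for p in y_pred)
        p_c = true_pos / predicted if predicted else 0.0
        r_c = true_pos / count
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / n
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy labels for illustration only (not taken from the real dataset).
scores = weighted_scores([0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 1, 2])
print(scores)
```

With weighted averaging, recall always equals accuracy, because each class's recall is weighted by its share of the examples. Rare meters therefore have little influence on these numbers.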

The model appears to perform well on all common Persian meters. Some rare meters remain underrepresented in the training data, so predictions for them are likely to be less reliable.

Training data

The training examples are hemistichs from classical Persian poems, all taken from the Ganjoor corpus, where poems are already tagged with their meters. Currently, the dataset contains 277,248 unique hemistichs, representing the following works:

The complete ghazals of Ṣāʾib Tabrīzī
The complete ghazals of Ḥāfiẓ
The complete ghazals of Saʿdī
All the ghazals in the Dīvān-i Shams of Rūmī (excepting some with obscure meters)
The first daftar of the Maṡnavī of Rūmī
Selections from the Shāhnāma of Firdawsī
Four of the five poems in the Khamsa of Niẓāmī:
- Laylī u Majnūn
- Khusraw va Shīrīn
- Haft paykar
- Makhzan al-asrār

Label mapping

The model outputs an integer class ID for a given hemistich, and each ID corresponds to the Persian name of a meter. The mapping is easy to invert: see label_map.json, included both here and in the GitHub repo.
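
As a rough illustration of the lookup, the sketch below inverts a name-to-ID mapping and picks the meter with the highest score. It assumes that label_map.json maps meter names to integers; if the file is already keyed by ID, the inversion step is unnecessary. The helper names, the three transliterated sample entries, and the logits are all hypothetical. The real file stores Persian names, and real logits would come from the model.

```python
import json

def invert_label_map(label_map):
    """Turn {meter_name: class_id} into {class_id: meter_name}."""
    return {int(class_id): name for name, class_id in label_map.items()}

def predicted_meter(logits, id_to_meter):
    """Return the meter name for the highest-scoring class."""
    best_id = max(range(len(logits)), key=lambda i: logits[i])
    return id_to_meter[best_id]

# Hypothetical stand-in for the contents of label_map.json; in practice
# the mapping would be read with json.load from the real file.
sample_json = '{"hazaj": 0, "ramal": 1, "mutaqarib": 2}'
id_to_meter = invert_label_map(json.loads(sample_json))

# Made-up scores for one hemistich, one value per class.
print(predicted_meter([0.1, 2.7, -0.4], id_to_meter))
```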

Roadmap

Improve this readme/model card, adding e.g. a usage example.
Construct an evaluation dataset entirely separate from the training set.
Continue to improve the model by adding more training data, adjusting the training parameters, etc.
Deploy an inference server and a web front end, so that a user could paste a few lines of a classical Persian poem and have the predicted meter returned.
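
For the separate evaluation set, one option is to split by poem rather than by hemistich, since every hemistich of a poem shares its meter and a per-hemistich split would put near-duplicates on both sides. The sketch below is one possible approach, not the method used in this repo. It assumes each hemistich carries some stable poem identifier, and the `poem_id` strings and function name are hypothetical.

```python
import hashlib

def assign_split(poem_id, eval_fraction=0.1):
    """Deterministically assign a whole poem to 'train' or 'eval'.

    Hashing the poem identifier keeps the assignment stable across runs
    and puts all hemistichs of one poem in the same split.
    """
    digest = hashlib.sha256(poem_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "eval" if bucket < eval_fraction * 10_000 else "train"

# Hypothetical identifiers, for illustration only.
for poem_id in ["poet-a/ghazal-1", "poet-a/ghazal-2", "poet-b/ghazal-1"]:
    print(poem_id, assign_split(poem_id))
```

Because the split depends only on the identifier, adding more training data later does not move existing poems between splits.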
