LusakaLang Multi-Task Model (Language + Sentiment + Topic)
Model Description
LusakaLang-MultiTask is a unified transformer model built on top of bert-base-multilingual-cased, designed to perform three tasks simultaneously:
- Language Identification
- Sentiment Analysis
- Topic Classification
The model integrates three fine-tuned LusakaLang checkpoints:
- Kelvinmbewe/mbert_Lusaka_Language_Analysis
- Kelvinmbewe/mbert_LusakaLang_Sentiment_Analysis
- Kelvinmbewe/mbert_LusakaLang_Topic
All tasks share a single mBERT encoder, with three independent classifier heads.
This architecture improves efficiency, reduces memory footprint, and enables consistent predictions across tasks.
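The shared-encoder design can be sketched in PyTorch. This is an illustrative sketch, not the released implementation: the class name and the head sizes (number of languages, sentiment classes, and topics) are assumptions.

```python
import torch
import torch.nn as nn


class LusakaLangMultiTask(nn.Module):
    """Illustrative sketch: one shared encoder, three independent linear heads.

    Head sizes (n_langs, n_sentiments, n_topics) are assumed for this example.
    """

    def __init__(self, encoder, n_langs=4, n_sentiments=3, n_topics=5):
        super().__init__()
        self.encoder = encoder  # e.g. a bert-base-multilingual-cased model
        hidden = encoder.config.hidden_size
        self.lang_head = nn.Linear(hidden, n_langs)
        self.sent_head = nn.Linear(hidden, n_sentiments)
        self.topic_head = nn.Linear(hidden, n_topics)

    def forward(self, input_ids, attention_mask=None):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] pooled representation
        return {
            "language": self.lang_head(cls),
            "sentiment": self.sent_head(cls),
            "topic": self.topic_head(cls),
        }
```

Because all three heads read the same `[CLS]` vector, one forward pass through the encoder serves all three tasks, which is where the efficiency gain comes from.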
Why This Model Matters
Zambian communication is multilingual, fluid, and highly contextβdependent.
A single message may include:
- English
- Bemba
- Nyanja
- Slang
- Code-switching
- Cultural idioms
- Indirect emotional cues
This model is designed specifically for that environment.
It excels at:
- Identifying the dominant language or code-switching
- Detecting sentiment polarity in culturally nuanced text
- Classifying topics such as:
  - driver behaviour
  - payment issues
  - app performance
  - customer support
  - ride availability
Training Architecture
The model uses:
- Shared Encoder: mBERT
- Head 1: Language classifier
- Head 2: Sentiment classifier
- Head 3: Topic classifier
This multi-task setup improves generalization and reduces inference cost.
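A common way to train a shared encoder with several heads is to sum one cross-entropy loss per task. The card does not document the actual loss weighting, so the unweighted sum below is an assumption for illustration:

```python
import torch
import torch.nn.functional as F


def multitask_loss(logits: dict, labels: dict) -> torch.Tensor:
    """Joint loss: unweighted sum of per-task cross-entropies (assumed weighting)."""
    return sum(
        F.cross_entropy(logits[task], labels[task])
        for task in ("language", "sentiment", "topic")
    )
```

Backpropagating this single scalar updates all three heads and the shared encoder together, so the encoder learns features useful across tasks.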
Performance Summary
Language Identification
| Metric | Score |
|---|---|
| Accuracy | 0.97 |
| Macro-F1 | 0.96 |
Sentiment Analysis (Epoch 30, Final Checkpoint)
| Metric | Score |
|---|---|
| Accuracy | 0.9322 |
| Macro-F1 | 0.9216 |
| Negative F1 | 0.8649 |
| Neutral F1 | 0.95 |
| Positive F1 | 0.95 |
Topic Classification
| Metric | Score |
|---|---|
| Accuracy | 0.91 |
| Macro-F1 | 0.90 |
How to Use This Model
Load the Multi-Task Model
`torch.load` cannot read a file directly from a Hub repo path, so the checkpoint must be downloaded first (this assumes the weights are stored as `model.pt` in the repo, as the original snippet suggests):

```python
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download
import torch

tokenizer = AutoTokenizer.from_pretrained("Kelvinmbewe/LusakaLang-MultiTask")

# torch.load needs a local file, so fetch the checkpoint from the Hub first
weights_path = hf_hub_download("Kelvinmbewe/LusakaLang-MultiTask", "model.pt")
model = torch.load(weights_path, map_location="cpu")
model.eval()
```
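The usage examples below call `predict_language`, `predict_sentiment`, and `predict_topic`, which are not defined in this card. A minimal, hypothetical sketch of such a helper, assuming the model returns a dict of logits keyed by task name:

```python
import torch


def make_predict(model, tokenizer, task, label_names):
    """Build a predict function for one task head.

    Hypothetical helper: assumes `model(**encoded)` returns a dict of logits
    keyed by task name ("language", "sentiment", "topic").
    """

    def predict(texts):
        encoded = tokenizer(texts, padding=True, truncation=True,
                            return_tensors="pt")
        with torch.no_grad():
            logits = model(**encoded)[task]
        return [label_names[i] for i in logits.argmax(dim=-1).tolist()]

    return predict
```

For example, `predict_sentiment = make_predict(model, tokenizer, "sentiment", ["Negative", "Neutral", "Positive"])`, where the label names are assumptions based on the sentiment classes reported above.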
```python
predict_language([
    "Ndeumfwa bwino lelo",                         # Bemba: "I am feeling well today"
    "Galimoto inachedwa koma driver anali bwino",  # Nyanja: "The car was late but the driver was good"
    "The service was terrible today",
])

predict_sentiment([
    "Driver was rude and unprofessional",
    "Ndimvela bwino lelo",                 # Nyanja: "I feel good today"
    "The ride was okay, nothing special",
])

predict_topic([
    "Payment failed but money was deducted",
    "Support siyankhapo, waited long",     # Nyanja: "Support did not respond"
    "Driver was over speeding",
])
```
Citation

```bibtex
@misc{LusakaLangMultiTask,
  author = {Kelvin Mbewe},
  title  = {LusakaLang Multi-Task Model},
  year   = {2025},
  url    = {https://huggingface.co/Kelvinmbewe/LusakaLang-MultiTask}
}
```
```
                   ┌────────────────────────────────────────┐
                   │       Input Text (Any Language)        │
                   └────────────────────────────────────────┘
                                       │
                                       ▼
                   ┌────────────────────────────────────────┐
                   │        Tokenizer (mBERT-based)         │
                   └────────────────────────────────────────┘
                                       │
                                       ▼
                   ┌────────────────────────────────────────┐
                   │       Shared mBERT Encoder Layer       │
                   │     (bert-base-multilingual-cased)     │
                   └────────────────────────────────────────┘
                                       │
                                       ▼
                   ┌────────────────────────────────────────┐
                   │      [CLS] Pooled Representation       │
                   └────────────────────────────────────────┘
                                       │
            ┌──────────────────────────┼──────────────────────────┐
            ▼                          ▼                          ▼
┌────────────────────────┐ ┌────────────────────────┐ ┌────────────────────────┐
│     Language Head      │ │     Sentiment Head     │ │       Topic Head       │
│     (Kelvinmbewe/      │ │     (Kelvinmbewe/      │ │     (Kelvinmbewe/      │
│     mbert_Lusaka_      │ │   mbert_LusakaLang_    │ │   mbert_LusakaLang_    │
│   Language_Analysis)   │ │  Sentiment_Analysis)   │ │         Topic)         │
└────────────────────────┘ └────────────────────────┘ └────────────────────────┘
            │                          │                          │
            ▼                          ▼                          ▼
┌────────────────────────┐ ┌────────────────────────┐ ┌────────────────────────┐
│     Language Label     │ │    Sentiment Label     │ │      Topic Label       │
│ (e.g., Bemba, Nyanja,  │ │   (Negative/Neutral/   │ │   (Driver, Payment,    │
│ English, Code-Switch)  │ │       Positive)        │ │     Support, etc.)     │
└────────────────────────┘ └────────────────────────┘ └────────────────────────┘
```