language:
- en
- bem
- ny
tags:
- multi-task
- sentiment-analysis
- topic-classification
- language-identification
- multilingual
- transformer
- zambia
- lusaka
license: apache-2.0
library_name: transformers
pipeline_tag: text-classification
model-index:
- name: LusakaLang-MultiTask
  results:
  - task:
      type: text-classification
      name: Language Identification
    dataset:
      name: LusakaLang Language Data
      type: lusakalang
      split: test
    metrics:
    - type: accuracy
      value: 0.97
      name: accuracy
    - type: f1
      value: 0.96
      name: f1_macro
    - type: accuracy
      value: 0.9322
      name: accuracy
    - type: f1
      value: 0.9216
      name: f1_macro
    - type: f1
      value: 0.8649
      name: f1_negative
    - type: f1
      value: 0.95
      name: f1_neutral
    - type: f1
      value: 0.95
      name: f1_positive
    - type: accuracy
      value: 0.91
      name: accuracy
    - type: f1
      value: 0.9
      name: f1_macro
base_model:
- Kelvinmbewe/mbert_Lusaka_Language_Analysis
- Kelvinmbewe/mbert_LusakaLang_Sentiment_Analysis
- Kelvinmbewe/mbert_LusakaLang_Topic
LusakaLang MultiTask Model
This model is a unified transformer architecture built on top of bert-base-multilingual-cased, designed to perform three tasks simultaneously:
- Language Identification
- Sentiment Analysis
- Topic Classification
The system integrates three fine-tuned LusakaLang checkpoints:
- mbert_Lusaka_Language_Analysis
- mbert_LusakaLang_Sentiment_Analysis
- mbert_LusakaLang_Topic
All three tasks share a single mBERT encoder, which feeds three independent classifier heads. Because one encoder pass serves every head, inference costs roughly one forward pass instead of three, only one copy of the encoder weights is held in memory, and all three predictions are computed from the same representation of the text.
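The shared-encoder layout described above can be sketched as follows. This is a minimal illustration, not the released implementation: the class name `SharedEncoderMultiTask` is hypothetical, the label counts are read off the architecture diagram in this card, and a tiny randomly initialised BERT stands in for the real mBERT encoder so the example runs without downloading anything.

```python
import torch
from torch import nn
from transformers import BertConfig, BertModel

# Label counts taken from the diagram in this card (an assumption; the real
# checkpoint's label sets and their ordering are not documented here).
NUM_LABELS = {"language": 4, "sentiment": 3, "topic": 5}


class SharedEncoderMultiTask(nn.Module):
    """Hypothetical wrapper: one BERT encoder feeding three linear heads."""

    def __init__(self, encoder: BertModel):
        super().__init__()
        self.encoder = encoder
        hidden = encoder.config.hidden_size
        self.heads = nn.ModuleDict(
            {task: nn.Linear(hidden, n) for task, n in NUM_LABELS.items()}
        )

    def forward(self, input_ids, attention_mask=None):
        # One encoder pass; every head reads the same [CLS] vector.
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]
        return {task: head(cls) for task, head in self.heads.items()}


# Tiny random encoder used only to keep the sketch self-contained.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1,
                    num_attention_heads=2, intermediate_size=64)
model = SharedEncoderMultiTask(BertModel(config)).eval()

with torch.no_grad():
    logits = model(torch.randint(0, 100, (2, 8)))  # batch of 2, 8 tokens each
print({task: tuple(t.shape) for task, t in logits.items()})
# {'language': (2, 4), 'sentiment': (2, 3), 'topic': (2, 5)}
```

Swapping the tiny encoder for `BertModel.from_pretrained("bert-base-multilingual-cased")` would give the same structure at full size, though the heads would still need trained weights.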
Why This Model Matters
Zambian communication is inherently multilingual, fluid, and deeply shaped by context. A single message may blend English, Bemba, Nyanja, and local slang, with frequent code-switching, and is often expressed through culturally grounded idioms and subtle emotional cues. This model is designed specifically for that environment, where meaning depends not only on the words used but on how languages interact within a single utterance.
The model is built to do three things. It identifies the dominant language of a message, or detects when several languages are used together. It interprets sentiment even when it is conveyed indirectly or through culturally specific phrasing. It classifies text into practical topics such as driver behaviour, payment issues, app performance, customer support, and ride availability. Capturing these nuances gives a more accurate, context-aware reading of real Zambian communication.
How to Use This Model
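import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

class LusakaLangMultiTask:
    def __init__(self, path="Kelvinmbewe/LusakaLang-MultiTask"):
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        # torch.load needs a local file, so fetch model.pt from the Hub repo first.
        checkpoint = hf_hub_download(repo_id=path, filename="model.pt")
        # .eval() implies model.pt is a pickled nn.Module: its class must be importable,
        # and recent PyTorch needs weights_only=False. Only unpickle checkpoints you trust.
        self.model = torch.load(checkpoint, map_location="cpu", weights_only=False).eval()

    # Placeholder methods: each should return one prediction per input text.
    def predict_language(self, texts): pass
    def predict_sentiment(self, texts): pass
    def predict_topic(self, texts): pass

llm = LusakaLangMultiTask()
print(llm.predict_language([...]))
print(llm.predict_sentiment([...]))
print(llm.predict_topic([...]))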
Sample Output
# Language Identification
[
  {"lang": "Bemba", "conf": 0.96},
  {"lang": "Nyanja", "conf": 0.95},
  {"lang": "English", "conf": 0.99}
]
# Sentiment
[
  {"sent": "Negative", "conf": 0.98},
  {"sent": "Positive", "conf": 0.95},
  {"sent": "Neutral", "conf": 0.87}
]
# Topic
[
  {"topic": "Payment Issue", "conf": 0.97},
  {"topic": "Customer Support", "conf": 0.95},
  {"topic": "Driver Behaviour", "conf": 0.96}
]
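Each sample entry pairs a label with a confidence score. One common way to produce such a pair from a classifier head is to apply softmax to the head's logits and report the top class with its probability. The sketch below shows that step in plain Python; the helper name `decode`, the example logits, and the label order in `SENTIMENT_LABELS` are illustrative assumptions, since the card does not document the model's actual id-to-label mapping or how its confidences are computed.

```python
import math

# Hypothetical label order; the real id-to-label mapping is not given in this card.
SENTIMENT_LABELS = ["Negative", "Neutral", "Positive"]


def decode(logits, labels, field):
    """Turn one row of raw logits into {field: label, "conf": probability}."""
    peak = max(logits)
    # Subtract the max before exponentiating for numerical stability.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {field: labels[best], "conf": round(probs[best], 2)}


# Made-up logits for a single text, used only to exercise the helper.
result = decode([3.2, 0.1, -1.0], SENTIMENT_LABELS, "sent")
print(result)
# {'sent': 'Negative', 'conf': 0.94}
```

The same helper would apply to the language and topic heads by passing their label lists with `"lang"` or `"topic"` as the field name.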
=========================== Training Architecture ===========================
Input                 ->  Core Engine            ->  Output
------------------------------------------------------------------------------
Text (Any Language)   ->  Tokenizer              ->  Language
                      ->  Shared mBERT Encoder   ->  Bemba / Nyanja /
                      ->  CLS Vector             ->  English / Mixed
------------------------------------------------------------------------------
User Feedback         ->  Tokenizer              ->  Sentiment
                      ->  Shared Encoder         ->  Negative / Neutral /
                      ->  CLS Vector             ->  Positive
------------------------------------------------------------------------------
Ride Context          ->  Tokenizer              ->  Topic
                      ->  Shared Encoder         ->  Driver / Payment /
                      ->  CLS Vector             ->  Support / App / Availability
------------------------------------------------------------------------------