---
license: mit
tags:
- privacy
- policy-analysis
- classification
- text-classification
- transformers
- distilbert
library_name: transformers
datasets:
- opp-115
model-index:
- name: Privacy Clause Classifier (DistilBERT - OPP-115)
  results: []
---

# Privacy Clause Classifier (DistilBERT - OPP-115)

This is a DistilBERT model fine-tuned to classify **privacy policy clauses** into one of ten privacy-practice categories from the [OPP-115 dataset](https://privacy-hosting.isi.edu/data/OPP-115.pdf).

| ID | Category                             |
|----|--------------------------------------|
| 0  | Data Retention                       |
| 1  | Data Security                        |
| 2  | Do Not Track                         |
| 3  | First Party Collection/Use           |
| 4  | International and Specific Audiences |
| 5  | Other                                |
| 6  | Policy Change                        |
| 7  | Third Party Sharing/Collection       |
| 8  | User Access, Edit and Deletion       |
| 9  | User Choice/Control                  |

---

## Model Details

- **Architecture**: DistilBERT (pretrained)
- **Fine-tuning Dataset**: [OPP-115 Dataset](https://privacy-hosting.isi.edu/data/OPP-115.pdf)
- **Input Format**: Text snippets from privacy policies
- **Output Format**: Predicted class label with probabilities

---

## Intended Uses

- Automatic **privacy policy clause classification**
- **Regulatory technology (RegTech)** tools
- **Privacy policy summarization** and simplification
- **Risk analysis** for data sharing and collection practices

---

## How to Use

```python
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification
import torch

# Load the tokenizer and model (replace the placeholder with the actual repo ID)
tokenizer = DistilBertTokenizerFast.from_pretrained("your-hf-username/your-model-name")
model = DistilBertForSequenceClassification.from_pretrained("your-hf-username/your-model-name")
model.eval()  # inference mode: disables dropout

# Predict the category ID for a single clause
text = "We may collect your location data to provide customized services."
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():  # no gradients are needed for inference
    outputs = model(**inputs)
predicted_class = torch.argmax(outputs.logits, dim=-1).item()
print(f"Predicted Category: {predicted_class}")
```
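
The snippet above prints only a numeric ID. As a minimal sketch, the helper below turns one clause's logits into a category name and a probability. It assumes the model's output indices follow the ID order of the category table in this card; `ID2LABEL`, `describe_prediction`, and the example logits are hypothetical names and values introduced for illustration, not part of the released model. The softmax is written in plain Python so the example runs without downloading a model.

```python
import math

# Label IDs copied from the category table in this card.
# Assumption: the model's output indices follow this order.
ID2LABEL = {
    0: "Data Retention",
    1: "Data Security",
    2: "Do Not Track",
    3: "First Party Collection/Use",
    4: "International and Specific Audiences",
    5: "Other",
    6: "Policy Change",
    7: "Third Party Sharing/Collection",
    8: "User Access, Edit and Deletion",
    9: "User Choice/Control",
}


def describe_prediction(logits):
    """Return (category name, probability) for one clause's logits.

    `logits` is a plain list of floats, one per category,
    e.g. outputs.logits[0].tolist() from the snippet above.
    """
    # Numerically stable softmax: subtract the max before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the highest probability (first one wins on ties).
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits standing in for a real model output.
example_logits = [0.1, 0.2, -1.0, 3.0, 0.0, 0.5, -0.3, 1.2, 0.0, 0.4]
label, prob = describe_prediction(example_logits)
print(f"Predicted Category: {label} ({prob:.1%})")
```

With a loaded model, the same helper could be called as `describe_prediction(outputs.logits[0].tolist())`. If the model's `config.id2label` is populated, that mapping is the more reliable source and should take precedence over the hard-coded table.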