# Cybersecurity NER Model
A spaCy NER model with a RoBERTa transformer backbone, trained to extract cybersecurity entities.
## Entity Types (9)

| Entity | Description | F1 Score |
|---|---|---|
| SECURITY_ROLE | Job titles (CISO, SOC Analyst, Pentester) | 57.8% |
| TECHNICAL_SKILL | Skills (Incident Response, Threat Hunting) | 54.7% |
| SECURITY_TOOL | Tools (Splunk, CrowdStrike, Metasploit) | 100% |
| CERTIFICATION | Certs (CISSP, OSCP, CEH) | 100% |
| FRAMEWORK | Frameworks (NIST, MITRE ATT&CK, ISO 27001) | 100% |
| THREAT_TYPE | Threats (APT, ransomware, phishing) | 90% |
| ATTACK_TECHNIQUE | Attacks (SQL injection, XSS, RCE) | 100% |
| REGULATION | Regulations (GDPR, HIPAA, PCI-DSS) | 100% |
| SECURITY_DOMAIN | Domains (Cloud Security, Network Security) | 13% |

Overall: F1 69.4%, precision 69.1%, recall 69.8%.
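
The card does not state how the per-label and overall scores were computed. The sketch below assumes strict span matching: a predicted entity counts as correct only if its start offset, end offset, and label all match a gold entity. `per_label_scores` is a hypothetical helper, not a spaCy API. The overall figures above are at least internally consistent with F1 = 2PR / (P + R).

```python
from collections import defaultdict


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_label_scores(gold, pred):
    """Strict span-level precision/recall/F1 per label (hypothetical helper).

    gold, pred: lists with one set of (start, end, label) tuples per document.
    A predicted span is a true positive only if start, end and label all match.
    """
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for gold_spans, pred_spans in zip(gold, pred):
        for span in pred_spans:
            if span in gold_spans:
                tp[span[2]] += 1
            else:
                fp[span[2]] += 1
        # Gold spans the model missed (or got the boundaries/label wrong)
        for span in gold_spans - pred_spans:
            fn[span[2]] += 1

    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        p = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        r = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        scores[label] = {"p": p, "r": r, "f": f1(p, r)}
    return scores


# Toy data: the CERTIFICATION prediction is off by one character at the end
gold = [{(0, 4, "SECURITY_ROLE"), (10, 15, "CERTIFICATION")}]
pred = [{(0, 4, "SECURITY_ROLE"), (10, 14, "CERTIFICATION")}]
scores = per_label_scores(gold, pred)

# The card's overall precision and recall reproduce its overall F1
print(round(f1(0.691, 0.698), 3))  # 0.694
```

Under strict matching a single boundary error costs both a false positive and a false negative, which is one reason labels with fuzzy boundaries (roles, skills, domains) tend to score lower than short, well-delimited names.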
## Training Data

- 1,500+ unique cybersecurity entities
- 1,000 synthetic training examples (CVs, job descriptions)
- RoBERTa backbone domain-adapted on 40K security texts
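
The card does not say how the 1,000 synthetic examples are annotated. A common convention for spaCy NER data, and the shape that spaCy v3's `Example.from_dict` accepts, is the text plus a dictionary of character-offset `(start, end, label)` spans. The hypothetical helper `make_example` sketches how such an example could be built from a generated sentence and the entity strings placed in it; it is an illustration, not the authors' data pipeline.

```python
# Hypothetical helper; the card does not document its annotation pipeline.
Span = tuple[int, int, str]


def make_example(text: str, entities: list[tuple[str, str]]) -> tuple[str, dict[str, list[Span]]]:
    """Convert (surface text, label) pairs into character-offset spans.

    Entities must be listed in the order they appear in `text`; each search
    starts where the previous match ended, so repeated mentions map to
    successive occurrences.
    """
    spans: list[Span] = []
    cursor = 0
    for surface, label in entities:
        start = text.find(surface, cursor)
        if start == -1:
            raise ValueError(f"{surface!r} not found after offset {cursor}")
        end = start + len(surface)
        spans.append((start, end, label))
        cursor = end
    return text, {"entities": spans}


text, annotations = make_example(
    "CISO with CISSP certification, expert in Splunk SIEM and threat hunting",
    [
        ("CISO", "SECURITY_ROLE"),
        ("CISSP", "CERTIFICATION"),
        ("Splunk", "SECURITY_TOOL"),
        ("threat hunting", "TECHNICAL_SKILL"),
    ],
)
print(annotations["entities"][:2])  # [(0, 4, 'SECURITY_ROLE'), (10, 15, 'CERTIFICATION')]
```

Deriving offsets with `str.find` rather than writing them by hand avoids the off-by-one spans that silently corrupt NER training data.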
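## Usage

```python
import spacy

# Load the trained pipeline (replace with the model directory or package name)
nlp = spacy.load("path/to/model")

# Run the pipeline and print each extracted entity with its label
doc = nlp("CISO with CISSP certification, expert in Splunk SIEM and threat hunting")
for ent in doc.ents:
    print(f"{ent.text}: {ent.label_}")
```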
Output:

```text
CISO: SECURITY_ROLE
CISSP: CERTIFICATION
Splunk: SECURITY_TOOL
threat hunting: TECHNICAL_SKILL
```
## Requirements

```text
spacy>=3.8.0
spacy-transformers>=1.3.0
```
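
To confirm that an environment meets the minimum versions listed above, one option is a small standard-library check like the following. `check_environment`, `meets_minimum`, and `release_tuple` are hypothetical helpers. The comparison only looks at leading numeric release components, so pre-release tags such as `rc1` are ignored rather than ordered; `packaging.version.Version` is the more robust choice when the `packaging` library is available.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions from the Requirements list
REQUIREMENTS = {"spacy": "3.8.0", "spacy-transformers": "1.3.0"}


def release_tuple(v: str) -> tuple[int, ...]:
    """Leading numeric components: '3.8.2' -> (3, 8, 2); '1.3.0rc1' -> (1, 3, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", v)
    return tuple(int(p) for p in match.group().split(".")) if match else ()


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare release tuples after zero-padding, so '3.8' counts as '3.8.0'."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(requirements: dict[str, str] = REQUIREMENTS) -> list[str]:
    """Return human-readable problems; an empty list means all requirements are met."""
    problems = []
    for package, minimum in requirements.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            problems.append(f"{package} is not installed (need >={minimum})")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{package} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    print(check_environment() or "All requirements met")
```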
## Use Cases

- Threat intelligence parsing
- Security talent matching (CV/job analysis)
- Skills inventory extraction
- Compliance document analysis
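
For skills inventory extraction, entities from many documents usually need to be aggregated. A minimal sketch, assuming each document has already been reduced to `(ent.text, ent.label_)` pairs as in the Usage example; `build_inventory` is a hypothetical helper, and the case-insensitive normalization is an assumption that a curated alias table could replace.

```python
from collections import Counter, defaultdict


def build_inventory(documents):
    """Count, per label, how many documents mention each entity.

    `documents` is an iterable of lists of (entity_text, label) pairs.
    Entity text is stripped and lower-cased, and each (text, label) pair is
    counted at most once per document.
    """
    inventory = defaultdict(Counter)
    for entities in documents:
        for text, label in {(t.strip().lower(), lab) for t, lab in entities}:
            inventory[label][text] += 1
    return inventory


# Pairs for the Usage example sentence, plus a second made-up CV
cv_1 = [("CISO", "SECURITY_ROLE"), ("CISSP", "CERTIFICATION"),
        ("Splunk", "SECURITY_TOOL"), ("threat hunting", "TECHNICAL_SKILL")]
cv_2 = [("SOC Analyst", "SECURITY_ROLE"), ("Splunk", "SECURITY_TOOL"),
        ("Threat Hunting", "TECHNICAL_SKILL"), ("splunk", "SECURITY_TOOL")]

inventory = build_inventory([cv_1, cv_2])
print(inventory["SECURITY_TOOL"].most_common())  # [('splunk', 2)]
```

With a loaded pipeline, the input can be produced lazily, for example `([(e.text, e.label_) for e in doc.ents] for doc in nlp.pipe(texts))`, where `texts` is a list of CV or job-description strings.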
## Limitations

- SECURITY_DOMAIN has low recall (7%); it needs more training data.
- SECURITY_ROLE (F1 57.8%) and TECHNICAL_SKILL (F1 54.7%) are below target; improvement is ongoing.
- Trained primarily on English text
## License

Apache 2.0
## Citation

```bibtex
@misc{cybersec-ner-2024,
  author = {PKI},
  title = {Cybersecurity NER Model},
  year = {2024},
  publisher = {HuggingFace},
}
```