Cybersecurity NER Model

A spaCy NER pipeline with a RoBERTa transformer backbone, trained to extract cybersecurity entities from text.

Entity Types (9)

Entity	Description	F1 Score
SECURITY_ROLE	Job titles (CISO, SOC Analyst, Pentester)	57.8%
TECHNICAL_SKILL	Skills (Incident Response, Threat Hunting)	54.7%
SECURITY_TOOL	Tools (Splunk, CrowdStrike, Metasploit)	100%
CERTIFICATION	Certs (CISSP, OSCP, CEH)	100%
FRAMEWORK	Frameworks (NIST, MITRE ATT&CK, ISO 27001)	100%
THREAT_TYPE	Threats (APT, ransomware, phishing)	90%
ATTACK_TECHNIQUE	Attacks (SQL injection, XSS, RCE)	100%
REGULATION	Regulations (GDPR, HIPAA, PCI-DSS)	100%
SECURITY_DOMAIN	Domains (Cloud Security, Network Security)	13%

Overall: F1 69.4% | Precision 69.1% | Recall 69.8%
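
As a quick consistency check, F1 is the harmonic mean of precision and recall, and the overall figures above agree with that definition. The sketch below also computes the unweighted mean of the per-type F1 scores from the table, which comes out higher than the overall score. That suggests the overall figure is weighted by entity frequency (micro-averaged), although the model card does not state the averaging method, so treat that reading as an assumption.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall as reported in the model card.
overall_f1 = f1_score(0.691, 0.698)
print(f"Overall F1 from P and R: {overall_f1:.3f}")

# Per-type F1 scores (percent) copied from the Entity Types table.
per_type_f1 = [57.8, 54.7, 100, 100, 100, 90, 100, 100, 13]

# Unweighted (macro) mean of the per-type scores, for comparison only.
macro_f1 = sum(per_type_f1) / len(per_type_f1)
print(f"Unweighted mean of per-type F1: {macro_f1:.1f}%")
```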

Training Data

1,500+ unique cybersecurity entities
1,000 synthetic training examples (CVs, job descriptions)
RoBERTa backbone domain-adapted on 40K security texts

Usage

import spacy

# Load model
nlp = spacy.load("path/to/model")

# Extract entities
doc = nlp("CISO with CISSP certification, expert in Splunk SIEM and threat hunting")

for ent in doc.ents:
    print(f"{ent.text}: {ent.label_}")

Output:

CISO: SECURITY_ROLE
CISSP: CERTIFICATION
Splunk: SECURITY_TOOL
threat hunting: TECHNICAL_SKILL
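
For use cases such as skills inventories, it is often handier to group the extracted entities by label than to print them one by one. The following is a minimal sketch of that step. The helper name `group_entities` is hypothetical, and the `Ent` namedtuple is only a stand-in for spaCy's entity spans so the example runs without the model; with the real pipeline you would pass `doc.ents` instead.

```python
from collections import defaultdict, namedtuple


def group_entities(ents):
    """Group entity texts by label, keeping first-seen order and dropping duplicates."""
    grouped = defaultdict(list)
    for ent in ents:
        if ent.text not in grouped[ent.label_]:
            grouped[ent.label_].append(ent.text)
    return dict(grouped)


# Stand-in for spaCy entity spans (same .text and .label_ attributes),
# filled with the entities from the example output above.
Ent = namedtuple("Ent", ["text", "label_"])
sample_ents = [
    Ent("CISO", "SECURITY_ROLE"),
    Ent("CISSP", "CERTIFICATION"),
    Ent("Splunk", "SECURITY_TOOL"),
    Ent("threat hunting", "TECHNICAL_SKILL"),
]

grouped = group_entities(sample_ents)
for label, texts in grouped.items():
    print(f"{label}: {', '.join(texts)}")
```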

Requirements

spacy>=3.8.0
spacy-transformers>=1.3.0
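
If the model fails to load, a version mismatch is a common culprit. The sketch below checks the installed packages against the minimums listed above using only the standard library. The helper names `parse_release` and `check_requirements` are hypothetical, and the version parsing is deliberately simple (it reads only the leading digits of the first three components), so it is a rough check rather than a full version-specifier implementation.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions from the Requirements section.
MINIMUMS = {"spacy": (3, 8, 0), "spacy-transformers": (1, 3, 0)}


def parse_release(version_string):
    """Turn '3.8.2' into (3, 8, 2); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in version_string.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_requirements(minimums=MINIMUMS):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for name, minimum in minimums.items():
        try:
            installed = parse_release(version(name))
        except PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if installed < minimum:
            wanted = ".".join(map(str, minimum))
            found = ".".join(map(str, installed))
            problems.append(f"{name} {found} is older than required {wanted}")
    return problems


for problem in check_requirements():
    print(problem)
```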

Use Cases

Threat intelligence parsing
Security talent matching (CV/job analysis)
Skills inventory extraction
Compliance document analysis

Limitations

SECURITY_DOMAIN has low recall (7%) and an F1 of 13%; it needs more training data
SECURITY_ROLE (57.8%) and TECHNICAL_SKILL (54.7%) F1 scores are below target; improvement is ongoing
Trained primarily on English text
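
Until SECURITY_DOMAIN recall improves, one possible workaround is to back the model up with a small term list that catches domain names it misses. The sketch below is an assumption-laden illustration, not part of the model: `DOMAIN_TERMS` and `find_missed_domains` are hypothetical names, the two terms are taken from the table's examples, and model output is represented as plain `(start_char, end_char, label)` tuples so the code runs without the model. Inside a spaCy pipeline, the built-in `entity_ruler` component could play a similar role.

```python
import re

# Hypothetical gazetteer; the two phrases are the examples from the entity table.
DOMAIN_TERMS = ["Cloud Security", "Network Security"]


def find_missed_domains(text, model_spans, terms=DOMAIN_TERMS):
    """Return (start, end, label) for term-list hits that do not overlap model output.

    model_spans is a list of (start_char, end_char, label) tuples, e.g. built with
    [(e.start_char, e.end_char, e.label_) for e in doc.ents].
    """
    found = []
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            overlaps = any(
                match.start() < end and start < match.end()
                for start, end, _ in model_spans
            )
            if not overlaps:
                found.append((match.start(), match.end(), "SECURITY_DOMAIN"))
    return sorted(found)


text = "Pentester moving into cloud security, familiar with NIST"
# Illustrative model output in which the domain mention was missed.
model_spans = [(0, 9, "SECURITY_ROLE"), (52, 56, "FRAMEWORK")]

for start, end, label in find_missed_domains(text, model_spans):
    print(f"{text[start:end]}: {label}")
```

Matching is case-insensitive and skips any hit that overlaps an entity the model already found, so the model's own predictions always take priority.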

License

Apache 2.0

Citation

@misc{cybersec-ner-2024,
  author = {PKI},
  title = {Cybersecurity NER Model},
  year = {2024},
  publisher = {HuggingFace},
}

Evaluation results (self-reported)

F1: 0.694
Precision: 0.691
Recall: 0.698