Cybersecurity NER Model

A spaCy NER model with a RoBERTa transformer backbone, trained for cybersecurity entity extraction.

Entity Types (9)

Entity | Description | F1 Score
SECURITY_ROLE | Job titles (CISO, SOC Analyst, Pentester) | 57.8%
TECHNICAL_SKILL | Skills (Incident Response, Threat Hunting) | 54.7%
SECURITY_TOOL | Tools (Splunk, CrowdStrike, Metasploit) | 100%
CERTIFICATION | Certs (CISSP, OSCP, CEH) | 100%
FRAMEWORK | Frameworks (NIST, MITRE ATT&CK, ISO 27001) | 100%
THREAT_TYPE | Threats (APT, ransomware, phishing) | 90%
ATTACK_TECHNIQUE | Attacks (SQL injection, XSS, RCE) | 100%
REGULATION | Regulations (GDPR, HIPAA, PCI-DSS) | 100%
SECURITY_DOMAIN | Domains (Cloud Security, Network Security) | 13%

Overall: F1 69.4% | Precision 69.1% | Recall 69.8%
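
As a quick sanity check on the figures above, F1 is the harmonic mean of precision and recall. The sketch below recomputes the overall F1 from the reported precision and recall, then compares it with the unweighted mean of the per-entity F1 scores. The name `f1_score` is illustrative. This card does not say how the overall score was averaged, so the gap between the two numbers only suggests that the overall figure is not a simple per-label average.

```python
# Per-entity F1 scores (percent), copied from the table above.
PER_ENTITY_F1 = {
    "SECURITY_ROLE": 57.8,
    "TECHNICAL_SKILL": 54.7,
    "SECURITY_TOOL": 100.0,
    "CERTIFICATION": 100.0,
    "FRAMEWORK": 100.0,
    "THREAT_TYPE": 90.0,
    "ATTACK_TECHNIQUE": 100.0,
    "REGULATION": 100.0,
    "SECURITY_DOMAIN": 13.0,
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall as reported in this card.
overall_f1 = f1_score(69.1, 69.8)

# Unweighted (macro) mean of the per-entity scores, for comparison.
macro_f1 = sum(PER_ENTITY_F1.values()) / len(PER_ENTITY_F1)

print(f"F1 from overall precision/recall: {overall_f1:.1f}%")  # 69.4%
print(f"Unweighted mean of per-entity F1: {macro_f1:.1f}%")    # 79.5%
```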

Training Data

  • 1,500+ unique cybersecurity entities
  • 1,000 synthetic training examples (CVs, job descriptions)
  • RoBERTa backbone domain-adapted on 40K security texts

Usage

import spacy

# Load model
nlp = spacy.load("path/to/model")

# Extract entities
doc = nlp("CISO with CISSP certification, expert in Splunk SIEM and threat hunting")

for ent in doc.ents:
    print(f"{ent.text}: {ent.label_}")

Output:

CISO: SECURITY_ROLE
CISSP: CERTIFICATION
Splunk: SECURITY_TOOL
threat hunting: TECHNICAL_SKILL
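
Downstream code often wants entities grouped by type instead of a flat list. The following is a minimal sketch, assuming entities are passed as `(text, label)` pairs. `group_by_label` is a hypothetical helper, not part of spaCy or of this model. The pairs are hard-coded from the output above so the snippet runs without the model.

```python
from collections import defaultdict


def group_by_label(entities):
    """Group (text, label) pairs into {label: [texts]}, keeping first-seen order
    and skipping repeated texts within a label."""
    grouped = defaultdict(list)
    for text, label in entities:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)


# With a loaded pipeline, the pairs would come from:
#   pairs = [(ent.text, ent.label_) for ent in nlp(text).ents]
pairs = [
    ("CISO", "SECURITY_ROLE"),
    ("CISSP", "CERTIFICATION"),
    ("Splunk", "SECURITY_TOOL"),
    ("threat hunting", "TECHNICAL_SKILL"),
]

for label, texts in group_by_label(pairs).items():
    print(f"{label}: {', '.join(texts)}")
```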

Requirements

spacy>=3.8.0
spacy-transformers>=1.3.0
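
One way to confirm that an environment meets these minimums before loading the model is sketched below, using only the standard library. `parse_version` and `check_requirements` are hypothetical helpers. The version parsing is deliberately simple (numeric major.minor.patch only) and is an assumption, not a full implementation of Python version ordering.

```python
import re
from importlib import metadata

# Minimum versions from the Requirements section.
MINIMUMS = {"spacy": "3.8.0", "spacy-transformers": "1.3.0"}


def parse_version(version):
    """Turn '3.8.0' into (3, 8, 0); only the leading digits of each part are used."""
    numbers = []
    for part in version.split(".")[:3]:
        match = re.match(r"\d+", part)
        numbers.append(int(match.group()) if match else 0)
    # Pad short versions such as '3.8' so tuple comparison behaves as expected.
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def check_requirements(minimums=MINIMUMS, get_version=metadata.version):
    """Return a list of problems; an empty list means every minimum is met."""
    problems = []
    for name, minimum in minimums.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed (need >={minimum})")
            continue
        if parse_version(installed) < parse_version(minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```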

Use Cases

  • Threat intelligence parsing
  • Security talent matching (CV/job analysis)
  • Skills inventory extraction
  • Compliance document analysis
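
To illustrate the talent-matching use case, the sketch below compares entities extracted from a CV against those from a job description. It assumes both have already been reduced to `(text, label)` pairs. `match_profile` and its coverage score are hypothetical, not a method shipped with this model. Matching is exact after case-folding, with no synonym handling, and matches on lower-scoring labels such as TECHNICAL_SKILL are likely to be noisy.

```python
def match_profile(cv_entities, job_entities,
                  labels=("TECHNICAL_SKILL", "SECURITY_TOOL", "CERTIFICATION")):
    """Compare CV and job entities on the chosen labels.

    Returns the entities found in both, the job entities missing from the CV,
    and the fraction of job entities covered by the CV.
    """
    def normalise(entities):
        # Case-fold the text so 'Splunk' and 'splunk' count as the same entity.
        return {(label, text.casefold()) for text, label in entities if label in labels}

    cv, job = normalise(cv_entities), normalise(job_entities)
    matched = cv & job
    missing = job - cv
    coverage = len(matched) / len(job) if job else 0.0
    return {"matched": sorted(matched), "missing": sorted(missing), "coverage": coverage}


# Illustrative inputs; with the model these would come from doc.ents.
cv = [("CISSP", "CERTIFICATION"), ("Splunk", "SECURITY_TOOL"),
      ("threat hunting", "TECHNICAL_SKILL")]
job = [("CISSP", "CERTIFICATION"), ("splunk", "SECURITY_TOOL"),
       ("CrowdStrike", "SECURITY_TOOL"), ("Incident Response", "TECHNICAL_SKILL")]

report = match_profile(cv, job)
print(f"Coverage: {report['coverage']:.0%}")
print("Missing:", report["missing"])
```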

Limitations

  • SECURITY_DOMAIN has low recall (7%) and an F1 of 13%; it needs more training data
  • SECURITY_ROLE (57.8%) and TECHNICAL_SKILL (54.7%) F1 scores are below target; improvement is ongoing
  • Trained primarily on English text

License

Apache 2.0

Citation

@misc{cybersec-ner-2024,
  author = {PKI},
  title = {Cybersecurity NER Model},
  year = {2024},
  publisher = {HuggingFace},
}