Post
SecureCode v2.1: framework-specific secure coding patterns, now on HuggingFace
Quick update on the SecureCode dataset. After testing the v2.0 models against real codebases, one gap kept showing up: the models understood *what* was insecure but generated language-generic fixes. A developer using Express.js doesn't need "set security headers"they need
What changed in v2.1:
- 1,435 total examples (v2.0's 1,216 baseline + 219 new framework-specific additions)
- 9 production frameworks: Express.js, Spring Boot, React, Next.js, FastAPI, GraphQL, SQLAlchemy, Flask, Vue.js
- 475 unique CVEs (73 new, including framework-specific treatments of Log4Shell, Spring4Shell, and others)
- 5-tier quality rubric: Every new example scores 90+/100 across correctness, new dataset average is nearly 97+, security hardening, real-world grounding, educational scaffolding, and production readiness
- Structured references: CVE IDs, advisory URLs, discovery/remediation dates, affected versions — not just "related to CVE-XXXX"
What stayed the same:
- Same 4-turn conversation format (compatible with existing fine-tuning workflows)
- Same license (CC BY-NC-SA 4.0)
- Full v2.0 baseline included — no need to download both
- All 8 fine-tuned models still work; v2.1-specific fine-tuning coming soon
The new examples look like this:
Instead of generic "use parameterized queries", you get Express.js with
Two configs to load:
Quick update on the SecureCode dataset. After testing the v2.0 models against real codebases, one gap kept showing up: the models understood *what* was insecure but generated language-generic fixes. A developer using Express.js doesn't need "set security headers"they need
helmet() middleware chains configured correctly. Spring Boot developers need @PreAuthorize annotations, not abstract RBAC pseudocode.What changed in v2.1:
- 1,435 total examples (v2.0's 1,216 baseline + 219 new framework-specific additions)
- 9 production frameworks: Express.js, Spring Boot, React, Next.js, FastAPI, GraphQL, SQLAlchemy, Flask, Vue.js
- 475 unique CVEs (73 new, including framework-specific treatments of Log4Shell, Spring4Shell, and others)
- 5-tier quality rubric: Every new example scores 90+/100 across correctness, new dataset average is nearly 97+, security hardening, real-world grounding, educational scaffolding, and production readiness
- Structured references: CVE IDs, advisory URLs, discovery/remediation dates, affected versions — not just "related to CVE-XXXX"
What stayed the same:
- Same 4-turn conversation format (compatible with existing fine-tuning workflows)
- Same license (CC BY-NC-SA 4.0)
- Full v2.0 baseline included — no need to download both
- All 8 fine-tuned models still work; v2.1-specific fine-tuning coming soon
The new examples look like this:
Instead of generic "use parameterized queries", you get Express.js with
express-validator input chains, Spring Boot with @Valid bean validation + BCryptPasswordEncoder, FastAPI with Depends() auth injection and Pydantic model validation, React with DOMPurify + CSP headers. Framework-native patterns you can actually deploy.Two configs to load:
from datasets import load_dataset
baseline = load_dataset("scthornton/securecode-v2.1", "v2.0-baseline") # 1,216
additions = load