Skill Evaluation
Overview
Evaluate any agent skill against 12 best-practice criteria from Anthropic and agentskills.io. Produces a structured markdown scorecard with per-criterion scores (0–100), category classification, bonus patterns, and prioritized improvement actions.
When to Use
- Evaluate skill quality before publishing
- Audit existing skills for improvement opportunities
- Compare two skills side-by-side
- Check compliance with industry best practices
Installation
npx skills add https://github.com/fabricioctelles/skills -s skill-evaluation
12 Scored Criteria
| # | Criterion | Weight |
|---|---|---|
| 1 | Don’t state the obvious | 2x |
| 2 | Gotchas section | 2x |
| 3 | Progressive disclosure | 2x |
| 4 | Avoids railroading | 1x |
| 5 | Setup flow | 1x |
| 6 | Description for trigger | 2x |
| 7 | Memory mechanism | 1x |
| 8 | Scripts & libraries | 1x |
| 9 | On-demand hooks | 1x |
| 10 | Conciseness | 2x |
| 11 | Coherent scope | 1x |
| 12 | Grounded in expertise | 2x |
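The page does not say how these weights combine into the overall score. As a minimal sketch, assuming a plain weighted mean of the twelve per-criterion scores (the function name `overall_score` and the formula itself are assumptions, not the skill's documented method):

```python
# Weights copied from the criteria table above.
# ASSUMPTION: the overall score is a weighted mean; the source does not confirm this.
WEIGHTS = {
    "Don't state the obvious": 2,
    "Gotchas section": 2,
    "Progressive disclosure": 2,
    "Avoids railroading": 1,
    "Setup flow": 1,
    "Description for trigger": 2,
    "Memory mechanism": 1,
    "Scripts & libraries": 1,
    "On-demand hooks": 1,
    "Conciseness": 2,
    "Coherent scope": 1,
    "Grounded in expertise": 2,
}


def overall_score(scores: dict[str, float]) -> float:
    """Return the weighted mean of per-criterion scores (each 0-100)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(WEIGHTS.values())  # 6 criteria at 2x + 6 at 1x = 18
    weighted_sum = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Hypothetical scores: 50 everywhere, except a missing Gotchas section.
    demo = {name: 50 for name in WEIGHTS}
    demo["Gotchas section"] = 0
    print(round(overall_score(demo), 1))
```

Under this assumption the six 2x criteria carry 12 of the 18 weight units, so two thirds of the overall score depends on them.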
4 Bonus Patterns (measured, not scored)
- Validation loops
- Output templates
- Procedures over declarations
- Defaults over menus
How it differs from agentskills.io evals
| | Skill Evaluation | agentskills.io evals |
|---|---|---|
| Evaluates | Skill structure quality | Skill output quality in use |
| Method | Static inspection | Test cases + benchmark |
| Question answered | Is the skill well-built? | Does it work in practice? |
| Output | Scorecard 0-100 + grade | pass_rate + tokens + time |
Use this skill first to get the structure right, then run agentskills.io evals to validate real-world performance.
Example Output
# Skill Evaluation — skill-evaluation
> Evaluated: 2026-06-27
> Evaluator: skill-evaluation v1.0.0
## Summary
| Metric | Value |
|--------|-------|
| Overall Score | 62/100 |
| Grade | B |
| Category | code-quality-and-review |
| Files | 2 |
| Has references/ | yes |
| Has scripts/ | no |
## Scorecard
| # | Criterion | Score | Notes |
|---|-----------|-------|-------|
| 1 | Don't state the obvious | 85 | Framework is specific, not generic |
| 2 | Gotchas section | 0 | Absent — no pitfall warnings |
| 3 | Progressive disclosure | 55 | 1 reference file, template inline |
| 6 | Description for trigger | 90 | Multiple concrete triggers |
| 10 | Conciseness | 70 | 223 lines, output template could be ref |
| 11 | Coherent scope | 95 | Does ONE thing well |
| 12 | Grounded in expertise | 80 | 3 authoritative sources |
## Top 3 Improvements
1. Gotchas (0) — Add multi-client evaluation pitfalls
2. Scripts (0) — Create quick-check.sh for measurable criteria
3. Memory (0) — Keep evaluations.log for progress tracking
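The page does not describe how the skill prioritizes improvement actions. One plausible reading, sketched below, ranks each criterion by its weighted gap to a perfect score, `weight * (100 - score)`. This rule and the name `top_improvements` are assumptions. The data uses only the scores visible in the example output, plus the two zero scores quoted in the Top 3 list. Criteria 4, 5, and 9 are omitted because their scores are not shown.

```python
# (criterion, weight, score): weights from the criteria table,
# scores from the example output above. Criteria 4, 5 and 9 are not shown there.
RESULTS = [
    ("Don't state the obvious", 2, 85),
    ("Gotchas section", 2, 0),
    ("Progressive disclosure", 2, 55),
    ("Description for trigger", 2, 90),
    ("Memory mechanism", 1, 0),
    ("Scripts & libraries", 1, 0),
    ("Conciseness", 2, 70),
    ("Coherent scope", 1, 95),
    ("Grounded in expertise", 2, 80),
]


def top_improvements(results, n=3):
    """Rank criteria by weighted gap to 100, largest gap first.

    ASSUMPTION: this ranking rule is illustrative, not the skill's documented logic.
    """
    ranked = sorted(results, key=lambda r: r[1] * (100 - r[2]), reverse=True)
    return [name for name, _weight, _score in ranked[:n]]


if __name__ == "__main__":
    for name in top_improvements(RESULTS):
        print(name)
```

With these inputs the rule selects the same three criteria as the example (Gotchas, Scripts, Memory). Scripts and Memory tie on weighted gap, so their relative order depends on the tie-break.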