I started building an agent-ready data warehouse (GitHub.com/mathisdrn/orca) and was thinking that my skills could be optimized by benchmarking them. It turns out there is a better way to build and optimize them: using language models themselves as both the evaluator and the skill builder. See DSPy and GEPA.
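To make the evaluator/builder loop concrete, here is a rough sketch of the general idea. It is not DSPy's or GEPA's actual API: `evaluate`, `propose`, `TASKS`, and `HINTS` are hypothetical stand-ins I made up for illustration, and both functions are plain-Python stubs where a real setup would call a language model.

```python
import random

# Hypothetical benchmark: each task lists keywords a good skill should cover.
TASKS = [
    {"required": {"schema", "join"}},
    {"required": {"schema", "partition"}},
    {"required": {"join", "index"}},
]

# Hypothetical pool of hints the stub "skill builder" can add to a skill.
HINTS = ["schema", "join", "partition", "index", "cache"]


def evaluate(skill: str, tasks=TASKS) -> float:
    """Stub evaluator: mean fraction of required keywords present per task.

    In a real setup this would be an LLM judge or a task benchmark.
    """
    words = set(skill.split())
    per_task = [len(t["required"] & words) / len(t["required"]) for t in tasks]
    return sum(per_task) / len(per_task)


def propose(skill: str, rng: random.Random) -> str:
    """Stub skill builder: appends one random hint to the skill text.

    In a real setup an LLM would rewrite the skill using evaluator feedback.
    """
    return f"{skill} {rng.choice(HINTS)}"


def optimize(seed_skill: str, rounds: int = 50, seed: int = 0):
    """Greedy loop: keep a candidate only if it scores strictly higher."""
    rng = random.Random(seed)
    best, best_score = seed_skill, evaluate(seed_skill)
    for _ in range(rounds):
        candidate = propose(best, rng)
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


if __name__ == "__main__":
    skill, score = optimize("Use SQL")
    print(score, skill)
```

The loop is deliberately naive (greedy hill climbing over a single candidate). As I understand it, the real tools are smarter about how candidates are proposed and selected, but the propose, score, keep-if-better shape is the part I care about here.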
I am wondering whether Anthropic's and OpenAI's skill-creator skills are themselves optimized to improve skill efficiency across various tasks.