new-benchmark-challenges-ai-s-ability-to-learn

Draws on: arxiv-2606-03565v1, arxiv-2606-03024v1

**"New Benchmark Challenges AI's Ability to Learn New Skills."**

A new benchmark, R3-Skill, has been developed to test the ability of large language models (LLMs) to learn new skills. R3-Skill features realistic agent skill routing and spans four language directions. The benchmark has been verified through multi-expert cross-checking. A two-stage retrieval system, R3-Embedding + R3-Reranker, has been built to tackle the benchmark, achieving high performance on skill retrieval and compatibility. The R3-Embedding + R3-Reranker pipeline attains Hit@1 = 0.7714 and NDCG@10 = 0.8327 on R3-Skill.

Meanwhile, a new permission framework, SkillGuard, has been proposed to improve the security and privacy of agent skill ecosystems. SkillGuard treats skills as permission-bearing executable artifacts and regulates context influence and action side effects through skill manifests and runtime access control. The framework has been evaluated on 315 real-world skills and has been shown to reduce attack success by 23.02% for contextual injections.
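For readers unfamiliar with the two retrieval metrics quoted for the R3-Embedding + R3-Reranker pipeline, the following is a minimal sketch of how Hit@1 and NDCG@10 are conventionally computed for a single query. It assumes binary relevance (a skill is either correct for the query or not), which may differ from how R3-Skill actually grades results. The function names `hit_at_k` and `ndcg_at_k` and the skill identifiers are hypothetical, chosen only for illustration.

```python
import math
from typing import Sequence, Set


def hit_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 1) -> float:
    """Return 1.0 if any of the top-k ranked items is relevant, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance NDCG@k (assumption: gain is 1 for relevant, 0 otherwise)."""
    # Discounted cumulative gain: a hit at 0-based position p contributes 1 / log2(p + 2).
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    # Ideal DCG: all relevant items (up to k) placed at the top of the ranking.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical query: the correct skill is ranked second.
ranked_skills = ["pdf-export", "csv-export", "image-resize"]
correct_skills = {"csv-export"}

print(hit_at_k(ranked_skills, correct_skills, k=1))
print(ndcg_at_k(ranked_skills, correct_skills, k=10))
```

A benchmark-level score such as the reported Hit@1 or NDCG@10 would typically be the mean of these per-query values over all queries, though the exact aggregation used for R3-Skill is not stated here.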

