new-benchmark-challenges-ai-s-ability-to-learn
draws-on arxiv-2606-03565v1
draws-on arxiv-2606-03024v1
headline **"New Benchmark Challenges AI's Ability to Learn New Skills."** A new benchmark, R3-Skill, has been developed to test the ability of large language models (LLMs) to learn new skills.
finds R3-Skill features realistic agent skill routing and spans four language directions.
finds The benchmark has been verified through multi-expert cross-checking.
finds A two-stage retrieval system, R3-Embedding + R3-Reranker, has been built to tackle the benchmark, targeting both skill retrieval and skill compatibility.
finds The R3-Embedding + R3-Reranker pipeline attains Hit@1 = 0.7714 and NDCG@10 = 0.8327 on R3-Skill.
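The two metrics quoted above can be made concrete with a minimal sketch. It assumes the common binary-relevance definitions of Hit@k and NDCG@k with a log2 rank discount. The source does not say which variant R3-Skill uses, and the function names and skill identifiers here are illustrative only.

```python
import math


def hit_at_k(ranked, relevant, k):
    """Return 1.0 if any relevant skill appears in the top-k results, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k (assumed variant; ranked items must be unique)."""
    # Discounted gain of the relevant items actually retrieved in the top k.
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, item in enumerate(ranked[:k])
        if item in relevant
    )
    # Best achievable gain: every relevant item packed at the top of the list.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical query: the correct skill is returned in second place.
ranked_skills = ["csv-export", "pdf-summarize", "web-search"]
gold_skills = {"pdf-summarize"}
print(hit_at_k(ranked_skills, gold_skills, 1))
print(ndcg_at_k(ranked_skills, gold_skills, 10))
```

A benchmark-level score such as Hit@1 = 0.7714 would then be the mean of the per-query values over all queries, again assuming the standard averaging convention.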
finds Meanwhile, a new permission framework, SkillGuard, has been proposed to improve the security and privacy of agent skill ecosystems.
finds SkillGuard treats skills as permission-bearing executable artifacts and regulates context influence and action side effects through skill manifests and runtime access control.
finds The framework was evaluated on 315 real-world skills, where it reduced attack success for contextual injections by 23.02%.
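The idea of a skill manifest combined with runtime access control can be sketched as follows. This is a hypothetical illustration, not SkillGuard's actual manifest schema or API: the class, the field names, and the permission strings are all assumptions made for the example.

```python
from dataclasses import dataclass, field


class PermissionDenied(Exception):
    """Raised when a skill attempts something its manifest does not declare."""


@dataclass(frozen=True)
class SkillManifest:
    # Hypothetical fields: SkillGuard's real manifest format is not given in the source.
    name: str
    allowed_actions: frozenset = field(default_factory=frozenset)  # side effects
    context_scopes: frozenset = field(default_factory=frozenset)   # context influence


def authorize_action(manifest, action):
    """Runtime check: allow a side effect only if the manifest declares it."""
    if action not in manifest.allowed_actions:
        raise PermissionDenied(f"{manifest.name} may not perform {action!r}")
    return True


def authorize_context_write(manifest, scope):
    """Runtime check: allow the skill to influence only declared context scopes."""
    if scope not in manifest.context_scopes:
        raise PermissionDenied(f"{manifest.name} may not write to {scope!r}")
    return True


# Hypothetical skill that may read files and add tool results to the context.
reader = SkillManifest(
    name="pdf-reader",
    allowed_actions=frozenset({"fs.read"}),
    context_scopes=frozenset({"tool_result"}),
)
authorize_action(reader, "fs.read")  # declared, so it passes
try:
    authorize_action(reader, "net.send")  # undeclared side effect
except PermissionDenied as err:
    print(err)
```

The design choice illustrated is deny-by-default: anything the manifest does not list is refused at call time, so an injected instruction cannot trigger an action the skill never declared.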