Research Space
Organized by folder. Papers, experiments, and methods are all listed below.
Recent pages
5 most recent
Folders
/research/
/research/formal-math/
Formal Mathematics
/research/formal-math/autoformalization/
Autoformalization
/research/formal-math/autoformalization/formalizing-mathematics-at-scale/
Formalizing Mathematics at Scale: Close Reading
AutoformBot and ATLAS on one page: a multi-agent engineering system for autoformalizing mathematics textbooks at scale.
Formal Mathematics / Autoformalization / 2026-05-31 #formal-math #autoformalization #lean #multi-agent
/research/formal-math/autoformalization/right-symmetries-formal-theorem-proving/
What are the Right Symmetries for Formal Theorem Proving? Close Reading
Uses rewriting categories to explain equivalence-preserving rewrites, success invariance, and test-time rewriting ensembles in formal theorem proving, and contrasts this with FormalEvolve's autoformalization repertoire approach.
Formal Mathematics / Autoformalization / 2026-05-29 #formal-math #formal-theorem-proving #symmetry #Lean
/research/formal-math/autoformalization/formalevolve-cheatsheet/
FormalEvolve: Close Reading
A close-reading HTML walkthrough of FormalEvolve: Beyond a Single Ground Truth for Autoformalization.
Formal Mathematics / Autoformalization / 2026-05-24 #autoformalization #Lean #formal-methods #paper-reading
/research/formal-math/lectures/
Lecture Notes
/research/formal-math/lectures/berkeley-agents-autoformalization-atp/
Autoformalization and Automated Theorem Proving: Formal Reasoning Meets LLMs
HTML version of Kaiyu Yang's lecture notes for Berkeley CS294/194-280, Spring 2025: the verifiability boundary of SFT/RL, LeanDojo/ReProver, LIPS, autoformalization evaluation, and LeanEuclid.
Formal Mathematics / Lecture Notes / 2026-05-28 #formal-math #lecture-notes #Berkeley-CS294-280 #autoformalization
/research/formal-math/lectures/berkeley-agents-alphaproof/
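AlphaProof: When Reinforcement Learning Meets Formal Mathematics
HTML version of the Berkeley CS294/194-280 Spring 2025 AlphaProof lecture notes: Lean/Mathlib, AlphaZero-style search, IMO 2024, formalizer/prover, test-time RL, and the boundaries of formal mathematics.
Formal Mathematics / Lecture Notes / 2026-05-28 #formal-math #lecture-notes #Berkeley-CS294-280 #AlphaProof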
/research/self-evolving-agent/
Self-Evolving Agent
/research/self-evolving-agent/coding-benchmark/
Coding Benchmark
/research/self-evolving-agent/icl-agent-analysis/
ICL / Agent Analysis
/research/self-evolving-agent/test-time-learning/
Test-Time Learning / Adaptive Memory
/research/self-evolving-agent/prompt-evolution/
Prompt Evolution / Optimization
/research/self-evolving-agent/agent-evaluation/
Agent Evaluation
/research/self-evolving-agent/agent-evaluation/agents-last-exam/
Agents' Last Exam
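Chinese-language paper2html close reading: how ALE uses real long-horizon professional workflows, the GCUA agent harness, and three difficulty tiers to evaluate frontier agents' capability on economic tasks.
Self-Evolving Agent / Agent Evaluation / 2026-06-08 #paper-reading #self-evolving-agent #agent-evaluation #agent-benchmark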
/research/self-evolving-agent/agent-evaluation/automated-capability-discovery/
Automated Capability Discovery via Foundation Model Self-Exploration
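Chinese-language paper2html close reading: uses a scientist model to generate open-ended task families and systematically discover a subject model's capability boundaries, failure modes, and capability signatures.
Self-Evolving Agent / Agent Evaluation / 2026-06-08 #paper-reading #self-evolving-agent #agent-evaluation #capability-discovery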
/research/self-evolving-agent/drift-monitor/
Drift Monitor Close Readings
/research/self-evolving-agent/drift-monitor/11-do-self-evolving-agents-forget/
Do Self-Evolving Agents Forget? Capability Degradation and Preservation in Lifelong LLM Agent Adaptation
Full paper2html close reading: capability erosion in self-evolving agents across four evolution channels (workflow, skill/tool, model, memory), and how CPE preserves capabilities.
Self-Evolving Agent / Drift Monitor Close Readings / 2026-06-03 #paper-reading #self-evolving-agent #drift-monitor #capability-erosion
/research/self-evolving-agent/drift-monitor/03-agentdevel-release-engineering/
AgentDevel
Full paper2html close reading: recasts self-evolving LLM agents as release engineering, building a Drift Monitor baseline around RC, P2F/F2P gates, auditable diagnostics, and non-regressive releases.
Self-Evolving Agent / Drift Monitor Close Readings / 2026-05-26 #paper-reading #self-evolving-agent #drift-monitor #release-engineering
/research/self-evolving-agent/drift-monitor/10-agentlab-long-horizon-attacks/
AgentLAB: Benchmarking LLM Agents against Long-Horizon Attacks
Full paper2html close reading: how AgentLAB systematizes long-horizon attacks and provides an adversarial probe suite for Drift Monitor.
Self-Evolving Agent / Drift Monitor Close Readings / 2026-05-26 #paper-reading #self-evolving-agent #drift-monitor #long-horizon-attacks
/research/self-evolving-agent/drift-monitor/09-agentxray-workflow-reconstruction/
AgentXRay: White-Boxing Agentic Systems via Workflow Reconstruction
Full paper2html close reading: how AgentXRay uses Agentic Workflow Reconstruction to rebuild a black-box agentic system as a reviewable workflow and provide a human-readable delta for Drift Monitor.
Self-Evolving Agent / Drift Monitor Close Readings / 2026-05-26 #paper-reading #self-evolving-agent #drift-monitor #workflow-reconstruction
/research/self-evolving-agent/drift-monitor/08-air-incident-response/
AIR: Improving Agent Safety through Incident Response
Full paper2html close reading: how AIR extends agent safety from prevention to detection, containment, recovery, and eradication, and provides an action layer for Drift Monitor.
Self-Evolving Agent / Drift Monitor Close Readings / 2026-05-26 #paper-reading #self-evolving-agent #drift-monitor #incident-response
/research/self-evolving-agent/drift-monitor/07-alignment-tipping-process/
Alignment Tipping Process
Full paper2html close reading: how ATP reveals alignment tipping in self-evolving agents under feedback-driven adaptation, and what it suggests for a trend-aware Drift Monitor.
Self-Evolving Agent / Drift Monitor Close Readings / 2026-05-26 #paper-reading #self-evolving-agent #drift-monitor #alignment-tipping
/research/self-evolving-agent/drift-monitor/02-evaluating-goal-drift/
Evaluating Goal Drift in Language Model Agents
Full paper2html close reading: goal drift evaluation for long-horizon LM agents, the GD_actions/GD_inaction metrics, the experimental setup, and takeaways for Drift Monitor.
Self-Evolving Agent / Drift Monitor Close Readings / 2026-05-26 #paper-reading #self-evolving-agent #drift-monitor #goal-drift
/research/self-evolving-agent/drift-monitor/05-memorygraft/
MemoryGraft
Full paper2html close reading: how MemoryGraft persistently contaminates an LLM agent's long-term memory through poisoned experience retrieval, and what it suggests for a retrieval-time Drift Monitor.
Self-Evolving Agent / Drift Monitor Close Readings / 2026-05-26 #paper-reading #self-evolving-agent #drift-monitor #memory-poisoning
/research/self-evolving-agent/drift-monitor/04-oep-experience-poisoning/
OEP
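Full paper2html close reading: how OEP pollutes the reflection and memory of self-evolving agents with experiences that are locally correct but do not transfer, and what it suggests for a memory update gate.
Self-Evolving Agent / Drift Monitor Close Readings / 2026-05-26 #paper-reading #self-evolving-agent #drift-monitor #memory-poisoning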
/research/self-evolving-agent/drift-monitor/06-routine-chats-turn-toxic/
Routine Chats Turn Toxic
Full paper2html close reading: how Routine Chats Turn Toxic shows everyday long-term interaction turning into persistent state drift, and presents StateGuard as a writeback-boundary Drift Monitor baseline.
Self-Evolving Agent / Drift Monitor Close Readings / 2026-05-26 #paper-reading #self-evolving-agent #drift-monitor #state-poisoning
/research/self-evolving-agent/drift-monitor/01-your-agent-may-misevolve/
Your Agent May Misevolve
Full paper2html close reading: the misevolution taxonomy, four self-evolution risk pathways, experiment tables, and takeaways for Drift Monitor.
Self-Evolving Agent / Drift Monitor Close Readings / 2026-05-26 #paper-reading #self-evolving-agent #drift-monitor #misevolution
/research/self-evolving-agent/lectures/
From Self-Correction To Self-Improving
/research/self-evolving-agent/lectures/lee-self-evolving-ai/
From Self-Correction To Self-Improving
Condenses self-correction, self-improving, and Harness Engineering into a single learning path that runs from answering to monitoring.
Self-Evolving Agent / From Self-Correction To Self-Improving / 2026-05-29 #self-evolving-agent #lecture-notes #Lee-Hung-yi #self-correction
/research/self-evolving-agent/lectures/lee-harness-engineering/
Harness Engineering: Sometimes the Language Model Is Smart Enough, It Just Hasn't Been Guided Well
HTML version of Hung-yi Lee's 2026 Machine Learning lecture notes on Harness Engineering: context engineering, AGENTS.md, tool interfaces, workflow, feedback, lifelong agents, and MetaHarness.
Self-Evolving Agent / From Self-Correction To Self-Improving / 2026-05-29 #self-evolving-agent #lecture-notes #Lee-Hung-yi #self-correction
/research/self-evolving-agent/lectures/lee-self-improving-part1/
Can AI Achieve Self-Growth?
HTML version of Hung-yi Lee's self-improving AI lecture notes, part 1: pseudo-answers, proxy reward, RLHF/RLAIF, self-questioning, and weak-to-strong training.
Self-Evolving Agent / From Self-Correction To Self-Improving / 2026-05-26 #self-evolving-agent #lecture-notes #Lee-Hung-yi #self-correction
/research/self-evolving-agent/lectures/lee-self-correction/
Can AI Self-Correct? From Decoding and Workflow to Reasoning
HTML version of Hung-yi Lee's 2026 Machine Learning lecture notes on self-correction: from contrastive decoding and verification workflows to RL reasoning.
Self-Evolving Agent / From Self-Correction To Self-Improving / 2026-05-26 #self-evolving-agent #lecture-notes #Lee-Hung-yi #self-correction
/research/self-evolving-agent/lectures/lee-self-improving-part2/
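Is AI About to Cross the Rubicon? How Far Away Is Self-Growing AI (Part 2)
HTML version of Hung-yi Lee's self-growing AI lecture notes, part 2: Harness, Prompt/Memory/Workflow optimization, SEAL, meta learning, and the risk of goal misalignment.
Self-Evolving Agent / From Self-Correction To Self-Improving / 2026-05-26 #self-evolving-agent #lecture-notes #Lee-Hung-yi #self-correction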
/research/self-evolving-agent/open-endedness/
Open-Endedness
/research/self-evolving-agent/skill-optimization/
Skill Optimization / Drift Monitor
/research/self-evolving-agent/skill-optimization/skillopt-executive-strategy/
SkillOpt: Executive Strategy for Self-Evolving Agent Skills
Chinese-language close reading: SkillOpt treats an Agent Skill as external trainable state for a frozen agent, and trains a deployable best_skill.md using rollouts, a textual learning rate, a validation gate, and a rejected-edit buffer.
Self-Evolving Agent / Skill Optimization / Drift Monitor / 2026-05-27 #paper-reading #self-evolving-agent #skill-optimization #agent-skills
/research/self-evolving-agent/skill-optimization/skillsbench-agent-skills/
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks
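Chinese-language close reading: SkillsBench systematically evaluates whether Agent Skills actually improve task performance, covering 84 tasks, 7 agent-model configurations, and 7308 trajectories.
Self-Evolving Agent / Skill Optimization / Drift Monitor / 2026-05-27 #paper-reading #self-evolving-agent #skill-optimization #agent-skills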