Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents

The paper correctly prioritizes out-of-distribution rank transfer, but its proposed score conflates predictive validity with risk-adjusted utility.

June 21, 2026 · Sai Boorlagadda

From Static Templates to Dynamic Runtime Graphs

The paper offers a useful template/realized-graph/trace distinction and reporting protocol, but lacks a reproducible survey methodology.

June 21, 2026 · Sai Boorlagadda