Paper Notes

These notes capture more than a paper’s abstract. Each summary covers the research question, experimental evidence, limitations, and implications for building real systems.

Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents

The paper correctly prioritizes out-of-distribution rank transfer, but its proposed score conflates predictive validity with risk-adjusted utility.

From Static Templates to Dynamic Runtime Graphs

The paper offers a useful template/realized-graph/trace distinction and reporting protocol, but lacks a reproducible survey methodology.

DataComp-LM: In Search of the Next Generation of Training Sets for Language Models

DataComp-LM establishes a controlled benchmark for dataset research and finds that aggressive model-based quality filtering is more effective than conventional source mixing.

Neural Machine Translation of Rare Words with Subword Units

The paper adapts byte pair encoding, originally a compression algorithm, to learn variable-length subword units, enabling open-vocabulary neural translation without backing off to a dictionary for rare words.
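
To make the merge procedure concrete, here is a minimal sketch of learning and applying byte pair encoding merges. It is an illustration, not the authors' released code: the function names (`learn_bpe`, `merge_pair`, `segment`), the `</w>` end-of-word marker as written here, and the toy word counts are all assumptions chosen for the example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge operations from a {word: count} dictionary."""
    # Start from characters, with an end-of-word marker (an assumption of this sketch).
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Merge the most frequent pair; ties fall to the first pair counted.
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word, merges):
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


# Hypothetical toy corpus: word -> count.
toy_corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe(toy_corpus, 3)
lowest = segment("lowest", merges)
```

With this toy corpus, "lowest" never appears in training, yet `segment` still yields a sequence of known symbols. That is the open-vocabulary property the summary refers to: an unseen word decomposes into learned subword units rather than a single unknown token.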

Training Compute-Optimal Large Language Models

The Chinchilla paper shows that model parameters and training tokens should scale in approximately equal proportions, so that for a fixed compute budget a smaller model trained on more tokens outperforms a larger, undertrained one.
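
The equal-proportion rule can be illustrated with a short back-of-the-envelope calculation. The sketch below rests on two assumptions that are not stated in the summary above: the common approximation that training compute is about 6 FLOPs per parameter per token, and a rule of thumb of roughly 20 training tokens per parameter. The function name `compute_optimal_allocation` is hypothetical.

```python
import math


def compute_optimal_allocation(flops, tokens_per_param=20.0):
    """Split a training compute budget between parameters and tokens.

    Assumes flops ~= 6 * params * tokens and a fixed tokens-per-parameter
    ratio (both are approximations adopted for this sketch).
    """
    # Solve 6 * N * (ratio * N) = flops for N.
    params = math.sqrt(flops / (6.0 * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens


# Budget implied by a 70B-parameter model trained on 1.4T tokens.
budget = 6.0 * 70e9 * 1.4e12
params, tokens = compute_optimal_allocation(budget)

# Quadrupling the budget doubles both parameters and tokens.
params_4x, tokens_4x = compute_optimal_allocation(4.0 * budget)
```

Under these assumptions both quantities grow with the square root of compute, which is what scaling "in approximately equal proportions" means in practice. The exact ratio depends on the fitting method and the data, so the 20-tokens-per-parameter figure should be read as a heuristic, not a constant.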