Training Compute-Optimal Large Language Models
The Chinchilla paper shows that model parameters and training tokens should scale in approximately equal proportions, enabling smaller, better-trained models.
The Chinchilla paper shows that model parameters and training tokens should scale in approximately equal proportions, enabling smaller, better-trained models.