[T]he Chinchilla scaling law for training Transformer language models suggests that, given an increased compute budget (in FLOPs), to achieve compute-optimal training the number of model parameters (N) and the number of training tokens (D) should be scaled up in approximately equal proportions.
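(For concreteness, here's a minimal back-of-the-envelope sketch of what the quoted claim means, assuming the usual C ≈ 6·N·D FLOP approximation and the oft-cited ~20 tokens-per-parameter rule of thumb; the paper's exact fitted coefficients differ slightly.)

```python
# Back-of-the-envelope Chinchilla-style split of a FLOP budget.
# Assumptions (illustrative, not the paper's exact fitted values):
# training compute C ~= 6*N*D, and compute-optimal D is roughly
# 20 tokens per parameter, so N and D both grow like C^0.5.

def optimal_split(compute_flops: float, tokens_per_param: float = 20.0):
    """Return (params N, tokens D) for a given FLOP budget C."""
    # C = 6*N*D and D = tokens_per_param*N  =>  N = sqrt(C / (6*r))
    n = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    d = tokens_per_param * n
    return n, d

# 10x the compute buys ~sqrt(10) ~= 3.16x the params AND ~3.16x the
# tokens -- "approximately equal proportions", as the quote says.
for c in (5.9e23, 5.9e24):
    n, d = optimal_split(c)
    print(f"C={c:.1e} FLOPs -> N~{n:.1e} params, D~{d:.1e} tokens")
```

(Plugging in ~5.9e23 FLOPs gives roughly 70B parameters and 1.4T tokens, which is the Chinchilla model itself.)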
Yes, it’s BS, like most of the AI takes here.
The kernel of truth is scaling laws: