• General_Effort@lemmy.world
    7 months ago

    Yes, it’s BS, like most of the AI takes here.

    The kernel of truth is scaling laws:

    [T]he Chinchilla scaling law for training Transformer language models suggests that when given an increased budget (in FLOPs), to achieve compute-optimal, the number of model parameters (N) and the number of tokens for training the model (D) should scale in approximately equal proportions.
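    A quick back-of-envelope sketch of what “equal proportions” means, assuming the commonly cited approximations from Hoffmann et al. 2022: training compute C ≈ 6·N·D FLOPs and a compute-optimal ratio of roughly 20 tokens per parameter (the exact fitted constants vary):

    ```python
    # Rough Chinchilla rule-of-thumb sketch (not the paper's exact fit).
    # Assumes C = 6 * N * D training FLOPs and D = 20 * N tokens,
    # so N = sqrt(C / 120) and D = 20 * N.
    import math

    def compute_optimal(flops_budget: float) -> tuple[float, float]:
        """Return (parameters N, training tokens D) for a FLOP budget."""
        n_params = math.sqrt(flops_budget / 120)
        n_tokens = 20 * n_params
        return n_params, n_tokens

    # Example: a ~5.8e23 FLOP budget gives roughly Chinchilla's shape
    # (~70B parameters, ~1.4T tokens).
    n, d = compute_optimal(5.76e23)
    print(f"N ≈ {n:.2e} parameters, D ≈ {d:.2e} tokens")

    # Doubling the budget scales both N and D by ~sqrt(2),
    # i.e. they grow "in approximately equal proportions".
    ```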