• Zos_Kia@lemmynsfw.com
    link
    fedilink
    English
    arrow-up
    2
    ·
    4 months ago

    That’s what smaller models do, but it doesn’t yield great performance because there’s only so much stuff available. To get to gpt4 levels you need a lot more data, and to break the next glass ceiling you’ll need even more.

    • KevonLooney@lemm.ee
      link
      fedilink
      English
      arrow-up
      3
      arrow-down
      1
      ·
      4 months ago

      Then these models are stupid. Humans don’t start as a blank slate. They have an inherent aptitude for language and communication. These models should start out with basics of language, so they don’t have to learn it from the ground up. That’s the next step. Right now they’re just well read idiots.

      • Zos_Kia@lemmynsfw.com
        link
        fedilink
        English
        arrow-up
        3
        arrow-down
        1
        ·
        4 months ago

        Then these models are stupid

        Yup that is kind of the point. They are math functions designed to approximate human tasks.

        These models should start out with basics of language, so they don’t have to learn it from the ground up. That’s the next step. Right now they’re just well read idiots.

        I’m not sure what you’re pointing at here. How they do it right now, simplified, is you have a small model designed to cut text into tokens (“knowledge of syllables”), which are fed into a larger model which turns tokens into semantic information (“knowledge of language”), which is fed to a ridiculously fat model which “accomplishes the task” (“knowledge of things”).

        The first two models are small enough that they can be trained on the kind of data you describe, classic books, movie scripts etc… A couple hundred billion words maybe. But the last one requires orders of magnitude more data, in the trillions.