• Fubarberry@sopuli.xyz
    English
    arrow-up
    27
    ·
    5 days ago

    China has a huge advantage in AI models because of how lax it is on intellectual property rights. US companies are fighting over API licensing costs, while China is just going to scrape everything and use it for free.

    The US has a lead now, but I don’t think it can maintain it without giving up on ethical training. Then again, it may not matter whether the US models are ethical if everyone eventually just uses the superior, unethically trained Chinese models instead.

    • just_an_average_joe@lemmy.dbzer0.com
      English
      arrow-up
      11
      ·
      5 days ago

      The US companies already scraped the data while they could. If anything, data scraping is far, far more difficult now for everyone, for technical reasons.

      Most of the new models are trained on synthetic data, on higher-quality data, or with RLHF. The reason DeepSeek is able to perform is likely that LLMs are still very new, so there is a lot of low-hanging fruit. It’s no longer just about the data; we hit that limit quite some time ago.

      • Naia@lemmy.blahaj.zone
        English
        arrow-up
        1
        ·
        4 days ago

        Honestly, even from the beginning it was pretty obvious that scraped data was going to have a ton of issues. There’s too much nonsense out there, both from misinformation and from people who just aren’t able to communicate clearly.

        That’s before you get into the ethical aspects of stealing other people’s content and the way these things are being misused.