This has nothing to do with centralization. AI companies are already scraping the web for everything useful. If you took the content from SO and split it into 1000 federated sites, it would still end up in a AI model. Decentralization would only help if we ever manage to hold the AI companies accountable for the en masse copyright violations they base their industry on.
Imagine you were asked to start speaking a new language, eg Chinese. Your brain happens to work quite differently to the rest of us. You have immense capabilities for memorization and computation but not much else. You can’t really learn Chinese with this kind of mind, but you have an idea that plays right into your strengths. You will listen to millions of conversations by real Chinese speakers and mimic their patterns. You make notes like “when one person says A, the most common response by the other person is B”, or “most often after someone says X, they follow it up with Y”. So you go into conversations with Chinese speakers and just perform these patterns. It’s all just sounds to you. You don’t recognize words and you can’t even tell from context what’s happening. If you do that well enough you are technically speaking Chinese but you will never have any intent or understanding behind what you say. That’s basically LLMs.