AI's Data Problem Creates a Digital Monstrosity
Generative AI has no shortage of critics, and one of the most persistent is Gary Marcus. He has repeatedly called on AI companies to adopt a neurosymbolic architecture instead, one that combines neural networks (NNs) with classical symbolic algorithms. The approach is often compared to the dual-process theory of cognition popularized by Daniel Kahneman: fast, pattern-based learning (System 1, here the NNs) paired with slower, rule-based reasoning (System 2, here the symbolic components). I agree with Marcus, not just because of the widely known lack of common sense that current LLMs such as ChatGPT display, but also because a recent article in Nature has confirmed a serious problem for the future of generative AI.
The new and pressing problem is that AI-generated media is already flooding the internet and is becoming harder to distinguish from human-made media. As LLMs exhaust the supply of fresh human-generated data, AI companies will increasingly train on content they cannot reliably tell apart from their own output. That is how AI hinders its own development: models trained recursively on synthetic data progressively lose the tails of the original distribution, a failure mode known as model collapse (Shumailov et al., 2024). The result is AI-generated dogs that look like this:
That…is not a dog. It doesn't even vaguely resemble a dog. It actually bears a shocking resemblance to the monster/pig with a Nixon mask in this I Think You Should Leave sketch. So unless you want that thing showing up in your house/search results, get off the generative AI hype train…it's screeching to a halt. Unless we acknowledge the fundamental shortcomings of current AI technologies and shift toward a hybrid architecture that incorporates symbolic reasoning, we risk a future full of nonsensical outputs and monstrous dogs.
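If you want to see the mechanism behind that monstrosity in miniature, here is a toy sketch of the collapse dynamic. This is not the experiment from Shumailov et al. (2024); it is a minimal illustration under assumed parameters (a 1-D Gaussian "model", an arbitrary sample size and generation count) of what happens when each generation is trained only on the previous generation's synthetic output.

```python
import numpy as np

# Toy illustration of one mechanism behind model collapse: each "generation"
# fits a simple model (a 1-D Gaussian) to samples drawn from the previous
# generation's model instead of from real data. Finite-sample estimation error
# compounds, and the fitted distribution's spread (its "tails") tends to shrink.
# All parameters below are illustrative assumptions, not values from the paper.

rng = np.random.default_rng(0)

N_SAMPLES = 100     # synthetic dataset size per generation (assumed)
GENERATIONS = 200   # how many times we retrain on the previous model's output

mu, sigma = 0.0, 1.0  # generation 0: the "real data" is a standard normal

for gen in range(GENERATIONS):
    # Generate a dataset from the current model, then refit by maximum
    # likelihood (sample mean and standard deviation).
    data = rng.normal(mu, sigma, size=N_SAMPLES)
    mu, sigma = data.mean(), data.std()
    if gen % 50 == 0 or gen == GENERATIONS - 1:
        print(f"generation {gen:3d}: mean={mu:+.3f}, std={sigma:.3f}")
```

On most runs the estimated standard deviation drifts well below the original 1.0, because the fit is a biased random walk that can lose, but never recover, the rarer values in the tails. Real generative models are vastly more complex, but the compounding loss of diversity when training on your own output is the same basic story.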