We’ve heard about Web 3.0 for years, but when the cryptocurrency hype faded, nothing really changed. The web isn’t built on a blockchain, the tech giants haven’t been dethroned, and, more importantly, content creators are still struggling to monetize their work with a combination of ads and private memberships.
Well, OpenAI and the flood of similar large language models may have finally signaled the beginning of the end for the web as we’ve known it for over 20 years.
Let’s start with a bit of history. I’ll try to keep this short, but no promises.
tl;dr: Iterations of the internet are defined by the business model driving the machine. We’re entering a new phase where advertising is being replaced with scraping content to train large language models (LLMs).
I usually distinguish Web 1.0 as the time when content online was largely a one-way street: authors could publish new content to their site, but readers couldn’t really interact or respond to it. Sites often had hit counters showing how many times the page was requested, but that was pretty much the extent of what you knew about your audience.
Web 1.0 was the wild west of the web. HTML and CSS were simple enough that anyone interested could learn how to get their ideas online. We hadn’t defined what a “good” user experience was and people took that freedom to try out wacky ideas. We even had the now-defunct <marquee> element to scroll banners of text across the screen. AKA the good old days.
Web 2.0 came along in the early 2000s, after the dot-com bubble, when everyone realized that the web does in fact cost money and online businesses eventually have to make money.
Enter Google. No, not the search engine. Google the advertising platform. Sure, Google built a better mousetrap with their search algorithm, but it only exists to sell ads and collect extremely detailed and personal data on its users (which then helps sell more valuable ads).
This opened the door for online businesses to build around an actual revenue model. That revenue model was, you guessed it, ads. Countless websites popped up creating new content with the main goal of bringing readers back to view more ads and, unknowingly, provide more personal data.
This is the world we’ve lived in for a couple decades, culminating in both users and some browser vendors implementing features that try to protect users from the very ad model that was Web 2.0. That battle has been brewing for a while, but it seems as though we’re finally seeing a light at the end of that tunnel with a new business model hitting the streets.
OpenAI opened Pandora’s box in more ways than they’re given credit for. Not only did they make impressive (and dangerous) gains of function compared to previous machine learning tools, they opened the door for a new way of monetizing online content.
Online advertising isn’t the same business it used to be. It’s even harder to really track value gained from online ads and the game of cat and mouse with ad blockers will never end. Luckily for online businesses, OpenAI created value where it didn’t previously exist.
If you can throw up walls around user-generated content and control access, you can sell that content to anyone wanting to train an LLM. Today LLMs are mostly trained on data from a couple of years ago, but eventually that will change, and when it does Twitter and Reddit will be sitting on a gold mine of user content heavily focused on current events and trends.
Where Web 2.0 focused on content creators selling ads by knowing intimate details of every visitor, Web 3.0 will focus on getting users to create as much content as possible inside of a walled garden.
Where do we go from here?
I’m honestly not sure whether I think this new model is better or worse than the advertising model. I never liked the ad model. Though I could fight back by trying to block ads, I never really had a chance at any semblance of privacy online. User tracking is woven so deeply into the fundamentals of the modern web that it’s effectively impossible to truly be anonymous online. At the end of the day I may be able to make targeting me for ads a bit harder, but ironically I may only be making ads less interesting to me without escaping ads altogether.
What I can say, though, is that I don’t like the idea of content I post online being used to train for-profit machine learning tools. I rarely use social media, and whenever possible I try to take the IndieWeb approach of writing on my own site and syndicating out to those other platforms.
This might have to change going forward, though. As much as I like the open web, we may have finally made it untenable to post content publicly. I may change nothing in the short term, but I have toyed with hiding all of my old posts and only publishing to a private RSS feed. To that end, I’m curious what the LLM bots actually scrape and whether publishing only to RSS and avoiding an HTML page would dodge their algorithms for now.