What AI knows about you

Illustration: Maura Losch/Axios
Most AI builders don't say where they are getting the data they use to train their bots and models — but legally they're required to say what they are doing with their customers' data.
The big picture: These data-use disclosures open a window onto the otherwise opaque world of Big Tech's AI brain-food fight.
- In this new Axios series, we'll tell you, company by company, what all the key players are saying and doing with your personal information and content.
Why it matters: You might be just fine knowing that picture you just posted on Instagram is helping train the next generative AI art engine. But you might not — or you might just want to be choosier about what you share.
Zoom out: AI makers need an incomprehensibly gigantic amount of raw data to train their large language and image models.
- The industry's hunger has led to a data land grab: Companies are vying to teach their baby AIs using information sucked in from many different sources — sometimes with the owner's permission, often without it — before new laws and court rulings make that harder.
Zoom in: Each Big Tech giant is building generative AI models, and many of them are using their customer data, in part, to train them.
- In some cases it's opt-in, meaning your data won't be used unless you agree. In others it's opt-out, meaning your information will automatically be used unless you explicitly say no.
- These rules can vary by region, thanks to legal differences. For instance, Meta's Facebook and Instagram are "opt-out" — but you can only opt out if you live in Europe or Brazil.
- In the U.S., California's data privacy law is among those responsible for requiring firms to say what they do with user data. In the EU, it's the GDPR.
Between the lines: AI makers' data-use practices typically vary based on whether a firm serves consumers or enterprise customers.
- On the consumer side, especially with free services, options to avoid allowing your data to be used for AI training are often more limited, while businesses and organizations generally expect their data won't be used.
- Adobe, for example, ignited a firestorm with changes to its terms of service that left the impression it was using business customers' data to train its generative AI systems. In response, the company put its pledge not to do so in writing.
Where companies get the data they use to train their models — essentially, the "teaching" phase — is a separate question from, but related to, what they do with the customer data that's shared with an AI service once training is done and customers are using it.
Apple, for example, is making extensive use of personal data for Apple Intelligence.
- But the company has committed to a new architecture that it says will ensure the data remains private.
- Personal information will be processed on-device (like your own phone) — or, if it needs to be sent to a cloud data center, Apple says it will ensure that no one other than the user (not even Apple) has access.
Microsoft, meanwhile, has several times delayed the Recall feature of its Copilot+ PCs because of data-privacy questions.
- Although the work is done on-device, the captured data initially was stored in a way that other software could easily access.
- Microsoft's approach also preserves tons of screenshots, which can include an array of sensitive information — although the company has settings to turn off the feature for specific apps and websites.
OpenAI has an array of different policies and options that vary based on the type of customer and whether they are using free or paid services.
What's next: Over the coming weeks, this series will look, company by company, at what customer data is being used to train AI and what happens to your data when you use an AI service.
- We will talk to the tech giants and the big AI companies as well as other consumer and enterprise software companies whose policies or practices have garnered attention.
- We'll dig into their policies and the options available to customers who don't want their data used for AI training.
The bottom line: In tech's social media era, the industry built vast global networks that transmuted our posts and clicks into rivers of profit by monetizing users' personal information.
- AI is giving that information new value and giving us new reasons to provide it — but at least this time around, we should know what we're getting into.