Big Tech firms are scrambling for AI training data and Meta seems to have one big advantage over its rivals: using Instagram and Facebook photos.
Meta’s chief product officer, Chris Cox, told Bloomberg’s Tech Summit on Thursday that the company uses publicly available photos and text from the platforms to train its text-to-image model, Emu.
“We don’t train on private stuff, we don’t train on stuff that people share with their friends, we do train on things that are public,” he said.
Meta’s text-to-image model can produce “really amazing quality images” because Instagram has many photos of “art, fashion, culture and also just images of people and us,” Cox added.
Users can create images on Meta AI by typing a prompt starting with the word “imagine,” and the tool will generate four images, according to Meta’s website.
AI models must be trained on large amounts of data to be effective. Sourcing that data has been contentious, as there’s almost no way to prevent copyrighted content from being scraped from the internet and used to create an LLM.
The US Copyright Office has been examining the issue since early last year and is considering updates to copyright rules to address it.
One way companies are trying to obtain data is by joining forces with other firms. OpenAI, for example, has partnered with several media outlets to license their content and develop its models.
Meta even considered acquiring the publisher Simon & Schuster in a bid to get more data to train its models, The New York Times reported last month.
As well as raw data sets, companies use “feedback loops” to train their models: data collected from past interactions and outputs is analyzed to improve future performance. These loops include algorithms that tell an AI model when it has made an error so that it can learn from the mistake.
Meta CEO Mark Zuckerberg last month told The Verge that feedback loops will be “more valuable” than any “upfront corpus.”
Meta didn’t immediately respond to a request for comment from Business Insider, made outside normal working hours.