Contractors training Meta's AI say they read intimate talks with its chatbot — and see data that identifies users


That talk you had with an AI about your breakup? Or the chatbot therapy session you didn’t tell anyone else about? A human may have read it — and in some cases, they can see your personal information.
Meta, like many tech giants, uses contract workers to improve its AI by reading real conversations between users and its chatbots. While rating and reviewing the AI’s responses, a common industry training technique, some contractors see personal information that would allow them to identify individual users, Business Insider has learned.
Four contract workers hired through training platforms Outlier (which is owned by Scale AI) and Alignerr told Business Insider that they routinely encountered chats between users and Meta’s AI containing full names, phone numbers, email addresses, gender, hobbies, and other personal details. One estimated that personally identifiable information appeared in more than half of the thousands of chats they typically reviewed each week. Two contractors said they encountered chats in which users sent the chatbot selfies. The users were located around the world, including the US and India.
Some of the personally identifiable information was placed by Meta alongside chat histories to help contractors personalize the AI’s response, according to documents for one of the three outsourced Meta projects reviewed by Business Insider. In other cases, users disclosed their personal data to the AI chatbot via text within the conversation, which Meta’s privacy policy warns users not to do.
Two of the four contractors worked on similar projects for other Big Tech clients, and estimated that the inclusion of unredacted personal data was more common for the Meta projects they worked on.
The project documents, two of which were in use last month, told contractors that the interactions they were assessing were real conversations between users and the chatbot.
Contractors who spoke to Business Insider described many of the conversations they saw as deeply personal. Conversations with Meta AI took the form of therapy-like sessions, private conversations as though speaking to a friend, or intimate exchanges with a romantic partner, the contractors said.
Users would open up about their fantasies, flirt with the chatbot, rant about people and problems in their lives, and ask for advice. The contractors said that users sometimes included their contact information, job titles, locations, or details about children in these exchanges.
One contractor who worked on an Alignerr-run project called Omni, which aimed to improve retention and engagement on Meta’s AI Studio, told Business Insider that some users were sharing intimate information with the chatbot, including selfies and explicit photos.
Meta’s AI terms of service state that the company “may review” user interactions with its AI, and that this review could be “automated” or conducted by humans.
A Meta spokesperson told Business Insider that the company has “strict policies” regarding who can access personal data, both for employees and contractors.
“While we work with contractors to help improve training data quality, we intentionally limit what personal information they see, and we have processes and guardrails in place instructing them how to handle any such information they may encounter,” the spokesperson said.
A Scale AI spokesperson said that “contributors are authorized to process personal data only as required for the project,” must adhere to Outlier’s security standards, and are “instructed to flag any responses containing PII and skip tasks with such content.”
“At no point does the user generated data leave the customer’s platform,” the spokesperson added. “Many projects are conducted on the customer’s labeling platform, which offers additional safeguards by keeping all user generated data contained within their system.”
Alignerr did not respond to a request for comment.
It’s the latest example of security gaps and privacy issues arising as companies recruit armies of humans to improve this nascent, fast-moving technology. Scale AI locked down public Google Docs used for contractor work after Business Insider reported they contained confidential client data. Some of those documents detailed how xAI and Meta were training their AI models. Training platform Surge also used a public spreadsheet that told gig workers which websites to use while fine-tuning Anthropic’s AI, Business Insider reported.
The stakes are high should personal data shared with chatbots fall into the wrong hands, because it “opens the door to manipulation, fraud, and other misuse,” Miranda Bogen, the director of the AI Governance Lab at the Center for Democracy and Technology, a nonprofit focusing on digital rights and privacy, told Business Insider.
Bogen advises users worried about privacy to “never assume chatbots are private by nature, especially since practices differ so much across companies.”
Bogen added that AI tools like chatbots are “used very differently from other online tools” and that “people may not realize their conversations are still being used to develop the product and may be accessed by humans.”
Many tech companies, including Meta, OpenAI, and Google, state in their privacy policies that they may use conversations between users and their AI to train and fine-tune models.
“Data that people don’t expect humans to see has been an issue for a long time in the industry,” Bogen added. “This is just the next version. The difference is that the context feels more intimate to people engaging with these tools, sometimes to get support that can feel like therapy, or where they’re providing confidential information about a job.”
Business Insider was unable to determine exactly how often contractors encountered personally identifiable information. However, the contractors who spoke to Business Insider said they came across it regularly while working on Meta AI projects.
One contractor who worked on Project Omni said they could complete as many as 5,000 AI training tasks a week. They were told to flag and reject chats containing personally identifiable information. The contractor estimated they encountered personal data, such as user phone numbers or Instagram usernames, in “60% to 70%” of them.
A separate project with Outlier, called PQPE, aimed to make conversations between users and Meta’s AI feel more personalized by referencing things the AI knows about the user, such as their first name, gender, location, and hobbies. Each user chat log came with a list of facts about the user that the AI was expected to reference; these facts were drawn from previous conversations between the user and the AI and from the user’s “social profile activity,” the project documents said. For this project, contractors were unable to reject chats containing personal information.
Meta’s spokesperson said that in AI personalization projects, “contractors are permitted in the course of their work to access certain personal information,” as long as it aligns with Meta’s privacy policies and AI terms.
In some cases, contractors may encounter personal information that’s required for the task, like location data, if a user asks for help finding a nearby coffee shop, the Meta spokesperson said.
One of the contractors said that someone could “absolutely” find a user’s real identity by combining a few of the user descriptions provided in the training tasks.
The Meta spokesperson said that all contractors complete an assessment to ensure they meet its standards for cybersecurity and privacy risk management.
The user data accompanying one sexually explicit chat history was specific enough for Business Insider to find a Facebook profile containing a matching first name, city, gender, and list of hobbies. Business Insider’s search took less than five minutes.
The contractor assigned to that chat, which Business Insider has seen but chosen not to describe, said its content was so disturbing that they had to stop working for the day.
“It’s rough. I’ve had to put it down for the night,” they said.
Tech companies like Meta are racing to create personalized AI, which means handling increasing amounts of personal and sensitive user data.
Meta CEO Mark Zuckerberg recently outlined his vision for “personal superintelligence” — AI that he said “knows us deeply, understands our goals, and can help us achieve them.”
In separate recent incidents, users appear not to have realized they were making their chats with AI public. Business Insider reported in June that users were unwittingly sharing personal information, including medical questions, career advice, and relationship issues, in conversations with Meta AI that appeared on a public feed in Meta’s AI app. Some contained identifying information like phone numbers, email addresses, or full names.
In response, Meta introduced a new warning in the Meta AI app.
The app still allows users to share their chats, which means they can be indexed and show up in Google searches, Business Insider reported last week. That same week, OpenAI removed a feature that had allowed shared ChatGPT conversations to be indexed by Google.
Sara Marcucci, the founder of AI + Planetary Justice Alliance, a global collective of researchers and activists, said Business Insider’s reporting on contractors seeing personal information “suggests that data minimization, redaction, and user control remain uneven and poorly enforced across the industry.”
Bogen said that while automated filters can detect and remove personally identifiable information, they can’t catch everything, and that’s where human reviewers can step in.
“Just because there is a process for humans to mark and redact it doesn’t mean that nothing is already happening, but it does indicate that whatever system is in place — if there is one — is imperfect, and it’s recognized to be imperfect,” Bogen added.
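As a rough illustration of the kind of filter Bogen describes, the sketch below shows a pattern-based redaction pass in Python. It is a hypothetical example, not Meta's or its vendors' actual tooling; the patterns, names, and sample chat text are assumptions chosen to show how structured identifiers get caught while contextual details slip through to human reviewers.

```python
import re

# A minimal, hypothetical sketch of a pattern-based PII filter, not Meta's or
# its vendors' actual tooling. It catches well-structured identifiers but
# misses anything that needs context to recognize.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "phone": re.compile(r"\+?\d[\d\s\-().]{8,}\d"),
}

def redact(text: str) -> str:
    """Replace matches of each known pattern with a placeholder tag."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label.upper()} REDACTED]", text)
    return text

# Hypothetical chat text for illustration only.
chat = ("Hi, I'm Priya Sharma from Pune. Reach me at priya.sharma@example.com "
        "or on five five five, one two three four.")

print(redact(chat))
# The email address is tagged, but the spelled-out phone number, the full name,
# and the city pass through untouched -- which is why human reviewers still
# end up seeing identifying details.
```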
