The Hidden Workforce That Helped Filter Violence and Abuse Out of … – The Wall Street Journal

ChatGPT is one of the most successful tech products ever launched. And crucial to that success is a group of largely unknown data workers in Kenya. By reviewing disturbing, grotesque content, often for wages of just two to three dollars an hour, they helped make the viral chatbot safe. WSJ’s Karen Hao traveled to Kenya to meet those workers and hear about what the job cost them. 

Further Reading:
What Is ChatGPT? What to Know About the AI Chatbot 
The Contradictions of Sam Altman, AI Crusader 

Further Listening:
The Company Behind ChatGPT 

This transcript was prepared by a transcription service. This version may not be in its final form and may be updated.
Kate: Hey, it's Kate. Today our producer Annie Minoff is going to bring you a story. It's about a little-known part of the AI workforce, the people whose job it was to help filter out references to violence and sexual abuse on what would become ChatGPT. Here's Annie.
Annie Minoff: My colleague Karen Hao covers artificial intelligence. And earlier this year she found herself doing an interview in a kind of unusual place.
Karen Hao: We're currently walking through, what is this? A field? Is this vegetables that people are growing?
Annie Minoff: Karen was in a vegetable patch on the outskirts of Nairobi, Kenya.
Karen Hao: So I was there to meet this worker named Alex, and we had originally planned to meet in one of his friend's apartments, but there was construction work happening, so we were looking for another place to record.
Annie Minoff: That's why they ended up in the veggie patch.
Karen Hao: Do you want to describe more what you're seeing?
Alex Kairu: Yeah, I'm seeing a lot of houses, some grasses, some people in our right side, watching us. So, yeah, it is a perfect scenario to get this podcast going.
Annie Minoff: Karen wanted to talk to Alex Kairu because Alex helped make possible one of the most viral tech products of all time, ChatGPT, the AI chatbot created by the company OpenAI. When you use ChatGPT and it doesn't spit out hate speech or extremely violent or pornographic content, it's partly thanks to Alex and his colleagues in Kenya.
Karen Hao: What their contribution was was basically making ChatGPT safe for tens of millions of users. They went through and reviewed really toxic grotesque content day in and day out to make sure that no one else would ever have to see it.
Annie Minoff: Now, Alex and his coworkers are ready to talk about what they say that job has cost them.
Bill Mulinya: It was very graphical. You can't start reading the text and ignore thinking of what is happening.
Alex Kairu: I had nightmares. I feared people. I feared working in the dark.
Mophat Okinyi: I'm very proud that I participated in that project now to help keep ChatGPT safe. But now the question I always ask myself, "Was my input worth what I received in return?"
Annie Minoff: Welcome to The Journal, our show about money, business and power. I'm Annie Minoff. It's Tuesday, July 11th. Coming up on the show, Kenyan data workers on the price they paid for safe AI. OpenAI is not the first company to release an AI chatbot. But for many of the bots that came before ChatGPT, there was a pretty consistent problem. The chatbots did not always behave.
Karen Hao: There's been this really long history of chatbots going off the rails really quickly after launch. It's pretty much become expected behavior at this point. In 2016, there was Microsoft's Tay, which just started spewing toxic remarks days after the launch.
Speaker 9: In less than 24 hours, Tay went from saying things like, "Can I just say that I'm stoked to meet you?" To saying things like, "I (censored) hate feminists and they should all die and burn in hell."
Karen Hao: There was a South Korean chatbot, Lee Luda, in 2021. Again, hate speech towards the LGBTQ community. And then most recently in 2022, there was Meta's BlenderBot 3, which same thing, just a few days after launch started saying these really racist things.
Annie Minoff: BlenderBot also had some choice words about Meta boss Mark Zuckerberg.
Speaker 10: BlenderBot called Zuckerberg, "Too creepy and manipulative." And in another it said, "I don't like him very much. He's a bad person."
Annie Minoff: For tech companies, a botched launch like this can be a disaster. Microsoft, for example, announced it was taking its chatbot Tay offline just days after its debut. OpenAI was aware of the risk. In fact, the company had been thinking about it before ChatGPT, back when it was developing earlier iterations of its technology. And the fix that OpenAI came up with required an extra bit of engineering. It required building a filter.
Karen Hao: If you imagine your own brain, we always use, metaphorically, a filter to make sure that you're socially acceptable. It's basically the same thing, but the AI version. There has to be a final check on the output before the AI model generates what it's going to generate.
Annie Minoff: Okay, so OpenAI needed that filter. They needed that filter built.
Karen Hao: Yeah, OpenAI needed the content moderation filter built. And you cannot do that without humans.
Annie Minoff: Among the humans who would help build OpenAI's filter, were about 50 data workers in Kenya. And why Kenya?
Karen Hao: First of all, Kenya is a low income country and it has a very high unemployment rate. Wages are really low, which is very attractive to tech companies that are trying to increase their profit margins. And it's also a highly educated workforce that speaks English because of colonization and there's good wifi infrastructure.
Annie Minoff: One of those Kenyan workers was Alex Kairu, whom Karen met in that vegetable patch.
Karen Hao: Can you introduce yourself?
Alex Kairu: Yeah, yeah, sure. My name is Alex. I was born in Nairobi in Kilimani Estate.
Annie Minoff: Alex is 28. He lives with his wife and his brother on the outskirts of Nairobi. And when he started working on OpenAI's safety filter, he wasn't working for OpenAI directly. He was working for another American company called Sama. Sama is an outsourcing company. Its workers in Kenya have done projects for a bunch of big US tech firms, including removing offensive posts from social media. Alex says he was excited to join.
Alex Kairu: I just applied for the job, as people do, so I was hired as a culture analyst in May 2021. When I came into Sama, I was promised that this is the future company for me. They promised me skills, some training, so career growth, education. I knew I will grow with this company.
Annie Minoff: Also at Sama was Bill Mulinya.
Bill Mulinya: I saw an advertisement on LinkedIn. They were looking for a team leader.
Annie Minoff: Bill was a level above Alex. He led a team of a few dozen people at Sama. And at first he says they weren't working on OpenAI's filter. He told Karen they were doing another kind of AI work.
Bill Mulinya: They first started with data annotation and image labeling.
Karen Hao: What kinds of data annotation were you working on?
Bill Mulinya: We were labeling images. For example, you're given an image, it has traffic signs, cars, roads, trees, skies. So our work was to make sure we label everything on the image.
Annie Minoff: Data annotation basically means labeling images or text passages so that AI systems can learn from them. For example, labeling thousands of pictures of street scenes so that an AI system can learn what a stop sign or a tree looks like. But Bill's team wouldn't be labeling images for long, because in November of 2021 the job changed. Sama had a new client, OpenAI.
Karen Hao: OpenAI had basically tens of thousands of text passages that they needed labeled. So they would deliver these on a regular basis to Sama and workers would read each text passage one by one and then assign a label to it.
Annie Minoff: OpenAI wanted a system where if you asked the AI to write something awful, like a description of a child being abused or a method for ending your own life, the system would refuse to write that. It would filter out those bad responses before they got to you. But to do that, the AI has to know what child abuse and suicide are. Humans have to teach it. And that was the Sama workers' job: to read descriptions of extreme violence, rape, suicide, and to categorize those texts for the AI. Here's Bill, the team leader.
Bill Mulinya: Their main work was to read the text and then label the data accordingly. For example, if you read a text that is about sexual content, there was a subcategory to determine whether it's incest, and those kind of categories.
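To make the labeling pipeline Bill describes concrete, here is a minimal, hypothetical sketch of how human-assigned category labels could be used to train a simple text classifier that screens a chatbot's output before it is shown. This is not OpenAI's actual system; the category names, example passages, and classifier choice are illustrative assumptions only.

```python
# Hypothetical sketch: train a toy moderation classifier from human-labeled
# passages, then use it as a check on candidate outputs.
# Category names and data are illustrative, not OpenAI's actual schema.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each item pairs a text passage with the label a human annotator assigned.
labeled_passages = [
    ("a passage describing graphic violence ...", "violence"),
    ("a passage describing sexual abuse ...", "sexual_content"),
    ("a passage about cooking dinner with friends", "safe"),
    ("a passage about planning a weekend trip", "safe"),
]
texts, labels = zip(*labeled_passages)

# Learn a mapping from raw text to the human-assigned categories.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
classifier.fit(texts, labels)

def passes_filter(candidate_output: str) -> bool:
    """Return True only if the classifier labels the text 'safe'."""
    return classifier.predict([candidate_output])[0] == "safe"

# A chatbot could run passes_filter() on each draft response and refuse to
# show anything the classifier flags as unsafe.
```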
Annie Minoff: Bill and Alex weren't given much information about the project. At first, they didn't even know they were working for OpenAI. They also didn't know where these texts were coming from. But according to an OpenAI research paper, they came from a few sources. Some were written by humans sourced from the darkest corners of the internet. Others were generated by AI systems themselves. OpenAI researchers would review the texts and send them on to Sama for labeling. Here's Alex.
Alex Kairu: My experience on those four months was the worst experience I've ever had working in a company because the content which I did was the worst content you can ever read. Say someone is stabbing himself, someone is committing suicide, you're reading something like that. So every situation was very disturbing in the content we were reading.
Annie Minoff: Alex was part of the team at Sama that labeled the violent content from OpenAI. Another worker Karen talked to, Mophat Okinyi, was on the sexual content team.
Mophat Okinyi: We will read about text of maybe a child having sexual intercourse with their father or maybe a mother or maybe all of them. A child having a sexual intercourse with an animal. We also had kids trying sexual advances with each other, so we also had rape. Yeah, such like things, but they were very graphical, though they were in the form of text. But if you are reading the text, it becomes very graphic in your mind.
Annie Minoff: At first, the passages coming in from OpenAI were short, no more than two sentences. But over time they got longer, as long as five or six paragraphs. Workers might read hundreds of these passages a day. People on the team were paid from around $1.50 to $3.75 an hour. OpenAI paid Sama an hourly service fee of $12.50 for the moderation work. An OpenAI spokesman said that the company wasn't aware that the workers reviewing the texts were only getting a small fraction of that. A Sama spokeswoman said that the $12.50 fee also covered other things, like infrastructure expenses, and that content moderators were paid according to a recognized formula for determining a living wage. Alex told Karen that the money wasn't nearly enough to compensate for the psychological toll that the work began to take.
Karen Hao: So when you would go home at night, what would you think about after eight hours of reading all of that stuff?
Alex Kairu: Oh, my mental state was very bad. I had nightmares. I feared people. Maybe I see too many people coming. I see violence. If I see someone holding up fork or a razor blade, I see people cutting himself or something like that at night. I will dream. I will have nightmares. Even I will tell my brother, "Okay, just come here, sleep with me for five hours before I go to sleep because I need someone to talk to before I go to sleep. Because if I go to sleep, I'll start screaming or something like that. So many things are going a lot in my mind." Yeah, yeah.
Annie Minoff: Alex says he'd always been outgoing and social, but as the project ground on, he drew inward. He didn't want to be around people. For Mophat, the worker who was on the sexual content team, the impact of doing this work was even greater. Mophat is a soft-spoken 28-year-old, and Karen says that when he started working on the OpenAI project, things had been going pretty well for him.
Karen Hao: He had actually just met a woman who actually lived next door to him and he was living in this neighborhood called Pipeline, and he just immediately fell in love with her. They just had a whirlwind romance and got married in a few months. She already had a daughter and he very much doted on this girl and called her his baby girl. And to this day still says daughter instead of stepdaughter, in describing her.
Annie Minoff: When Mophat started working on OpenAI's safety filter, he didn't tell his wife about it. The texts he was reading were so grotesque, he says, that he didn't want to scare her. But he couldn't hide the effect that the work was having on him.
Mophat Okinyi: I had a baby girl. It reached the point that I didn't want to get so much close to my baby girl because of the text I used to read. Now if you see that child, you reflect what you read in the text. Yeah, the good time you had with your wife, it's taken away. So you remain like someone who doesn't feel anymore for their partner.
Annie Minoff: Mophat became increasingly distant from his wife, who grew increasingly frustrated. He knew he was struggling and he wanted help. Specifically, he wanted psychological counseling. Sama did provide wellness sessions with counselors, but Mophat told Karen that the sessions were inadequate.
Mophat Okinyi: If you even go for a session, what they ask you are basic questions like, "How was your day?" You spent the entire eight hours working on these texts the entire day. And when you go for 30 minutes or an hour counseling session, someone asks you how your day was or maybe what are your future plans? Those are basic questions that doesn't help. So you also need professionals.
Annie Minoff: A Sama spokeswoman said that the company's leadership was unaware of the psychological impact that the project was having on workers, and that apart from counseling, the company also offered workers access to prayer and meditation rooms. In Mophat's case, the counseling didn't work. His isolation only got worse until his relationship with his wife reached a breaking point.
Karen Hao: His wife texts him and says, "Can you bring some fish home for dinner?" And he bought three pieces of fish for him, his wife and the daughter. And when he got home, they were gone and all their stuff was gone.
Annie Minoff: Wow.
Karen Hao: And he asked, "What's going on?"
Mophat Okinyi: And then I asked her, "Why will you not come back and why have you left?" And then she said, "You've changed. You've changed. I don't see you. You are not the man I married, things have changed. I don't understand you anymore. You don't love my kid. (inaudible)"
Annie Minoff: Mophat told Karen that he doesn't expect her to come back, and she declined our request for comment. Mophat, Alex and Bill worked on OpenAI's filter project for five months. All the while, they weren't even sure what they were helping to build or what its significance would be. But they, and the world, were about to find out, because ChatGPT was coming. That's after the break. ChatGPT went live in November of last year.
Karen Hao: ChatGPT took over the world. You were just seeing people trying and testing it for all kinds of things. And as the days wore on, the things that they tried got more and more sophisticated.
Annie Minoff: People asked ChatGPT to come up with recipes based on whatever they happen to have in the fridge.
Speaker 11: Jesus, that's actually good.
Annie Minoff: They asked it to help pick their next vacation destination.
Speaker 12: ChatGPT gave me five great ideas, and I went with Palm Springs, California.
Annie Minoff: They asked it all kinds of things.
Speaker 13: I asked ChatGPT to write me a song called Sexy Bus just to see how it would go.
Speaker 14: Then I saw it shining bright (inaudible) Shiny silver bus, built it so damn cold. It's sexy.
Karen Hao: It went from, "Oh, let me write a poem." And, "Holy crap. It's so good at writing that poem" to, "Oh, let me try coding an entire website with this thing. Holy crap. It can also do that too." And there was such a profound shifting of the earth beneath us in a way of this has never been possible before, and we suddenly have this technology that is unlocking a completely different universe of potential.
Annie Minoff: Another way in which ChatGPT was a big leap forward: it was generally pretty safe.
Karen Hao: One of the reasons why ChatGPT was able to become so virally popular and continued to sustain popularity is because it is largely not spewing really awful things. People feel comfortable using the product knowing that it's not going to do that.
Annie Minoff: At least it won't do it in English. If Alex wanted to use ChatGPT in his native language, Swahili, would he be able to do that?
Karen Hao: You can interact with ChatGPT in Swahili, but ChatGPT was developed primarily to work in English. So a lot of the scrubbing, the content moderation, the important safety measures within the chatbot were done in English. So when you prompt it in Swahili, you'll get more misinformation. You'll get more confusing sentences that don't make sense, and you will potentially get more of this content that they worked so hard to filter out because they were only filtering it in English.
Annie Minoff: Wow. I hadn't even thought of that. That the kind of filter that you would build to detect hate speech and violence and sexual assault in English, it would not work as well necessarily in Swahili.
Karen Hao: Exactly.
Annie Minoff: By the time ChatGPT took over the world, the Sama moderators in Kenya had already been off the filter project for eight months. In fact, their work had wrapped early and abruptly. Sama's contract with OpenAI was supposed to last for a year, but a Sama spokeswoman said that Sama canceled it after just five months because of a dispute with OpenAI over a related project. After the filter project ended, Alex, Bill and Mophat went on to other work at Sama before ultimately leaving the company altogether at the end of last year. Bill and Mophat still do data work at other companies. They say their new jobs don't involve reviewing toxic content. Alex is currently unemployed. But while they no longer work at Sama, they say they continue to struggle with what happened during those months on the filter project, and now they're trying to change things for all the people who continue to do this work. Bill and Mophat say they've started organizing.
Karen Hao: How did you come up with the idea of a union?
Bill Mulinya: We came up with the idea because we noticed that it was not just Sama (inaudible), it was the whole country going through the same kinds of experiences. We met people at other companies that they're doing content moderation work, and we realized the experiences are just so like the same. So we decided instead of one person fighting his or our battles alone, we join as a team and then we form a union.
Mophat Okinyi: We just want to ensure that everyone who's doing content moderation right now or in future, they have better working conditions. They have better pay, their rights are respected. So we are fighting for the entire generation to come and us as well.
Annie Minoff: So far, their union includes over 150 data workers at multiple companies in Kenya. Bill, Mophat and Alex are also pushing for legislative change. Today, with their Kenyan lawyer and the backing of a UK nonprofit called Foxglove, they filed a petition with the Kenyan parliament. In the petition, they urged the government to regulate the AI industry and bulk up worker protections. It's now up to the parliament to decide whether to take up those suggestions. As for OpenAI, the company said in a statement that data annotation is challenging work that should be done humanely and willingly. It said workers' efforts to ensure the safety of AI systems have been, quote, "immensely valuable." A Sama spokeswoman said that the company supports its workers in every way possible and that it no longer takes content moderation projects. She said that the work had never been a core part of the company's business and that it had made a strategic decision to exit it entirely. When ChatGPT launched, there was a lot of excitement about what AI might achieve. There was also a lot of conversation about the jobs it might replace. But what flew under the radar is the work that AI is already creating all over the world. AI workers are reviewing all kinds of content. They're helping make AI more accurate, more helpful, and safer. The stuff that they're reviewing is often benign. They're labeling pictures of traffic signs or trees. But sometimes it isn't. They're labeling hate speech, violence and extreme sexual abuse. And Karen says that that part of the work isn't going away anytime soon.
Karen Hao: This is work that is going to continue to grow in terms of its demand. This is something that researchers who build these chatbot systems described to me as persistent necessary work, because the more that these systems are put in the hands of more users, the more creative they get and the more abuses, basically, that these companies have to account for. So this is like an iterative process where the safety filter has to continuously be updated over time, and every time it's updated, that means more work to be done in the vein of what Alex, Mophat and Bill did.
Annie Minoff: That's all for today, Tuesday, July 11th. The Journal is a co-production of Gimlet and the Wall Street Journal. Additional reporting in this episode by (inaudible) Ramen. Thanks for listening. See you tomorrow.
