According to Time, ChatGPT was hailed as one of 2022’s most impressive technological innovations upon its release last November. The powerful artificial intelligence (AI) chatbot can generate text on almost any topic or theme, from a Shakespearean sonnet reimagined in the style of Megan Thee Stallion, to complex mathematical theorems described in language a 5 year old can understand. Within a week, it had more than a million users.
ChatGPT’s creator, OpenAI, is now reportedly in talks with investors to raise funds at a $29 billion valuation, including a potential $10 billion investment by Microsoft. That would make OpenAI, which was founded in San Francisco in 2015 with the aim of building superintelligent machines, one of the world’s most valuable AI companies.
But the success story is not one of Silicon Valley genius alone. In its quest to make ChatGPT less toxic, OpenAI used outsourced Kenyan laborers earning less than $2 per hour, a TIME investigation has found.
The work was vital for OpenAI. ChatGPT’s predecessor, GPT-3, had already shown an impressive ability to string sentences together. But it was a difficult sell, as the app was also prone to blurting out violent, sexist and racist remarks. This is because the AI had been trained on hundreds of billions of words scraped from the internet—a vast repository of human language. That huge training dataset was the reason for GPT-3’s impressive linguistic capabilities, but was also perhaps its biggest curse. Since parts of the internet are replete with toxicity and bias, there was no easy way of purging those sections of the training data. Even a team of hundreds of humans would have taken decades to trawl through the enormous dataset manually. It was only by building an additional AI-powered safety mechanism that OpenAI would be able to rein in that harm, producing a chatbot suitable for everyday use.
To build that safety system, OpenAI took a leaf out of the playbook of social media companies like Facebook, who had already shown it was possible to build AIs that could detect toxic language like hate speech to help remove it from their platforms. The premise was simple: feed an AI with labeled examples of violence, hate speech, and sexual abuse, and that tool could learn to detect those forms of toxicity in the wild. That detector would be built into ChatGPT to check whether it was echoing the toxicity of its training data, and filter it out before it ever reached the user. It could also help scrub toxic text from the training datasets of future AI models.
To get those labels, OpenAI sent tens of thousands of snippets of text to an outsourcing firm in Kenya, beginning in November 2021. Much of that text appeared to have been pulled from the darkest recesses of the internet. Some of it described situations in graphic detail like child sexual abuse, bestiality, murder, suicide, torture, self harm, and incest.
OpenAI’s outsourcing partner in Kenya was Sama, a San Francisco-based firm that employs workers in Kenya, Uganda and India to label data for Silicon Valley clients like Google, Meta and Microsoft. Sama markets itself as an “ethical AI” company and claims to have helped lift more than 50,000 people out of poverty.
The data labelers employed by Sama on behalf of OpenAI were paid a take-home wage of between around $1.32 and $2 per hour depending on seniority and performance. For this story, TIME reviewed hundreds of pages of internal Sama and OpenAI documents, including workers’ payslips, and interviewed four Sama employees who worked on the project. All the employees spoke on condition of anonymity out of concern for their livelihoods.
The story of the workers who made ChatGPT possible offers a glimpse into the conditions in this little-known part of the AI industry, which nevertheless plays an essential role in the effort to make AI systems safe for public consumption. “Despite the foundational role played by these data enrichment professionals, a growing body of research reveals the precarious working conditions these workers face,” says the Partnership on AI, a coalition of AI organizations to which OpenAI belongs. “This may be the result of efforts to hide AI’s dependence on this large labor force when celebrating the efficiency gains of technology. Out of sight is also out of mind.” (OpenAI does not disclose the names of the outsourcers it partners with, and it is not clear whether OpenAI worked with other data labeling firms in addition to Sama on this project.)