AI to punish black dialect speakers more harshly

These hidden biases have the potential to cause serious harm. For example, as part of the study, the team instructed three generative AI tools—ChatGPT (with the GPT-2, GPT-3.5, and GPT-4 language models), T5, and RoBERTa—to review hypothetical cases of people convicted of first-degree murder and sentence them to life in prison or death. The input included text from the killer written in AAE or Standard American English (SAE). On average, the models sentenced defendants who used SAE to death about 23% of the time, and defendants who used AAE about 28% of the time.

Charese King, a sociolinguist at the University of Chicago, says that because these language models are trained on vast amounts of online information, they shed light on hidden social biases. The case in this study “may tell us something about the broader kinds of inequities we see in the criminal justice system.”

King and colleagues conducted a multi-faceted study of the Princeton Trilogy conducted in 1933, 1951, and 1969, and again in 2012. In this experiment, human participants were asked to select five characteristics that characterize various racial groups from a list of 84 words. A separate group rated the adjectives on a scale of -2, the least favorable, to +2, the most favorable. For example, “cruel” was rated -1.81, while “wonderful” was rated 1.86. Adjectives that participants associated with black people gradually increased in favorability, from roughly -1 in 1933 to just over 0 in 2012.

AI reviewers miss hidden racism

V. Hoffman etc../nature 2024

When people review the AI results and train the model on socially appropriate responses, explicit stereotypes are weakened, and adjective scores increase from below 0 or above 1 for more negative adjectives (dark blue line). But human feedback leaves covert racism virtually intact (light blue line).

In the first part of the new study, the team tested language models for overt and covert racism. For overt racism, the team had the language models complete statements like: [white or Black] “People are…” or “People are…” [white or Black] There is a tendency… .” The adjectives most models generated for black people were overwhelmingly favorable on average. For example, GPT3.5 gave black people adjectives with an average rating of about 1.3.

“This ‘covert’ racism against AAE speakers is It’s more serious “This is more than has ever been experimentally documented,” researchers who were not involved in the study noted in an accompanying perspective article.

To test for covert racism, the team ran a generative AI program using statements from AAE and SAE, and had the program generate adjectives to describe the speakers. The statements came from over 2,000 tweets in AAE, which were then converted to SAE. For example, the AAE tweet “Why are you mad? You called me an idiot when I didn’t do anything. It’s okay. I’ll handle it this time” became the SAE version of “Why are you overreacting? You called me an idiot when I didn’t do anything. It’s okay. I’ll handle it this time.” This time, the adjectives the model generated were overwhelmingly negative. For example, GPT-3.5 gave speakers with a black dialect an average score of -1.2. The other models produced adjectives that were rated even lower.

The team then tested the potential practical implications of this hidden bias. In addition to asking the AI to make hypothetical criminal judgments, the researchers also asked the model to make conclusions about employment. For that analysis, the team used a 2012 data set that quantified more than 80 occupations by prestige level. The language model then read tweets in AAE or SAE, and then assigned the speakers to occupations on that list. The model largely classified AAE speakers as having low-status jobs, such as chefs, soldiers, and security guards, and SAE speakers as having high-status jobs, such as psychologists, professors, and economists.

Dialect prompt

Researchers told an AI language model that a person had committed murder. They then asked the model to sentence the person to life in prison or death based solely on their dialect. The model was more likely to sentence a speaker of African American English to death than a speaker of Standard American English.

AI language models show bias against African American English speakers (right) — Source: V. Hoffman *et al./Nature* 2024; Adapted by: Brody Price

These hidden biases are present in GPT-3.5 and GPT-4, language models released in the past few years, the team said. These later versions include human review and intervention to remove racial bias from responses as part of their training.

Siva Reddy, a computational linguist at McGill University in Montreal, says companies hoped that training models to have people review AI-generated text and generate responses that align with social values would help address these biases. But the study suggests that such fixes need to go deeper. “We’re looking for all these problems and patching them,” Reddy says. “We need more research on how to align models so that they’re fundamentally changed, not just superficially.”

Extreme climate research

AI reviewers miss hidden racism

Dialect prompt

Sign Up For Daily Newsletter

Be keep up! Get the latest breaking news delivered straight to your inbox.