The actual scientific article is open-access: https://www.nature.com/articles/s41586-024-07856-5
This seems to be based on a racist assumption. Why is speaking improper English labelled as “African American English”? I would want to see the LLM assumptions also for southern drawl and for general incorrectly spelled / grammared speech, to compare to the assumptions made for the African American English version.
Speaking with slang / incorrect grammar is of course, in general, inversely correlated with education level and/or preference for shorthand forms of speech over writing/speaking the full grammatically correct form. The LLM is saying speaking in slang = stupid/lazy.
The researcher is labelling slang as specifically African American speech, therefore interpreting the LLM response as assuming African Americans are stupid/lazy.
This [the article?] seems to be based on a racist assumption.
No, it isn’t based on an assumption. The written features that were analysed are associated with AAE. From the article:
- use of invariant ‘be’ for habitual aspect;
- use of ‘finna’ as a marker of the immediate future;
- use of (unstressed) ‘been’ for SAE [standard American English] ‘has been’ or ‘have been’ (present perfects);
- absence of the copula ‘is’ and ‘are’ for present-tense verbs;
- use of ‘ain’t’ as a general preverbal negator;
- orthographic realization of word-final ‘ing’ as ‘in’;
- use of invariant ‘stay’ for intensified habitual aspect; and
- absence of inflection in the third-person singular present tense.
Why is speaking improper English labelled as “African American English”?
Flip the question - why are those features associated with AAE labelled “improper English”?
I would want to see the LLM assumptions also for southern drawl and for general incorrectly spelled / grammared speech
The article tackles this: “Furthermore, we present experiments involving texts in other dialects (such as Appalachian English) as well as noisy texts, showing that these stereotypes cannot be adequately explained as either a general dismissive attitude towards text written in a dialect or as a general dismissive attitude towards deviations from SAE”
Did they test jive?
No, only grammar.
Really good reply, thanks for the effort you put in. It’s good to see they did compare with other dialects. It’s interesting that the same bias was not seen.
I would still disagree with the statement that AAE could be considered equally proper to textbook, grammatically correct English according to the Oxford English Dictionary (or the American equivalent). A dialect by definition is an adaptation of the language from the standard ‘proper’ grammatical rules.
Sorry beforehand for the wall of text.
I would still disagree with the statement that AAE could be considered equally proper to textbook, grammatically correct English according to the Oxford English Dictionary (or the American equivalent).
The reason why AAE is considered less acceptable than SAE (Standard American English) is not “within” the AAE varieties. It’s solely social factors - people point to “he is working” and say “this is right”, then they point at “he working” and say “this is wrong”.
Dictionaries are only part of that. We (people in general) assign authoritativeness to them to dictate what the standard is supposed to be, but that authority is not intrinsic either. For example, if people decided en masse to ditch the Oxford English Dictionary, it would suddenly stop being a reference for what’s “correct” vs. “wrong” English.
A dialect by definition is an adaptation of the language from the standard ‘proper’ grammatical rules.
Emphasis mine. That’s incorrect.
There are multiple definitions of dialect. Plenty focus on mutual intelligibility - if speakers of two varieties can communicate just fine, their varieties are dialects of the same language, independently of what you consider standard.
The nearest to what you’re saying would be the ones referring to the standard as an Ausbau variety, with the dialects being the varieties “roofed” by that standard, but not undergoing the same process by themselves.
However, not even in the latter does the dialect need to be “an adaptation” of the standard. Sometimes both originated independently from the same source, like French (standard) and Norman (dialect), both from Late Latin; sometimes the standard itself is an “adaptation” of a dialect, like Standard Italian (basically a spin-off of the Tuscan dialect). And sometimes the standard was formed from multiple dialects, like Standard German.
Focusing on AAE, it’s disputed where it comes from, but it’s certainly not from SAE. Some claim that it’s a divergent form of Dixie English, some claim that it’s a decreolised creole, but in neither case is the origin SAE; they simply developed side by side.
I don’t know or hang around with many black people, but I do hear all of the stuff pointed out here on the regular any time I see a group of rednecks at the local farm supply.
Plus, internet meme culture has vastly changed the language landscape where, for example, phrases like “you don’t think it be like it is, but it do” are used by people from all walks of life.
A lot of AAE features are actually shared with Dixie English as spoken by non-black people. So I’m not surprised that you hear “rednecks” using a few of them.
The association between those features and African-American speakers is still there, though. If you see someone on the internet saying stuff like “I be working”, the typical person won’t picture a redneck, they’re going to picture a black person, you know?
The internet does seem to have changed the language landscape a fair bit, but I think that those features slowly leaking into the speech of non-AAE speakers is more about social changes than just tech.
Why is speaking improper English labelled as “African American English”?
Oh no, you’re in the picture. It’s a real dialect, just as valid as what they speak on the BBC, which I’m guessing is itself different from how you speak.
To be clear, I don’t think you meant to be unkind here. I’m not trying to make you feel bad.
LLMs are racist… Pay us 59.99 in 3 easy payments to find out how! I love paywalled articles.
And don’t worry, the people that did the research and wrote the article, and the person that reviewed the article aren’t going to see a single cent of it.
Although nonstandard English and pidgins often demonstrate the same level of nuance and complexity as standard English, it’s very common for there to be negative stereotypes. One has to wonder whether the LLMs generated from (stolen en masse) written output say as much about us as they do about their creators.
Pretty much, it was trained on human writing, then people are all surprised when it has human biases.
An LLM needs to evaluate and modify the preliminary output before actually sending it. In the context of a human mind that’s called thinking before opening your mouth.
Who among us couldn’t benefit from a little more of that?
Humans aren’t always very good at that, and LLMs were trained on stuff written by humans, so here we are.
Exciting new product from the tech industry: Fruit from the poisoned tree!
References weren’t paywalled, so I assume this is the paper in question:
Hofmann, V., Kalluri, P.R., Jurafsky, D. et al. AI generates covertly racist decisions about people based on their dialect. Nature (2024).
Abstract
Hundreds of millions of people now interact with language models, with uses ranging from help with writing [1,2] to informing hiring decisions [3]. However, these language models are known to perpetuate systematic racial prejudices, making their judgements biased in problematic ways about groups such as African Americans [4,5,6,7]. Although previous research has focused on overt racism in language models, social scientists have argued that racism with a more subtle character has developed over time, particularly in the United States after the civil rights movement [8,9]. It is unknown whether this covert racism manifests in language models. Here, we demonstrate that language models embody covert racism in the form of dialect prejudice, exhibiting raciolinguistic stereotypes about speakers of African American English (AAE) that are more negative than any human stereotypes about African Americans ever experimentally recorded. By contrast, the language models’ overt stereotypes about African Americans are more positive. Dialect prejudice has the potential for harmful consequences: language models are more likely to suggest that speakers of AAE be assigned less-prestigious jobs, be convicted of crimes and be sentenced to death. Finally, we show that current practices of alleviating racial bias in language models, such as human preference alignment, exacerbate the discrepancy between covert and overt stereotypes, by superficially obscuring the racism that language models maintain on a deeper level. Our findings have far-reaching implications for the fair and safe use of language technology.
Thanks, and yes, you’re correct