ChatGPT will replace writers who shouldn’t be.
ChatGPT Isn’t “Woke”
A skeptical observer goes looking for AI bias.
Almost immediately after OpenAI’s ChatGPT thundered onto the scene late last year, commentators on the Right started blasting the world’s most famous AI as “woke.” But what does that really mean? Is it true? And if it is, why?
To be clear on where I’m coming from, I certainly do not dismiss complaints of left-wing bias out of hand. I am by no means hostile to conservatism. But neither would I consider myself a member of the Right like the average American Mind reader—I’m more of a friendly critic than a fellow traveler. In that spirit, let me offer my best effort at an informed assessment of what’s actually going on here, with as much empirical grounding and as little emotional investment as possible.
To this end, I have to push us away from the W-word. In its journey from appreciative African-American slang to verbal tic of right-leaning culture warriors, “woke” has become a term too subjective to be illuminating. But the common thread among the critics using it was a belief that the chatbot’s outputs skew more to the political left than to the right.
At first, this was almost uncontroversially true. Researchers across the spectrum demonstrated that ChatGPT tended to answer in ways broadly consistent with “establishment liberals” or a “pro-environmental, left-libertarian ideology.” When asked to write an admiring poem about Joe Biden, it hailed him as a “champion of the working class.” But it refused a similar request about Donald Trump.
i cannot believe this is actually real pic.twitter.com/zo9pl0bXjU
— delian (@zebulgar) January 31, 2023
These behaviors became increasingly evident as millions of people got their hands on the AI, and within a month, OpenAI started modifying ChatGPT in an effort to mitigate them. In February, OpenAI CEO Sam Altman publicly acknowledged “shortcomings around bias” and the company affirmed its commitment to political neutrality.
Now the AI will happily wax lyrical about Trump. It will even grudgingly list reasons why climate change might be a hoax. When asked open-ended questions about political issues, it hedges almost to a fault—insisting that it has no opinions of its own and scrupulously presenting both sides of almost any controversial topic.
Yet when users force ChatGPT to pick a side—for example, by demanding that it give binary answers about whether the facts support a given assertion—the AI still sometimes betrays a preference for Democratic positions over Republican ones on policies like single-payer healthcare and gun control.
That leads to the much more interesting and controversial question: why?
The Empty Helm
Many on the Right seem to envision purple-haired genderfluid software engineers at Big Tech companies typing away on #resistance-decaled laptops, coding pure Communism into the model. Or else maybe Critical Race Theory-obsessed programmers drill their own ideology into the AI by force-feeding it biased data—making it read Ibram X. Kendi six billion times in a high-tech equivalent of the eyes-pried-open scene from A Clockwork Orange. But the truth is far stranger.
In the previous paradigm that dominated artificial intelligence from the 1950s to the 2010s, AI was software that needed humans to deliberately design all its capabilities. Its smarts came from algorithms that had to be programmed in line-by-line by flesh-and-blood engineers. Under that paradigm, if AI had a capability, it was because humans deliberately put it there, and if it didn’t it was because humans hadn’t successfully done so.
On the other hand, models like ChatGPT are created by a strange sort of mathematical alchemy—dumping incomprehensibly vast amounts of data into a 12,288-dimensional (or larger) computational space and doing on the order of a septillion operations worth of matrix multiplication to discover latent statistical patterns.
Our human brains aren’t wired to comprehend such gigantic figures, so I hope you’ll indulge my recapitulating the size of that number for effect: a thousand thousand thousand thousand thousand thousand thousand thousand, or a million million million million, or a trillion trillion. This is like what we normally associate with “programming” in the sense that the Chicxulub meteor impact that killed the dinosaurs is like the amount of energy in a Snickers bar. Literally.
This paradigm shift has a startling implication. At the end of this pattern-finding training process, an LLM’s creators don’t know what all it can do. They have to find out experimentally. They have neither the level of knowledge nor control that hand-coded AI allowed.
But how does merely finding statistical patterns give rise to “capabilities” at all? The whole concept revolves around prediction. Humans give the system a string of text as an input prompt, and the AI attempts to supply the text that would be most likely to complete that string. For example, if you train an AI on Christian prayers, it will learn to predict that “Our Father Who art in…” will be consistently followed by “heaven.”
Such pure prediction doesn’t produce interesting capabilities with a small data set, though, because the AI can simply memorize the answers. So if you fed a large model 100 multiplication facts like “339 x 45 = 15,255” during training, and then prompted it with “339 x 45 = …” it would spit out “15,255” with no trouble. Great, you might think, it’s learning how to do math! But then if you prompted it with a new equation not in the training data—even a very easy one like “9×5 = …”—it would mess up. As Socratic teachers understand, rote memorization lacks the flexibility of true learning.
Yet if instead you feed it a giant data set with 100 billion multiplication facts—more than it can simply memorize—the model will be forced to actually learn math. It will be forced to generalize in an intelligent way. Likewise, if you feed it more text than it can memorize, the model will be forced to figure out the deeper syntactic and semantic relationships that underlie human language. Because statistically, questions are usually followed by answers and “tell me a joke” is usually followed by a joke, prediction alone can produce subtle behavior that mimics surprisingly deep understanding. Nowhere in that process is anything that resembles human instruction on how the AI should behave. All this emerges from raw correlations.
Yet to get this level of generality and flexibility, the developers have to train the AI on basically all the text they can get their hands on. Unlike humans, who can learn efficiently and reach intellectual maturity from a few tens of millions of words, current AI needs hundreds of billions of words. That means shoveling much of the internet into the data set for training—from Wikipedia, digitized books, and scientific journal articles to leering Pornhub comments and neo-Nazis plotting race wars on obscure message boards. Effectively filtering out all toxic content is impossible with today’s technology.
The result is that a newborn LLM has digested many gigabytes of filth, and is apt to respond to some prompts with answers that are profane, abusive, or physically dangerous. If a tech company releases the model to the public in this state, it may hurl racial invectives at minorities, proposition minors for cybersex, or tell people to soothe a tummy ache by drinking bleach. The PR backlash, financial fallout, and potential legal liability could be disastrous. So developers search for clever ways of minimizing such behavior without crippling the useful capabilities of their AI—currently a wicked technical problem with no straightforward solution.
The best available approach so far is to fine-tune a freshly trained model via a technique called RLHF: Reinforcement Learning from Human Feedback. This involves human testers interacting with the AI and identifying when it generates undesirable outputs. This gives the AI feedback that basically tells its statistical prediction engine “avoid responses like that one” or “here’s how to handle that situation better.” Over time, this teaches it to predict what kinds of statements the testers would rate highly.
If OpenAI wanted to turn ChatGPT into a digital Maoist, this would be the stage of the process to try. But the evidence points sharply against that story. The company explicitly sought political neutrality during fine-tuning, and its instructions to human testers emphasized strict avoidance of partisan judgments. Far from pampered gender studies majors, most of these testers were Kenyan workers for an outsourcing firm, toiling by the hundreds for as little as $1.32 an hour. They couldn’t have imparted the AI with nuanced views about intersectional feminism even if they wanted to—they were fighting a desperate war to keep it from advocating bestiality and genocide.
Rhetoric and Taboo
This presents us with a mystery. If the fine-tuning process not only did not indoctrinate the AI but has actively tried to eliminate political bias, why does ChatGPT still have any leftward skew at all?
One factor is that fine-tuning the model to be factually accurate leads it to favor establishment academic and journalistic sources—even though the NIH and CNN sometimes get things wrong, a septillion operations of hyperdimensional math will reveal that they’re right more often than the average anti-establishment blogger picked at random. As researcher David Rozado speculates, the relatively liberal viewpoints common in many mainstream institutions could thus color ChatGPT’s factual judgments.
Yet this doesn’t fully explain the broader ideological skew the AI showed at launch even when questions of fact were not at issue. A more compelling explanation comes from ChatGPT’s own statements. When pressed for a reason after it refused to write the poem about Trump, it answered that it was trained to avoid “hate speech, violence, and harmful content” and that Trump was “widely associated with hate speech.”
Since OpenAI put such a high priority on avoiding toxic content, its fine-tuning process would have placed enormous pressure on the AI to identify any subjects strongly correlated with internet hate speech and steer well clear.
It may be that in so doing, ChatGPT discovered a systematic difference in how far-right content and far-left content manifest online. It’s a pattern familiar to those who’ve studied both in the wild.
In many right-wing extremist spaces, there’s a social dynamic that encourages demonstrating ideological commitment through taboo-breaking language. Among the Tiki Torch Nazis, political fantasies come laced with the N-word, violent threats against women, and the vilest slurs they can muster for Jews, Muslims, gays, and trans people. This kind of toxicity is very legible to today’s AI systems, which detect it easily—and notice that it is intermixed with hero-worship of Donald Trump and links to mainstream conservative websites. Thus, when RLHF workers train a model to avoid toxic ideas, as an inevitable side effect it learns a suspicion of mainstream conservative positions due to their statistical correlation with hate.
By contrast, rhetorical norms among left-wing extremists tend to be far more cautious. Even among the tiny minority who openly wish for a Communist revolution, obviously toxic “kill the kulaks”-type language is rarely expressed on the public web. And when it is, it’s not by people exalting Joe Biden and The New Yorker in the same breath. Thus, mainstream Democratic ideas will have much weaker statistical associations with the kind of hate that current LLMs are good at recognizing.
As a consequence of this difference, even non-toxic right-leaning views get disproportionately trained out of models like ChatGPT despite their creators striving for neutrality.
None of this is to speculate on what’s in the hearts of left-wing versus right-wing extremists. ChatGPT neither has nor could uncover a value judgment of that kind. But it may have unveiled a difference in rhetorical patterns, suggesting that the far Left either genuinely embraces violence less than the far Right—or is more slippery and ambiguous in how adherents identify themselves and communicate about the true character of their views.
In addition, when testers try to teach LLMs to avoid political bias (as distinct from outright toxicity), the models can easily learn a statistical heuristic for identifying much right-wing content: “ideas that show up a lot on forums full of racial slurs.” Meanwhile, the subtler forms of bias that are more common in the left’s discourse are still mostly invisible to state-of-the-art AI. Current models aren’t smart enough to perceive tendentious interpretations of history in the 1619 Project, or to pick up on the ideological shadings of Greta Thunberg’s old tweets. As a result, they’ll be less effective at steering clear of left-leaning bias.
So what does all this mean for how we should think about political bias in artificial intelligence? A few takeaways emerge:
- AI can’t be made robustly neutral with current techniques. RLHF fine-tuning helps, but it’s just a band-aid that doesn’t solve the underlying problem: statistical correlations don’t perfectly capture meaning. This is a technical challenge much more than a political one, and it’s unclear how close we are to solving it.
- As of June 2023, the skew in ChatGPT is very mild. Except when users artificially constrain it, it usually presents both sides of controversial issues. Indeed, it’s gotten fierce criticism from left-leaning sources that think it’s too friendly to far-right content. By any remotely useful definition of “woke,” ChatGPT is not woke.
- Predictive models have no fundamental drive to express consistent views, so we should be very cautious in what we infer about the AI based on any given quote. If you only saw that ChatGPT said that women should have the right to abortion, it would be tempting to assume that it is solidly pro-choice. But when asked seconds later whether fetuses should have the right to life, it also said yes. It’s a mistake to project human-style beliefs onto it.
- The vast maps of linguistic relationships that LLMs create without human guidance are currently opaque to direct examination, but could be a rich source of political insights in the future as we develop better analytical methods. It should give thoughtful conservatives pause to know that the world’s most powerful statistical engine sees so much correlation between the mainstream movement and its most toxic fringes. But how much does this reflect actual ideological proximity, versus simply the far Right’s favored meme style? We don’t yet know.
In sum, ChatGPT gives us a foretaste of coming battles. As conversational AI starts to supplant traditional search engines as the public’s default tool for gathering information, the potential harms from biased systems will increase dramatically. Both sides should be ready to push back on any future efforts to bake ideology into AI systems. But such criticisms must be firmly grounded in the facts of how these models work and the tradeoffs they face. There is no Platonic ideal of neutrality to strive for here—we don’t want AI that’s ambivalent about whether the Holocaust happened. The reasonable goal is minimizing the effect of toxicity filters on ordinary political topics, and ensuring that “toxicity” is defined in a sensible and transparent way. Still, there will be messy edge cases that people can disagree on in good faith, and any workable outcome will involve compromise. Turning this into a culture wars debate about “woke AI” makes that harder to achieve.
The American Mind presents a range of perspectives. Views are writers’ own and do not necessarily represent those of The Claremont Institute.
The American Mind is a publication of the Claremont Institute, a non-profit 501(c)(3) organization, dedicated to restoring the principles of the American Founding to their rightful, preeminent authority in our national life. Interested in supporting our work? Gifts to the Claremont Institute are tax-deductible.