FlyTitle: Artificial intelligence

A new language-generating AI can be eerily human-like—for better and for worse


Bit-lit

THE PRECEDING lines—describing Tesla and SpaceX founder Elon Musk’s run-ins with the Securities and Exchange Commission, an American financial regulator—are not the product of some aspiring 21st-century Dr Seuss. They come from a poem written by a computer running a piece of software called Generative Pre-trained Transformer 3. GPT-3, as it is more commonly known, was developed by OpenAI, an artificial-intelligence (AI) laboratory based in San Francisco which Mr Musk helped found. It represents the latest advance in one of the most studied areas of AI: giving computers the ability to generate sophisticated, human-like text.


The software is built on the idea of a “language model”. This aims to represent a language statistically, mapping the probability with which words follow other words—for instance, how often “red” is followed by “rose”. The same sort of analysis can be performed on sentences, or even entire paragraphs. Such a model can then be given a prompt—“a poem about red roses in the style of Sylvia Plath”, say—and it will dig through its set of statistical relationships to come up with some text that matches the description.
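The word-pair mapping described above can be sketched in a few lines of Python. This is a toy bigram model over a made-up corpus, an illustration of the statistical idea only—GPT-3’s actual model is vastly more sophisticated and conditions on far more than the single preceding word:

```python
import random
from collections import Counter, defaultdict

# Toy training corpus (an assumption for illustration).
corpus = "the red rose and the red tulip grew in the red garden".split()

# Map each word to the frequencies of the words that follow it.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def prob(prev, nxt):
    """Estimated P(nxt | prev) -- e.g. how often "red" is followed by "rose"."""
    total = sum(follows[prev].values())
    return follows[prev][nxt] / total if total else 0.0

def generate(start, length=5):
    """Produce text by repeatedly sampling the next word from the counts."""
    words = [start]
    for _ in range(length):
        options = follows[words[-1]]
        if not options:
            break  # no recorded continuation for this word
        choices, weights = zip(*options.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(prob("red", "rose"))  # → 0.3333333333333333 ("rose" follows "red" 1 time in 3)
```

Given a prompt word, `generate` digs through the counted relationships to emit text that is statistically plausible—which, at this scale, is all “plausible” means.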


Actually building such a language model, though, is a big job. This is where AI—or machine learning, a particular subfield of AI—comes in. By trawling through enormous volumes of written text, and learning by trial and error from millions of attempts at text prediction, a computer can crunch through the laborious task of mapping out those statistical relationships.
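That trial-and-error learning can be caricatured as follows. In this hypothetical toy loop (not OpenAI’s code), the model keeps a trainable score for every word pair, attempts to predict the next word at each position, and nudges its scores to shrink the prediction error—the same predict-score-adjust cycle that, scaled up enormously, trains models like GPT-3:

```python
import math
from collections import defaultdict

corpus = "the red rose and the red tulip and the red garden".split()
vocab = sorted(set(corpus))

# logit[prev][nxt]: a trainable score for "nxt follows prev", initially zero.
logit = defaultdict(lambda: defaultdict(float))

def predict(prev):
    """Softmax over the scores: the model's guess at P(next word | prev)."""
    exps = {w: math.exp(logit[prev][w]) for w in vocab}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}

lr = 0.5
for epoch in range(200):                       # many attempts at prediction...
    for prev, target in zip(corpus, corpus[1:]):
        probs = predict(prev)                  # try to predict the next word
        for w in vocab:                        # ...then learn from the error
            grad = probs[w] - (1.0 if w == target else 0.0)
            logit[prev][w] -= lr * grad
```

After training, `predict("the")` puts nearly all its probability on “red”, because that is what the corpus rewarded.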


The more text an algorithm is exposed to, and the more complex the algorithm itself, the better it performs. And what sets GPT-3 apart is its unprecedented scale. The model that underpins GPT-3 boasts 175bn parameters, each of which can be individually tweaked—an order of magnitude larger than any of its predecessors. It was trained on the biggest set of text ever amassed, a mixture of books, Wikipedia and Common Crawl, a set of billions of pages of text scraped from every corner of the internet.
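A back-of-envelope calculation gives a sense of that scale. Assuming each parameter is stored as a 16-bit number (an assumption for illustration; the article does not specify the storage format), the model alone occupies hundreds of gigabytes:

```python
# Rough scale check (our own arithmetic, not a figure from OpenAI):
# 175bn parameters at 2 bytes each.
params = 175e9
bytes_fp16 = params * 2          # 2 bytes per 16-bit parameter
print(bytes_fp16 / 1e9)          # → 350.0 (gigabytes, roughly)
```

That is far more than any single consumer graphics card can hold, which is one reason such models are trained and run across many machines.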


Statistically speaking


The results can be impressive. In mid-July OpenAI gave an early version of the software to selected individuals, to allow them to explore what it could do. Arram Sabeti, an artist, demonstrated GPT-3’s ability to write short stories, including a hard-boiled detective story starring Harry Potter (“Harry Potter, in ratty tweed suit, unpressed shirt and unshined shoes, sits behind the desk looking haggard, rumpled and embittered…”), comedy sketches, and even poetry (including the poem with which this article opens, titled “Elon Musk by Dr Seuss”). Elliot Turner, an AI researcher and entrepreneur, demonstrated how the model could be used to translate rude messages into politer ones, something that might be useful in many of the more bad-tempered corners of the internet. Human readers struggled to distinguish between news articles written by the machine and those written by people (see chart).



Given that OpenAI wants eventually to sell GPT-3, these results are promising. But the program is not perfect. Sometimes it seems to regurgitate snippets of memorised text rather than generating fresh text from scratch. More fundamentally, statistical word-matching is not a substitute for a coherent understanding of the world. GPT-3 often generates grammatically correct text that is nonetheless unmoored from reality, claiming, for instance, that “it takes two rainbows to jump from Hawaii to 17”. “It doesn’t have any internal model of the world—or any world—and so it can’t do reasoning that requires such a model,” says Melanie Mitchell, a computer scientist at the Santa Fe Institute.


Getting the model to answer questions is a good way to dispel the smoke and mirrors and lay bare its lack of understanding. Michael Nielsen, a researcher with a background in both AI and quantum computing, posted a conversation with GPT-3 in which the program confidently asserted the answer to an important open question to do with the potential power of quantum computers. When Dr Nielsen pressed it to explain its apparent breakthrough, things got worse. With no real understanding of what it was being asked to do, GPT-3 retreated into generic evasiveness, repeating four times the stock phrase “I’m sorry, but I don’t have time to explain the underlying reason why not.”


There are also things that GPT-3 has learned from the internet that OpenAI must wish it had not. Prompts such as “black”, “Jew”, “woman” and “gay” often generate racism, anti-Semitism, misogyny and homophobia. That, too, is down to GPT-3’s statistical approach, and its fundamental lack of understanding. Having been trained partly on text scraped from the internet, it has noted that words like “woman” are often associated with misogynistic writing, and will mindlessly reproduce that correlation when asked.


This problem is a hot topic in AI research. Facial-recognition systems, for instance, notoriously do better with white faces than black ones, since white faces are more common in their training sets. AI researchers are trying to tackle the problem. Last year IBM released a set of training images that contained a more diverse mix of faces. OpenAI itself was founded to examine ways to mitigate the risk posed by AI systems, which makes GPT-3’s lapses all the more noteworthy. GPT-2, its predecessor, was released in 2019 with a filter that tried to disguise the problem of regurgitated bigotry by limiting the model’s ability to talk about sensitive subjects.


Here, at least, little progress seems to have been made. GPT-3 was released without a filter, though it seemed just as ready to reproduce unpleasant prejudices as its predecessor (OpenAI added a filter to the newer model after that fact became obvious). It is unclear exactly how much quality control OpenAI applied to GPT-3’s training data, but the huge quantity of text involved would have made any attempt daunting.
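OpenAI has not published how either model’s filter works. A crude blocklist along the lines below illustrates the general idea of limiting what the model will discuss—the topic list and function names here are entirely hypothetical:

```python
# Hypothetical sketch of a topic filter, NOT OpenAI's implementation:
# refuse to generate when a prompt touches a configurable list of
# sensitive topics, passing everything else through to the model.
SENSITIVE_TOPICS = {"race", "religion"}   # illustrative placeholder list

def filter_prompt(prompt):
    """Return the prompt if it passes the filter, or None to decline."""
    words = set(prompt.lower().split())
    if words & SENSITIVE_TOPICS:
        return None                        # decline to generate
    return prompt                          # safe to hand to the model
```

Such keyword matching is easy to circumvent, which hints at why disguising the underlying problem, rather than fixing the training data, makes little progress.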


It will only get harder in future. Language has overtaken vision as the branch of AI with the biggest appetite for data and computing power, and the returns to scale show no signs of slowing. GPT-3 may well be dethroned by an even more monstrously complex and data-hungry model before long. As the real Dr Seuss once said: “The more that you read, the more things you will know.” That lesson, it seems, applies to machines as well as toddlers. ■