The cost of training machines is becoming a problem
Exact figures on how much this all costs are scarce. But a paper published in 2019 by researchers at the University of Massachusetts Amherst estimated that training one version of “Transformer”, another big language model, could cost as much as $3m. Jerome Pesenti, Facebook’s head of AI, says that one round of training for the biggest models can cost “millions of dollars” in electricity consumption.
Help from the cloud
Facebook, which turned a profit of $18.5bn in 2019, can afford those bills. Those less flush with cash are feeling the pinch. Andreessen Horowitz, an influential American venture-capital firm, has pointed out that many AI startups rent their processing power from cloud-computing firms like Amazon and Microsoft. The resulting bills—sometimes 25% of revenue or more—are one reason, it says, that AI startups may make for less attractive investments than old-style software companies. In March Dr Manning’s colleagues at Stanford, including Fei-Fei Li, an AI luminary, launched the National Research Cloud, a cloud-computing initiative to help American AI researchers keep up with spiralling bills.
The growing demand for computing power has fuelled a boom in chip design and specialised devices that can perform the calculations used in AI efficiently. The first wave of specialist chips were graphics processing units (GPUs), designed in the 1990s to boost video-game graphics. As luck would have it, GPUs are also fairly well-suited to the sort of mathematics found in AI.
Further specialisation is possible, and companies are piling in to provide it. In December, Intel, a giant chipmaker, bought Habana Labs, an Israeli firm, for $2bn. Graphcore, a British firm founded in 2016, was valued at $2bn in 2019. Incumbents such as Nvidia, the biggest GPU-maker, have reworked their designs to accommodate AI. Google has designed its own “tensor-processing unit” (TPU) chips in-house. Baidu, a Chinese tech giant, has done the same with its own “Kunlun” chips. Alfonso Marone at KPMG reckons the market for specialised AI chips is already worth around $10bn, and could reach $80bn by 2025.
“Computer architectures need to follow the structure of the data they’re processing,” says Nigel Toon, one of Graphcore’s co-founders. The most basic feature of AI workloads is that they are “embarrassingly parallel”, which means they can be cut into thousands of chunks which can all be worked on at the same time. Graphcore’s chips, for instance, have more than 1,200 individual number-crunching “cores”, and can be linked together to provide still more power. Cerebras, a Californian startup, has taken an extreme approach. Chips are usually made in batches, with dozens or hundreds etched onto standard silicon wafers 300mm in diameter. Each of Cerebras’s chips takes up an entire wafer by itself. That lets the firm cram 400,000 cores onto each.
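The "embarrassingly parallel" structure Mr Toon describes can be sketched in a few lines: the workload is cut into independent chunks, every chunk is dispatched at once, and no worker ever waits on another. This is a hypothetical toy example, not Graphcore's actual software stack:

```python
from concurrent.futures import ThreadPoolExecutor

def crunch(chunk):
    # Each chunk is processed independently: no worker needs to
    # wait on another, which is what "embarrassingly parallel" means.
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, n_workers=4):
    # Cut the workload into independent chunks...
    size = max(1, len(data) // n_workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # ...and hand them all out simultaneously.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(crunch, chunks))

print(parallel_sum_of_squares(list(range(1000))))  # 332833500
```

On a chip like Graphcore's, the "workers" are over a thousand hardware cores rather than software threads, but the principle of slicing the work into mutually independent pieces is the same.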
Other optimisations are important, too. Andrew Feldman, one of Cerebras’s founders, points out that AI models spend a lot of their time multiplying numbers by zero. Since those calculations always yield zero, each one is unnecessary, and Cerebras’s chips are designed to avoid performing them. Unlike many tasks, says Mr Toon at Graphcore, ultra-precise calculations are not needed in AI. That means chip designers can save energy by reducing the fidelity of the numbers their creations are juggling. (Exactly how fuzzy the calculations can get remains an open question.)
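The zero-skipping idea can be illustrated with a toy sparse dot product (an illustrative sketch, not Cerebras's actual design): any term with a zero factor contributes nothing to the result, so the multiplication can simply be skipped without changing the answer:

```python
def dense_dot(a, b):
    # Performs every multiplication, including those by zero.
    return sum(x * y for x, y in zip(a, b))

def sparse_dot(a, b):
    # Skips any pair where either factor is zero; the result is
    # identical, because those products would contribute nothing.
    total, skipped = 0, 0
    for x, y in zip(a, b):
        if x == 0 or y == 0:
            skipped += 1
            continue
        total += x * y
    return total, skipped

a = [0, 3, 0, 0, 5, 0, 2, 0]
b = [7, 2, 9, 0, 1, 4, 0, 8]
total, skipped = sparse_dot(a, b)
print(total, skipped)  # 11 6 -- same result as dense_dot(a, b), with 6 of 8 multiplications avoided
```

In hardware the skipping is done by circuitry rather than an `if` statement, but the saving is the same: the sparser the data, the more work disappears.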
All that can add up to big gains. Mr Toon reckons that Graphcore’s current chips are anywhere between ten and 50 times more efficient than GPUs. They have already found their way into specialised computers sold by Dell, as well as into Azure, Microsoft’s cloud-computing service. Cerebras has delivered equipment to two big American government laboratories.
Such innovations will be increasingly important, for the AI-fuelled explosion in demand for computer power comes just as Moore’s law is running out of steam. Shrinking chips is getting harder, and the benefits of doing so are not what they were. Last year Jensen Huang, Nvidia’s founder, opined bluntly that “Moore’s law isn’t possible any more”.
Quantum solutions and neuromantics
Other researchers are therefore looking at more exotic ideas. One is quantum computing, which uses the counter-intuitive properties of quantum mechanics to provide big speed-ups for some sorts of computation. One way to think about machine learning is as an optimisation problem, in which a computer is trying to make trade-offs between millions of variables to arrive at a solution that minimises errors. A quantum-computing technique called Grover’s algorithm offers big potential speed-ups, says Krysta Svore, who leads the Architectures and Computation Group at Microsoft Research.
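The speed-up Grover's algorithm offers can be stated precisely. Searching an unstructured space of $N$ candidates classically requires on the order of $N$ evaluations; Grover's algorithm needs only on the order of $\sqrt{N}$ queries (a standard textbook result, given here as background rather than anything specific to Microsoft's work):

```latex
\underbrace{O(N)}_{\text{classical search}}
\quad\longrightarrow\quad
\underbrace{\approx \tfrac{\pi}{4}\sqrt{N} \;=\; O(\sqrt{N})}_{\text{Grover's algorithm}}
```

For a search space of a million candidates, that is roughly a thousand quantum queries in place of a million classical ones, a quadratic rather than exponential gain, but a substantial one for the optimisation problems at the heart of machine learning.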
Another idea is to take inspiration from biology, which proves that current brute-force approaches are not the only way. Cerebras’s chips consume around 15kW when running flat-out, enough to power dozens of houses (an equivalent number of GPUs consumes many times more). A human brain, by contrast, uses about 20W of energy—about a thousandth as much—and is in many ways cleverer than its silicon counterpart. Firms such as Intel and IBM are therefore investigating “neuromorphic” chips, which contain components designed to mimic more closely the electrical behaviour of the neurons that make up biological brains.
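The energy gap the paragraph describes is easy to check with the article's own rough figures (approximations, not precise measurements):

```python
chip_watts = 15_000   # a Cerebras chip running flat-out, per the text
brain_watts = 20      # rough estimate for a human brain

ratio = chip_watts / brain_watts
print(ratio)  # 750.0 -- the brain uses roughly a thousandth of the power
```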
For now, though, all that is far off. Quantum computers are relatively well-understood in theory, but despite billions of dollars in funding from tech giants such as Google, Microsoft and IBM, actually building them remains an engineering challenge. Neuromorphic chips have been built with existing technologies, but their designers are hamstrung by the fact that neuroscientists still do not understand what exactly brains do, or how they do it.
That means that, for the foreseeable future, AI researchers will have to squeeze every drop of performance from existing computing technologies. Mr Toon is bullish, arguing that there are plenty of gains to be had from more specialised hardware and from tweaking existing software to run faster. To quantify the nascent field’s progress, he offers an analogy with video games: “We’re past Pong,” he says. “We’re maybe at Pac-Man by now.” All those without millions to spend will be hoping he is right. ■