The cost of training machines is becoming a problem
From the Technology Quarterly series "AI and its limits: Steeper than expected", part five

Exact figures on how much this all costs are scarce. But a paper published in 2019 by researchers at the University of Massachusetts Amherst estimated that training one version of “Transformer”, another big language model, could cost as much as $3m. Jerome Pesenti, Facebook’s head of AI, says that one round of training for the biggest models can cost “millions of dollars” in electricity consumption.

Help from the cloud

Facebook, which turned a profit of $18.5bn in 2019, can afford those bills. Those less flush with cash are feeling the pinch. Andreessen Horowitz, an influential American venture-capital firm, has pointed out that many AI startups rent their processing power from cloud-computing firms like Amazon and Microsoft. The resulting bills—sometimes 25% of revenue or more—are one reason, it says, that AI startups may make for less attractive investments than old-style software companies. In March Dr Manning’s colleagues at Stanford, including Fei-Fei Li, an AI luminary, launched the National Research Cloud, a cloud-computing initiative to help American AI researchers keep up with spiralling bills.

The growing demand for computing power has fuelled a boom in chip design and specialised devices that can perform the calculations used in AI efficiently. The first wave of specialist chips were graphics processing units (GPUs), designed in the 1990s to boost video-game graphics. As luck would have it, GPUs are also fairly well-suited to the sort of mathematics found in AI.
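
To see why, consider what that mathematics actually is: mostly multiplying large grids of numbers together, where every multiply-and-add is independent of the others. A minimal Python sketch of the core operation (the layer sizes are arbitrary, chosen only for illustration):

```python
# The workhorse of deep learning: multiplying a batch of inputs by a matrix
# of learned weights. Each of the ~0.5bn multiply-adds below is independent
# of the rest, which is why hardware built for graphics copes so well.
import numpy as np

rng = np.random.default_rng(0)
inputs = rng.standard_normal((256, 1024))    # a batch of 256 examples
weights = rng.standard_normal((1024, 2048))  # one layer's parameters

outputs = inputs @ weights                   # 256 * 1024 * 2048 ≈ 537m multiply-adds
print(outputs.shape)                         # (256, 2048)
```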

Further specialisation is possible, and companies are piling in to provide it. In December, Intel, a giant chipmaker, bought Habana Labs, an Israeli firm, for $2bn. Graphcore, a British firm founded in 2016, was valued at $2bn in 2019. Incumbents such as Nvidia, the biggest GPU-maker, have reworked their designs to accommodate AI. Google has designed its own “tensor-processing unit” (TPU) chips in-house. Baidu, a Chinese tech giant, has done the same with its own “Kunlun” chips. Alfonso Marone at KPMG reckons the market for specialised AI chips is already worth around $10bn, and could reach $80bn by 2025.

“Computer architectures need to follow the structure of the data they’re processing,” says Nigel Toon, one of Graphcore’s co-founders. The most basic feature of AI workloads is that they are “embarrassingly parallel”, which means they can be cut into thousands of chunks which can all be worked on at the same time. Graphcore’s chips, for instance, have more than 1,200 individual number-crunching “cores”, and can be linked together to provide still more power. Cerebras, a Californian startup, has taken an extreme approach. Chips are usually made in batches, with dozens or hundreds etched onto standard silicon wafers 300mm in diameter. Each of Cerebras’s chips takes up an entire wafer by itself. That lets the firm cram 400,000 cores onto each.
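
To make "embarrassingly parallel" concrete, here is a minimal Python sketch of the pattern: the same independent calculation applied to many chunks of data at once. The chunk size and the toy per-chunk function are invented for illustration; an AI chip would spread such chunks across its hundreds or thousands of cores rather than across CPU processes.

```python
# A minimal sketch of an "embarrassingly parallel" workload: the same
# independent calculation applied to many chunks of data at once.
from concurrent.futures import ProcessPoolExecutor

def score(chunk):
    # Stand-in for the per-chunk arithmetic one core of an AI chip would do.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunk_size = 10_000
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    # Because no chunk depends on any other, all 100 of them could be worked
    # on simultaneously, given enough cores.
    with ProcessPoolExecutor() as pool:
        partial_results = list(pool.map(score, chunks))

    print(sum(partial_results))
```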

Other optimisations are important, too. Andrew Feldman, one of Cerebras’s founders, points out that AI models spend a lot of their time multiplying numbers by zero. Since those calculations always yield zero, each one is unnecessary, and Cerebras’s chips are designed to avoid performing them. Unlike many tasks, says Mr Toon at Graphcore, ultra-precise calculations are not needed in AI. That means chip designers can save energy by reducing the fidelity of the numbers their creations are juggling. (Exactly how fuzzy the calculations can get remains an open question.)
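
A rough sketch of both tricks in plain NumPy, rather than anything resembling Cerebras's or Graphcore's actual hardware: most of the weights in the toy layer below are zero, so most of the multiplications could in principle be skipped, and recasting the numbers in 16-bit form halves the bits being shuffled around at little cost in accuracy.

```python
# Two energy-saving ideas in miniature: skip multiplications by zero, and use
# lower-precision numbers. (A toy NumPy layer, not any vendor's design.)
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal((512, 512)).astype(np.float32)
weights[rng.random(weights.shape) < 0.9] = 0.0   # make ~90% of the weights zero
activations = rng.standard_normal(512).astype(np.float32)

# Dense arithmetic multiplies every weight, zeros included.
dense_out = weights @ activations

# A sparsity-aware chip would skip the zero entries, i.e. ~90% of the work here.
print(f"non-zero weights: {np.count_nonzero(weights) / weights.size:.1%}")

# Halving the precision halves the bits moved and multiplied; the answers
# stay close enough for many AI workloads.
half_out = weights.astype(np.float16) @ activations.astype(np.float16)
print("largest float16 vs float32 discrepancy:",
      float(np.max(np.abs(dense_out - half_out.astype(np.float32)))))
```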

All that can add up to big gains. Mr Toon reckons that Graphcore’s current chips are anywhere between ten and 50 times more efficient than GPUs. They have already found their way into specialised computers sold by Dell, as well as into Azure, Microsoft’s cloud-computing service. Cerebras has delivered equipment to two big American government laboratories.

Such innovations will be increasingly important, for the AI-fuelled explosion in demand for computer power comes just as Moore’s law is running out of steam. Shrinking chips is getting harder, and the benefits of doing so are not what they were. Last year Jensen Huang, Nvidia’s founder, opined bluntly that “Moore’s law isn’t possible any more”.

Quantum solutions and neuromantics

Other researchers are therefore looking at more exotic ideas. One is quantum computing, which uses the counter-intuitive properties of quantum mechanics to provide big speed-ups for some sorts of computation. One way to think about machine learning is as an optimisation problem, in which a computer is trying to make trade-offs between millions of variables to arrive at a solution that minimises the overall error as far as possible. A quantum-computing technique called Grover’s algorithm offers big potential speed-ups, says Krysta Svore, who leads the Quantum Architectures and Computation Group at Microsoft Research.
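
The attraction lies in how the work grows with the size of the problem: blindly searching N possibilities takes on the order of N tries, whereas Grover's algorithm needs on the order of the square root of N quantum queries. The back-of-the-envelope comparison below shows only that textbook scaling; it is not a simulation of a quantum computer.

```python
# Textbook query counts for finding one marked item among N possibilities:
# classical brute force needs on the order of N checks, Grover's algorithm
# roughly (pi / 4) * sqrt(N) quantum queries.
import math

for n in (1_000_000, 1_000_000_000):
    classical = n // 2                                # expected checks, random guessing
    grover = math.ceil((math.pi / 4) * math.sqrt(n))  # Grover's query count
    print(f"N = {n:>13,}: ~{classical:,} classical checks vs ~{grover:,} Grover queries")
```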

Another idea is to take inspiration from biology, which proves that current brute-force approaches are not the only way. Cerebras’s chips consume around 15kW when running flat-out, enough to power dozens of houses (an equivalent number of GPUs consumes many times more). A human brain, by contrast, runs on about 20W of power (roughly a thousandth as much) and is in many ways cleverer than its silicon counterpart. Firms such as Intel and IBM are therefore investigating “neuromorphic” chips, which contain components designed to mimic more closely the electrical behaviour of the neurons that make up biological brains.
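
In round numbers, using the wattages quoted above, the gap looks like this:

```python
# Back-of-the-envelope comparison of the power figures quoted above.
chip_watts = 15_000   # a Cerebras chip running flat-out
brain_watts = 20      # a rough figure for a human brain

ratio = chip_watts / brain_watts
print(f"The chip draws about {ratio:.0f} times the power of a brain")
```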

For now, though, all that is far off. Quantum computers are relatively well-understood in theory, but despite billions of dollars in funding from tech giants such as Google, Microsoft and IBM, actually building them remains an engineering challenge. Neuromorphic chips have been built with existing technologies, but their designers are hamstrung by the fact that neuroscientists still do not understand what exactly brains do, or how they do it.

That means that, for the foreseeable future, AI researchers will have to squeeze every drop of performance from existing computing technologies. Mr Toon is bullish, arguing that there are plenty of gains to be had from more specialised hardware and from tweaking existing software to run faster. To quantify the nascent field’s progress, he offers an analogy with video games: “We’re past Pong,” he says. “We’re maybe at Pac-Man by now.” All those without millions to spend will be hoping he is right. ■
