Infrastructure

Spreading out

Should data-crunching be done at the centre or at the edge?

Special report "The data economy", part three

ONCE A YEAR the computing cloud touches down in Las Vegas. In early December tens of thousands of mostly male geeks descend on America’s gambling capital in hope not of winnings but of wisdom about Amazon Web Services (AWS), the world’s biggest cloud-computing provider. Last year they had the choice of more than 2,500 different sessions over a week at the shindig, which was called “Re:Invent”. The high point was the keynote featuring AWS’s latest offerings by Andy Jassy, the firm’s indefatigable boss, who paced the stage for nearly three hours.

But those who dare to walk the long city blocks of Las Vegas to the conference venues can connect to the cloud, and thus the mirror worlds, in another way. Push a button to request a green light at one of thousands of intersections and this will trigger software from SWIM.AI, a startup, to perform a series of calculations that may influence the traffic flow in the entire city. These intersections do not exist just in the physical realm, but live in the form of digital twins in a data centre. Each takes in information from its environment—not just button-pushing pedestrians, but every car crossing a loop in the road and every light change—and continually predicts what its traffic lights will do two minutes ahead of time. Ride-hailing firms such as Uber, among others, can then feed these predictions into their systems to optimise driving routes.
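
In code, such a twin might look roughly like the sketch below. Everything in it is invented for illustration: the class name, the event format and the crude prediction rule merely stand in for SWIM.AI's real streaming software.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class IntersectionTwin:
    """Toy digital twin of one road junction (illustrative only)."""
    intersection_id: str
    # Rolling window of the most recent events from the physical junction
    window: deque = field(default_factory=lambda: deque(maxlen=1000))

    def ingest(self, event: dict) -> None:
        # Events arrive from pedestrian buttons, the induction loop in the
        # road and the lights themselves, e.g. {"type": "loop_crossing", "ts": 0.0}
        self.window.append(event)

    def predict_light(self, seconds_ahead: int = 120) -> str:
        # Stand-in heuristic (seconds_ahead is unused by this toy rule);
        # a real twin would run a continuously trained model.
        recent_cars = sum(1 for e in self.window if e.get("type") == "loop_crossing")
        return "green" if recent_cars > 50 else "red"

# One twin per physical intersection, kept in a data centre
twins = {name: IntersectionTwin(name) for name in ("LV-0001", "LV-0002")}
twins["LV-0001"].ingest({"type": "loop_crossing", "ts": 0.0})
print(twins["LV-0001"].predict_light(120))
```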

AWS represents a centralised model where all the data are collected and crunched in a few places, namely big data centres. SWIM.AI, on the other hand, is an example of what is being called “edge computing”: the data are processed in real time as close as possible to where they are collected. It is between these two poles that the infrastructure of the data economy will stretch. It will be, to quote a metaphor first used by Brian Arthur of the Santa Fe Institute, very much like the root system of an aspen tree. For every tree above the ground, there are miles and miles of interconnected roots underground, which also connect to the roots of other trees. Similarly, for every warehouse-sized data centre, there will be an endless network of cables and connections, collecting data from every nook and cranny of the world.

To grasp how all this may work, consider the origin and journey of a typical bit and how both will change in the years to come. Today the bit is often still created by a human clicking on a website or tapping on a smartphone. Tomorrow it will more often than not be generated by machines, collectively called the “Internet of Things” (IOT): cranes, cars, washing machines, eyeglasses and so on. And these devices will not only serve as sensors, but act on the world in which they are embedded.

Ericsson, a maker of network gear, predicts that the number of IOT devices will reach 25bn by 2025, up from 11bn in 2019. Such an estimate may sound self-serving, but this explosion is the likely outcome of a big shift in how data are collected. Currently, many devices are tethered by cable. Increasingly, they will be connected wirelessly. 5G, the next generation of mobile technology, is designed to support 1m connections per square kilometre, meaning that in Manhattan alone there could be 60m connections. Ericsson estimates that mobile networks will carry 160 exabytes of data globally each month by 2025, four times the current amount.
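
The arithmetic behind those two figures is easy to check, taking Manhattan's land area to be roughly 59 square kilometres (an outside figure, not Ericsson's):

```python
# Back-of-the-envelope check of the figures above
manhattan_area_km2 = 59            # approximate land area of Manhattan
connections_per_km2 = 1_000_000    # 5G design target
print(f"{manhattan_area_km2 * connections_per_km2:,}")   # 59,000,000 -> roughly 60m connections

forecast_2025_eb_per_month = 160   # Ericsson's forecast for mobile data traffic
print(forecast_2025_eb_per_month / 4)                    # implies about 40 EB per month today
```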

The destination of your average bit is changing, too. Historically, most digital information stayed home, on the device where it was created. Now, more and more data flow into the big computing factories operated by AWS, but also by its main competitors, Microsoft Azure, Alibaba Cloud and Google Cloud. These are, in most cases, the only places so far with enough computing power to train algorithms that can, for instance, quickly detect credit-card fraud or predict when a machine needs a check-up, says Bill Vass, who runs AWS’s storage business—the world’s biggest. He declines to say how big, only that it is 14 times bigger than that of AWS’s closest competitor, which would be Azure (see chart).

What Mr Vass also prefers not to say is that AWS and other big cloud-computing providers are striving mightily to deepen this centralisation. AWS provides customers with free or cheap software that makes it easy to connect and manage IOT devices. It offers no fewer than 14 ways to get data into its cloud, including several services that do so via the internet, but also offline methods, such as lorries packed with digital storage that can hold up to 100 petabytes, used to ferry data around (one of which Mr Jassy welcomed on stage during his keynote speech in 2016).
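
The simplest of the online routes amounts to little more than copying files of device data into the cloud's object store. A minimal sketch using boto3, AWS's Python SDK, is below; the file, bucket and object names are made up, and the 14 services mentioned above mostly wrap far more elaborate pipelines than this.

```python
import boto3  # AWS's Python SDK

# Minimal illustration: copy a file of sensor readings into an S3 bucket.
# Credentials are assumed to be configured in the environment; every name
# below is hypothetical.
s3 = boto3.client("s3")
s3.upload_file(
    "sensor_readings.csv",            # local file
    "example-iot-landing-bucket",     # destination bucket
    "raw/sensor_readings.csv",        # object key
)
```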

The reason for this approach is no secret. Data attract more data, because different sets are most profitably mined together—a phenomenon known as “data gravity”. And once a firm’s important data are in the cloud, it will move more of its business applications to the computing skies, generating ever more revenue for cloud-computing providers. Cloud providers also offer an increasingly rich palette of services which allow customers to mine their data for insights.

Yet such centralisation comes with costs. One is the steep fees firms have to pay when they want to move data to other clouds. More important, concentrating data in big centres could also become more costly for the environment. Sending data to a central location consumes energy. And once there, the temptation is great to keep crunching them. According to OpenAI, a startup-cum-think-tank, the computing power used in cutting-edge AI projects started to explode in 2012. Before that it closely tracked Moore’s law, which holds that the processing power of chips doubles roughly every two years; since then, demand has doubled every 3.4 months.
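
The gap between those two doubling rates compounds quickly, as a back-of-the-envelope comparison over an arbitrary six-year stretch shows:

```python
# Compare the two doubling rates cited above over a six-year span
months = 6 * 12

moores_law_growth = 2 ** (months / 24)    # doubling roughly every two years
ai_compute_growth = 2 ** (months / 3.4)   # doubling every 3.4 months, per OpenAI

print(f"Moore's law: ~{moores_law_growth:.0f}x")     # ~8x
print(f"AI compute:  ~{ai_compute_growth:,.0f}x")    # roughly 2.4m x
```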

Happily, a counter-movement has already started—toward the computing “edge”, where data are generated. It is not just servers in big data centres that are getting more powerful, but also smaller local centres and connected devices themselves, thus allowing data to be analysed closer to the source. What is more, software now exists to move computing power around to where it works best, which can be on or near IOT devices.

Applications such as self-driving cars need very fast-reacting connections and cannot afford the risk of being disconnected, so computing needs to happen in nearby data centres or even in the car itself. And in some cases the data flows are simply too large to be sent to the cloud, as with the traffic lights in Las Vegas, which together generate 60 terabytes a day (a tenth of the amount Facebook collects in a day).
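
A rough calculation shows why hauling that stream to the cloud is unattractive (counting a terabyte as one trillion bytes):

```python
# Sustained bandwidth implied by the Las Vegas traffic-light figure above
bytes_per_day = 60e12              # 60 terabytes generated per day
seconds_per_day = 24 * 60 * 60

gbit_per_second = bytes_per_day * 8 / seconds_per_day / 1e9
print(f"{gbit_per_second:.1f} Gbit/s, around the clock")   # ~5.6 Gbit/s
```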

How far will the pendulum swing back? The answer depends on whom you ask. The edge is important, concedes Matt Wood, who is in charge of AI at AWS, but “at some point you need to aggregate your data together so that you can train your models”. Sam George, who leads Azure’s IOT business, expects computing to be equally spread between the cloud and its edge. And Simon Crosby, the chief technologist at SWIM.AI, while admitting that his firm’s approach “does not apply everywhere”, argues that too much data are generated at the edge to send to the cloud, and there will never be enough data scientists to help train all the models centrally.

Even so, this counter-movement may not go far enough. Given the incentives, big cloud providers will still be tempted to collect too much data and crunch them. One day soon, debates may rage over whether data generation should be taxed, if the world does not want to drown in the digital sea.■
