Should data-crunching be done at the centre or at the edge?
ONCE A YEAR the computing cloud touches down in Las Vegas. In early December tens of thousands of mostly male geeks descend on America’s gambling capital in hope not of winnings but of wisdom about Amazon Web Services (AWS), the world’s biggest cloud-computing provider. Last year they had the choice of more than 2,500 different sessions over a week at the shindig, which was called “Re:Invent”. The high point was the keynote by Andy Jassy, the firm’s indefatigable boss, who unveiled AWS’s latest offerings while pacing the stage for nearly three hours.
But those who dare to walk the long city blocks of Las Vegas to the conference venues can connect to the cloud, and thus the mirror worlds, in another way. Push a button to request a green light at one of thousands of intersections and this will trigger software from SWIM.AI, a startup, to perform a series of calculations that may influence the traffic flow in the entire city. These intersections do not exist just in the physical realm, but live in the form of digital twins in a data centre. Each takes in information from its environment—not just button-pushing pedestrians, but every car crossing a loop in the road and every light change—and continually predicts what its traffic lights will do two minutes ahead of time. Ride-hailing firms such as Uber, among others, can then feed these predictions into their systems to optimise driving routes.
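The digital-twin pattern described above, in which a cloud object mirrors a physical intersection, ingests its sensor events and maintains a rolling short-horizon prediction that downstream consumers such as ride-hailing firms can read, can be sketched minimally. All names and the trivial prediction rule below are hypothetical illustrations, not SWIM.AI's actual software:

```python
# Minimal sketch of a digital twin for an intersection: it mirrors a physical
# object, ingests its events, and exposes a prediction for downstream use.
# Class names, fields and the toy prediction rule are all hypothetical.

from collections import deque

class IntersectionTwin:
    def __init__(self, intersection_id: str, window: int = 100):
        self.id = intersection_id
        self.events = deque(maxlen=window)  # keep only recent sensor events

    def ingest(self, event: dict) -> None:
        """Record a sensor event: a button push, a car on a loop, a light change."""
        self.events.append(event)

    def predict_next_phase(self) -> str:
        """Toy stand-in for the real model: guess the upcoming light phase
        from the share of recent pedestrian requests."""
        if not self.events:
            return "unknown"
        pushes = sum(1 for e in self.events if e["type"] == "button_push")
        return "green_for_pedestrians" if pushes / len(self.events) > 0.5 else "green_for_cars"

twin = IntersectionTwin("las-vegas-042")
twin.ingest({"type": "button_push"})
twin.ingest({"type": "button_push"})
twin.ingest({"type": "car_on_loop"})
print(twin.predict_next_phase())  # pedestrian requests dominate recent events
```

The point of the pattern is that the twin, not the physical device, is what other systems query: Uber's routers never talk to a traffic light, only to its in-cloud mirror.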
AWS represents a centralised model where all the data are collected and crunched in a few places, namely big data centres. SWIM.AI, on the other hand, is an example of what is being called “edge computing”: the data are processed in real time as close as possible to where they are collected. It is between these two poles that the infrastructure of the data economy will stretch. It will be, to quote a metaphor first used by Brian Arthur of the Santa Fe Institute, very much like the root system of an aspen tree. For every tree above the ground, there are miles and miles of interconnected roots underground, which also connect to the roots of other trees. Similarly, for every warehouse-sized data centre, there will be an endless network of cables and connections, collecting data from every nook and cranny of the world.
To grasp how all this may work, consider the origin and journey of a typical bit and how both will change in the years to come. Today the bit is often still created by a human clicking on a website or tapping on a smartphone. Tomorrow it will more often than not be generated by machines, collectively called the “Internet of Things” (IOT): cranes, cars, washing machines, eyeglasses and so on. And these devices will not only serve as sensors, but act on the world in which they are embedded.
Ericsson, a maker of network gear, predicts that the number of IOT devices will reach 25bn by 2025, up from 11bn in 2019. Such an estimate may sound self-serving, but this explosion is the likely outcome of a big shift in how data are collected. Currently, many devices are tethered by cable. Increasingly, they will be connected wirelessly. 5G, the next generation of mobile technology, is designed to support 1m connections per square kilometre, meaning that in Manhattan alone there could be 60m connections. Ericsson estimates that mobile networks will carry 160 exabytes of data globally each month by 2025, four times the current amount.
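Those projections are easy to sanity-check. A back-of-the-envelope sketch, in which Manhattan's land area of roughly 59 square kilometres is the only figure assumed beyond what the paragraph states:

```python
# Back-of-the-envelope check of the 5G and mobile-traffic figures above.
# Assumption: Manhattan's land area is roughly 59 km^2 (not stated in the text).

connections_per_km2 = 1_000_000   # 5G design target per square kilometre
manhattan_km2 = 59                # assumed land area
connections = connections_per_km2 * manhattan_km2
print(f"Potential 5G connections in Manhattan: {connections:,}")  # ~59m, i.e. "60m"

monthly_2025_eb = 160             # Ericsson's 2025 estimate, exabytes per month
growth_factor = 4                 # "four times the current amount"
print(f"Implied current traffic: {monthly_2025_eb / growth_factor:.0f} EB/month")
```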
The destination of your average bit is changing, too. Historically, most digital information stayed home, on the device where it was created. Now, more and more data flow into the big computing factories operated by AWS, but also its main competitors, Microsoft Azure, Alibaba Cloud and Google Cloud. These are, in most cases, the only places so far with enough computing power to train algorithms that can, for instance, quickly detect credit-card fraud or predict when a machine needs a check-up, says Bill Vass, who runs AWS’s storage business—the world’s biggest. He declines to say how big, only that it is 14 times bigger than that of AWS’s closest competitor, which would be Azure (see chart).
What Mr Vass also prefers not to say is that AWS and other big cloud-computing providers are striving mightily to deepen this centralisation. AWS provides customers with free or cheap software that makes it easy to connect and manage IOT devices. It offers no fewer than 14 ways to get data into its cloud, including several services to do this via the internet, but also offline methods, such as lorries packed with digital storage which can hold up to 100 petabytes to ferry around data (one of which Mr Jassy welcomed on stage during his keynote speech in 2016).
The reason for this approach is no secret. Data attract more data, because different sets are most profitably mined together—a phenomenon known as “data gravity”. And once a firm’s important data are in the cloud, it will move more of its business applications to the computing skies, generating ever more revenue for cloud-computing providers. Cloud providers also offer an increasingly rich palette of services which allow customers to mine their data for insights.
Yet such centralisation comes with costs. One is the steep fees firms have to pay when they want to move data to other clouds. More important, concentrating data in big centres could also become more costly for the environment. Sending data to a central location consumes energy. And once there, the temptation is great to keep crunching them. According to OpenAI, a startup-cum-think-tank, the computing power used in cutting-edge AI projects started to explode in 2012. Before that it closely tracked Moore’s law, which holds that the processing power of chips doubles roughly every two years; since then, demand has doubled every 3.4 months.
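The gap between those two doubling rates compounds dramatically. A short sketch comparing how much extra compute each regime implies over the same stretch of time:

```python
# Compare compute growth under Moore's-law doubling (roughly every 24 months)
# with the post-2012 AI trend OpenAI describes (doubling every 3.4 months).

def growth(months: float, doubling_period_months: float) -> float:
    """Multiplicative growth after `months`, doubling every `doubling_period_months`."""
    return 2 ** (months / doubling_period_months)

months = 24  # two years
moore = growth(months, 24)       # doubles once in two years -> 2x
ai_trend = growth(months, 3.4)   # 2^(24/3.4) -> roughly 130x
print(f"Moore's law over two years: {moore:.0f}x")
print(f"AI-compute trend over two years: {ai_trend:.0f}x")
```

In other words, the post-2012 trend packs roughly seven doublings into the time Moore's law allows for one, which is why the energy bill for central crunching grows so quickly.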
Happily, a counter-movement has already started—toward the computing “edge”, where data are generated. It is not just servers in big data centres that are getting more powerful, but also smaller local centres and connected devices themselves, thus allowing data to be analysed closer to the source. What is more, software now exists to move computing power around to where it works best, which can be on or near IOT devices.
Applications such as self-driving cars need very fast-reacting connections and cannot afford the risk of being disconnected, so computing needs to happen in nearby data centres or even in the car itself. And in some cases the data flows are simply too large to be sent to the cloud, as with the traffic lights in Las Vegas, which together generate 60 terabytes a day (a tenth of what Facebook collects daily).
How far will the pendulum swing back? The answer depends on whom you ask. The edge is important, concedes Matt Wood, who is in charge of AI at AWS, but “at some point you need to aggregate your data together so that you can train your models”. Sam George, who leads Azure’s IOT business, expects computing to be equally spread between the cloud and its edge. And Simon Crosby, the chief technologist at SWIM.AI, while admitting that his firm’s approach “does not apply everywhere”, argues that too much data are generated at the edge to send to the cloud, and there will never be enough data scientists to help train all the models centrally.
Even so, this counter-movement may not go far enough. Given the incentives, big cloud providers will still be tempted to collect too much data and crunch them. One day soon, debates may rage over whether data generation should be taxed, if the world does not want to drown in the digital sea.■