Tricking the brain about where a recorded sound is coming from can enrich the listener’s experience
HUMAN BEINGS are good at locating the sources of sounds. Even when blindfolded, most people can point to within ten degrees of the true direction of a sound’s origin. This is a useful knack for evading danger. It is also an extraordinary cerebral feat. Partly, it is a matter of detecting minute differences of volume in each ear. Partly, it comes from tiny disparities in the time it takes a sound to reach two ears that are not equidistant from its source. The heavy lifting of sound-location, however, involves something else entirely.
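To give a feel for how tiny those timing disparities are, here is a minimal sketch using Woodworth's spherical-head approximation, a textbook simplification rather than anything the firms mentioned in this article are known to use. The head radius, the speed of sound and the function name are assumptions chosen for illustration, and the formula is only meant for sources within 90 degrees of straight ahead.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed average adult head radius, in metres
SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air at about 20 °C


def interaural_time_difference(azimuth_deg: float) -> float:
    """Approximate arrival-time gap between the two ears, in seconds.

    azimuth_deg is the horizontal angle of the source: 0 is straight
    ahead, 90 is directly to one side. The head is treated as a rigid
    sphere (Woodworth's approximation), so real heads will differ.
    """
    theta = math.radians(azimuth_deg)
    return HEAD_RADIUS_M / SPEED_OF_SOUND_M_S * (theta + math.sin(theta))


if __name__ == "__main__":
    for angle in (0, 10, 45, 90):
        gap_us = interaural_time_difference(angle) * 1e6
        print(f"{angle:>3} degrees: {gap_us:6.1f} microseconds")
```

Even for a sound directly to one side, the gap under these assumptions is well under a millisecond, which is why the brain's ability to use it counts as a feat.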
Audio buffs call it the head-related transfer function. A sound is modulated by the body parts it encounters before it reaches the eardrums. In particular, the various tissues of the head attenuate higher frequencies, weakening the top notes of sound waves that have passed to an eardrum through the skull compared with those from the same source that have arrived directly through the air. The cartilaginous ridges, troughs and protuberances of the outer ear also alter sound before it is transduced into nerve signals. Sounds arriving from different angles are therefore modified in consistent ways that the brain learns to recognise.
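The muffling of top notes by the head can be caricatured with a simple low-pass filter. The sketch below is a crude stand-in, not a measured head-related transfer function: the one-pole filter, the 1,500Hz cutoff and the function name are all assumptions made for illustration.

```python
import math


def head_shadow_lowpass(samples, sample_rate, cutoff_hz=1500.0):
    """Attenuate high frequencies with a one-pole low-pass filter.

    A rough imitation of the 'head shadow' on the ear farther from the
    source. The cutoff is an illustrative guess, not a measured value.
    """
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing factor between 0 and 1
    out = []
    previous = 0.0
    for x in samples:
        # Move part of the way from the last output towards the new input.
        previous = previous + alpha * (x - previous)
        out.append(previous)
    return out


if __name__ == "__main__":
    rate = 48000
    steady = head_shadow_lowpass([1.0] * 200, rate)
    buzz = head_shadow_lowpass([(-1.0) ** n for n in range(200)], rate)
    print("steady tone settles at", round(steady[-1], 3))
    print("highest-pitched buzz shrinks to", round(abs(buzz[-1]), 3))
```

A constant signal passes almost untouched while the fastest possible oscillation is cut to a fraction of its original strength, which is the qualitative effect the paragraph above describes.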
For all of their acoustic spatial awareness, however, brains can still be fooled by appropriate technology into believing a sound is coming from somewhere that it is not. That sounds like the basis of a big business. And it is.
One way to simulate the “immersive” sound of reality through a pair of earbuds is by using a pair of recordings made with microphones embedded in the ear canals of a special dummy head. These heads are made to have the same shape and density as those of their flesh-and-blood counterparts. That means they modulate sound waves passing through them in a realistic manner. Recordings made using them therefore log what would arrive at the ear canals of someone listening to the sound in question for real. When they are played back, what a user hears recapitulates that experience, including the apparent directions from which the sounds are coming.
Dummy-based binaural recordings of this sort have been around for a while. But making them is clunky. It is also expensive. A good dummy head can cost $10,000, and time in a professional recording studio is hardly cheap. These days, though, the process can be emulated inside a computer. And that is leading to a creative explosion.
The trick that the emulator must master is a process called phase modulation. This involves retarding a sound’s high, medium and low frequencies by the slight but varying fractions of a second by which those frequencies would be delayed by different parts of the ears and head in reality. So writing the appropriate software starts by collecting a lot of data on how sound waves interact with a human head, and that means going back to the studio to conduct special binaural recordings, often using people instead of dummies. The resulting signals can then be decomposed into their component frequencies, which yields an understanding of how to modulate a given frequency to make it seem as if it is arriving from a particular location.
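The idea of delaying each frequency by its own small amount can be sketched in a few lines. What follows is a toy under stated assumptions, not the method of any product named here: the function names are hypothetical, the delay curve is supplied by the caller rather than measured from a real head, and a slow textbook Fourier transform is used so that the example needs nothing beyond Python's standard library.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform (slow, but dependency-free)."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                for t in range(n)) for k in range(n)]


def inverse_dft(spectrum):
    """Naive inverse transform, keeping only the real part."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]


def apply_frequency_dependent_delay(samples, sample_rate, delay_for_frequency):
    """Delay each frequency component by delay_for_frequency(hz) seconds.

    delay_for_frequency is a caller-supplied (hypothetical) curve. Because
    a DFT treats the signal as periodic, delayed sound wraps around to the
    start; a real system would pad the signal or work block by block.
    """
    n = len(samples)
    shifted = []
    for k, value in enumerate(dft(samples)):
        # Bins above the midpoint represent negative frequencies.
        signed_bin = k if k <= n // 2 else k - n
        freq = signed_bin * sample_rate / n
        delay = delay_for_frequency(abs(freq))
        # A delay of `delay` seconds is a phase rotation of -2*pi*f*delay.
        shifted.append(value * cmath.exp(-2j * math.pi * freq * delay))
    return inverse_dft(shifted)


if __name__ == "__main__":
    click = [1.0] + [0.0] * 7
    # Assumed curve: every frequency held back by two samples at 8kHz.
    delayed = apply_frequency_dependent_delay(click, 8000, lambda hz: 2 / 8000)
    print([round(v, 3) for v in delayed])
```

With a constant delay curve the click simply arrives two samples late. Supplying a curve that varies with frequency, once for each ear, is the step that would rely on the measurements described above.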
Demand for software to mix sound in this way has shot up, says Lars Isaksson of Dirac Research, a firm in Uppsala, Sweden. Dirac developed its own version of such software, known as Dirac 3D Audio, by using a year’s worth of recordings it made that encompassed each degree of rotation, both side to side and up and down, around a listener’s head. This panaudicon provided, Mr Isaksson says, notable smoothness in the simulated movement of sound sources. Makers of video games are a big market for such stuff.
Dirac is not alone. Half a dozen other firms, including Dolby Laboratories of America and Sennheiser of Germany, also now make immersive software. To use it, a sound engineer employs a graphic interface that includes a representation of a sphere surrounding an icon representing the listener. The engineer uses a mouse to move sound channels—vocals, percussion and so on, if the product is music—to the points in the sphere from which their outputs are intended to originate. Software of this sort provides a way to take any recording and “project it in 3D”, says Véronique Larcher, co-director of Sennheiser’s division for immersive audio.
Sennheiser’s product is called AMBEO. Dolby’s is called Atmos, and it has been used to produce the soundtracks of more than 20 video games and 2,500 films and television shows, as well as many pieces of music. Immersive sound may even come to videoconferencing. Dirac is promoting software that makes the voices of participants seem to emerge from the spots on the screen where their images appear. The software uses a laptop’s camera to track listeners’ heads. To those who look, say, left, it will sound as though their interlocutors are off to the right. Dirac is in talks with videoconferencing firms including BlueJeans, Lifesize and Zoom.
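The geometry behind that head-tracking effect is simple, and can be sketched as follows. This shows only the arithmetic of angles, not Dirac's software, whose workings are not described here. The function name and the sign convention (positive angles to the listener's right) are assumptions.

```python
def relative_azimuth(source_deg: float, head_yaw_deg: float) -> float:
    """Direction of a source as heard by a listener who has turned the head.

    Angles are in degrees, measured in the horizontal plane; positive
    means to the right (an assumed convention). source_deg is where the
    voice sits relative to the screen, and head_yaw_deg is how far the
    listener has turned away from it. The result is wrapped into the
    range from -180 (inclusive) to 180 (exclusive).
    """
    difference = source_deg - head_yaw_deg
    return (difference + 180.0) % 360.0 - 180.0


if __name__ == "__main__":
    # A voice placed dead ahead, heard by someone who looks 30 degrees left.
    print(relative_azimuth(0.0, -30.0))
```

Feeding the resulting angle into a binaural renderer each time the camera reports a new head position is what would keep a voice pinned to its place on the screen.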
Facebook, a social-media company, is also designing “spatialised audio” for video calls that use its Oculus virtual-reality headsets. Ravish Mehra, head of audio research at Facebook Reality Labs, is coy about how long it will take his team to perfect the aural illusion that this is intended to create. But he says software the firm has in development can modify the frequencies and volumes of sounds so that they match the virtual surroundings chosen for a call, as well as the speaker’s perceived position. The acoustics of a beach, he notes, are unlike those of a room.
Tin pan alley
Such stuff is for the professionals. But amateurs can play too. For the man or woman in the street who wants to jazz up a record collection, many simpler programs now give existing recordings a more immersive feel by modulating their sound.
Programs of this sort cannot handle different parts of a recording differently in the way that studio-based systems manage, but they do create an illusion of sonic space around the listener. Isak Olsson of Stockholm, who has put together two such packages, 8D Audio and Audioalter, describes them as seeming to increase the size of the room. This helps to overcome a phenomenon known as the “in-the-head experience”. And, as Michael Kelly, head of engineering at Xperi, an immersive-software firm based in California, observes, sounds that appear to come from outside the head are more comfortable.
At the other end of the technological scale from such do-it-yourself kits, a number of firms, Dirac, Dolby, Facebook, Sony and Xperi among them, are working on a bespoke approach to sonic immersion. They are tailoring it, in other words, to an individual listener’s anatomy.
One method, that being used by Sony, is to ask potential customers to upload photographs of their ears. Another, which may be adopted by Xperi, is to repurpose data from the face-recognition systems that now unlock many people’s smartphones. If this way of thinking works, it will bring with it the ultimate in high fidelity. This is a recognition that, in the real world, even if what they are hearing is the same set of sound waves, every listener’s experience is different—and that this needs to be replicated in the world of recorded sound, too. With that realisation, acknowledgment of the head-related transfer function’s importance has reached its logical conclusion. And the term “headbanging” may take on a new and positive meaning.■