Tricking the brain about where a recorded sound is coming from can enrich the listener’s experience

An auricular spectacular

HUMAN BEINGS are good at locating the sources of sounds. Even when blindfolded, most people can point to within ten degrees of the true direction of a sound’s origin. This is a useful knack for evading danger. It is also an extraordinary cerebral feat. Partly, it is a matter of detecting minute differences of volume in each ear. Partly, it comes from tiny disparities in the time it takes a sound to reach two ears that are not equidistant from its source. The heavy lifting of sound-location, however, involves something else entirely.
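
To give a sense of how tiny those timing disparities are, here is a rough sketch using Woodworth's spherical-head approximation, a textbook formula rather than anything named in this article. The head radius and speed of sound are assumed typical values, and the formula applies only to distant sources between straight ahead (0 degrees) and directly to one side (90 degrees).

```python
import math

SPEED_OF_SOUND = 343.0   # metres per second in air at about 20 C (assumed)
HEAD_RADIUS = 0.0875     # metres; a commonly assumed average, real heads vary


def interaural_time_difference(azimuth_deg: float) -> float:
    """Estimate the arrival-time gap between the two ears, in seconds.

    Uses Woodworth's spherical-head approximation:
        ITD = (r / c) * (theta + sin(theta))
    where theta is the source angle from straight ahead, in radians.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))


for angle in (0, 10, 45, 90):
    gap_us = interaural_time_difference(angle) * 1e6
    print(f"{angle:>2} degrees: {gap_us:6.1f} microseconds")
```

Even for a sound directly to one side, the estimated gap is well under a millisecond, which is why the paragraph above calls the disparities tiny.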

Audio buffs call it the head-related transfer function. A sound is modulated by the body parts it encounters before it reaches the eardrums. In particular, the various tissues of the head attenuate higher frequencies, weakening the top notes of sound waves that have passed to an eardrum through the skull compared with those from the same source that have arrived directly through the air. The cartilaginous ridges, troughs and protuberances of the outer ear also alter sound before it is transduced into nerve signals. Sounds arriving from different angles are therefore modified in consistent ways that the brain learns to recognise.

For all of their acoustic spatial awareness, however, brains can still be fooled by appropriate technology into believing a sound is coming from somewhere that it is not. That sounds like the basis of a big business. And it is.

Sounds good

One way to simulate the “immersive” sound of reality through a pair of earbuds is by using a pair of recordings made with microphones embedded in the ear canals of a special dummy head. These heads are made to have the same shape and density as those of their flesh-and-blood counterparts. That means they modulate sound waves passing through them in a realistic manner. Recordings made using them therefore log what would arrive at the ear canals of someone listening to the sound in question for real. When they are played back, what a user hears recapitulates that experience, including the apparent directions from which the sounds are coming.

Dummy-based binaural recordings of this sort have been around for a while. But making them is clunky. It is also expensive. A good dummy head can cost $10,000, and time in a professional recording studio is hardly cheap. These days, though, the process can be emulated inside a computer. And that is leading to a creative explosion.

The trick that the emulator must master is a process called phase modulation. This involves retarding a sound’s high, medium and low frequencies by the slight but varying fractions of a second by which those frequencies would be delayed by different parts of the ears and head in reality. So writing the appropriate software starts by collecting a lot of data on how sound waves interact with a human head, and that means going back to the studio to conduct special binaural recordings, often using people instead of dummies. The resulting signals can then be decomposed into their component frequencies, which yields an understanding of how to modulate a given frequency to make it seem as if it is arriving from a particular location.
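
The idea of delaying each frequency by its own small amount can be sketched in a few lines. This is an illustration under stated assumptions, not any vendor's method: the `delay_for` table is invented for the example (real values would come from the binaural measurements described above), and the naive transform is only practical for short signals.

```python
import cmath
import math


def dft(samples):
    """Naive discrete Fourier transform: split a signal into component frequencies."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse transform: rebuild the signal from its component frequencies."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def delay_for(freq_hz):
    """Hypothetical per-band delays in seconds; placeholder numbers, not measured data."""
    if freq_hz < 1000:
        return 0.0003
    if freq_hz < 3000:
        return 0.0002
    return 0.0001


def apply_frequency_dependent_delay(samples, sample_rate):
    """Delay each frequency component by delay_for(frequency) via a phase shift."""
    n = len(samples)
    shifted = []
    for k, coeff in enumerate(dft(samples)):
        # Bins above the midpoint represent negative frequencies.
        signed_k = k if k <= n // 2 else k - n
        freq = signed_k * sample_rate / n
        tau = delay_for(abs(freq))
        # A delay of tau seconds is a phase rotation of -2*pi*freq*tau.
        shifted.append(coeff * cmath.exp(-2j * math.pi * freq * tau))
    # The input is real, so only the real part of the result is kept.
    return [value.real for value in idft(shifted)]


sample_rate = 8000
n = 64
tone = [math.sin(2 * math.pi * 500 * t / sample_rate)
        + math.sin(2 * math.pi * 2000 * t / sample_rate) for t in range(n)]
delayed = apply_frequency_dependent_delay(tone, sample_rate)
```

In this toy signal the 500 Hz component ends up 0.3 milliseconds late and the 2,000 Hz component 0.2 milliseconds late. A renderer for headphones would apply a different delay table to each ear.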

Demand for software to mix sound in this way has shot up, says Lars Isaksson of Dirac Research, a firm in Uppsala, Sweden. Dirac developed its own version of such software, known as Dirac 3D Audio, by using a year’s worth of recordings it made that encompassed each degree of rotation, both side to side and up and down, around a listener’s head. This panaudicon provided, Mr Isaksson says, notable smoothness in the simulated movement of sound sources. Makers of video games are a big market for such stuff.

Dirac is not alone. Half a dozen other firms, including Dolby Laboratories of America and Sennheiser of Germany, also now make immersive software. To use it, a sound engineer employs a graphic interface that includes a representation of a sphere surrounding an icon representing the listener. The engineer uses a mouse to move sound channels—vocals, percussion and so on, if the product is music—to the points in the sphere from which their outputs are intended to originate. Software of this sort provides a way to take any recording and “project it in 3D”, says Véronique Larcher, co-director of Sennheiser’s division for immersive audio.

Sennheiser’s product is called AMBEO. Dolby’s is called Atmos. This has generated the soundtracks of more than 20 video games and 2,500 films and television shows, as well as many pieces of music. Immersive sound may even come to videoconferencing. Dirac is promoting software that makes the voices of participants seem to emerge from the spots on the screen where their images appear. The software uses a laptop’s camera to track listeners’ heads. To those who look, say, left, it will sound as though their interlocutors are off to the right. Dirac is in talks with videoconferencing firms including BlueJeans, Lifesize and Zoom.
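
The head-tracking effect comes down to simple arithmetic, sketched below. This is not Dirac's code: the function name and the sign convention (positive angles to the listener's right, negative to the left) are assumptions made for the example.

```python
def relative_azimuth(source_deg: float, head_yaw_deg: float) -> float:
    """Angle of the source relative to where the listener is facing.

    Both inputs are measured from the screen's centre line, in degrees,
    positive to the right. The result is wrapped into [-180, 180).
    """
    return (source_deg - head_yaw_deg + 180.0) % 360.0 - 180.0


# A voice at the centre of the screen, heard by someone who has turned
# 30 degrees to the left, should be rendered 30 degrees to their right.
print(relative_azimuth(0.0, -30.0))
```

The renderer would recompute this angle each time the camera reports a new head position, then pick the filtering that matches it.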

Facebook, a social-media company, is also designing “spatialised audio” for video calls that use its Oculus virtual-reality headsets. Ravish Mehra, head of audio research at Facebook Reality Labs, is coy about how long it will take his team to perfect the aural illusion that this is intended to create. But he says software the firm has in development can modify the frequencies and volumes of sounds so that they match the virtual surroundings chosen for a call, as well as the speaker’s perceived position. The acoustics of a beach, he notes, are unlike those of a room.

Tin pan alley

Such stuff is for the professionals. But amateurs can play too. For the man or woman in the street who wants to jazz up a record collection, many simpler programs now permit people to give a more immersive feeling to their existing recordings by running them through software that modulates the sounds of those recordings to achieve that end.

Programs of this sort cannot handle different parts of a recording differently in the way that studio-based systems manage, but they do create an illusion of sonic space around the listener. Isak Olsson of Stockholm, who has put together two such packages, 8D Audio and Audioalter, describes them as seeming to increase the size of the room. This helps to overcome a phenomenon known as the “in-the-head experience”. And, as Michael Kelly, head of engineering at Xperi, an immersive-software firm based in California, observes, sounds that appear to come from outside the head are more comfortable.

At the other end of the technological scale from such do-it-yourself kits, a number of firms, Dirac, Dolby, Facebook, Sony and Xperi among them, are working on a bespoke approach to sonic immersion. They are tailoring it, in other words, to an individual listener’s anatomy.

One method, that being used by Sony, is to ask potential customers to upload photographs of their ears. Another, which may be adopted by Xperi, is to repurpose data from the face-recognition systems that now unlock many people’s smartphones. If this way of thinking works, it will bring with it the ultimate in high fidelity. This is a recognition that, in the real world, even if what they are hearing is the same set of sound waves, every listener’s experience is different—and that this needs to be replicated in the world of recorded sound, too. With that realisation, acknowledgment of the head-related transfer function’s importance has reached its logical conclusion. And the term “headbanging” may take on a new and positive meaning.■