By Miles Kempton, Sound Designer, Grand Central Recording Studios (GCRS)
This year's two big buzzwords, or "concepts" I'd say, are "3D Audio" and its implementation in Virtual Reality (VR). As a sound designer I'm very interested in the idea of 3D audio, and I want to start proceedings with an age-old riddle: "If a tree falls in a forest and no one is around to hear it, does it make a sound?"
Answering it takes you beyond the realm of physics and into the world of perceived reality.
The reality that you live in, outside of your head, has no smell, no colour, no taste and no sound. We go through our lives believing that colour, sound and smell are all occurring in the world around us and that we are simply witnessing them being created in real time. However, this is not the case. Our eyes and ears are not piping sights and sounds into the brain; instead, photons of light and vibrations in the air are being converted into electrical signals. These signals stimulate neurons in our brains to make us perceive and assemble our reality. Remember, no light can enter the skull. The physical area that houses the brain is in complete darkness, so it is almost magical how you perceive a truly bright and colourful world in front of you!
Anyway, before I digress too far from the topic and title at hand, I want to quickly mention how we process all of these incoming signals that we use to perceive reality. It becomes even more fascinating when I tell you that these streams of different sensory data are processed by the brain at different speeds, and that the way we make sense of our reality is by the brain performing some very quick and ingenious time-editing tricks, which means we actually live in the past.
Let me explain: if I clap my hands together, you hear and see them clapping in unison. But the brain actually processes the sight and the sound at different times, and the reason we don't see and hear them out of sync is that the brain waits to collect the data from the senses and then unifies it to create an in-sync reality. This is why we perceive the world about half a second in the past.
Now that I've hopefully convinced you that your perception of reality is an illusion generated by your brain and senses, I can hopefully persuade you that your primary sense, sight (if you have it), can completely transform what you hear: what we see can override what we hear, as in the McGurk effect. See the clip below:
This takes us back to "3D Audio" - is there such a thing? Well, to clarify, we hear in stereo, or should I say binaural (our two ears -bi meaning two, and aural), but we perceive in 3D. So therefore when I put on a pair of headphones to listen to a "3D audio mix" why do I not perceive a 3D soundscape?
A lot of what you will hear on the internet presented as "3D audio" is either binaural recording or companies trying to imitate (often poorly) the effect of binaural recordings. Binaural recording methods have been around for ages: the classic rig is a dummy head, complete with a nose and ears, with two microphones placed inside the ears to mimic the way sound waves reach us. Play the recording back over headphones and the brain should interpret it exactly as it was heard at the recording position, producing a perceived 3D sound field of the environment.
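The synthetic imitations work on the same principle as the dummy head: convolve a dry mono signal with a pair of head-related impulse responses (HRIRs) measured at each ear for a given direction. A minimal sketch, with made-up toy impulse responses standing in for real HRTF measurements (real ones come from databases and are hundreds of samples long):

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Place a mono source at the direction encoded by an HRIR pair
    by convolving it with the left- and right-ear impulse responses."""
    n = len(mono) + max(len(hrir_left), len(hrir_right)) - 1
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    # Pad both channels to the same length for interleaved playback.
    left = np.pad(left, (0, n - len(left)))
    right = np.pad(right, (0, n - len(right)))
    return np.stack([left, right], axis=-1)  # shape (samples, 2)

# Toy example: the right ear receives the sound a few samples later
# and quieter, as it would for a source off to the listener's left.
source = np.random.default_rng(0).standard_normal(1000)
hrir_l = np.array([1.0, 0.3])
hrir_r = np.array([0.0, 0.0, 0.0, 0.6, 0.2])  # ~3-sample delay, attenuated
stereo = render_binaural(source, hrir_l, hrir_r)
```

Over headphones, the interaural delay and level difference baked into the two impulse responses are what the brain reads as direction.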
So, does it work? Not really. Well, half of it does, I suppose. What it shows off fantastically is everything that was happening audibly behind you. You get brilliant depth and height information that is completely realistic and has a wow factor if you are hearing it for the first time. What it doesn't do well is the front: sound that was happening in front of you is perceived as coming from behind you. This is known as front-back confusion.
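There is a simple geometric reason the ears alone struggle here: the two main localisation cues, interaural time and level differences, are identical for a source mirrored front-to-back, so both positions sit on the same "cone of confusion". A rough sketch using the textbook Woodworth spherical-head approximation (the head radius and speed of sound below are assumed typical values):

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, approximate, in air at room temperature
HEAD_RADIUS = 0.0875     # m, a commonly assumed average head radius

def lateral_angle(azimuth_deg):
    """Angle off the median (front-back) plane, for azimuths in [0, 180]."""
    return min(azimuth_deg, 180.0 - azimuth_deg)

def itd_seconds(lateral_deg, r=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Woodworth approximation of the interaural time difference."""
    theta = math.radians(lateral_deg)
    return r / c * (theta + math.sin(theta))

# A source 30 degrees front-right (azimuth 30) and one 30 degrees
# back-right (azimuth 150) have the same lateral angle, hence the
# same timing cue: the ears alone cannot tell front from back.
front = itd_seconds(lateral_angle(30.0))
back = itd_seconds(lateral_angle(150.0))
```

In real listening we resolve the ambiguity with head movements and the spectral filtering of the outer ear; a static binaural recording offers neither, so the brain falls back on its default guess.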
So what is going on here? In my opinion, and that of my colleagues Ivor and Steve, it is a product of years of Darwinian evolution, coupled with something I touched on earlier while digressing: the brain needs various other cues to create this illusion of reality. If your primary sense of vision is not confirming the presence of an audio cue, the illusion breaks down and the brain presumes the audio must be coming from behind you. For example, say I go into the Amazonian rainforest and record the jungle, with its cicadas and monkeys, and a jaguar walks past in front of me, growling from left to right. Back in London, playing the audio over headphones in my office, it would be an unusual, almost hallucinogenic experience if Kevin from accounts walked in front of me during playback and said "good morning": to me he'd appear to growl like a jaguar while I'm immersed in this 3D jungle soundscape. Since I know Kevin could never realistically growl like a jaguar, my visual cue does not match the audio cue, my brain cannot make sense of the illusion, and so it places the audio behind me. This is a survival technique: if our brains perceive a sound we cannot see, they make the safe guess that it must be coming from behind us, so that we can still make sense of the sounds we are perceiving.
So what happens, I hear you ask, if the audio cues on playback do match a visual response? I got to test this when I first watched VR on an Oculus Rift. I was initially mesmerised and immersed by the visual experience; the audio, either binaural or processed with some "3D-esque" plugin, had good depth and solid audio cues behind me. However, when a character in the VR world appeared, stood in front of me and spoke, the sync dialogue was actually coming from slightly behind me, with no externalised frontal localisation. Yet I was so immersed in the VR that I didn't experience any disconnect or breakdown of the illusion; it didn't bother me, and I didn't initially pick up on it. The same went for the people walking around in front of me in the VR world: their footsteps worked well enough that, even without a true frontal image, I didn't need to question them. The only time the illusion broke down was when the characters walked out of the room in front of me and out of shot. Instead of hearing their voices coming from the front, I noticed that, without the visual cue, the mix placed them behind me when I knew the source should be in front. This goes to show that visual cues help the brain steer the audio cues, just as in the McGurk effect.
However, you must remember it's all an illusion. The brain is making the best possible guess it can from all its sensory inputs. Below is a fascinating link showing the Hollow Mask illusion: our brains are so highly attuned to recognising human faces, and a hollow face is so far outside our past experience, that you can actually watch your brain reinterpreting the image. The perception of the real world is a mix of sensory data and opportunistic guesswork:
What we are trying to do with VR is copy this illusion of reality and hopefully pull it off, which at the moment is proving a lot trickier than expected, especially when it comes to the audio side. It's a lot easier to do in the real world... sorry, what real world?