Introduction: Designing Awesome Videogame Audio

I've been a videogame designer for the last several years - I've worked on a variety of games, from homebrew stuff for the Game Boy Advance, to really oddball stuff like Seaman for the Sega Dreamcast, to big-budget blockbusters like The Sims 2 for consoles. Recently, I co-founded Self Aware Games with some friends - some game industry vets, and some people new to the game development scene. Our focus was to develop games for the new generation of mobile platforms, like the iPhone and the Palm Pre.

With each new generation of hardware, there's a whole host of weird things to learn about how to make effective games. With our first game, Taxiball, we ended up doing a lot of strange things while creating the soundtrack. Instead of the standard sound effects and musical score, we decided to do something radically different - an all-vocal beatbox soundtrack that is highly responsive to user input.

For the Art of Sound contest, I thought it might be neat to give people a bit of insight into how we put together this unique take on in-game audio, and more importantly, why. While Taxiball isn't primarily a game about music, the music's an integral part of the game - not only does it respond to players' actions, but it also communicates some very specific information back to the player. The Art of Sound, in this case, is the way the audio in Taxiball responds to the player's interactions, and the meaning that it communicates back to the player.

Here's a video of Taxiball's gameplay. It's a preview video we made just before the game launched, but it's a good representation of the general style of the game's audio:

Taxiball from selfawaregames on Vimeo.

We're really happy with the way the game's turned out - and since we ended up learning so much during the development process, it seemed only sensible to share our experience with others. If you're interested in a bit of a discussion about the design and development process of a game, particularly about something most people might not give a second thought to, hopefully this will offer some useful insight into the way things get built.

Step 1: Starting With Nothing...

There's a lot one can say about starting the development process. Self Aware started up in March of 2009 with the goal of developing games for the iPhone (and similar devices) that would allow people to interact with each other in interesting new ways. For the first project, the idea was simple - take something unique about the iPhone, make that really fun, and then integrate that with the first step towards building a rich experience online.

Getting to the core concept behind Taxiball was pretty straightforward - obviously, one of the big things that separates the iPhone from other mobile devices is the accelerometer. If you want to use the accelerometer, the simplest, easiest, and most obvious way to do it is to make a ball-rolling game. There are a lot of other examples of this kind of game on the App Store, and a number of them have been really successful. But we thought they were all missing something important - they were all very ... limited.

By that, I don't mean that they lacked features, or weren't fun. I mean that in almost all cases, the ball was a simulation of a real ball, and the surface was a simulation of a real surface. Your goal in most cases was to roll your ball to some destination, then switch levels and do the same thing again and again.

Why stop there?

In a videogame, there's no reason that you have to roll a ball around a surface that you'd normally roll a ball around. There's no reason that your destination has to be something, or that you have to fall into a hole, or that when you're done with one challenge, you have to stop and load a new level for another. We weren't constrained by reality! Why were all the games in this genre so boring?

There was a LOT more that could be done with the "tilt & roll" genre - and we intended to do it.

Step 2: A Page on Game Design

So, the opportunity was, in many ways, obvious. Take a familiar control scheme that appealed to people, and a genre of game people already understood, and make it something better than it was by making it less literal, and more fantastic.

By this point, we had a playable ball-rolling tech-demo up and running. It was awesomely un-pretty, but allowed us to make sure the core controls worked properly. Still, we had no setting, and at this point, little idea of what you'd actually *do* rolling the ball around.

There are a lot of possible settings for a game. Space, the future, the past, micro-scale environments, universe-spanning whatchamajigs... your only real limitation is your imagination. But that doesn't mean some settings aren't better than others.

If I tell you, for instance, that you'll be rolling a ball on asphalt, then you go from asphalt to grass, you can already imagine, without any additional information, how the ball will behave. If, on the other hand, I tell you that you'll be rolling on a surface made up of the Essence of Human Suffering, then transition to another surface made up of the tiny legs of the billion residents of the Floogleblornax Zone of the distant Galaxy Z-15 Beta, I have to go into a complex discussion of what the surface friction generated by Human Suffering is, and how the legs of those billion residents are lubricated by a silicon-based sweat induced by their transition from Z-15 Beta to our Solar System, and so that's why you speed up when you go from suffering to blornax. Obviously.

It's a giant mess. While videogames let you go well beyond what's physically possible in reality, that doesn't mean you always should. Leveraging what people already know can make things a lot more accessible.

So, we wanted a familiar, understandable setting, but not something like your generic wooden box. We also needed something that you'd actually be *doing* other than pointlessly rolling toward your arbitrary destination.

There are times when you have to try a bunch of different concepts, have a bunch of false starts, and iterate a lot before finding an appropriate mix of setting and gameplay. On a previous project I worked on, we spent a whole year on this process and only figured it out toward the end of that year - and the project was canceled several months later. Taxiball, on the other hand, came together in about five minutes.

The exchange went something like this:
"How 'bout a city?"
"Ooh! Taxi - you can pick up people, roll them to their destination."
"A Taxi-ball?"

I'd like to say it was more difficult than that, but it wasn't - everyone on the team almost instantly understood the basic gist of the game: pick up and drop off fares as fast as possible to earn as much money as you can before the clock runs out.

And to this point, little thought had been given to the audio. Our only real thought was, "Hey, we know a guy..."

Step 3: Audiosplosion!

So, when making a game, you need a handful of skills. Maybe you've got all the skills to do it yourself, maybe they're distributed among a couple different people. You need to be able to design the game, write the code, create the art (often including animation), and create memorable audio. We had all these skills except the last one.

A lot of times, when you've got a small company, you do what you can with the resources available to you. When we started, I'd made a lot of the placeholder art. You can see that in the image below. It wasn't pretty, but it was enough to differentiate between surface types where necessary, and make sure the game functioned properly. Once we had a dedicated artist? The next picture shows what kind of difference that makes.

Now, I'm a competent musician - I can play a handful of instruments, and have even written a couple pieces of music. But that was long ago, and the difference between someone who's competent and someone who's excellent is vast.

Fortunately, we knew a guy with music experience who *was* excellent. Wes Carroll's been a beatboxer for years. We'd known him before starting the project, and since he was a talented guy with audio experience, we knew we wanted to use his particular skills for Taxiball.

Now, if you listen to Taxiball's final soundtrack, you may think, "Of course - it's obvious that if they knew a beatboxer, this is the soundtrack that would result! It's just full of beatboxing!" But things aren't always as straightforward as they may seem.

Step 4: Thinking in the Box

So, a game needs sound. Not only does sound communicate information (a collision into a wall, or a celebratory cheer), but it adds a richness to the experience nothing else can. A sound designer at my last job claimed that sound accounted for 40% of the experience of a game. I don't know how he'd measure that, but in practice, it feels accurate.

The way we'd been thinking about the sound was pretty straightforward. We'd need the traditional "informative" sounds:

  • Collision with a wall
  • "Rolling" sound on multiple surface types (ice, asphalt, grass)
  • "Plonk" sound for falling into water
  • Fare successfully completed
  • Fare failed
  • Music
  • etc.

"Music" in this case meant a soundtrack to each level - given that a player was going to spend anywhere from a minute to 10 minutes on a level, the music had to be interesting enough to not get annoyingly repetitive in that time. Given the seven levels we'd planned for the game, that was a lot of sound.

The idea was to have Wes, who had a nice microphone and the appropriate sound processing software, do some mix of vocal noises that would add character to the game, and create the more basic sounds, like a ball rolling, using real-world stuff.

Here, you can hear a "rolling" sound - made simply by rolling a marble on a wood surface. It's functional, appropriate, and totally boring.

Step 5: Breaking the Box

When you're working on a game, whether it's something you're doing on your own or you're working on a 200-person team, one of the biggest issues you will always face is how much stuff to put in the game.

The scope of the game always spirals out of control. "It's just one little thing," may be true - but a hundred "little things" can add up to make even the smallest game huge. Once you get into the details, there's always a big pile of stuff that seemed easier or smaller than it actually is.

For a small startup developer working on their first project, making sure the scope of the game was properly managed was a big, big deal. And looking at the list of audio we needed - varieties of sounds for every possible surface, music for every level - we were way past the amount of time that we had available. So we looked at the list of sounds, and sat there for a little while, wondering what to do.

We'd started development with the idea that we could take a game mechanic, make it less literal than the other people doing similar things, and make it better by doing so. As we sat there that day, that theme came back to us. Maybe we didn't need to think about the sound in the obvious way. Then the other realization - the really obvious one that seems really stupid to have missed in retrospect - hit us in the face. ALL the audio should be vocal, not just parts of the music. All of the music. All of the sound effects. There was no need to "make" any of the sounds in the real world at all.

Wes was a beatboxer, after all - he had a lot of experience making interesting sounds with his voice. Instead of a "realistic" rolling sound, what about a mumbling sound? So we went from a simple, normal-sounding "ball rolling on wood" sound to a weird little scat-like bassline. The bassline sped up depending on how fast you were going - a simple side effect of replacing the default sound with something more interesting without "fixing" the way the code played the rolling sound.

When your ball rolled faster, the rolling sound was pitch-shifted up, because that's how the sound behaves in the real world. With the vocal track modified the same way, it had a really interesting effect - the music was now interactive! The more the player tilted and the faster the ball went, the higher-pitched and faster the music played. A sudden change in direction, and the audio would slow down, then speed back up.

A quick change to the code later, and we had the "gain" - the overall volume of the sound - linked to the speed as well. This gave the audio a really unusual effect - almost like you were messing around with a turntable while you were playing.
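A minimal Python sketch of that speed-to-audio mapping might look like the following. It assumes a resampling player where a single "rate" value scales tempo and pitch together (which is why the bassline got faster and higher at the same time), and the constants `NOMINAL_SPEED` and `MAX_RATE` are hypothetical tuning values, not numbers from Taxiball.

```python
# Hypothetical tuning values; the article doesn't give Taxiball's actual numbers.
NOMINAL_SPEED = 5.0  # ball speed at which the loop plays exactly as recorded
MAX_RATE = 2.0       # cap on playback rate (2.0 = twice as fast, one octave up)


def rolling_loop_params(speed: float) -> tuple[float, float]:
    """Map ball speed to (playback_rate, gain) for the rolling-sound loop.

    With a resampling player, the playback rate changes tempo and pitch
    together. Gain also follows speed, so the loop winds down and fades
    out as the ball stops, a bit like slowing a turntable by hand.
    """
    ratio = max(speed, 0.0) / NOMINAL_SPEED
    rate = min(ratio, MAX_RATE)  # 0.0 when stopped, up to MAX_RATE
    gain = min(ratio, 1.0)       # full volume at or above NOMINAL_SPEED
    return rate, gain
```

Each frame, the game would feed in the current ball speed and apply both values to whatever audio player it uses. At half the nominal speed, for example, the loop plays at half rate and half volume.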

The video below shows the effect in action.

And yes - it sounds bad in many ways. We'll get to that. :)

From this point on, we weren't thinking about the soundtrack as a literal effect of the things that were happening in the game, but rather, that the audio was this dynamic soundscape that your actions in the game remixed in real time. The link between your actions and the audio became the foundation for the soundtrack, and guided the way we moved forward.

What's more, the "human-ness" of the vocal beatbox soundtrack provided a really pleasant complement to the super-digital retro style that we'd all grown to love for the visual aesthetic. Digital, or even normal, instrumentation made the visuals look very... digital. The contrast and tension between the sound and the graphics provided a direction we were all really psyched about.

The box, at this point, had been broken open.

Step 6: Picking Up the Pieces

Of course, once we broke the box, it meant we were veering off into new territory. And like any new territory, sometimes you get unexpectedly eaten by bears.

There were three major problems we ran into right away:

1.) Pitch shifting was problematic. You probably heard this in the previous clip. We wanted to have something other than just rhythm - a catchy melody of some sort. The problem is, if you're pitch shifting a melody constantly, it starts to sound really irritating - a pleasing melody becomes incredibly annoying when you're constantly messing around with the pitch. Your ear is used to hearing certain intervals as "pleasant" and others as "awful." I believe that's the technical term. And that's when you're dealing with actual notes. Once you start pitch-shifting, you're dealing with pitches that fall between the normal "notes" - the end result is, given the right circumstances, physically repulsive.

What was funny was that for the player, it was a mildly irritating thing - they're occupied playing the game, and since pitch was effectively tied to the physical action of tilting the iPhone, the fact that you'd move your body and the pitch would change "made sense" on some subconscious level. For anyone listening who wasn't playing, though, it sounded *awful*.

2.) Transitions were going to be a problem. We wanted to have the music change every time you picked up or dropped off a fare. With such a discrete event, you couldn't gracefully cross-fade from one track to the other, and if you made a "hard" transition, since you couldn't guarantee that it would happen at the downbeat of a new measure, it sounded really herky-jerky - measures would cut off unexpectedly and restart. Again, to the player, who can see the event that's causing the transition, it's not so bad - but to people who weren't playing, the "stuttering" audio was a mess.

3.) The difference between listening to the audio through the iPhone's headphone jack and through the device's external speakers was ENORMOUS. Things that sounded good on headphones were unintelligible and extremely harsh through the external speakers, and things that sounded good on the speakers were totally imbalanced and "dead" sounding on headphones.
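To put a rough number on the first problem: when a resampling player runs a sample at playback rate r, every frequency is multiplied by r, which is a shift of 12 * log2(r) equal-tempered semitones. The short Python sketch below (the function names are illustrative) shows why continuous, speed-driven rates leave a melody stranded between the notes.

```python
import math


def semitone_offset(rate: float) -> float:
    """Pitch shift, in equal-tempered semitones, from playing a sample at `rate`.

    Resampling multiplies every frequency by `rate`, and one semitone is a
    frequency ratio of 2 ** (1 / 12), so the shift is 12 * log2(rate).
    """
    return 12.0 * math.log2(rate)


def cents_off_grid(rate: float) -> float:
    """How far (in cents, 100 per semitone) the shifted pitch lands from the nearest note."""
    semis = semitone_offset(rate)
    return 100.0 * abs(semis - round(semis))
```

Only rates that are exact powers of 2 ** (1 / 12) keep the melody on the note grid. A 10% speed-up (rate 1.1) moves everything up about 1.65 semitones, roughly 35 cents away from the nearest note, and a tilt-controlled rate sweeps through in-between values like that constantly.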

Problems! Argh!

Step 7: Ditching Pitch Shifting

So, the obvious ways to solve the problems we were having with pitch shifting were to either remove pitch shifting, or remove the melodic parts.

On one hand, the pitch shifting worked really well with just a rhythm. It was nicely interactive, and "felt" really good. On the other hand, when all you had was a rhythm, even with pitch shifting, the soundtrack got boring quickly, and without any melodic parts at all, got irritating for a non-player to listen to in short order.

While I think we could have found a way to keep the interactive pitch in the soundtrack, the issue really at some point becomes one of expediency. Almost any problem is solvable with sufficient time - but what does that time actually cost you? This is, almost by itself, the most important lesson that you can learn in game development. Probably in almost any development process. It's not about whether you can do something or not. It's about whether you can do it in a reasonable time, in a reasonable budget.

It's not ideal - everyone wants to do everything the best way - but instead, you have to do them the best way *you can*. Learning that distinction, and being able to remain flexible, will be the difference between finishing a project and being crushed under a mountain of problems. What you have to do is figure out what is truly important to the core of the game, spend your time on that, and cut away things that don't serve that goal. For us, the interactivity was what was important. Not specifically pitch-shifting. It was a fun effect, but not the only one.

I really enjoy drum & bass music. I also really enjoy more traditional rock music. That may not seem relevant, on its face, but it set off a chain of thoughts that went something like this:

"Drum & bass always sounds really fast. It probably *is* really fast. But if you just take the underlying beat, don't change its speed, but add more notes, what kind of effect do you get?"

Well, it's easy enough to try in something like GarageBand. Just take a standard rock beat and layer a bunch of looped drums, cymbals, and hi-hats on top.

Here's the "rock" track:

Here's the "drum & bass" layer:

Here you can hear how the two change as you add one layer to the other:

By creating multiple layers, and having their relative volumes change depending on the speed at which the player was rolling around, we were able to keep that really interactive feel to the audio but still keep it locked to the same beat and pitch. This meant we could actually make a catchy melody that didn't shift all over the place, yet we still got that musical reinforcement when your speed increased! By letting go of the initial idea, but remembering why the idea was attractive, we were able to come up with a fast solution that we could spend time polishing - working out the kinks and making it function really well - and not sacrifice a huge amount of development time.
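The layering idea can be sketched as a per-layer volume curve driven by normalized speed. In this Python sketch, every layer keeps playing at its recorded tempo and pitch, and only the gains move. The layer names and fade thresholds are hypothetical; the article names only the "rock" and "drum & bass" layers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    fade_start: float  # normalized speed (0..1) where the layer starts to be heard
    fade_end: float    # normalized speed where it reaches full volume


# Hypothetical thresholds: the rock beat is always on, the drum & bass layer fades in with speed.
LAYERS = (
    Layer("rock", 0.0, 0.0),
    Layer("drum_and_bass", 0.3, 0.9),
)


def layer_gains(speed_norm: float) -> dict[str, float]:
    """Gain for each layer; playback rate stays at 1.0, so beat and pitch never drift."""
    s = min(max(speed_norm, 0.0), 1.0)
    gains = {}
    for layer in LAYERS:
        if s >= layer.fade_end:      # checked first, so a zero-width fade means "always on"
            gain = 1.0
        elif s <= layer.fade_start:
            gain = 0.0
        else:                        # linear crossfade between the two thresholds
            gain = (s - layer.fade_start) / (layer.fade_end - layer.fade_start)
        gains[layer.name] = gain
    return gains
```

Because every layer is locked to the same beat, adding notes on top makes the track feel faster without actually changing its tempo.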

Now the only problem was when you dropped off or picked up a fare - the newly "steady" bassline cutting out of the mix was now really noticeable, and sounded really bad.

Step 8: Audio Spackle

We thought about a number of ways to deal with the audio transitions in the game. When you roll near a fare to pick them up, they hop on your ball. This starts up the "Fare" music, and what specifically plays depends on who you picked up. When you're done with the fare - you either drop them off, or fail to deliver them in time - they hop off, and the music returns to "default" mode, with only the basic rhythm playing.

Originally, when we were changing the pitch of the fare music in time with the speed, stopping basically stopped the track - it'd shift slower and slower until it stopped. Combined with speed-proportional volume adjustment, it worked great. However, once we got rid of the pitch shifting, the "base rhythm" and bassline both cut out abruptly.

Not good! Solving one problem had created another. But we knew this one would be a lot easier to deal with.

We thought perhaps we could simply start the new rhythm where the old one left off - if you were two measures into a four-measure loop, it'd simply start on the third measure of the new loop, and though the beat would change, it'd still be in sync.
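The arithmetic behind that idea is simple, provided the game can read how long the current loop has been playing, which, as it turned out, Taxiball couldn't do cheaply. A Python sketch, assuming all loops share one tempo and time signature (the function and parameter names are hypothetical):

```python
def carried_over_offset(elapsed_s: float, bpm: float, beats_per_measure: int,
                        old_loop_measures: int, new_loop_measures: int) -> float:
    """Seconds into the new loop at which to start it so the beat stays in sync.

    `elapsed_s` is how long the old loop has been playing (assumed readable).
    The position inside the old loop is carried over and wrapped to the new
    loop's length: two measures into a four-measure loop starts the new loop
    at its third measure.
    """
    measure_s = beats_per_measure * 60.0 / bpm
    position = elapsed_s % (old_loop_measures * measure_s)
    return position % (new_loop_measures * measure_s)
```

At 120 BPM in 4/4, a measure lasts 2 seconds, so a switch 13 seconds into a four-measure loop (5 seconds into its current pass) would start a new four-measure loop 5 seconds in, or a two-measure loop 1 second in.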

Just a side note: Another really, really important skill to learn for any type of development is how to prototype something properly. It's pretty simple, but a lot of people seem to make a critical error in the process. The wrong way to prototype: implement the final solution. I know that sounds incredibly dumb, but it happens all the time. That's not a prototype. The right way to prototype: Figure out what question you're trying to answer - be very specific - and answer it in the cheapest, easiest, fastest way that is appropriate. For us, the questions were simple: does this sound good, and what's the impact on performance?

This was trivial to try out - we just started every audio file we'd need in the game at the same time, so they'd all stay in sync, then adjusted the volumes so only the tracks we needed at any moment were audible. It worked out well - it sounded a LOT better than the abrupt transitions. Great! Only there were a couple of problems: 1.) Due to a technical issue, we couldn't "track" where in the loop we were. After some research, we discovered the cost to get this functionality would be more than we were willing to invest. Worse, 2.) doing it the "easy way" (by playing all the tracks simultaneously and just adjusting the volume as appropriate) took up enough resources that it had a noticeable negative impact on the game's performance. Also unacceptable.
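The brute-force prototype boils down to a mixer like the Python sketch below. The names are illustrative, and a real engine would mix whole buffers rather than single samples. Because every loop starts at the same instant, one shared clock keeps them in sync, but every stem is still read and summed on every frame even when its gain is zero, which is where the performance cost comes from.

```python
def mix_frame(stems: dict[str, list[float]], gains: dict[str, float], frame: int) -> float:
    """Mix one output sample from every looping stem, all started at frame 0.

    Muted stems are still read and summed, so the work per frame grows with
    the total number of stems, not just the audible ones.
    """
    total = 0.0
    for name, samples in stems.items():
        sample = samples[frame % len(samples)]  # each stem loops over its own length
        total += gains.get(name, 0.0) * sample  # missing gain = muted, but still processed
    return total
```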

In the end, sometimes the simplest and most obvious solutions are actually the best ones. When a fare ends, we want to celebrate the player's accomplishment - so playing a "reward" sound made sense. This would cover up the "fare end" transition. We tried a whole variety of stuff, from the fare cheering, to saying, "Thanks!" to a simple "cha-ching!" sound.

The sound we ended up settling on was a simple "trumpet fanfare" sound:

The reason is actually pretty simple. Recognizable repetition is really annoying. Having the same "Thanks!" sample play 20 times in the span of 10 minutes drives players completely bonkers. Same with any really distinctive sound - the more distinctive, the worse the problem got. Worse, if you used a human voice, you'd have to have a number of male & female variants, as we had distinctively male and female fares!

The fanfare sound, while distinctive, was much less irritating than any of the actual spoken dialog. Probably has something to do with how people react to language - you try to find some meaning, or depth to the phrase, because you're used to doing that with language, and the short repetition breaks the illusion that there is any meaning. Music, on the other hand - sometimes a sound is just a sound, and the meaning of the "reward" for finishing a fare is immediate and obvious, so it seems like your brain doesn't get as irritated... If anyone has any deeper insight into this, I'd love to hear about it. So we had our "fare end" sound.

All we needed was a "fare start" sound and we'd be good.

In the vein of the "DJ" effect we had with the tracks mixing as the player rolled the ball around, we created a "vinyl scratch" sound - essentially a "scratch noise" played in reverse - and tried it out. It turned out to work great. It sounded almost like the ball was "sucking" the fare towards it, and the transition in the music became almost unnoticeable.
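The fix amounts to firing a short one-shot sound at the exact moment the music switches, so the hard cut happens underneath it. Here is a Python sketch of that event handling, with hypothetical asset names and a hypothetical `MusicState` standing in for whatever the audio engine actually exposes:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical asset names for the two masking sounds described in the article.
FARE_START_STINGER = "reverse_scratch"
FARE_DELIVERED_STINGER = "trumpet_fanfare"


@dataclass
class MusicState:
    bassline: Optional[str] = None  # fare-specific loop; None means "default" mode (base rhythm only)
    one_shots: list[str] = field(default_factory=list)  # stingers queued to play right now


def on_fare_pickup(state: MusicState, fare_bassline: str) -> None:
    # Queue the reversed scratch and swap the loop in the same update,
    # so the scratch covers the abrupt cut into the fare's music.
    state.one_shots.append(FARE_START_STINGER)
    state.bassline = fare_bassline


def on_fare_delivered(state: MusicState) -> None:
    # The fanfare celebrates the drop-off and hides the cut back to the base rhythm.
    state.one_shots.append(FARE_DELIVERED_STINGER)
    state.bassline = None
```

Note that neither handler tries to line the cut up with a downbeat; the stinger simply makes the cut inaudible, which is why this approach needs no loop-position tracking.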

Here's what the transition sounded like before the addition of the transition sound:

Here's the "scratch" sound masking a transition:

Much better! Not only was it substantially "cheaper" to implement than our earlier attempts at a solution, but it sounded really appropriate, and masked the transition so well that it became a total non-issue.

Great! Two down, one to go!

Step 9: Sounding Good...?

So, one of the big strengths of the iPhone/iPod is that you know that it's got the capacity to play excellent-sounding audio, and that players are likely to have headphones they can use because they're probably using the iPod functionality on a relatively regular basis.

But I play a lot of games at home, and pipe the audio through the external speakers, 'cause wearing headphones around the house isn't something I'm used to. So even though the iPhone/iPod is capable of playing beautiful music through the proper outputs, a good portion of the time it sounds like a bunch of people banging tin cans together.

Worse, it wasn't simply that everything sounded worse across the board - things sounded totally *different*. Low-end audio was completely absent, and the higher frequencies became harsher and less tolerable.

Through the Headphones:

Through the speakers:

What we'd tried to do was actually have each portion of the audio spectrum mean something. The low frequencies - the bassline - would tell you when you had a fare, and what type of fare you had (short distance, medium distance, or long distance). The midrange was the "base beat," which was effectively an audio clock that reminded you that time was progressing. The high range, or the "drum and bass" beat, was a reinforcement of how fast you were going. The more "jangly" high-pitched cymbals and pitched-up drums you had going, the faster you were going. (Well, causally speaking, it's the reverse of that, but whatever...)
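One way to express "each band carries one piece of information" is a function from game state to loop gains. In this Python sketch the three fare types come from the article, while the loop names and the linear speed curve are assumptions:

```python
from enum import Enum
from typing import Optional


class FareType(Enum):
    SHORT = "short"
    MEDIUM = "medium"
    LONG = "long"


# Hypothetical loop names: one bassline per fare distance.
BASSLINES = {
    FareType.SHORT: "bass_short",
    FareType.MEDIUM: "bass_medium",
    FareType.LONG: "bass_long",
}


def band_mix(fare: Optional[FareType], speed_norm: float) -> dict[str, float]:
    """Map game state to {loop_name: gain}, one kind of information per frequency band."""
    speed = min(max(speed_norm, 0.0), 1.0)
    mix = {
        "mid_base_beat": 1.0,  # midrange: always-on beat acting as an audio clock
        "high_dnb": speed,     # highs: busier cymbals and drums the faster you roll
    }
    if fare is not None:
        mix[BASSLINES[fare]] = 1.0  # lows: the bassline tells you which kind of fare you carry
    return mix
```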

Ideally, you'd hear everything, whether you were listening to it through headphones or the external speakers - even if one source sounded worse. The problem was that through the headphones, you could hear everything and it sounded great. But if you listened to it through the speakers, you couldn't hear the bass, and the high-pitched drum sounds were really tinny and irritatingly harsh.

We rebalanced the audio, toning down a lot of the high-pitched stuff and turning the bass up - before, you literally couldn't hear it at all through the speakers. The problem was that once the mix was acceptable through the speakers, it sounded overly bassy and totally "flat" through the headphones, without the higher-pitched sounds.

While in an ideal world the solution would have been to trigger entirely different audio depending on whether a headset was plugged in, the best solution available to us was really quite simple - brute force and iteration.
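That "ideal world" option, which Taxiball didn't ship, could look something like the Python sketch below: keep one set of stems and pick per-band gains based on the current output route. How the route is detected depends on the platform's audio APIs, so it's just a boolean here, and the gain values are placeholders rather than anything tuned for Taxiball.

```python
# Placeholder linear gains per frequency band. On the built-in speaker the highs
# are cut hard and the mids trimmed, so the bass is relatively louder without
# pushing any band above unity gain (and into clipping).
MIX_PROFILES: dict[str, dict[str, float]] = {
    "headphones": {"low": 1.0, "mid": 1.0, "high": 1.0},
    "speaker": {"low": 1.0, "mid": 0.8, "high": 0.4},
}


def band_gains_for_route(headphones_connected: bool) -> dict[str, float]:
    """Return a copy of the band gains for the current output route."""
    route = "headphones" if headphones_connected else "speaker"
    return dict(MIX_PROFILES[route])  # copy, so callers can't mutate the shared profile
```

A game using this approach would re-apply the profile whenever the route changes, for example when headphones are unplugged mid-level.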

We simply went through every single sample, listened to it under both circumstances, alone and in combination with the other samples, and, using an audio editor, manually muted things that were overly harsh on the external speakers and cranked the bassline and lower frequencies as loud as we could without making things sound bad on the headphones.

There used to be certain fares who would whistle while they were going for a ride - but all the whistling sounds were so harsh through the speakers that they all ended up being removed. It also turned out that having repetitive loops that were that high frequency also got really, really irritating in a way that the lower frequency loops didn't... In the end, anything other than percussion that was in the higher frequencies was cut out, and only the "cymbal" sounds were left in that range.

Whistle Loop:

If you loop that, or play it 10 times, it'll get recognizable and really annoying. Compare that to this:

...which makes you want to claw your ears out a lot less. Couple that with the tinny output of the built-in speakers, and you had a strong indicator that if you wanted a melodic track, it had better stay out of the high frequencies.

In the end, with headphones, the game sounds a little muffled, and without headphones, the basslines are still only barely audible. But it was an acceptable compromise, and the game still sounds great. More importantly, we were able to keep all the layers of "information" that were contained in the audio track without sacrificing much in the way of sound quality.

Step 10: So... That's It?

For the most part, yeah - that's it. In the end, the all-vocal beatbox soundtrack has been a feature that Taxiball players have really enjoyed so far, and lends the game a distinctive flavor. You can still play iPod tracks over it if you want, but the fact that the sound's always changing, and that it reacts to your input into the game, keeps the beatbox audio track a vital and tightly integrated part of the game.

All the sound effects were also done vocally - and are a lot more "literal" than the music track, simply because the information they need to convey *needs* to have a 1:1 relationship with your action on screen. Bump a wall? Hear a bump. Drop into water? Hear a "sploosh." Because they were all made by the same mouth, they had a level of consistency that was really neat, and lent the game's audio a distinctive and very memorable character.

We learned a lot about developing sound for an iPhone game while making Taxiball - we had to deal with the vast difference between the speaker and headphones, ways to cover up otherwise inelegant transitions in the audio, how to separate sounds so that each part of the audio spectrum conveys a different meaning, and what kinds of ways we can have audio react to player input.

I hope this has shone some light on what kinds of thinking go into creating audio for a videogame. In the end, the dynamic beatbox soundtrack's been one of the things that players have responded really positively to, and one of the things that makes Taxiball a unique experience.

Thanks for reading!
Participated in the Art of Sound Contest