
The Project

WAVEIO is my specialization project at The Game Assembly in Malmö, where I dive headfirst into the world of audio programming.


Background

My 10+ years of experience as a recording and mixing engineer had taught me the basics of digital audio, including audio formats, DSP and the usual struggles of working with audio.

Although my experience as a programmer only spans my years at TGA, I've taken on the responsibility of implementing audio wrappers for playback, as well as the creative task of making game music and custom sound effects, during our game projects.


"Audio Is Low Priority"

Well, it shouldn't be! In my opinion, sound in a game contributes a great deal to the user experience by setting moods and vibes or by accentuating important features that push the narrative forward.

Goal

It's hard to fit my goals into this seven-week project. I want to develop a whole ecosystem around audio with the ability to swiftly add, edit and refine audio assets for use in games.

While trying to prioritize what to focus on during this time period I arrived at a feature list containing two main elements: the editor and the WAVEIO library.


On a personal level, my wish was to expand my knowledge about digital audio from a programming perspective. Dealing with audio buffers and audio files would be a great way to gain more experience in memory handling and file decoding, and to build an application focused on optimization and performance.

If you want to know more about the process of creating WAVEIO, check out my blog, The Audio Editor Journey, where you'll find weekly posts from the development period.

WAVEIO

I implemented WAVEIO (wave-i-o) as a static library with the intention that it could be used both in my game engine and in the editor. In both cases, the library would be a module ready to be used by other parts of the code base. 

The Library

I chose to split the library into different parts with very specific tasks. The reasoning behind this was that I wanted to keep the code well separated and to make future expansions easy to implement.


WavePlayer

The main playback system, which also wraps XAudio2, was named WavePlayer. Besides the obvious task of playing back sounds, it is responsible for playing audio data with specific properties such as custom start and end points, fades and varying gain values. I wanted these properties to live outside the actual audio buffers, so that I would never have to change the audio data in the files themselves. This also makes it possible to change the properties during runtime and hear them inside the game.
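As a rough sketch of the idea (the names and types here are illustrative, not the actual WAVEIO API), the properties can live in a small struct that travels alongside the asset reference:

```cpp
#include <cstdint>

// Illustrative sketch only - not the actual WAVEIO API.
struct PlaybackProperties
{
    float startNormalized = 0.0f; // custom start point, as a fraction of the buffer
    float endNormalized   = 1.0f; // custom end point
    float gain            = 1.0f; // linear gain applied at playback time
    float fadeInSeconds   = 0.0f;
    float fadeOutSeconds  = 0.0f;
};

class WavePlayer
{
public:
    // The buffer behind assetIndex is never modified; the properties are
    // interpreted every time the sound is played, which is why they can
    // be tweaked at runtime and heard immediately in the game.
    void Play(std::uint32_t assetIndex, const PlaybackProperties& props);
};
```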

AssetHandler

Instead of having audio buffers and textures as members of small classes around the code base, I wanted the AssetHandler to have ownership of all assets. This way, the audio region class (in the editor) or the AudioSourceComponent (in the engine) only needs to keep track of an index to the asset it represents. When either one makes a sound, it hands its own unique properties to the WavePlayer.
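A minimal sketch of that ownership model might look like this (names are my own, assuming assets are referred to by plain integer indices):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch of centralized ownership; AudioAsset stands in for
// whatever WAVEIO stores per asset (audio buffer, waveform texture, format).
struct AudioAsset { /* buffer, texture, format info... */ };

class AssetHandler
{
public:
    std::uint32_t Load(const std::string& path); // returns an index into myAssets

    const AudioAsset& Get(std::uint32_t index) const { return myAssets[index]; }

private:
    std::vector<AudioAsset> myAssets; // single owner of all asset data
};

// A region or AudioSourceComponent only remembers the index, plus its own
// playback properties to hand to the WavePlayer when it makes a sound.
struct AudioSourceComponent
{
    std::uint32_t assetIndex = 0;
};
```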


The DSPEngine has functions for manipulating audio buffers and is only used once a property needs to be committed to a buffer.
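Committing a property could be as simple as a destructive pass over the samples. A minimal sketch, assuming 32-bit float data and an illustrative function name:

```cpp
#include <cstddef>

// Sketch of "committing" a gain property into a buffer. Unlike playback-time
// properties, this change is baked into the audio data itself.
void CommitGain(float* samples, std::size_t count, float gain)
{
    for (std::size_t i = 0; i < count; ++i)
        samples[i] *= gain;
}
```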

The FileHandler reads and writes audio files and is where all conversions to internal structures happen.

The Editor

The editor is a standalone tool for creating and editing sound effects for games. It works as a portal between regular audio files on disk and audio assets ready to be used by the WAVEIO library.

Features

I enabled the editor to accept audio files or wio assets. Once the properties of an audio clip are changed, the user can choose to export it and play it inside a game with the properties retained. This is a big advantage, as there is no need to make any changes to a source file, and it also enables a single audio file to be the source of multiple sound effects.


The editor accepts drag and drop

XAudio2

The system used for audio playback is XAudio2. I chose it mainly because I had previously worked with Bass and FMOD, and wanted to learn how to use another audio API.

Since I wanted to start working with digital audio data, I also liked that the API lets the programmer keep control of the audio buffers (rather than abstracting them away and just giving you a handle).

In order to be able to easily switch to other APIs in the future, I made WAVEIO use its own structures for handling audio buffers and formats.
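A sketch of what such API-agnostic structures could look like (the field names and widths here are my assumptions); only the playback layer would then need to translate them into XAudio2 types at the point where a voice is fed:

```cpp
#include <cstdint>
#include <vector>

// Illustrative, backend-neutral structures. Nothing here depends on
// XAudio2, so swapping the playback API only touches the player code.
struct WaveFormat
{
    std::uint16_t channels;
    std::uint32_t sampleRate;
    std::uint16_t bitsPerSample;
};

struct AudioBuffer
{
    WaveFormat format;
    std::vector<std::uint8_t> data; // raw PCM, interpreted via format
};
```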


The WIO Format

WAVEIO uses its own binary format (.wio, .wioasset) for handling scenes in the editor and for exporting and importing audio assets.

I took this approach in order to have total control over what the format can handle.

As of today, the format only includes information about the underlying audio sources, as well as data connected to the properties of audio regions. In the future, this format can be extended to include more data (actual audio buffers, information about audio format, etc.) to cut away audio file dependencies altogether.

Wio Asset Structure

* Number of regions
* Size of ID
* ID

For every region:

* Size of region ID
* Region ID
* Size of file ID
* File ID
* Region start
* Region end
* Gain
* Length
* Fade in
* Fade out
* Audio data
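A minimal serialization sketch in that field order could look like this (the type widths and names are my assumptions; the real format may differ):

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Illustrative region data matching the field list above.
struct Region
{
    std::string regionId;
    std::string fileId;
    float start, end, gain, length, fadeIn, fadeOut;
};

// Length-prefixed string: "size of ID" followed by "ID".
void WriteString(std::ofstream& out, const std::string& s)
{
    const std::uint32_t size = static_cast<std::uint32_t>(s.size());
    out.write(reinterpret_cast<const char*>(&size), sizeof(size));
    out.write(s.data(), size);
}

void WriteAsset(std::ofstream& out, const std::string& assetId,
                const std::vector<Region>& regions)
{
    const std::uint32_t count = static_cast<std::uint32_t>(regions.size());
    out.write(reinterpret_cast<const char*>(&count), sizeof(count)); // number of regions
    WriteString(out, assetId);                                       // size of ID + ID

    for (const Region& r : regions)
    {
        WriteString(out, r.regionId); // size of region ID + region ID
        WriteString(out, r.fileId);   // size of file ID + file ID
        out.write(reinterpret_cast<const char*>(&r.start),   sizeof(float));
        out.write(reinterpret_cast<const char*>(&r.end),     sizeof(float));
        out.write(reinterpret_cast<const char*>(&r.gain),    sizeof(float));
        out.write(reinterpret_cast<const char*>(&r.length),  sizeof(float));
        out.write(reinterpret_cast<const char*>(&r.fadeIn),  sizeof(float));
        out.write(reinterpret_cast<const char*>(&r.fadeOut), sizeof(float));
        // audio data would follow here once the format carries it
    }
}
```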

Regions and waveform rendering

As a music producer, I expect visual feedback while using the editor, and therefore a key feature for me was to enable rendering of waveforms.

The audio data is rendered to a texture, which is then displayed inside a region as a resource (a shader resource view in DirectX 11). This texture, together with a buffer and some information about the format, is what makes up an audio asset in WAVEIO.
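One common way to reduce samples to pixels for waveform rendering (not necessarily the exact approach used in WAVEIO) is to compute one min/max pair per texture column:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Classic min/max peak reduction: each column of the waveform texture is
// drawn as a vertical line from that column's minimum to its maximum.
// Assumes 32-bit float samples and count >= columns.
std::vector<std::pair<float, float>> BuildPeaks(const float* samples,
                                                std::size_t count,
                                                std::size_t columns)
{
    std::vector<std::pair<float, float>> peaks(columns);
    for (std::size_t x = 0; x < columns; ++x)
    {
        const std::size_t begin = x * count / columns;
        const std::size_t end   = (x + 1) * count / columns;
        float lo = samples[begin];
        float hi = samples[begin];
        for (std::size_t i = begin; i < end; ++i)
        {
            lo = std::min(lo, samples[i]);
            hi = std::max(hi, samples[i]);
        }
        peaks[x] = { lo, hi };
    }
    return peaks;
}
```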

Many regions in the editor can refer to the same underlying audio asset. In order to avoid having multiple copies of audio buffers and textures, I chose to think of regions as "portions of a buffer". This means that when a region is resized, no changes are made to the underlying data; only a value between 0 and 1 changes, controlling which portion of the buffer/texture is being played and displayed.
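A sketch of that idea, with names of my own choosing:

```cpp
#include <cstddef>

// The region stores only normalized bounds into a shared buffer/texture;
// resizing the region changes these values, never the underlying data.
struct RegionView
{
    float start = 0.0f; // normalized [0, 1]
    float end   = 1.0f; // normalized [0, 1]
};

std::size_t FirstSample(const RegionView& r, std::size_t totalSamples)
{
    return static_cast<std::size_t>(r.start * static_cast<float>(totalSamples));
}

std::size_t LastSample(const RegionView& r, std::size_t totalSamples)
{
    return static_cast<std::size_t>(r.end * static_cast<float>(totalSamples));
}
```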


Audio waveforms being displayed inside regions

WAVEIO in a game engine

To come full circle in this project, I implemented WAVEIO in my own engine. This meant using the editor to create audio assets that were then loaded as part of a game scene.

Creating a set of sounds

By exporting an asset with multiple regions, the user can create a set of sounds for use with round-robin style cycling. Making the regions differ slightly from each other makes the audible feedback from a game feel more natural.
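A minimal round-robin cycler could look like this (an illustrative sketch, assuming each variation is identified by a region index in the exported asset, and that there is at least one region):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Each trigger returns the next variation, wrapping around at the end,
// so repeated sounds (footsteps, efforts...) never play the exact same
// region twice in a row.
class RoundRobin
{
public:
    explicit RoundRobin(std::vector<std::size_t> regionIndices)
        : myRegions(std::move(regionIndices)) {}

    std::size_t Next()
    {
        const std::size_t region = myRegions[myCursor];
        myCursor = (myCursor + 1) % myRegions.size();
        return region;
    }

private:
    std::vector<std::size_t> myRegions;
    std::size_t myCursor = 0;
};
```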

A big advantage of my definition of an audio region is that multiple regions can use the same underlying audio buffer source, giving more bang for the buck; one source can produce lots of different audio feedback in the game.

Exporting a set of effort sounds through the editor

Using the exported asset in my engine

Final words

What I learned

This project has taught me a lot about audio programming in general. It has also proven to be a useful experience in how to develop a tool that is actually usable.

Going into the project, I had my assumptions about which tasks would be the most difficult to complete. These assumptions were proven wrong again and again, which taught me that some features are very hard to estimate in time. Thinking about this also makes me realize that I have developed this skill a lot just by working on this project.


Documenting the journey 

Keeping a weekly diary about my progress was a great way to sort out my thoughts. Countless times I managed to solve problems just by reading my previous posts, and just as often I got good ideas while writing a new one.

Future improvements

Although WAVEIO and the editor are tools ready for use today, there is a lot I want to improve or implement in the future.


Zoom

Zooming in on an audio region does not currently work as one would expect. As a user, you want to be able to zoom in and be very particular about where regions begin. The way to fix this, I imagine, would be to utilize multiple mip levels for the waveform textures, so that higher-resolution textures are displayed when the user is zoomed in.
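A sketch of the selection logic under that assumption, where mip 0 is the highest-resolution texture and a zoom of 1 shows the whole region:

```cpp
#include <algorithm>
#include <cmath>

// Each mip level halves the resolution, so the appropriate mip index
// falls off with log2 of the zoom factor. Illustrative only.
int SelectMip(float zoom, int mipCount)
{
    const int mip =
        mipCount - 1 - static_cast<int>(std::log2(std::max(zoom, 1.0f)));
    return std::clamp(mip, 0, mipCount - 1);
}
```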


32 bit floating point

I would like to move the editor toward always working with 32-bit floating-point data, which I imagine is where all digital audio is headed. This would of course mean that all audio sources imported into the editor would need conversion, but it would also result in a format that no longer needs to know about bit depth.
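The conversion itself is straightforward. A sketch for 16-bit PCM input, a common case for .wav sources:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Map 16-bit integer PCM into 32-bit float in [-1, 1). After this, the
// rest of the pipeline never has to care about the original bit depth.
std::vector<float> ToFloat32(const std::int16_t* pcm, std::size_t count)
{
    std::vector<float> out(count);
    for (std::size_t i = 0; i < count; ++i)
        out[i] = static_cast<float>(pcm[i]) / 32768.0f;
    return out;
}
```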


DSP effects

By expanding the editor and the wio format with more features such as pitch changes, reverbs or filters, the usability of the editor will increase a lot. This is especially true when exporting sets of sounds where a slight change in reverb decay or pitch can be enough to create variety from the same audio source.


Format

I want to extend the wio format to include the actual audio data in the future. By doing this, the format will remove external dependencies on audio files altogether.

The structures I use to store information about audio sources could also be refined a lot. As of now, they look much like the header and format chunks of a .wav file, but they could of course be changed into enums or bit fields. This would lead to smaller file sizes, faster reading and writing, and easier handling once the data is inside the library.
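Purely as an illustration of the bit-field idea (the field widths here are guesses, not a design):

```cpp
#include <cstdint>

// The kind of packed description that could replace wav-style header and
// format chunks: the whole format fits in a single 32-bit word.
struct PackedFormat
{
    std::uint32_t sampleRate    : 20; // up to ~1 MHz
    std::uint32_t channels      : 4;  // up to 15 channels
    std::uint32_t bitDepthIndex : 3;  // enum index: 8/16/24/32-bit, float...
    std::uint32_t reserved      : 5;
};
static_assert(sizeof(PackedFormat) == 4, "one 32-bit word on typical compilers");
```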


Optimizations

With future extended audio data manipulation comes more basic calculations performed by the CPU, and with that come good opportunities to optimize how those computations are performed.

I would love to utilize intrinsic functions for super fast parallel computations, as well as compute shaders, when I extend the editor with more features.
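As an illustration of the kind of intrinsics I have in mind, here is a sketch of applying gain four samples at a time with SSE (assuming float data and a sample count that is a multiple of four):

```cpp
#include <cstddef>
#include <xmmintrin.h>

// SIMD version of the gain pass: one multiply handles four samples.
void ApplyGainSSE(float* samples, std::size_t count, float gain)
{
    const __m128 g = _mm_set1_ps(gain); // broadcast gain to all four lanes
    for (std::size_t i = 0; i < count; i += 4)
    {
        const __m128 s = _mm_loadu_ps(samples + i);   // load 4 samples
        _mm_storeu_ps(samples + i, _mm_mul_ps(s, g)); // multiply and store
    }
}
```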

