Programming for Music Production:

Building AI-Powered Audio Tools

Balancing professionally mastered music with untreated recordings of humans speaking into a phone is easy. Easy, that is, unless you want to understand what’s being said. I learned this lesson the hard way.

As an ambitious beginner to programming for music production, I bagged a project to balance audio for a startup providing live gym sessions. Despite being a hodgepodge of TypeScript, C++, ffmpeg, VSTs, and AWS CDK code, my solution was effective, and I got paid.

Today I write this post to save you that painful introduction and outline the intersection of programming, AI, and music production. I hope to accurately describe the landscape of tools you have at your disposal today as a creative to make the audio tools of tomorrow with AI.

Evolution of Music Production Tools

Before we start shipping commits and dropping beats, let’s quickly survey the history of music production technology.

Brief history of digital audio workstations

Prior to digital audio workstations (DAWs), music production was an entirely analog art form. Here’s how things evolved from there:

[Figure: timeline of digital audio workstations]

Although digital audio technology revolutionized the field with faster workflows, cheaper storage, easy collaboration, and totally new creative possibilities for artists, it still suffers from limitations.

Current limitations in traditional music production software

At a high level, here are some relevant limitations of current DAWs:

Potential of AI to Revolutionize Audio Tools

While the popular narrative surrounding Artificial Intelligence tends to focus on unrealistic sci-fi doom scenarios and preemptive “they took our jobs” style grousing, there is a very real potential for AI tools to impact musicians positively.

Imagine an intelligent assistant who gets you, your music, and knows exactly how to translate your concepts into bangers in the studio. Imagine never having to edit drums again. Imagine a little guy who sits on your shoulder and tells you what song will bring the girl in the red dress to the dance floor tonight.

How do we get there? Follow me, let’s keep going.

Fundamental Programming Concepts for Audio Processing

Let’s discuss a few concepts you’ll need to know to program audio.

Overview of key audio programming languages

During the tool selection phase of any project, two considerations reign supreme:

Python

Python excels in audio work due to its rich ecosystem of audio and AI libraries, coupled with extensive documentation and community support. Its gentle learning curve and readability make it accessible to many developers, facilitating rapid development and team collaboration. For virtually any AI use case, Python will be the standard choice.

C++

C++ is prized in audio programming for its performance and low-level control, critical for real-time processing. It boasts industry-standard audio libraries and is widely used for developing plugins and audio engines, with a large pool of experienced developers and resources available. For virtually any audio manipulation use case, C++ is standard.

SuperCollider

SuperCollider offers a specialized ecosystem for audio synthesis and algorithmic composition, combining a powerful audio server with a flexible programming language. While less mainstream, it’s highly valued in computer music circles for experimental sound design and live coding performances.

Honorable Mentions

I will certainly catch hell online for this, but all the same: You should consider TypeScript. TensorFlow has published JavaScript bindings with types that can be used to train neural networks for deep learning, the technology powering many recent AI innovations. With frameworks like Howler, Tone, and even Dolby products now shipping in JS, it’s easier than ever to ship cross-platform AI tools for music. If your use case can be handled in TypeScript, there are huge distribution benefits to doing so.

Now, in order to program with any of these tools, we’re going to need a few algorithms.

Basic Audio Processing Algorithms

We won’t spend too long on theory or computer science, but there are a few algorithms you MUST know in order to work effectively in this field.

Fast Fourier Transform (FFT)

FFT transforms a time-domain signal into its frequency-domain representation, revealing the signal’s frequency content. Here’s how we do it:

  1. Divide the input signal into segments
  2. Apply window function to each segment
  3. Pad each segment with zeros if necessary
  4. Compute the discrete Fourier transform
  5. Output the frequency domain representation
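The steps above can be sketched in plain Python. Treat this as a minimal sketch under a few assumptions of mine, not a production implementation: it handles one segment at a time (the caller does the slicing), uses a Hann window, and relies on a recursive radix-2 FFT, so the padded length must be a power of two. The function names (`hann_window`, `fft`, `magnitude_spectrum`, `peak_frequency`) are illustrative and don't come from any library.

```python
import cmath
import math


def hann_window(n: int) -> list[float]:
    """Hann window of length n (n >= 2); tapers the segment edges toward zero."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def fft(x: list) -> list:
    """Recursive radix-2 Cooley-Tukey FFT. len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def magnitude_spectrum(segment: list[float], fft_size: int) -> list[float]:
    """Window one segment, zero-pad it to fft_size, and return bin magnitudes."""
    if fft_size < len(segment) or fft_size & (fft_size - 1) != 0:
        raise ValueError("fft_size must be a power of two >= len(segment)")
    window = hann_window(len(segment))
    windowed = [s * w for s, w in zip(segment, window)]       # step 2: window
    padded = windowed + [0.0] * (fft_size - len(windowed))    # step 3: zero-pad
    bins = fft(padded)                                        # step 4: transform
    # For real input, only bins 0..fft_size/2 carry unique information.
    return [abs(b) for b in bins[: fft_size // 2 + 1]]        # step 5: output


def peak_frequency(segment: list[float], sample_rate: float, fft_size: int) -> float:
    """Frequency in Hz of the strongest bin in the segment's spectrum."""
    mags = magnitude_spectrum(segment, fft_size)
    peak_bin = max(range(len(mags)), key=mags.__getitem__)
    return peak_bin * sample_rate / fft_size
```

Each bin covers `sample_rate / fft_size` Hz, so a larger `fft_size` gives a finer frequency grid. For real work you would normally use an optimized FFT library rather than a pure-Python loop like this.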

Filtering

Filtering selectively attenuates or amplifies specific frequency components of a signal, making those frequencies quieter or louder while leaving the rest largely untouched. Here’s how:

  1. Design the filter (determine coefficients)
  2. For each input sample:
     a. Multiply the current and recent input samples by the filter’s feed-forward coefficients and sum them
     b. Add the previous output samples multiplied by the feedback coefficients
  3. Output the filtered sample
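Here is a minimal sketch of those filtering steps in plain Python. I'm assuming the simplest useful design, a one-pole low-pass, and the common convention where `b` holds the feed-forward coefficients and `a` holds the feedback coefficients with `a[0] == 1`. The function names `design_one_pole_lowpass` and `apply_filter` are illustrative, not from any library.

```python
import math


def design_one_pole_lowpass(cutoff_hz: float, sample_rate: float):
    """Step 1: determine coefficients for y[n] = (1 - p) * x[n] + p * y[n-1]."""
    pole = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    b = [1.0 - pole]      # feed-forward coefficients (applied to inputs)
    a = [1.0, -pole]      # feedback coefficients (applied to past outputs)
    return b, a


def apply_filter(samples: list[float], b: list[float], a: list[float]) -> list[float]:
    """Steps 2 and 3: run the difference equation over every input sample.

    y[n] = sum(b[k] * x[n-k]) - sum(a[k] * y[n-k] for k >= 1), assuming a[0] == 1.
    """
    out: list[float] = []
    for n in range(len(samples)):
        acc = 0.0
        for k, coeff in enumerate(b):
            if n - k >= 0:
                acc += coeff * samples[n - k]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * out[n - k]
        out.append(acc)
    return out
```

With a 1 kHz cutoff at 48 kHz, a constant (0 Hz) input passes through at full level, while a signal alternating between +1 and -1 (the highest representable frequency) comes out heavily attenuated. Swapping in different `b` and `a` lists gives you other filter types without touching `apply_filter`.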

Delay

A delay effect creates a time-shifted copy of the input signal, producing echoes or used as a building block for more complex effects. Here’s how:

  1. Create a buffer to store past samples
  2. For each input sample:
     a. Read the delayed sample from the buffer
     b. Write the current sample to the buffer
     c. Output the delayed sample
  3. Update the buffer read/write positions

That’s all you really need to know. But how do we translate these concepts to our productions?
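As a first step toward answering that, here is a minimal sketch of the delay steps above using a circular buffer in plain Python. The `DelayLine` class, the `echo` helper, and the `mix` parameter are my own illustrative names; the delay length is given in samples, so divide by the sample rate to get seconds.

```python
class DelayLine:
    """Fixed-length delay built on a circular buffer."""

    def __init__(self, delay_samples: int) -> None:
        if delay_samples < 1:
            raise ValueError("delay_samples must be at least 1")
        self.buffer = [0.0] * delay_samples   # step 1: storage for past samples
        self.position = 0                     # shared read/write index

    def process(self, sample: float) -> float:
        delayed = self.buffer[self.position]  # step 2a: read the delayed sample
        self.buffer[self.position] = sample   # step 2b: write the current sample
        # Step 3: advance the position, wrapping around at the end of the buffer.
        self.position = (self.position + 1) % len(self.buffer)
        return delayed                        # step 2c: output the delayed sample


def echo(samples: list[float], delay_samples: int, mix: float = 0.5) -> list[float]:
    """Blend the dry signal with a single delayed copy scaled by mix."""
    line = DelayLine(delay_samples)
    return [s + mix * line.process(s) for s in samples]
```

Because the read happens before the write at the same index, the sample that comes out is exactly `delay_samples` steps old. At 48 kHz, `DelayLine(24000)` would give a half-second delay.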

Translating concepts to music production

If you understand FFT, Filtering, and Delay, then you know the basics of how almost all audio effects plugins are built. That’s because they are the fundamental building blocks of digital signal processing. By creatively combining these elements, often with additional modulation and non-linear processing, audio engineers can create complex effects. For instance, a phaser uses all-pass filters and modulation, while a reverb typically employs a network of delays and filters to simulate room reflections. Understanding this fundamental relationship allows audio programmers to efficiently design and implement a wide array of effects, paving the way for more advanced AI-driven audio processing tools.
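To make the "delays plus filters" idea concrete, here is a minimal sketch of a damped feedback comb filter, a building block found in classic algorithmic reverb designs. It is one ingredient, not a full reverb: real designs typically run several of these in parallel and follow them with all-pass filters. The function name `damped_comb` and the `feedback` and `damping` parameters are my own illustrative choices.

```python
def damped_comb(
    samples: list[float],
    delay_samples: int,
    feedback: float = 0.7,
    damping: float = 0.3,
) -> list[float]:
    """Delay line whose output is low-pass filtered and fed back into itself.

    feedback (0..1) controls how slowly the echoes die away.
    damping (0..1) controls how quickly high frequencies fade with each repeat.
    """
    if delay_samples < 1:
        raise ValueError("delay_samples must be at least 1")
    buffer = [0.0] * delay_samples
    position = 0
    lowpass_state = 0.0
    out: list[float] = []
    for sample in samples:
        delayed = buffer[position]
        # One-pole low-pass filter sitting in the feedback path.
        lowpass_state = (1.0 - damping) * delayed + damping * lowpass_state
        # Write the input plus the filtered, scaled echo back into the delay.
        buffer[position] = sample + feedback * lowpass_state
        position = (position + 1) % delay_samples
        out.append(delayed)
    return out
```

Feeding in a single impulse produces a train of echoes spaced `delay_samples` apart, each quieter than the last; raising `damping` above zero also makes each repeat duller, loosely imitating how a room absorbs high frequencies.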

Introduction to AI in Audio Processing

Now that we’ve laid a strong foundation, let’s learn how to incorporate AI skillfully.

Machine learning models relevant to audio

Let’s cover the three deep learning architectures most relevant to audio.

Convolutional Neural Networks (CNNs)

Recurrent Neural Networks (RNNs)

GANs

How AI Can Enhance Audio Processing

Here are just a few ways AI can uniquely enhance audio processing workflows:

  1. Personalization. AI can learn user preferences and adapt programming to your individual tastes, specific genres, or audience.
  2. Restoration. AI can separate and clean up audio sources more effectively than traditional tools, improving restoration of old recordings and cleanup of noise and defects.
  3. Stems. AI can split any track into stems quickly, useful in the studio and on stage.
  4. Creative augmentation. AI can create new sounds, melodies, or arrangements, providing tools for musicians and producers to expand their repertoire easily.

Examples of AI in commercial audio tools

Most importantly, you should know AI is used heavily in commercial tools already. Here are just a few, and this is only barely scratching the surface.

[Figure: AI commercial tools for audio]

Advanced AI Applications in Music Production

Now that we’ve covered the basics, let’s talk about some more advanced applications in the field.

Automated mixing and mastering

I’ve been using tools to automate portions of my post-production process including mixing and mastering since 2018. Early tools were heavily manual, inaccurate, and low-quality. By now, I’d say LANDR is better than most mastering engineers, and iZotope’s Ozone and Neutron are nearly the best channel processing tools for mixing and mastering on the market. Next generations of these tools will be even more powerful.

AI-assisted composition and arrangement

Composition and arrangement follow a more precise mathematical algorithm than most aspects of songwriting and music production. Ironically, the complexity involved here means musicians often navigate these duties intuitively or based off vibes. This approach has worked for me especially when my advanced math and music theory were rather weak. AI tools that can actually grok the advanced mathematical formulae behind what works will support musicians in making more informed choices during the composition and arrangement phases, which will lead to more masterful material and more confident risk-taking by artists.

Intelligent sample selection and sound design

These days, most musicians don’t need to spend that much time designing their own sounds or searching for samples. This is thanks to the prevalence of digital services like Splice, where a virtually unlimited number of human-crafted and curated samples are available for download at a modest monthly fee. Modern sample ecosystem dynamics primarily present an issue for the most discerning and visionary creators, who may spend weeks looking for or creating a sound that meets their exact specifications. With prompt-driven open source AI tools like Stable Audio Open, any sound you can describe accurately with text is yours to download. Truly the only limit is your imagination.

Ethical Considerations and Future Outlook

Although I doubt AI will eliminate more creative jobs than it creates, there are ethical concerns and considerations we must take into account as technologists and creators when using AI.

Impacts of AI on the music industry and creativity

Extrapolating from the trends we observed at the top of the article:

Concerns about AI replacing human creativity

Countless times in human history, there has been public outcry against new technology on the grounds that it will replace humans and invalidate their labour. And yet, not once has this come to fruition. Let’s consider just a few examples of this mass hysteria:

Furthermore, what is creativity? Where does it come from? What does it do? Machine learning models are trained to recognize and learn patterns from data, and generative models use these learned patterns to create new data in the same pattern. They fundamentally can’t replace human creativity. We will always need humans to innovate, disrupt, and break the mould. If your role as a musician or artist is limited to replicating known patterns over and over again, I’ve got bad news for you. AI won’t automate your job or replace you, because your job is automated now, and you were replaced long ago. The first duty of an artist is to be original.

My vision for the future of AI in music production

I dream of a future where humans and machines collaborate in peace and harmony. I see a future where our capabilities are extended, not replaced, by machines. Most importantly, I envision badass art made by human and machine collaborators that neither would have a hope of accomplishing by themselves.

Conclusion

Today we covered the history of digital music production, surveyed techniques and tools for applying artificial intelligence, and walked through a few sample applications. We covered a little computer science, a little math, and a lot of digital signal processing tools that will help you later. More than anything, I want you to take what you’ve just learned and apply it in your own art, music, and software. There are unlimited possibilities with digital sound, and the only limitation is your imagination. Take your newfound wealth of knowledge and use it to produce abundance. And please, tell your friends. Share your knowledge and experiences. AI doesn’t need to be scary, it doesn’t need to replace humans or human creativity, and if we play our cards right, it could be the greatest boon to music and musicians that this world has ever seen.
