Speculative: SciML for Physical Modeling Synthesis

Aug 10, 2020   #SciML  #JuliaLang  #Zygote  #Physical modeling synthesis 

Tools in the “Scientific Machine Learning” (SciML) space have been getting more sophisticated and general purpose. While going through some material on “universal differential equations”, I couldn’t help but dream up potential applications to tough problems in physical modeling synthesis of musical instruments .. I’m particularly interested in the vina family of instruments. I’m noting down some speculative thoughts here that I hope to refine and replace with more accurate ones as I learn more.

What’s physical modeling synthesis?

The electronic and computer music community has given us plenty of synthesis techniques such as sampling synthesizers, wavetable synthesis, FM synthesis, granular synthesis and then some .. not to mention a plethora of audio processing (ex: digital filters) and combinatory tools (ex: convolution).

One technique to synthesize “realistic” sounds for musical purposes is to model the physics of a musical instrument and its control surface - likely using partial differential equations - and solve the model numerically, using finite difference or finite element methods, to produce the instrument’s output. The promise of physmod synthesis is that it can give rich and realistic feeling sound synthesis, especially when clubbed with good controllers.

For example, the one-dimensional wave equation shown below can be used to describe the vibration of a string with linear characteristics.

$$ \frac{\partial^2{u}}{\partial{x}^2} = \frac{1}{v^2}\frac{\partial^2{u}}{\partial{t}^2} $$

where

$$ v = \sqrt{\frac{T}{\rho}} $$

Here \(v\) is the speed at which transverse waves travel along the string, \(T\) is the string tension and \(\rho\) is the “linear density” of the string (mass per unit length). For a string fixed at both ends, we therefore have

$$ f_0 = \frac{v}{2L} $$

where \(f_0\) is the fundamental vibrating frequency of the string of length \(L\). So we can change the pitch either by controlling the string tension (ex: pulling on it) or by changing the vibrating length (ex: sliding).
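As a rough illustration of these relations, here is a minimal sketch in plain Python (picked only for illustration, not tied to any particular toolkit). It computes \(f_0\) from \(T\), \(\rho\) and \(L\), then steps the wave equation with an explicit finite-difference (leapfrog) scheme for a string fixed at both ends. The tension, density and length values, the triangular “pluck” shape and all function names are assumptions made up for this example, not measurements of a real instrument.

```python
import math

def fundamental_frequency(tension, linear_density, length):
    """f0 = v / (2 L) with v = sqrt(T / rho), for a string fixed at both ends."""
    v = math.sqrt(tension / linear_density)
    return v / (2.0 * length)

def simulate_string(tension, linear_density, length, n_points, n_steps, pluck_at=0.3):
    """Leapfrog finite-difference solution of the 1-D wave equation.

    Takes n_steps (>= 1) time steps and returns (initial_shape, final_shape, dt).
    Both ends of the string are held at u = 0.
    """
    v = math.sqrt(tension / linear_density)
    dx = length / (n_points - 1)
    courant = 1.0                    # v * dt / dx, the stability limit of this scheme
    dt = courant * dx / v
    c2 = courant ** 2

    # Hypothetical initial condition: a 1 mm triangular pluck at pluck_at * L.
    peak = pluck_at * length
    u0 = []
    for i in range(n_points):
        x = i * dx
        u0.append(1e-3 * (x / peak if x <= peak else (length - x) / (length - peak)))
    u0[0] = u0[-1] = 0.0

    # First step assumes the string is released from rest (zero initial velocity).
    u_prev = u0[:]
    u = u0[:]
    for i in range(1, n_points - 1):
        u[i] = u0[i] + 0.5 * c2 * (u0[i + 1] - 2 * u0[i] + u0[i - 1])

    # Remaining steps: standard three-level update of the interior points.
    for _ in range(n_steps - 1):
        u_next = [0.0] * n_points
        for i in range(1, n_points - 1):
            u_next[i] = 2 * u[i] - u_prev[i] + c2 * (u[i + 1] - 2 * u[i] + u[i - 1])
        u_prev, u = u, u_next
    return u0, u, dt

# Hypothetical string: 60 N tension, 5 g/m linear density, 65 cm long.
tension, density, length = 60.0, 0.005, 0.65
f0 = fundamental_frequency(tension, density, length)

n_points = 66
steps_per_period = 2 * (n_points - 1)   # one round trip of a wave along the string
start, end, dt = simulate_string(tension, density, length, n_points, steps_per_period)
drift = max(abs(a - b) for a, b in zip(start, end))

print(f"f0 = {f0:.2f} Hz, simulated period = {steps_per_period * dt:.6f} s, 1/f0 = {1 / f0:.6f} s")
print(f"shape mismatch after one period: {drift:.2e} m")
```

In this idealized linear setting the string returns to its plucked shape after one period \(1/f_0\), and quadrupling the tension or halving the length doubles \(f_0\). A real instrument will deviate from this, which is exactly where the modeling gets hard.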

What complicates the promise of physical modeling is that coming up with realistic physical models is usually pretty hard, with common approaches being closer to “physically inspired synthesis” than actual physical modeling. One reason is that, at least for some instruments, the interesting aspects of their sound stem from non-linearities in the system. These non-linearities don’t necessarily come from deviations in the elastic behaviour of the materials that the instruments are made of, but can be designed into the structure of the instrument itself - like the curved bridge of the tanpura and vina that lends a nasal and drone-like quality to their sounds.

Some techniques like waveguide synthesis are similar to FM synthesis in that they can be used to produce a variety of instrumental sounds .. if you can live with some loss of realism .. but I’m unaware of general purpose modeling approaches that can shoot for the moon in sound and control fidelity.

Scientific Machine Learning .. or SciML

The thought process goes roughly like this -

  • Neural networks are pretty good “universal (function) approximators” (UAs) but need excessive data (“big data”) to be of any use in many cases.
  • Scientific models are very useful but are sparse on parameters and so have to overlook fine grained effects.
  • What if we use neural networks to model unknown correction factors in known scientific models and fit their parameters from data through model training? That’s the launching point for SciML.

https://sciml.ai provides a good overview of the areas of impact of these tools and approaches.

Machine Learning is about constructing compact computational models that can mimic training datasets. As Christopher Rackauckas et al. put it eloquently in the UDEs for SciML paper,

In the context of science, the well-known adage “a picture is worth a thousand words” might well be “a model is worth a thousand datasets.”

I think this adage applies to physical modeling synthesis too, since the benefit of having a model is to gain good control over it to produce high quality musical output. An opaque model, like what a conventional ML approach might produce, may not give satisfactory handles, or may require too much data to get even a modest control space.

So a compact model not unlike what scientists (physicists in particular) desire would appear to be of greater use. This is what got me excited when I looked at the ecosystem of tools that are now available in this space that can combine known science (ex: wave equation) with ML models trained from data to stand in for the unknown aspects of systems with partially known physics (like .. non-linear couplings). This sounds like the best of both worlds, and in our wave equation example might look like -

$$ \frac{\partial^2{u}}{\partial{x}^2} = \frac{1}{v^2}\frac{\partial^2{u}}{\partial{t}^2} + UA(u, x) $$

where \(UA\) is used as an abbreviation for “Universal Approximator” .. which a neural net is capable of being. The “t” variable can also be included in \(UA\) as \(UA(u,x,t)\), but I think time independent non-linearities would suffice. Then again, my thinking about this problem, formulated in this way, is very nascent and so I could be (and perhaps am) very wrong. My minimal hope then is to understand enough to know how wrong I am and whether this is a promising direction at all. I’m also curious about what kind of computational complexity this will entail, given that 5 seconds of 48 kHz audio is already 240,000 samples, about a quarter of a million data points.
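To make the shape of this idea concrete, the equation above can be rearranged as \(\partial^2 u/\partial t^2 = v^2\,(\partial^2 u/\partial x^2 - UA(u, x))\), so that the known physics and the learned correction both feed the string’s acceleration. The following is a minimal structural sketch in plain Python, under assumptions: the tiny one-hidden-layer network, its random untrained weights, the grid and every function name are hypothetical, and no training is shown. It only illustrates how a known term and an approximator could sit together in one right-hand side.

```python
import math
import random

def make_mlp(n_hidden, seed=0, output_scale=0.0):
    """Parameters of a tiny network (u, x) -> scalar. Weights are random, NOT trained."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [output_scale * rng.gauss(0, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }

def ua(params, u, x):
    """Stand-in for UA(u, x): one tanh hidden layer followed by a linear output."""
    hidden = [math.tanh(w[0] * u + w[1] * x + b)
              for w, b in zip(params["w1"], params["b1"])]
    return sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]

def acceleration(params, u, dx, v):
    """u_tt on the grid, from u_xx = u_tt / v^2 + UA(u, x). End points stay fixed."""
    acc = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        u_xx = (u[i + 1] - 2 * u[i] + u[i - 1]) / dx ** 2   # known physics
        acc[i] = v ** 2 * (u_xx - ua(params, u[i], i * dx))  # plus learned correction
    return acc

# Hypothetical grid and wave speed; the string is bent into half a sine wave.
n, length, v = 33, 0.65, 110.0
dx = length / (n - 1)
shape = [1e-3 * math.sin(math.pi * i / (n - 1)) for i in range(n)]

params_off = make_mlp(8, output_scale=0.0)   # correction switched off
params_on = make_mlp(8, output_scale=0.1)    # arbitrary non-zero correction

plain = acceleration(params_off, shape, dx, v)
corrected = acceleration(params_on, shape, dx, v)
print("largest change caused by the UA term:",
      max(abs(a - b) for a, b in zip(plain, corrected)))
```

With the output weights at zero the model collapses to the plain wave equation, so the known physics acts as the starting point and the network only has to account for what that physics misses. Fitting the network’s parameters to recorded audio is the part that would need automatic differentiation and a proper solver.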

SciML and its tools

There are some very impressive tools / projects either already available or coming up in this space .. they seem to be mostly written as Julia packages -

  1. Zygote.jl - a Julia library for source-to-source automatic differentiation of whole programs (primarily reverse mode) - is used throughout the SciML ecosystem to automatically calculate gradients of loss functions (“adjoints”) when fitting parameters.
  2. DifferentialEquations.jl and family cover ODE and PDE solvers, and work alongside Optim.jl, which provides common optimization algorithms.
  3. Turing.jl and family cover probabilistic modeling. (Ex: I’ve been tracking Covid19 Rt for India and its states here https://labs.imaginea.com/covid-19/ for a while now. The Turing.jl code for the simplistic probabilistic model is included on the page to give a taste.)
  4. FluxML - Julia’s “deep learning” library similar to Keras but simple enough to be something we can sit down and write ourselves! FluxML uses Zygote.jl for backprop.

Practically everything in the ecosystem is written in Julia .. which “looks like Python and walks like C” but has a major trick up its sleeve in “multiple dispatch”, which is pretty much single-handedly responsible for the synergy between the tools in this ecosystem - methinks.

Yes they all work together incredibly well! .. so well that in a few lines of code you can implement a differential equation system which has a neural network component and have it trainable from data .. or implement a custom layer type such as graph neural networks and plug it into FluxML.

The ecosystem has another sort-of-magic algorithm called SINDy (sparse identification of nonlinear dynamics), with which you can take a fitted NN that models the system’s nonlinearities and get back a minimal set of closed form terms .. sounds really cool! Doing this increases the explainability of the model.

I suspect SINDy might be useful in the audio space too, to get models that can be implemented efficiently as well as provide richer control over the instrument.
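To give a feel for the sparse regression step behind this, here is a minimal sketch in plain Python of sequentially thresholded least squares, the fitting loop commonly associated with SINDy. Everything specific is an assumption for illustration: `fitted_correction` is a made-up stand-in for a trained network, the candidate library \(\{1, u, u^2, u^3\}\) and the threshold are arbitrary choices, and the helper names are hypothetical rather than any package’s API.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def least_squares(columns, y):
    """Coefficients c minimizing |y - sum_j c_j * columns[j]|^2 (normal equations)."""
    gram = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    rhs = [sum(p * q for p, q in zip(ci, y)) for ci in columns]
    return solve_linear(gram, rhs)

def stlsq(columns, y, threshold, n_iter=10):
    """Sequentially thresholded least squares: fit, drop small terms, refit."""
    active = list(range(len(columns)))
    coeffs = [0.0] * len(columns)
    for _ in range(n_iter):
        if not active:
            break
        fit = least_squares([columns[j] for j in active], y)
        coeffs = [0.0] * len(columns)
        for j, c in zip(active, fit):
            coeffs[j] = c
        kept = [j for j in active if abs(coeffs[j]) >= threshold]
        if kept == active:
            break
        active = kept
    # Anything that was dropped is reported as exactly zero.
    return [c if j in active else 0.0 for j, c in enumerate(coeffs)]

def fitted_correction(u):
    """Made-up stand-in for a trained network's output."""
    return 0.8 * u ** 3 - 0.1 * u

samples = [-1.0 + 0.05 * i for i in range(41)]
names = ["1", "u", "u^2", "u^3"]
library = [[1.0] * len(samples), samples[:],
           [u ** 2 for u in samples], [u ** 3 for u in samples]]
targets = [fitted_correction(u) for u in samples]

coeffs = stlsq(library, targets, threshold=0.05)
print({name: round(c, 6) for name, c in zip(names, coeffs) if c != 0.0})
```

The appeal for synthesis is that a handful of polynomial terms is far cheaper to evaluate per audio sample than a network, and each surviving coefficient is a knob that can be reasoned about. Whether real instrument nonlinearities reduce this cleanly is an open question.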

Goal

I’m interested in applying the tools in the SciML ecosystem to get a rich sounding vina .. my vina. By that I mean I hope to train a dynamics model on the actual sound my instrument produces. I’m also curious about applying it to skin-drums such as the Mrdangam which have a rich timbre. I also see this as a way to document and archive ageing instruments if we can make realistic models out of recorded sounds from them.