TL;DR: a ready-to-use pipeline is Demucs + Basic Pitch + omnizart drum + omnizart chord + All-in-One (+ Sheet Sage chords). Or buy a RipX license, it's worth it.
Suppose you have a WAV/MP3 of Western music and you want to produce a MIDI file from it.
Well, take RipX DeepRemix and you’ll get a certain quality.
Questions:
The rest of this doc focuses on how to build your own RipX from existing tools.
Demucs v4 (2022) ▶️ splits a track into four stems: bass, drums, vocals and other. Before that, Spleeter (2020) was widely used; it can also produce a fifth stem for piano.
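A minimal sketch of the separation step, assuming Demucs is installed via pip and that it writes stems to `<out_dir>/<model>/<track name>/<stem>.wav` (the layout used by the default `htdemucs` model). The helper names `expected_stems` and `separate` are made up for illustration, they are not part of Demucs.

```python
import subprocess
import sys
from pathlib import Path

# The four stems produced by the default 4-stem Demucs v4 model.
STEMS = ("bass", "drums", "other", "vocals")


def expected_stems(audio_path, out_dir="separated", model="htdemucs"):
    """Paths where Demucs is expected to write the stems (assumed layout)."""
    track = Path(audio_path).stem
    return {s: Path(out_dir) / model / track / f"{s}.wav" for s in STEMS}


def separate(audio_path, out_dir="separated", model="htdemucs"):
    """Run Demucs as a subprocess and return the expected stem paths."""
    cmd = [
        sys.executable, "-m", "demucs.separate",
        "-n", model,          # model name
        "-o", str(out_dir),   # output folder
        str(audio_path),
    ]
    subprocess.run(cmd, check=True)
    return expected_stems(audio_path, out_dir, model)
```

Calling `separate("song.mp3")` should leave you with a dict like `{"bass": Path("separated/htdemucs/song/bass.wav"), ...}` that the later steps can loop over.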
Next, you want beats and downbeats (bar starts) as millisecond timestamps, and ideally a verse/chorus/bridge form annotation too. Since July 2023 you can get both from All-in-One (2023) 🤗:
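Once you have beat and downbeat times, turning them into bars is plain bookkeeping. The sketch below assumes both are sorted lists of timestamps in seconds (which, as far as I know, is what the `allin1` package returns as `beats` and `downbeats`; treat that as an assumption). The function names are hypothetical.

```python
from bisect import bisect_right


def group_beats_into_bars(beats, downbeats):
    """Assign each beat to the bar opened by the latest downbeat at or before it.

    Beats before the first downbeat are returned separately as a pickup.
    """
    bars = [[] for _ in downbeats]
    pickup = []
    for t in beats:
        # Small epsilon so a beat that coincides with a downbeat opens that bar.
        i = bisect_right(downbeats, t + 1e-6) - 1
        (bars[i] if i >= 0 else pickup).append(t)
    return pickup, bars


def to_ms(times):
    """Convert seconds to integer milliseconds."""
    return [round(t * 1000) for t in times]
```

For example, beats at `[0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]` with downbeats at `[1.0, 3.0]` give a one-beat pickup, one full 4-beat bar and the start of a second bar.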
There’s also a nice visualizer:
Other:
Questions:
For each of the non-drum parts you want pitch recognition. The general-purpose SOTA is Basic Pitch:
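A rough sketch of running Basic Pitch over the pitched stems, assuming the `basic-pitch` package is installed and that `predict()` returns a `(model_output, midi_data, note_events)` tuple where `midi_data` is a `pretty_midi.PrettyMIDI` object, as shown in its README. The helper names and the `midi/` output folder are my own choices.

```python
from pathlib import Path


def midi_path_for(stem_path, out_dir="midi"):
    """Output path for a stem, e.g. .../bass.wav -> midi/bass.mid."""
    return Path(out_dir) / (Path(stem_path).stem + ".mid")


def transcribe_stem(stem_path, out_dir="midi"):
    """Transcribe one pitched stem to a MIDI file with Basic Pitch."""
    # Lazy import: basic-pitch pulls in a heavy ML runtime.
    from basic_pitch.inference import predict

    _model_output, midi_data, _note_events = predict(str(stem_path))
    out_path = midi_path_for(stem_path, out_dir)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    midi_data.write(str(out_path))
    return out_path


def transcribe_pitched_stems(stem_paths, out_dir="midi"):
    """Transcribe every stem except the drums, which need a drum-specific model."""
    return [
        transcribe_stem(p, out_dir)
        for p in stem_paths
        if Path(p).stem != "drums"
    ]
```

The drums are skipped on purpose: they go through a separate drum transcriber.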
Other:
Historically, a lot of effort was put specifically into solo piano recognition. So now we have Onsets and Frames (2018) 🐟
Other:
Transcribing the full mix directly with MT3, without separating stems first, gives very bizarre results.
So: transcribe each pitched part with Basic Pitch, transcribe the drums separately, get your downbeat timings, and then use GPT-4-written mido scripts / midisox / Ableton (XML-hackable!) to merge the parts back together.
Caveat: Basic Pitch produces MIDI with a resolution of 480 ticks per quarter note, whereas omnizart outputs 220. Merging them can lead to errors when using online tools.
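One way around the resolution mismatch is to rescale every file to a common resolution before merging. Below is a sketch using mido, assuming all input files share the same tempo (otherwise the tempo maps need merging too, which this does not attempt). `rescale_deltas` and `merge_midi` are hypothetical helper names.

```python
def rescale_deltas(deltas, src_ppq, dst_ppq):
    """Rescale delta times (in ticks) from one resolution to another.

    Works on absolute positions and rounds those, so rounding errors
    do not accumulate over a long track.
    """
    out, abs_src, prev_dst = [], 0, 0
    for d in deltas:
        abs_src += d
        abs_dst = round(abs_src * dst_ppq / src_ppq)
        out.append(abs_dst - prev_dst)
        prev_dst = abs_dst
    return out


def merge_midi(paths, out_path, ppq=480):
    """Copy all tracks from several MIDI files into one file at resolution `ppq`."""
    import mido  # lazy import so rescale_deltas is usable without mido

    merged = mido.MidiFile(ticks_per_beat=ppq)
    for path in paths:
        src = mido.MidiFile(path)
        for track in src.tracks:
            deltas = rescale_deltas(
                [msg.time for msg in track], src.ticks_per_beat, ppq
            )
            new_track = mido.MidiTrack()
            new_track.extend(
                msg.copy(time=t) for msg, t in zip(track, deltas)
            )
            merged.tracks.append(new_track)
    merged.save(out_path)
```

For instance, deltas of `[220, 110, 55]` at resolution 220 become `[480, 240, 120]` at resolution 480, so a quarter note stays a quarter note.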
Possibly, with “omnizart chord” and any beat tracking you can already build a clone of Chordify and even improve it by adding form annotation.
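The core of such a clone is just putting a chord label on every beat. A minimal sketch, assuming the chord recognizer's output has been parsed into sorted, non-overlapping `(start_s, end_s, label)` tuples and the beat tracker gives sorted beat times in seconds; the tuple format, the `"N"` no-chord label and the function name are assumptions for illustration, not omnizart's actual output format.

```python
def chord_per_beat(beats, chords, no_chord="N"):
    """Return the chord label sounding at each beat time.

    beats:  sorted beat times in seconds
    chords: sorted, non-overlapping (start_s, end_s, label) tuples
    """
    labels = []
    j = 0
    for t in beats:
        # Skip chord segments that ended at or before this beat.
        while j < len(chords) and chords[j][1] <= t:
            j += 1
        if j < len(chords) and chords[j][0] <= t:
            labels.append(chords[j][2])
        else:
            labels.append(no_chord)
    return labels
```

Combine the per-beat labels with downbeat positions and form segments and you have the data behind a Chordify-style chord grid.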
There is also a paper by Chordify (2014) with the original references.
It’s tough and not solved holistically yet.