
MIT's new AI system can tune individual musical instruments straight out of a concert video

PixelPlayer can recognize pixels making specific soundwaves. (Source: MIT CSAIL)
PixelPlayer, a new deep learning AI algorithm developed by MIT CSAIL, can isolate the soundwaves of individual instruments in a video by recognizing the pixels the sound is coming from, allowing individual frequencies to be tuned. MIT says the system is 'self-supervised' and doesn't require any human annotations.

Ever wished for an easy way to tune the guitar or the saxophone in old video footage lying in the attic instead of having to remaster the entire audio track? The Computer Science and Artificial Intelligence Laboratory (CSAIL) of the Massachusetts Institute of Technology (MIT) has developed a new deep learning artificial intelligence (AI) algorithm that might be just what the doctor ordered.

CSAIL calls the system PixelPlayer, and it can identify, isolate, and tune individual musical instruments in footage with just a click. CSAIL researchers led by Hang Zhao say that the program was trained on over 60 hours of video and that it isolates instruments by identifying which pixels particular soundwaves emanate from, all without any human supervision or annotation, even on never-before-seen videos.

The ability to isolate specific musical instruments from a video recording opens up immense possibilities, the researchers say. It gives engineers an easy way to repair or restore old concert footage, or even swap instruments to preview what they would sound like. The team says that in its current form, PixelPlayer can distinguish between the sounds of more than 20 common instruments, and it has the potential to 'learn' more if sufficient training data is provided. It does, however, struggle to identify subtle differences between instrument subclasses. While there have been previous attempts to use AI to isolate soundwaves from audio files, the inclusion of the visual element makes PixelPlayer 'self-supervised'. This 'self-supervision' adds a whole new layer of complexity, as it makes it difficult for the team to understand every aspect of how the system learns. Sounds a lot like Skynet, doesn't it?

Zhao says that PixelPlayer relies on deep learning, with neural networks trained on existing videos. Three neural networks each handle a separate task: one analyzes the visuals, one analyzes the audio, and a third, the synthesizer, associates specific pixels with specific soundwaves so that they can be isolated. Zhao and co-authors will be presenting their work at the European Conference on Computer Vision (ECCV), slated to take place in September this year in Munich.
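To get a feel for how a three-network split like this could isolate an instrument, here is a minimal Python sketch. It is not CSAIL's code: the tiny hand-written numbers stand in for what trained networks would output, and the function names (`pixel_mask`, `apply_mask`), the sigmoid mask, and the two-component setup are all assumptions made for illustration. The idea shown is that a per-pixel visual feature weights a set of audio components to form a soft mask, which is then applied to the spectrogram of the mixed audio.

```python
import math


def sigmoid(x):
    """Squash a score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def pixel_mask(pixel_feature, audio_components):
    """Combine one pixel's visual feature with K audio component maps.

    pixel_feature: list of K weights (hypothetical visual-network output).
    audio_components: K maps, each indexed [time][frequency]
                      (hypothetical audio-network output).
    Returns a soft mask indexed [time][frequency] with values in (0, 1).
    """
    n_time = len(audio_components[0])
    n_freq = len(audio_components[0][0])
    mask = []
    for t in range(n_time):
        row = []
        for f in range(n_freq):
            score = sum(w * comp[t][f]
                        for w, comp in zip(pixel_feature, audio_components))
            row.append(sigmoid(score))
        mask.append(row)
    return mask


def apply_mask(mask, mixture):
    """Multiply the mixture spectrogram by the mask, element by element."""
    return [[m * x for m, x in zip(mask_row, mix_row)]
            for mask_row, mix_row in zip(mask, mixture)]


# Toy magnitude spectrogram of the mixed audio: 2 time frames x 2 frequency bins.
mixture = [[1.0, 0.5],
           [0.8, 0.6]]

# Made-up audio components: the first responds to the low bin,
# the second to the high bin.
audio_components = [
    [[8.0, -8.0], [8.0, -8.0]],
    [[-8.0, 8.0], [-8.0, 8.0]],
]

# Made-up visual features for a pixel on a guitar and a pixel on a violin.
guitar_pixel = [1.0, 0.0]
violin_pixel = [0.0, 1.0]

guitar = apply_mask(pixel_mask(guitar_pixel, audio_components), mixture)
violin = apply_mask(pixel_mask(violin_pixel, audio_components), mixture)
```

In this toy setup, the guitar pixel keeps almost all of the energy in the low-frequency bin and almost none in the high one, while the violin pixel does the opposite. In a real system, all of these numbers would be learned from video rather than written by hand.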

Have a look at the video below to appreciate the AI in action and let us know your thoughts.


Vaidyanathan Subramaniam, 2018-07-17 (Update: 2018-07-17)