Description
The magic of good visuals lies in matching audio attributes, such as rhythm, volume, and the timbre (sound color) of instruments, to video properties, e.g., movement, pulsation, shape, size, and color, and especially in their exact temporal correspondence. Music players such as VLC or Windows Media Player offer some form of abstract visualization, but the correspondence is weak, if present at all. AI audio-to-video converters produce rich output, but mostly in music-video style, and exhibit the same issue. Our goal is to take input and output attributes, create a framework that maps between them, and first output abstract shapes, then let artists create more complex designs using prompts with generative AI. Furthermore, stereo and 3D audio as well as 3D video could be added as additional attributes, using state-of-the-art XR glasses or the new mixed-reality lab. Some preliminary work can be found in [1,2,3].
Tasks
- Analyze related work, such as abstract audio visualizers, AI audio-to-video converters, generative AI tools for creating abstract shapes, and AI audio-to-MIDI converters
- Create a framework that reads MIDI input (since MIDI can also be learned from audio) and outputs simple shapes whose properties are based on the researched correspondences
- Create a structure (e.g., config file) that allows mapping input to output attributes under specific conditions
- Locate a generative AI tool that can replace the abstract shapes with a rich vocabulary that is prompt-based or learned from human preferences
- Possibly, locate a generative AI tool that can be used in an end-to-end pipeline generating video from audio while maintaining the above (learned) correspondences
- Evaluate correspondence of attributes with a user study and compare to the state of the art
[1] https://www.cg.tuwien.ac.at/research/publications/2013/giefing_juergen-…
[2] https://www.cg.tuwien.ac.at/research/publications/2013/hauer_alex-2013-…
[3] https://www.cg.tuwien.ac.at/research/publications/2013/schmid_andreas_2…
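The mapping structure described in the tasks above could be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not a prescribed design: the rule format (`source`, `target`, `in`, `out`, `when`), the `NoteEvent` class, and the output attribute names (`hue`, `size`, `opacity`, `frame`) are all hypothetical, and the events are assumed to have been parsed from MIDI already. The sketch uses plain Python without external libraries.

```python
from dataclasses import dataclass

# Hypothetical mapping config: each rule maps one input (MIDI) attribute
# to one output (shape) attribute, optionally under a condition ("when").
MAPPING = [
    {"source": "pitch", "target": "hue", "in": (0, 127), "out": (0.0, 360.0)},
    {"source": "velocity", "target": "size", "in": (0, 127), "out": (10.0, 200.0)},
    # Conditional rule: applies only to channel index 9, which is the
    # percussion channel by General MIDI convention (channel 10, 1-based).
    {"source": "velocity", "target": "opacity", "in": (0, 127), "out": (0.2, 1.0),
     "when": {"channel": 9}},
]


@dataclass
class NoteEvent:
    """A note-on event, assumed to be already extracted from a MIDI file."""
    time_s: float   # onset time in seconds
    pitch: int      # MIDI note number, 0..127
    velocity: int   # MIDI velocity, 0..127
    channel: int = 0


def rescale(value, src, dst):
    """Linearly map value from range src to range dst, clamped to dst."""
    lo, hi = src
    t = (value - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))
    return dst[0] + t * (dst[1] - dst[0])


def matches(event, condition):
    """True if every attribute named in the condition has the given value."""
    return all(getattr(event, key) == val for key, val in condition.items())


def note_to_shape(event, mapping, fps=30):
    """Apply all matching rules to one event and return shape attributes.

    The onset time is converted to a video frame index so that the shape
    appears on the frame closest to the note onset.
    """
    shape = {"frame": round(event.time_s * fps)}
    for rule in mapping:
        if matches(event, rule.get("when", {})):
            value = getattr(event, rule["source"])
            shape[rule["target"]] = rescale(value, rule["in"], rule["out"])
    return shape


if __name__ == "__main__":
    melody_note = NoteEvent(time_s=0.5, pitch=60, velocity=100, channel=0)
    drum_hit = NoteEvent(time_s=1.0, pitch=36, velocity=127, channel=9)
    print(note_to_shape(melody_note, MAPPING))
    print(note_to_shape(drum_hit, MAPPING))
```

Keeping the rules as plain data means they could equally be loaded from a config file, so that artists can change the correspondences without touching the rendering code.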
Requirements
- Knowledge of Python, C++, or another suitable programming language
- Knowledge of English language (source code comments and final report should be in English)
- Bonus for experience with machine learning and music- or visuals-related programming
Environment
Platform-independent, tested on Linux, Windows, or running in the browser
A bonus of €500/€1000 is awarded if the work is completed to satisfaction within an agreed time frame of 6/12 months (PR/BA or DA)