Futureverse, an AI and metaverse technology and content company, has announced the launch of JEN-1, a new AI model for text-to-music generation. According to the company, JEN-1 is a significant advancement in music AI, as it is the first model to achieve state-of-the-art text-music alignment and music quality while remaining computationally efficient.
“We extensively evaluate JEN-1 against state-of-the-art baselines across objective metrics and human evaluations. Results demonstrate JEN-1 produces music of perceptually higher quality (85.7/100) compared to the current best methods (83.8/100),” Futureverse wrote.
Creating music from text is difficult because of the intricate nature of musical arrangements and the high sampling rates audio requires. According to Futureverse’s paper, JEN-1 overcomes these challenges with a diffusion model that combines autoregressive and non-autoregressive training, allowing it to generate music that is both realistic and creative.
Because of its computational efficiency, JEN-1 could be used to generate music in real time, opening up new possibilities for music production, live performance, and virtual reality.
The AI model pairs a dedicated autoencoder with a diffusion model to directly produce detailed stereo audio at a high sampling rate of 48kHz, avoiding the quality loss that usually comes from converting audio into intermediate representations. The model is also trained on multiple tasks at once (generating music from text, continuing a music sequence, and filling in missing sections) which makes it versatile.
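To make this concrete, the sketch below (a conceptual illustration, not Futureverse’s actual code; all function and parameter names are hypothetical) shows how a single latent-diffusion model can be trained on all three tasks at once: each task simply becomes a different mask over which audio frames are given as clean context and which must be generated.

```python
import torch

def make_task_mask(task: str, num_frames: int) -> torch.Tensor:
    """True = frame given as context, False = frame the model must generate."""
    mask = torch.zeros(num_frames, dtype=torch.bool)
    if task == "continuation":
        mask[: num_frames // 2] = True      # first half given, continue the rest
    elif task == "inpainting":
        quarter = max(1, num_frames // 4)
        mask[:quarter] = True               # both ends given,
        mask[-quarter:] = True              # fill in the middle
    # "generation": nothing given, everything is generated
    return mask

def training_step(model, latents, text_emb, task):
    """Simplified denoising step: context frames stay clean, the rest are
    noised, and the loss is taken only on the frames being generated.
    (A real diffusion objective would also scale by a noise schedule.)"""
    mask = make_task_mask(task, latents.shape[-1]).to(latents.device)
    noise = torch.randn_like(latents)
    noisy = torch.where(mask, latents, latents + noise)   # context stays clean
    pred_noise = model(noisy, text_emb, context_mask=mask)
    return ((pred_noise - noise)[..., ~mask] ** 2).mean()
```

Because only the mask changes between tasks, the same network weights learn generation, continuation, and inpainting together.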
JEN-1 also combines autoregressive and non-autoregressive methods to balance capturing sequential dependencies in music against generating it efficiently in parallel. In addition, this multi-task training teaches the model to handle several musical objectives at once.
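That balance can be pictured as a switchable attention mask: the same attention layer runs causally when sequential dependencies matter and bidirectionally when frames should be denoised in parallel. The following is a minimal, hypothetical sketch of the idea, not JEN-1’s implementation.

```python
import torch
import torch.nn.functional as F

def self_attention(q, k, v, causal: bool) -> torch.Tensor:
    """q, k, v: (batch, num_frames, dim). One layer, two modes."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    if causal:
        # Autoregressive mode: block attention to future frames.
        t = q.shape[1]
        future = torch.triu(torch.ones(t, t, dtype=torch.bool, device=q.device), diagonal=1)
        scores = scores.masked_fill(future, float("-inf"))
    # Non-causal mode: full bidirectional attention over all frames.
    return F.softmax(scores, dim=-1) @ v
```

The same weights serve both objectives, so training can alternate batches between causal=True and causal=False.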
JEN-1 Versus MusicLM, MusicGen, and Other AI Models
Futureverse compares JEN-1 with the current state-of-the-art models, such as MusicLM from Google and MusicGen from Meta, and demonstrates that its approach produces better results in fidelity and realism.
The evaluation was based on the performance of different models on the MusicCaps test set, a dataset of music-text pairs. Futureverse used both quantitative and qualitative measures to evaluate the models. Quantitative measures included the FAD (Fréchet Audio Distance) score, which compares the distribution of embeddings of generated audio against real audio, and the CLAP (Contrastive Language-Audio Pretraining) score, which measures how well generated audio matches its text prompt. Qualitative measures included human assessments of the quality and text alignment of the generated music.
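For readers curious about how these metrics work, the sketch below shows how both are typically computed once embeddings of the audio (and, for CLAP, the text) have been extracted; the embedding models themselves are assumed to exist elsewhere, and this is an illustration rather than the paper’s evaluation code.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_audio_distance(real_emb: np.ndarray, gen_emb: np.ndarray) -> float:
    """real_emb, gen_emb: (num_clips, dim) embedding matrices. Lower is better."""
    mu_r, mu_g = real_emb.mean(0), gen_emb.mean(0)
    cov_r = np.cov(real_emb, rowvar=False)
    cov_g = np.cov(gen_emb, rowvar=False)
    covmean = sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):   # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    return float(np.sum((mu_r - mu_g) ** 2) + np.trace(cov_r + cov_g - 2 * covmean))

def clap_score(text_emb: np.ndarray, audio_emb: np.ndarray) -> float:
    """Cosine similarity between a prompt's embedding and its generated
    audio's embedding. Higher means better text-music alignment."""
    return float(text_emb @ audio_emb /
                 (np.linalg.norm(text_emb) * np.linalg.norm(audio_emb)))
```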
The results showed that JEN-1 outperformed the other models on both kinds of measures: it achieved the best FAD score (where lower is better), the highest CLAP score, and the top marks from human assessors. In addition, JEN-1 was more computationally efficient than the other models, with only 22.6% of the parameters of MusicGen and 57.7% of the parameters of Noise2Music.
JEN-1 is a sign of the growing potential of AI in the music industry. AI is already used to create music, but JEN-1 is a significant step forward. It is the first model to achieve state-of-the-art performance on both quantitative and qualitative measures, and it is also more computationally efficient than previous models.