MuQ & MuQ-MuLan

This is the official repository for the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization".

In this repo, the following models are released:

  • MuQ: A large music foundation model pre-trained via Self-Supervised Learning (SSL), achieving SOTA in various MIR tasks.
  • MuQ-MuLan: A music-text joint embedding model trained via contrastive learning, supporting both English and Chinese texts.

Overview

We develop MuQ for music SSL. MuQ applies our proposed Mel-RVQ as quantization targets and achieves SOTA performance on many music understanding (MIR) tasks.

We also construct MuQ-MuLan, a CLIP-like model trained with contrastive learning that maps music and text into a joint embedding space.

For more details, please refer to our paper.

[Figure: Evaluation on MARBLE Benchmark] [Figure: Evaluation on Zero-shot Music Tagging]

Usage

First, install the official muq library with pip. Python 3.8 or later is required:

pip3 install muq

To extract music audio features using MuQ, you can refer to the following code:

import torch, librosa
from muq import MuQ

device = 'cuda'
wav, sr = librosa.load("path/to/music_audio.wav", sr = 24000)
wavs = torch.tensor(wav).unsqueeze(0).to(device)

# This will automatically fetch the checkpoint from huggingface
muq = MuQ.from_pretrained("OpenMuQ/MuQ-large-msd-iter")
muq = muq.to(device).eval()

with torch.no_grad():
    output = muq(wavs, output_hidden_states=True)

print('Total number of layers: ', len(output.hidden_states))
print('Feature shape: ', output.last_hidden_state.shape)

To extract music and text embeddings with MuQ-MuLan and calculate their similarity:

import torch, librosa
from muq import MuQMuLan

# This will automatically fetch checkpoints from huggingface
device = 'cuda'
mulan = MuQMuLan.from_pretrained("OpenMuQ/MuQ-MuLan-large")
mulan = mulan.to(device).eval()

# Extract music embeddings
wav, sr = librosa.load("path/to/music_audio.wav", sr = 24000)
wavs = torch.tensor(wav).unsqueeze(0).to(device)
with torch.no_grad():
    audio_embeds = mulan(wavs = wavs)

# Extract text embeddings (texts can be in English or Chinese)
texts = ["classical genres, hopeful mood, piano.", "一首适合海边风景的小提琴曲,节奏欢快"]
with torch.no_grad():
    text_embeds = mulan(texts = texts)

# Calculate dot product similarity
sim = mulan.calc_similarity(audio_embeds, text_embeds)
print(sim)
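The similarity step above is described as a dot product between music and text embeddings. The sketch below illustrates that idea with plain Python lists and shows how one row of such a matrix could be used to rank candidate texts for a clip. It is not the library's implementation: the helper names (`l2_normalize`, `dot_similarity`, `rank_texts`), the toy 3-dimensional vectors, and the assumption that embeddings are unit-normalized are all hypothetical and chosen only for illustration.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (an all-zero vector is returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def dot_similarity(audio_embeds, text_embeds):
    """Return sim[i][j], the dot product of audio vector i and text vector j."""
    return [[sum(a * t for a, t in zip(audio, text)) for text in text_embeds]
            for audio in audio_embeds]

def rank_texts(sim_row, texts):
    """Sort candidate texts by descending similarity to a single clip."""
    return sorted(zip(texts, sim_row), key=lambda pair: pair[1], reverse=True)

# Toy embeddings standing in for real model outputs (assumption, not real data).
audio = [l2_normalize([0.9, 0.1, 0.0])]
texts = ["piano", "violin"]
text_vecs = [l2_normalize([1.0, 0.0, 0.0]), l2_normalize([0.0, 1.0, 0.0])]

sim = dot_similarity(audio, text_vecs)   # one row per clip, one column per text
ranking = rank_texts(sim[0], texts)      # best-matching text comes first
print(ranking[0][0])
```

With real embeddings, the same ranking step could be applied to each row of the similarity output, for example to pick the best-matching tag for a clip.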

Note that both MuQ and MuQ-MuLan strictly require 24 kHz audio as input. We recommend using fp32 during MuQ inference to avoid potential NaN issues.
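Since the models expect 24 kHz input, a small guard before inference can catch audio loaded at another rate. The sketch below is one possible way to do this; `check_sample_rate` and `duration_seconds` are hypothetical helpers, not part of the muq package. Note that `librosa.load(..., sr=24000)`, as used in the examples above, already resamples on load, so a guard like this mainly helps when audio comes from a different loader.

```python
EXPECTED_SR = 24000  # sampling rate required by MuQ and MuQ-MuLan, per the note above

def check_sample_rate(sr, expected=EXPECTED_SR):
    """Raise ValueError if the audio was not loaded at the expected rate."""
    if sr != expected:
        raise ValueError(
            f"expected {expected} Hz audio, got {sr} Hz; resample before inference"
        )
    return sr

def duration_seconds(num_samples, sr=EXPECTED_SR):
    """Length of a mono waveform in seconds, given its sample count."""
    return num_samples / sr

# Usage: validate the rate reported by the loader, then inspect the clip length.
check_sample_rate(24000)
print(duration_seconds(48000))
```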

Performance

[Table: MARBLE Benchmark] [Table: Mulan Results]

Model Checkpoints

| Model Name | Parameters | Data | HuggingFace🤗 |
| --- | --- | --- | --- |
| MuQ | ~300M | MSD dataset | OpenMuQ/MuQ-large-msd-iter |
| MuQ-MuLan | ~700M | music-text pairs | OpenMuQ/MuQ-MuLan-large |

Note: The open-sourced MuQ was trained on the Million Song Dataset. Because this dataset differs in size from the one used in the paper, the open-sourced model may not reach the performance reported there. The training recipes can be found here.

License

The code in this repository is released under the MIT license as found in the LICENSE file.

The model weights (MuQ-large-msd-iter, MuQ-MuLan-large) in this repository are released under the CC-BY-NC 4.0 license, as detailed in the LICENSE_weights file.

Citation

@article{zhu2025muq,
      title={MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization}, 
      author={Haina Zhu and Yizhi Zhou and Hangting Chen and Jianwei Yu and Ziyang Ma and Rongzhi Gu and Yi Luo and Wei Tan and Xie Chen},
      journal={arXiv preprint arXiv:2501.01108},
      year={2025}
}

Acknowledgement

We borrow code from the following repositories:

Also, we are especially grateful to the awesome MARBLE-Benchmark.
