---
title: SongFormer
emoji: 🎵
colorFrom: blue
colorTo: indigo
sdk: gradio
python_version: "3.10"
app_file: app.py
tags:
- music-structure-annotation
- transformer
short_description: State-of-the-art music analysis with multi-scale datasets
fullWidth: true
---
<p align="center">
<img src="figs/logo.png" width="50%" />
</p>
# SONGFORMER: SCALING MUSIC STRUCTURE ANALYSIS WITH HETEROGENEOUS SUPERVISION


[GitHub](https://github.com/ASLP-lab/SongFormer) | [Demo Space](https://huggingface.co/spaces/ASLP-lab/SongFormer) | [Model](https://huggingface.co/ASLP-lab/SongFormer) | [SongFormDB](https://huggingface.co/datasets/ASLP-lab/SongFormDB) | [SongFormBench](https://huggingface.co/datasets/ASLP-lab/SongFormBench) | [Discord](https://discord.gg/rwcqh7Em) | [ASLP Lab](http://www.npu-aslp.org/)
Chunbo Hao<sup>*</sup>, Ruibin Yuan<sup>*</sup>, Jixun Yao, Qixin Deng, Xinyi Bai, Wei Xue, Lei Xie<sup>†</sup>
----
SongFormer is a music structure analysis framework that leverages multi-resolution self-supervised representations and heterogeneous supervision, accompanied by the large-scale multilingual dataset SongFormDB and the high-quality benchmark SongFormBench to foster fair and reproducible research.

## News and Updates
## To-Do List
- [x] Complete and push inference code to GitHub
- [x] Upload model checkpoint(s) to Hugging Face Hub
- [ ] Upload the paper to arXiv
- [x] Fix readme
- [ ] Deploy an out-of-the-box inference version on Hugging Face (via Inference API or Spaces)
- [ ] Publish the package to PyPI for easy installation via `pip`
- [ ] Open-source evaluation code
- [ ] Open-source training code
## Installation
### Setting up Python Environment
```bash
git clone https://github.com/ASLP-lab/SongFormer.git
cd SongFormer
# Fetch the MuQ and MusicFM source code
git submodule update --init --recursive
conda create -n songformer python=3.10 -y
conda activate songformer
```
For users in mainland China, you may want to configure a pip mirror:
```bash
pip config set global.index-url https://pypi.mirrors.ustc.edu.cn/simple
```
Install dependencies:
```bash
pip install -r requirements.txt
```
We tested this setup on Ubuntu 22.04.1 LTS and it works as expected. If installation fails, try removing the version constraints in `requirements.txt`.
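For example, one quick way to strip exact version pins is a `sed` pass over the file. The snippet below runs on a stand-in file with illustrative package names; review the result before installing from the real `requirements.txt`:

```shell
# Demo on a stand-in requirements file; swap in the real requirements.txt.
# The sed expression drops everything from the first version operator onward.
printf 'torch==2.4.0\nnumpy>=1.24\ngradio\n' > /tmp/reqs_demo.txt
sed -E 's/[=<>!~]=.*$//' /tmp/reqs_demo.txt
```

This leaves bare package names (`torch`, `numpy`, `gradio`), letting pip resolve versions freely.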
### Download Pre-trained Models
```bash
cd src/SongFormer
# For users in mainland China: see the instructions in this file for downloading via hf-mirror.com
python utils/fetch_pretrained.py
```
After downloading, verify that the md5 checksums of the downloaded files match the values listed in `src/SongFormer/ckpts/MusicFM/md5sum.txt`:
```bash
md5sum ckpts/MusicFM/msd_stats.json
md5sum ckpts/MusicFM/pretrained_msd.pt
md5sum ckpts/SongFormer.safetensors
# md5sum ckpts/SongFormer.pt
```
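If the manifest follows the standard `md5sum` output format, `md5sum -c` can check every listed file in one step. The snippet below demonstrates the workflow on a temporary file rather than the actual checkpoints:

```shell
# Self-contained demo of checksum verification with md5sum -c.
# For the real checkpoints, run: md5sum -c ckpts/MusicFM/md5sum.txt
tmpdir=$(mktemp -d)
echo "dummy weights" > "$tmpdir/model.pt"
md5sum "$tmpdir/model.pt" > "$tmpdir/md5sum.txt"
md5sum -c "$tmpdir/md5sum.txt"   # prints "<path>: OK" on success
rm -rf "$tmpdir"
```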
## Inference
### 1. One-Click Inference with HuggingFace Space (coming soon)
Available at: [https://huggingface.co/spaces/ASLP-lab/SongFormer](https://huggingface.co/spaces/ASLP-lab/SongFormer)
### 2. Gradio App
First, cd to the project root directory and activate the environment:
```bash
conda activate songformer
```
You can modify the server port and listening address in the last line of `app.py` according to your preference.
> If you're using an HTTP proxy, please ensure you include:
>
> ```bash
> export no_proxy="localhost,127.0.0.1,::1"
> export NO_PROXY="localhost,127.0.0.1,::1"
> ```
>
> Otherwise, Gradio may incorrectly conclude that the service has not started, causing it to exit immediately on startup.
The first time you run `app.py`, it will connect to Hugging Face to download the MuQ-related weights. We recommend creating an empty folder in a convenient location and pointing `export HF_HOME=XXX` at it, so the cache is stored there for easy cleanup and transfer.
For users in mainland China, you may also need `export HF_ENDPOINT=https://hf-mirror.com`; see https://hf-mirror.com/ for details.
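Putting the cache and mirror settings together, a typical setup before launching looks like this (the cache path is just an example):

```shell
# Keep the Hugging Face cache in a dedicated, easy-to-clean folder
export HF_HOME=/path/to/hf_cache
# Mainland China only: route downloads through the mirror
export HF_ENDPOINT=https://hf-mirror.com
```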
```bash
python app.py
```
### 3. Python Code
You can refer to the file `src/SongFormer/infer/infer.py`. The corresponding execution script is located at `src/SongFormer/infer.sh`. This is a ready-to-use, single-machine, multi-process annotation script.
Below are some configurable parameters from the `src/SongFormer/infer.sh` script. You can set `CUDA_VISIBLE_DEVICES` to specify which GPUs to use:
```bash
-i # Input SCP folder path, each line containing the absolute path to one audio file
-o # Output directory for annotation results
--model # Annotation model; the default is 'SongFormer', change if using a fine-tuned model
--checkpoint # Path to the model checkpoint file
--config_path # Path to the configuration file
-gn # Total number of GPUs to use; should match the number of GPUs listed in CUDA_VISIBLE_DEVICES
-tn # Number of processes to run per GPU
```
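To illustrate how a single-machine, multi-process setup like this typically divides work, here is a hypothetical sketch of sharding an SCP file list across `gn × tn` workers. The function name and round-robin strategy are illustrative, not the actual `infer.py` implementation:

```python
# Hypothetical sketch: distribute audio paths across num_gpus * procs_per_gpu workers.
def shard_scp(paths, num_gpus, procs_per_gpu):
    workers = num_gpus * procs_per_gpu
    # Round-robin split: worker k takes lines k, k+workers, k+2*workers, ...
    return [paths[k::workers] for k in range(workers)]

paths = [f"/data/audio/{i:03d}.wav" for i in range(10)]
shards = shard_scp(paths, num_gpus=2, procs_per_gpu=2)  # 4 workers in total
```

Each worker then annotates only its own shard, so adding GPUs or processes scales throughput without coordination between workers.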
### 4. CLI Inference
Coming soon
### 5. Pitfalls
- You may need to modify line 121 in `src/third_party/musicfm/model/musicfm_25hz.py` to:
`S = torch.load(model_path, weights_only=False)["state_dict"]`
## Training
## Citation
If our work and codebase are useful to you, please cite as:
````
Coming soon
````
## License
Our code is released under the CC BY 4.0 License.
## Contact Us
<p align="center">
<a href="http://www.nwpu-aslp.org/">
<img src="figs/aslp.png" width="400"/>
</a>
</p>