Announcing SOUNDATA: A Python library for reproducible use of audio datasets

To: "" <>
Subject: Announcing SOUNDATA: A Python library for reproducible use of audio datasets
From: Justin Salamon <>
Date: Wed, 3 Nov 2021 00:29:57 +0000

*** apologies for any cross-postings ***

Dear colleagues,

We’re excited to announce the release of soundata, a Python library for reproducible use of audio datasets.

Soundata can be installed via: pip install soundata

The source code lives here:

We’re launching with 14 popular environmental sound datasets, and we plan to keep adding datasets spanning a range of audio domains, including bioacoustics!

Soundata makes it easy to:

  • Download datasets to a common location and format

  • Validate that a downloaded dataset is complete and perfectly matches a canonical version

  • Load audio and annotation files into a common format

  • Parse clip-level metadata for detailed evaluations
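
To give a feel for what validating a download against a canonical version can involve, here is a minimal standalone sketch using only the Python standard library. It is not soundata's implementation: it assumes the canonical version is described by an index mapping relative file paths to MD5 checksums, and the names md5_of, validate_against_index and demo are hypothetical, chosen for illustration only.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_against_index(data_home, index):
    """Compare files under data_home with a {relative_path: md5} index.

    Hypothetical helper: returns (missing, invalid), where `missing` lists
    indexed files absent on disk and `invalid` lists files whose checksum
    differs from the canonical one.
    """
    missing, invalid = [], []
    for rel_path, expected in sorted(index.items()):
        local = Path(data_home) / rel_path
        if not local.is_file():
            missing.append(rel_path)
        elif md5_of(local) != expected:
            invalid.append(rel_path)
    return missing, invalid


def demo():
    """Build a tiny fake dataset with one intact, one corrupted, one absent file."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.wav").write_bytes(b"clip a")      # matches the index
        (root / "b.wav").write_bytes(b"corrupted")   # differs from the index
        index = {
            "a.wav": hashlib.md5(b"clip a").hexdigest(),
            "b.wav": hashlib.md5(b"clip b").hexdigest(),
            "c.wav": hashlib.md5(b"clip c").hexdigest(),  # never written to disk
        }
        return validate_against_index(root, index)


if __name__ == "__main__":
    missing, invalid = demo()
    print("missing:", missing)
    print("invalid:", invalid)
```

Reporting missing files separately from files with mismatched checksums makes it easier to tell an incomplete download from a corrupted or modified copy.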

We hope soundata will help the community to:

  • Ensure results are reproducible by working against exactly the same data

  • Save time by avoiding manual downloads and having to write custom dataset parsers

  • Automate large-scale download, training, and evaluation pipelines

  • Increase the visibility of new datasets by adding them to soundata

Soundata is a cross-organizational collaboration spanning researchers from , Adobe Research, , and 

You can learn more about the library on our docs page:

A bit more about the motivation for soundata can be found in our (work in progress) paper:

"Soundata: A Python library for reproducible use of audio datasets"

Magdalena Fuentes, Justin Salamon, Pablo Zinemanas, Martín Rocamora, Genís Plaja, Irán R. Román, Marius Miron, Xavier Serra, Juan Pablo Bello


We *welcome and encourage* contributions from the community, especially data loaders for datasets not yet included in soundata. If you'd be interested in adding a bioacoustics dataset to soundata, we'd love to hear from you!


Justin & Magdalena on behalf of the soundata team

Justin Salamon | Adobe Research |
The University of NSW School of Computer and Engineering takes no responsibility for the contents of this archive. It is purely a compilation of material sent by many people to the Bioacoustics-L mailing list. It has not been checked for accuracy nor its content verified in any way. If you wish to get material removed from the archive or have other queries about the archive e-mail Andrew Taylor at this address: andrewt@cse.unsw.EDU.AU