Announcing SOUNDATA: A Python library for reproducible use of audio datasets

To:	"" <>
Subject:	Announcing SOUNDATA: A Python library for reproducible use of audio datasets
From:	Justin Salamon <>
Date:	Wed, 3 Nov 2021 00:29:57 +0000

*** apologies for any cross-postings ***

Dear colleagues,

We’re excited to announce the release of soundata, a Python library for reproducible use of audio datasets.

Soundata can be installed via: pip install soundata

The source code lives here: https://github.com/soundata/soundata

We’re launching with 14 popular environmental sound datasets, and we plan to keep adding datasets spanning a range of audio domains, including bioacoustics!

Soundata makes it easy to:

- Download datasets to a common location and format
- Validate that a downloaded dataset is complete and perfectly matches a canonical version
- Load audio and annotation files into a common format
- Parse clip-level metadata for detailed evaluations
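
To give a rough idea of what validating a download against a canonical version can look like, here is a minimal sketch using only the Python standard library. It is not soundata's actual implementation: the index layout (relative file path mapped to a SHA-256 checksum) and the names checksum_of, validate_files and demo are assumptions made purely for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def checksum_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_files(data_home, index):
    """Check local files against a hypothetical {relative_path: checksum} index.

    Returns two lists of relative paths: files that are missing, and files
    whose checksum differs from the canonical one.
    """
    missing, mismatched = [], []
    for rel_path, expected in index.items():
        local = Path(data_home) / rel_path
        if not local.is_file():
            missing.append(rel_path)
        elif checksum_of(local) != expected:
            mismatched.append(rel_path)
    return missing, mismatched


def demo():
    """Build a tiny fake dataset with one good, one altered, one absent file."""
    with tempfile.TemporaryDirectory() as data_home:
        audio_dir = Path(data_home) / "audio"
        audio_dir.mkdir(parents=True)
        (audio_dir / "clip_a.wav").write_bytes(b"canonical bytes A")
        (audio_dir / "clip_b.wav").write_bytes(b"corrupted bytes B")
        index = {
            "audio/clip_a.wav": hashlib.sha256(b"canonical bytes A").hexdigest(),
            "audio/clip_b.wav": hashlib.sha256(b"canonical bytes B").hexdigest(),
            "audio/clip_c.wav": hashlib.sha256(b"canonical bytes C").hexdigest(),
        }
        return validate_files(data_home, index)


if __name__ == "__main__":
    missing, mismatched = demo()
    print("missing:", missing)
    print("mismatched:", mismatched)
```

An empty result for both lists would mean the local copy is complete and matches the canonical index, which is what allows results to be compared across labs working from the same data.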

We hope soundata will help the community to:

- Ensure results are reproducible by working against exactly the same data
- Save time by avoiding manual downloads and having to write custom dataset parsers
- Automate large-scale download, training, and evaluation pipelines
- Increase the visibility of new datasets by adding them to soundata

Soundata is a cross-organizational collaboration spanning researchers from , Adobe Research, , and .

You can learn more about the library on our docs page: https://soundata.readthedocs.io/

A bit more about the motivation for soundata can be found in our (work in progress) paper:

"Soundata: A Python library for reproducible use of audio datasets"

Magdalena Fuentes, Justin Salamon, Pablo Zinemanas, Martín Rocamora, Genís Plaja, Irán R. Román, Marius Miron, Xavier Serra, Juan Pablo Bello

[arXiv]

We *welcome and encourage* contributions from the community, especially data loaders for datasets not yet included in soundata. If you'd be interested in adding a bioacoustics dataset to soundata, we'd love to hear from you!

Cheers,

Justin & Magdalena on behalf of the soundata team

Justin Salamon | Adobe Research | www.justinsalamon.com


The University of NSW School of Computer and Engineering takes no responsibility for the contents of this archive. It is purely a compilation of material sent by many people to the Bioacoustics-L mailing list. It has not been checked for accuracy nor its content verified in any way. If you wish to get material removed from the archive or have other queries about the archive e-mail Andrew Taylor at this address: andrewt@cse.unsw.EDU.AU