Datasets
This dataset contains a collection of voice commands for a smart speaker, each beginning with the common wake-word "Hey Alexa." The commands cover a range of tasks such as music control, smart home management, information requests, reminders, shopping, entertainment, and communication. The dataset reflects natural language usage from a diverse group of speakers, capturing various phrasings, inflections, and contexts. It includes contributions from both male and female voices and features speakers with different native languages. The full dataset is available at the following DOI: https://doi.org/10.60593/ur.d.26417548.v1.
If you plan to download this dataset, we would appreciate it if you could fill out the Google form at https://forms.gle/dixQ4mkZ4xbXtXRDA. This will help us understand the usage and impact of this dataset. Your feedback will also help us improve any future extensions of this work.
Please cite the following if you plan to use the dataset:
[1] DiPassio, Tre; Heilemann, Michael; Rutowski, Jenna; Sedlacek, Paula; Thompson, Benjamin; Wen, Yutong (2024). Smart Speaker Command Dataset. University of Rochester. Dataset. https://doi.org/10.60593/ur.d.26417548.v1
[2] T. DiPassio, M. C. Heilemann, B. Thompson and M. F. Bocko, "Estimating the Direction of Arrival of a Spoken Wake Word Using a Single Sensor on an Elastic Panel," 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, 2023, pp. 1-5, https://doi.org/10.1109/WASPAA58266.2023.10248068
This dataset includes impulse responses recorded in 14 different rooms. Each room has unique acoustic properties, providing a wide range of reverberation time (RT60), clarity, and early decay time (EDT) values. The recordings are provided as 48 kHz, 32-bit, mono WAV files. The dataset is organized by room: each subfolder contains the impulse responses for that room, along with a general layout of the room and plots of its acoustic data. The full dataset is available at the following DOI: https://doi.org/10.60593/ur.d.26801089.v3
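The parameters named above (RT60, EDT, and clarity) can all be derived from an impulse response. The following is a minimal sketch, not the procedure used to produce the dataset's plots: it applies Schroeder backward integration to a synthetic, exponentially decaying sequence that stands in for a measured response. The function names, the fit ranges (-5 to -35 dB for RT60, 0 to -10 dB for EDT), and the 50 ms clarity window are assumptions chosen for illustration. Loading the WAV files is left out because the sample encoding (integer or float) is not stated here.

```python
import math


def schroeder_decay_db(ir):
    """Energy decay curve in dB, via Schroeder backward integration."""
    energy = [x * x for x in ir]
    edc = [0.0] * len(energy)
    running = 0.0
    for i in range(len(energy) - 1, -1, -1):
        running += energy[i]
        edc[i] = running
    total = edc[0]
    return [10.0 * math.log10(e / total) if e > 0.0 else float("-inf")
            for e in edc]


def decay_time_s(edc_db, fs, upper_db, lower_db):
    """Least-squares line fit between two decay levels, extrapolated to 60 dB."""
    points = [(i / fs, level) for i, level in enumerate(edc_db)
              if lower_db <= level <= upper_db]
    mean_t = sum(t for t, _ in points) / len(points)
    mean_l = sum(level for _, level in points) / len(points)
    num = sum((t - mean_t) * (level - mean_l) for t, level in points)
    den = sum((t - mean_t) ** 2 for t, _ in points)
    slope_db_per_s = num / den
    return -60.0 / slope_db_per_s


def clarity_db(ir, fs, early_ms=50.0):
    """Ratio of early to late energy in dB (C50 when early_ms is 50)."""
    split = round(fs * early_ms / 1000.0)
    early = sum(x * x for x in ir[:split])
    late = sum(x * x for x in ir[split:])
    return 10.0 * math.log10(early / late)


# Synthetic stand-in for a measured response (assumption: not dataset data).
fs = 48000                                  # matches the dataset's sample rate
true_rt60 = 0.4                             # seconds, chosen for illustration
decay = math.log(1000.0) / true_rt60        # amplitude falls 60 dB in true_rt60
ir = [((-1.0) ** n) * math.exp(-decay * n / fs) for n in range(fs)]

edc_db = schroeder_decay_db(ir)
rt60_estimate = decay_time_s(edc_db, fs, -5.0, -35.0)   # T30-style fit
edt_estimate = decay_time_s(edc_db, fs, 0.0, -10.0)     # early decay time
c50 = clarity_db(ir, fs)

print(f"RT60 ~ {rt60_estimate:.3f} s, EDT ~ {edt_estimate:.3f} s, C50 ~ {c50:.2f} dB")
```

For a purely exponential decay such as this one, the RT60 and EDT estimates coincide; in measured rooms they generally differ, which is why both are reported.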
If you plan to download this dataset, we would appreciate it if you could fill out the Google form at https://forms.gle/jnuP2dYRK3CPmXQG6. This will help us understand the usage and impact of this dataset. Your feedback will also help us improve any future extensions of this work.
Please cite the following if you plan to use the dataset:
[1] J. Rutowski, T. DiPassio, B. R. Thompson, M. F. Bocko, and M. C. Heilemann, “Estimating Direction of Arrival in Reverberant Environments for Wake-Word Detection Using a Single Structural Vibration Sensor,” J. Acoust. Soc. Am., vol. 156, no. 4, pp. 2619–2629, Oct. 2024. https://doi.org/10.1121/10.0032367
[2] Rutowski, Jenna; DiPassio, Tre; Thompson, Benjamin R.; Heilemann, Michael C.; Bocko, Mark F. (2024). University of Rochester room impulse response dataset. University of Rochester. Dataset. https://doi.org/10.60593/ur.d.26801089.v3
The Solid State Logic bus compressor applies a time-varying gain g[k] to an input signal. Conventional datasets provide only the input and output audio, so models must be evaluated with proxy metrics defined on the waveform. This dataset also includes the control signal corresponding to the applied gain, so error can be computed directly against the underlying quantity of interest. Both system identification approaches and black-box or generative models can therefore optimize for the applied gain trajectory itself rather than rely on proxy loss functions. The control signal is provided as gain reduction in decibels:
G[k] ∝ CV[k]
where the control signal CV[k] has been pre-scaled so that it directly represents gain reduction in dB.
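As an illustration of computing error directly on the gain trajectory, here is a minimal sketch. The short lists are made-up values, not dataset samples, and the function names are hypothetical. It also assumes that positive gain-reduction values denote attenuation; the sign convention is not stated here and should be checked against the data.

```python
def gain_reduction_db_to_linear(gr_db):
    """Convert gain reduction in dB to a linear gain multiplier.

    Assumption: positive values mean attenuation, so 6 dB of gain
    reduction corresponds to a gain of about 0.5.
    """
    return [10.0 ** (-g / 20.0) for g in gr_db]


def mean_abs_error_db(predicted_db, reference_db):
    """Mean absolute error between two gain-reduction trajectories, in dB."""
    if len(predicted_db) != len(reference_db):
        raise ValueError("trajectories must have the same length")
    diffs = [abs(p - r) for p, r in zip(predicted_db, reference_db)]
    return sum(diffs) / len(diffs)


# Made-up trajectories for illustration only (not taken from the dataset).
reference_gr_db = [0.0, 1.5, 3.0, 6.0, 4.0, 2.0]   # stands in for the control signal
predicted_gr_db = [0.0, 1.0, 3.5, 5.0, 4.0, 2.5]   # stands in for a model's output

error_db = mean_abs_error_db(predicted_gr_db, reference_gr_db)
reference_gain = gain_reduction_db_to_linear(reference_gr_db)

print(f"mean absolute error: {error_db:.3f} dB")
print("linear gain:", [round(g, 3) for g in reference_gain])
```

Measuring the error in dB on the gain trajectory keeps the metric independent of the programme material, which a waveform-domain loss cannot guarantee.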
The full dataset is available at the following DOI: https://doi.org/10.60593/ur.d.31892026
Please cite the following if you plan to use the dataset:
Anticipated DAFx Paper