Libri-adhoc40

Libri-adhoc40 is a synchronized speech corpus collected by replaying Librispeech data through a loudspeaker and recording it with ad-hoc microphone arrays of 40 strongly synchronized distributed nodes in a real office environment. To provide evaluation targets for speech front-end processing and other applications, the replayed Librispeech data were also recorded in an anechoic chamber.

Description of the dataset

The Libri-adhoc40 dataset is built on the ‘train-clean-100’, ‘dev-clean’, and ‘test-clean’ corpora of Librispeech, which together contain about 110 hours of US English speech from 331 speakers. With 40 office-room channels plus one anechoic channel, each holding 110 hours of data, Libri-adhoc40 contains 4510 hours of data in total.

An overview of Libri-adhoc40 is listed in the following table:

subset	recording environment	duration per channel	number of speakers	number of channels	loudspeaker positions	Librispeech playback corpus
training data	office room	100 h	251	40	9	train-clean-100
dev data	office room	5 h	40	40	4	dev-clean
test data	office room	5 h	40	40	4	test-clean
ground-truth clean data	anechoic chamber	110 h	331	1	1	train-clean-100, dev-clean, test-clean
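The totals quoted in the description follow directly from this table. The minimal Python sketch below recomputes them from the nominal per-subset figures; the constant and function names are our own, and it assumes that the 40 office channels are shared by all three subsets and that Librispeech train/dev/test speakers do not overlap.

```python
# Nominal per-subset figures copied from the overview table:
# subset -> (hours per channel, number of speakers, number of office channels)
OFFICE_SUBSETS = {
    "training": (100, 251, 40),
    "dev": (5, 40, 40),
    "test": (5, 40, 40),
}
ANECHOIC_CHANNELS = 1  # the single ground-truth channel from the anechoic chamber


def corpus_totals(subsets, anechoic_channels):
    """Return (hours per channel, speakers, channels, total hours)."""
    hours_per_channel = sum(hours for hours, _, _ in subsets.values())
    # Librispeech train/dev/test speakers are disjoint, so the counts simply add up.
    speakers = sum(spk for _, spk, _ in subsets.values())
    # Every office subset is recorded by the same 40 microphones,
    # and the anechoic chamber adds one more channel on top of that.
    channels = max(ch for _, _, ch in subsets.values()) + anechoic_channels
    total_hours = hours_per_channel * channels
    return hours_per_channel, speakers, channels, total_hours


print(corpus_totals(OFFICE_SUBSETS, ANECHOIC_CHANNELS))  # (110, 331, 41, 4510)
```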

Every utterance in the ‘train-clean-100’, ‘dev-clean’, and ‘test-clean’ corpora was replayed through the loudspeaker both in the office room and in the anechoic chamber. Note that the 40 microphones were placed at different positions when collecting the training data than when collecting the development and test data.

As an example, suppose the utterance '229-130880-0017' was replayed. This ID means that speaker '229' read sentence '0017' of chapter '130880'. The naming rule is as follows:

Each replayed utterance yields 41 channels in total: 40 recorded in the office room and 1 recorded in the anechoic chamber. The recordings are organized first by loudspeaker position, then by speaker, then by chapter, and finally by the original utterance number. Each utterance recorded in the office room is renamed by appending a suffix with the microphone number to its original ID ('-ch-1' to '-ch-40'), and each utterance recorded in the anechoic chamber gets the suffix '-anechoic'.
In the Librispeech corpus, the relative path of utterance '229-130880-0017' is:
.\train-clean-100\229\130880\229-130880-0017.flac
In the Libri-adhoc40 corpus, the relative paths of the recordings of '229-130880-0017' take the following forms:
.\adhoc40-train\pos #\229\130880\229-130880-0017-ch-1.wav
.\adhoc40-train\pos #\229\130880\229-130880-0017-ch-2.wav
.\adhoc40-train\pos #\229\130880\229-130880-0017-ch-3.wav
                         ...
.\adhoc40-train\pos #\229\130880\229-130880-0017-ch-40.wav
.\adhoc40-train\pos #\229\130880\229-130880-0017-anechoic.wav
Here, 'pos #' denotes the loudspeaker position, as described in the sections below.
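The naming rule can be captured in a few lines of code. The minimal Python sketch below parses a Libri-adhoc40 file name into speaker, chapter, utterance and channel, and builds the relative path of a recording. The function names are hypothetical, and because the exact name of each 'pos #' folder is not spelled out here, the caller passes it in; the value "pos 1" in the usage lines is an illustrative assumption.

```python
import re
from pathlib import PureWindowsPath

# Documented file names: <speaker>-<chapter>-<utterance>-ch-<n>.wav for the
# 40 office channels, and <speaker>-<chapter>-<utterance>-anechoic.wav for
# the anechoic recording.
_NAME_RE = re.compile(r"^(\d+)-(\d+)-(\d+)-(?:ch-(\d+)|(anechoic))\.wav$")


def parse_recording_name(filename):
    """Split a Libri-adhoc40 file name into its fields.

    The returned channel is an int in 1..40, or the string "anechoic".
    """
    m = _NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"not a Libri-adhoc40 file name: {filename!r}")
    speaker, chapter, utterance, ch, anechoic = m.groups()
    channel = "anechoic" if anechoic else int(ch)
    if channel != "anechoic" and not 1 <= channel <= 40:
        raise ValueError(f"channel out of range: {channel}")
    return {"speaker": speaker, "chapter": chapter,
            "utterance": utterance, "channel": channel}


def recording_path(utt_id, channel, subset_dir, pos_dir):
    """Build the relative path of one recording of a Librispeech utterance.

    subset_dir is e.g. "adhoc40-train"; pos_dir is the loudspeaker-position
    folder name, which is supplied by the caller (its exact form is assumed).
    """
    speaker, chapter, _ = utt_id.split("-")
    suffix = "anechoic" if channel == "anechoic" else f"ch-{channel}"
    return PureWindowsPath(subset_dir, pos_dir, speaker, chapter,
                           f"{utt_id}-{suffix}.wav")


print(parse_recording_name("229-130880-0017-ch-3.wav"))
# {'speaker': '229', 'chapter': '130880', 'utterance': '0017', 'channel': 3}
print(recording_path("229-130880-0017", 3, "adhoc40-train", "pos 1"))
# adhoc40-train\pos 1\229\130880\229-130880-0017-ch-3.wav
```

`PureWindowsPath` mirrors the backslash paths used in this README; calling `.as_posix()` on the result gives forward slashes for use on Linux.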

Training data

The floor plan of the office room for the training data is shown below.

The red dot marks the origin of the reference axes. The blue dots mark the microphone positions, whose coordinates are listed in the upper-left corner. The loudspeaker positions and orientations are marked by loudspeaker icons. ‘pos’ is short for position, and ‘mic’ is short for microphone.

The room is 4.2 m high. Because the room is large and the floor is covered with smooth tiles, it is highly reverberant, with a T60 of around 900 ms. Because the room is far from noise sources, the recordings contain little additive noise. A directional loudspeaker and 40 omnidirectional microphones of the same type were placed in the room. The sampling rate is 16 kHz.

The coordinates of the 40 microphones for the training data are listed in the tables below:

mic	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20
x(m)	9.1	8.3	9.1	8.3	9.1	8.3	9.1	8.3	7.5	6.7	7.5	6.7	7.5	6.7	7.5	6.7	5.9	5.1	5.9	5.1
y(m)	5.2	6.0	3.6	4.4	2	2.8	0.4	1.2	5.2	6.0	3.6	4.4	2	2.8	0.4	1.2	5.2	6.0	3.6	4.4
z(m)	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9

mic	21	22	23	24	25	26	27	28	29	30	31	32	33	34	35	36	37	38	39	40
x(m)	5.9	5.1	5.9	5.1	4.3	3.5	4.3	3.5	4.3	3.5	4.3	3.5	2.7	1.9	2.7	1.9	2.7	1.9	2.7	1.9
y(m)	2	2.8	0.4	1.2	5.2	6	3.6	4.4	2	2.8	0.4	1.2	5.2	6	3.6	4.4	2	2.8	0.4	1.2
z(m)	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9
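The training layout follows a regular pattern: microphones come in groups of 8 that share a pair of x columns, with odd-numbered microphones on the larger-x column and even-numbered ones on the smaller-x column, offset by 0.8 m in y. The minimal Python sketch below regenerates the 40 coordinates from that pattern. The constant and function names are hypothetical; the coordinate values are copied from the tables rather than computed, to avoid floating-point drift.

```python
# Training-set microphone layout, transcribed from the two tables above.
# Each group of 8 mics uses one (odd-mic x, even-mic x) column pair and walks
# through the four (odd-mic y, even-mic y) row pairs.
X_PAIRS = [(9.1, 8.3), (7.5, 6.7), (5.9, 5.1), (4.3, 3.5), (2.7, 1.9)]
Y_PAIRS = [(5.2, 6.0), (3.6, 4.4), (2.0, 2.8), (0.4, 1.2)]
MIC_HEIGHT = 0.9  # z (m), identical for all 40 microphones


def training_mic_positions():
    """Return {mic number: (x, y, z)} in metres for the 40 training-set mics."""
    positions = {}
    mic = 1
    for x_odd, x_even in X_PAIRS:
        for y_odd, y_even in Y_PAIRS:
            positions[mic] = (x_odd, y_odd, MIC_HEIGHT)
            positions[mic + 1] = (x_even, y_even, MIC_HEIGHT)
            mic += 2
    return positions


mics = training_mic_positions()
print(mics[1], mics[40])  # (9.1, 5.2, 0.9) (1.9, 1.2, 0.9)
```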

The coordinates of the loudspeaker positions for the training data are listed in the table below:

loudspeaker position	1	2	3	4	5	6	7	8	9
x(m)	2.7	2.7	2.7	4.3	5.9	8.3	8.3	8.3	5.1
y(m)	4.4	2.8	1.2	1.2	1.2	2.0	3.6	5.2	3.6
z(m)	0.95	0.95	0.95	0.95	0.95	0.95	0.95	0.95	0.95

Note that the loudspeaker at ‘pos 9’ was used with 2 opposite orientations: we refer to the loudspeaker facing upward as pos 9u and to the opposite orientation as pos 9d.
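For geometry-aware experiments it can help to have the training loudspeaker positions as data, with pos 9 split into its two orientation labels. The minimal Python sketch below does this and computes a loudspeaker-to-microphone distance; the names are hypothetical, and the microphone coordinate in the usage line is mic 1 copied from the training microphone table.

```python
import math

# Training-set loudspeaker (x, y) in metres, from the table above. Position 9
# was used with two opposite orientations, labelled 9u and 9d, which share
# the same coordinates.
_TRAIN_SPEAKER_XY = {
    "1": (2.7, 4.4), "2": (2.7, 2.8), "3": (2.7, 1.2), "4": (4.3, 1.2),
    "5": (5.9, 1.2), "6": (8.3, 2.0), "7": (8.3, 3.6), "8": (8.3, 5.2),
    "9u": (5.1, 3.6), "9d": (5.1, 3.6),
}
SPEAKER_HEIGHT = 0.95  # z (m), identical for all positions

TRAIN_SPEAKER_POS = {
    label: (x, y, SPEAKER_HEIGHT) for label, (x, y) in _TRAIN_SPEAKER_XY.items()
}


def distance(a, b):
    """Euclidean distance in metres between two (x, y, z) points."""
    return math.dist(a, b)


MIC_1 = (9.1, 5.2, 0.9)  # training mic 1, from the microphone table
print(round(distance(TRAIN_SPEAKER_POS["8"], MIC_1), 3))  # 0.802
```

The orientation only matters for the directional loudspeaker's radiation pattern; the straight-line distance is the same for 9u and 9d.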

The mapping between loudspeaker positions and speaker identities can be found in the repository folder relationships_of_ldspkrpos_to_spkr. The whole training set is stored in the subdirectory .\adhoc40-train\

Development and test data

The floor plan of the office room, with the loudspeaker and microphone positions for the development and test data, is shown below.

The red dot marks the origin of the reference axes. The blue dots mark the microphone positions, whose coordinates are listed in the upper-left corner. The loudspeaker positions and orientations are marked by loudspeaker icons. ‘pos’ is short for position, and ‘mic’ is short for microphone.

Loudspeaker positions 1 to 4 were used to replay the 'test-clean' corpus, and positions 5 to 8 were used to replay the 'dev-clean' corpus.

The coordinates of the 40 microphones for the development and test data are listed in the tables below:

mic	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20
x(m)	8.3	8.3	8.3	8.3	8.3	8.3	8.3	8.3	6.7	6.7	6.7	6.7	6.7	6.7	6.7	6.7	5.1	5.1	5.1	5.1
y(m)	6	5.2	4.4	3.6	2.8	2	1.2	0.4	6	5.2	4.4	3.6	2.8	2	1.2	0.4	6	5.2	4.4	3.6
z(m)	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9

mic	21	22	23	24	25	26	27	28	29	30	31	32	33	34	35	36	37	38	39	40
x(m)	5.1	5.1	5.1	5.1	3.5	3.5	3.5	3.5	3.5	3.5	3.5	3.5	1.9	1.9	1.9	1.9	1.9	1.9	1.9	1.9
y(m)	2.8	2	1.2	0.4	6	5.2	4.4	3.6	2.8	2	1.2	0.4	6	5.2	4.4	3.6	2.8	2	1.2	0.4
z(m)	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9	0.9
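Unlike the training layout, the development/test microphones form a regular 5 × 8 grid: microphones are numbered column by column from x = 8.3 m down to 1.9 m, and within each column from y = 6.0 m down to 0.4 m in 0.8 m steps. The minimal Python sketch below regenerates the table from that pattern; the names are hypothetical, and the coordinate lists are copied from the tables.

```python
# Development/test microphone layout, transcribed from the tables above.
X_COLUMNS = [8.3, 6.7, 5.1, 3.5, 1.9]                 # one column per 8 mics
Y_ROWS = [6.0, 5.2, 4.4, 3.6, 2.8, 2.0, 1.2, 0.4]     # order within a column
MIC_HEIGHT = 0.9  # z (m), identical for all 40 microphones


def dev_test_mic_positions():
    """Return {mic number: (x, y, z)} in metres for the 40 dev/test mics."""
    positions = {}
    for col, x in enumerate(X_COLUMNS):
        for row, y in enumerate(Y_ROWS):
            mic = col * len(Y_ROWS) + row + 1  # mics are numbered from 1
            positions[mic] = (x, y, MIC_HEIGHT)
    return positions


mics = dev_test_mic_positions()
print(mics[1], mics[21])  # (8.3, 6.0, 0.9) (5.1, 2.8, 0.9)
```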

The coordinates of the loudspeaker positions for the development and test data are listed in the table below:

loudspeaker position	1	2	3	4	5	6	7	8
data category	test	test	test	test	dev	dev	dev	dev
x(m)	2.7	4.3	5.9	7.5	2.7	4.3	5.9	7.5
y(m)	1.2	1.2	1.2	1.2	5.2	5.2	5.2	5.2
z(m)	0.95	0.95	0.95	0.95	0.95	0.95	0.95	0.95
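The table also fixes which subset each position belongs to, and the subset in turn fixes the subdirectory. The minimal Python sketch below encodes both relationships; the names are hypothetical, while the coordinates, the test/dev split, and the subdirectory names come from this section.

```python
# Development/test loudspeaker positions, transcribed from the table above.
# Positions 1-4 lie on y = 1.2 m and replay test-clean; positions 5-8 lie on
# y = 5.2 m and replay dev-clean. All are at z = 0.95 m.
_X_COORDS = [2.7, 4.3, 5.9, 7.5]
SPEAKER_HEIGHT = 0.95

DEV_TEST_SPEAKERS = {}
for i, x in enumerate(_X_COORDS):
    DEV_TEST_SPEAKERS[i + 1] = ("test", (x, 1.2, SPEAKER_HEIGHT))
    DEV_TEST_SPEAKERS[i + 5] = ("dev", (x, 5.2, SPEAKER_HEIGHT))

# Subdirectory holding each subset, as given in the text.
SUBSET_DIRS = {"dev": "adhoc40-dev", "test": "adhoc40-test"}


def positions_for(subset):
    """Loudspeaker positions used to replay the given subset ("dev" or "test")."""
    return sorted(p for p, (s, _) in DEV_TEST_SPEAKERS.items() if s == subset)


def subset_dir(position):
    """Subdirectory holding the recordings made at a dev/test loudspeaker position."""
    subset, _ = DEV_TEST_SPEAKERS[position]
    return SUBSET_DIRS[subset]


print(positions_for("dev"), subset_dir(3))  # [5, 6, 7, 8] adhoc40-test
```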

The mapping between loudspeaker positions and speaker identities can be found in the repository folder relationships_of_ldspkrpos_to_spkr. The development and test sets are stored in the subdirectories .\adhoc40-dev\ and .\adhoc40-test\, respectively.

Ground-truth clean speech

The net interior dimensions of the anechoic chamber, after installation of the sound-absorbing materials, are 11.8 m × 4.2 m × 3.8 m.

We replayed the clean Librispeech speech (the ‘train-clean-100’, ‘dev-clean’, and ‘test-clean’ corpora) in the anechoic chamber to provide the ground-truth clean speech of Libri-adhoc40. The loudspeaker was placed 40 cm from the recording device, and its volume was set to the same level as in the office room.
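When the anechoic recording is used as a reference for office-room channels, the direct-path propagation delays differ because the source-to-microphone distances differ. The minimal Python sketch below converts a distance into a delay in seconds and in samples. It assumes a speed of sound of 343 m/s (air at roughly 20 °C) and that the anechoic recordings share the 16 kHz sampling rate stated for the office room. It describes geometry only and does not claim whether the released files are already time-aligned.

```python
SAMPLE_RATE = 16_000    # Hz, the sampling rate stated for the recordings
SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 °C


def propagation_delay(distance_m, fs=SAMPLE_RATE, c=SPEED_OF_SOUND):
    """Return (delay in seconds, delay in samples) for a direct path of distance_m."""
    seconds = distance_m / c
    return seconds, seconds * fs


# 40 cm loudspeaker-to-device distance in the anechoic chamber
sec, samples = propagation_delay(0.40)
print(f"{sec * 1000:.3f} ms, {samples:.2f} samples")  # 1.166 ms, 18.66 samples
```

The same function can be applied to office-room distances computed from the coordinate tables, and the difference between the two delays gives a rough direct-path offset.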

Download Link

The test data of Libri-adhoc40 are now available at: https://www.dropbox.com/s/3ph407rvr8bhg0e/adhoc40-test.rar?dl=0

The dev data of Libri-adhoc40 are now available at: https://www.dropbox.com/sh/xozyvr1bbybh3fi/AABLUwZxbKlJcpPgwfq-o4Mra?dl=0

The rest will be available soon.

Reference

Libri-adhoc40: A dataset collected from synchronized ad-hoc microphone arrays

Scaling sparsemax based channel selection for speech recognition with ad-hoc microphone arrays

Attention-based multi-channel speaker verification with ad-hoc microphone arrays

Deep ad-hoc beamforming based on speaker extraction for target-dependent speech separation

Multi-Channel Far-Field Speaker Verification with Large-Scale Ad-hoc Microphone Arrays

ISmallFish/Libri-adhoc40

Folders and files

Latest commit

History

Repository files navigation

Libri-adhoc40

Description of the dataset

Training data

Development and test data

Ground-truth clean speech

Download Link

Reference

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Packages