zfparse layers structure, including motifs, bouts, and phrases, onto a sequence of zebra finch vocalizations.
zfparse was written for Python 3.5. Installation with Conda is recommended.
git clone https://github.com/gfetterman/zfparse
cd zfparse
pip install .
The zfparse data-loading function assumes your data is in Bark format, but the parsing itself does not. If your data is not in Bark, writing a custom data-loading routine should not be difficult.
If your data is Bark-formatted, collect_vocalizations() will produce a list of vocalizations to feed into the parsing functions.
import bark
import zfparse
root = bark.read_root('path/to/root')
entries = [root.entries[e] for e in root.entries]
vocs = zfparse.collect_vocalizations(entries, 'voc_labels.csv')
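If your data is not in Bark, a custom loader might look roughly like the sketch below. It is only an illustration: the CSV layout (columns start_s, stop_s, name, with times in seconds from the start of the recording), the load_vocalizations name, and the Vocalization stand-in are all assumptions made for this example, not part of zfparse. The stand-in simply mirrors the start, stop, and name fields that zfparse vocalizations expose; for real use you would likely construct the library's own type instead.

```python
import csv
import io
from collections import namedtuple
from datetime import datetime, timedelta

# Hypothetical stand-in exposing the same three fields (start, stop, name)
# as a zfparse vocalization. It is not the library's own class.
Vocalization = namedtuple('Vocalization', ['start', 'stop', 'name'])


def load_vocalizations(csv_file, recording_start):
    """Read rows of (start_s, stop_s, name) and return time-sorted vocalizations.

    The column names and the seconds-from-recording-start convention are
    assumptions about a hypothetical label file, not a zfparse format.
    """
    vocs = []
    for row in csv.DictReader(csv_file):
        start = recording_start + timedelta(seconds=float(row['start_s']))
        stop = recording_start + timedelta(seconds=float(row['stop_s']))
        vocs.append(Vocalization(start=start, stop=stop, name=row['name']))
    # Sort by onset so downstream code sees the sequence in temporal order.
    return sorted(vocs, key=lambda v: v.start)


# Usage, with an in-memory file standing in for a real label file.
example = io.StringIO("start_s,stop_s,name\n0.50,0.62,a\n0.10,0.18,i\n")
vocs = load_vocalizations(example, datetime(2017, 1, 1, 9, 0, 0))
print([v.name for v in vocs])
```

Sorting by onset is a choice made in this sketch so that the result reads as a sequence; check what ordering the parsing functions expect before relying on it.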
Information about a given bird's vocalization behavior is stored in a BirdMetadata object for simplicity.
birdmeta = zfparse.BirdMeta(name='[birdname]',
                            syllables=['a', 'b', 'c', 'd', 'e'],
                            intro_note='i',
                            intro_aliases=[],
                            ignore=['z'])
The optional parameter ignore allows you to specify label names that should not be included in the parse. For example, if 'z' denotes a cagemate call, including it in ignore will ensure that cagemate calls don't interfere with parsing. (They may not interfere even if included, but this option allows you to be certain.)
You can then obtain phrases, bouts, and motifs by running full_parse().
phrases = zfparse.full_parse(vocs, birdmeta)
The module contains default constants for the parsing parameters, which work reasonably well for many birds. You should examine the results of the parsing to see whether these parameters need adjusting for your bird; you can override them in the call to full_parse().
A call to full_parse() produces a list of phrases, each of which is simply a list of Bout objects.
>>> phrases = full_parse(...)
>>> phrases
[[<Bout ...>, <Bout ...>],
 [<Bout ...>],
 [<Bout ...>, <Bout ...>, <Bout ...>]]
>>> phrases[0]
[<Bout ...>, <Bout ...>]
A Bout bundles together a sequence of zero or more introductory notes and a sequence of zero or more syllables, which are broken into motifs. A Bout also keeps track of intervening vocalizations, such as short calls that occur in the middle of these sequences without breaking them.
A Bout's introductory notes are stored as a list of Vocalization objects, each of which has a start and a stop (both datetime objects) and a name.
>>> phrases[0][0].intro_notes
[Vocalization(start=datetime.datetime(...), stop=datetime.datetime(...), name='i'),
 Vocalization(start=datetime.datetime(...), stop=datetime.datetime(...), name='i')]
A Bout's motifs are stored as a list of VocGroup objects.
>>> phrases[0][0].motifs
[<VocGroup ...>, <VocGroup ...>]
These VocGroup objects contain Vocalization objects, which can be accessed as if the VocGroup were a list.
>>> [v.name for v in phrases[0][0].motifs[0]]
['a', 'b', 'c', 'd', 'e']
>>> phrases[0][0].motifs[0][3]
Vocalization(start=datetime.datetime(...), stop=datetime.datetime(...), name='d')
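Because the result is just nested lists, walking it takes a few loops. The sketch below collects each motif's labels and duration using only the attributes shown above (motifs, name, start, stop). The motif_summaries function is a hypothetical helper written for this example, and Voc and FakeBout are stand-ins so the code runs without zfparse or real data.

```python
from collections import namedtuple
from datetime import datetime, timedelta


def motif_summaries(phrases):
    """Yield (phrase_idx, bout_idx, motif_idx, labels, duration) per motif."""
    for p, phrase in enumerate(phrases):
        for b, bout in enumerate(phrase):
            for m, motif in enumerate(bout.motifs):
                vocs = list(motif)  # a motif group can be iterated like a list
                if not vocs:
                    continue
                labels = ' '.join(v.name for v in vocs)
                # Duration runs from the first onset to the last offset.
                duration = vocs[-1].stop - vocs[0].start
                yield p, b, m, labels, duration


# Hypothetical stand-ins so the sketch runs without zfparse or real data.
Voc = namedtuple('Voc', ['start', 'stop', 'name'])
FakeBout = namedtuple('FakeBout', ['intro_notes', 'motifs'])
T0 = datetime(2017, 1, 1, 9, 0, 0)


def make_voc(name, start_ms, stop_ms):
    """Build a stand-in vocalization from millisecond offsets after T0."""
    return Voc(T0 + timedelta(milliseconds=start_ms),
               T0 + timedelta(milliseconds=stop_ms), name)


bout = FakeBout(
    intro_notes=[make_voc('i', 0, 50)],
    motifs=[[make_voc('a', 100, 200), make_voc('b', 250, 350),
             make_voc('c', 400, 500)],
            [make_voc('a', 550, 650), make_voc('b', 700, 800)]])
summaries = list(motif_summaries([[bout]]))
for row in summaries:
    print(row)
```

With real data, you would pass the list returned by full_parse() in place of the stand-in phrase list.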
The dataframe_from_phrases() function will construct a Pandas DataFrame from a phrase list. The columns are:
- the phrase number within the list
- the bout number within a phrase
- the motif number within a bout
  - this will be 'intro' for intro notes
  - this will be 'extra' for intervening vocalizations
- the vocalization number within a motif or 'intro' or 'extra' group
- the vocalization start datetime
- the vocalization stop datetime
- the vocalization name
Parsing occurs in six steps. Some steps have parameters that shape their output; these can be changed from their default values in the call to full_parse():
- Separately group intro notes and song syllables.
  These groups are generated solely by sequential proximity:
  i i c3 i i i => two groups: i i and i i i
- Combine groups of the same type separated by a short interval, regardless of intervening vocalizations.
  Parameters: intro_anneal and song_anneal
  These parameters set how short the interval between two groups must be for them to be combined.
  The default value for both is 500 ms.
- Split groups containing long silent intervals.
  Parameters: intro_split and song_split
  These parameters determine how long an interval of silence must be to trigger a split.
  The default value for both is 800 ms.
- Combine intro note and song syllable groups into bouts.
  Parameter: bout_anneal
  This parameter determines how close an intro note group and a song syllable group must be to be glued together into a bout.
  The default value is 500 ms.
- Break song syllable groups in the bouts into motifs.
  A new motif begins when the current syllable comes earlier in the song sequence than the syllable preceding it.
  I.e.: a b c a b => two motifs: a b c and a b
- Combine bouts separated by a short interval into phrases.
  Parameter: phrase_anneal
  This parameter determines how close two bouts must be to be combined into a phrase.
  The default value is 2000 ms.
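Two of the rules above can be illustrated with a short sketch. This is not zfparse's implementation: split_motifs and anneal are hypothetical helpers, vocalizations are reduced to plain labels or (start_ms, stop_ms, name) tuples, and two details are assumptions made for the example. A repeated syllable is assumed to stay in the current motif, and a gap exactly equal to the threshold is assumed to count as short enough to combine.

```python
def split_motifs(labels, syllables):
    """Start a new motif whenever a label falls earlier in the song order
    than the label before it (assumption: repeats stay in the same motif)."""
    order = {s: i for i, s in enumerate(syllables)}
    motifs = []
    for label in labels:
        if motifs and order[label] >= order[motifs[-1][-1]]:
            motifs[-1].append(label)
        else:
            motifs.append([label])
    return motifs


def anneal(groups, max_gap_ms):
    """Merge consecutive groups whose silent gap is at most max_gap_ms.

    Each group is a non-empty list of (start_ms, stop_ms, name) tuples,
    and the groups are assumed to be in temporal order.
    """
    merged = []
    for group in groups:
        # Gap from the previous group's last offset to this group's first onset.
        if merged and group[0][0] - merged[-1][-1][1] <= max_gap_ms:
            merged[-1].extend(group)
        else:
            merged.append(list(group))
    return merged


# The motif example from the list above: a b c a b => a b c and a b
motifs = split_motifs(list('abcab'), ['a', 'b', 'c', 'd', 'e'])
print(motifs)

# Three intro-note groups; only the first two are within 500 ms of each other.
groups = [[(0, 100, 'i'), (150, 250, 'i')],
          [(600, 700, 'i')],
          [(2000, 2100, 'i')]]
annealed = anneal(groups, 500)
print([len(g) for g in annealed])
```

Trying a few thresholds on toy data like this can help build intuition for how the anneal parameters reshape groups before you adjust them for a real bird.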