Commit 9a10345 (1 parent: 415129e)
Showing 77 changed files with 2,463 additions and 1,086 deletions.
@@ -5,4 +5,6 @@

   process_ctm_line
   export_textgrid
   construct_output_tiers
   construct_output_path
   output_textgrid_writing_errors
@@ -0,0 +1,20 @@

.. _server_api:

Managing MFA servers
====================

Functions
---------

.. currentmodule:: montreal_forced_aligner.command_line.utils

.. autosummary::
   :toctree: generated/

   configure_pg
   initialize_server
   check_databases
   start_server
   stop_server
   delete_server
docs/source/user_guide/implementations/alignment_analysis.md
31 changes: 31 additions & 0 deletions

@@ -0,0 +1,31 @@

(alignment_analysis)=
# Analyzing alignment quality

When exporting TextGrids following alignment, MFA also exports a file named `alignment_analysis.csv`. I am still working out which measures are best for analyzing alignments, as it is not as straightforward as taking the overall alignment log-likelihood.

## Alignment log-likelihood

The first measure provided for each utterance is the alignment log-likelihood. This is the objective that was optimized during alignment. However, it is extremely important to note that this log-likelihood is a relative measure: it scores the best alignment path for this particular utterance against other possible alignments.

A primary reason this metric comes with such heavy caveats is the use of speaker adaptation. MFA does two passes of alignment. The first uses a speaker-independent model to generate an initial alignment. This initial alignment is used to estimate per-speaker feature transforms that try to map the observed features into a common space. Depending on the amount of data for a particular speaker and the amount of variability they exhibit (e.g., do they yell, get excited, whisper, or have a cold), speaker transforms vary in how much they improve alignment. This variable improvement directly affects the log-likelihood for a given utterance.

Additionally, log-likelihood reflects differences between the training data and the alignment data. Is the variety of the language the same? Does it have a similar gender distribution? Does it have similar styles (conversational, scripted)? Does it have similar noise levels? All of these can affect the acoustics of phones and skew how "likely" a given phone is at a given point in time.

## Speech log-likelihood

The overall alignment log-likelihood represents the best path including all sections of silence. In general, when we think about how good an alignment is, we don't necessarily care how well the silence intervals in a given utterance match the trained silence model. So the speech log-likelihood measure leaves out all log-likelihoods from silence intervals and is the average of the per-phone log-likelihoods in the utterance.
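
To make the definition concrete, here is a minimal Python sketch of the calculation described above. It is not MFA's implementation: the `(label, log_likelihood)` input layout, the function name, and the set of silence labels are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical silence labels; the labels used by a real model may differ.
SILENCE_PHONES = {"sil", "sp"}


def speech_log_likelihood(phones):
    """Average per-phone log-likelihood over non-silence intervals.

    `phones` is a list of (label, log_likelihood) pairs for one utterance
    (an assumed layout). Returns None when there are no speech phones.
    """
    speech = [ll for label, ll in phones if label not in SILENCE_PHONES]
    return mean(speech) if speech else None


# Illustrative values only; they are not taken from a real alignment.
utterance = [("sil", -2.0), ("k", -40.0), ("ae", -55.0), ("t", -45.0), ("sil", -3.0)]
print(speech_log_likelihood(utterance))
```

The two silence intervals are dropped before averaging, so only the three speech phones contribute to the result.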

## Phone duration deviation

Stepping back from log-likelihoods generated by the model, we can look at statistics of phone durations in the aligned corpus. By calculating the mean and standard deviation of durations per phone, we can z-score each individual phone's duration to see how unexpected it is relative to the corpus overall. The phone duration deviation measure is the average of the absolute z-scores of each phone's duration in the utterance.

We use the absolute value of the z-score because excessively long durations due to misalignment often also result in excessively short durations on other phones. The average of raw z-scores in these cases will trend towards zero, when really we want these deviations to add up and flag utterances where something clearly went wrong.
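
The following Python sketch illustrates the measure and the reason for taking absolute values. It is a sketch under stated assumptions rather than MFA's actual code: the function names, the `(label, duration)` layout, the toy durations, and the use of the population standard deviation are all hypothetical choices.

```python
from collections import defaultdict
from statistics import mean, pstdev


def phone_duration_stats(corpus):
    """Per-phone (mean, standard deviation) of duration over the whole corpus."""
    durations = defaultdict(list)
    for utterance in corpus:
        for label, duration in utterance:
            durations[label].append(duration)
    return {label: (mean(d), pstdev(d)) for label, d in durations.items()}


def duration_z_scores(utterance, stats):
    """Z-score each phone's duration; phones with zero variance are skipped."""
    scores = []
    for label, duration in utterance:
        mu, sigma = stats[label]
        if sigma > 0:
            scores.append((duration - mu) / sigma)
    return scores


def duration_deviation(utterance, stats):
    """Mean absolute z-score of the phone durations in one utterance."""
    scores = duration_z_scores(utterance, stats)
    return mean(abs(z) for z in scores) if scores else 0.0


# Toy corpus (durations in seconds): three plausible utterances and one
# where "a" absorbed time that should have gone to "t".
good = [("a", 0.10), ("t", 0.10)]
bad = [("a", 0.30), ("t", 0.02)]
corpus = [good, good, good, bad]
stats = phone_duration_stats(corpus)

print("raw mean z-score:", mean(duration_z_scores(bad, stats)))
print("mean absolute z-score:", duration_deviation(bad, stats))
```

In the misaligned utterance, the overly long phone and the overly short phone have z-scores of opposite sign, so the raw average nearly cancels while the absolute average stays large.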

It is important to note that there are stylistic and speaker influences on duration, and statistics are gathered for the whole corpus, not normalized per speaker, so false positives are likely to pop up when sorting by this metric. Normalizing per speaker, however, might minimize the magnitude of duration deviation if a given speaker's utterances are all poorly aligned. This would increase the likelihood of false negatives, and false positives are more acceptable than false negatives.

## Ideas for the future that need a lot more thinking before I implement them

1. Use the alignment best path from the speaker-adapted pass with a lattice and scores generated using the speaker-independent first-pass alignment model
   * This *might* help get around the variable optimizations that are speaker dependent
@@ -0,0 +1,54 @@

.. _server:

***********
MFA Servers
***********

MFA database servers
====================

By default, MFA creates or starts a PostgreSQL server when a command is invoked and stops the server at the end of processing. The goal is to keep the database server as unobtrusive as possible, but some use cases may require more control. To turn off the automatic management of PostgreSQL servers, run :code:`mfa configure --disable_auto_server`.

If necessary, you can run multiple PostgreSQL servers by using the :code:`--profile` flag. By default, the "global" profile is used. The profile flag is also used in :ref:`configure_cli`, as the default options set with :code:`configure` are stored on a per-profile basis.

PostgreSQL configuration
------------------------

MFA overrides some default configuration values for its PostgreSQL servers when they are initialized.

.. code-block::

   log_min_duration_statement = 5000
   enable_partitionwise_join = on
   enable_partitionwise_aggregate = on
   unix_socket_directories = '/path/to/current/profile/socket_directory'
   listen_addresses = ''
   maintenance_work_mem = 500MB
   work_mem = 128MB
   shared_buffers = 256MB
   max_connections = 1000

The goal for MFA is to run on local desktops at reasonable performance on moderately sized corpora (<3k hours). Depending on your use case, you may need to tune the :code:`postgresql.conf` file further to suit your setup and corpus (see `PostgreSQL's documentation <https://www.postgresql.org/docs/15/runtime-config.html>`_ and the `postgresqltuner utility script <https://github.com/jfcoz/postgresqltuner>`_). Additionally, note that listening on ports is turned off by default and connections are handled via socket directories.

.. warning::

   MFA PostgreSQL databases are meant to be on the expendable side. Though they can persist across use cases, this is not really recommended. Use of :code:`--clean` drops all data in the database to ensure a fresh start state, as various commands perform destructive operations. For example, :ref:`create_segments` deletes and recreates :class:`~montreal_forced_aligner.db.Utterance` objects, so the original text transcripts are absent from the database following its run.

.. _server_cli:

Managing MFA database servers
=============================

MFA PostgreSQL servers can be managed via the subcommands in :code:`mfa server`, allowing you to initialize new servers, and start, stop, and delete existing servers.

.. click:: montreal_forced_aligner.command_line.server:server_cli
   :prog: mfa server
   :nested: full

API reference
-------------

- :ref:`server_api`