voice_features_notes.md

Vocal Concepts

Paralanguage

Metacommunication that modifies textual meaning through prosody, pitch, volume, and intonation

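Loudness

Subjective perception of sound pressure.

Sound pressure is measured in pascals (Pa). Sound pressure level is expressed in decibels (dB), a logarithmic ratio to a reference pressure, not a pressure unit itself: https://en.wikipedia.org/wiki/Sound_pressure#Sound_pressure_level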
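To make the Pa/dB relationship concrete, here is a minimal sketch of the standard sound pressure level formula, L_p = 20 log10(p / p0). The function name `spl_db` is made up for illustration, and the reference pressure of 20 µPa is the usual convention for sound in air, assumed here rather than taken from these notes.

```python
import math

# Assumed reference pressure: 20 micropascals, the usual convention in air.
P_REF = 20e-6


def spl_db(p_rms: float, p_ref: float = P_REF) -> float:
    """Sound pressure level in dB for an RMS pressure given in pascals."""
    if p_rms <= 0:
        raise ValueError("RMS pressure must be positive")
    return 20.0 * math.log10(p_rms / p_ref)


print(round(spl_db(1.0), 2))    # 1 Pa RMS -> 93.98
print(round(spl_db(20e-6), 2))  # the reference pressure itself -> 0.0
```

Because the scale is logarithmic, doubling the pressure adds about 6 dB rather than doubling the level.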

Prosody

Study of elements of speech that are not individual phonetic segments (vowels, consonants) but properties of syllables and larger units.

Attributes of Prosody

Tempo

Number of speech units within a given amount of time.

Most commonly measured in words per minute (WPM); also in sounds per second:

  • 9.4 sounds per second for poetry reading
  • 13.83 sounds per second for sports commentary
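Both tempo measures are a count divided by a duration. A small sketch, assuming the word or sound counts and the clip duration are already known (for example from a transcript with timestamps); the function names and the sample numbers are illustrative only.

```python
def words_per_minute(word_count: int, duration_s: float) -> float:
    """Speech tempo in words per minute for a clip lasting duration_s seconds."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return word_count * 60.0 / duration_s


def sounds_per_second(sound_count: int, duration_s: float) -> float:
    """Speech tempo in sounds (phones) per second."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return sound_count / duration_s


# Hypothetical clip: 75 words and 470 sounds spoken in 30 seconds.
print(words_per_minute(75, 30.0))    # 150.0
print(sounds_per_second(470, 30.0))
```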

Intonation

Variation in pitch to indicate the speaker's attitude and emotions

Rhythm (isochrony)

Language feature: rhythmic division of time into equal portions by a language. "French, Telugu and Yoruba ... are syllable-timed languages, ... English, Russian and Arabic ... are stress-timed languages."

Three ways a language can divide time:

  1. Syllable timed: French, Italian, Spanish, Romanian, Brazilian Portuguese, Icelandic, Singlish, Cantonese, Mandarin Chinese, Armenian, Turkish and Korean
  2. Mora timed (a mora is equal to or shorter than a syllable): Japanese, Gilbertese, Slovak and Ganda
  3. Stress timed: English, Thai, Lao, German, Russian, Danish, Swedish, Norwegian, Faroese, Dutch, European Portuguese
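One way to put a number on these timing classes is the normalized Pairwise Variability Index (nPVI). It is not mentioned in these notes, but it is a commonly used rhythm metric. It averages the relative difference between successive interval durations (typically vowel durations), so evenly timed speech scores low and speech that alternates long and short intervals scores high. The durations below are invented for illustration, not measured data.

```python
def npvi(durations: list[float]) -> float:
    """Normalized Pairwise Variability Index over successive interval durations."""
    if len(durations) < 2:
        raise ValueError("need at least two durations")
    terms = [
        abs(a - b) / ((a + b) / 2.0)  # difference relative to the pair's mean
        for a, b in zip(durations, durations[1:])
    ]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical vowel durations in seconds.
even = [0.12, 0.12, 0.12, 0.12]         # perfectly regular timing
alternating = [0.06, 0.18, 0.06, 0.18]  # strong long/short alternation

print(npvi(even))         # 0.0
print(npvi(alternating))  # much higher than the even case
```

Stress-timed languages are generally reported to have higher vocalic nPVI values than syllable-timed ones, although the boundary between classes is not sharp.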

Stress

Means of making a syllable, word, or part of a sentence prominent

Emphasizing words or ideas (the stressed word is marked with asterisks):

  • *I* didn't take the test yesterday. (Somebody else did.)
  • I *didn't* take the test yesterday. (I did not take it.)
  • I didn't *take* the test yesterday. (I did something else with it.)
  • I didn't take *the* test yesterday. (I took one of several, or I didn't take the specific test that would have been implied.)
  • I didn't take the *test* yesterday. (I took something else.)
  • I didn't take the test *yesterday*. (I took it some other day.)
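The six readings above come from moving the stress to each word in turn. A tiny sketch that generates one variant per word, marking the stressed word with asterisks; the function name and the asterisk notation are choices made here for illustration.

```python
def stress_variants(sentence: str) -> list[str]:
    """Return one copy of the sentence per word, with that word marked as stressed."""
    words = sentence.split()
    return [
        " ".join(f"*{w}*" if i == k else w for i, w in enumerate(words))
        for k in range(len(words))
    ]


for variant in stress_variants("I didn't take the test yesterday"):
    print(variant)
```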

Stress can be realized by:

  • Pitch variation
  • Increased duration
  • Increased loudness
  • Timbre differences (unconfirmed)

Pause

Interruption of sound. Can convey hesitation or importance.

  • Filled pauses (eh, uh)
  • Paralinguistic pauses (sighs)
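Silent pauses can be located roughly by looking for stretches of low signal energy. The sketch below is one simple approach, not a method taken from these notes. It splits the signal into short frames, computes each frame's RMS amplitude, and reports runs of quiet frames that last long enough. The frame size, threshold, and minimum pause length are assumed values that would need tuning on real recordings. Filled pauses (eh, uh) carry energy, so this would not catch them.

```python
import math


def find_pauses(samples, sample_rate, frame_ms=20, threshold=0.02, min_pause_ms=200):
    """Return (start_s, end_s) spans where frame RMS stays below the threshold."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    min_pause_s = min_pause_ms / 1000
    n_frames = len(samples) // frame_len
    pauses = []
    start = None  # start time of the current quiet run, if any
    for f in range(n_frames):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        t = f * frame_len / sample_rate
        if rms < threshold:
            if start is None:
                start = t
        elif start is not None:
            if t - start >= min_pause_s:
                pauses.append((start, t))
            start = None
    end = n_frames * frame_len / sample_rate
    if start is not None and end - start >= min_pause_s:
        pauses.append((start, end))
    return pauses


# Synthetic test signal at 1000 Hz: 0.5 s tone, 0.3 s silence, 0.5 s tone.
rate = 1000
tone = [0.5 * math.sin(2 * math.pi * 100 * n / rate) for n in range(500)]
signal = tone + [0.0] * 300 + tone
print(find_pauses(signal, rate))  # one pause between 0.5 s and 0.8 s
```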

Chunking

Pattern of pausing or lack of pausing:

  • "You know what I mean?" - "No wada meeen?"
  • "y lo sabes" - "ylosaes"

Timbre

Perceived sound quality, unique to each voice

Interesting Info

Singer Identity Representation

Singer Identity Representation Learning using Self-Supervised Techniques

@inproceedings{torres2023singer,
  title={Singer Identity Representation Learning using Self-Supervised Techniques},
  author={Torres, Bernardo and Lattner, Stefan and Richard, Gael},
  booktitle={International Society for Music Information Retrieval Conference (ISMIR 2023)},
  year={2023}
}

Speaker Identification

https://www.researchgate.net/publication/360961643_Speaker_Identification_using_Speech_Recognition

It wasn't clear which features are extracted to identify speakers. Found sample audio: a large-scale (1000 hours) corpus of read English speech.

Read further

https://en.wikipedia.org/wiki/Psychoacoustics