Metacommunication that modifies textual meaning through prosody, pitch, volume, and intonation
Subjective perception of sound pressure:
Pa is the unit of sound pressure; dB is the unit of sound pressure level, a logarithmic measure relative to a reference pressure https://en.wikipedia.org/wiki/Sound_pressure#Sound_pressure_level
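The relation between Pa and dB can be sketched in a few lines. This assumes the standard definition SPL = 20 * log10(p / p0) with the usual reference pressure in air, p0 = 20 micropascals; the function names are illustrative and not taken from any library.

```python
import math

# Assumed reference pressure for sound in air: 20 micropascals.
P_REF_AIR = 20e-6


def spl_db(pressure_pa: float, p_ref: float = P_REF_AIR) -> float:
    """Sound pressure level in dB for an RMS sound pressure given in pascals."""
    if pressure_pa <= 0:
        raise ValueError("pressure must be positive")
    return 20.0 * math.log10(pressure_pa / p_ref)


def pressure_from_spl(level_db: float, p_ref: float = P_REF_AIR) -> float:
    """Inverse conversion: RMS sound pressure in pascals for a level in dB."""
    return p_ref * 10.0 ** (level_db / 20.0)
```

Because the scale is logarithmic, doubling the pressure adds about 6 dB no matter where on the scale it happens.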
Study of elements of speech that are not individual phonetic segments (vowels, consonants) but properties of syllables and larger units
Attributes of Prosody
Number of speech units within a given amount of time
Most commonly measured in words per minute (WPM); also in sounds per second
- 9.4 sounds per second for poetry reading
- 13.83 per second for sports commentary
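Both rate measures are simple ratios of a unit count to a duration. A minimal sketch, assuming words are separated by whitespace and that the number of sounds has already been counted by hand or by an aligner; the function names are hypothetical.

```python
def words_per_minute(transcript: str, duration_s: float) -> float:
    """Speech rate in WPM, counting whitespace-separated tokens as words."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return len(transcript.split()) / duration_s * 60.0


def units_per_second(n_units: int, duration_s: float) -> float:
    """Speech rate for any unit (sounds, syllables) counted beforehand."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return n_units / duration_s
```

For example, 94 sounds spoken in 10 seconds give the 9.4 sounds per second quoted for poetry reading.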
Variation in pitch to indicate the speaker's attitude and emotions
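To study pitch variation, the pitch first has to be measured. One common way to do that is autocorrelation; the sketch below is an illustration under assumptions, not a method taken from these notes. It assumes a single voiced frame of mono samples and a search range of 75 to 400 Hz, and the function name is hypothetical.

```python
import math


def estimate_f0(frame, sample_rate, f_min=75.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame.

    Picks the lag with the highest autocorrelation inside the lag range
    that corresponds to the assumed pitch range [f_min, f_max].
    """
    min_lag = int(sample_rate / f_max)
    max_lag = min(int(sample_rate / f_min), len(frame) - 1)
    if min_lag < 1 or max_lag < min_lag:
        raise ValueError("frame too short for the requested pitch range")
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag
```

Running the estimator over successive frames yields a pitch contour, which is the raw material for describing intonation. Real speech needs extra care (voicing detection, octave errors) that this sketch leaves out.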
Language feature: rhythmic division of time into equal portions by a language. "French, Telugu and Yoruba ... are syllable-timed languages, ... English, Russian and Arabic ... are stress-timed languages."
Three ways to divide language over time
- Syllable timed: French, Italian, Spanish, Romanian, Brazilian Portuguese, Icelandic, Singlish, Cantonese, Mandarin Chinese, Armenian, Turkish and Korean
- Mora timed (a mora is a unit of time equal to or shorter than a syllable): Japanese, Gilbertese, Slovak and Ganda
- Stress timed: English, Thai, Lao, German, Russian, Danish, Swedish, Norwegian, Faroese, Dutch, European Portuguese
Means of making a syllable, word, or part of a sentence prominent
Emphasizing words or ideas
- I didn't take the test yesterday. (Stress on "I": somebody else did.)
- I didn't take the test yesterday. (Stress on "didn't": I did not take it.)
- I didn't take the test yesterday. (Stress on "take": I did something else with it.)
- I didn't take the test yesterday. (Stress on "the": I took one of several, or I didn't take the specific test that would have been implied.)
- I didn't take the test yesterday. (Stress on "test": I took something else.)
- I didn't take the test yesterday. (Stress on "yesterday": I took it some other day.)
Stress can be realized through:
- Pitch variation
- Increased duration
- Increased loudness
- Timbre differences???
Interruption of sound. Can convey hesitation or importance
- Filled pauses (eh, uh)
- Paralinguistic pauses (sighs)
Pattern of pausing or lack of pausing:
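Silent pauses can be located by looking for stretches of low energy. The sketch below is a minimal illustration under assumptions: mono samples scaled to [-1, 1], 20 ms frames, and hand-picked values for the RMS threshold and minimum pause length. It finds silent interruptions only; filled pauses such as "eh" or "uh" carry energy and would need a different approach. The function name is hypothetical.

```python
import math


def find_pauses(samples, sample_rate, frame_ms=20, threshold=0.01, min_pause_ms=200):
    """Return (start_s, end_s) spans where the frame RMS stays below threshold."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    min_frames = max(1, math.ceil(min_pause_ms / frame_ms))
    n_frames = len(samples) // frame_len
    pauses = []
    run_start = None  # index of the first quiet frame in the current run
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        if rms < threshold:
            if run_start is None:
                run_start = i
        else:
            # A loud frame ends the quiet run; keep the run if it is long enough.
            if run_start is not None and i - run_start >= min_frames:
                pauses.append((run_start * frame_len / sample_rate,
                               i * frame_len / sample_rate))
            run_start = None
    # Handle a quiet run that lasts until the end of the signal.
    if run_start is not None and n_frames - run_start >= min_frames:
        pauses.append((run_start * frame_len / sample_rate,
                       n_frames * frame_len / sample_rate))
    return pauses
```

The threshold and minimum length are the main tuning knobs: a lower minimum picks up short articulatory gaps, while a higher one keeps only clearly audible pauses.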
- "You know what I mean?" - "No wada meeen?"
- "y lo sabes" - "ylosaes" (Spanish: "and you know it")
Perceived sound quality, unique to each voice
Singer Identity Representation Learning using Self-Supervised Techniques
@inproceedings{torres2023singer,
title={Singer Identity Representation Learning using Self-Supervised Techniques},
author={Torres, Bernardo and Lattner, Stefan and Richard, Gael},
booktitle={International Society for Music Information Retrieval Conference (ISMIR 2023)},
year={2023}
}
https://www.researchgate.net/publication/360961643_Speaker_Identification_using_Speech_Recognition
It wasn't clear which features are extracted to identify speakers. Found sample audio: a large-scale (1000 hours) corpus of read English speech
Read further