It's currently a difficult task, since the LLM models only give timestamps for words. I think if you can find a model on Hugging Face that is trained on singing, that would be the best chance.
I have been thinking about it again. Since we already have the vocals separated, we could analyze their volume. Then, wherever the vocal track is loud but there is no word/timestamp, we could assume there is some kind of "lalala".
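A rough sketch of that idea, assuming the separated vocal track is available as mono float samples in [-1, 1] and the word timestamps come as (start, end) pairs in seconds. Every function name, the RMS threshold, the frame length and the minimum gap length are hypothetical choices for illustration, not existing UltraSinger code. It measures loudness per frame, finds the loud stretches, and keeps only the parts that no word timestamp covers:

```python
import math
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)


def frame_rms(samples: List[float], frame_len: int) -> List[float]:
    """RMS loudness of consecutive, non-overlapping frames."""
    rms = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return rms


def voiced_intervals(rms: List[float], frame_len: int, sample_rate: int,
                     threshold: float) -> List[Interval]:
    """Merge consecutive frames whose RMS is at or above the threshold."""
    intervals = []
    start_frame = None
    for i, value in enumerate(rms):
        if value >= threshold and start_frame is None:
            start_frame = i
        elif value < threshold and start_frame is not None:
            intervals.append((start_frame * frame_len / sample_rate,
                              i * frame_len / sample_rate))
            start_frame = None
    if start_frame is not None:  # vocals still active at the end of the track
        intervals.append((start_frame * frame_len / sample_rate,
                          len(rms) * frame_len / sample_rate))
    return intervals


def subtract_words(voiced: List[Interval], words: List[Interval],
                   min_len: float) -> List[Interval]:
    """Return the parts of the voiced intervals that no word timestamp covers."""
    gaps = []
    for v_start, v_end in voiced:
        cursor = v_start
        for w_start, w_end in sorted(words):
            if w_end <= cursor or w_start >= v_end:
                continue  # word does not overlap the remaining voiced span
            if w_start > cursor:
                gaps.append((cursor, w_start))
            cursor = max(cursor, w_end)
        if cursor < v_end:
            gaps.append((cursor, v_end))
    # Drop very short gaps, which are more likely noise than a "lalala".
    return [(s, e) for s, e in gaps if e - s >= min_len]


def find_unlabelled_vocals(samples: List[float], sample_rate: int,
                           words: List[Interval], frame_len: int = 1024,
                           threshold: float = 0.02,
                           min_len: float = 0.2) -> List[Interval]:
    rms = frame_rms(samples, frame_len)
    voiced = voiced_intervals(rms, frame_len, sample_rate, threshold)
    return subtract_words(voiced, words, min_len)
```

For example, with one second of silence, two seconds of vocals and one second of silence at 100 Hz, and words at 1.0-1.5 s and 2.0-2.5 s, `find_unlabelled_vocals(samples, 100, words, frame_len=10, threshold=0.1)` returns `[(1.5, 2.0), (2.5, 3.0)]`, i.e. the two stretches that could become "lalala" notes. A fixed threshold is the simplest choice; real recordings would probably need it tuned per song or derived from the track's overall loudness.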
We now have some spectrograms to play with.
Hello,
is there a way to improve the recognition of shouts, or vocals that aren't necessarily words, so that they don't get filtered out when they are not recognized as a word? That way one could get the timing of those shouts in the UltraStar text file. Would I need a language model trained for this, or can I somehow add an additional model that supports this alongside the normal language models?