Ability to define prompts/speakers in text corpus #211
-
Might make sense to follow the SSML standard in this regard.
-
Thanks, I didn't know there was already a standard. https://www.w3.org/TR/speech-synthesis11/#S3.2.1 seems to fit nicely.
-
I started a very basic parser/generator that only supports the voice element so far. That way you can at least define voices for the lines to speak. Here is a quick result from this input
-
I have implemented a different method in my Discord bot at bghira/discord-tron-master:

import re

import numpy as np
from bark import generate_audio


def process_line(line, characters):
    # Match a speaker tag such as "{MAN}:" or "{WOMAN}" (the colon is optional).
    pattern = r"\{([^}]+)\}:?"
    match = re.search(pattern, line)
    if match:
        actor = match.group(1)
        # Strip the tag so only the spoken text remains.
        line = re.sub(pattern, "", line).strip()
        character_voice = characters.get(actor, {}).get("voice", None)
    else:
        character_voice = None
    return line, character_voice


actors = {
    "MAN": {"voice": "hi_speaker_1"},
    "WOMAN": {"voice": "hi_speaker_5"},
}
prompt = """{MAN}: Hello, lady!
{WOMAN} Hello, sir!
"""

# Then, to use it:
audio_segments = []
for line in prompt.splitlines():
    line_segment, character_voice = process_line(line, actors)
    if not line_segment:
        continue  # Skip empty lines.
    # Assumption: each clip is collected here for the concatenation below.
    audio_segments.append(generate_audio(line_segment, history_prompt=character_voice))

# Concatenate the audio
combined_audio = np.concatenate(audio_segments)
# Then, save combined_audio.
-
When Bark finally breaks the 13s limit on text, it would be cool to be able to define fixed voices for text blocks. That way, you could script dialogues and they would be reproducible, instead of the (albeit funny) random walks we get now. What do you guys think?
The hardest part might be the definition markup itself. The first thing that came to my mind was XML notation, something like:
<voice name="speaker_en_09">Bark is the best, forget the rest!</voice><voice name="speaker_en_07">Who makes up these bullshit ideas?</voice>
Perhaps I'll start experimenting with this in my fork this weekend if I find the time.
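For anyone who wants to try the XML idea, here is a rough sketch of how such markup could be split into (voice, text) pairs with Python's standard library. The function name `parse_voice_markup`, the `default_voice` parameter, and the wrapping `<speak>` root are assumptions made for illustration; nothing here is part of Bark. Each resulting pair could then be passed to whatever generation call you use.

```python
import xml.etree.ElementTree as ET


def parse_voice_markup(text, default_voice=None):
    """Split <voice name="...">...</voice> markup into (voice, text) pairs.

    Hypothetical helper: the name and behavior are illustrative only.
    """
    # Wrap the input in a single root so sibling <voice> elements parse
    # as one well-formed XML document.
    root = ET.fromstring(f"<speak>{text}</speak>")
    segments = []
    # Text before the first element falls back to the default voice.
    if root.text and root.text.strip():
        segments.append((default_voice, root.text.strip()))
    for element in root:
        if element.tag == "voice":
            spoken = "".join(element.itertext()).strip()
            if spoken:
                segments.append((element.get("name", default_voice), spoken))
        # Text that follows an element (its tail) also uses the default voice.
        if element.tail and element.tail.strip():
            segments.append((default_voice, element.tail.strip()))
    return segments


script = (
    '<voice name="speaker_en_09">Bark is the best, forget the rest!</voice>'
    '<voice name="speaker_en_07">Who makes up these ideas?</voice>'
)
segments = parse_voice_markup(script)
for voice, line in segments:
    print(voice, "->", line)
```

One caveat with this approach: the input has to be well-formed XML, so characters like `&` or `<` in the spoken text would need escaping first.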