multimodal audio in vs. transcribing an audio attachment? #43

ghchinoy · 2025-01-10T05:47:08Z

Currently, it appears that audio is handled as a binary attachment and then transcribed. Reference:

ai/lib/src/views/llm_chat_view/llm_chat_view.dart

Line 230 in b1fbc7c

Future<void> _onTranslateStt(XFile file) async {

For multimodal models such as Gemini, audio as an input is natively supported.

The expectation is that the audio itself is passed to the model as input, rather than being attached, transcribed, and replaced by its transcription.
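To illustrate the request, here is a minimal sketch of what "audio as direct input" could look like at the request level. It is not the toolkit's API: `buildAudioRequest` is a hypothetical helper, and the `contents` / `parts` / `inline_data` field names are assumed from the Gemini REST `generateContent` request shape as I understand it. The point is only that the recorded bytes travel to the model as an audio part, with no transcription step in between.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Hypothetical helper (not part of the toolkit): builds a request body that
/// carries the raw audio as an inline part, optionally next to a text prompt.
/// Field names are an assumption based on the Gemini REST request shape.
Map<String, Object?> buildAudioRequest(
  Uint8List audioBytes, {
  required String mimeType,
  String? prompt,
}) {
  return {
    'contents': [
      {
        'parts': [
          // The text part is optional; the audio alone can be the input.
          if (prompt != null) {'text': prompt},
          {
            'inline_data': {
              'mime_type': mimeType,
              // Inline binary data is sent base64-encoded.
              'data': base64Encode(audioBytes),
            },
          },
        ],
      },
    ],
  };
}

void main() {
  // Placeholder bytes; a real app would read them from the recorded file.
  final bytes = Uint8List.fromList([1, 2, 3]);
  final body = buildAudioRequest(
    bytes,
    mimeType: 'audio/wav',
    prompt: 'Respond to this recording.',
  );
  print(jsonEncode(body));
}
```

With something along these lines, the recording would be handed to the provider as audio, and the model would decide how to interpret it, instead of `_onTranslateStt` producing text first.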
