0.3.0 minor issues #40

Open
ghost opened this issue Jun 5, 2024 · 4 comments
Comments

@ghost
ghost commented Jun 5, 2024

Just did some testing on 0.3.0, pretty good!
Android 14 (GrapheneOS)

Minor issues:

  1. If the app is started while the microphone is blocked, the "unblock microphone" notification isn't triggered, even though the app has microphone permission.

  2. The app outputs "♪ ♪" when you tap the microphone and aren't talking. 🤷🏻‍♂️ Nothing big; it doesn't seem to happen if you talk, even a little.

  3. I still think there should be some kind of trigger that outputs the text without stopping. Output is already very fast when you trigger it via stop, so tapping such a trigger key would output what's been dictated so far while the user continues on. Best example is when finishing a paragraph: tap the trigger → output → return key → continue talking 😁

All in all though, great upgrade!

Edit

  1. Not sure about this stuff:

What is 1200 divided by 6?

16 times 12 equals

🤷🏻‍♂️

@soupslurpr
Owner

Oh hey there again! Thanks for the feedback on 0.3.0!

  1. Welp, unfortunately this one seems to be an upstream issue with the SpeechRecognizer class. It also happens with Google's SpeechRecognizer, which you can see by trying to use Google Maps with it selected. There's also an issue tracking it: "App won't work if global mic toggle is enabled when you try an initial transcription, even if you re-enable it later" (#3).
  2. Ah, that's because of the new sound effect that gets played. Those music notes shouldn't appear when using whisper.cpp's suppress_non_speech_tokens option, so it seems to be a bug on their end. I believe there was a PR resolving it. I could just find them manually and replace them with nothing, but ideally they would be suppressed by the suppress_non_speech_tokens option.
  3. Yeah that does sound good, but I don't think I'm going to implement it soon. There needs to be a way to customize which tiles are visible and their size before adding more.
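
The manual workaround mentioned in point 2 could look roughly like the sketch below. It is only an illustration in Python, not the app's actual code: the function name `strip_music_notes` and the set of note characters are assumptions, and it also collapses all runs of whitespace, which may not be wanted for multi-line output.

```python
import re

# Assumption: the stray output only ever uses these note symbols.
_NOTE_CHARS = "♪♫♩♬"
# One or more note symbols, optionally separated by whitespace ("♪ ♪").
_NOTES_RUN = re.compile(rf"[{_NOTE_CHARS}]+(?:\s+[{_NOTE_CHARS}]+)*")


def strip_music_notes(text: str) -> str:
    """Remove runs of music-note symbols from a transcription (hypothetical helper)."""
    without_notes = _NOTES_RUN.sub("", text)
    # Collapse the whitespace left behind and trim the ends.
    return " ".join(without_notes.split())
```

A transcription that is nothing but notes becomes an empty string, which the caller could then skip instead of typing it out.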

Oh what do you mean this stuff? What's that?

@ghost
Author

ghost commented Jun 5, 2024

> Oh what do you mean this stuff? What's that?

🤣

When saying phrases like "16*7=" they come out as "16 times 7 equals"

I mention it in my suggestion #41

@soupslurpr
Owner

soupslurpr commented Jun 5, 2024

Ah, well that's probably difficult to accomplish, as you can't just replace every instance of "times" with * and "equals" with =, or it would break non-mathematical uses of those words. Maybe prompting Whisper would work, but I haven't tested that and it might also affect the quality of other things. If prompting doesn't work well, the best solution would probably be using a local AI to process the outputs. Not sure how much the AI would affect speed on a relatively modern device (such as a Pixel 7). It would also be several gigabytes large and take up several gigabytes in memory.
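
To illustrate why blanket replacement breaks ordinary sentences, here is a small Python sketch comparing it with a narrower heuristic that only rewrites "times" between two numbers and "equals" directly after a number. Both function names are hypothetical, neither is something the app does, and the heuristic can still misfire (for example on "3 times 2 days ago").

```python
import re


def naive_replace(text: str) -> str:
    """Blanket replacement: also rewrites non-mathematical uses."""
    return text.replace("times", "*").replace("equals", "=")


_NUM = r"\d+(?:\.\d+)?"
# "times" only when it sits between two numbers.
_TIMES = re.compile(rf"({_NUM})\s+times\s+(?={_NUM})")
# "equals" only when it directly follows a digit.
_EQUALS = re.compile(r"(?<=\d)\s+equals\b")


def numeric_replace(text: str) -> str:
    """Heuristic replacement limited to numeric context (assumption, not a full solution)."""
    text = _TIMES.sub(r"\1*", text)
    return _EQUALS.sub("=", text)


print(naive_replace("Three times a day equals too much"))
print(numeric_replace("Three times a day equals too much"))
print(numeric_replace("16 times 12 equals"))
```

The narrower version leaves the everyday sentence alone, but it only helps when Whisper writes the numbers as digits.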

@ghost
Author

ghost commented Jun 6, 2024

> It would also be several gigabytes large and take up several gigabytes in memory.

😱

Yeah, I'm for lean and fast 😁
