Talkify is a Swift library designed to streamline the process of integrating speech recognition and synthesis capabilities into iOS and macOS applications. The library harnesses the power of native APIs such as SFSpeechRecognizer and AVSpeechSynthesizer, providing a high-level interface that simplifies their usage and handles common tasks, such as managing audio sessions and checking microphone permissions.
The primary component is the Talkify class. This class provides a comprehensive set of methods for managing speech recognition tasks. It establishes and manages an AVAudioEngine instance for audio operations, handles speech recognition requests and tasks, and provides delegate methods to keep your application informed about the status of speech recognition processes. It also integrates with TalkifyRecordingSession to facilitate the audio recording process.
- Swift 5.0 or higher
- iOS 13.0 or higher
- macOS 10.15 or higher
- SPM
The following languages are supported for both text to speech and speech to text:
Language | Flag |
---|---|
English (Australia) | 🇦🇺 |
English (United Kingdom) | 🇬🇧 |
English (United States) | 🇺🇸 |
English (Ireland) | 🇮🇪 |
English (South Africa) | 🇿🇦 |
中文(中国) | 🇨🇳 |
中文(香港) | 🇭🇰 |
中文(台灣) | 🇹🇼 |
Nederlands (België) | 🇧🇪 |
Nederlands (Nederland) | 🇳🇱 |
Français (Canada) | 🇨🇦 |
Français (France) | 🇫🇷 |
Deutsch (Deutschland) | 🇩🇪 |
Deutsch (Österreich) | 🇦🇹 |
Deutsch (Schweiz) | 🇨🇭 |
Italiano (Italia) | 🇮🇹 |
日本語 (日本) | 🇯🇵 |
한국어 (대한민국) | 🇰🇷 |
Norsk (Norge) | 🇳🇴 |
Polski (Polska) | 🇵🇱 |
Português (Brasil) | 🇧🇷 |
Português (Portugal) | 🇵🇹 |
Română (România) | 🇷🇴 |
Русский (Россия) | 🇷🇺 |
Slovenčina (Slovenská republika) | 🇸🇰 |
Español (Argentina) | 🇦🇷 |
Español (México) | 🇲🇽 |
Español (España) | 🇪🇸 |
Español (Estados Unidos) | 🇺🇸 |
Svenska (Sverige) | 🇸🇪 |
ไทย (ประเทศไทย) | 🇹🇭 |
Türkçe (Türkiye) | 🇹🇷 |
The following voices are available for text to speech:

Language | Voices |
---|---|
Arabic | Maged |
Bulgarian | Daria |
Catalan | Montserrat |
Czech | Zuzana |
Danish | Sara |
German | Anna |
Greek | Melina |
Australian English | Karen |
British English | Daniel |
Irish English | Moira |
Indian English | Rishi |
US English | Samantha, Whisper, Princess, Bells, Organ, BadNews, Bubbles, Junior, Bahh, Deranged, Boing, GoodNews, Zarvox, Ralph, Cellos, Kathy, Fred |
South African English | Tessa, Trinoids, Albert, Hysterical |
Spanish | Monica (Neutral), Paulina (Mexican) |
Finnish | Satu |
French | Amelie (Canadian), Thomas |
Hebrew | Carmit |
Hindi | Lekha |
Croatian | Lana |
Hungarian | Mariska |
Indonesian | Damayanti |
Italian | Alice |
Japanese | Kyoko |
Korean | Yuna |
Malay | Amira |
Norwegian | Nora |
Dutch | Ellen (Belgium), Xander (Netherlands) |
Polish | Zosia |
Portuguese | Luciana (Brazil), Joana (Portugal) |
Romanian | Ioana |
Russian | Milena |
Slovak | Laura |
Swedish | Alva |
Thai | Kanya |
Turkish | Yelda |
Ukrainian | Lesya |
Vietnamese | Linh |
Chinese | Tingting (China), Sinji (Hong Kong), Meijia (Taiwan) |
- Text to speech in different languages with different voice models.
- Listens to your voice and returns the recognized text, based on your setup.
- Get the list of all available voices programmatically.
- Ergonomic API.
- Dedicated delegates to control recording/speaking/reading states on your side.
- RxSwift, Combine, TCA Support
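For reference, listing voices with the native API looks roughly like the sketch below. It calls `AVSpeechSynthesisVoice.speechVoices()` directly rather than a Talkify method (Talkify's own call for this is not shown in this README), and `voicesByLanguage` is a hypothetical helper name.

```swift
import AVFoundation

/// Groups the voices installed on this device by their BCP 47 language code.
/// `voicesByLanguage` is an illustrative helper, not part of Talkify.
func voicesByLanguage() -> [String: [String]] {
    var grouped: [String: [String]] = [:]
    for voice in AVSpeechSynthesisVoice.speechVoices() {
        grouped[voice.language, default: []].append(voice.name)
    }
    return grouped
}

// Print one line per language, for example "ja-JP: Kyoko".
for (language, names) in voicesByLanguage().sorted(by: { $0.key < $1.key }) {
    print("\(language): \(names.sorted().joined(separator: ", "))")
}
```

The voices actually returned depend on the OS version and on which voices the user has downloaded, so the result may differ from the table above.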
Talkify is available through the Swift Package Manager.
To integrate Talkify into your project using SPM, add the package dependency to your Package.swift file:
dependencies: [
.package(url: "https://github.com/tornikegomareli/Talkify.git", .upToNextMajor(from: "0.1.0"))
]
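In context, a complete manifest might look like the sketch below. The package and target name `MyApp` is a placeholder, and the product name `Talkify` is an assumption; check the library's own Package.swift if the dependency fails to resolve.

```swift
// swift-tools-version:5.5
import PackageDescription

let package = Package(
    name: "MyApp", // hypothetical package name
    // Matches the minimum platform versions listed in the requirements.
    platforms: [.iOS(.v13), .macOS(.v10_15)],
    dependencies: [
        .package(url: "https://github.com/tornikegomareli/Talkify.git", .upToNextMajor(from: "0.1.0"))
    ],
    targets: [
        .target(
            name: "MyApp",
            dependencies: [
                // Assumes the package exposes a library product named "Talkify".
                .product(name: "Talkify", package: "Talkify")
            ]
        )
    ]
)
```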
Before you start using Talkify, complete the following setup steps:
To use the recording features of Talkify, you need to request microphone access. Additionally, for speech recognition, you must request speech recognition authorization. Add the following keys to your Info.plist:
<key>NSMicrophoneUsageDescription</key>
<string>We need access to the microphone to record your voice.</string>
<key>NSSpeechRecognitionUsageDescription</key>
<string>We need access to speech recognition to convert your voice into text.</string>
For macOS users:
Open your Xcode project. Navigate to the "Signing & Capabilities" tab. In the "Resource Access" section, ensure that "Audio Input" is selected. This allows recording of audio using the built-in microphone and grants access to audio inputs using any Core Audio API that supports audio input. This step is not required for iOS.
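The Info.plist keys only supply the text shown in the system prompts; the prompts themselves appear when access is requested at runtime. Talkify may already do part of this for you. The sketch below uses the native `AVCaptureDevice` and `SFSpeechRecognizer` calls directly, and the helper names `describe` and `requestSpeechPermissions` are hypothetical.

```swift
import AVFoundation
import Speech

/// Turns a speech recognition authorization status into a readable label.
func describe(_ status: SFSpeechRecognizerAuthorizationStatus) -> String {
    switch status {
    case .authorized: return "authorized"
    case .denied: return "denied"
    case .restricted: return "restricted"
    case .notDetermined: return "not determined"
    @unknown default: return "unknown"
    }
}

/// Requests microphone access first, then speech recognition authorization.
/// The completion handler runs on the main queue and receives `true` only
/// when both permissions were granted.
func requestSpeechPermissions(completion: @escaping (Bool) -> Void) {
    AVCaptureDevice.requestAccess(for: .audio) { micGranted in
        guard micGranted else {
            DispatchQueue.main.async { completion(false) }
            return
        }
        SFSpeechRecognizer.requestAuthorization { status in
            print("Speech recognition: \(describe(status))")
            DispatchQueue.main.async { completion(status == .authorized) }
        }
    }
}
```

Calling `requestSpeechPermissions` before the first recording lets you disable the record button or explain the problem when the user declines.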
The Talkify class provides a high-level API for managing speech synthesis and recognition tasks and for reading text with different voices.
Here's a guide on how to use it:
To start, initialize a Talkify instance:
let talkify = Talkify()
Set up the delegates:
talkify.recordingDelegate = self
talkify.speakingDelegate = self
Your class should then conform to the TalkifyRecordingDelegate and TalkifySpeakingDelegate protocols and implement their respective methods.
Before you start recording, set up the recorder:
talkify
.setupRecording()
.startRecording()
You can stop recording programmatically with:
talkify
.stopRecording()
The recognized text will be available through the recordingDidFinishWithResults(text:) delegate method.
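A minimal sketch of handling that callback follows. To keep it compilable on its own, it declares a stand-in protocol, `RecordingDelegateSketch`, instead of importing Talkify. Only the method name comes from this README; the `String` parameter type and the `TranscriptStore` class are assumptions, and the real TalkifyRecordingDelegate may require more methods.

```swift
import Foundation

// Stand-in for the library's protocol so this sketch compiles by itself.
// In a real app, conform to TalkifyRecordingDelegate instead.
protocol RecordingDelegateSketch: AnyObject {
    func recordingDidFinishWithResults(text: String)
}

/// Hypothetical receiver that keeps every non-empty transcript it is given.
final class TranscriptStore: RecordingDelegateSketch {
    private(set) var transcripts: [String] = []

    func recordingDidFinishWithResults(text: String) {
        // Ignore results that contain only whitespace.
        let trimmed = text.trimmingCharacters(in: .whitespacesAndNewlines)
        guard !trimmed.isEmpty else { return }
        transcripts.append(trimmed)
    }
}

// Simulate two callbacks: one with text, one with only whitespace.
let store = TranscriptStore()
store.recordingDidFinishWithResults(text: "  Hello, this is Talkify!  ")
store.recordingDidFinishWithResults(text: "   ")
print(store.transcripts)
```

Keeping the delegate in a small dedicated object like this, rather than in a view controller, makes the handling easy to test without a microphone.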
To speak text, first set up a speaker. Initialize the TalkifySpeaker:
let speaker = TalkifySpeaker()
Customizing Voice:
speaker.withVoice(customVoice: .kyoko) // Sets the voice to Kyoko (Japanese Female voice)
Customizing Voice Rate: This adjusts the speed at which the text is spoken. The value typically ranges from 0.0 (slowest) to 1.0 (fastest), with 0.5 being the default rate.
speaker.withVoiceRate(value: 0.7) // Sets a faster speaking rate
Customizing Pitch Multiplier: This adjusts the pitch of the synthesized voice. A value of 1.0 means a regular pitch. Values above or below this can be used to raise or lower the pitch, respectively.
speaker.withMultiplier(value: 1.2) // Raises the pitch slightly
Customizing Volume: This adjusts the volume of the synthesized voice, with 1.0 being the loudest and 0.0 being muted.
speaker.withVolume(value: 0.8) // Slightly quieter than the default volume
Assign the speaker to the Talkify instance:
talkify.setSpaker(wih: speaker) // Pass the speaker instance created above
Start Speaking:
talkify.speak(text: "Hello, this is Talkify!")
You can pause or continue the speech synthesis using:
talkify.pauseSpeaking()
talkify.continueSpeaking()
Remember to handle the delegate methods for TalkifySpeakingDelegate to get callbacks about the speech synthesis status.
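If rate, pitch, or volume come from user input such as sliders, it can help to clamp them before handing them to the speaker. The sketch below is one way to do that. `SpeechSettings` is a hypothetical type, not part of Talkify, and the 0.5 to 2.0 pitch range is the one documented for `AVSpeechUtterance.pitchMultiplier`, on the assumption that Talkify forwards the value to the native API unchanged.

```swift
/// Illustrative value type that keeps speech parameters inside valid ranges.
struct SpeechSettings {
    let rate: Float            // 0.0 (slowest) to 1.0 (fastest)
    let pitchMultiplier: Float // 0.5 to 2.0, where 1.0 is regular pitch
    let volume: Float          // 0.0 (muted) to 1.0 (loudest)

    init(rate: Float = 0.5, pitchMultiplier: Float = 1.0, volume: Float = 1.0) {
        // Clamp each value to its range instead of rejecting it.
        self.rate = min(max(rate, 0.0), 1.0)
        self.pitchMultiplier = min(max(pitchMultiplier, 0.5), 2.0)
        self.volume = min(max(volume, 0.0), 1.0)
    }
}

// Out-of-range rate and volume are pulled back to the nearest valid value.
let settings = SpeechSettings(rate: 1.4, pitchMultiplier: 1.2, volume: -0.3)
print(settings.rate, settings.pitchMultiplier, settings.volume)
```

The clamped values can then be passed to `withVoiceRate(value:)`, `withMultiplier(value:)`, and `withVolume(value:)` as shown earlier.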
With Talkify, you can choose a particular voice for speech synthesis. Here's how to set a voice:
let voice = TalkifyVoice(voice: .samantha, quality: .default)
talkify.voice = voice
Replace .samantha with the desired voice identifier from the TalkifyVoiceIdentifier enum. The quality parameter lets you set the voice's quality; you can choose between .default and other available options.
To set a specific language for speech recognition and synthesis, use the TalkifyLanguage enum:
let language: TalkifyLanguage = .englishUS
talkify.recognitionLanguage = language
talkify.synthesisLanguage = language
For detailed usage and advanced functionalities, refer to the inline documentation provided within the Talkify class and its extensions.
I appreciate your contributions! Whether you're fixing bugs, improving the documentation, or enhancing the features, I'd love to have your help. Here's how you can contribute:
- Fork the repository: Start by forking the Talkify repository.
- Clone your fork:
git clone https://github.com/YOUR_USERNAME/Talkify.git
- Create a branch:
git checkout -b your-branch-name
- Make your changes: Improve the codebase, add features, fix bugs, or enhance the documentation.
- Commit your changes:
git commit -m "Your descriptive commit message"
- Push to your fork:
git push origin your-branch-name
- Submit a pull request: Go to the Talkify repository and create a new pull request. Describe your changes in detail and ensure it's directed from your branch to the main Talkify branch.
Encountered a bug or unexpected behavior? I appreciate your feedback. Open a new issue on the GitHub repository and provide as much detail as you can. This helps me address and fix issues faster.
Because this repository is mainly for educational purposes, I will happily add new functionality step by step:
- watchOS Support: Aim to extend Talkify's capabilities to watchOS, allowing for seamless integration with Apple Watch applications.
- Rx and Combine Listeners: In addition to the delegate pattern, I'm planning on introducing listeners using popular reactive frameworks like RxSwift and Combine.
- Unit Tests: To ensure the robustness and reliability of Talkify, unit tests are on the way. This will boost confidence in the library's functionality and make future changes safer.
- Third-party integrations: I have an idea to add some third-party APIs, for example the ChatGPT speech recognition API, behind an ergonomic interface, but I still need to decide whether it would be worth it.
Just to beat my procrastination 😄
But it really aims to be a comprehensive solution for developers looking to incorporate speech recognition and synthesis into their apps. It abstracts away the complexity of the underlying APIs.
Talkify is licensed under the MIT License. See LICENSE for more information.
If you've found the README helpful or you like the project idea, please give it a ⭐️ (star) on GitHub.