
Commit

README lint
freddyaboulton committed Jan 7, 2025
1 parent 447adad commit bef8212
Showing 1 changed file with 13 additions and 1 deletion.
14 changes: 13 additions & 1 deletion gemini/multimodal-live-api/gradio-voice/README.md
# Gemini Voice Chat Demo

A low-latency, bidirectional-streaming, pure-Python application that enables voice conversations with Gemini using Gradio. This demo showcases real-time audio streaming, allowing natural conversations where you speak with Gemini and receive audio responses back.

## Features

Before running the application, you need to:

1. Enable Vertex AI in your Google Cloud Project

- Visit [Enable Vertex AI](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com)

2. Set up authentication
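
A common setup is Application Default Credentials via the gcloud CLI (`gcloud auth application-default login`). As a minimal sketch, assuming the `google-auth` package (installed alongside the Google Gen AI SDK), you can confirm credentials and a default project are visible before launching the app:

```python
# Sketch: verify Application Default Credentials before starting the demo.
# Assumes `gcloud auth application-default login` has already been run.
import google.auth

credentials, project_id = google.auth.default()
print(f"Credentials loaded; default project: {project_id}")
```
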
## Installation

1. Clone this repository:

```bash
git clone [email protected]:GoogleCloudPlatform/generative-ai.git
cd generative-ai/gemini/multimodal-live-api/gradio-voice
```

2. Install the required dependencies:

```bash
pip install -r requirements.txt
```

The main dependencies are:

- gradio_webrtc>=0.0.27,<1.0
- librosa
- google-genai==0.3.0

## Usage

1. Run the application:

```bash
python app.py
```

2. Open your web browser and navigate to the local Gradio interface (typically http://localhost:7860)

3. Configure the application:

- Enter your Google Cloud Project ID
- Select your preferred region (default: us-central1)
- Choose a voice for Gemini (options: Puck, Charon, Kore, Fenrir, Aoede)
## Configuration Options

### Regions

The application supports any Google Cloud region, with presets for:

- us-central1 (default)
- us-east5
- us-south1
Additional locations can be found in the [Vertex AI documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/locations#united-states).
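
The project ID and region entered in the UI correspond to the Vertex AI initialization of the Google Gen AI SDK. A minimal sketch (the project ID and region values below are placeholders, not taken from the demo's code):

```python
# Sketch: pointing the Google Gen AI SDK at Vertex AI in a chosen region.
# "your-project-id" and "us-central1" are placeholder values; the demo
# collects them from the Gradio UI instead of hard-coding them.
from google import genai

client = genai.Client(
    vertexai=True,
    project="your-project-id",
    location="us-central1",
)
```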

### Voice Options

Gemini can respond using different voice personalities:

- Puck (default)
- Charon
- Kore
- Fenrir
- Aoede
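
The chosen voice is passed to the model through the SDK's speech configuration. A rough sketch, assuming the current `google-genai` type names (the pinned 0.3.0 release may expose an equivalent dict-based config instead):

```python
# Sketch: requesting audio responses with a specific prebuilt voice.
# Type names follow current google-genai releases and are an assumption
# for the pinned 0.3.0 version.
from google.genai import types

live_config = types.LiveConnectConfig(
    response_modalities=["AUDIO"],
    speech_config=types.SpeechConfig(
        voice_config=types.VoiceConfig(
            prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Puck")
        )
    ),
)
```
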
## Technical Details

The application uses:

- Gradio for the web interface
- WebRTC for real-time audio streaming
- Google Gen AI SDK for Gemini integration
- Vertex AI for model hosting and inference

The audio streaming implementation uses:

- 16kHz input sample rate
- 24kHz output sample rate
- PCM audio format
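
As an illustrative sketch of what the sample-rate handling involves (not the demo's exact implementation), `librosa` can resample captured microphone audio down to the 16 kHz input rate; the 48 kHz capture rate below is an assumption about the browser/WebRTC source:

```python
# Sketch: convert int16 PCM from the browser to float32 and resample to
# the 16 kHz model input rate. The 48 kHz capture rate is an assumption.
import librosa
import numpy as np

def to_model_rate(pcm: np.ndarray, capture_rate: int = 48000) -> np.ndarray:
    audio = pcm.astype(np.float32) / 32768.0
    return librosa.resample(audio, orig_sr=capture_rate, target_sr=16000)
```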
