AI Podcast Generator 🎙️

An AI-powered tool that transforms YouTube videos into engaging podcast discussions. Features a modern web interface for easy use and optional CLI functionality. Downloads videos, transcribes them, and generates natural conversations between AI voices discussing the content.

Features

🎯 Modern web interface for easy podcast generation
🎥 Embedded YouTube video player
🔊 Interactive audio player for generated podcasts
💾 History tracking of processed videos
🤖 Natural conversation generation using Claude AI or XAI
🗣️ Multiple AI voices using ElevenLabs
⚡ Real-time processing status updates
📝 Optional fact-checking of content

Quick Start with Docker Run 🐳

Install Docker.
Place your .env file in the directory where you run the command. See .env.example for details.

Run the following command in WSL or Ubuntu:

docker run -d --name podcast-app \
  --env-file .env \
  --add-host=host.docker.internal:host-gateway \
  -e VITE_API_URL=http://localhost:5000 \
  -e PYTHONUNBUFFERED=1 \
  -e HOST=0.0.0.0 \
  -e PORT=5000 \
  -p 5000:5000 \
  -p 5173:5173 \
  -v $(pwd)/public/audio:/app/public/audio \
  -v $(pwd)/output:/app/output \
  --restart unless-stopped \
  --health-cmd="curl -f http://localhost:5000/health || exit 1" \
  --health-interval=30s \
  --health-timeout=10s \
  --health-retries=3 \
  bigsk1/podcast-ai:latest

Run in Windows Command Prompt:

docker run -d --name podcast-app --env-file .env --add-host=host.docker.internal:host-gateway -e VITE_API_URL=http://localhost:5000 -e PYTHONUNBUFFERED=1 -e HOST=0.0.0.0 -e PORT=5000 -p 5000:5000 -p 5173:5173 -v %cd%/public/audio:/app/public/audio -v %cd%/output:/app/output --restart unless-stopped --health-cmd="curl -f http://localhost:5000/health || exit 1" --health-interval=30s --health-timeout=10s --health-retries=3 bigsk1/podcast-ai:latest
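The commands above register a container health check against http://localhost:5000/health. As a rough sketch, the same endpoint can be polled from the host to confirm the backend is up before opening the UI. The function names here are hypothetical, and treating any 2xx response as healthy is an assumption about how the endpoint behaves; the defaults only mirror the retry and interval flags used above.

```python
import time
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 10.0) -> bool:
    """Return True if a GET to `url` answers with a 2xx status (assumed to mean healthy)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_healthy(url, attempts=3, interval=30.0, check=is_healthy, sleep=time.sleep):
    """Call `check(url)` up to `attempts` times, sleeping `interval` seconds between tries."""
    for attempt in range(attempts):
        if check(url):
            return True
        if attempt < attempts - 1:
            sleep(interval)
    return False
```

For example, `wait_until_healthy("http://localhost:5000/health")` returns True as soon as one request succeeds and False after three failed tries.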

Prerequisites

Node.js 18+
Python 3.10+
FFmpeg installed and in PATH
ElevenLabs API key
Anthropic (Claude) or XAI API key

Docker Compose Setup 🐳

You can run the application using Docker with these simple steps:

Clone the repository and navigate to it:

git clone https://github.com/bigsk1/podcast-ai.git
cd podcast-ai

Create your .env file with the required API keys and settings (see .env.example).
Start the application with Docker Compose:

cd docker
docker-compose up -d --build

The application will be available at:

Frontend UI: http://localhost:5173

To stop the container:

docker-compose down

Note: Generated audio files will be available in the public/audio directory, just like in the standard setup.

Docker with CUDA for faster transcription on NVIDIA GPU

Note: This is only slightly faster, since transcription is already fairly quick on CPU. To use it, make sure the NVIDIA Container Toolkit and cuDNN are installed on the host machine.

First, test whether containers can access the GPU:

sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

Run Docker Compose with CUDA-enabled transcription:

docker compose -f cuda.docker-compose.yml up -d --build

Installation - Windows / Ubuntu

Clone the repository:

git clone https://github.com/bigsk1/podcast-ai.git
cd podcast-ai

Install frontend dependencies:

npm install

Set up Python environment and install dependencies:

python -m venv venv
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate     # Windows

Install requirements:

pip install -r requirements.txt

Create .env file with your API keys:

# ELEVENLABS VOICE IDS - add your own voice IDs
VOICE1=111111111111
VOICE2=111111111111

# AI Model Settings - xai, anthropic
AI_PROVIDER=anthropic

# model name: grok-beta, claude-3-5-sonnet-latest
MODEL_NAME=claude-3-5-sonnet-latest

# Podcast Generation Settings
# Minimum number of back-and-forth exchanges
MIN_EXCHANGES=4
# Maximum number of exchanges
MAX_EXCHANGES=20
# Minimum words per exchange
EXCHANGE_LENGTH_MIN_WORDS=20
# Maximum words per exchange
EXCHANGE_LENGTH_MAX_WORDS=150

# Audio Length Control
# Target length for final podcast (in minutes)
TARGET_LENGTH_MINUTES=3
# Allowed deviation from target (20% = ±36 seconds for 3 min target)
LENGTH_FLEXIBILITY=0.2
# Target output length as ratio of source (0.2 = 20% of original)
SOURCE_LENGTH_RATIO=0.2
# Minimum podcast length in minutes
MIN_PODCAST_LENGTH=2
# Maximum podcast length in minutes
MAX_PODCAST_LENGTH=5

# Content Coverage
# comprehensive, summary, highlights, humor, emotional, debate, or simple
COVERAGE_STYLE=highlights
# Enable AI fact checking
FACT_CHECK_ENABLED=false
# balanced, critical, or supportive
FACT_CHECK_STYLE=balanced          

# Model Settings
TEMPERATURE=0.7
MAX_TOKENS=8192

LOGGING_LEVEL=DEBUG

# Output Directory
OUTPUT_DIR=output

# ANTHROPIC API KEY
ANTHROPIC_API_KEY=your_key_here

# ELEVENLABS API KEY
ELEVENLABS_API_KEY=your_key_here

# For XAI
XAI_BASE_URL=https://api.x.ai
XAI_API_KEY=your_xai_key

# Frontend configuration
# if accessing from another machine on the network, change to the actual server IP
VITE_API_URL=http://localhost:5000
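The audio length settings above are easier to reason about with the arithmetic written out. The sketch below shows one plausible reading of them; how the application actually combines these values is an assumption, and both function names are hypothetical.

```python
def allowed_range_seconds(target_minutes: float, flexibility: float) -> tuple[float, float]:
    """Window around the target length, e.g. TARGET_LENGTH_MINUTES and LENGTH_FLEXIBILITY."""
    target = target_minutes * 60
    delta = target * flexibility
    return target - delta, target + delta


def ratio_based_length(source_minutes: float, ratio: float,
                       min_minutes: float, max_minutes: float) -> float:
    """Length derived from SOURCE_LENGTH_RATIO, clamped to the MIN/MAX podcast length.

    Assumption: the ratio is applied to the source video length and the
    result is then limited by MIN_PODCAST_LENGTH and MAX_PODCAST_LENGTH.
    """
    return max(min_minutes, min(max_minutes, source_minutes * ratio))
```

With the sample values, a 3 minute target and 0.2 flexibility give a window of 180 ± 36 seconds. Under the clamping assumption, a 60 minute source at a 0.2 ratio would ask for 12 minutes and be capped at the 5 minute maximum.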

Rename voice.json.example to voice.json and add your voice names and IDs from ElevenLabs. This file is the collection of voices you want to use; set the voice IDs currently in use in the .env file when running the app.
Make sure you have ffmpeg installed:

Windows

winget install ffmpeg

Linux

sudo apt install ffmpeg
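To confirm that ffmpeg is actually reachable through PATH after installing it, a small check like the following can help. This is only a sketch using the Python standard library; the helper names are hypothetical and not part of this project.

```python
import shutil
import subprocess


def find_tool(name: str = "ffmpeg"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


def version_line(path: str) -> str:
    """Run `<path> -version` and return the first line of its output."""
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""
```

If `find_tool()` returns None, the terminal usually needs to be restarted so it picks up the updated PATH.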

Usage

Web Interface

Start the backend server in one terminal:

python api.py

Start the frontend development server in another terminal:

npm run dev

Open http://localhost:5173 in your browser
Paste a YouTube URL and click "Generate AI Podcast Review"

CLI Version (Optional)

The tool can also be used from the command line:

# Basic usage
python main.py "https://www.youtube.com/watch?v=video_id"

# Skip audio generation
python main.py --no-audio "https://www.youtube.com/watch?v=video_id"

# Generate without merging
python main.py --no-merge "https://www.youtube.com/watch?v=video_id"

# Merge audio files manually
python merge_audio_cli.py output conversation.mp3
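To process several videos in a row, the CLI can be driven from a short script. The sketch below only uses the flags shown above; the helper names are hypothetical, and it assumes main.py exits with a non-zero status on failure, which has not been verified.

```python
import subprocess
import sys


def build_command(url: str, no_audio: bool = False, no_merge: bool = False) -> list[str]:
    """Build the argument list for one main.py run."""
    cmd = [sys.executable, "main.py"]
    if no_audio:
        cmd.append("--no-audio")
    if no_merge:
        cmd.append("--no-merge")
    cmd.append(url)
    return cmd


def run_batch(urls, runner=subprocess.run, **flags) -> list[str]:
    """Run main.py once per URL and return the URLs whose run failed."""
    failed = []
    for url in urls:
        result = runner(build_command(url, **flags))
        # Assumption: a non-zero exit code signals a failed run.
        if result.returncode != 0:
            failed.append(url)
    return failed
```

Run it from the repository root, for example `run_batch(["https://www.youtube.com/watch?v=video_id"], no_merge=True)`.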

Output

Generated files are saved in:

UI version: public/audio/ directory
CLI version: output/ directory

Configuration Options

All configuration options are set through the .env file. See the sample .env file above for common settings. Also make sure you have your own ElevenLabs voice IDs. The example voice IDs provided in voices.json won't work for you, because each account has its own IDs tied to its API key.
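Before starting the app, it can be useful to check that the .env file contains the keys the chosen provider needs. The sketch below is a minimal parser and check; which keys the application strictly requires is an assumption inferred from the sample file, and the function names are hypothetical.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """List keys that are absent or empty for the selected AI_PROVIDER."""
    # Assumption: these keys are needed for every run.
    required = ["ELEVENLABS_API_KEY", "VOICE1", "VOICE2"]
    provider = values.get("AI_PROVIDER", "")
    if provider == "anthropic":
        required.append("ANTHROPIC_API_KEY")
    elif provider == "xai":
        required.append("XAI_API_KEY")
    else:
        required.append("AI_PROVIDER")
    return [key for key in required if not values.get(key)]
```

Placeholder values such as your_key_here still count as present, so this only catches missing or empty entries.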

Examples

Check out the video on X.

https://aicodelabs.io/emotional.mp3

https://aicodelabs.io/silo.mp3

https://aicodelabs.io/merged.mp3

In Progress

Adding OpenAI
Adding Ollama
Add web search to podcast fact-checking
Add the YouTube API, a search feature in the header, and a separate page for one-click podcast generation

Troubleshooting

Could not locate cudnn_ops64_9.dll

Could not locate cudnn_ops64_9.dll. Please make sure it is in your library path!
Invalid handle. Cannot load symbol cudnnCreateTensorDescriptor

To resolve this:

Install cuDNN: Download cuDNN from the NVIDIA cuDNN page https://developer.nvidia.com/cudnn

Then add its bin directory to the PATH:

Open System Environment Variables:

Press Win + R, type sysdm.cpl, and hit Enter. Go to the Advanced tab and click Environment Variables.

Edit the System PATH Variable:

In the System variables section, find the Path variable, select it, and click Edit. Click New and add the path to the bin directory where cudnn_ops64_9.dll is located. The exact path depends on your cuDNN and CUDA versions, for example:

C:\Program Files\NVIDIA\CUDNN\v9.5\bin\12.6

Apply and Restart:

Click OK to close all dialog boxes, then restart your terminal (or any running applications) to apply the changes.

Verify the Change:

Open a new terminal and run:

where cudnn_ops64_9.dll
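The same lookup can be done from Python, which is handy when the check has to run inside the virtual environment used by the app. This is a sketch of a PATH search using only the standard library; `find_on_path` is a hypothetical helper, not part of this project.

```python
import os
from pathlib import Path


def find_on_path(filename: str, path_value=None) -> list[Path]:
    """Return every PATH directory entry that contains `filename`, in PATH order."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for entry in path_value.split(os.pathsep):
        if not entry:
            continue
        candidate = Path(entry) / filename
        if candidate.is_file():
            hits.append(candidate)
    return hits
```

An empty result from `find_on_path("cudnn_ops64_9.dll")` means the bin directory is not yet visible to the current process.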

pyaudio codec issue

Make sure ffmpeg is installed and on your PATH. On Windows, install it from a terminal with winget install ffmpeg.
