AICAST: the AI podcast maker - 22/08/2024

AICAST is a web application that allows users to create podcast-style audio content from either uploaded files or directly inputted text.

Overview

AICAST accepts an uploaded file (PDF, DOCX, or Markdown) or directly entered text, uses OpenAI’s GPT model to rewrite that text as a two-speaker conversation, and then uses OpenAI’s text-to-speech API to generate the audio.

Features

File Upload: Support for PDF, DOCX, and Markdown files.
Direct Text Input: Users can paste or type text directly into the application.
Automatic Episode Generation: For longer content, the application automatically splits the text into multiple episodes with generated titles.
Custom Audio Intro: Each episode starts with “Made by medtty”.
Dynamic Two-Speaker Conversion: The application converts input text into a dynamic dialogue between two speakers.
Instant Audio Generation: Audio is generated immediately after file upload or text input, without a review step.
Web-based Interface: Easy-to-use web interface for uploading files or entering text.
Audio Playback: Generated episodes can be played directly in the browser.

Technologies Used

Backend: Python with Flask
Frontend: HTML, CSS, and JavaScript
Text Processing: Python-Markdown, pdfplumber, python-docx
AI Services: OpenAI GPT-3.5 and Text-to-Speech API

Setup

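Clone the repository and enter it: git clone https://github.com/medtty/aicast.git && cd aicast
Create and activate a Python virtual environment: python3 -m venv aicastenv && source aicastenv/bin/activate (on Windows, activate with aicastenv\Scripts\activate)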
Install required Python packages: pip install -r requirements.txt
Set up your OpenAI API key:
- Create a .env file in the project root
- Add your OpenAI API key to the .env file: OPENAI_API_KEY=your_api_key_here
Create the uploads directory: mkdir uploads
Run the application: python main.py
Open a web browser and navigate to http://127.0.0.1:8000
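
Before running the application, it can help to confirm that the API key from the steps above is actually visible to Python. How AICAST itself reads the .env file is not described here, so the following is only a minimal standard-library sketch; the function names load_env_file and require_api_key are hypothetical and are not part of the project.

```python
import os


def load_env_file(path=".env"):
    """Parse KEY=value lines from a .env-style file into a dict."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines, comments, and anything that is not KEY=value.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip("'\"")
    return values


def require_api_key(path=".env"):
    """Return OPENAI_API_KEY from the environment or the .env file, else raise."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key and os.path.exists(path):
        key = load_env_file(path).get("OPENAI_API_KEY")
    # Treat the placeholder from the setup instructions as "not configured".
    if not key or key == "your_api_key_here":
        raise RuntimeError("OPENAI_API_KEY is not set; see the Setup steps")
    return key
```

Calling require_api_key() from the project root raises a RuntimeError if the key is missing or still set to the placeholder value, which surfaces a configuration problem before any API credits are spent.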

Usage

Uploading a File

Click on the “Choose File” button under “Upload File”.
Select a PDF, DOCX, or Markdown file from your computer.
Click the “Upload and Generate” button.
Wait for the podcast to be generated (a countdown will be displayed).
Once generation is complete, you’ll see audio players for each episode.

Entering Text Directly

Scroll down to the “Or Enter Text” section.
Type or paste your text into the text area.
Click the “Generate from Text” button.
Wait for the podcast to be generated (a countdown will be displayed).
Once generation is complete, you’ll see an audio player for your podcast.

How It Works

Text Extraction: For uploaded files, the application extracts text using appropriate libraries (pdfplumber for PDF, python-docx for DOCX, and markdown for MD files).
Text Splitting: For longer texts, the content is split into multiple episodes of approximately 1000 words each.
Title Generation: For each episode, a title is generated using OpenAI’s GPT model.
Dialogue Conversion: The text for each episode is sent to OpenAI’s GPT model to be converted into a two-speaker dialogue format.
Audio Generation:
- First, the intro “Made by medtty” is generated using OpenAI’s text-to-speech API.
- Then, the dialogue is sent to OpenAI’s text-to-speech API to generate audio, with different voices for each speaker.
- The intro and dialogue audio are combined into a single audio file.
Audio Playback: The generated audio files are saved and made available for playback through the web interface.
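
The splitting and per-speaker voice steps above can be sketched without any API calls. This is an illustration under stated assumptions, not the project's actual code: the function names are hypothetical, the "Speaker 1:" / "Speaker 2:" line format of the generated dialogue is assumed, and the voice names "alloy" and "onyx" are OpenAI text-to-speech voices chosen only as an example.

```python
import re

# Approximate episode length taken from the description above.
WORDS_PER_EPISODE = 1000

# Assumed speaker-to-voice mapping; which voices AICAST really uses is not stated.
VOICES = {"Speaker 1": "alloy", "Speaker 2": "onyx"}


def split_into_episodes(text, words_per_episode=WORDS_PER_EPISODE):
    """Split text into chunks of at most `words_per_episode` words."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_episode])
        for i in range(0, len(words), words_per_episode)
    ]


def assign_voices(dialogue):
    """Turn 'Speaker N: line' text into (voice, line) pairs, skipping other lines."""
    turns = []
    for raw in dialogue.splitlines():
        match = re.match(r"^(Speaker [12]):\s*(.+)$", raw.strip())
        if match:
            turns.append((VOICES[match.group(1)], match.group(2)))
    return turns
```

Each (voice, line) pair would then be sent to the text-to-speech API in order and the resulting clips concatenated after the intro. Splitting on a fixed word count is the simplest approach; it can cut a sentence in half at an episode boundary, so a real implementation might prefer to break at the nearest paragraph or sentence end.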

Limitations

The application requires an OpenAI API key and uses API credits for text processing and audio generation.
The quality and appropriateness of the generated content depend on the OpenAI models, and the output may contain errors.
Very large files may take a long time to process and are split into multiple episodes of roughly 1000 words each.

Future Improvements

Add user authentication to allow saving and managing generated podcasts.
Implement background processing for large files to improve user experience.
Add options for users to customize voices, speaking styles, and episode lengths.

Credits

This repo is inspired by alp-ex.

Contributing

Contributions to improve AICAST are welcome. Feel free to submit a pull request.

License

This project is licensed under the MIT License - see the LICENSE file for details.