AICAST: the AI podcast maker - 22/08/2024
AICAST is a web application that creates podcast-style audio content from uploaded files or from text entered directly.
Overview
AICAST takes either an uploaded file or text entered directly, uses OpenAI’s GPT model to rewrite that text as a conversation, and then uses OpenAI’s text-to-speech API to turn the conversation into audio.
Features
- File Upload: Support for PDF, DOCX, and Markdown files.
- Direct Text Input: Users can paste or type text directly into the application.
- Automatic Episode Generation: For longer content, the application automatically splits the text into multiple episodes with generated titles.
- Custom Audio Intro: Each episode starts with “Made by medtty”.
- Dynamic Two-Speaker Conversion: The application converts input text into a dynamic dialogue between two speakers.
- Instant Audio Generation: Audio is generated immediately after file upload or text input, without a review step.
- Web-based Interface: Easy-to-use web interface for uploading files or entering text.
- Audio Playback: Generated episodes can be played directly in the browser.
Technologies Used
- Backend: Python with Flask
- Frontend: HTML, CSS, and JavaScript
- Text Processing: Python-Markdown, pdfplumber, python-docx
- AI Services: OpenAI GPT-3.5 and Text-to-Speech API
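As a rough illustration of how the text-processing libraries listed above could be dispatched by file extension, here is a minimal sketch. The function name `extract_text` and the choice to pass Markdown through unrendered are assumptions made for this example, not a description of the code in this repository.

```python
from pathlib import Path


def extract_text(path: str) -> str:
    """Return plain text from a PDF, DOCX, or Markdown file (illustrative only)."""
    suffix = Path(path).suffix.lower()

    if suffix == ".pdf":
        import pdfplumber  # third-party; imported lazily so other formats work without it

        with pdfplumber.open(path) as pdf:
            # extract_text() can return None for pages without a text layer
            return "\n".join(page.extract_text() or "" for page in pdf.pages)

    if suffix == ".docx":
        import docx  # provided by the python-docx package

        document = docx.Document(path)
        return "\n".join(paragraph.text for paragraph in document.paragraphs)

    if suffix in {".md", ".markdown"}:
        # Assumption: raw Markdown is passed on as-is; the application may
        # instead render it with Python-Markdown first.
        return Path(path).read_text(encoding="utf-8")

    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
```

Importing the PDF and DOCX libraries inside their branches keeps the sketch usable for Markdown input even when those packages are not installed.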
Setup
- Clone the repository:
    git clone https://github.com/medtty/aicast.git
    cd aicast
- Create and activate a Python virtual environment:
    python3 -m venv aicastenv
    source aicastenv/bin/activate
  (On Windows, run aicastenv\Scripts\activate instead.)
- Install the required Python packages:
    pip install -r requirements.txt
- Set up your OpenAI API key:
  - Create a .env file in the project root.
  - Add your OpenAI API key to the .env file:
      OPENAI_API_KEY=your_api_key_here
- Create the necessary directories:
    mkdir uploads
- Run the application:
    python main.py
- Open a web browser and navigate to http://127.0.0.1:8000
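Before starting the server, it can help to confirm that the API key is actually readable. The following is a small standalone check, not part of the application: `read_env_file` and `find_api_key` are hypothetical helpers written for this example, and how AICAST itself loads the .env file is not documented here.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value
        values[key.strip()] = value.strip().strip("'\"")
    return values


def find_api_key(path: str = ".env") -> Optional[str]:
    """Prefer the process environment, then fall back to the .env file."""
    return os.environ.get("OPENAI_API_KEY") or read_env_file(path).get("OPENAI_API_KEY")
```

Calling `find_api_key()` from the project root returns the key if it is set, or `None` if it is missing from both the environment and the .env file.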
Usage
Uploading a File
- Click on the “Choose File” button under “Upload File”.
- Select a PDF, DOCX, or Markdown file from your computer.
- Click the “Upload and Generate” button.
- Wait for the podcast to be generated (a countdown will be displayed).
- Once generation is complete, you’ll see audio players for each episode.
Entering Text Directly
- Scroll down to the “Or Enter Text” section.
- Type or paste your text into the text area.
- Click the “Generate from Text” button.
- Wait for the podcast to be generated (a countdown will be displayed).
- Once generation is complete, you’ll see an audio player for your podcast.
How It Works
- Text Extraction: For uploaded files, the application extracts text using appropriate libraries (pdfplumber for PDF, python-docx for DOCX, and markdown for MD files).
- Text Splitting: For longer texts, the content is split into multiple episodes of approximately 1000 words each.
- Title Generation: For each episode, a title is generated using OpenAI’s GPT model.
- Dialogue Conversion: The text for each episode is sent to OpenAI’s GPT model to be converted into a two-speaker dialogue format.
- Audio Generation:
  - First, the intro “Made by medtty” is generated using OpenAI’s text-to-speech API.
  - Then, the dialogue is sent to OpenAI’s text-to-speech API to generate audio, with different voices for each speaker.
  - The intro and dialogue audio are combined into a single audio file.
- Audio Playback: The generated audio files are saved and made available for playback through the web interface.
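To use a different voice for each speaker, the generated dialogue has to be broken into per-speaker turns before it reaches the text-to-speech API. The sketch below shows one way that step might look. It assumes the model labels turns as "Speaker 1:" and "Speaker 2:", and it picks "alloy" and "onyx" as example voices; the label format, the voice choice, and the name `dialogue_to_segments` are all assumptions rather than details taken from the application.

```python
import re
from typing import List, Tuple

# Assumed mapping: "alloy" and "onyx" are OpenAI TTS voice names, but the
# voices AICAST actually uses are not stated in this README.
VOICES = {"Speaker 1": "alloy", "Speaker 2": "onyx"}

# Assumed label format: "Speaker 1: ..." or "Speaker 2: ..." at the start of a line
LINE_PATTERN = re.compile(r"^(Speaker [12]):\s*(.*)$")


def dialogue_to_segments(dialogue: str) -> List[Tuple[str, str]]:
    """Turn a two-speaker script into (voice, text) pairs in speaking order."""
    segments: List[Tuple[str, str]] = []
    for raw_line in dialogue.splitlines():
        line = raw_line.strip()
        match = LINE_PATTERN.match(line)
        if match:
            speaker, text = match.groups()
            if text:  # ignore a label with nothing after it
                segments.append((VOICES[speaker], text))
        elif line and segments:
            # Unlabelled line: treat it as a continuation of the previous turn
            voice, previous = segments[-1]
            segments[-1] = (voice, f"{previous} {line}")
    return segments
```

Each resulting pair could then be synthesized with its own voice and the audio clips concatenated after the intro, in the order returned.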
Limitations
- The application requires an OpenAI API key and uses API credits for text processing and audio generation.
- The quality and appropriateness of the generated content depend on the OpenAI models and may not always be perfect.
- Very large files may take a long time to process and may result in multiple episodes.
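To gauge how many episodes a given input is likely to produce, the roughly 1000-word split described under How It Works can be approximated as follows. This sketch cuts at an exact word count; the application may split differently (for example at sentence or paragraph boundaries), and the function names are chosen for this example only.

```python
import math
from typing import List


def split_into_episodes(text: str, words_per_episode: int = 1000) -> List[str]:
    """Split text into consecutive chunks of at most words_per_episode words."""
    words = text.split()
    return [
        " ".join(words[start:start + words_per_episode])
        for start in range(0, len(words), words_per_episode)
    ]


def estimated_episode_count(text: str, words_per_episode: int = 1000) -> int:
    """Number of chunks split_into_episodes would return for this text."""
    return math.ceil(len(text.split()) / words_per_episode)
```

Under these assumptions, a 2500-word document yields three episodes: two of 1000 words and one of 500. Since each episode involves a title request, a dialogue request, and text-to-speech calls, the episode count gives a rough sense of how processing time and API usage scale.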
Future Improvements
- Add user authentication to allow saving and managing generated podcasts.
- Implement background processing for large files to improve user experience.
- Add options for users to customize voices, speaking styles, and episode lengths.
Credits
This repo is inspired by alp-ex.
Contributing
Contributions to improve AICAST are welcome. Feel free to submit a pull request.
License
This project is licensed under the MIT License - see the LICENSE file for details.