✦ MΣDY

AICAST: the AI podcast maker - 22/08/2024

AICAST is a web application that allows users to create podcast-style audio content from either uploaded files or directly inputted text.

Overview

This Podcast Generator is a web application that allows users to create podcast-style audio content from either uploaded text files or directly inputted text. It uses OpenAI’s GPT model to convert text into a conversational format and then uses OpenAI’s text-to-speech API to generate audio.

Features

Technologies Used

Setup

  1. Clone the repository: git clone https://github.com/medtty/aicast.git cd aicast

  2. Create the Python env: python3 -m venv aicastenv

  3. Install required Python packages: pip install -r requirements.txt

  4. Set up your OpenAI API key:

    • Create a .env file in the project root
    • Add your OpenAI API key to the .env file: OPENAI_API_KEY=your_api_key_here
  5. Create necessary directories: mkdir uploads

  6. Run the application: python main.py

  7. Open a web browser and navigate to http://127.0.0.1:8000

Usage

Uploading a File

  1. Click on the “Choose File” button under “Upload File”.
  2. Select a PDF, DOCX, or Markdown file from your computer.
  3. Click the “Upload and Generate” button.
  4. Wait for the podcast to be generated (a countdown will be displayed).
  5. Once generation is complete, you’ll see audio players for each episode.

Entering Text Directly

  1. Scroll down to the “Or Enter Text” section.
  2. Type or paste your text into the text area.
  3. Click the “Generate from Text” button.
  4. Wait for the podcast to be generated (a countdown will be displayed).
  5. Once generation is complete, you’ll see an audio player for your podcast.

How It Works

  1. Text Extraction: For uploaded files, the application extracts text using appropriate libraries (pdfplumber for PDF, python-docx for DOCX, and markdown for MD files).

  2. Text Splitting: For longer texts, the content is split into multiple episodes of approximately 1000 words each.

  3. Title Generation: For each episode, a title is generated using OpenAI’s GPT model.

  4. Dialogue Conversion: The text for each episode is sent to OpenAI’s GPT model to be converted into a two-speaker dialogue format.

  5. Audio Generation:

    • First, the intro “Made by medtty” is generated using OpenAI’s text-to-speech API.
    • Then, the dialogue is sent to OpenAI’s text-to-speech API to generate audio, with different voices for each speaker.
    • The intro and dialogue audio are combined into a single audio file.
  6. Audio Playback: The generated audio files are saved and made available for playback through the web interface.

Limitations

Future Improvements

Credits

This repo is inspired from alp-ex

Contributing

Contributions to improve the Podcast Generator are welcome. Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.