Create Your Own Audiobook from Any PDF with Python

Do you read books? Do you like listening to audiobooks? Would you like to create your own audiobook from any PDF? Here's how you can do it.

You can also follow along with the video tutorial!

It's time to code!

Let's get started!

You can find the code at my GitHub Repository

First, we need to install the necessary libraries. We require two libraries to build an audiobook player in Python.

1. PyPDF2

A pure-Python library built as a PDF toolkit. It is capable of extracting document information, splitting documents page by page, merging documents page by page, cropping pages, merging multiple pages into a single page, encrypting and decrypting PDF files, and more!

So open your terminal and run the following command.

pip install PyPDF2

If you wish to know more about it, you can refer to the documentation.

2. pyttsx3

pyttsx3 is a text-to-speech conversion library in Python. Unlike alternative libraries, it works offline, and is compatible with both Python 2 and 3.

So open your terminal and run the following command.

pip install pyttsx3

If you wish to know more about it, you can refer to the documentation.
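Before moving on, it can help to confirm that both packages are importable from the interpreter you plan to use. The snippet below is a minimal sketch that relies only on the standard library; the helper name missing_packages is my own and is not part of either library.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


# Import names of the two libraries used in this article.
missing = missing_packages(["PyPDF2", "pyttsx3"])
if missing:
    print("Please install:", ", ".join(missing))
else:
    print("Both libraries are available.")
```

If a name is reported as missing, run the matching pip install command again with the same Python interpreter you use to run the script.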

Now that we have installed the packages, we can import them in our program.

import pyttsx3
import PyPDF2

Now we need to open our file for reading and store the file object in book. The name of my PDF file is demo.pdf. The mode rb stands for "read, binary", which is required because a PDF is not a plain text file.

book = open('demo.pdf', 'rb')
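A file opened this way stays open until it is closed explicitly. A with block closes it automatically, and reading the first few bytes is a cheap way to catch a wrong file before parsing it. The following is a sketch, not part of the original script: looks_like_pdf is a hypothetical helper, and it assumes the file starts with the bytes %PDF-, as PDF files normally do.

```python
def looks_like_pdf(path):
    """Return True if the file at `path` starts with the PDF header bytes."""
    with open(path, "rb") as book:      # 'rb' = read, binary
        return book.read(5) == b"%PDF-"  # file is closed when the block ends
```

You could call looks_like_pdf('demo.pdf') before opening the book and print a friendly message if it returns False.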

Now I will pass book to PyPDF2's PdfFileReader class and store the resulting reader object in pdf_reader. (PyPDF2 3.0 and later renamed this class to PdfReader.)

pdf_reader = PyPDF2.PdfFileReader(book)

Now let's get the number of pages in our PDF from the numPages property of pdf_reader and store it in num_pages.

num_pages = pdf_reader.numPages
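The numPages property belongs to older PyPDF2 releases. From PyPDF2 3.0 onward, accessing it raises a deprecation error, and the replacement is len(reader.pages). If you are not sure which version is installed, a small helper can try the newer spelling first. This is a sketch under that assumption; count_pages is my own name, not a library function.

```python
def count_pages(reader):
    """Return the page count for an old- or new-style PyPDF2 reader object."""
    try:
        # Newer releases expose the pages as a sequence.
        return len(reader.pages)
    except AttributeError:
        # Fallback for readers that only offer the legacy attribute.
        return reader.numPages
```

With this helper, the line above could be written as num_pages = count_pages(pdf_reader).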

Now let's create a speech engine with pyttsx3's init function and print a message to show that the audiobook is starting.

play = pyttsx3.init()
print('Playing Audio Book')
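The engine returned by init speaks with the system's default voice and speed. pyttsx3 lets you change these through setProperty, using the property names 'rate' (words per minute) and 'volume' (0.0 to 1.0). The sketch below wraps them in a hypothetical helper, configure_voice; the default values are arbitrary choices for illustration.

```python
def configure_voice(engine, rate=150, volume=1.0):
    """Apply a speaking rate and volume to a pyttsx3-style engine."""
    volume = max(0.0, min(1.0, volume))  # keep volume inside the 0.0-1.0 range
    engine.setProperty("rate", rate)      # words per minute
    engine.setProperty("volume", volume)
    return engine
```

For example, play = configure_voice(pyttsx3.init(), rate=170) would give a slightly slower voice than the usual pyttsx3 default of 200 words per minute.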

Now, let's run a loop for the number of pages in our pdf file. A page will get retrieved at each iteration.

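for num in range(num_pages):
    page = pdf_reader.getPage(num)   # fetch one page per iteration
    data = page.extractText()        # pull the text out of that page
    play.say(data)                   # queue the text for speech
    play.runAndWait()                # speak it and wait until finished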

Inside the loop, getPage retrieves the current page, and the extractText method pulls the text out of that page, which we store in data.

Next, we pass data to the engine's say method, which queues the text for speech, and finally we call runAndWait, which speaks the queued text and blocks until it has finished.
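Text extracted from a PDF often contains hard line breaks and words hyphenated across lines, which can make the speech sound choppy, and some pages (for example scanned images) yield no text at all. One way to handle this is to tidy the text before passing it to say. The helper below is an illustrative sketch using only the standard library; clean_page_text is a hypothetical name, and the rules it applies are simple heuristics.

```python
import re


def clean_page_text(text):
    """Normalize extracted PDF text so it reads more naturally aloud."""
    if not text:
        return ""
    # Rejoin words split by a hyphen at the end of a line: "exam-\nple" -> "example".
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Collapse every run of whitespace (including newlines) into one space.
    return re.sub(r"\s+", " ", text).strip()
```

In the loop, you could write data = clean_page_text(page.extractText()) and only call play.say(data) when data is not empty. Note that the hyphen rule also joins genuinely hyphenated words that happen to break across lines, which is usually harmless for speech.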

Run the Python script and your audiobook will play.
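Playing the book live means the script has to keep running for the whole listening session. pyttsx3 also provides save_to_file, which queues text to be written to an audio file when runAndWait is called. The sketch below assumes a reader object with a pages sequence and an extract_text method on each page (the PyPDF2 3.x naming; older releases use getPage and extractText as shown above). The function name export_audiobook and the output file name are hypothetical, and the audio format actually written depends on the speech driver on your system.

```python
def export_audiobook(reader, engine, out_path="audiobook.mp3"):
    """Queue the text of every page and write it to a single audio file."""
    parts = []
    for page in reader.pages:
        text = page.extract_text() or ""  # some pages yield no text
        if text.strip():
            parts.append(text)
    if not parts:
        return False                      # nothing to speak
    engine.save_to_file("\n".join(parts), out_path)
    engine.runAndWait()                   # performs the queued save
    return True
```

With PyPDF2 3.x installed, a call might look like export_audiobook(PyPDF2.PdfReader('demo.pdf'), pyttsx3.init()). Passing the reader and engine in as arguments keeps the function easy to try out with different PDFs or engine settings.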

That's it. We are done. You can find the code at my GitHub Repository

If you have any queries or suggestions, feel free to reach out to me.

You can connect with me on Twitter.

See you in my next article, Take care!

Peter Thaleikis

Interesting, pretty cool idea.

Ameen

Can't wait to do this in JavaScript. Great article!

Ayushi Rawat

Sounds great, thank you Ameen.

Sree Harsha K

Great idea. Will definitely try it out!!

Ayushi Rawat

Thank you, Sree Harsha K. Keep learning!

Abhishek Mani Tripathi

That's a very cool idea!!

Pranav Bhattarai

I tried to run your program, as you can see.

But unfortunately it is throwing this error.

Help needed. What's wrong?