Last week, I was at a convention and got to talk to an author named Martin Shoemaker. He co-wrote a book about dictation with Kevin J. Anderson, who has written a lot of books, and he does this by dictating.

I’ve known this about Kevin for a while, but I didn’t know about Martin or that he had co-written a book on dictation with Kevin.

After one of the panels Martin was on, I decided to approach him and see what kind of tips and ideas I could get on how to overcome some of the obstacles I have had with dictation in the past. Now, granted, I don’t have a lot of resistance to dictation, but my biggest challenge has been finding a space where I feel like I can dictate without being self-conscious.

Usually, when I return home from work, I want to sit down and start writing. This would be the prime time for me to dictate. I haven’t done it any other way because I rely on my iPad, phone, or computer to do the transcription right then and there. However, these devices will sometimes time out, so I can’t dictate distraction-free: I need to keep watching the device to make sure it’s still transcribing. There have been times when I was driving and lost some of what I was saying because the software timed out in the middle of my dictation and I didn’t see it.

Recently, though, a few things have changed how I dictate and transcribe.

The first thing is that OpenAI has released its new Whisper engine, which takes a WAV or MP3 file, detects what language you’re speaking, and then transcribes it. The resulting file is a video text track file (.vtt), the kind used for captions. This file format has timestamps marking where the transcription matches the original audio or video, so the words can be displayed on screen at the same time they are spoken. Along with the timestamps, it has the exact words as detected by the trained AI.

Since it is just a text file with timestamp information, I can throw a regular audio file at it, have it do the transcription, remove the timestamps, do some editing, and “poof,” I have a fairly good first draft of whatever I am working on at the time.
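The timestamp-removal step is easy to script. Here is a minimal sketch in Python, assuming the transcript is in WebVTT (.vtt) form with timing lines like `00:00.000 --> 00:04.000`. The function name `vtt_to_text` is made up for illustration and is not part of any OpenAI tool.

```python
import re

# Matches WebVTT cue timing lines, with or without an hours field, e.g.
# "00:00.000 --> 00:04.000" or "00:00:00.000 --> 00:00:04.000".
TIMING = re.compile(
    r"^(\d{2}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(\d{2}:)?\d{2}:\d{2}\.\d{3}"
)


def vtt_to_text(vtt: str) -> str:
    """Return only the spoken words from a WebVTT transcript (hypothetical helper)."""
    kept = []
    for line in vtt.splitlines():
        line = line.strip()
        # Skip blank lines, the "WEBVTT" header, and the timing lines.
        if not line or line.startswith("WEBVTT") or TIMING.match(line):
            continue
        kept.append(line)
    # Join the caption fragments into one paragraph, ready for editing.
    return " ".join(kept)


sample = """WEBVTT

00:00.000 --> 00:04.000
Last week, I was at a convention

00:04.000 --> 00:07.500
and talked to an author.
"""
print(vtt_to_text(sample))
```

This only handles the simple layout shown in the sample. A transcript with numbered cues or note blocks would need a few more rules.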

Now I can use this method while driving. I can do my dictation on the road, and when I get home I can upload the file to my computer and have it do the actual transcription for me.

I’ve been using this method for the past few days, and it’s been great! I have been able to dictate on my phone using the Voice Recording app, which hasn’t given me any timeout issues. It keeps recording for as long as you are talking and have battery life.

Another benefit of the new engine from OpenAI is how it handles ambient noise. It has been tested in a number of different circumstances, including low voices and very loud ambient sounds. Conditions like these can make it very difficult for a transcription service to detect what’s being said. But this artificial intelligence engine has been able to detect and transcribe the conversations I have thrown at it, and it has done a good job.

So that is my approach now: I’m going to record a lot of what I want to write using my phone and then send it to the OpenAI engine.

A further refinement to this process would be dropping the audio in a cloud folder and having my home computer monitor that directory, start a transcription when it sees a new file arrive, and then email or text me to let me know it has finished. That way, when I get home, all I have to do is open the file and begin editing.
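A rough sketch of that folder-watching idea in Python might look like the following. It is only a starting point under several assumptions. The function names are invented for illustration. The folder is checked on a simple timer. The `.m4a` suffix is a guess at what a phone recorder might produce. The transcription step assumes the open-source `whisper` command-line tool is installed and on the PATH. The email or text message is left as a `notify` callback you would supply yourself.

```python
import subprocess
import time
from pathlib import Path
from typing import Callable, List, Optional, Set

# .wav and .mp3 come from the post; .m4a is an assumption.
AUDIO_SUFFIXES = {".wav", ".mp3", ".m4a"}


def find_new_audio(folder: Path, seen: Set[Path]) -> List[Path]:
    """Return audio files in `folder` that have not been seen yet, and remember them."""
    new_files = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES and p not in seen
    )
    seen.update(new_files)
    return new_files


def transcribe_with_whisper_cli(audio: Path) -> None:
    """Assumption: the `whisper` command is installed; writes a .vtt next to the audio."""
    subprocess.run(
        ["whisper", str(audio), "--output_format", "vtt",
         "--output_dir", str(audio.parent)],
        check=True,
    )


def watch(
    folder: Path,
    transcribe: Callable[[Path], None],
    notify: Callable[[str], None],
    poll_seconds: float = 30.0,
    max_polls: Optional[int] = None,  # None means keep polling until interrupted
) -> None:
    """Poll `folder`, transcribe each new audio file once, then send a notification."""
    seen: Set[Path] = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for audio in find_new_audio(folder, seen):
            transcribe(audio)
            notify(f"Transcribed {audio.name}")
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_seconds)


# Example call (hypothetical folder path; `print` stands in for an email or text):
# watch(Path("~/CloudFolder/dictation").expanduser(), transcribe_with_whisper_cli, print)
```

One caveat with this simple approach is that a file still syncing from the cloud could be picked up before it has finished arriving, so a real version would probably wait until the file size stops changing.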

I’m looking forward to coming up with maybe a tool or a script that will take care of that for me, and I’m looking forward to sharing how that all works out in the future.

P.S. This post took 9 minutes and 38 seconds to dictate. That’s about 80 words per minute.

Photo by Jason Rosewell on Unsplash