In short, transcription consists of listening to speech in an audio recording and writing down what is being said. It sounds simple, but for various reasons it’s not that easy.

Speech is produced instantaneously, whereas writing presupposes that you’ve already mentally put into words what you wish to express. Thus, speech (including transcribed speech) will inevitably contain more words, be less precise, include repetitions, interruptions, and incomplete sentences, and be structured very differently from written language. Some say – and I completely agree – that writing and speech are two completely separate languages.

Yea, so first you need to take that, uh, thingy over there, yea, what’s it called, like, the long, orange plastic one … with an x at the end, that you turn when it’s in.


Use the orange Phillips screwdriver.

Spoken vs. written instructions

Transcribing is indeed a discipline that often requires substantial resources, e.g. a knack for language, a vast vocabulary, concentration, general knowledge, imagination, patience, good hearing, software, hardware – and time.

However, the quality, usefulness, and precision of a transcription do not depend only on the skills and knowledge of the transcriber. The nature of the source material is also paramount. If the audio recording is bad, it is harder – maybe even impossible – to transcribe. That’s why the price of a transcription partially depends on which resources, and how many of them, the transcriber needs to produce a reliable and useful transcription from suboptimal source material.

