Why Use Transcribers and Not Speech Recognition Software? | Lex Academic Blog

Transcription is a deeply specialised skill, demanding meticulousness and a nuanced approach to how people communicate. Its challenges lie not just in the scrupulous conversion of spoken words into prose, but also in navigating the intricacies of various accents, dialects, and terminologies, and honouring contextual nuances. In academic research, particularly when qualitative data is being gathered, transcription proves indispensable, but it can be extremely time-consuming, and there are many barriers to ensuring the accuracy and fidelity of audiovisual findings.

In recent years, researchers have turned to speech recognition software to reduce the time spent away from critical analysis and research tasks. The appeal of these automated, AI-based services is understandable: at first glance, they offer a cheap and fast solution to this issue, which anyone with an internet connection can access. The technology has undoubtedly become more sophisticated, evolving to interpret spoken language and enabling technologies we use every day, such as virtual assistants. But the software has its limitations, as many have found when presented with incomplete and erroneous transcripts. In controlled environments with clear diction, speech recognition can demonstrate remarkable accuracy. However, this ability is easily foxed by regional accents, background noise, multiple speakers, or complex patterns or parts of speech.

While AI models are often trained on large datasets, they may not be equally proficient in recognising all accents or regional variations. Multilingual transcription is particularly challenging for this software: switching between languages is a complex process for even the most skilled linguist, and it is a persistent hurdle for transcription and translation software operating from algorithms that are, by design, prescriptive and inflexible. This becomes especially clear when the software attempts to follow speakers who shift rapidly between two or more languages or dialects. Additionally, speech recognition software struggles with speech quirks such as the filler sounds used in transitions or in moments of thought, like ‘ums’ and ‘ahs’. These verbal disfluencies, the pauses and hesitations that characterise natural conversation, can be misinterpreted or ignored, but they are a valuable part of establishing emotion and context and lending weight to a script, and so are beneficial to retain.

Ultimately, software-based transcription can be costlier and slower than professional human transcription. What the software spits out needs checking, correcting, and, in the most frustrating scenarios, complete rewriting. Working on a recent job checking AI-generated transcripts for an advisory body with a tight turnaround, featuring multiple interviewees of varying accents and native languages, I had to disregard the scripts forwarded by the client. Heartfelt accounts of social care filtered through the AI became surreal fragments of vocabulary and phonetically captured neologisms. Pivotal emotional moments I was hearing through my headphones during interviews were peppered with surprising and even X-rated word choices on the page: a jarring afternoon’s work even for the most seasoned transcriber. In the end, I began again from a blank page: a far quicker, smoother, and less expletive-ridden process.

Another benefit of seeking out human transcribers is how customisable their transcriptions can be. Transcription software doesn’t currently have the capability to provide précis or in-depth annotations, and is limited in its ability to identify and correct mistakes. Conversely, a human transcriber can summarise material, fill in context to be as inclusive as possible, and clarify any muddled material to convey the intended meaning. Transcribers are adept at taking cues from clients to tailor their transcriptions to accommodate special requirements, and can deliver scripts that, for example, meet accessibility guidelines, or provide extra content to maximise the readability of the scripts.

Furthermore, a critical aspect of transcription is how communicative nuances come across: even the most sophisticated algorithms struggle to recognise human emotions in voices and visual cues. Human quirks of communication, such as sarcasm, irony, meaningful pauses, or deliberate omissions, pose formidable hurdles for transcription software. It is therefore crucial to acknowledge the areas where speech recognition software may falter, and where the interpretive power of a human transcriber can transform and enrich a source: especially when dealing with sensitive or controversial material. An unfortunate consequence of using algorithmic technologies in processing data, even in large-scale datasets as mentioned above, is that it runs the risk of replicating human biases[1], and, in the case of transcription, this could manifest in software failing to notice material using discriminatory or derogatory language, or even adding it in mistakenly.

Turning to concerns around sensitive material, privacy, and confidentiality, the use of automated speech recognition introduces a host of issues, such as the storage and processing of sensitive data. Before committing to the use of transcription software, it is best practice to research the data protection and security measures taken by the companies responsible for it, to identify any potential breaches of the integrity and safety of the resulting data. Conversely, using a human transcriber enables these conversations around data safety to occur at an early stage with minimal stress, setting out the expectations to be met around confidentiality and the follow-up this entails to ensure that data remains protected or is responsibly destroyed.

Finally, part of an effective and complete transcription service is the scrupulous proofreading or copy-editing of the finalised text and, where required, ensuring high-quality formatting and presentation, so that the scripts are of publishable quality.

The value of accurate records to research cannot be overstated. They serve as meticulous archives, capturing the essence of spoken words and converting them into a tangible written form. This not only guards against the potential loss or distortion of vital information, but enables researchers to search within, revisit, and analyse data. In disciplines where every nuance matters, such as psychology, sociology, or linguistics, the fidelity of transcripts ensures that the richness of qualitative data is preserved. Researchers armed with accurate transcripts, produced with attention and sensitivity by professional transcription services, are best equipped to establish patterns, draw meaningful conclusions, and contribute substantively to collective knowledge. For this reason, the work of human transcribers remains, and will continue to be, deeply valuable in the age of AI.


This blogpost was written by a Lex Academic transcriber.

[1] For more information on AI and algorithmic replication of bias, see the works of Safiya Umoja Noble and Tracey Spicer, among others.