
Experts: OpenAI Whisper AI Transcription Errors Could Be Dangerous

Experts warn about Whisper’s hallucination problem as OpenAI’s transcription tool fabricates phrases, affecting trust and accuracy in high-stakes medical contexts.


OpenAI’s transcription tool Whisper appears to produce severe hallucinations, according to new reports. In AI parlance, a hallucination is a model’s tendency to insert fabricated phrases or sentences into its output, text that was never present in the original audio. In high-stakes environments, particularly in medicine, these hallucinations pose real risks, as Whisper is already being deployed to handle doctor-patient documentation and other sensitive applications.

As the AP reports, French and U.S.-based tech company Nabla has integrated Whisper into a transcription tool deployed across 30,000 healthcare centers. Nabla, whose tool helps medical professionals document patient consultations efficiently, says clinicians are required to review the transcriptions; however, the tool erases the original audio files after transcription to protect privacy. That practice raises concerns about the model’s reliability. William Saunders, a former OpenAI engineer, cautioned that “removing the original recording means there’s no way to catch mistakes,” highlighting the challenge of quality control without source verification.

Cornell University’s June 2024 Study Exposes Whisper’s Hallucination Issue

Earlier this year, researchers at Cornell University, led by Assistant Professor Allison Koenecke, flagged a significant flaw in Whisper. Their study found that Whisper hallucinated content, including violent or otherwise unrelated language, in over 1% of 13,000 recordings. In one instance, Whisper transcribed an audio clip and appended entirely fabricated violent phrases, introducing potential risks in fields where precision is essential.

These findings, presented at the ACM Conference on Fairness, Accountability, and Transparency (FAccT), involved audio from AphasiaBank, a collection of speech samples from individuals with aphasia curated by Carnegie Mellon University. AphasiaBank provides a wide array of recordings from diverse speakers, allowing researchers to analyze Whisper’s performance on distinctive speech patterns, such as long pauses or background noise. The study suggests that Whisper struggles particularly with audio containing these nuances, producing random hallucinations that the researchers argue could be mitigated with further development and testing.

Beyond healthcare, Whisper’s transcription accuracy is crucial for Deaf and hard-of-hearing communities, who rely on AI-generated captions for communication. Christian Vogler, head of Gallaudet University’s Technology Access Program, noted that hallucinated phrases can lead to misinterpretation: users who depend solely on captions have no way to tell when Whisper has generated content that was never spoken, which can distort important information.

Privacy around Whisper’s use in hospitals is also under scrutiny. In California, Assemblywoman Rebecca Bauer-Kahan was asked by a healthcare provider to consent to sharing audio of her child’s appointment with companies including OpenAI’s cloud partner, Microsoft Azure. She declined, citing concerns about personal medical information being shared with tech vendors, a stance that reflects broader unease about tech companies’ role in handling sensitive patient data. John Muir Health spokesperson Ben Drew assured compliance with privacy laws but acknowledged the public’s concerns.

Technical Challenges with Whisper’s Data Handling

Whisper, launched in 2022 and trained on 680,000 hours of audio data, was developed to process recordings from a wide variety of sources with high accuracy. Yet, as the Cornell study notes, the model frequently hallucinates names, addresses, or random web text, particularly during pauses in speech or when background noise is present. Engineers attribute these hallucinations to Whisper’s sensitivity to such audio patterns, an issue Cornell’s researchers and others in the AI field have flagged as a target for improvement.
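For developers, the open-source whisper Python package does expose per-segment confidence metadata that can help surface suspect passages for human review. The following is a minimal sketch, not Nabla’s or OpenAI’s production pipeline: the audio file name is hypothetical, and the flagging thresholds simply mirror the package’s own default decoding cutoffs rather than clinically validated values.

```python
# Minimal sketch: flag Whisper segments that may warrant human review,
# using the open-source "whisper" package (github.com/openai/whisper).
import whisper

model = whisper.load_model("base")
result = model.transcribe("consultation.wav")  # hypothetical audio file

for segment in result["segments"]:
    suspect = (
        segment["no_speech_prob"] > 0.6        # likely silence or background noise
        or segment["avg_logprob"] < -1.0       # low decoder confidence
        or segment["compression_ratio"] > 2.4  # repetitive, loop-like output
    )
    flag = "REVIEW" if suspect else "ok"
    print(f"[{flag}] {segment['start']:.1f}s-{segment['end']:.1f}s:{segment['text']}")
```

Segments flagged this way could be routed to a human reviewer before a transcript enters a patient record, the kind of safeguard Saunders argues becomes impossible once the source audio has been deleted.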

Koenecke’s team at Cornell recommended that OpenAI conduct rigorous pre-release testing and make adjustments to handle diverse speech characteristics more effectively. With AI’s increasing role in professional and public applications, experts call for improved reliability and accuracy in tools like Whisper, especially in fields that depend heavily on precise transcriptions.

Last Updated on November 7, 2024 2:19 pm CET

Source: AP
Markus Kasanmascheff
Markus has been covering the tech industry for more than 15 years. He holds a Master's degree in International Economics and is the founder and managing editor of Winbuzzer.com.
