What AI Can Do… And What It Can’t: A Pragmatic Guide for Transcription, Captioning, and Business Communications
The short answer…
AI is brilliant at speed, scale, and affordability. It’s unreliable on nuance, context, names, specialist terms, accents, and anything where the cost of a mistake is high. In other words, AI is a fantastic assistant, but a risky decision-maker. The safest setup for in-person or online meetings, HR or legal processes, and public communications is AI-assisted, human-verified.
How the tech actually works (in human terms)
Automatic Speech Recognition (ASR) converts sound waves into text using acoustic models and language models that have learned statistical patterns from vast collections of paired audio and text. Modern ASR such as OpenAI’s Whisper is trained on hundreds of thousands of hours of multilingual audio and is more robust to noise than older systems, especially at low signal-to-noise ratios.
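As a concrete (and hedged) illustration, a first-pass transcript can be produced with the open-source openai-whisper Python package in a few lines; the file name and model size below are placeholder assumptions, not a recommendation:

    # Minimal ASR first-pass sketch using the open-source openai-whisper package.
    # Install: pip install openai-whisper  (ffmpeg must be on the system path)
    import whisper

    model = whisper.load_model("base")        # "base" trades accuracy for speed
    result = model.transcribe("meeting.mp3")  # placeholder file name
    print(result["text"])                     # a raw, unverified draft transcript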
Large Language Models (LLMs) such as GPT-style systems are trained to predict the next token in a sequence. That single objective, next-token prediction, is the foundation of how they “write,” “summarise,” and “answer questions.” Surveys of LLMs describe this training paradigm, its variants, and why scaling data and parameters increases capability, while also explaining where models still fail.
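To make next-token prediction concrete, here is a toy sketch; the vocabulary and scores are invented for illustration and bear no relation to any real model:

    import math

    # Invented model scores (logits) for the next word after "The meeting was".
    vocab = ["productive", "adjourned", "purple"]
    logits = [2.0, 1.5, -3.0]

    # Softmax converts raw scores into a probability distribution over next tokens.
    exps = [math.exp(x) for x in logits]
    probs = [e / sum(exps) for e in exps]

    # A greedy decoder simply emits the most probable token.
    print(vocab[probs.index(max(probs))])  # "productive": plausible, not guaranteed true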
Because LLMs optimise for plausibility, not truth, they sometimes generate text that sounds right but is false, the now-famous phenomenon called hallucination. Recent surveys classify types of hallucinations, explain why they happen, and review mitigation tactics like retrieval-augmented generation and stricter verification.
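One crude form of the “stricter verification” idea can be sketched as a guardrail that refuses to pass on any citation not found in a trusted index; the case names and index below are invented placeholders:

    # Hypothetical guardrail: flag drafted citations missing from a verified index.
    VERIFIED_CASES = {
        "Donoghue v Stevenson [1932]",
        "Carlill v Carbolic Smoke Ball Co [1893]",
    }

    draft_citations = ["Donoghue v Stevenson [1932]", "Smith v Imaginary Corp [2021]"]

    for cite in draft_citations:
        ok = cite in VERIFIED_CASES
        print(f"{cite}: {'verified' if ok else 'UNVERIFIED - route to human review'}")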
What AI is genuinely good at today
Speed and scale. AI can produce rough transcripts or first-draft summaries in minutes, even for long meetings or multi-speaker events. Whisper-class models, for example, were designed to generalise “zero-shot” across many tasks without extra training.
Noise robustness (relative to older models). On benchmark tests with simulated background noise, Whisper degrades more gracefully than some classic LibriSpeech-trained models. That matters for real offices and hybrid calls.
Cost efficiency. For internal notes or low-risk content, AI can dramatically reduce per-minute costs compared with fully manual workflows. (Costs vary by provider and volume.)
Great drafting assistant. LLMs are strong at rephrasing, structuring, and turning transcripts into rough outlines, provided a human checks facts and terminology.
Where AI falls down (and why that matters)
Accuracy varies by accent, dialect, and speaker group. Peer-reviewed work shows state-of-the-art ASR performs better on some native English accents than others, and better on native than non-native speech. That translates into unequal error rates in multinational meetings.
Auto-captions can be far from accessible. A University of Minnesota study cited by accessibility practitioners found only 60–70% accuracy for YouTube auto-captions. WCAG guidance notes that auto-captions do not meet accessibility needs or requirements unless they are fully accurate. For many business and public-sector contexts, that bar is not met without human correction.
“Looks right” ≠ “is right.” LLMs can fabricate citations, quotes, or facts with complete confidence. Courts have had to warn practitioners after multiple incidents where AI-invented case law appeared in filings, including sanctions in the U.S., UK warnings after fictitious citations in High Court matters, and a 2025 Australian apology to the Supreme Court of Victoria over AI-generated false quotes. These are vivid reminders that unverified AI output can create real-world risk.
Benchmarks ≠ your meeting. A 2024 methods paper argues that headline “human parity” claims for ASR often hinge on narrow testing conditions; accessible captioning for people with hearing loss requires substantially higher and more consistent accuracy than many off-the-shelf auto-caption tools deliver in the wild.
Real consequences when AI gets it wrong
Legal filings with fake citations. The Avianca case in U.S. federal court led to sanctions after a filing relied on fabricated cases suggested by a chatbot. UK judges have since issued formal warnings after fictitious authorities appeared in multiple cases, and an Australian KC apologised for AI-generated false quotes in a murder matter. If your minutes, transcripts, or summaries feed into legal or HR processes, misstatements can escalate rapidly.
Accessibility exposure. Publishing videos with low-accuracy auto-captions can violate accessibility expectations or policies and exclude people with hearing loss and multilingual audiences, harming trust and reach. WCAG-aligned guidance is explicit: don’t assume auto-captions are compliant “unless they are confirmed to be fully accurate.”
Reputation and brand risk. Mis-captioned names, titles, or terms (for example, turning “data breach” into “day at the beach”) become screenshots that live forever.
So, what can AI do for you (safely and reliably)?
For internal, low-risk uses
AI is excellent for first-pass transcripts and draft summaries; for searchability across large audio libraries; and for quick multilingual drafts that a human will review.
For external, regulated, or risk-bearing uses
Use AI as the engine, with human verification on top. In practice that means a hybrid workflow: AI draft → trained human edits for accuracy, terminology, and context → QA sign-off → accessible, compliant deliverable. This is the model that brings speed without regret.
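As a sketch only (the names and states are illustrative, not a product specification), the hybrid workflow can be modelled as a pipeline with an explicit human gate and a built-in audit trail:

    from dataclasses import dataclass, field

    @dataclass
    class Transcript:
        text: str
        status: str = "AI_DRAFT"  # AI_DRAFT -> HUMAN_EDITED -> QA_SIGNED_OFF
        history: list = field(default_factory=list)  # audit trail entries

    def human_verify(t: Transcript, editor: str, corrected: str) -> Transcript:
        t.history.append((editor, "edited", t.text))  # preserve the pre-edit version
        t.text, t.status = corrected, "HUMAN_EDITED"
        return t

    def qa_sign_off(t: Transcript, reviewer: str) -> Transcript:
        # The gate: QA cannot sign off a raw AI draft.
        assert t.status == "HUMAN_EDITED", "human edit must precede sign-off"
        t.history.append((reviewer, "sign-off", None))
        t.status = "QA_SIGNED_OFF"
        return t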
Ethical and social value considerations when using AI for transcription, captioning and minutes
Accessibility is an equity issue. By 2050, nearly 2.5 billion people are projected to have some degree of hearing loss and 700 million will require hearing rehabilitation. Unaddressed hearing loss already carries an estimated US$ 1 trillion annual global cost, and captioning is explicitly cited by WHO as part of rehabilitation and participation in work and education. Providing accurate captions and transcripts is therefore a social inclusion measure with measurable economic value, not a nice-to-have.
Auto-captions alone are rarely sufficient for accessibility compliance.
The W3C Web Accessibility Initiative notes that captions must accurately convey spoken content to meet user needs; automatic captions are acceptable only when they are confirmed fully accurate. Real-world studies repeatedly find platform auto-captions around 60–70% accuracy, which is not accessible for people with hearing loss without human correction.
Job quality and distributional effects matter.
The International Labour Organization’s analysis of generative AI finds that its predominant effect is augmentation rather than full automation, with the largest exposure in clerical and advanced-economy roles. That implies task redesign and reskilling, not blanket displacement, but it also means firms must plan for changed job content and inequality risks if augmentation benefits are unevenly distributed.
Macro labour-market projections underscore both upside and risk.
The IMF estimates ~40% of global jobs are exposed to AI, rising to ~60% in advanced economies, with risks of widening inequality without policy and firm-level safeguards. The World Economic Forum’s employer survey expects 23% of roles to change by 2027, with 69 million jobs created and 83 million eliminated, underscoring the need for reskilling and responsible adoption.
SME reality check: adoption is uneven and creates competitive gaps.
ONS data show only 9% of UK firms used AI in 2023, although the share was expected to rise to 22% in 2024; adoption is far higher among large firms and those with stronger management practices. Barriers cited by SMEs include unclear use cases (39%), cost (21%), and skills (16%). These gaps can push smaller suppliers into price-only competition unless they differentiate on verified quality, security, and accessibility outcomes. (Office for National Statistics)
Productivity upside is real (but only with controls).
McKinsey estimates generative AI could add US$ 2.6–4.4 trillion in annual value across functions, but the capture of that value depends on redesigned processes, guardrails, and human oversight. For language workflows, that means “AI first draft; human verify; audit trail preserved.”
What this means ethically for buyers and providers
Accuracy is a duty of care. Where records affect people’s jobs, pay, benefits, discipline or public decisions, organisations have an ethical obligation to deliver transcripts and captions that are actually correct. W3C’s guidance and WHO’s data both imply that uncorrected auto-captions can exclude people and fail accessibility objectives. Build human verification into any AI-enabled workflow touching HR, legal, healthcare, education or public information.
Inclusion by design beats retrofits. Plan for accessibility from the start: microphone placement, speaker briefing, glossary sharing, and human-checked captions. With hundreds of millions living with hearing loss now and far more by 2050, inclusive meetings reduce social isolation and improve participation in work and learning.
Protect workers while adopting AI. The ILO’s “augment, don’t just automate” finding supports a training-first approach: upskill note-takers into QA editors, glossary managers and accessibility specialists. That improves job quality and reduces the risk that AI benefits accrue to a few firms while hollowing out smaller suppliers.
Mind the SME gap. ONS shows a clear adoption divide tied to management capability. For small providers, the ethical and competitive edge is to prove secure handling, ISO-aligned processes, WCAG-conformant outputs, and documented human checks, rather than racing to the bottom on price with raw auto-captions.
Transparency around limits. State, in writing, what your AI can and cannot do, your expected accuracy ranges under typical conditions, and where human sign-off applies. This transparency reduces reputational risk and aligns with the IMF/WEF warnings about uneven impacts and inequality if benefits and risks are not clearly managed. (imf.org)
Where human expertise is still essential
Names, technical terms, and domain nuance. Human editors resolve ambiguity that language models can miss, especially across law, healthcare, finance, and government.
Accents, cross-talk, and messy audio. Humans can use context, agendas, and glossaries to disambiguate overlapping speech and regional terms, where ASR error rates jump. Accent-related disparities are well-documented in ASR research.
Accessibility and compliance. Meeting WCAG expectations or public-sector standards reliably requires validated captions/transcripts, not just “whatever the platform auto-generated.” (boia.org)
Accountability documents. HR minutes, board actions, disciplinary notes, and legal bundles require verified accuracy, traceability, and secure handling, not “best-effort” drafts.
Practical accuracy expectations (and why they differ)
There is no single accuracy number that applies to every meeting. Accuracy depends on microphones, room noise, accents, jargon density, number of speakers, and the model you use.
Platform auto-captions: Can be fast but vary widely; studies and accessibility reviews report real-world accuracy closer to 60–70% in some settings, which is unusable for accessibility or compliance without correction. (accessibility.com)
Modern robust ASR (e.g., Whisper-class): Is markedly better than legacy systems, particularly in noise, but still shows accent-dependent gaps and will mis-spell names and domain terms without a glossary.
Human-verified transcripts: These consistently reach the accuracy levels required for accessibility and formal records because a trained editor resolves the edge cases algorithms miss. WCAG guidance implicitly assumes this verification step.
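For context, figures like “60–70% accuracy” are usually one minus the word error rate (WER): substitutions, insertions, and deletions divided by the number of reference words. A simplified sketch follows; real scoring tools also normalise punctuation and casing:

    def wer(reference: str, hypothesis: str) -> float:
        """Word error rate via edit distance over words (simplified)."""
        ref, hyp = reference.split(), hypothesis.split()
        # dp[i][j] = edits to turn the first i reference words into the first j hypothesis words
        dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            dp[i][0] = i
        for j in range(len(hyp) + 1):
            dp[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                               dp[i][j - 1] + 1,         # insertion
                               dp[i - 1][j - 1] + cost)  # substitution
        return dp[len(ref)][len(hyp)] / len(ref)

    print(wer("we discussed the data breach", "we discussed the day at the beach"))  # 0.8

Note how a single mis-heard phrase (the “day at the beach” example from earlier) produces a very high error rate on a short utterance.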
How to get the best of both worlds (an operating model you can copy)
1) Prepare the model. Share agendas, speaker lists, and a glossary of names/terms with your provider. This single step improves both AI and human outcomes, because it reduces ASR confusion on out-of-vocabulary (OOV) terms. (A concrete sketch of glossary biasing follows after step 6.)
2) Capture clean audio. One microphone per speaker or a high-quality conference mic; reduce cross-talk; record a backup.
3) Use AI for the heavy lift. Generate a first-pass transcript and draft summary quickly.
4) Apply human verification. A trained editor reviews start-to-finish for names, numbers, and decisions; inserts timestamps; aligns to house style.
5) Deliver accessible outputs. WCAG-compliant captions for video; accessible PDFs for minutes; multilingual subtitles that are checked by native linguists. (boia.org)
6) Keep an audit trail. Version history, who edited what, and when. This is essential for HR and legal defensibility.
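On step 1, one concrete mechanism: openai-whisper’s transcribe() accepts an initial_prompt string that biases decoding toward expected spellings, so a shared glossary helps the AI pass as well as the human editor. The names, terms, and file name below are placeholders:

    import whisper

    model = whisper.load_model("small")

    # Seed the decoder with agenda names and jargon so it prefers correct spellings.
    glossary = "Attendees: Siobhan McGrath, Ngozi Okafor. Terms: GDPR, WCAG, SLA, KPI."

    result = model.transcribe("board_meeting.wav", initial_prompt=glossary)
    print(result["text"])  # still a draft: human verification (step 4) follows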
Frequently asked realities
“Can AI alone deliver fully compliant captions?”
Not reliably. WCAG-aligned sources caution that automatic captions don’t meet user needs or requirements unless they’re confirmed fully accurate. In practice, that means human review.
“Are accent gaps still a thing?”
Yes. Multiple studies show higher error rates for some accents and for non-native speakers. If your team or audience is global, plan for human QA.
“Isn’t Whisper ‘human-parity’?”
Under some benchmark conditions it’s very strong, especially in noise, but “parity” headlines don’t capture messy real-world meetings, and accessibility contexts need higher and more consistent accuracy than benchmarks demand.
“What’s the real risk of LLM hallucinations?”
Concrete legal cases across multiple jurisdictions show fabricated citations making it into court filings. Treat LLM outputs as drafts that require verification.
Key Takeaways
Use AI where speed matters and the risk is low.
Use AI + human verification where truth, inclusion, and accountability matter.
For HR, legal, public-sector, and customer-facing work, insist on human-checked captions, transcripts, and summaries and keep the audit trail.
If you would like more information about transcription services, translation services, live captioning services, subtitling services, closed captioning, note taking or minute taking services, get in touch today.