Teams Meeting Summarization

🕘 The Problem#

The organization was already running speech-to-text on meeting recordings, but the existing solution was slow and inaccurate enough that people mostly ignored it — attendees still took their own notes, and reviewing a recorded meeting meant listening back to it. Hours of meetings were generating content nobody had time to revisit, and the tool that was supposed to solve that had become part of the problem.

🚀 The Decision#

The obvious move was to swap in a better commercial STT API and call it done. Instead I self-hosted OpenAI Whisper on Amazon SageMaker, because the real requirement wasn't just "transcribe audio" — it was accurate transcription and controllable cost at meeting-volume scale, and a pay-per-call API doesn't give you the second one.

The lever that made self-hosting viable was auto scaling groups tuned to actual usage patterns: scale up during business hours when meetings cluster, scale down nights, weekends, and holidays when there's nothing to transcribe. That's what turned "self-hosted" from an expensive alternative into a 60% cheaper one — idle GPU time during off-hours was the cost a naive always-on deployment would have eaten.

On top of the transcript, AWS Bedrock Sonnet handles the summarization step: context-aware summaries, key entity recognition, and automatic action-item extraction, so a meeting produces a usable artifact instead of just a wall of text.

🏗️ How It Was Built#

Meeting summarization architecture on AWS — the big picture, then two stages: transcribe and summarize

The pipeline is two independent stages:

Transcribe. Meeting audio goes to Whisper on SageMaker, behind the demand-tuned auto scaling group described above. Output is a 95%-accurate transcript even on real-world, noisy meeting audio.

Summarize. The transcript goes to Bedrock Sonnet, which produces a context-aware summary, pulls out key entities (people, dates, topics), and identifies action items — the parts of a meeting people actually need to find again later.

📈 Impact & Results#

95% transcription accuracy, even on noisy real-world audio
10x faster than the previous STT solution, with 60% lower cost from demand-based auto-scaling
Review time cut from hours to minutes, with automated action-item extraction replacing manual note-taking
Multi-user, multi-language support deployed across the organization, integrated directly with Microsoft Teams

Teams Meeting Summarization

🕘 The Problem#

🚀 The Decision#

🏗️ How It Was Built#

📈 Impact & Results#

Key Achievements