
How Bantrly Works

A deep dive into the four core capabilities powering every 1-minute voice lesson

The Pipeline

Step 1

Audio Capture

Audio is captured at 44.1 kHz through the browser's getUserMedia API with echo cancellation and noise suppression enabled, then recorded with the MediaRecorder API and encoded as WebM/Opus.
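The capture settings above can be sketched as follows, assuming a standard browser environment. The constraint keys `echoCancellation`, `noiseSuppression`, and `sampleRate` are standard `MediaTrackConstraints`, and `MediaRecorder.isTypeSupported` is the standard way to probe codec support. The helper names `buildCaptureConstraints`, `pickMimeType`, and `PREFERRED_MIME_TYPES` are hypothetical.

```typescript
// Standard MediaTrackConstraints keys; the values mirror the settings described above.
// Browsers may treat sampleRate as a preference and pick a different rate.
function buildCaptureConstraints() {
  return {
    audio: {
      echoCancellation: true,
      noiseSuppression: true,
      sampleRate: 44100,
    },
    video: false, // microphone only, no camera
  };
}

// Hypothetical list of container/codec strings, most preferred first.
const PREFERRED_MIME_TYPES = ["audio/webm;codecs=opus", "audio/webm"];

// Returns the first candidate the recorder supports, or undefined so the
// browser can fall back to its default. In a browser, pass
// (type) => MediaRecorder.isTypeSupported(type) as the predicate.
function pickMimeType(
  candidates: string[],
  isSupported: (type: string) => boolean,
): string | undefined {
  return candidates.find(isSupported);
}

// Browser usage (needs a page with microphone access):
// const stream = await navigator.mediaDevices.getUserMedia(buildCaptureConstraints());
// const mimeType = pickMimeType(PREFERRED_MIME_TYPES, (t) => MediaRecorder.isTypeSupported(t));
// const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
```

Opus does not encode at 44.1 kHz natively (it operates at rates such as 48 kHz), so the encoder resamples and the recorded WebM/Opus file will generally not report a 44.1 kHz rate.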

Step 2

Preprocessing

The recording is validated for duration and quality, and silence detection confirms that it contains actual speech before the analysis pipeline runs.
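The validation step can be sketched as a duration check followed by a simple energy-based silence detector. This assumes decoded mono PCM samples in the range -1 to 1 (for example, from Web Audio's `AudioBuffer.getChannelData`). All thresholds and function names are hypothetical, except the 60-second cap, which matches the recording limit stated under Audio Recording Permissions.

```typescript
// Hypothetical thresholds; the actual values are not published.
const MIN_DURATION_S = 1;
const MAX_DURATION_S = 60;       // matches the stated 60-second recording limit
const FRAME_SIZE = 1024;         // samples per analysis frame
const RMS_THRESHOLD = 0.01;      // frames above this energy count as "voiced"
const MIN_VOICED_FRACTION = 0.1; // require at least 10% voiced frames

// Root-mean-square energy of samples[start, end).
function frameRms(samples: Float32Array, start: number, end: number): number {
  let sum = 0;
  for (let i = start; i < end; i++) sum += samples[i] * samples[i];
  return Math.sqrt(sum / Math.max(1, end - start));
}

// Energy-based silence detection: true if enough frames exceed the RMS threshold.
function hasSpeech(samples: Float32Array): boolean {
  let voiced = 0;
  let frames = 0;
  for (let start = 0; start < samples.length; start += FRAME_SIZE) {
    const end = Math.min(start + FRAME_SIZE, samples.length);
    if (frameRms(samples, start, end) > RMS_THRESHOLD) voiced++;
    frames++;
  }
  return frames > 0 && voiced / frames >= MIN_VOICED_FRACTION;
}

interface ValidationResult {
  ok: boolean;
  reason?: string;
}

function validateRecording(samples: Float32Array, sampleRate: number): ValidationResult {
  const durationS = samples.length / sampleRate;
  if (durationS < MIN_DURATION_S) return { ok: false, reason: "too short" };
  if (durationS > MAX_DURATION_S) return { ok: false, reason: "too long" };
  if (!hasSpeech(samples)) return { ok: false, reason: "no speech detected" };
  return { ok: true };
}
```

A fixed energy threshold is crude; background noise can pass it, so a trained voice-activity detector is the usual upgrade when this check proves too permissive.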

Step 3

Analysis Pipeline

Three processes run in parallel: transcription (speech-to-text), voice embedding extraction (for speaker verification), and storage of the raw audio for reference.
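Because the three processes are independent, they can be started together and awaited as a group. A minimal sketch, assuming each stage is an async function. `PipelineStages`, `runAnalysis`, and the stage names are hypothetical stand-ins for whatever services the pipeline actually calls.

```typescript
// Hypothetical stage signatures; the real services behind each stage are not described.
interface PipelineStages {
  transcribe(audio: Uint8Array): Promise<string>;
  extractEmbeddings(audio: Uint8Array): Promise<number[][]>; // one vector per segment
  storeAudio(audio: Uint8Array): Promise<string>;            // returns a storage key
}

interface AnalysisResult {
  transcript: string;
  embeddings: number[][];
  audioRef: string;
}

async function runAnalysis(audio: Uint8Array, stages: PipelineStages): Promise<AnalysisResult> {
  // Start all three stages at once and wait for every one to finish.
  const [transcript, embeddings, audioRef] = await Promise.all([
    stages.transcribe(audio),
    stages.extractEmbeddings(audio),
    stages.storeAudio(audio),
  ]);
  return { transcript, embeddings, audioRef };
}
```

`Promise.all` rejects as soon as any stage fails. If a transcript should still be scored when, say, storage fails, `Promise.allSettled` is the alternative.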

Step 4

Scoring

An LLM evaluates the transcript for task adherence and generates scores across 6 dimensions: Pronunciation, Prosody, Speaking Rate, Fluency, Volume, and Mastery.
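Scoring implies a structured response from the model that should be checked before it reaches the UI. The sketch below assumes, hypothetically, that the model is asked to return a JSON object mapping each of the 6 dimensions to a number from 0 to 100 (the document states a 0-100 range only for the adherence score). The dimension keys and `parseScores` are illustrative names.

```typescript
// Hypothetical keys for the 6 scoring dimensions.
const DIMENSIONS = ["pronunciation", "prosody", "speakingRate", "fluency", "volume", "mastery"] as const;
type Dimension = (typeof DIMENSIONS)[number];
type Scores = Record<Dimension, number>;

// Parses the model's JSON reply and rejects anything missing or out of range,
// so a malformed response never shows up as a score.
function parseScores(raw: string): Scores {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) throw new Error("expected a JSON object");
  const record = data as Record<string, unknown>;
  const scores = {} as Scores;
  for (const dim of DIMENSIONS) {
    const value = record[dim];
    if (typeof value !== "number" || !Number.isFinite(value) || value < 0 || value > 100) {
      throw new Error(`invalid score for ${dim}`);
    }
    scores[dim] = value;
  }
  return scores;
}
```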

Capabilities Deep Dive

Voice-Based Instructions

Each coach persona delivers unique spoken prompts using browser text-to-speech synthesis. The system generates contextual instructions tailored to the coach's personality and the current exercise type.

Key Features

  • 6 distinct coach personas with unique speaking styles
  • Browser-native speech synthesis for instant audio playback
  • Contextual prompts spanning pronunciation, fluency, prosody, and more
  • Randomized prompt selection to keep practice sessions fresh
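Randomized prompt selection can be as simple as drawing uniformly from a coach's prompt list. This sketch adds one rule not stated above, avoiding an immediate repeat of the previous prompt. `pickPrompt` is a hypothetical name, and the injectable `random` parameter exists only to make the choice testable.

```typescript
// Picks a prompt at random, avoiding an immediate repeat of the previous one.
// `random` defaults to Math.random and can be replaced for deterministic tests.
function pickPrompt(
  prompts: string[],
  previous?: string,
  random: () => number = Math.random,
): string {
  if (prompts.length === 0) throw new Error("no prompts available");
  const fresh = prompts.filter((p) => p !== previous);
  const pool = fresh.length > 0 ? fresh : prompts; // fall back if every prompt matches
  return pool[Math.floor(random() * pool.length)];
}
```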

Technical Details

Uses the Web Speech API (SpeechSynthesisUtterance) for voice output. Each coach has a defined personality template that shapes how prompts are framed and delivered.
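Here is a sketch of how a personality template might feed the Web Speech API. `SpeechSynthesisUtterance` and its `rate` (0.1 to 10) and `pitch` (0 to 2) properties are standard. The `CoachPersona` shape, the `buildUtterance` helper, and the example persona are invented for illustration, because the actual personas are not published.

```typescript
// Hypothetical persona shape.
interface CoachPersona {
  name: string;
  rate: number;  // maps to SpeechSynthesisUtterance.rate (0.1 to 10, default 1)
  pitch: number; // maps to SpeechSynthesisUtterance.pitch (0 to 2, default 1)
  frame: (instruction: string) => string; // personality template
}

interface UtteranceSettings {
  text: string;
  rate: number;
  pitch: number;
}

const clamp = (v: number, lo: number, hi: number): number => Math.min(hi, Math.max(lo, v));

// Applies the persona's template and keeps rate/pitch inside the ranges the API accepts.
function buildUtterance(persona: CoachPersona, instruction: string): UtteranceSettings {
  return {
    text: persona.frame(instruction),
    rate: clamp(persona.rate, 0.1, 10),
    pitch: clamp(persona.pitch, 0, 2),
  };
}

// Example persona, invented for illustration.
const calmCoach: CoachPersona = {
  name: "Calm Coach",
  rate: 0.9,
  pitch: 1,
  frame: (instruction) => `Take a breath. When you're ready: ${instruction}`,
};

// Browser usage:
// const s = buildUtterance(calmCoach, "Say 'red lorry, yellow lorry' three times.");
// const u = new SpeechSynthesisUtterance(s.text);
// u.rate = s.rate;
// u.pitch = s.pitch;
// window.speechSynthesis.speak(u);
```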

High-Fidelity Transcription

Unlike standard speech-to-text, which cleans up your words, our pipeline preserves every filler word, hesitation, and false start. You see exactly what you said: "um"s, "uh"s, and all.

Key Features

  • Raw transcript output with zero auto-correction
  • Filler words preserved: "um," "uh," "like," "you know"
  • False starts and self-corrections kept intact
  • Designed for honest self-assessment, not polished output
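Keeping fillers in the transcript is what makes them measurable. As an illustration, the sketch below counts the fillers listed above in a raw transcript. `countFillers` and the `FILLERS` list are hypothetical, and the matching is deliberately naive (it also counts "like" used as a verb).

```typescript
// Fillers named above; a hypothetical, non-exhaustive list.
const FILLERS = ["um", "uh", "like", "you know"];

// Counts whole-word (or whole-phrase) filler occurrences in a verbatim transcript.
function countFillers(transcript: string): Record<string, number> {
  // Lowercase, turn punctuation into spaces, collapse runs of whitespace,
  // and pad so every word is delimited by spaces on both sides.
  const text = ` ${transcript.toLowerCase().replace(/[^a-z'\s]/g, " ").replace(/\s+/g, " ")} `;
  const counts: Record<string, number> = {};
  for (const filler of FILLERS) {
    const needle = ` ${filler} `;
    let count = 0;
    // Resume one character before the match ends so adjacent fillers can share a space.
    for (let i = text.indexOf(needle); i !== -1; i = text.indexOf(needle, i + needle.length - 1)) {
      count++;
    }
    counts[filler] = count;
  }
  return counts;
}
```

A cleaned-up transcript would report zero for every entry, which is why verbatim mode matters for this kind of feedback.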

Technical Details

The transcription pipeline runs in verbatim mode, which disables inverse text normalization and disfluency filtering. It also supports swapping in different speech-to-text engines.
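Engine swapping is commonly handled with a small adapter interface, so every engine receives the same verbatim settings. In this sketch the option names (`verbatim`, `inverseTextNormalization`, `filterDisfluencies`) and the `TranscriptionService` class are hypothetical. Real speech-to-text services expose these settings under their own parameter names.

```typescript
// Hypothetical option names mirroring the settings described above.
interface TranscriptionOptions {
  verbatim: boolean;
  inverseTextNormalization: boolean; // e.g. "twenty five" -> "25"
  filterDisfluencies: boolean;       // e.g. dropping "um" and "uh"
}

const VERBATIM_OPTIONS: TranscriptionOptions = {
  verbatim: true,
  inverseTextNormalization: false,
  filterDisfluencies: false,
};

// Any engine that implements this interface can be plugged in.
interface SpeechToTextEngine {
  transcribe(audio: Uint8Array, options: TranscriptionOptions): Promise<string>;
}

class TranscriptionService {
  private engines = new Map<string, SpeechToTextEngine>();
  private activeEngine: string | undefined;

  // The first registered engine becomes the default.
  register(name: string, engine: SpeechToTextEngine): void {
    this.engines.set(name, engine);
    if (this.activeEngine === undefined) this.activeEngine = name;
  }

  // Swapping models is a configuration change rather than a code change.
  use(name: string): void {
    if (!this.engines.has(name)) throw new Error(`unknown engine: ${name}`);
    this.activeEngine = name;
  }

  // Always passes the verbatim settings, whichever engine is active.
  transcribe(audio: Uint8Array): Promise<string> {
    const engine = this.activeEngine === undefined ? undefined : this.engines.get(this.activeEngine);
    if (!engine) return Promise.reject(new Error("no speech-to-text engine registered"));
    return engine.transcribe(audio, VERBATIM_OPTIONS);
  }
}
```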

Task Adherence Checks

After transcription, the system compares your spoken content against the original prompt instructions. Did you repeat 3 times? Did you include the required words? A detailed score and rationale tell you exactly where you deviated.

Key Features

  • Compares transcript against prompt requirements
  • Counts repetitions, checks for required phrases
  • Generates a 0-100 adherence score
  • Provides a written rationale explaining the score
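The document states that an LLM performs this evaluation. To make the checks concrete, here is a deterministic, rule-based sketch of the same idea. It counts whole-phrase repetitions, looks for required words, and turns both into a 0-100 score with a rationale. The `PromptRequirements` shape, the function names, and the equal 50/50 weighting are all assumptions.

```typescript
// Hypothetical description of what a prompt asks for.
interface PromptRequirements {
  phrase: string;          // phrase the speaker must repeat
  repetitions: number;     // required number of repetitions
  requiredWords: string[]; // words that must appear somewhere
}

interface AdherenceResult {
  score: number;       // 0-100
  rationale: string[]; // explanation of each deduction
}

// Lowercase, strip punctuation, collapse whitespace.
function normalize(text: string): string {
  return text.toLowerCase().replace(/[^a-z'\s]/g, " ").replace(/\s+/g, " ").trim();
}

// Counts whole-phrase occurrences of `phrase` in already-normalized `text`.
function countPhrase(text: string, phrase: string): number {
  if (phrase.length === 0) return 0;
  const hay = ` ${text} `;
  const needle = ` ${phrase} `;
  let count = 0;
  for (let i = hay.indexOf(needle); i !== -1; i = hay.indexOf(needle, i + needle.length - 1)) count++;
  return count;
}

function checkAdherence(transcript: string, req: PromptRequirements): AdherenceResult {
  const text = normalize(transcript);
  const rationale: string[] = [];

  const said = countPhrase(text, normalize(req.phrase));
  const repScore = req.repetitions > 0 ? Math.min(said, req.repetitions) / req.repetitions : 1;
  if (said < req.repetitions) rationale.push(`Said the phrase ${said} of ${req.repetitions} times.`);

  const words = new Set(text.split(" "));
  const missing = req.requiredWords.filter((w) => !words.has(normalize(w)));
  const wordScore = req.requiredWords.length > 0 ? 1 - missing.length / req.requiredWords.length : 1;
  if (missing.length > 0) rationale.push(`Missing required words: ${missing.join(", ")}.`);

  // Hypothetical equal weighting of the two checks.
  const score = Math.round(50 * repScore + 50 * wordScore);
  if (rationale.length === 0) rationale.push("All requirements met.");
  return { score, rationale };
}
```

A rule-based checker like this is predictable but brittle (it misses paraphrases), which is one reason to delegate the judgment to an LLM as described above.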

Technical Details

An LLM receives both the prompt text and raw transcript, then evaluates adherence across multiple dimensions: completeness, accuracy, repetition count, and instruction following.
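Below is a sketch of how the prompt text and raw transcript might be combined into a single grading request. The criterion keys, the wording, and `buildAdherencePrompt` are hypothetical. Wrapping each input in explicit tags is one common way to help the model treat the transcript as data rather than as instructions.

```typescript
// Criteria named in the description above; the key names are hypothetical.
const ADHERENCE_CRITERIA = [
  "completeness",
  "accuracy",
  "repetitionCount",
  "instructionFollowing",
] as const;

// Builds one grading prompt containing the exercise instructions and the raw transcript.
function buildAdherencePrompt(promptText: string, transcript: string): string {
  return [
    "You are grading a spoken practice exercise.",
    "Compare the verbatim transcript against the exercise instructions.",
    `Score each criterion from 0 to 100: ${ADHERENCE_CRITERIA.join(", ")}.`,
    "Then give an overall 0-100 adherence score and a short rationale",
    "explaining any deviations. Respond only with JSON.",
    "",
    "<instructions>",
    promptText,
    "</instructions>",
    "",
    "<transcript>",
    transcript,
    "</transcript>",
  ].join("\n");
}
```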

Voice Consistency

Our voice consistency check verifies that the entire recording comes from the same speaker. This prevents someone from having another person speak partway through or splicing audio from different sources.

Key Features

  • Voice embedding extraction from audio segments
  • Cosine similarity matching between segments
  • Pass/fail determination with confidence score
  • Lightweight anti-spoofing check to help ensure authentic practice

Technical Details

Audio is segmented into chunks, and voice embeddings are extracted for each segment. The system computes pairwise cosine similarity between embeddings. If similarity drops below a threshold, the voice match fails.
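The steps above can be sketched as follows, assuming decoded mono samples and a separate speaker-embedding model (not shown) that turns each chunk into a fixed-length vector. `segment`, `checkVoiceConsistency`, the 0.75 default threshold, and the use of the lowest pairwise similarity as the confidence score are assumptions, not the documented method.

```typescript
// Split PCM samples into fixed-length chunks, dropping any trailing partial chunk.
function segment(samples: Float32Array, sampleRate: number, chunkSeconds: number): Float32Array[] {
  const size = Math.floor(sampleRate * chunkSeconds);
  const chunks: Float32Array[] = [];
  if (size <= 0) return chunks;
  for (let start = 0; start + size <= samples.length; start += size) {
    chunks.push(samples.subarray(start, start + size));
  }
  return chunks;
}

// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

interface VoiceMatch {
  pass: boolean;
  confidence: number; // lowest pairwise similarity (hypothetical choice)
}

// Compares every pair of segment embeddings; one dissimilar pair fails the check.
function checkVoiceConsistency(embeddings: number[][], threshold = 0.75): VoiceMatch {
  let minSimilarity = 1;
  for (let i = 0; i < embeddings.length; i++) {
    for (let j = i + 1; j < embeddings.length; j++) {
      minSimilarity = Math.min(minSimilarity, cosineSimilarity(embeddings[i], embeddings[j]));
    }
  }
  return { pass: minSimilarity >= threshold, confidence: minSimilarity };
}
```

With fewer than two segments there are no pairs to compare, so this sketch passes by default. A production check would likely require a minimum number of segments.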

Audio Recording Permissions

Browser Microphone Access

When you click the mic button, your browser will ask for permission to access your microphone. This is a standard browser security feature. You must grant this permission for the app to capture your voice.

What We Access

  • Microphone audio input only (no camera, no screen)
  • Audio is captured at 44.1 kHz with echo cancellation and noise suppression
  • Recording is limited to a maximum of 60 seconds
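One way to enforce the 60-second cap is with a timer started alongside the recorder. `RecorderLike` is a hypothetical minimal interface that a real `MediaRecorder` satisfies (it has `state`, `start()`, and `stop()`), and `startWithLimit` is likewise a hypothetical helper name.

```typescript
// Minimal subset of MediaRecorder used here (hypothetical interface name).
interface RecorderLike {
  state: string; // "inactive" | "recording" | "paused" on a real MediaRecorder
  start(): void;
  stop(): void;
}

const MAX_RECORDING_MS = 60_000; // the stated 60-second limit

// Starts recording and stops automatically when the limit is reached.
// Returns a function for stopping early that also clears the timer.
function startWithLimit(recorder: RecorderLike, limitMs: number = MAX_RECORDING_MS): () => void {
  recorder.start();
  const timer = setTimeout(() => {
    if (recorder.state === "recording") recorder.stop();
  }, limitMs);
  return () => {
    clearTimeout(timer);
    if (recorder.state === "recording") recorder.stop();
  };
}
```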

Privacy & Data Handling

  • Audio is sent to the server only when you click Submit
  • Audio data is processed in-memory and not stored permanently
  • You can revoke microphone permission at any time through your browser settings

Troubleshooting

  • If you see "Microphone access denied," click the lock/camera icon in your browser's address bar and allow microphone access
  • Ensure no other application is currently using your microphone
  • Microphone access requires a secure context: HTTPS, or localhost during development
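`getUserMedia` rejects with a `DOMException` whose `name` identifies the failure, and in an insecure context `navigator.mediaDevices` is not available at all. The sketch below maps those cases to the troubleshooting messages above. The function names `preflight` and `describeMicError`, along with the message wording, are hypothetical.

```typescript
// Checks that must pass before getUserMedia can even be called.
function preflight(isSecureContext: boolean, hasMediaDevices: boolean): string | null {
  if (!isSecureContext || !hasMediaDevices) {
    return "Microphone access requires HTTPS (or localhost).";
  }
  return null;
}

// Maps the DOMException name from a rejected getUserMedia call to a user-facing hint.
function describeMicError(error: { name: string }): string {
  switch (error.name) {
    case "NotAllowedError": // user or browser denied permission
      return "Microphone access denied. Click the lock icon in the address bar and allow microphone access.";
    case "NotFoundError": // no matching input device
      return "No microphone was found. Connect a microphone and try again.";
    case "NotReadableError": // device could not be opened, often because it is in use
      return "Your microphone could not be started. Make sure no other application is using it.";
    default:
      return `Could not access the microphone (${error.name}).`;
  }
}

// Browser usage (showError is a hypothetical UI helper):
// const blocked = preflight(window.isSecureContext, Boolean(navigator.mediaDevices));
// if (blocked) showError(blocked);
// else navigator.mediaDevices.getUserMedia({ audio: true }).catch((err) => showError(describeMicError(err)));
```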