How Bantrly Works

A deep dive into the four core capabilities powering every 1-minute voice lesson

The Pipeline

Step 1

Audio Capture

The browser opens the microphone through getUserMedia with echo cancellation and noise suppression enabled at 44.1 kHz. The MediaRecorder API then records the stream and encodes it as WebM/Opus.
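A minimal sketch of how this capture step might look in the browser. The helper names (buildCaptureConstraints, pickMimeType, startRecording) are hypothetical and not taken from Bantrly's code; the constraint values mirror the description above, and a browser is free to treat the sample rate as a preference rather than a guarantee.

```typescript
// Constraint values follow the text above; all names here are illustrative.
interface AudioCaptureConstraints {
  audio: { echoCancellation: boolean; noiseSuppression: boolean; sampleRate: number };
  video: false;
}

function buildCaptureConstraints(): AudioCaptureConstraints {
  return {
    audio: { echoCancellation: true, noiseSuppression: true, sampleRate: 44100 },
    video: false, // microphone only
  };
}

// Returns the first WebM container/codec string the browser reports as supported.
function pickMimeType(isSupported: (type: string) => boolean): string | undefined {
  const candidates = ["audio/webm;codecs=opus", "audio/webm"];
  return candidates.find(isSupported);
}

// Browser-only: starts recording and returns a stop() that resolves the audio Blob.
async function startRecording(): Promise<{ stop: () => Promise<Blob> }> {
  const g = globalThis as any; // lets the sketch type-check outside a browser too
  const stream = await g.navigator.mediaDevices.getUserMedia(buildCaptureConstraints());
  const mimeType = pickMimeType((t) => g.MediaRecorder.isTypeSupported(t));
  const recorder = new g.MediaRecorder(stream, mimeType ? { mimeType } : undefined);
  const chunks: Blob[] = [];
  recorder.ondataavailable = (e: { data: Blob }) => chunks.push(e.data);
  recorder.start();
  return {
    stop: () =>
      new Promise<Blob>((resolve) => {
        recorder.onstop = () => {
          // Release the microphone once recording has finished.
          stream.getTracks().forEach((t: { stop: () => void }) => t.stop());
          resolve(new Blob(chunks, { type: recorder.mimeType }));
        };
        recorder.stop();
      }),
  };
}
```

Keeping the constraint and MIME-type selection in small pure functions makes them easy to check without a microphone; only startRecording touches browser APIs.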

Step 2

Preprocessing

Audio is validated for duration and quality. Silence detection ensures there is actual speech content before running the pipeline.
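One way such a check could work is sketched below: measure the clip length, then count frames whose RMS energy rises above a silence floor. Every threshold here (minimum duration, RMS floor, frame size, minimum number of speech frames) is an assumption chosen for illustration; only the 60-second cap comes from this page.

```typescript
interface ValidationResult {
  ok: boolean;
  reason?: string;
}

// Illustrative thresholds; only MAX_DURATION_S reflects a documented limit.
const MIN_DURATION_S = 1;
const MAX_DURATION_S = 60;
const SILENCE_RMS = 0.01;
const FRAME_SIZE = 1024;
const MIN_SPEECH_FRAMES = 5;

// Root-mean-square energy of one frame of samples in the range [-1, 1].
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / frame.length);
}

function validateAudio(samples: Float32Array, sampleRate: number): ValidationResult {
  const duration = samples.length / sampleRate;
  if (duration < MIN_DURATION_S) return { ok: false, reason: "too short" };
  if (duration > MAX_DURATION_S) return { ok: false, reason: "too long" };

  // Count full frames that are louder than the silence floor.
  let speechFrames = 0;
  for (let start = 0; start + FRAME_SIZE <= samples.length; start += FRAME_SIZE) {
    if (rms(samples.subarray(start, start + FRAME_SIZE)) > SILENCE_RMS) speechFrames++;
  }
  if (speechFrames < MIN_SPEECH_FRAMES) return { ok: false, reason: "no speech detected" };
  return { ok: true };
}
```

An energy threshold is a simple stand-in for silence detection; it cannot tell speech from other loud sounds, so a production system might use a dedicated voice activity detector instead.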

Step 3

Analysis Pipeline

Three processes run in parallel: transcription (speech-to-text), voice embedding extraction (speaker verification), and retention of the raw audio in memory for reference.
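The fan-out described here can be sketched with Promise.all. The AnalysisDeps interface and its three functions are hypothetical placeholders for whatever services actually perform each task.

```typescript
interface AnalysisResult {
  transcript: string;
  embeddings: number[][]; // assumed shape: one vector per audio segment
  audio: Uint8Array;
}

// Hypothetical dependencies; real implementations would call the actual services.
interface AnalysisDeps {
  transcribe(audio: Uint8Array): Promise<string>;
  extractEmbeddings(audio: Uint8Array): Promise<number[][]>;
  retainAudio(audio: Uint8Array): Promise<Uint8Array>;
}

async function runAnalysis(audio: Uint8Array, deps: AnalysisDeps): Promise<AnalysisResult> {
  // All three tasks start immediately; the await resolves once every one has finished.
  const [transcript, embeddings, retained] = await Promise.all([
    deps.transcribe(audio),
    deps.extractEmbeddings(audio),
    deps.retainAudio(audio),
  ]);
  return { transcript, embeddings, audio: retained };
}
```

Promise.all rejects as soon as any task fails, so in this sketch a transcription error would abort the whole analysis. Promise.allSettled is an alternative if partial results are acceptable.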

Step 4

Scoring

An LLM evaluates the transcript for task adherence and generates scores across 6 dimensions: Pronunciation, Prosody, Speaking Rate, Fluency, Volume, and Mastery.

Capabilities Deep Dive

Voice-Based Instructions

Each coach persona delivers unique spoken prompts using browser text-to-speech synthesis. The system generates contextual instructions tailored to the coach's personality and the current exercise type.

Key Features

6 distinct coach personas with unique speaking styles
Browser-native speech synthesis for instant audio playback
Contextual prompts spanning pronunciation, fluency, prosody, and more
Randomized prompt selection to keep practice sessions fresh

Technical Details

Voice output uses the Web Speech API (SpeechSynthesisUtterance). Each coach has a defined personality template that shapes how prompts are framed and delivered.
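A rough sketch of how a persona template, random prompt selection, and speech synthesis could fit together. The CoachPersona shape, the {prompt} placeholder, and the rate and pitch fields are assumptions; SpeechSynthesisUtterance and speechSynthesis.speak are the standard Web Speech API.

```typescript
// Hypothetical persona shape; the real template format is not documented here.
interface CoachPersona {
  name: string;
  template: string; // contains a {prompt} placeholder
  rate: number; // SpeechSynthesisUtterance.rate, 0.1 to 10
  pitch: number; // SpeechSynthesisUtterance.pitch, 0 to 2
}

// Picks one item uniformly at random; the random source is injectable for testing.
function pickRandom<T>(items: readonly T[], random: () => number = Math.random): T {
  if (items.length === 0) throw new Error("no prompts available");
  return items[Math.floor(random() * items.length)];
}

// Frames the exercise prompt in the coach's voice.
function buildInstruction(coach: CoachPersona, prompt: string): string {
  // A replacer function avoids special handling of "$" sequences in the prompt.
  return coach.template.replace("{prompt}", () => prompt);
}

// Speaks the text if the Web Speech API is available; returns false otherwise.
function speak(coach: CoachPersona, text: string): boolean {
  const g = globalThis as any; // lets the sketch type-check outside a browser too
  if (!g.speechSynthesis || !g.SpeechSynthesisUtterance) return false;
  const utterance = new g.SpeechSynthesisUtterance(text);
  utterance.rate = coach.rate;
  utterance.pitch = coach.pitch;
  g.speechSynthesis.speak(utterance);
  return true;
}
```

A typical call would be speak(coach, buildInstruction(coach, pickRandom(prompts))), where prompts is the list of exercises for the current type.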

High-Fidelity Transcription

Unlike standard speech-to-text, which cleans up your words, our pipeline preserves every filler word, hesitation, and false start. You see exactly what you said, "um"s, "uh"s, and all.

Key Features

Raw transcript output with zero auto-correction
Filler words preserved: "um," "uh," "like," "you know"
False starts and self-corrections kept intact
Designed for honest self-assessment, not polished output

Technical Details

The transcription pipeline is configured with verbatim mode enabled, which disables inverse text normalization and disfluency filtering. The pipeline also supports swapping between different speech-to-text engines.
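Engine swapping could be arranged behind a small common interface, as in the sketch below. The TranscriptionEngine interface, the verbatim option name, and the registry functions are all hypothetical; real speech-to-text services expose this behaviour under their own option names.

```typescript
interface TranscribeOptions {
  verbatim: boolean; // when true, keep fillers and skip text normalization
}

// Hypothetical adapter each speech-to-text engine would implement.
interface TranscriptionEngine {
  name: string;
  transcribe(audio: Uint8Array, options: TranscribeOptions): Promise<string>;
}

const engines: Record<string, TranscriptionEngine> = {};

function registerEngine(engine: TranscriptionEngine): void {
  engines[engine.name] = engine;
}

// Always requests verbatim output, whichever engine is selected.
async function transcribeVerbatim(engineName: string, audio: Uint8Array): Promise<string> {
  const engine = engines[engineName];
  if (!engine) throw new Error(`unknown engine: ${engineName}`);
  return engine.transcribe(audio, { verbatim: true });
}
```

With this arrangement, switching engines is a matter of passing a different name, and the verbatim requirement stays in one place rather than being repeated per engine.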

Task Adherence Checks

After transcription, the system compares your spoken content against the original prompt instructions. Did you repeat the phrase 3 times? Did you include the required words? A detailed score and rationale tell you exactly where you deviated.

Key Features

Compares transcript against prompt requirements
Counts repetitions, checks for required phrases
Generates a 0-100 adherence score
Provides a written rationale explaining the score

Technical Details

An LLM receives both the prompt text and raw transcript, then evaluates adherence across multiple dimensions: completeness, accuracy, repetition count, and instruction following.
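To make the idea concrete, here is a deterministic stand-in for two of those checks, repetition count and required words. It is not the LLM evaluation itself; the PromptRequirements shape and the 50/50 weighting are assumptions made for the example.

```typescript
// Hypothetical structured form of a prompt's requirements.
interface PromptRequirements {
  phrase: string;
  repetitions: number;
  requiredWords: string[];
}

interface AdherenceReport {
  score: number; // 0-100
  rationale: string;
}

// Lowercases and strips punctuation so "Lorry," matches "lorry".
function normalize(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9\s']/g, " ").replace(/\s+/g, " ").trim();
}

// Counts whole-word occurrences of a phrase in already-normalized text.
function countPhrase(text: string, phrase: string): number {
  if (phrase.length === 0) return 0;
  const haystack = ` ${text} `;
  const needle = ` ${phrase} `;
  let count = 0;
  let index = haystack.indexOf(needle);
  while (index !== -1) {
    count++;
    // Step back one character so back-to-back repetitions can share a space.
    index = haystack.indexOf(needle, index + needle.length - 1);
  }
  return count;
}

function checkAdherence(transcript: string, req: PromptRequirements): AdherenceReport {
  const text = normalize(transcript);
  const words = text.split(" ");
  const spoken = countPhrase(text, normalize(req.phrase));
  const missing = req.requiredWords.filter((w) => words.indexOf(normalize(w)) === -1);

  const repetitionPart =
    req.repetitions === 0 ? 1 : Math.min(spoken, req.repetitions) / req.repetitions;
  const wordPart =
    req.requiredWords.length === 0
      ? 1
      : (req.requiredWords.length - missing.length) / req.requiredWords.length;

  const score = Math.round(50 * repetitionPart + 50 * wordPart);
  const rationale =
    `Heard the phrase ${spoken} time(s), expected ${req.repetitions}; ` +
    (missing.length > 0 ? `missing words: ${missing.join(", ")}.` : "all required words present.");
  return { score, rationale };
}
```

A rule-based check like this only handles requirements that can be stated mechanically. Looser instructions ("describe your morning") are where an LLM-based judgment, as described above, becomes necessary.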

Voice Consistency

Our voice consistency check verifies that the entire recording comes from the same speaker. This prevents someone from having another person speak partway through or splicing audio from different sources.

Key Features

Voice embedding extraction from audio segments
Cosine similarity matching between segments
Pass/fail determination with confidence score
Lightweight anti-spoofing to ensure authentic practice

Technical Details

Audio is segmented into chunks, and voice embeddings are extracted for each segment. The system computes pairwise cosine similarity between embeddings. If similarity drops below a threshold, the voice match fails.
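The comparison step can be sketched as follows, assuming each segment has already been turned into an embedding vector by some speaker model. The 0.75 threshold and the use of the lowest pairwise similarity as the reported confidence are illustrative assumptions, not documented values.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("embeddings must share a non-zero length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

interface ConsistencyResult {
  pass: boolean;
  minSimilarity: number; // lowest similarity found between any two segments
}

// Compares every pair of segment embeddings; fails if any pair falls below the threshold.
function checkVoiceConsistency(embeddings: number[][], threshold = 0.75): ConsistencyResult {
  let minSimilarity = 1;
  for (let i = 0; i < embeddings.length; i++) {
    for (let j = i + 1; j < embeddings.length; j++) {
      minSimilarity = Math.min(minSimilarity, cosineSimilarity(embeddings[i], embeddings[j]));
    }
  }
  return { pass: minSimilarity >= threshold, minSimilarity };
}
```

Pairwise comparison grows quadratically with the number of segments, which is fine for a recording capped at 60 seconds. With fewer than two segments there is nothing to compare, and this sketch passes by default.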

Audio Recording Permissions

Browser Microphone Access

When you click the mic button, your browser will ask for permission to access your microphone. This is a standard browser security feature. You must grant this permission for the app to capture your voice.

What We Access

Microphone audio input only (no camera, no screen)
Audio is captured at 44.1 kHz with echo cancellation and noise suppression
Recording is limited to a maximum of 60 seconds

Privacy & Data Handling

Audio is sent to the server only when you click Submit
Audio data is processed in memory and not stored permanently
You can revoke microphone permission at any time through your browser settings

Troubleshooting

If you see "Microphone access denied," click the lock/camera icon in your browser's address bar and allow microphone access
Ensure no other application is currently using your microphone
HTTPS is required for microphone access (localhost is also allowed)
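The troubleshooting cases above map fairly directly onto the error names that getUserMedia rejects with. The sketch below shows one possible mapping; the function name and message wording are hypothetical, and NotFoundError is included as an extra case this page does not mention.

```typescript
// Maps a getUserMedia failure to a user-facing hint.
// errorName is the DOMException name; isSecureContext mirrors window.isSecureContext.
function describeMicError(errorName: string, isSecureContext: boolean): string {
  if (!isSecureContext) {
    return "Microphone access requires HTTPS (localhost is also allowed).";
  }
  switch (errorName) {
    case "NotAllowedError":
      return "Microphone access denied. Use the icon in the address bar to allow it.";
    case "NotReadableError":
      return "The microphone could not be read. Another application may be using it.";
    case "NotFoundError":
      return "No microphone was found on this device.";
    default:
      return "Recording could not be started.";
  }
}
```

In a page, this would typically be called from the catch handler of the getUserMedia promise, passing the caught error's name and window.isSecureContext.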

Start Your First Lesson