How Bantrly Works
A deep dive into the four core capabilities powering every 1-minute voice lesson
The Pipeline
Step 1
Audio Capture
The browser's MediaRecorder API records the microphone stream, which is requested with echo cancellation and noise suppression at 44.1 kHz. Audio is encoded as WebM/Opus.
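As a rough illustration of this step, the sketch below requests a microphone stream with the constraints named above and records it with MediaRecorder. The function names (buildAudioConstraints, pickMimeType, startRecording) are hypothetical and not taken from the Bantrly codebase. Browsers treat these constraints as preferences, so the delivered sample rate may differ from the requested one.

```typescript
// Constraints for the microphone request. The values mirror the text above;
// the browser may not honor every one of them exactly.
function buildAudioConstraints() {
  return {
    audio: {
      echoCancellation: true,
      noiseSuppression: true,
      sampleRate: 44100,
    },
    video: false,
  };
}

// Return the first MIME type the environment reports as supported.
// `isSupported` is injected so the function can run outside a browser.
function pickMimeType(
  isSupported: (type: string) => boolean,
  candidates: string[] = ["audio/webm;codecs=opus", "audio/webm"],
): string | undefined {
  for (const candidate of candidates) {
    if (isSupported(candidate)) return candidate;
  }
  return undefined;
}

// Browser-only: ask for the microphone and start recording.
// Each non-empty chunk of encoded audio is passed to `onChunk`.
async function startRecording(onChunk: (chunk: Blob) => void): Promise<MediaRecorder> {
  const stream = await navigator.mediaDevices.getUserMedia(buildAudioConstraints());
  const mimeType = pickMimeType((type) => MediaRecorder.isTypeSupported(type));
  const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
  recorder.ondataavailable = (event) => {
    if (event.data.size > 0) onChunk(event.data);
  };
  recorder.start();
  return recorder;
}
```

Calling recorder.stop() ends the recording and flushes the final chunk through the same ondataavailable handler.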
Step 2
Preprocessing
Audio is validated for duration and quality. Silence detection ensures there is actual speech content before running the pipeline.
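One simple way to approximate this check is an energy-based test on decoded PCM samples, sketched below. This is not necessarily how Bantrly detects silence. The 60-second cap comes from the recording limit stated elsewhere on this page; the minimum duration, frame length, RMS threshold, and minimum speech time are assumed values chosen only for illustration.

```typescript
interface PreprocessOptions {
  sampleRate: number;       // samples per second of the decoded audio
  minDurationSec?: number;  // assumed lower bound
  maxDurationSec?: number;  // 60 s recording cap
  frameMs?: number;         // assumed analysis frame length
  rmsThreshold?: number;    // assumed energy floor for "speech"
  minSpeechMs?: number;     // assumed minimum amount of speech
}

type PreprocessResult = { ok: true } | { ok: false; reason: string };

// Root-mean-square level of samples[start, end), with samples in [-1, 1].
function rms(samples: Float32Array, start: number, end: number): number {
  let sumSquares = 0;
  for (let i = start; i < end; i++) sumSquares += samples[i] * samples[i];
  return end > start ? Math.sqrt(sumSquares / (end - start)) : 0;
}

function preprocess(samples: Float32Array, opts: PreprocessOptions): PreprocessResult {
  const {
    sampleRate,
    minDurationSec = 1,
    maxDurationSec = 60,
    frameMs = 20,
    rmsThreshold = 0.01,
    minSpeechMs = 300,
  } = opts;

  const durationSec = samples.length / sampleRate;
  if (durationSec < minDurationSec) return { ok: false, reason: "too short" };
  if (durationSec > maxDurationSec) return { ok: false, reason: "too long" };

  // Count fixed-size frames whose energy exceeds the threshold.
  const frameLen = Math.max(1, Math.floor((sampleRate * frameMs) / 1000));
  let activeFrames = 0;
  for (let start = 0; start + frameLen <= samples.length; start += frameLen) {
    if (rms(samples, start, start + frameLen) >= rmsThreshold) activeFrames++;
  }
  if (activeFrames * frameMs < minSpeechMs) return { ok: false, reason: "no speech detected" };
  return { ok: true };
}
```

An energy threshold cannot tell speech from loud background noise, so a production check would likely need a proper voice activity detector.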
Step 3
Analysis Pipeline
Three processes run in parallel: transcription (speech-to-text), voice embedding extraction (speaker verification), and storage of the raw audio for reference.
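The parallel fan-out described here could be expressed with Promise.all, as in the sketch below. The AnalysisServices interface and its method names are hypothetical placeholders; the actual service interfaces are not documented on this page.

```typescript
interface AnalysisResult {
  transcript: string;
  embeddings: number[][];
  storageKey: string;
}

// Hypothetical service signatures, introduced only for this example.
interface AnalysisServices {
  transcribe(audio: Uint8Array): Promise<string>;
  extractEmbeddings(audio: Uint8Array): Promise<number[][]>;
  storeAudio(audio: Uint8Array): Promise<string>;
}

async function analyze(audio: Uint8Array, services: AnalysisServices): Promise<AnalysisResult> {
  // All three calls are started before any is awaited, so they overlap.
  const [transcript, embeddings, storageKey] = await Promise.all([
    services.transcribe(audio),
    services.extractEmbeddings(audio),
    services.storeAudio(audio),
  ]);
  return { transcript, embeddings, storageKey };
}
```

Promise.all rejects as soon as any one task fails. If a storage failure should not block scoring, Promise.allSettled would be the more forgiving choice.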
Step 4
Scoring
An LLM evaluates the transcript for task adherence and generates scores across six dimensions: Pronunciation, Prosody, Speaking Rate, Fluency, Volume, and Mastery.
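If the model is asked to reply in JSON, the six scores can be validated before they reach the UI. The sketch below assumes a JSON reply with one numeric field per dimension and a 0-100 scale; both the field names and the scale are assumptions, since this page only specifies a 0-100 range for the adherence score.

```typescript
const DIMENSIONS = [
  "pronunciation",
  "prosody",
  "speakingRate",
  "fluency",
  "volume",
  "mastery",
] as const;

type Dimension = (typeof DIMENSIONS)[number];
type Scores = Record<Dimension, number>;

// Parse the model's reply and make sure every dimension has a usable score.
function parseScores(raw: string): Scores {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("Expected a JSON object");
  }
  const record = data as Record<string, unknown>;
  const scores = {} as Scores;
  for (const dimension of DIMENSIONS) {
    const value = record[dimension];
    if (typeof value !== "number" || !isFinite(value)) {
      throw new Error("Missing or non-numeric score: " + dimension);
    }
    // Round to an integer and clamp to the assumed 0-100 range.
    scores[dimension] = Math.min(100, Math.max(0, Math.round(value)));
  }
  return scores;
}
```

Rejecting incomplete replies outright, rather than filling in defaults, keeps a malformed model response from being shown as a real score.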
Capabilities Deep Dive
Voice-Based Instructions
Each coach persona delivers unique spoken prompts using browser text-to-speech synthesis. The system generates contextual instructions tailored to the coach's personality and the current exercise type.
Key Features
- 6 distinct coach personas with unique speaking styles
- Browser-native speech synthesis for instant audio playback
- Contextual prompts spanning pronunciation, fluency, prosody, and more
- Randomized prompt selection to keep practice sessions fresh
Technical Details
Uses the Web Speech API (SpeechSynthesisUtterance) for voice output. Each coach has a defined personality template that shapes how prompts are framed and delivered.
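A minimal sketch of how a persona could drive the Web Speech API follows. The CoachPersona shape, its rate and pitch fields, and the function names are hypothetical; only SpeechSynthesisUtterance and window.speechSynthesis are real browser APIs.

```typescript
interface CoachPersona {
  name: string;
  rate: number;      // maps to SpeechSynthesisUtterance.rate (0.1 to 10)
  pitch: number;     // maps to SpeechSynthesisUtterance.pitch (0 to 2)
  prompts: string[]; // candidate prompts for the current exercise
}

// Pick one prompt at random. `random` is injectable for deterministic tests.
function pickPrompt(persona: CoachPersona, random: () => number = Math.random): string {
  if (persona.prompts.length === 0) {
    throw new Error("Persona has no prompts: " + persona.name);
  }
  const index = Math.floor(random() * persona.prompts.length);
  return persona.prompts[index];
}

// Browser-only: speak a randomly chosen prompt in the persona's style.
function speakPrompt(persona: CoachPersona): string {
  const text = pickPrompt(persona);
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.rate = persona.rate;
  utterance.pitch = persona.pitch;
  window.speechSynthesis.cancel(); // stop any prompt that is still playing
  window.speechSynthesis.speak(utterance);
  return text;
}
```

The available voices vary by browser and operating system, so the same persona may sound different from one device to another.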
High-Fidelity Transcription
Unlike standard speech-to-text that cleans up your words, our pipeline preserves every filler word, hesitation, and false start. You see exactly what you said, "um"s, "uh"s, and all.
Key Features
- Raw transcript output with zero auto-correction
- Filler words preserved: "um," "uh," "like," "you know"
- False starts and self-corrections kept intact
- Designed for honest self-assessment, not polished output
Technical Details
The transcription pipeline is configured with verbatim mode enabled, which disables inverse text normalization and disfluency filtering. The pipeline also supports swapping between different speech-to-text engines.
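Because the transcript is verbatim, fillers can be counted directly from the text. The helper below is an illustrative sketch rather than part of the documented pipeline; the filler list comes from the examples on this page, and countFillers is a hypothetical name.

```typescript
const FILLERS = ["um", "uh", "like", "you know"];

// Count whole-word occurrences of each filler in a verbatim transcript.
function countFillers(transcript: string, fillers: string[] = FILLERS): Record<string, number> {
  const normalized = transcript.toLowerCase();
  const counts: Record<string, number> = {};
  for (const filler of fillers) {
    // Escape regex metacharacters so each filler is matched literally.
    const escaped = filler.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    const pattern = new RegExp("\\b" + escaped + "\\b", "g");
    const matches = normalized.match(pattern);
    counts[filler] = matches ? matches.length : 0;
  }
  return counts;
}
```

A plain word match cannot distinguish "like" used as a filler from "like" used as a verb, so these counts are an upper bound rather than an exact measure.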
Task Adherence Checks
After transcription, the system compares your spoken content against the original prompt instructions. Did you repeat the phrase three times? Did you include the required words? A detailed score and rationale tell you exactly where you deviated.
Key Features
- Compares transcript against prompt requirements
- Counts repetitions, checks for required phrases
- Generates a 0-100 adherence score
- Provides a written rationale explaining the score
Technical Details
An LLM receives both the prompt text and raw transcript, then evaluates adherence across multiple dimensions: completeness, accuracy, repetition count, and instruction following.
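The evaluation itself is done by an LLM, but the countable parts (repetitions and required words) can also be checked deterministically. The sketch below shows that idea only; it is not the LLM-based scorer. The AdherenceRequirements shape and the English-only normalization are assumptions.

```typescript
interface AdherenceRequirements {
  phrase: string;          // phrase the learner was asked to repeat
  repetitions: number;     // how many times it should be said
  requiredWords: string[]; // words that must appear at least once
}

interface AdherenceCheck {
  repetitionsFound: number;
  missingWords: string[];
  meetsRequirements: boolean;
}

// Lowercase, replace punctuation with spaces, and collapse whitespace.
function normalize(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9' ]+/g, " ").replace(/\s+/g, " ").trim();
}

// Count whole-word, non-overlapping occurrences of `phrase` in `text`.
function countPhrase(text: string, phrase: string): number {
  const needle = normalize(phrase);
  if (needle.length === 0) return 0;
  const haystack = " " + normalize(text) + " ";
  const target = " " + needle + " ";
  let count = 0;
  let from = 0;
  while (true) {
    const at = haystack.indexOf(target, from);
    if (at === -1) return count;
    count++;
    // Reuse the trailing space as the next match's leading space.
    from = at + target.length - 1;
  }
}

function checkAdherence(transcript: string, req: AdherenceRequirements): AdherenceCheck {
  const repetitionsFound = countPhrase(transcript, req.phrase);
  const missingWords = req.requiredWords.filter((word) => countPhrase(transcript, word) === 0);
  return {
    repetitionsFound,
    missingWords,
    meetsRequirements: repetitionsFound >= req.repetitions && missingWords.length === 0,
  };
}
```

A check like this can give the LLM hard counts to cite in its rationale, which guards against miscounted repetitions.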
Voice Consistency
Our voice consistency check verifies that the entire recording comes from the same speaker. It is meant to catch cases where another person speaks partway through or where audio from different sources has been spliced together.
Key Features
- Voice embedding extraction from audio segments
- Cosine similarity matching between segments
- Pass/fail determination with confidence score
- Lightweight anti-spoofing check to ensure authentic practice
Technical Details
Audio is segmented into chunks, and voice embeddings are extracted for each segment. The system computes pairwise cosine similarity between embeddings. If similarity drops below a threshold, the voice match fails.
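The pairwise comparison can be sketched as follows. Embedding extraction is out of scope here, so the function takes embeddings as plain number arrays. The 0.75 threshold and the rule of failing on the lowest pairwise similarity are assumptions, since this page does not state the actual threshold or aggregation.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("Embeddings must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat a zero vector as dissimilar
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface VoiceMatchResult {
  passed: boolean;
  minSimilarity: number; // lowest similarity across all segment pairs
}

// Compare every pair of segment embeddings and fail if any pair
// falls below the threshold (assumed aggregation rule).
function checkVoiceConsistency(embeddings: number[][], threshold = 0.75): VoiceMatchResult {
  let minSimilarity = 1;
  for (let i = 0; i < embeddings.length; i++) {
    for (let j = i + 1; j < embeddings.length; j++) {
      minSimilarity = Math.min(minSimilarity, cosineSimilarity(embeddings[i], embeddings[j]));
    }
  }
  return { passed: minSimilarity >= threshold, minSimilarity };
}
```

Pairwise comparison grows quadratically with the number of segments, which stays cheap for a recording capped at 60 seconds.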
Audio Recording Permissions
Browser Microphone Access
When you click the mic button, your browser will ask for permission to access your microphone. This is a standard browser security feature. You must grant this permission for the app to capture your voice.
What We Access
- Microphone audio input only (no camera, no screen)
- Audio is captured at 44.1 kHz with echo cancellation and noise suppression
- Recording is limited to a maximum of 60 seconds
Privacy & Data Handling
- Audio is sent to the server only when you click Submit
- Audio data is processed in-memory and not stored permanently
- You can revoke microphone permission at any time through your browser settings
Troubleshooting
- If you see "Microphone access denied," click the lock/camera icon in your browser's address bar and allow microphone access
- Ensure no other application is currently using your microphone
- HTTPS is required for microphone access (localhost is also allowed)
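The troubleshooting cases above correspond to standard error names thrown by getUserMedia. The sketch below maps those names to user-facing messages; the function names and message wording are hypothetical, while the error names (NotAllowedError, NotFoundError, NotReadableError) are the standard ones.

```typescript
// Translate a getUserMedia error name into a message for the user.
function describeMicError(errorName: string): string {
  switch (errorName) {
    case "NotAllowedError":
      return "Microphone access denied. Allow microphone access from the icon in your browser's address bar.";
    case "NotFoundError":
      return "No microphone was found. Connect a microphone and try again.";
    case "NotReadableError":
      return "The microphone could not be read. Make sure no other application is using it.";
    default:
      return "Could not access the microphone (" + errorName + ").";
  }
}

// Browser-only: request the microphone and surface a readable error.
async function requestMicrophone(): Promise<MediaStream> {
  // navigator.mediaDevices is unavailable outside secure contexts
  // (HTTPS or localhost).
  if (!window.isSecureContext || !navigator.mediaDevices) {
    throw new Error("Microphone access requires HTTPS or localhost.");
  }
  try {
    return await navigator.mediaDevices.getUserMedia({ audio: true });
  } catch (err) {
    const name = err instanceof DOMException ? err.name : "UnknownError";
    throw new Error(describeMicError(name));
  }
}
```

Browsers generally do not show the permission prompt again after a denial, which is why the message points the user to the address bar.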