RunAnywhere React Native SDK Part 4: Building a Voice Assistant with VAD
A Complete Voice Assistant Running Entirely On-Device
This is Part 4 of our RunAnywhere React Native SDK tutorial series:
- Chat with LLMs — Project setup and streaming text generation
- Speech-to-Text — Real-time transcription with Whisper
- Text-to-Speech — Natural voice synthesis with Piper
- Voice Pipeline (this post) — Full voice assistant with VAD
This is the culmination of the series: a voice assistant that automatically detects when you stop speaking, processes your request with an LLM, and responds with synthesized speech—all running on-device across iOS and Android.
Prerequisites
- Complete Parts 1-3 to have all three model types (LLM, STT, TTS) working in your project
- Physical device required — the pipeline uses microphone input
- All three models downloaded (~390MB total: ~250MB LLM + ~75MB STT + ~65MB TTS)
Android Note: A physical ARM64 device is required. Emulators will NOT work. See Part 1's Android Setup for complete configuration instructions.
The Voice Pipeline Flow
```
┌──────────────────────────────────────────────────────────────┐
│                   Voice Assistant Pipeline                    │
├──────────────────────────────────────────────────────────────┤
│                                                               │
│  ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐    │
│  │ Record  │ -> │   STT   │ -> │   LLM   │ -> │   TTS   │    │
│  │ + VAD   │    │ Whisper │    │  LFM2   │    │  Piper  │    │
│  └─────────┘    └─────────┘    └─────────┘    └─────────┘    │
│       │                                            │         │
│       │          Auto-stop when                    │         │
│       └───────── silence detected ─────────────────┘         │
│                                                               │
└──────────────────────────────────────────────────────────────┘
```
Pipeline State Machine
Create src/hooks/useVoicePipeline.ts:
```typescript
import { useState, useCallback, useRef } from 'react'
import { RunAnywhere } from '@runanywhere/core'
import { AudioService } from '../services/AudioService'
import { TTSAudioPlayer } from '../services/TTSAudioPlayer'

// --- Energy-Based Voice Activity Detector ---
// Monitors audio input levels to detect speech start and end.

const SPEECH_THRESHOLD = 0.02    // Level to detect speech start
const SILENCE_THRESHOLD = 0.01   // Level to detect speech end
const SILENCE_DURATION_MS = 1500 // Milliseconds of silence before auto-stop

class VoiceActivityDetector {
  private isSpeechDetected = false
  private silenceStartTime: number | null = null
  private vadInterval: NodeJS.Timeout | null = null

  onSpeechEnded: (() => void) | null = null

  startMonitoring() {
    this.isSpeechDetected = false
    this.silenceStartTime = null

    this.vadInterval = setInterval(() => {
      const level = AudioService.getInputLevel()

      // Detect speech start
      if (!this.isSpeechDetected && level > SPEECH_THRESHOLD) {
        this.isSpeechDetected = true
        this.silenceStartTime = null
        console.log('[VAD] Speech detected')
      }

      // Detect speech end (only after speech was detected)
      if (this.isSpeechDetected) {
        if (level < SILENCE_THRESHOLD) {
          if (this.silenceStartTime === null) {
            this.silenceStartTime = Date.now()
          } else if (Date.now() - this.silenceStartTime >= SILENCE_DURATION_MS) {
            console.log('[VAD] Auto-stopping after silence')
            this.stopMonitoring()
            this.onSpeechEnded?.()
          }
        } else {
          this.silenceStartTime = null // Speech resumed
        }
      }
    }, 100) // Check every 100ms
  }

  stopMonitoring() {
    if (this.vadInterval) {
      clearInterval(this.vadInterval)
      this.vadInterval = null
    }
  }
}

// --- Pipeline Hook ---

export type PipelineState = 'idle' | 'listening' | 'transcribing' | 'thinking' | 'speaking'

export function useVoicePipeline() {
  const [state, setState] = useState<PipelineState>('idle')
  const [transcribedText, setTranscribedText] = useState('')
  const [responseText, setResponseText] = useState('')
  const [error, setError] = useState<string | null>(null)

  const audioPlayerRef = useRef(new TTSAudioPlayer())
  const vadRef = useRef(new VoiceActivityDetector())

  const isReady = useCallback(async (): Promise<boolean> => {
    const isLLMLoaded = await RunAnywhere.isModelLoaded()
    const isSTTLoaded = await RunAnywhere.isSTTModelLoaded()
    const isTTSLoaded = await RunAnywhere.isTTSVoiceLoaded()
    return isLLMLoaded && isSTTLoaded && isTTSLoaded
  }, [])

  const processRecording = useCallback(async () => {
    // 1. Stop recording
    setState('transcribing')

    try {
      const audioData = await AudioService.stopRecording()

      // 2. Transcribe
      const userText = await RunAnywhere.transcribe(audioData)
      setTranscribedText(userText)

      if (!userText.trim()) {
        setState('idle')
        return
      }

      // 3. Generate LLM response
      setState('thinking')

      const prompt = `You are a helpful voice assistant. Keep responses SHORT (2-3 sentences max).
Be conversational and friendly.

User: ${userText}
Assistant:`

      const streamResult = await RunAnywhere.generateStream(prompt, {
        maxTokens: 100,
        temperature: 0.7,
      })

      let response = ''
      for await (const token of streamResult.stream) {
        response += token
        setResponseText(response)
      }

      // 4. Speak the response
      setState('speaking')

      const ttsResult = await RunAnywhere.synthesize(response, {
        rate: 1.0,
        pitch: 1.0,
        volume: 1.0,
      })

      await audioPlayerRef.current.playTTSAudio(ttsResult.audio, ttsResult.sampleRate)
    } catch (e) {
      console.error('Pipeline error:', e)
      setError(e instanceof Error ? e.message : 'Unknown error')
    }

    setState('idle')
  }, [])

  const start = useCallback(async () => {
    if (state !== 'idle') return

    const ready = await isReady()
    if (!ready) {
      setError('Models not loaded. Please load LLM, STT, and TTS first.')
      return
    }

    setState('listening')
    setTranscribedText('')
    setResponseText('')
    setError(null)

    try {
      await AudioService.initialize()
      AudioService.startRecording()

      // Start energy-based VAD monitoring
      vadRef.current.onSpeechEnded = () => {
        processRecording()
      }
      vadRef.current.startMonitoring()
    } catch (e) {
      setError(e instanceof Error ? e.message : 'Failed to start')
      setState('idle')
    }
  }, [state, isReady, processRecording])

  const stopManually = useCallback(async () => {
    vadRef.current.stopMonitoring()
    await processRecording()
  }, [processRecording])

  const cancel = useCallback(() => {
    vadRef.current.stopMonitoring()
    audioPlayerRef.current.stop()
    setState('idle')
  }, [])

  return {
    state,
    transcribedText,
    responseText,
    error,
    start,
    stopManually,
    cancel,
    isReady,
  }
}
```
AudioService.getInputLevel(): You need to add a getInputLevel() static method to the AudioService from Part 2. This returns the current RMS audio amplitude (0.0 to 1.0) so the VAD can monitor input levels:

```typescript
// Add to AudioService from Part 2
static getInputLevel(): number {
  // Calculate RMS from the current recording buffer
  if (!this.currentBuffer || this.currentBuffer.length === 0) return 0
  const samples = this.currentBuffer
  let sum = 0
  for (let i = 0; i < samples.length; i++) {
    sum += samples[i] * samples[i]
  }
  return Math.sqrt(sum / samples.length)
}
```
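If your AudioService from Part 2 doesn't already keep a reference to the most recent audio frames, you'll need to capture them as they arrive. The sketch below shows one way to do that, assuming the recorder delivers raw Int16 PCM chunks through a callback (the onAudioChunk name and the Int16 format are assumptions, not part of the SDK); it normalizes samples so getInputLevel() returns values in the 0.0-1.0 range the VAD thresholds expect:

```typescript
// Hypothetical sketch: maintain a rolling buffer of normalized samples for getInputLevel().
// Assumes the recorder hands us Int16 PCM chunks via a callback named onAudioChunk.
private static currentBuffer: Float32Array | null = null

private static onAudioChunk(chunk: Int16Array) {
  // Normalize Int16 samples (-32768..32767) to floats in [-1, 1]
  const normalized = new Float32Array(chunk.length)
  for (let i = 0; i < chunk.length; i++) {
    normalized[i] = chunk[i] / 32768
  }
  // Keep only the most recent chunk; that's enough for an energy-based VAD
  this.currentBuffer = normalized
}
```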
Voice Pipeline Screen
Create src/screens/VoiceAssistantScreen.tsx:
```tsx
import React, { useEffect, useState } from 'react';
import {
  View,
  Text,
  TouchableOpacity,
  StyleSheet,
} from 'react-native';
import { useVoicePipeline, PipelineState } from '../hooks/useVoicePipeline';

export function VoiceAssistantScreen() {
  const {
    state,
    transcribedText,
    responseText,
    error,
    start,
    stopManually,
    isReady,
  } = useVoicePipeline();

  const [modelsReady, setModelsReady] = useState(false);

  useEffect(() => {
    isReady().then(setModelsReady);
  }, [isReady]);

  function getStateColor(): string {
    switch (state) {
      case 'idle': return '#666';
      case 'listening': return '#ff4444';
      case 'transcribing':
      case 'thinking': return '#ffaa00';
      case 'speaking': return '#44ff44';
      default: return '#666';
    }
  }

  function getStateText(): string {
    switch (state) {
      case 'idle': return 'Ready';
      case 'listening': return 'Listening...';
      case 'transcribing': return 'Transcribing...';
      case 'thinking': return 'Thinking...';
      case 'speaking': return 'Speaking...';
      default: return 'Ready';
    }
  }

  function getStateHint(): string {
    switch (state) {
      case 'idle': return 'Tap to start';
      case 'listening': return 'Stops automatically when you pause';
      case 'transcribing': return 'Converting speech to text...';
      case 'thinking': return 'Generating response...';
      case 'speaking': return 'Playing audio response...';
      default: return '';
    }
  }

  function handleButtonPress() {
    if (state === 'idle') {
      start();
    } else if (state === 'listening') {
      stopManually();
    }
  }

  return (
    <View style={styles.container}>
      {/* State indicator */}
      <View style={styles.stateIndicator}>
        <View style={[styles.stateDot, { backgroundColor: getStateColor() }]} />
        <Text style={styles.stateText}>{getStateText()}</Text>
      </View>

      {/* Error message */}
      {error && (
        <View style={styles.errorBox}>
          <Text style={styles.errorText}>{error}</Text>
        </View>
      )}

      {/* Transcription */}
      {transcribedText !== '' && (
        <View style={[styles.bubble, styles.userBubble]}>
          <Text style={styles.bubbleLabel}>You said:</Text>
          <Text style={styles.bubbleText}>{transcribedText}</Text>
        </View>
      )}

      {/* Response */}
      {responseText !== '' && (
        <View style={[styles.bubble, styles.assistantBubble]}>
          <Text style={styles.bubbleLabel}>Assistant:</Text>
          <Text style={styles.bubbleText}>{responseText}</Text>
        </View>
      )}

      <View style={styles.spacer} />

      {/* Main button */}
      <TouchableOpacity
        style={[
          styles.mainButton,
          state === 'idle' ? styles.buttonIdle : styles.buttonActive,
        ]}
        onPress={handleButtonPress}
        disabled={!modelsReady || (state !== 'idle' && state !== 'listening')}
      >
        <Text style={styles.buttonIcon}>
          {state === 'idle' ? '🎤' : '⬛'}
        </Text>
      </TouchableOpacity>

      <Text style={styles.hintText}>{getStateHint()}</Text>

      {!modelsReady && (
        <Text style={styles.warningText}>
          Please load LLM, STT, and TTS models first
        </Text>
      )}
    </View>
  );
}

const styles = StyleSheet.create({
  container: {
    flex: 1,
    backgroundColor: '#000',
    padding: 24,
    alignItems: 'center',
  },
  stateIndicator: {
    flexDirection: 'row',
    alignItems: 'center',
    marginBottom: 24,
  },
  stateDot: {
    width: 12,
    height: 12,
    borderRadius: 6,
    marginRight: 8,
  },
  stateText: {
    color: '#fff',
    fontSize: 18,
    fontWeight: '500',
  },
  errorBox: {
    backgroundColor: 'rgba(255, 68, 68, 0.1)',
    borderRadius: 8,
    padding: 12,
    marginBottom: 16,
    width: '100%',
  },
  errorText: {
    color: '#ff4444',
    textAlign: 'center',
  },
  bubble: {
    width: '100%',
    padding: 16,
    borderRadius: 12,
    marginBottom: 16,
  },
  userBubble: {
    backgroundColor: 'rgba(0, 122, 255, 0.1)',
  },
  assistantBubble: {
    backgroundColor: 'rgba(68, 255, 68, 0.1)',
  },
  bubbleLabel: {
    color: '#888',
    fontSize: 12,
    marginBottom: 4,
  },
  bubbleText: {
    color: '#fff',
    fontSize: 16,
  },
  spacer: {
    flex: 1,
  },
  mainButton: {
    width: 100,
    height: 100,
    borderRadius: 50,
    justifyContent: 'center',
    alignItems: 'center',
  },
  buttonIdle: {
    backgroundColor: '#007AFF',
  },
  buttonActive: {
    backgroundColor: '#ff4444',
  },
  buttonIcon: {
    fontSize: 36,
  },
  hintText: {
    color: '#666',
    fontSize: 12,
    marginTop: 16,
  },
  warningText: {
    color: '#ffaa00',
    fontSize: 12,
    marginTop: 8,
  },
});
```
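To surface this screen alongside the ones from Parts 1-3, register it in your existing navigator. The snippet below is a minimal sketch assuming a bottom tab navigator from @react-navigation/bottom-tabs as set up in Part 1; the navigator and tab names are illustrative, not prescribed by the SDK:

```tsx
// Hypothetical wiring into a bottom tab navigator (names are illustrative)
import { createBottomTabNavigator } from '@react-navigation/bottom-tabs';
import { VoiceAssistantScreen } from './src/screens/VoiceAssistantScreen';

const Tab = createBottomTabNavigator();

export function AppTabs() {
  return (
    <Tab.Navigator>
      {/* ...Chat, Transcribe, and Speak tabs from Parts 1-3... */}
      <Tab.Screen name="Assistant" component={VoiceAssistantScreen} />
    </Tab.Navigator>
  );
}
```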

Best Practices
1. Preload Models on App Start
```typescript
// In App.tsx or a dedicated initialization screen
async function preloadModels() {
  await downloadAndLoadLLM('lfm2-350m-q4_k_m')
  await downloadAndLoadSTT('sherpa-onnx-whisper-tiny.en')
  await downloadAndLoadTTS('vits-piper-en_US-lessac-medium')
}
```
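One way to wire this in is to gate the UI behind a loading flag while the models warm up. A minimal sketch, assuming the downloadAndLoad* helpers from Parts 1-3 and that AppTabs stands in for your root navigator:

```tsx
// Sketch: block the UI until preloadModels() finishes
// (assumes preloadModels() and AppTabs are defined as shown elsewhere in this series)
import React, { useEffect, useState } from 'react';
import { ActivityIndicator, Text, View } from 'react-native';

export default function App() {
  const [ready, setReady] = useState(false);
  const [loadError, setLoadError] = useState<string | null>(null);

  useEffect(() => {
    preloadModels()
      .then(() => setReady(true))
      .catch((e) => setLoadError(e instanceof Error ? e.message : 'Failed to load models'));
  }, []);

  if (loadError) return <Text>{loadError}</Text>;
  if (!ready) {
    return (
      <View style={{ flex: 1, justifyContent: 'center' }}>
        <ActivityIndicator size="large" />
      </View>
    );
  }
  return <AppTabs />;
}
```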
2. Audio Format Summary
| Component | Sample Rate | Format | Channels |
|---|---|---|---|
| Recording | 16,000 Hz | Int16 | 1 |
| Whisper STT | 16,000 Hz | Int16 | 1 |
| Piper TTS Output | 22,050 Hz | Float32 (base64) | 1 |
| Audio Playback | Any | WAV/Int16 | 1-2 |
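The sample-rate and format differences are handled by the TTSAudioPlayer from Part 3, but the core conversion is simple. For reference, converting Piper's Float32 samples to Int16 PCM for playback looks roughly like this (a sketch, not the player's exact implementation):

```typescript
// Sketch: convert normalized Float32 samples (-1..1) to Int16 PCM for playback
function float32ToInt16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length)
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1] to avoid overflow, then scale to the Int16 range
    const s = Math.max(-1, Math.min(1, samples[i]))
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff
  }
  return out
}
```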
3. Check Model State
```typescript
async function isVoiceAgentReady(): Promise<boolean> {
  const [llm, stt, tts] = await Promise.all([
    RunAnywhere.isModelLoaded(),
    RunAnywhere.isSTTModelLoaded(),
    RunAnywhere.isTTSVoiceLoaded(),
  ])
  return llm && stt && tts
}
```
4. Prevent Concurrent Operations
```typescript
const start = useCallback(async () => {
  if (state !== 'idle') return // Prevent double-starts
  // ...
}, [state])
```
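Because state is captured in the callback's closure, a very fast double-tap can still slip through before React re-renders. If you hit that in practice, a ref-based guard is one common fix (a sketch, not part of the SDK or the hook above):

```typescript
// Sketch: a ref guard catches taps that land before React re-renders with the new state
const busyRef = useRef(false)

const start = useCallback(async () => {
  if (busyRef.current) return // Synchronous check, immune to stale closures
  busyRef.current = true
  // ...existing start logic; reset busyRef.current = false wherever you setState('idle')...
}, [])
```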
5. Tune VAD for Your Environment
The default thresholds work for quiet environments. Adjust for noisy settings:
```typescript
const SPEECH_THRESHOLD = 0.05    // Higher for noisy environments
const SILENCE_THRESHOLD = 0.02   // Higher for noisy environments
const SILENCE_DURATION_MS = 2000 // Longer pause tolerance
```
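If you'd rather not hard-code thresholds, you can also calibrate against ambient noise once the microphone is already recording, using the getInputLevel() method added earlier. A rough sketch (the multipliers are arbitrary starting points, not tuned values):

```typescript
// Sketch: sample ambient noise for ~1 second and derive VAD thresholds from it
async function calibrateThresholds(): Promise<{ speech: number; silence: number }> {
  const readings: number[] = []
  for (let i = 0; i < 10; i++) {
    readings.push(AudioService.getInputLevel())
    await new Promise((resolve) => setTimeout(resolve, 100))
  }
  const ambient = readings.reduce((a, b) => a + b, 0) / readings.length
  return {
    speech: Math.max(0.02, ambient * 3),    // Speech must rise well above ambient
    silence: Math.max(0.01, ambient * 1.5), // Silence is anything near ambient
  }
}
```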
Models Reference
| Type | Model ID | Size | Notes |
|---|---|---|---|
| LLM | lfm2-350m-q4_k_m | ~250MB | LiquidAI, fast, efficient |
| STT | sherpa-onnx-whisper-tiny.en | ~75MB | English |
| TTS | vits-piper-en_US-lessac-medium | ~65MB | US English |
Conclusion
You've built a complete voice assistant that:
- Listens with automatic speech detection
- Transcribes using on-device Whisper
- Thinks with a local LLM
- Responds with natural TTS
All processing happens on-device. No data ever leaves the phone. No API keys. No cloud costs. And it works on both iOS and Android from a single codebase.
This is the future of private, cross-platform voice AI.
Complete Source Code
The full source code is available on GitHub:
Includes:
- Complete React Native app with all features
- TypeScript throughout
- Zustand state management
- Tab navigation
Resources
Questions? Open an issue on GitHub or reach out on Twitter/X.