RunAnywhere React Native SDK Part 4: Building a Voice Assistant with VAD
A Complete Voice Assistant Running Entirely On-Device
This is Part 4 of our RunAnywhere React Native SDK tutorial series:
- Chat with LLMs — Project setup and streaming text generation
- Speech-to-Text — Real-time transcription with Whisper
- Text-to-Speech — Natural voice synthesis with Piper
- Voice Pipeline (this post) — Full voice assistant with VAD
This is the culmination of the series: a voice assistant that automatically detects when you stop speaking, processes your request with an LLM, and responds with synthesized speech—all running on-device across iOS and Android.
Prerequisites
- Complete Parts 1-3 to have all three model types (LLM, STT, TTS) working in your project
- Physical device required — the pipeline uses microphone input
- All three models downloaded (~390MB total: ~250MB LLM + ~75MB STT + ~65MB TTS)
Android Note: A physical ARM64 device is required. Emulators will NOT work. See Part 1's Android Setup for complete configuration instructions.
The Voice Pipeline Flow
```
┌──────────────────────────────────────────────────────────────┐
│                   Voice Assistant Pipeline                    │
├──────────────────────────────────────────────────────────────┤
│                                                               │
│  ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐    │
│  │ Record  │ -> │   STT   │ -> │   LLM   │ -> │   TTS   │    │
│  │ + VAD   │    │ Whisper │    │  LFM2   │    │  Piper  │    │
│  └─────────┘    └─────────┘    └─────────┘    └─────────┘    │
│       │                                            │         │
│       │          Auto-stop when                    │         │
│       └───────── silence detected ─────────────────┘         │
│                                                               │
└──────────────────────────────────────────────────────────────┘
```
Pipeline State Machine
Create src/hooks/useVoicePipeline.ts:
```typescript
import { useState, useCallback, useRef } from 'react'
import { RunAnywhere } from '@runanywhere/core'
import { AudioService } from '../services/AudioService'
import { TTSAudioPlayer } from '../services/TTSAudioPlayer'

// --- Energy-Based Voice Activity Detector ---
// Monitors audio input levels to detect speech start and end.

const SPEECH_THRESHOLD = 0.02    // Level to detect speech start
const SILENCE_THRESHOLD = 0.01   // Level to detect speech end
const SILENCE_DURATION_MS = 1500 // Milliseconds of silence before auto-stop

class VoiceActivityDetector {
  private isSpeechDetected = false
  private silenceStartTime: number | null = null
  private vadInterval: NodeJS.Timeout | null = null

  onSpeechEnded: (() => void) | null = null

  startMonitoring() {
    this.isSpeechDetected = false
    this.silenceStartTime = null

    this.vadInterval = setInterval(() => {
      const level = AudioService.getInputLevel()

      // Detect speech start
      if (!this.isSpeechDetected && level > SPEECH_THRESHOLD) {
        this.isSpeechDetected = true
        this.silenceStartTime = null
        console.log('[VAD] Speech detected')
      }

      // Detect speech end (only after speech was detected)
      if (this.isSpeechDetected) {
        if (level < SILENCE_THRESHOLD) {
          if (this.silenceStartTime === null) {
            this.silenceStartTime = Date.now()
          } else if (Date.now() - this.silenceStartTime >= SILENCE_DURATION_MS) {
            console.log('[VAD] Auto-stopping after silence')
            this.stopMonitoring()
            this.onSpeechEnded?.()
          }
        } else {
          this.silenceStartTime = null // Speech resumed
        }
      }
    }, 100) // Check every 100ms
  }

  stopMonitoring() {
    if (this.vadInterval) {
      clearInterval(this.vadInterval)
      this.vadInterval = null
    }
  }
}

// --- Pipeline Hook ---

export type PipelineState = 'idle' | 'listening' | 'transcribing' | 'thinking' | 'speaking'

export function useVoicePipeline() {
  const [state, setState] = useState<PipelineState>('idle')
  const [transcribedText, setTranscribedText] = useState('')
  const [responseText, setResponseText] = useState('')
  const [error, setError] = useState<string | null>(null)

  const audioPlayerRef = useRef(new TTSAudioPlayer())
  const vadRef = useRef(new VoiceActivityDetector())

  const isReady = useCallback(async (): Promise<boolean> => {
    const isLLMLoaded = await RunAnywhere.isModelLoaded()
    const isSTTLoaded = await RunAnywhere.isSTTModelLoaded()
    const isTTSLoaded = await RunAnywhere.isTTSVoiceLoaded()
    return isLLMLoaded && isSTTLoaded && isTTSLoaded
  }, [])

  const processRecording = useCallback(async () => {
    // 1. Stop recording
    setState('transcribing')

    try {
      const audioData = await AudioService.stopRecording()

      // 2. Transcribe
      const userText = await RunAnywhere.transcribe(audioData)
      setTranscribedText(userText)

      if (!userText.trim()) {
        setState('idle')
        return
      }

      // 3. Generate LLM response
      setState('thinking')

      const prompt = `You are a helpful voice assistant. Keep responses SHORT (2-3 sentences max).
Be conversational and friendly.

User: ${userText}
Assistant:`

      const streamResult = await RunAnywhere.generateStream(prompt, {
        maxTokens: 100,
        temperature: 0.7,
      })

      let response = ''
      for await (const token of streamResult.stream) {
        response += token
        setResponseText(response)
      }

      // 4. Speak the response
      setState('speaking')

      const ttsResult = await RunAnywhere.synthesize(response, {
        rate: 1.0,
        pitch: 1.0,
        volume: 1.0,
      })

      await audioPlayerRef.current.playTTSAudio(ttsResult.audio, ttsResult.sampleRate)
    } catch (e) {
      console.error('Pipeline error:', e)
      setError(e instanceof Error ? e.message : 'Unknown error')
    }

    setState('idle')
  }, [])

  const start = useCallback(async () => {
    if (state !== 'idle') return

    const ready = await isReady()
    if (!ready) {
      setError('Models not loaded. Please load LLM, STT, and TTS first.')
      return
    }

    setState('listening')
    setTranscribedText('')
    setResponseText('')
    setError(null)

    try {
      await AudioService.initialize()
      AudioService.startRecording()

      // Start energy-based VAD monitoring
      vadRef.current.onSpeechEnded = () => {
        processRecording()
      }
      vadRef.current.startMonitoring()
    } catch (e) {
      setError(e instanceof Error ? e.message : 'Failed to start')
      setState('idle')
    }
  }, [state, isReady, processRecording])

  const stopManually = useCallback(async () => {
    vadRef.current.stopMonitoring()
    await processRecording()
  }, [processRecording])

  const cancel = useCallback(() => {
    vadRef.current.stopMonitoring()
    audioPlayerRef.current.stop()
    setState('idle')
  }, [])

  return {
    state,
    transcribedText,
    responseText,
    error,
    start,
    stopManually,
    cancel,
    isReady,
  }
}
```
AudioService.getInputLevel(): You need to add a getInputLevel() static method to the AudioService from Part 2. This returns the current RMS audio amplitude (0.0 to 1.0) so the VAD can monitor input levels:

```typescript
// Add to AudioService from Part 2
static getInputLevel(): number {
  // Calculate RMS from the current recording buffer
  if (!this.currentBuffer || this.currentBuffer.length === 0) return 0
  const samples = this.currentBuffer
  let sum = 0
  for (let i = 0; i < samples.length; i++) {
    sum += samples[i] * samples[i]
  }
  return Math.sqrt(sum / samples.length)
}
```
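If your AudioService from Part 2 doesn't already keep a reference to the most recent audio frames, you'll need to capture them as they arrive. The sketch below shows one way to do that, assuming the recorder delivers raw Int16 PCM chunks through a callback (the onAudioChunk name and the Int16 format are assumptions, not part of the SDK); it normalizes samples so getInputLevel() returns values in the 0.0-1.0 range the VAD thresholds expect:

```typescript
// Hypothetical sketch: maintain a rolling buffer of normalized samples for getInputLevel().
// Assumes the recorder hands us Int16 PCM chunks via a callback named onAudioChunk.
private static currentBuffer: Float32Array | null = null

private static onAudioChunk(chunk: Int16Array) {
  // Normalize Int16 samples (-32768..32767) to floats in [-1, 1]
  const normalized = new Float32Array(chunk.length)
  for (let i = 0; i < chunk.length; i++) {
    normalized[i] = chunk[i] / 32768
  }
  // Keep only the most recent chunk; that's enough for an energy-based VAD
  this.currentBuffer = normalized
}
```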
Voice Pipeline Screen
Create src/screens/VoiceAssistantScreen.tsx:
```tsx
import React, { useEffect, useState } from 'react';
import {
  View,
  Text,
  TouchableOpacity,
  StyleSheet,
} from 'react-native';
import { useVoicePipeline, PipelineState } from '../hooks/useVoicePipeline';

export function VoiceAssistantScreen() {
  const {
    state,
    transcribedText,
    responseText,
    error,
    start,
    stopManually,
    isReady,
  } = useVoicePipeline();

  const [modelsReady, setModelsReady] = useState(false);

  useEffect(() => {
    isReady().then(setModelsReady);
  }, [isReady]);

  function getStateColor(): string {
    switch (state) {
      case 'idle': return '#666';
      case 'listening': return '#ff4444';
      case 'transcribing':
      case 'thinking': return '#ffaa00';
      case 'speaking': return '#44ff44';
      default: return '#666';
    }
  }

  function getStateText(): string {
    switch (state) {
      case 'idle': return 'Ready';
      case 'listening': return 'Listening...';
      case 'transcribing': return 'Transcribing...';
      case 'thinking': return 'Thinking...';
      case 'speaking': return 'Speaking...';
      default: return 'Ready';
    }
  }

  function getStateHint(): string {
    switch (state) {
      case 'idle': return 'Tap to start';
      case 'listening': return 'Stops automatically when you pause';
      case 'transcribing': return 'Converting speech to text...';
      case 'thinking': return 'Generating response...';
      case 'speaking': return 'Playing audio response...';
      default: return '';
    }
  }

  function handleButtonPress() {
    if (state === 'idle') {
      start();
    } else if (state === 'listening') {
      stopManually();
    }
  }

  return (
    <View style={styles.container}>
      {/* State indicator */}
      <View style={styles.stateIndicator}>
        <View style={[styles.stateDot, { backgroundColor: getStateColor() }]} />
        <Text style={styles.stateText}>{getStateText()}</Text>
      </View>

      {/* Error message */}
      {error && (
        <View style={styles.errorBox}>
          <Text style={styles.errorText}>{error}</Text>
        </View>
      )}

      {/* Transcription */}
      {transcribedText !== '' && (
        <View style={[styles.bubble, styles.userBubble]}>
          <Text style={styles.bubbleLabel}>You said:</Text>
          <Text style={styles.bubbleText}>{transcribedText}</Text>
        </View>
      )}

      {/* Response */}
      {responseText !== '' && (
        <View style={[styles.bubble, styles.assistantBubble]}>
          <Text style={styles.bubbleLabel}>Assistant:</Text>
          <Text style={styles.bubbleText}>{responseText}</Text>
        </View>
      )}

      <View style={styles.spacer} />

      {/* Main button */}
      <TouchableOpacity
        style={[
          styles.mainButton,
          state === 'idle' ? styles.buttonIdle : styles.buttonActive,
        ]}
        onPress={handleButtonPress}
        disabled={!modelsReady || (state !== 'idle' && state !== 'listening')}
      >
        <Text style={styles.buttonIcon}>
          {state === 'idle' ? '🎤' : '⬛'}
        </Text>
      </TouchableOpacity>

      <Text style={styles.hintText}>{getStateHint()}</Text>

      {!modelsReady && (
        <Text style={styles.warningText}>
          Please load LLM, STT, and TTS models first
        </Text>
      )}
    </View>
  );
}

const styles = StyleSheet.create({
  container: {
    flex: 1,
    backgroundColor: '#000',
    padding: 24,
    alignItems: 'center',
  },
  stateIndicator: {
    flexDirection: 'row',
    alignItems: 'center',
    marginBottom: 24,
  },
  stateDot: {
    width: 12,
    height: 12,
    borderRadius: 6,
    marginRight: 8,
  },
  stateText: {
    color: '#fff',
    fontSize: 18,
    fontWeight: '500',
  },
  errorBox: {
    backgroundColor: 'rgba(255, 68, 68, 0.1)',
    borderRadius: 8,
    padding: 12,
    marginBottom: 16,
    width: '100%',
  },
  errorText: {
    color: '#ff4444',
    textAlign: 'center',
  },
  bubble: {
    width: '100%',
    padding: 16,
    borderRadius: 12,
    marginBottom: 16,
  },
  userBubble: {
    backgroundColor: 'rgba(0, 122, 255, 0.1)',
  },
  assistantBubble: {
    backgroundColor: 'rgba(68, 255, 68, 0.1)',
  },
  bubbleLabel: {
    color: '#888',
    fontSize: 12,
    marginBottom: 4,
  },
  bubbleText: {
    color: '#fff',
    fontSize: 16,
  },
  spacer: {
    flex: 1,
  },
  mainButton: {
    width: 100,
    height: 100,
    borderRadius: 50,
    justifyContent: 'center',
    alignItems: 'center',
  },
  buttonIdle: {
    backgroundColor: '#007AFF',
  },
  buttonActive: {
    backgroundColor: '#ff4444',
  },
  buttonIcon: {
    fontSize: 36,
  },
  hintText: {
    color: '#666',
    fontSize: 12,
    marginTop: 16,
  },
  warningText: {
    color: '#ffaa00',
    fontSize: 12,
    marginTop: 8,
  },
});
```
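To surface this screen alongside the ones from Parts 1-3, register it in your existing navigator. The snippet below is a minimal sketch assuming a bottom tab navigator from @react-navigation/bottom-tabs as set up in Part 1; the navigator and tab names are illustrative, not prescribed by the SDK:

```tsx
// Hypothetical wiring into a bottom tab navigator (names are illustrative)
import { createBottomTabNavigator } from '@react-navigation/bottom-tabs';
import { VoiceAssistantScreen } from './src/screens/VoiceAssistantScreen';

const Tab = createBottomTabNavigator();

export function AppTabs() {
  return (
    <Tab.Navigator>
      {/* ...Chat, Transcribe, and Speak tabs from Parts 1-3... */}
      <Tab.Screen name="Assistant" component={VoiceAssistantScreen} />
    </Tab.Navigator>
  );
}
```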

Best Practices
1. Preload Models on App Start
```typescript
// In App.tsx or a dedicated initialization screen
async function preloadModels() {
  await downloadAndLoadLLM('lfm2-350m-q4_k_m')
  await downloadAndLoadSTT('sherpa-onnx-whisper-tiny.en')
  await downloadAndLoadTTS('vits-piper-en_US-lessac-medium')
}
```
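One way to wire this in is to gate the UI behind a loading flag while the models warm up. A minimal sketch, assuming the downloadAndLoad* helpers from Parts 1-3 and that AppTabs stands in for your root navigator:

```tsx
// Sketch: block the UI until preloadModels() finishes
// (assumes preloadModels() and AppTabs are defined as shown elsewhere in this series)
import React, { useEffect, useState } from 'react';
import { ActivityIndicator, Text, View } from 'react-native';

export default function App() {
  const [ready, setReady] = useState(false);
  const [loadError, setLoadError] = useState<string | null>(null);

  useEffect(() => {
    preloadModels()
      .then(() => setReady(true))
      .catch((e) => setLoadError(e instanceof Error ? e.message : 'Failed to load models'));
  }, []);

  if (loadError) return <Text>{loadError}</Text>;
  if (!ready) {
    return (
      <View style={{ flex: 1, justifyContent: 'center' }}>
        <ActivityIndicator size="large" />
      </View>
    );
  }
  return <AppTabs />;
}
```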
2. Audio Format Summary
| Component | Sample Rate | Format | Channels |
|---|---|---|---|
| Recording | 16,000 Hz | Int16 | 1 |
| Whisper STT | 16,000 Hz | Int16 | 1 |
| Piper TTS Output | 22,050 Hz | Float32 (base64) | 1 |
| Audio Playback | Any | WAV/Int16 | 1-2 |
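The sample-rate and format differences are handled by the TTSAudioPlayer from Part 3, but the core conversion is simple. For reference, converting Piper's Float32 samples to Int16 PCM for playback looks roughly like this (a sketch, not the player's exact implementation):

```typescript
// Sketch: convert normalized Float32 samples (-1..1) to Int16 PCM for playback
function float32ToInt16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length)
  for (let i = 0; i < samples.length; i++) {
    // Clamp to [-1, 1] to avoid overflow, then scale to the Int16 range
    const s = Math.max(-1, Math.min(1, samples[i]))
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff
  }
  return out
}
```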
3. Check Model State
```typescript
async function isVoiceAgentReady(): Promise<boolean> {
  const [llm, stt, tts] = await Promise.all([
    RunAnywhere.isModelLoaded(),
    RunAnywhere.isSTTModelLoaded(),
    RunAnywhere.isTTSVoiceLoaded(),
  ])
  return llm && stt && tts
}
```
4. Prevent Concurrent Operations
```typescript
const start = useCallback(async () => {
  if (state !== 'idle') return // Prevent double-starts
  // ...
}, [state])
```
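Because state is captured in the callback's closure, a very fast double-tap can still slip through before React re-renders. If you hit that in practice, a ref-based guard is one common fix (a sketch, not part of the SDK or the hook above):

```typescript
// Sketch: a ref guard catches taps that land before React re-renders with the new state
const busyRef = useRef(false)

const start = useCallback(async () => {
  if (busyRef.current) return // Synchronous check, immune to stale closures
  busyRef.current = true
  // ...existing start logic; reset busyRef.current = false wherever you setState('idle')...
}, [])
```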
5. Tune VAD for Your Environment
The default thresholds work for quiet environments. Adjust for noisy settings:
```typescript
const SPEECH_THRESHOLD = 0.05    // Higher for noisy environments
const SILENCE_THRESHOLD = 0.02   // Higher for noisy environments
const SILENCE_DURATION_MS = 2000 // Longer pause tolerance
```
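If you'd rather not hard-code thresholds, you can also calibrate against ambient noise once the microphone is already recording, using the getInputLevel() method added earlier. A rough sketch (the multipliers are arbitrary starting points, not tuned values):

```typescript
// Sketch: sample ambient noise for ~1 second and derive VAD thresholds from it
async function calibrateThresholds(): Promise<{ speech: number; silence: number }> {
  const readings: number[] = []
  for (let i = 0; i < 10; i++) {
    readings.push(AudioService.getInputLevel())
    await new Promise((resolve) => setTimeout(resolve, 100))
  }
  const ambient = readings.reduce((a, b) => a + b, 0) / readings.length
  return {
    speech: Math.max(0.02, ambient * 3),    // Speech must rise well above ambient
    silence: Math.max(0.01, ambient * 1.5), // Silence is anything near ambient
  }
}
```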
Models Reference
| Type | Model ID | Size | Notes |
|---|---|---|---|
| LLM | lfm2-350m-q4_k_m | ~250MB | LiquidAI, fast, efficient |
| STT | sherpa-onnx-whisper-tiny.en | ~75MB | English |
| TTS | vits-piper-en_US-lessac-medium | ~65MB | US English |
Conclusion
You've built a complete voice assistant that:
- Listens with automatic speech detection
- Transcribes using on-device Whisper
- Thinks with a local LLM
- Responds with natural TTS
All processing happens on-device. No data ever leaves the phone. No API keys. No cloud costs. And it works on both iOS and Android from a single codebase.
This is the future of private, cross-platform voice AI.
Complete Source Code
The full source code is available on GitHub:
Includes:
- Complete React Native app with all features
- TypeScript throughout
- Zustand state management
- Tab navigation
Resources
Questions? Open an issue on GitHub or reach out on Twitter/X.