February 1, 2026


RunAnywhere React Native SDK Part 2: Speech-to-Text with Whisper


Real-Time Transcription with On-Device Whisper


This is Part 2 of our RunAnywhere React Native SDK tutorial series:

  1. Chat with LLMs — Project setup and streaming text generation
  2. Speech-to-Text (this post) — Real-time transcription with Whisper
  3. Text-to-Speech — Natural voice synthesis with Piper
  4. Voice Pipeline — Full voice assistant with VAD

Speech recognition unlocks natural interaction with your app. With RunAnywhere, you can run Whisper entirely on-device—no network requests, no privacy concerns, no API costs—across both iOS and Android.

The key challenge in React Native is handling native audio recording while ensuring the output format matches what Whisper expects.

Prerequisites

  • Complete Part 1 first to set up your project with the RunAnywhere SDK
  • Physical device required — simulators have limited microphone support
  • ~75MB additional storage for the Whisper model

Android Note: A physical ARM64 device is required. Emulators will NOT work. See Part 1's Android Setup for complete configuration instructions.

Register the STT Model

Add Whisper to your model registration in App.tsx:

typescript
import { RunAnywhere, ModelCategory } from '@runanywhere/core'
import { ModelArtifactType } from '@runanywhere/onnx'

// Register STT model (Whisper)
RunAnywhere.registerModel({
  id: 'sherpa-onnx-whisper-tiny.en',
  name: 'Whisper Tiny English',
  url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/sherpa-onnx-whisper-tiny.en.tar.gz',
  framework: 'onnx',
  modality: ModelCategory.SpeechRecognition,
  artifactType: ModelArtifactType.TarGzArchive,
  memoryRequirement: 75_000_000,
})

Critical: Audio Format Requirements

Whisper requires a very specific audio format:

  • Sample Rate: 16,000 Hz
  • Channels: 1 (mono)
  • Format: 16-bit signed integer (Int16) PCM
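At this format, audio occupies 32,000 bytes of sample data per second (16,000 samples/s × 2 bytes per sample). A small helper (a sketch, not part of the SDK) makes that arithmetic concrete, e.g. for estimating a clip's length from its raw PCM size:

```typescript
// 16 kHz mono 16-bit PCM: 16,000 samples/s × 2 bytes = 32,000 bytes/s
const BYTES_PER_SECOND = 16000 * 2

// Estimate clip duration from the raw PCM byte length (WAV header excluded)
function pcmDurationSeconds(byteLength: number): number {
  return byteLength / BYTES_PER_SECOND
}
```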

React Native doesn't have built-in audio recording, so you'll need to use a native module or library.

Setting Up Audio Recording

Install the audio recording library and file system access:

bash
npm install react-native-audio-record react-native-fs
cd ios && pod install && cd ..

Create src/services/AudioService.ts:

typescript
import AudioRecord from 'react-native-audio-record'
import RNFS from 'react-native-fs'
import { PermissionsAndroid, Platform } from 'react-native'

class AudioServiceClass {
  private isInitialized = false

  async requestPermission(): Promise<boolean> {
    if (Platform.OS === 'android') {
      const result = await PermissionsAndroid.request(PermissionsAndroid.PERMISSIONS.RECORD_AUDIO, {
        title: 'Microphone Permission',
        message: 'This app needs microphone access for voice AI features.',
        buttonPositive: 'Grant',
        buttonNegative: 'Deny',
      })
      return result === PermissionsAndroid.RESULTS.GRANTED
    }
    // iOS handles permissions via Info.plist
    return true
  }

  async initialize(): Promise<void> {
    if (this.isInitialized) return

    const hasPermission = await this.requestPermission()
    if (!hasPermission) {
      throw new Error('Microphone permission denied')
    }

    // Configure for Whisper: 16kHz, mono, PCM
    AudioRecord.init({
      sampleRate: 16000,
      channels: 1,
      bitsPerSample: 16,
      wavFile: 'recording.wav',
    })

    this.isInitialized = true
  }

  startRecording(): void {
    AudioRecord.start()
    console.log('Recording started')
  }

  async stopRecording(): Promise<Uint8Array> {
    const audioPath = await AudioRecord.stop()
    console.log('Recording stopped:', audioPath)

    // Read the WAV file as bytes for transcription
    const base64 = await RNFS.readFile(audioPath, 'base64')
    const binary = atob(base64)
    const bytes = new Uint8Array(binary.length)
    for (let i = 0; i < binary.length; i++) {
      bytes[i] = binary.charCodeAt(i)
    }

    return bytes
  }
}

export const AudioService = new AudioServiceClass()
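One caveat: a global `atob` is only available in newer React Native/Hermes versions. If it's missing in your environment, you can either install a polyfill (e.g. the `base-64` package) or decode the base64 string yourself; a minimal sketch:

```typescript
const B64_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'

// Decode a base64 string straight to bytes, skipping the intermediate
// binary string that atob() would produce
function base64ToBytes(b64: string): Uint8Array {
  const out: number[] = []
  let buffer = 0
  let bits = 0
  for (const ch of b64) {
    const val = B64_ALPHABET.indexOf(ch)
    if (val < 0) continue // skip '=' padding and whitespace
    buffer = ((buffer << 6) | val) & 0xffff
    bits += 6
    if (bits >= 8) {
      bits -= 8
      out.push((buffer >> bits) & 0xff)
    }
  }
  return Uint8Array.from(out)
}
```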

Important: The 16kHz sample rate and mono channel configuration are non-negotiable. Sending audio in a different format will produce garbage output.
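If you ever need to confirm a recording actually came out in this format, the WAV header can be sanity-checked in a few lines. This sketch assumes the canonical 44-byte RIFF layout with the `fmt ` chunk starting at byte 12, which is what typical recording libraries emit:

```typescript
// Verify a WAV buffer matches Whisper's expected format:
// PCM, mono, 16 kHz, 16-bit. Assumes the canonical RIFF layout
// with the fmt chunk starting at byte 12.
function isWhisperReadyWav(wav: Uint8Array): boolean {
  if (wav.length < 44) return false
  const view = new DataView(wav.buffer, wav.byteOffset, wav.byteLength)
  const tag = (o: number) =>
    String.fromCharCode(wav[o], wav[o + 1], wav[o + 2], wav[o + 3])
  if (tag(0) !== 'RIFF' || tag(8) !== 'WAVE') return false
  return (
    view.getUint16(20, true) === 1 &&     // audio format: PCM
    view.getUint16(22, true) === 1 &&     // channels: mono
    view.getUint32(24, true) === 16000 && // sample rate: 16 kHz
    view.getUint16(34, true) === 16       // bits per sample
  )
}
```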

Loading and Using STT

Create src/hooks/useSTT.ts:

typescript
import { useState, useCallback } from 'react'
import { RunAnywhere } from '@runanywhere/core'

export function useSTT() {
  const [isLoaded, setIsLoaded] = useState(false)
  const [isLoading, setIsLoading] = useState(false)
  const [downloadProgress, setDownloadProgress] = useState(0)

  const loadModel = useCallback(async () => {
    setIsLoading(true)
    const modelId = 'sherpa-onnx-whisper-tiny.en'

    try {
      // Check if already downloaded
      const isDownloaded = await RunAnywhere.isModelDownloaded(modelId)

      if (!isDownloaded) {
        await RunAnywhere.downloadModel(modelId, (progress) => {
          setDownloadProgress(progress.progress)
        })
      }

      // Load STT model into memory
      await RunAnywhere.loadSTTModel(modelId)
      setIsLoaded(true)
      console.log('STT model loaded successfully')
    } catch (e) {
      console.error('STT load error:', e)
      throw e
    } finally {
      setIsLoading(false)
    }
  }, [])

  const transcribe = useCallback(
    async (audioData: Uint8Array): Promise<string> => {
      if (!isLoaded) throw new Error('STT model not loaded')

      // Transcribe raw audio bytes (must be 16kHz Int16 PCM!)
      const text = await RunAnywhere.transcribe(audioData)
      return text
    },
    [isLoaded]
  )

  return {
    isLoaded,
    isLoading,
    downloadProgress,
    loadModel,
    transcribe,
  }
}
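Transcribing an empty recording wastes compute, and Whisper can hallucinate text on silence, so you may want to gate transcription on a quick loudness check. A sketch (not part of the SDK) that computes normalized RMS over raw 16-bit little-endian PCM samples, with a hypothetical threshold you would tune for your app:

```typescript
// Normalized RMS (0..1) of 16-bit little-endian PCM sample data.
// Pass the raw sample bytes with any WAV header already stripped.
function pcm16Rms(samples: Uint8Array): number {
  const count = samples.length >> 1
  if (count === 0) return 0
  let sumSquares = 0
  for (let i = 0; i < count; i++) {
    let s = samples[2 * i] | (samples[2 * i + 1] << 8)
    if (s >= 0x8000) s -= 0x10000 // sign-extend to Int16
    sumSquares += s * s
  }
  return Math.sqrt(sumSquares / count) / 32768
}

// Example gate: skip clips quieter than ~1% of full scale (tune as needed)
const SILENCE_THRESHOLD = 0.01
function isLikelySilence(samples: Uint8Array): boolean {
  return pcm16Rms(samples) < SILENCE_THRESHOLD
}
```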

Why loadSTTModel() instead of loadModel()? The SDK provides a separate load method for each modality: loadModel() for LLMs, loadSTTModel() for speech-to-text, and loadTTSVoice() for text-to-speech. Each modality runs on its own runtime (LlamaCPP vs ONNX), so models of different modalities can stay loaded simultaneously without conflicts.

Complete STT Screen

Create src/screens/STTScreen.tsx:

typescript
import React, { useState, useEffect } from 'react';
import {
  View,
  Text,
  TouchableOpacity,
  StyleSheet,
  ActivityIndicator,
} from 'react-native';
import { AudioService } from '../services/AudioService';
import { useSTT } from '../hooks/useSTT';

export function STTScreen() {
  const [isRecording, setIsRecording] = useState(false);
  const [isTranscribing, setIsTranscribing] = useState(false);
  const [transcription, setTranscription] = useState('');

  const { isLoaded, isLoading, downloadProgress, loadModel, transcribe } = useSTT();

  useEffect(() => {
    async function setup() {
      await AudioService.initialize();
      await loadModel();
    }
    setup();
  }, [loadModel]);

  async function toggleRecording() {
    if (isRecording) {
      await stopAndTranscribe();
    } else {
      await startRecording();
    }
  }

  async function startRecording() {
    try {
      AudioService.startRecording();
      setIsRecording(true);
      setTranscription('');
    } catch (e) {
      console.error('Failed to start recording:', e);
    }
  }

  async function stopAndTranscribe() {
    setIsRecording(false);
    setIsTranscribing(true);

    try {
      const audioData = await AudioService.stopRecording();
      const text = await transcribe(audioData);
      setTranscription(text);
    } catch (e) {
      setTranscription(`Error: ${e instanceof Error ? e.message : 'Unknown error'}`);
    } finally {
      setIsTranscribing(false);
    }
  }

  if (isLoading) {
    return (
      <View style={styles.container}>
        <Text style={styles.statusText}>
          Downloading model... {(downloadProgress * 100).toFixed(0)}%
        </Text>
        <View style={styles.progressBar}>
          <View style={[styles.progressFill, { width: `${downloadProgress * 100}%` }]} />
        </View>
      </View>
    );
  }

  return (
    <View style={styles.container}>
      {/* Transcription display */}
      <View style={styles.transcriptionBox}>
        <Text style={styles.transcriptionText}>
          {transcription || 'Tap the microphone to record...'}
        </Text>
      </View>

      {/* Record button */}
      <TouchableOpacity
        style={[styles.recordButton, isRecording && styles.recordingActive]}
        onPress={toggleRecording}
        disabled={!isLoaded || isTranscribing}
      >
        <Text style={styles.recordButtonIcon}>
          {isRecording ? '⬛' : '🎤'}
        </Text>
      </TouchableOpacity>

      {isTranscribing && (
        <View style={styles.transcribingRow}>
          <ActivityIndicator size="small" color="#fff" />
          <Text style={styles.transcribingText}>Transcribing...</Text>
        </View>
      )}
    </View>
  );
}

const styles = StyleSheet.create({
  container: {
    flex: 1,
    backgroundColor: '#000',
    padding: 24,
    justifyContent: 'center',
    alignItems: 'center',
  },
  statusText: {
    color: '#fff',
    fontSize: 16,
    marginBottom: 16,
  },
  progressBar: {
    width: '100%',
    height: 8,
    backgroundColor: '#333',
    borderRadius: 4,
    overflow: 'hidden',
  },
  progressFill: {
    height: '100%',
    backgroundColor: '#007AFF',
  },
  transcriptionBox: {
    width: '100%',
    minHeight: 100,
    backgroundColor: '#111',
    borderRadius: 12,
    padding: 16,
    marginBottom: 48,
  },
  transcriptionText: {
    color: '#fff',
    fontSize: 16,
    lineHeight: 24,
  },
  recordButton: {
    width: 100,
    height: 100,
    borderRadius: 50,
    backgroundColor: '#007AFF',
    justifyContent: 'center',
    alignItems: 'center',
  },
  recordingActive: {
    backgroundColor: '#ff4444',
  },
  recordButtonIcon: {
    fontSize: 36,
  },
  transcribingRow: {
    flexDirection: 'row',
    alignItems: 'center',
    marginTop: 24,
  },
  transcribingText: {
    color: '#fff',
    marginLeft: 8,
  },
});

Memory Management

When you're done with STT, unload the model to free memory:

typescript
// Unload STT model to free memory
await RunAnywhere.unloadSTTModel()

STT models can be loaded independently alongside the LLM—they don't conflict.

Speech-to-text recording interface on device

Models Reference

  • sherpa-onnx-whisper-tiny.en: ~75MB, English, real-time capable

What's Next

In Part 3, we'll add text-to-speech with Piper, including audio playback across both platforms.


Resources


Questions? Open an issue on GitHub or reach out on Twitter/X.


Copyright © 2025 RunAnywhere, Inc.