RunAnywhere React Native SDK Part 2: Speech-to-Text with Whisper
Real-Time Transcription with On-Device Whisper
This is Part 2 of our RunAnywhere React Native SDK tutorial series:
- Chat with LLMs — Project setup and streaming text generation
- Speech-to-Text (this post) — Real-time transcription with Whisper
- Text-to-Speech — Natural voice synthesis with Piper
- Voice Pipeline — Full voice assistant with VAD
Speech recognition unlocks natural interaction with your app. With RunAnywhere, you can run Whisper entirely on-device—no network requests, no privacy concerns, no API costs—across both iOS and Android.
The key challenge in React Native is handling native audio recording while ensuring the output format matches what Whisper expects.
Prerequisites
- Complete Part 1 first to set up your project with the RunAnywhere SDK
- Physical device required — simulators have limited microphone support
- ~75MB additional storage for the Whisper model
Android Note: A physical ARM64 device is required. Emulators will NOT work. See Part 1's Android Setup for complete configuration instructions.
Register the STT Model
Add Whisper to your model registration in App.tsx:
```tsx
import { RunAnywhere, ModelCategory } from '@runanywhere/core'
import { ModelArtifactType } from '@runanywhere/onnx'

// Register STT model (Whisper)
RunAnywhere.registerModel({
  id: 'sherpa-onnx-whisper-tiny.en',
  name: 'Whisper Tiny English',
  url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/sherpa-onnx-whisper-tiny.en.tar.gz',
  framework: 'onnx',
  modality: ModelCategory.SpeechRecognition,
  artifactType: ModelArtifactType.TarGzArchive,
  memoryRequirement: 75_000_000,
})
```
Critical: Audio Format Requirements
Whisper requires a very specific audio format:
| Parameter | Required Value |
|---|---|
| Sample Rate | 16,000 Hz |
| Channels | 1 (mono) |
| Format | 16-bit signed integer (Int16) PCM |
React Native doesn't have built-in audio recording, so you'll need to use a native module or library.
Setting Up Audio Recording
Install the audio recording library and file system access:
```bash
npm install react-native-audio-record react-native-fs
cd ios && pod install && cd ..
```
Create src/services/AudioService.ts:
```ts
import AudioRecord from 'react-native-audio-record'
import RNFS from 'react-native-fs'
import { PermissionsAndroid, Platform } from 'react-native'

class AudioServiceClass {
  private isInitialized = false

  async requestPermission(): Promise<boolean> {
    if (Platform.OS === 'android') {
      const result = await PermissionsAndroid.request(PermissionsAndroid.PERMISSIONS.RECORD_AUDIO, {
        title: 'Microphone Permission',
        message: 'This app needs microphone access for voice AI features.',
        buttonPositive: 'Grant',
        buttonNegative: 'Deny',
      })
      return result === PermissionsAndroid.RESULTS.GRANTED
    }
    // iOS handles permissions via Info.plist
    return true
  }

  async initialize(): Promise<void> {
    if (this.isInitialized) return

    const hasPermission = await this.requestPermission()
    if (!hasPermission) {
      throw new Error('Microphone permission denied')
    }

    // Configure for Whisper: 16kHz, mono, PCM
    AudioRecord.init({
      sampleRate: 16000,
      channels: 1,
      bitsPerSample: 16,
      wavFile: 'recording.wav',
    })

    this.isInitialized = true
  }

  startRecording(): void {
    AudioRecord.start()
    console.log('Recording started')
  }

  async stopRecording(): Promise<Uint8Array> {
    const audioPath = await AudioRecord.stop()
    console.log('Recording stopped:', audioPath)

    // Read the WAV file as bytes for transcription
    const base64 = await RNFS.readFile(audioPath, 'base64')
    const binary = atob(base64)
    const bytes = new Uint8Array(binary.length)
    for (let i = 0; i < binary.length; i++) {
      bytes[i] = binary.charCodeAt(i)
    }

    return bytes
  }
}

export const AudioService = new AudioServiceClass()
```
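Here's a rough usage sketch of the service above. The `recordClip` helper is hypothetical and not part of the SDK; it just records for a fixed duration and returns the WAV bytes. The screen later in this post does the same thing with UI state around it.

```ts
// Hypothetical helper showing the AudioService flow end to end.
import { AudioService } from '../services/AudioService'

const wait = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

export async function recordClip(durationMs = 5000): Promise<Uint8Array> {
  await AudioService.initialize()      // no-op after the first call; prompts for mic permission
  AudioService.startRecording()
  await wait(durationMs)               // in a real app this is driven by the record button
  return AudioService.stopRecording()  // WAV bytes: 16 kHz, mono, Int16 PCM
}
```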
Important: The 16kHz sample rate and mono channel configuration are non-negotiable. Sending audio in a different format will produce garbage output.
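If a transcription ever comes back as gibberish, inspecting the recorded file's header is the quickest way to rule out a format mismatch. Here's a minimal sketch (not part of the SDK) that assumes the canonical 44-byte PCM WAV header, which is what most recorders, including react-native-audio-record, typically write:

```ts
// Sketch: sanity-check a standard PCM WAV header before transcribing,
// so format mismatches fail loudly instead of producing garbage output.
export function assertWhisperCompatible(wavBytes: Uint8Array): void {
  const view = new DataView(wavBytes.buffer, wavBytes.byteOffset, wavBytes.byteLength)

  const channels = view.getUint16(22, true)       // header offset 22: channel count
  const sampleRate = view.getUint32(24, true)     // header offset 24: sample rate
  const bitsPerSample = view.getUint16(34, true)  // header offset 34: bit depth

  if (channels !== 1 || sampleRate !== 16000 || bitsPerSample !== 16) {
    throw new Error(
      `Got ${sampleRate} Hz, ${channels}-channel, ${bitsPerSample}-bit audio; ` +
        'Whisper needs 16 kHz, mono, 16-bit PCM'
    )
  }
}
```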
Loading and Using STT
Create src/hooks/useSTT.ts:
```ts
import { useState, useCallback } from 'react'
import { RunAnywhere } from '@runanywhere/core'

export function useSTT() {
  const [isLoaded, setIsLoaded] = useState(false)
  const [isLoading, setIsLoading] = useState(false)
  const [downloadProgress, setDownloadProgress] = useState(0)

  const loadModel = useCallback(async () => {
    setIsLoading(true)
    const modelId = 'sherpa-onnx-whisper-tiny.en'

    try {
      // Check if already downloaded
      const isDownloaded = await RunAnywhere.isModelDownloaded(modelId)

      if (!isDownloaded) {
        await RunAnywhere.downloadModel(modelId, (progress) => {
          setDownloadProgress(progress.progress)
        })
      }

      // Load STT model into memory
      await RunAnywhere.loadSTTModel(modelId)
      setIsLoaded(true)
      console.log('STT model loaded successfully')
    } catch (e) {
      console.error('STT load error:', e)
      throw e
    } finally {
      setIsLoading(false)
    }
  }, [])

  const transcribe = useCallback(
    async (audioData: Uint8Array): Promise<string> => {
      if (!isLoaded) throw new Error('STT model not loaded')

      // Transcribe raw audio bytes (must be 16kHz Int16 PCM!)
      const text = await RunAnywhere.transcribe(audioData)
      return text
    },
    [isLoaded]
  )

  return {
    isLoaded,
    isLoading,
    downloadProgress,
    loadModel,
    transcribe,
  }
}
```
Why `loadSTTModel()` instead of `loadModel()`? The SDK uses separate methods for each modality: `loadModel()` for LLMs, `loadSTTModel()` for speech-to-text, and `loadTTSVoice()` for text-to-speech. This reflects that each uses a different runtime (LlamaCPP vs ONNX) and can be loaded simultaneously without conflicts.
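To make that concrete, here's a minimal sketch of loading an LLM and Whisper side by side. It assumes `loadModel()` takes a model ID the same way `loadSTTModel()` does, and `'your-llm-model-id'` is a placeholder for whatever you registered in Part 1:

```ts
import { RunAnywhere } from '@runanywhere/core'

// Sketch: each modality has its own load method and runtime,
// so both models can sit in memory at the same time.
async function loadVoiceChatModels() {
  await RunAnywhere.loadModel('your-llm-model-id')              // LLM (LlamaCPP)
  await RunAnywhere.loadSTTModel('sherpa-onnx-whisper-tiny.en') // Whisper (ONNX)
}
```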
Complete STT Screen
Create src/screens/STTScreen.tsx:
```tsx
import React, { useState, useEffect } from 'react';
import {
  View,
  Text,
  TouchableOpacity,
  StyleSheet,
  ActivityIndicator,
} from 'react-native';
import { AudioService } from '../services/AudioService';
import { useSTT } from '../hooks/useSTT';

export function STTScreen() {
  const [isRecording, setIsRecording] = useState(false);
  const [isTranscribing, setIsTranscribing] = useState(false);
  const [transcription, setTranscription] = useState('');

  const { isLoaded, isLoading, downloadProgress, loadModel, transcribe } = useSTT();

  useEffect(() => {
    async function setup() {
      await AudioService.initialize();
      await loadModel();
    }
    setup();
  }, [loadModel]);

  async function toggleRecording() {
    if (isRecording) {
      await stopAndTranscribe();
    } else {
      await startRecording();
    }
  }

  async function startRecording() {
    try {
      AudioService.startRecording();
      setIsRecording(true);
      setTranscription('');
    } catch (e) {
      console.error('Failed to start recording:', e);
    }
  }

  async function stopAndTranscribe() {
    setIsRecording(false);
    setIsTranscribing(true);

    try {
      const audioData = await AudioService.stopRecording();
      const text = await transcribe(audioData);
      setTranscription(text);
    } catch (e) {
      setTranscription(`Error: ${e instanceof Error ? e.message : 'Unknown error'}`);
    } finally {
      setIsTranscribing(false);
    }
  }

  if (isLoading) {
    return (
      <View style={styles.container}>
        <Text style={styles.statusText}>
          Downloading model... {(downloadProgress * 100).toFixed(0)}%
        </Text>
        <View style={styles.progressBar}>
          <View style={[styles.progressFill, { width: `${downloadProgress * 100}%` }]} />
        </View>
      </View>
    );
  }

  return (
    <View style={styles.container}>
      {/* Transcription display */}
      <View style={styles.transcriptionBox}>
        <Text style={styles.transcriptionText}>
          {transcription || 'Tap the microphone to record...'}
        </Text>
      </View>

      {/* Record button */}
      <TouchableOpacity
        style={[styles.recordButton, isRecording && styles.recordingActive]}
        onPress={toggleRecording}
        disabled={!isLoaded || isTranscribing}
      >
        <Text style={styles.recordButtonIcon}>
          {isRecording ? '⬛' : '🎤'}
        </Text>
      </TouchableOpacity>

      {isTranscribing && (
        <View style={styles.transcribingRow}>
          <ActivityIndicator size="small" color="#fff" />
          <Text style={styles.transcribingText}>Transcribing...</Text>
        </View>
      )}
    </View>
  );
}

const styles = StyleSheet.create({
  container: {
    flex: 1,
    backgroundColor: '#000',
    padding: 24,
    justifyContent: 'center',
    alignItems: 'center',
  },
  statusText: {
    color: '#fff',
    fontSize: 16,
    marginBottom: 16,
  },
  progressBar: {
    width: '100%',
    height: 8,
    backgroundColor: '#333',
    borderRadius: 4,
    overflow: 'hidden',
  },
  progressFill: {
    height: '100%',
    backgroundColor: '#007AFF',
  },
  transcriptionBox: {
    width: '100%',
    minHeight: 100,
    backgroundColor: '#111',
    borderRadius: 12,
    padding: 16,
    marginBottom: 48,
  },
  transcriptionText: {
    color: '#fff',
    fontSize: 16,
    lineHeight: 24,
  },
  recordButton: {
    width: 100,
    height: 100,
    borderRadius: 50,
    backgroundColor: '#007AFF',
    justifyContent: 'center',
    alignItems: 'center',
  },
  recordingActive: {
    backgroundColor: '#ff4444',
  },
  recordButtonIcon: {
    fontSize: 36,
  },
  transcribingRow: {
    flexDirection: 'row',
    alignItems: 'center',
    marginTop: 24,
  },
  transcribingText: {
    color: '#fff',
    marginLeft: 8,
  },
});
```
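If you haven't wired up navigation yet, the quickest way to try it is to render the screen straight from App.tsx. This is just a sketch; it assumes the model registration from Part 1 (plus the Whisper registration above) runs at module scope before the component mounts:

```tsx
// Sketch: render the STT screen directly while testing.
import React from 'react'
import { STTScreen } from './src/screens/STTScreen'

export default function App() {
  return <STTScreen />
}
```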
Memory Management
When you're done with STT, unload the model to free memory:
```ts
// Unload STT model to free memory
await RunAnywhere.unloadSTTModel()
```
STT models can be loaded independently alongside the LLM—they don't conflict.
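A tidy pattern (a sketch, not something the SDK requires) is to tie the unload to the screen's lifecycle so memory is reclaimed when the user navigates away. The `useUnloadSTTOnUnmount` hook name here is hypothetical:

```ts
import { useEffect } from 'react'
import { RunAnywhere } from '@runanywhere/core'

// Hypothetical hook: free the Whisper model when the host screen unmounts.
export function useUnloadSTTOnUnmount() {
  useEffect(() => {
    return () => {
      // Cleanup runs on unmount; ignore failures if the model was never loaded.
      RunAnywhere.unloadSTTModel().catch((e) => console.warn('STT unload failed:', e))
    }
  }, [])
}
```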

Models Reference
| Model ID | Size | Notes |
|---|---|---|
| sherpa-onnx-whisper-tiny.en | ~75MB | English, real-time capable |
What's Next
In Part 3, we'll add text-to-speech with Piper, including audio playback across both platforms.
Resources
Questions? Open an issue on GitHub or reach out on Twitter/X.