RunAnywhere Flutter SDK Part 4: Building a Voice Assistant with VAD
A Complete Voice Assistant Running Entirely On-Device
This is Part 4 of our RunAnywhere Flutter SDK tutorial series:
- Chat with LLMs — Project setup and streaming text generation
- Speech-to-Text — Real-time transcription with Whisper
- Text-to-Speech — Natural voice synthesis with Piper
- Voice Pipeline (this post) — Full voice assistant with VAD
This is the culmination of the series: a voice assistant that automatically detects when you stop speaking, processes your request with an LLM, and responds with synthesized speech—all running on-device across iOS and Android.
The key feature is Voice Activity Detection (VAD): the assistant knows when you've finished speaking without requiring a button press.
Prerequisites
- Complete Parts 1-3 to have all three model types (LLM, STT, TTS) working in your project
- Physical device required — the pipeline uses microphone input
- All three models downloaded (~390 MB total: ~250 MB LLM + ~75 MB STT + ~65 MB TTS)
The Voice Pipeline Flow
```
┌────────────────────────────────────────────────────────────┐
│                  Voice Assistant Pipeline                   │
├────────────────────────────────────────────────────────────┤
│                                                             │
│  ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐  │
│  │ Record  │ -> │   STT   │ -> │   LLM   │ -> │   TTS   │  │
│  │  + VAD  │    │ Whisper │    │  LFM2   │    │  Piper  │  │
│  └─────────┘    └─────────┘    └─────────┘    └─────────┘  │
│       │                                            │       │
│       │            Auto-stop when                  │       │
│       └─────────── silence detected ───────────────┘       │
│                                                             │
└────────────────────────────────────────────────────────────┘
```
Pipeline State Machine
Create lib/features/voice/voice_pipeline.dart:
```dart
import 'dart:async';
import 'dart:typed_data';
import 'package:flutter/foundation.dart';
import 'package:runanywhere/runanywhere.dart';
import '../../services/audio_recording_service.dart';
import '../../services/audio_playback_service.dart';

enum PipelineState {
  idle,
  listening,
  transcribing,
  thinking,
  speaking,
}

class VoicePipeline extends ChangeNotifier {
  final AudioRecordingService _audioService = AudioRecordingService();
  final AudioPlaybackService _playbackService = AudioPlaybackService();

  PipelineState _state = PipelineState.idle;
  String _transcribedText = '';
  String _responseText = '';
  String? _errorMessage;
  Timer? _vadTimer;

  // VAD thresholds (tune these for your environment)
  static const double speechThreshold = 0.02;  // Level to detect speech start
  static const double silenceThreshold = 0.01; // Level to detect speech end
  static const double silenceDuration = 1.5;   // Seconds of silence before auto-stop

  // VAD state
  bool _isSpeechDetected = false;
  DateTime? _silenceStartTime;

  PipelineState get state => _state;
  String get transcribedText => _transcribedText;
  String get responseText => _responseText;
  String? get errorMessage => _errorMessage;

  Future<void> start() async {
    if (_state != PipelineState.idle) return;

    // Ensure all models are loaded
    final isReady = await _isReady();
    if (!isReady) {
      _errorMessage = 'Models not loaded. Please load LLM, STT, and TTS first.';
      notifyListeners();
      return;
    }

    _state = PipelineState.listening;
    _transcribedText = '';
    _responseText = '';
    _errorMessage = null;
    notifyListeners();

    try {
      await _audioService.startRecording();

      // Start energy-based VAD monitoring
      _startVADMonitoring();
    } catch (e) {
      _errorMessage = e.toString();
      _state = PipelineState.idle;
      notifyListeners();
    }
  }

  void _startVADMonitoring() {
    _isSpeechDetected = false;
    _silenceStartTime = null;

    _vadTimer = Timer.periodic(
      const Duration(milliseconds: 100),
      (_) => _checkAudioLevel(),
    );
  }

  void _checkAudioLevel() {
    final amplitude = _audioService.getAmplitude();

    // Detect speech start
    if (!_isSpeechDetected && amplitude > speechThreshold) {
      _isSpeechDetected = true;
      _silenceStartTime = null;
      debugPrint('Speech detected');
    }

    // Detect speech end (only after speech was detected)
    if (_isSpeechDetected) {
      if (amplitude < silenceThreshold) {
        _silenceStartTime ??= DateTime.now();

        final elapsed = DateTime.now().difference(_silenceStartTime!).inMilliseconds;
        if (elapsed >= (silenceDuration * 1000).toInt()) {
          debugPrint('Auto-stopping after silence');
          _stopVADMonitoring();
          _processRecording();
        }
      } else {
        _silenceStartTime = null; // Speech resumed
      }
    }
  }

  void _stopVADMonitoring() {
    _vadTimer?.cancel();
    _vadTimer = null;
  }

  Future<void> stopManually() async {
    _stopVADMonitoring();
    await _processRecording();
  }

  Future<void> _processRecording() async {
    if (_state != PipelineState.listening) return;

    // 1. Stop recording and get audio
    _state = PipelineState.transcribing;
    notifyListeners();

    try {
      final audioData = await _audioService.stopRecording();

      if (audioData == null || audioData.isEmpty) {
        _state = PipelineState.idle;
        notifyListeners();
        return;
      }

      // 2. Transcribe
      final text = await RunAnywhere.transcribe(audioData);
      _transcribedText = text;
      notifyListeners();

      if (text.trim().isEmpty) {
        _state = PipelineState.idle;
        notifyListeners();
        return;
      }

      // 3. Generate LLM response
      _state = PipelineState.thinking;
      notifyListeners();

      final prompt = '''
You are a helpful voice assistant. Keep responses SHORT (2-3 sentences max).
Be conversational and friendly.

User: $text
Assistant:''';

      final options = LLMGenerationOptions(
        maxTokens: 100,
        temperature: 0.7,
      );

      final streamResult = await RunAnywhere.generateStream(prompt, options: options);

      String response = '';
      await for (final token in streamResult.stream) {
        response += token;
        _responseText = response;
        notifyListeners();
      }

      // 4. Speak the response
      _state = PipelineState.speaking;
      notifyListeners();

      final ttsResult = await RunAnywhere.synthesize(
        response,
        rate: 1.0,
        pitch: 1.0,
        volume: 1.0,
      );

      await _playbackService.playFloat32Audio(
        ttsResult.samples,
        ttsResult.sampleRate,
      );

      // Wait for audio to finish (approximate)
      await Future.delayed(Duration(
        milliseconds: (ttsResult.duration * 1000).toInt() + 500,
      ));
    } catch (e) {
      debugPrint('Pipeline error: $e');
      _errorMessage = e.toString();
    }

    _state = PipelineState.idle;
    notifyListeners();
  }

  Future<bool> _isReady() async {
    return RunAnywhere.isModelLoaded &&
        RunAnywhere.isSTTModelLoaded &&
        RunAnywhere.isTTSVoiceLoaded;
  }

  @override
  void dispose() {
    _stopVADMonitoring();
    _audioService.dispose();
    super.dispose();
  }
}
```
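The VAD loop above polls `_audioService.getAmplitude()` every 100 ms and expects a normalized 0.0 to 1.0 level. If the `AudioRecordingService` you built in Part 2 does not already expose such a getter, here is a minimal sketch of one based on RMS energy; the `_latestFrame` buffer is an assumed field holding the most recent chunk of 16-bit PCM samples from the recording stream.

```dart
// Inside AudioRecordingService. Requires: import 'dart:math' as math;
// _latestFrame (Int16List?) is assumed to be updated by the recording stream.
double getAmplitude() {
  final frame = _latestFrame;
  if (frame == null || frame.isEmpty) return 0.0;

  // Root-mean-square energy of the chunk, normalized to 0.0-1.0.
  var sumSquares = 0.0;
  for (final sample in frame) {
    final normalized = sample / 32768.0;
    sumSquares += normalized * normalized;
  }
  return math.sqrt(sumSquares / frame.length);
}
```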
Voice Pipeline UI
Create lib/features/voice/voice_assistant_view.dart:
```dart
import 'package:flutter/material.dart';
import 'package:provider/provider.dart';
import 'voice_pipeline.dart';

class VoiceAssistantView extends StatelessWidget {
  const VoiceAssistantView({super.key});

  @override
  Widget build(BuildContext context) {
    return ChangeNotifierProvider(
      create: (_) => VoicePipeline(),
      child: const _VoiceAssistantContent(),
    );
  }
}

class _VoiceAssistantContent extends StatelessWidget {
  const _VoiceAssistantContent();

  @override
  Widget build(BuildContext context) {
    final pipeline = context.watch<VoicePipeline>();

    return Scaffold(
      appBar: AppBar(
        title: const Text('Voice Assistant'),
      ),
      body: Padding(
        padding: const EdgeInsets.all(24),
        child: Column(
          children: [
            // State indicator
            _StateIndicator(state: pipeline.state),

            const SizedBox(height: 24),

            // Error message
            if (pipeline.errorMessage != null)
              Container(
                padding: const EdgeInsets.all(12),
                decoration: BoxDecoration(
                  color: Colors.red.withOpacity(0.1),
                  borderRadius: BorderRadius.circular(8),
                ),
                child: Text(
                  pipeline.errorMessage!,
                  style: const TextStyle(color: Colors.red),
                ),
              ),

            // Transcription
            if (pipeline.transcribedText.isNotEmpty)
              _ConversationBubble(
                label: 'You said:',
                text: pipeline.transcribedText,
                color: Colors.blue,
              ),

            const SizedBox(height: 16),

            // Response
            if (pipeline.responseText.isNotEmpty)
              _ConversationBubble(
                label: 'Assistant:',
                text: pipeline.responseText,
                color: Colors.green,
              ),

            const Spacer(),

            // Main button
            _MainButton(
              state: pipeline.state,
              onPressed: () {
                if (pipeline.state == PipelineState.idle) {
                  pipeline.start();
                } else if (pipeline.state == PipelineState.listening) {
                  pipeline.stopManually();
                }
              },
            ),

            const SizedBox(height: 16),

            Text(
              _getStateHint(pipeline.state),
              style: TextStyle(
                color: Colors.grey[600],
                fontSize: 12,
              ),
            ),
          ],
        ),
      ),
    );
  }

  String _getStateHint(PipelineState state) {
    switch (state) {
      case PipelineState.idle:
        return 'Tap to start';
      case PipelineState.listening:
        return 'Stops automatically when you pause';
      case PipelineState.transcribing:
        return 'Converting speech to text...';
      case PipelineState.thinking:
        return 'Generating response...';
      case PipelineState.speaking:
        return 'Playing audio response...';
    }
  }
}

class _StateIndicator extends StatelessWidget {
  final PipelineState state;

  const _StateIndicator({required this.state});

  @override
  Widget build(BuildContext context) {
    return Row(
      mainAxisAlignment: MainAxisAlignment.center,
      children: [
        Container(
          width: 12,
          height: 12,
          decoration: BoxDecoration(
            shape: BoxShape.circle,
            color: _getStateColor(),
          ),
        ),
        const SizedBox(width: 8),
        Text(
          _getStateText(),
          style: const TextStyle(
            fontSize: 16,
            fontWeight: FontWeight.w500,
          ),
        ),
      ],
    );
  }

  Color _getStateColor() {
    switch (state) {
      case PipelineState.idle:
        return Colors.grey;
      case PipelineState.listening:
        return Colors.red;
      case PipelineState.transcribing:
      case PipelineState.thinking:
        return Colors.orange;
      case PipelineState.speaking:
        return Colors.green;
    }
  }

  String _getStateText() {
    switch (state) {
      case PipelineState.idle:
        return 'Ready';
      case PipelineState.listening:
        return 'Listening...';
      case PipelineState.transcribing:
        return 'Transcribing...';
      case PipelineState.thinking:
        return 'Thinking...';
      case PipelineState.speaking:
        return 'Speaking...';
    }
  }
}

class _ConversationBubble extends StatelessWidget {
  final String label;
  final String text;
  final Color color;

  const _ConversationBubble({
    required this.label,
    required this.text,
    required this.color,
  });

  @override
  Widget build(BuildContext context) {
    return Container(
      width: double.infinity,
      padding: const EdgeInsets.all(16),
      decoration: BoxDecoration(
        color: color.withOpacity(0.1),
        borderRadius: BorderRadius.circular(12),
      ),
      child: Column(
        crossAxisAlignment: CrossAxisAlignment.start,
        children: [
          Text(
            label,
            style: TextStyle(
              color: Colors.grey[600],
              fontSize: 12,
            ),
          ),
          const SizedBox(height: 4),
          Text(
            text,
            style: const TextStyle(fontSize: 16),
          ),
        ],
      ),
    );
  }
}

class _MainButton extends StatelessWidget {
  final PipelineState state;
  final VoidCallback onPressed;

  const _MainButton({
    required this.state,
    required this.onPressed,
  });

  @override
  Widget build(BuildContext context) {
    final isActive = state == PipelineState.idle || state == PipelineState.listening;

    return GestureDetector(
      onTap: isActive ? onPressed : null,
      child: Container(
        width: 80,
        height: 80,
        decoration: BoxDecoration(
          shape: BoxShape.circle,
          color: state == PipelineState.idle ? Colors.blue : Colors.red,
        ),
        child: Icon(
          state == PipelineState.idle ? Icons.mic : Icons.stop,
          size: 36,
          color: Colors.white,
        ),
      ),
    );
  }
}
```
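With both files in place, the screen just needs an entry point. For example, a button on the home screen from Part 1 can push it (exact placement is yours to choose):

```dart
// Anywhere in your app, e.g. the home screen's build method.
ElevatedButton.icon(
  icon: const Icon(Icons.mic),
  label: const Text('Voice Assistant'),
  onPressed: () {
    Navigator.of(context).push(
      MaterialPageRoute(builder: (_) => const VoiceAssistantView()),
    );
  },
)
```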
Best Practices
1. Preload Models During Onboarding
```dart
// Download and load all models sequentially
await modelService.downloadAndLoadLLM();
await modelService.downloadAndLoadSTT();
await modelService.downloadAndLoadTTS();
```
2. Handle Memory Pressure
```dart
// Unload when not needed
await RunAnywhere.unloadModel();
await RunAnywhere.unloadSTTModel();
await RunAnywhere.unloadTTSVoice();
```
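If you want this to happen automatically, one option is a `WidgetsBindingObserver` that unloads everything when the app is backgrounded. This is only a sketch; reloading on return is left to whatever model service you built during onboarding.

```dart
import 'package:flutter/widgets.dart';
import 'package:runanywhere/runanywhere.dart';

/// Frees model memory when the app moves to the background.
/// Register with: WidgetsBinding.instance.addObserver(ModelLifecycleObserver());
class ModelLifecycleObserver with WidgetsBindingObserver {
  @override
  void didChangeAppLifecycleState(AppLifecycleState state) {
    if (state == AppLifecycleState.paused) {
      // Fire-and-forget: unload everything while backgrounded.
      RunAnywhere.unloadModel();
      RunAnywhere.unloadSTTModel();
      RunAnywhere.unloadTTSVoice();
    }
    // Reload on demand (e.g. via your own model service) when the user returns.
  }
}
```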
3. Audio Format Summary
| Component | Sample Rate | Format | Channels |
|---|---|---|---|
| Recording | 16,000 Hz | Int16 | 1 |
| Whisper STT | 16,000 Hz | Int16 | 1 |
| Piper TTS Output | 22,050 Hz | Float32 | 1 |
| Audio Playback | Any | WAV/Int16 | 1-2 |
Always match sample rates and sample formats between stages; a mismatch typically shows up as sped-up, slowed-down, or garbled audio.
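The pipeline above stays in matching formats because `playFloat32Audio` accepts Piper's Float32 output directly. If you ever need to hand those samples to an Int16-only player or write them to a WAV file, a minimal conversion sketch looks like this:

```dart
import 'dart:typed_data';

/// Converts normalized Float32 samples (-1.0..1.0) to 16-bit PCM,
/// clipping anything outside the valid range.
Int16List float32ToInt16(Float32List samples) {
  final out = Int16List(samples.length);
  for (var i = 0; i < samples.length; i++) {
    final clamped = samples[i].clamp(-1.0, 1.0);
    out[i] = (clamped * 32767).round();
  }
  return out;
}
```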
4. Prevent Concurrent Operations
```dart
Future<void> start() async {
  if (_state != PipelineState.idle) return; // Prevent double-starts
  // ...
}
```
5. Tune VAD for Your Environment
The default thresholds work for quiet environments. Adjust for noisy settings:
```dart
static const double speechThreshold = 0.05;  // Higher for noisy environments
static const double silenceThreshold = 0.02; // Higher for noisy environments
static const double silenceDuration = 2.0;   // Longer pause tolerance
```
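Instead of hard-coding values, you can also calibrate against the ambient noise floor right after `startRecording()`, while the user is still quiet. A rough sketch is below; the multipliers are starting points rather than measured values, and you would need to turn the `static const` thresholds in `VoicePipeline` into instance fields to apply the result.

```dart
/// Samples ambient noise for ~500 ms and derives VAD thresholds from it.
/// Call right after startRecording(), before the user starts speaking.
Future<({double speech, double silence})> calibrateThresholds(
  AudioRecordingService audio,
) async {
  final readings = <double>[];
  for (var i = 0; i < 5; i++) {
    await Future.delayed(const Duration(milliseconds: 100));
    readings.add(audio.getAmplitude());
  }
  final noiseFloor = readings.reduce((a, b) => a + b) / readings.length;

  // Speech must clearly exceed the noise floor; silence sits just above it.
  return (
    speech: noiseFloor * 3.0,
    silence: noiseFloor * 1.5,
  );
}
```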
6. Check Model State Before Operations
```dart
bool get isVoiceAgentReady {
  return RunAnywhere.isModelLoaded &&
      RunAnywhere.isSTTModelLoaded &&
      RunAnywhere.isTTSVoiceLoaded;
}
```
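A natural place to use this getter is right before starting the pipeline, so a missing model surfaces as a friendly message instead of an exception deep inside a stage. One way to do it (the snackbar is just an example):

```dart
Future<void> onMicPressed(BuildContext context, VoicePipeline pipeline) async {
  if (!isVoiceAgentReady) {
    // Tell the user what is missing instead of failing mid-pipeline.
    ScaffoldMessenger.of(context).showSnackBar(
      const SnackBar(content: Text('Load the LLM, STT, and TTS models first.')),
    );
    return;
  }
  await pipeline.start();
}
```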
Models Reference
| Type | Model ID | Size | Notes |
|---|---|---|---|
| LLM | lfm2-350m-q4_k_m | ~250MB | LiquidAI, fast, efficient |
| STT | sherpa-onnx-whisper-tiny.en | ~75MB | English |
| TTS | vits-piper-en_US-lessac-medium | ~65MB | US English |
[Screenshot: the completed Voice Assistant screen]
Conclusion
You've built a complete voice assistant that:
- Listens with automatic speech detection
- Transcribes using on-device Whisper
- Thinks with a local LLM
- Responds with natural TTS
All processing happens on-device. No data ever leaves the phone. No API keys. No cloud costs. And it works identically on both iOS and Android.
This is the future of private, cross-platform AI applications.
Complete Source Code
The full source code is available on GitHub. The repository includes:
- Complete Flutter app with all features
- Provider-based state management
- Platform-specific audio handling
- Reusable components and design system
Built-in VoiceSession API
For a higher-level API, RunAnywhere also provides a built-in VoiceSession that handles the full pipeline with events:
```dart
final session = await RunAnywhere.startVoiceSession(
  config: VoiceSessionConfig(
    autoDetectSilence: true,
    silenceThreshold: 1.5,
  ),
);

session.events.listen((event) {
  if (event is VoiceSessionTranscribed) {
    debugPrint('User said: ${event.text}');
  } else if (event is VoiceSessionResponded) {
    debugPrint('AI response: ${event.text}');
  }
});
```
This is useful when you want the SDK to manage the full STT, LLM, and TTS pipeline for you without implementing each step manually.
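If you adopt the built-in session, the hand-rolled `PipelineState` from earlier can be driven by its events instead. A sketch of that mapping inside a `ChangeNotifier`, using only the two event types shown above (no other event types are assumed):

```dart
session.events.listen((event) {
  if (event is VoiceSessionTranscribed) {
    _transcribedText = event.text;
    _state = PipelineState.thinking;  // transcription done, LLM runs next
  } else if (event is VoiceSessionResponded) {
    _responseText = event.text;
    _state = PipelineState.speaking;  // response ready, TTS playback follows
  }
  notifyListeners();
});
```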
Resources
Questions? Open an issue on GitHub or reach out on Twitter/X.