February 4, 2026

RunAnywhere Flutter SDK Part 1: Chat with LLMs On-Device

Run LLMs Entirely On-Device with Flutter


This is Part 1 of our RunAnywhere Flutter SDK tutorial series:

  1. Chat with LLMs (this post) — Project setup and streaming text generation
  2. Speech-to-Text — Real-time transcription with Whisper
  3. Text-to-Speech — Natural voice synthesis with Piper
  4. Voice Pipeline — Full voice assistant with VAD

Flutter's "write once, run anywhere" promise meets on-device AI. With RunAnywhere, you can build cross-platform apps that run powerful language models directly on iOS and Android devices—no cloud, no API keys, complete privacy.

In this tutorial, we'll set up the Flutter SDK and build a streaming chat interface that works offline on both platforms.

This tutorial targets RunAnywhere 0.17.x; the dependency versions below pin ^0.17.4. The reference sample project is local_ai_playground, and the code below matches the current SDK API.

Why On-Device AI?

| Aspect  | Cloud AI             | On-Device AI             |
| ------- | -------------------- | ------------------------ |
| Privacy | Data sent to servers | Data stays on device     |
| Latency | Network round-trip   | Instant local processing |
| Offline | Requires internet    | Works anywhere           |
| Cost    | Per-request billing  | One-time download        |

For cross-platform apps handling sensitive data, on-device processing provides the privacy users expect across both iOS and Android.

Prerequisites

  • Flutter 3.10+ with Dart 3.0+
  • Xcode 14+ (for iOS builds)
  • Android Studio with SDK 24+ (for Android builds; matches minSdkVersion 24 below)
  • Physical device recommended (iOS or Android)
  • ~250MB storage for the LLM model (Parts 2-4 add ~140MB more)

Project Setup

1. Create a New Flutter Project

```bash
flutter create local_ai_playground
cd local_ai_playground
```
Android Studio new Flutter project

2. Add the RunAnywhere SDK

Add the following dependencies to your pubspec.yaml:

```yaml
dependencies:
  flutter:
    sdk: flutter
  runanywhere: ^0.17.4
  runanywhere_llamacpp: ^0.17.4
  runanywhere_onnx: ^0.17.4
  provider: ^6.0.0
  # Audio recording & playback (used in Parts 2-4)
  path_provider: ^2.1.0
  record: ^5.1.0
  audioplayers: ^6.0.0
```

Then run:

```bash
flutter pub get
```
pubspec.yaml with RunAnywhere dependencies

3. iOS Configuration

For iOS, add or ensure these lines in your existing ios/Podfile (inside the target 'Runner' do block for use_frameworks!; do not replace the entire file):

```ruby
platform :ios, '14.0'

# Critical: Use static linking for RunAnywhere
use_frameworks! :linkage => :static
```

Why static linking? RunAnywhere's native iOS libraries are distributed as static frameworks. The :linkage => :static flag tells CocoaPods to link them statically, avoiding "image not found" crashes at runtime. This is required for Flutter projects using RunAnywhere on iOS.

Then install pods:

```bash
cd ios && pod install && cd ..
```

4. Android Configuration

Set minSdk 24 (Android 7.0+) in your app-level build file by adding or updating that line inside defaultConfig; don't replace the whole file. Your project will have exactly one of the two files below, so edit only the one that exists.

```text
android/app/
 ├── build.gradle      ← Use this snippet if you have this file (Groovy)
 └── build.gradle.kts  ← Use this snippet if you have this file (Kotlin DSL)
```

Groovy — file: android/app/build.gradle

```groovy
android {
    defaultConfig {
        minSdkVersion 24 // Required for RunAnywhere (Android 7.0+)
    }
}
```

Kotlin DSL — file: android/app/build.gradle.kts

```kotlin
android {
    defaultConfig {
        minSdk = 24 // Required for RunAnywhere (Android 7.0+)
    }
}
```

If your project uses minSdk = flutter.minSdkVersion (or minSdkVersion flutter.minSdkVersion), replace or override it with 24, since Flutter’s default is 21 and the SDK needs 24.

Add these permissions to android/app/src/main/AndroidManifest.xml (merge with any existing <uses-permission> tags):

  • INTERNET — Required for downloading models in this part (and for any cloud fallback).
  • RECORD_AUDIO — Required for Parts 2–4 (Speech-to-Text and Voice). Safe to add now.
```xml
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
```
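The manifest entry only declares the permission; on Android 6+, RECORD_AUDIO must also be granted at runtime. The record package already in your pubspec can prompt for it. Here's a small sketch you can keep around for Parts 2-4 (the helper name is ours, not part of any SDK):

```dart
import 'package:record/record.dart';

/// Confirms (or prompts for) microphone access.
/// hasPermission() shows the system dialog if the grant is missing.
Future<bool> ensureMicPermission() async {
  final recorder = AudioRecorder();
  final granted = await recorder.hasPermission();
  await recorder.dispose();
  return granted;
}
```

You don't need this for Part 1 — text chat never touches the microphone — but adding the manifest entry and helper now saves a round of edits later.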

Sample app and setup notes

The local_ai_playground sample (and the Flutter Example App in the SDK repo) is aligned with RunAnywhere 0.17.x:

  • lib/features/chat/chat_view.dart — Chat UI with model download, load, and streaming generation using the current SDK API.
  • ios/Flutter/Profile.xcconfig — Ensures the iOS Profile build configuration includes CocoaPods settings and avoids the "CocoaPods did not set the base configuration" warning.
  • API migration — If you're coming from an older SDK, check the sample app's docs/ (e.g. RUNANYWHERE_API_UPDATES.md) or the repository for old vs new API snippets.

SDK Initialization

The SDK requires a specific initialization order. Create lib/app/app_initializer.dart:

```dart
import 'package:flutter/foundation.dart';
import 'package:runanywhere/runanywhere.dart';
import 'package:runanywhere_llamacpp/runanywhere_llamacpp.dart';
import 'package:runanywhere_onnx/runanywhere_onnx.dart';

class AppInitializer {
  static Future<void> initialize() async {
    try {
      // Step 1: Initialize core SDK
      await RunAnywhere.initialize();
      debugPrint('SDK: RunAnywhere initialized');

      // Step 2: Register backends BEFORE adding models
      await LlamaCpp.register();
      debugPrint('SDK: LlamaCpp backend registered');

      await Onnx.register();
      debugPrint('SDK: ONNX backend registered');

      // Step 3: Register the LLM model
      RunAnywhere.registerModel(
        id: 'lfm2-350m-q4_k_m',
        name: 'LiquidAI LFM2 350M',
        url: 'https://huggingface.co/LiquidAI/LFM2-350M-GGUF/resolve/main/LFM2-350M-Q4_K_M.gguf',
        framework: InferenceFramework.llamaCpp,
        memoryRequirement: 250000000,
      );

      debugPrint('SDK: Model registered successfully');
    } catch (e) {
      debugPrint('SDK: Initialization failed: $e');
      rethrow;
    }
  }
}
```
App initializing SDK on launch

Update lib/main.dart:

```dart
import 'package:flutter/material.dart';
import 'app/app_initializer.dart';
import 'features/chat/chat_view.dart';

void main() async {
  WidgetsFlutterBinding.ensureInitialized();

  await AppInitializer.initialize();

  runApp(const MyApp());
}

class MyApp extends StatelessWidget {
  const MyApp({super.key});

  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      title: 'Local AI Playground',
      theme: ThemeData.dark(),
      home: const ChatView(),
    );
  }
}
```

Architecture Overview

```text
┌─────────────────────────────────────────────────────┐
│                  RunAnywhere Core                   │
│          (Unified API, Model Management)            │
├───────────────────────┬─────────────────────────────┤
│   LlamaCpp Backend    │        ONNX Backend         │
│   ────────────────    │      ──────────────         │
│  • Text Generation    │  • Speech-to-Text           │
│  • Chat Completion    │  • Text-to-Speech           │
│  • Streaming          │  • Voice Activity (VAD)     │
└───────────────────────┴─────────────────────────────┘
```

Downloading & Loading Models

Create lib/services/model_service.dart:

```dart
import 'package:flutter/foundation.dart';
import 'package:runanywhere/runanywhere.dart';

class ModelService extends ChangeNotifier {
  double _downloadProgress = 0.0;
  bool _isDownloading = false;
  bool _isModelLoaded = false;
  String? _error;

  double get downloadProgress => _downloadProgress;
  bool get isDownloading => _isDownloading;
  bool get isModelLoaded => _isModelLoaded;
  String? get error => _error;

  Future<void> downloadAndLoadModel(String modelId) async {
    _isDownloading = true;
    _error = null;
    notifyListeners();

    try {
      // Check if already downloaded
      final isDownloaded = (await RunAnywhere.availableModels())
          .any((m) => m.id == modelId && m.localPath != null);

      if (!isDownloaded) {
        // Download with progress tracking
        await for (final progress in RunAnywhere.downloadModel(modelId)) {
          _downloadProgress = progress.percentage;
          notifyListeners();

          debugPrint('Download: ${(_downloadProgress * 100).toStringAsFixed(1)}%');

          if (progress.state.isCompleted) break;
        }
      }

      // Load into memory
      await RunAnywhere.loadModel(modelId);

      _isModelLoaded = true;
      _isDownloading = false;
      notifyListeners();

      debugPrint('Model loaded successfully');
    } catch (e) {
      _error = e.toString();
      _isDownloading = false;
      notifyListeners();
      debugPrint('Model error: $e');
    }
  }
}
```

Note: Only one LLM model can be loaded at a time. Loading a different model automatically unloads the current one. The SDK uses loadModel() for LLMs; Parts 2-3 use loadSTTModel() and loadTTSVoice() for speech models—these use separate memory pools and can be loaded simultaneously.
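As a quick illustration of that contract (the second model id below is a placeholder, not a model registered in this tutorial):

```dart
// Only one LLM is resident at a time.
await RunAnywhere.loadModel('lfm2-350m-q4_k_m');

// Loading another LLM later implicitly unloads the first:
await RunAnywhere.loadModel('my-other-llm'); // placeholder id

// Speech models (Parts 2-3) use separate memory pools, so an STT
// model or TTS voice can stay loaded alongside the active LLM.
```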

Streaming Text Generation

Now for the fun part—generating text with your on-device LLM. Create lib/features/chat/chat_view.dart:

```dart
import 'package:flutter/material.dart';
import 'package:runanywhere/runanywhere.dart';

class ChatView extends StatefulWidget {
  const ChatView({super.key});

  @override
  State<ChatView> createState() => _ChatViewState();
}

class _ChatViewState extends State<ChatView> {
  final TextEditingController _controller = TextEditingController();
  final List<ChatMessage> _messages = [];
  bool _isGenerating = false;
  bool _isModelLoaded = false;
  double _downloadProgress = 0.0;

  @override
  void initState() {
    super.initState();
    _loadModel();
  }

  Future<void> _loadModel() async {
    const modelId = 'lfm2-350m-q4_k_m';

    final isDownloaded = (await RunAnywhere.availableModels())
        .any((m) => m.id == modelId && m.localPath != null);

    if (!isDownloaded) {
      await for (final progress in RunAnywhere.downloadModel(modelId)) {
        setState(() {
          _downloadProgress = progress.percentage;
        });
        if (progress.state.isCompleted) break;
      }
    }

    await RunAnywhere.loadModel(modelId);
    setState(() {
      _isModelLoaded = true;
    });
  }

  Future<void> _sendMessage() async {
    final text = _controller.text.trim();
    if (text.isEmpty || _isGenerating) return;

    _controller.clear();

    setState(() {
      _messages.add(ChatMessage(role: 'user', content: text));
      _messages.add(ChatMessage(role: 'assistant', content: ''));
      _isGenerating = true;
    });

    try {
      final options = LLMGenerationOptions(
        maxTokens: 256,
        temperature: 0.7,
      );

      final streamResult = await RunAnywhere.generateStream(text, options: options);

      String fullResponse = '';
      await for (final token in streamResult.stream) {
        fullResponse += token;
        setState(() {
          _messages.last = ChatMessage(role: 'assistant', content: fullResponse);
        });
      }

      // Get final metrics
      final metrics = await streamResult.result;
      debugPrint('Speed: ${metrics.tokensPerSecond.toStringAsFixed(1)} tok/s');
    } catch (e) {
      setState(() {
        _messages.last = ChatMessage(
          role: 'assistant',
          content: 'Error: ${e.toString()}',
        );
      });
    } finally {
      setState(() {
        _isGenerating = false;
      });
    }
  }

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(
        title: const Text('On-Device Chat'),
      ),
      body: Column(
        children: [
          if (!_isModelLoaded)
            LinearProgressIndicator(value: _downloadProgress),
          Expanded(
            child: ListView.builder(
              padding: const EdgeInsets.all(16),
              itemCount: _messages.length,
              itemBuilder: (context, index) {
                final message = _messages[index];
                return MessageBubble(message: message);
              },
            ),
          ),
          Padding(
            padding: const EdgeInsets.all(16),
            child: Row(
              children: [
                Expanded(
                  child: TextField(
                    controller: _controller,
                    decoration: const InputDecoration(
                      hintText: 'Type a message...',
                      border: OutlineInputBorder(),
                    ),
                    enabled: _isModelLoaded && !_isGenerating,
                    onSubmitted: (_) => _sendMessage(),
                  ),
                ),
                const SizedBox(width: 8),
                IconButton(
                  icon: Icon(_isGenerating ? Icons.stop : Icons.send),
                  onPressed: _isModelLoaded && !_isGenerating ? _sendMessage : null,
                ),
              ],
            ),
          ),
        ],
      ),
    );
  }
}

class ChatMessage {
  final String role;
  final String content;

  ChatMessage({required this.role, required this.content});
}

class MessageBubble extends StatelessWidget {
  final ChatMessage message;

  const MessageBubble({super.key, required this.message});

  @override
  Widget build(BuildContext context) {
    final isUser = message.role == 'user';

    return Align(
      alignment: isUser ? Alignment.centerRight : Alignment.centerLeft,
      child: Container(
        margin: const EdgeInsets.symmetric(vertical: 4),
        padding: const EdgeInsets.all(12),
        constraints: BoxConstraints(
          maxWidth: MediaQuery.of(context).size.width * 0.75,
        ),
        decoration: BoxDecoration(
          color: isUser ? Colors.blue : Colors.grey[800],
          borderRadius: BorderRadius.circular(12),
        ),
        child: Text(
          message.content.isEmpty ? '...' : message.content,
          style: const TextStyle(color: Colors.white),
        ),
      ),
    );
  }
}
```
Chat interface with streaming response

Non-Streaming Generation

For simpler use cases, you can also use non-streaming generation:

```dart
final result = await RunAnywhere.generate(
  prompt,
  options: LLMGenerationOptions(maxTokens: 256),
);

print('Response: ${result.text}');
print('Speed: ${result.tokensPerSecond} tok/s');
```

Models Reference

| Model ID         | Size   | Notes                          |
| ---------------- | ------ | ------------------------------ |
| lfm2-350m-q4_k_m | ~250MB | LiquidAI LFM2, fast, efficient |
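To experiment with a different or larger GGUF model, register it with the same registerModel call used in AppInitializer. Everything below (id, name, URL, memory figure) is a placeholder you would replace with your model's actual details:

```dart
RunAnywhere.registerModel(
  id: 'my-custom-model',                     // placeholder id
  name: 'My Custom GGUF Model',              // display name in your UI
  url: 'https://example.com/my-model.gguf',  // placeholder download URL
  framework: InferenceFramework.llamaCpp,    // GGUF models run on the LlamaCpp backend
  memoryRequirement: 500000000,              // approx. bytes your model needs
);
```

Larger models generally improve response quality at the cost of download size, RAM, and tokens per second, so test on your lowest-end target device before committing.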

Completed Chat screen

Chat interface with streaming response on device

Troubleshooting

| Issue                                   | Solution                                                                |
| --------------------------------------- | ----------------------------------------------------------------------- |
| CocoaPods install failure               | Run flutter pub get first, ensure Xcode 14+                             |
| iOS crash on launch ("image not found") | Ensure use_frameworks! :linkage => :static in Podfile, then pod install |
| Android Gradle sync fails               | Ensure minSdk 24 in build.gradle, JDK 17+                               |
| Model download hangs                    | Check INTERNET permission in AndroidManifest.xml                        |
| flutter pub get fails on RunAnywhere    | Ensure you're using Flutter 3.10+ and Dart 3.0+                         |

What's Next

In Part 2, we'll add speech-to-text capabilities using Whisper, including the audio format handling that's critical for accurate transcription.


Questions? Open an issue on GitHub or reach out on Twitter/X.


Copyright © 2025 RunAnywhere, Inc.