RunAnywhere Swift SDK
Production-grade, on-device AI platform for iOS and macOS applications
Introduction
The RunAnywhere Swift SDK is a production-grade, on-device AI platform for iOS and macOS applications. It provides a unified API for running AI models locally on Apple devices, offering:
- LLM (Large Language Model) – Text generation with streaming support
- STT (Speech-to-Text) – Audio transcription with multiple backends
- TTS (Text-to-Speech) – Neural and system voice synthesis
- VAD (Voice Activity Detection) – Real-time speech detection
- Voice Agent – Complete voice pipeline orchestration
Core Philosophy
- On-Device First: All AI inference runs locally, ensuring low latency and data privacy.
- Plugin Architecture: Backend engines are optional modules—include only what you need.
- Privacy by Design: Audio and text data never leaves the device unless explicitly configured.
- Event-Driven: Subscribe to SDK events for reactive UI updates and observability.
Getting Started
Installation
Add RunAnywhere to your project using Swift Package Manager. Add the following to your Package.swift:
dependencies: [
.package(url: "https://github.com/RunanywhereAI/runanywhere-sdks.git", from: "0.1.0")
]
Add the products you need to your target dependencies:
.target(
name: "YourApp",
dependencies: [
.product(name: "RunAnywhere", package: "runanywhere-swift"),
.product(name: "RunAnywhereLlamaCPP", package: "runanywhere-swift"), // For LLM
.product(name: "RunAnywhereONNX", package: "runanywhere-swift"), // For STT/TTS
]
)
Quick Start
Here's a complete example to get you started with text generation:
import RunAnywhere
import LlamaCPPRuntime
// 1. Initialize SDK at app launch
@main
struct MyApp: App {
init() {
do {
try RunAnywhere.initialize(
apiKey: "your-api-key",
baseURL: "https://api.runanywhere.ai",
environment: .production
)
// Register LlamaCPP module
Task { @MainActor in
ModuleRegistry.shared.register(LlamaCPP.self)
}
} catch {
print("SDK init failed: \(error)")
}
}
var body: some Scene {
WindowGroup {
ContentView()
}
}
}
// 2. Use in a SwiftUI view
struct ContentView: View {
@State private var response = ""
@State private var isLoading = false
var body: some View {
VStack {
Text(response)
.padding()
Button("Generate") {
Task {
await generateText()
}
}
.disabled(isLoading)
}
}
func generateText() async {
isLoading = true
defer { isLoading = false }
do {
// Load model if not already loaded
if !(await RunAnywhere.isModelLoaded) {
try await RunAnywhere.loadModel("my-llama-model")
}
// Generate text
let result = try await RunAnywhere.generate(
"Explain quantum computing in simple terms",
options: LLMGenerationOptions(
maxTokens: 200,
temperature: 0.7
)
)
response = result.text
print("Generated in \(result.latencyMs)ms at \(result.tokensPerSecond) tok/s")
} catch {
response = "Error: \(error.localizedDescription)"
}
}
}
Initialization
Initialize the SDK once at app launch with your configuration:
try RunAnywhere.initialize(
apiKey: "your-api-key", // Required (can be empty for development)
baseURL: "https://api.runanywhere.ai",
environment: .production // .development | .staging | .production
)
| Environment | Log Level | Telemetry | Mock Data |
|---|---|---|---|
| .development | Debug | Yes | Yes |
| .staging | Info | Yes | No |
| .production | Warning | Yes | No |
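For example, you can select the environment from the build configuration. This is only a sketch that reuses the initialize call shown above; manage your production key however fits your app:
#if DEBUG
// An empty API key is allowed during development (see the comment above).
try RunAnywhere.initialize(
    apiKey: "",
    baseURL: "https://api.runanywhere.ai",
    environment: .development
)
#else
try RunAnywhere.initialize(
    apiKey: "your-api-key",
    baseURL: "https://api.runanywhere.ai",
    environment: .production
)
#endif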
LLM (Language Models)
The SDK provides both simple and advanced text generation APIs with streaming support.
chat()
Simple one-liner for quick text responses:
// One-liner for quick responses
let response = try await RunAnywhere.chat("What is the capital of France?")
print(response) // "The capital of France is Paris."generate()
Full generation with detailed metrics and options:
let result = try await RunAnywhere.generate(
"Write a haiku about Swift programming",
options: LLMGenerationOptions(
maxTokens: 50,
temperature: 1.0,
topP: 0.9,
stopSequences: ["###"]
)
)
print("Response: \(result.text)")
print("Model: \(result.modelUsed)")
print("Tokens: \(result.tokensUsed)")
print("Speed: \(result.tokensPerSecond) tok/s")
print("Latency: \(result.latencyMs)ms")
// For reasoning models
if let thinking = result.thinkingContent {
print("Reasoning: \(thinking)")
}
generateStream()
Stream tokens as they are generated for better UX:
let streamResult = try await RunAnywhere.generateStream(
"Tell me a story",
options: LLMGenerationOptions(maxTokens: 500)
)
// Display tokens as they arrive
for try await token in streamResult.stream {
print(token, terminator: "")
textView.text += token
}
// Get final metrics after streaming completes
let metrics = try await streamResult.result.value
print("\n\nGenerated \(metrics.tokensUsed) tokens")System Prompts
Configure model behavior with system prompts:
let options = LLMGenerationOptions(
maxTokens: 200,
systemPrompt: """
You are a senior Swift developer.
Answer questions with code examples.
Use modern Swift conventions.
"""
)
let result = try await RunAnywhere.generate(
"How do I parse JSON in Swift?",
options: options
)
STT (Speech-to-Text)
Transcribe audio with support for multiple backends, languages, and streaming.
transcribe()
Basic transcription from audio data or buffer:
// From Data
let audioData = try Data(contentsOf: audioFileURL)
let text = try await RunAnywhere.transcribe(audioData)
// With options
let output = try await RunAnywhere.transcribeWithOptions(
audioData,
options: STTOptions(
language: "en",
enableTimestamps: true,
enablePunctuation: true
)
)
print("Text: \(output.text)")
if let segments = output.segments {
for segment in segments {
print("[\(segment.startTime)-\(segment.endTime)]: \(segment.text)")
}
}
// From Audio Buffer
import AVFoundation
let buffer: AVAudioPCMBuffer = ... // From microphone or file
let output = try await RunAnywhere.transcribeBuffer(buffer, language: "en")
transcribeStream()
Stream audio for real-time transcription:
// Create audio stream (e.g., from microphone)
let audioStream: AsyncStream<Data> = microphoneManager.audioStream
let transcriptionStream = try await RunAnywhere.transcribeStream(
audioStream,
options: STTOptions(language: "en")
)
for try await partialText in transcriptionStream {
transcriptionLabel.text = partialText
}
STT Options
Configure transcription behavior:
let options = STTOptions(
language: "en", // BCP-47 code
detectLanguage: true, // Auto-detect spoken language
enablePunctuation: true, // Add punctuation
enableDiarization: false, // Identify speakers
maxSpeakers: 2, // Max speakers to identify
enableTimestamps: true, // Word-level timestamps
vocabularyFilter: ["RunAnywhere", "LlamaCPP"],
sampleRate: 16000 // Input sample rate
)
TTS (Text-to-Speech)
Synthesize speech from text with neural voices and customization options.
synthesize()
Basic text-to-speech synthesis:
// Load a TTS voice
try await RunAnywhere.loadTTSVoice("piper-en-us-amy")
// Synthesize
let output = try await RunAnywhere.synthesize(
"Hello, welcome to RunAnywhere!",
options: TTSOptions(rate: 1.0, pitch: 1.0)
)
// Play audio
import AVFoundation
let player = try AVAudioPlayer(data: output.audioData)
player.play()
synthesizeStream()
Stream audio synthesis for lower latency:
let audioStream = await RunAnywhere.synthesizeStream(
"This is a very long text that will be synthesized in chunks...",
options: TTSOptions()
)
for try await audioChunk in audioStream {
// Play chunks as they arrive for lower latency
audioQueue.enqueue(audioChunk)
}
// Stop synthesis if needed
await RunAnywhere.stopSynthesis()
Available Voices
List and select from available TTS voices:
let voices = await RunAnywhere.availableTTSVoices
for voice in voices {
print(voice)
}
// Use specific voice
let options = TTSOptions(
voice: "en-US-Neural2-F",
rate: 0.8, // 0.0-2.0, slower speed
pitch: 1.3, // 0.0-2.0, higher pitch
volume: 1.0 // 0.0-1.0
)
VAD (Voice Activity Detection)
Detect when speech starts and stops in an audio stream for voice-activated interfaces.
// Initialize VAD with custom configuration
let config = VADConfiguration(
energyThreshold: 0.02,
enableAutoCalibration: true,
calibrationMultiplier: 2.5
)
try await RunAnywhere.initializeVAD(config)
// Detect speech in audio
import AVFoundation
let buffer: AVAudioPCMBuffer = ... // From microphone
let result = try await RunAnywhere.detectSpeech(in: buffer)
if result.isSpeech {
print("Speech detected!")
}
// Speech Activity Callback
await RunAnywhere.setVADSpeechActivityCallback { event in
switch event {
case .speechStarted:
print("User started speaking")
startRecording()
case .speechEnded(let duration):
print("User stopped speaking after \(duration)s")
stopRecording()
}
}
// Start VAD processing
await RunAnywhere.startVAD()
// Cleanup when done
await RunAnywhere.stopVAD()
await RunAnywhere.cleanupVAD()
Voice Agent (Full Pipeline)
The Voice Agent orchestrates the complete voice interaction pipeline: VAD → STT → LLM → TTS.
// Initialize Voice Agent
try await RunAnywhere.initializeVoiceAgent(
sttModelId: "whisper-base-onnx",
llmModelId: "llama-3.2-1b-q4",
ttsVoice: "piper-en-us-amy"
)
// Process complete voice turn
let audioData = audioRecorder.capturedAudio
let result = try await RunAnywhere.processVoiceTurn(audioData)
if result.speechDetected {
print("User said: \(result.transcription ?? "")")
print("AI response: \(result.response ?? "")")
// Play synthesized response
if let audio = result.synthesizedAudio {
audioPlayer.play(audio)
}
}
// Stream voice processing
let audioStream: AsyncStream<Data> = microphoneManager.audioStream
let eventStream = await RunAnywhere.processVoiceStream(audioStream)
for try await event in eventStream {
switch event {
case .vadTriggered(let isSpeaking):
updateMicrophoneUI(isActive: isSpeaking)
case .transcriptionAvailable(let text):
transcriptionLabel.text = text
case .responseGenerated(let response):
responseLabel.text = response
case .audioSynthesized(let audio):
audioPlayer.play(audio)
case .error(let error):
showError(error)
case .processed(let result):
// Complete turn finished
break
}
}
Configuration
Detailed configuration options for all SDK features.
LLM Generation Options
public struct LLMGenerationOptions: Sendable {
let maxTokens: Int // Default: 100
let temperature: Float // 0.0-2.0, default: 0.8
let topP: Float // 0.0-1.0, default: 1.0
let stopSequences: [String] // Stop generation at these strings
let streamingEnabled: Bool // Enable token-by-token streaming
let systemPrompt: String? // System prompt for behavior
}
Memory Considerations
| Model Type | Typical Memory | Notes |
|---|---|---|
| LLM Q4 (1B) | 1-2 GB | Suitable for all devices |
| LLM Q4 (3B) | 3-4 GB | iPhone Pro / iPad |
| LLM Q4 (7B) | 6-8 GB | M1+ Macs only |
| STT (Whisper Base) | 150 MB | Universal |
| TTS (Piper) | 50-100 MB | Universal |
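As a rough way to apply this table at runtime, the sketch below checks the device's physical memory before choosing which model to load. The 3B model ID is a placeholder (only the 1B example appears in this guide), and the 8 GB threshold simply approximates the table above:
import Foundation

// Placeholder model IDs; substitute the models you actually ship.
let smallModel = "llama-3.2-1b-q4"   // ~1-2 GB, suitable for all devices
let largerModel = "llama-3.2-3b-q4"  // ~3-4 GB, newer iPhones and iPads (placeholder ID)

let physicalMemory = ProcessInfo.processInfo.physicalMemory // in bytes
let eightGB: UInt64 = 8 * 1_024 * 1_024 * 1_024

// Prefer the larger model only on devices with plenty of RAM.
let modelId = physicalMemory >= eightGB ? largerModel : smallModel
try await RunAnywhere.loadModel(modelId)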
Error Handling
All SDK errors conform to RunAnywhereError with detailed messages and recovery suggestions.
do {
try await RunAnywhere.loadModel("my-model")
let result = try await RunAnywhere.generate("Hello")
} catch RunAnywhereError.notInitialized {
showError("Please restart the app")
} catch RunAnywhereError.modelNotFound(let modelId) {
showError("Model '\(modelId)' not found. Please download it first.")
} catch RunAnywhereError.modelLoadFailed(let modelId, let underlying) {
print("Failed to load \(modelId): \(underlying?.localizedDescription ?? "unknown")")
} catch RunAnywhereError.generationTimeout(_) {
showError("Request timed out. Try a shorter prompt.")
} catch RunAnywhereError.insufficientStorage(let required, let available) {
let formatter = ByteCountFormatter()
showError("Need \(formatter.string(fromByteCount: required)), only \(formatter.string(fromByteCount: available)) available")
} catch {
showError(error.localizedDescription)
}
Error Categories
| Category | Errors |
|---|---|
| .initialization | notInitialized, invalidAPIKey |
| .model | modelNotFound, modelLoadFailed |
| .generation | generationFailed, contextTooLong |
| .storage | insufficientStorage, storageFull |
Best Practices
Memory Management
- Only load one LLM at a time to conserve memory
- Unload models when switching tasks or backgrounding the app
- Monitor ModelInfo.memoryRequired before loading
- Use quantized models (Q4) for better performance on mobile devices
Performance Optimization
- Preload commonly used models at app launch for faster response times (see the sketch below)
- Use streaming for long text generations to improve perceived performance
- Set appropriate maxTokens to balance response quality and speed
- Test on real devices, not simulators, for accurate performance metrics
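A minimal preloading sketch, using the loadModel and loadTTSVoice calls shown earlier (the IDs are the example model and voice used throughout this guide). Call it right after RunAnywhere.initialize at app launch:
import RunAnywhere

func preloadModels() {
    Task {
        do {
            // Warm up the models you expect to use first, so the first request is fast.
            try await RunAnywhere.loadModel("llama-3.2-1b-q4")
            try await RunAnywhere.loadTTSVoice("piper-en-us-amy")
        } catch {
            print("Preload failed: \(error)")
        }
    }
}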
App Lifecycle
import UIKit

// Handle app backgrounding
NotificationCenter.default.addObserver(
forName: UIApplication.didEnterBackgroundNotification,
object: nil,
queue: .main
) { _ in
Task {
await RunAnywhere.stopVAD()
try? await RunAnywhere.unloadModel()
}
}
Other SDKs
Looking for Kotlin, React Native, or Flutter? Our SDKs are available on GitHub. Full documentation is coming soon as we finalize these platforms.
Kotlin Multiplatform
Cross-platform SDK for Android and iOS using Kotlin Multiplatform (KMP)
React Native
JavaScript/TypeScript SDK for React Native applications on iOS and Android
Flutter
Dart SDK for Flutter applications with seamless platform integration
Need help? Check out the GitHub repository or contact support.
© 2025 RunAnywhere, Inc. All rights reserved.