RunAnywhere Swift SDK
Production-grade, on-device AI platform for iOS and macOS applications
Introduction
The RunAnywhere Swift SDK is a production-grade, on-device AI platform for iOS and macOS applications. It provides a unified API for running AI models locally on Apple devices, offering:
- LLM (Large Language Model) – Text generation with streaming support
- STT (Speech-to-Text) – Audio transcription with multiple backends
- TTS (Text-to-Speech) – Neural and system voice synthesis
- VAD (Voice Activity Detection) – Real-time speech detection
- Voice Agent – Complete voice pipeline orchestration
Core Philosophy
- On-Device First: All AI inference runs locally, ensuring low latency and data privacy.
- Plugin Architecture: Backend engines are optional modules—include only what you need.
- Privacy by Design: Audio and text data never leaves the device unless explicitly configured.
- Event-Driven: Subscribe to SDK events for reactive UI updates and observability.
Getting Started
Installation
Add RunAnywhere to your project using Swift Package Manager. Add the following to your Package.swift:
dependencies: [
.package(url: "https://github.com/RunanywhereAI/runanywhere-sdks.git", from: "0.1.0")
]
Add the products you need to your target dependencies:
.target(
name: "YourApp",
dependencies: [
.product(name: "RunAnywhere", package: "runanywhere-swift"),
.product(name: "RunAnywhereLlamaCPP", package: "runanywhere-swift"), // For LLM
.product(name: "RunAnywhereONNX", package: "runanywhere-swift"), // For STT/TTS
]
)
Quick Start
Here's a complete example to get you started with text generation:
import RunAnywhere
import LlamaCPPRuntime
// 1. Initialize SDK at app launch
@main
struct MyApp: App {
init() {
do {
try RunAnywhere.initialize(
apiKey: "your-api-key",
baseURL: "https://api.runanywhere.ai",
environment: .production
)
// Register LlamaCPP module
Task { @MainActor in
ModuleRegistry.shared.register(LlamaCPP.self)
}
} catch {
print("SDK init failed: \(error)")
}
}
var body: some Scene {
WindowGroup {
ContentView()
}
}
}
// 2. Use in a SwiftUI view
struct ContentView: View {
@State private var response = ""
@State private var isLoading = false
var body: some View {
VStack {
Text(response)
.padding()
Button("Generate") {
Task {
await generateText()
}
}
.disabled(isLoading)
}
}
func generateText() async {
isLoading = true
defer { isLoading = false }
do {
// Load model if not already loaded
if !(await RunAnywhere.isModelLoaded) {
try await RunAnywhere.loadModel("my-llama-model")
}
// Generate text
let result = try await RunAnywhere.generate(
"Explain quantum computing in simple terms",
options: LLMGenerationOptions(
maxTokens: 200,
temperature: 0.7
)
)
response = result.text
print("Generated in \(result.latencyMs)ms at \(result.tokensPerSecond) tok/s")
} catch {
response = "Error: \(error.localizedDescription)"
}
}
}
Initialization
Initialize the SDK once at app launch with your configuration:
try RunAnywhere.initialize(
apiKey: "your-api-key", // Required (can be empty for development)
baseURL: "https://api.runanywhere.ai",
environment: .production // .development | .staging | .production
)
| Environment | Log Level | Telemetry | Mock Data |
|---|---|---|---|
| .development | Debug | Yes | Yes |
| .staging | Info | Yes | No |
| .production | Warning | Yes | No |
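For example, you can select the environment from the build configuration. This is only a sketch that reuses the initialize call shown above; manage your production key however fits your app:
#if DEBUG
// An empty API key is allowed during development (see the comment above).
try RunAnywhere.initialize(
    apiKey: "",
    baseURL: "https://api.runanywhere.ai",
    environment: .development
)
#else
try RunAnywhere.initialize(
    apiKey: "your-api-key",
    baseURL: "https://api.runanywhere.ai",
    environment: .production
)
#endif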
LLM (Language Models)
The SDK provides both simple and advanced text generation APIs with streaming support.
chat()
Simple one-liner for quick text responses:
// One-liner for quick responses
let response = try await RunAnywhere.chat("What is the capital of France?")
print(response) // "The capital of France is Paris."generate()
Full generation with detailed metrics and options:
let result = try await RunAnywhere.generate(
"Write a haiku about Swift programming",
options: LLMGenerationOptions(
maxTokens: 50,
temperature: 1.0,
topP: 0.9,
stopSequences: ["###"]
)
)
print("Response: \(result.text)")
print("Model: \(result.modelUsed)")
print("Tokens: \(result.tokensUsed)")
print("Speed: \(result.tokensPerSecond) tok/s")
print("Latency: \(result.latencyMs)ms")
// For reasoning models
if let thinking = result.thinkingContent {
print("Reasoning: \(thinking)")
}
generateStream()
Stream tokens as they are generated for better UX:
let streamResult = try await RunAnywhere.generateStream(
"Tell me a story",
options: LLMGenerationOptions(maxTokens: 500)
)
// Display tokens as they arrive
for try await token in streamResult.stream {
print(token, terminator: "")
textView.text += token
}
// Get final metrics after streaming completes
let metrics = try await streamResult.result.value
print("\n\nGenerated \(metrics.tokensUsed) tokens")System Prompts
Configure model behavior with system prompts:
let options = LLMGenerationOptions(
maxTokens: 200,
systemPrompt: """
You are a senior Swift developer.
Answer questions with code examples.
Use modern Swift conventions.
"""
)
let result = try await RunAnywhere.generate(
"How do I parse JSON in Swift?",
options: options
)
STT (Speech-to-Text)
Transcribe audio with support for multiple backends, languages, and streaming.
transcribe()
Basic transcription from audio data or buffer:
// From Data
let audioData = try Data(contentsOf: audioFileURL)
let text = try await RunAnywhere.transcribe(audioData)
// With options
let output = try await RunAnywhere.transcribeWithOptions(
audioData,
options: STTOptions(
language: "en",
enableTimestamps: true,
enablePunctuation: true
)
)
print("Text: \(output.text)")
if let segments = output.segments {
for segment in segments {
print("[\(segment.startTime)-\(segment.endTime)]: \(segment.text)")
}
}
// From Audio Buffer
import AVFoundation
let buffer: AVAudioPCMBuffer = ... // From microphone or file
let output = try await RunAnywhere.transcribeBuffer(buffer, language: "en")
transcribeStream()
Stream audio for real-time transcription:
// Create audio stream (e.g., from microphone)
let audioStream: AsyncStream<Data> = microphoneManager.audioStream
let transcriptionStream = try await RunAnywhere.transcribeStream(
audioStream,
options: STTOptions(language: "en")
)
for try await partialText in transcriptionStream {
transcriptionLabel.text = partialText
}
STT Options
Configure transcription behavior:
let options = STTOptions(
language: "en", // BCP-47 code
detectLanguage: true, // Auto-detect spoken language
enablePunctuation: true, // Add punctuation
enableDiarization: false, // Identify speakers
maxSpeakers: 2, // Max speakers to identify
enableTimestamps: true, // Word-level timestamps
vocabularyFilter: ["RunAnywhere", "LlamaCPP"],
sampleRate: 16000 // Input sample rate
)
TTS (Text-to-Speech)
Synthesize speech from text with neural voices and customization options.
synthesize()
Basic text-to-speech synthesis:
// Load a TTS voice
try await RunAnywhere.loadTTSVoice("piper-en-us-amy")
// Synthesize
let output = try await RunAnywhere.synthesize(
"Hello, welcome to RunAnywhere!",
options: TTSOptions(rate: 1.0, pitch: 1.0)
)
// Play audio
import AVFoundation
let player = try AVAudioPlayer(data: output.audioData)
player.play()
synthesizeStream()
Stream audio synthesis for lower latency:
let audioStream = await RunAnywhere.synthesizeStream(
"This is a very long text that will be synthesized in chunks...",
options: TTSOptions()
)
for try await audioChunk in audioStream {
// Play chunks as they arrive for lower latency
audioQueue.enqueue(audioChunk)
}
// Stop synthesis if needed
await RunAnywhere.stopSynthesis()
Available Voices
List and select from available TTS voices:
let voices = await RunAnywhere.availableTTSVoices
for voice in voices {
print(voice)
}
// Use specific voice
let options = TTSOptions(
voice: "en-US-Neural2-F",
rate: 0.8, // 0.0-2.0, slower speed
pitch: 1.3, // 0.0-2.0, higher pitch
volume: 1.0 // 0.0-1.0
)
VAD (Voice Activity Detection)
Detect when speech starts and stops in an audio stream for voice-activated interfaces.
// Initialize VAD with custom configuration
let config = VADConfiguration(
energyThreshold: 0.02,
enableAutoCalibration: true,
calibrationMultiplier: 2.5
)
try await RunAnywhere.initializeVAD(config)
// Detect speech in audio
import AVFoundation
let buffer: AVAudioPCMBuffer = ... // From microphone
let result = try await RunAnywhere.detectSpeech(in: buffer)
if result.isSpeech {
print("Speech detected!")
}
// Speech Activity Callback
await RunAnywhere.setVADSpeechActivityCallback { event in
switch event {
case .speechStarted:
print("User started speaking")
startRecording()
case .speechEnded(let duration):
print("User stopped speaking after \(duration)s")
stopRecording()
}
}
// Start VAD processing
await RunAnywhere.startVAD()
// Cleanup when done
await RunAnywhere.stopVAD()
await RunAnywhere.cleanupVAD()
Voice Agent (Full Pipeline)
The Voice Agent orchestrates the complete voice interaction pipeline: VAD → STT → LLM → TTS.
// Initialize Voice Agent
try await RunAnywhere.initializeVoiceAgent(
sttModelId: "whisper-base-onnx",
llmModelId: "llama-3.2-1b-q4",
ttsVoice: "piper-en-us-amy"
)
// Process complete voice turn
let audioData = audioRecorder.capturedAudio
let result = try await RunAnywhere.processVoiceTurn(audioData)
if result.speechDetected {
print("User said: \(result.transcription ?? "")")
print("AI response: \(result.response ?? "")")
// Play synthesized response
if let audio = result.synthesizedAudio {
audioPlayer.play(audio)
}
}
// Stream voice processing
let audioStream: AsyncStream<Data> = microphoneManager.audioStream
let eventStream = await RunAnywhere.processVoiceStream(audioStream)
for try await event in eventStream {
switch event {
case .vadTriggered(let isSpeaking):
updateMicrophoneUI(isActive: isSpeaking)
case .transcriptionAvailable(let text):
transcriptionLabel.text = text
case .responseGenerated(let response):
responseLabel.text = response
case .audioSynthesized(let audio):
audioPlayer.play(audio)
case .error(let error):
showError(error)
case .processed(let result):
// Complete turn finished
break
}
}
Configuration
Detailed configuration options for all SDK features.
LLM Generation Options
public struct LLMGenerationOptions: Sendable {
let maxTokens: Int // Default: 100
let temperature: Float // 0.0-2.0, default: 0.8
let topP: Float // 0.0-1.0, default: 1.0
let stopSequences: [String] // Stop generation at these strings
let streamingEnabled: Bool // Enable token-by-token streaming
let systemPrompt: String? // System prompt for behavior
}
Memory Considerations
| Model Type | Typical Memory | Notes |
|---|---|---|
| LLM Q4 (1B) | 1-2 GB | Suitable for all devices |
| LLM Q4 (3B) | 3-4 GB | iPhone Pro / iPad |
| LLM Q4 (7B) | 6-8 GB | M1+ Macs only |
| STT (Whisper Base) | 150 MB | Universal |
| TTS (Piper) | 50-100 MB | Universal |
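As a rough way to apply this table at runtime, the sketch below checks the device's physical memory before choosing which model to load. The 3B model ID is a placeholder (only the 1B example appears in this guide), and the 8 GB threshold simply approximates the table above:
import Foundation

// Placeholder model IDs; substitute the models you actually ship.
let smallModel = "llama-3.2-1b-q4"   // ~1-2 GB, suitable for all devices
let largerModel = "llama-3.2-3b-q4"  // ~3-4 GB, newer iPhones and iPads (placeholder ID)

let physicalMemory = ProcessInfo.processInfo.physicalMemory // in bytes
let eightGB: UInt64 = 8 * 1_024 * 1_024 * 1_024

// Prefer the larger model only on devices with plenty of RAM.
let modelId = physicalMemory >= eightGB ? largerModel : smallModel
try await RunAnywhere.loadModel(modelId)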
Error Handling
All SDK errors conform to RunAnywhereError with detailed messages and recovery suggestions.
do {
try await RunAnywhere.loadModel("my-model")
let result = try await RunAnywhere.generate("Hello")
} catch RunAnywhereError.notInitialized {
showError("Please restart the app")
} catch RunAnywhereError.modelNotFound(let modelId) {
showError("Model '\(modelId)' not found. Please download it first.")
} catch RunAnywhereError.modelLoadFailed(let modelId, let underlying) {
print("Failed to load \(modelId): \(underlying?.localizedDescription ?? "unknown")")
} catch RunAnywhereError.generationTimeout(_) {
showError("Request timed out. Try a shorter prompt.")
} catch RunAnywhereError.insufficientStorage(let required, let available) {
let formatter = ByteCountFormatter()
showError("Need \(formatter.string(fromByteCount: required)), only \(formatter.string(fromByteCount: available)) available")
} catch {
showError(error.localizedDescription)
}
Error Categories
| Category | Errors |
|---|---|
| .initialization | notInitialized, invalidAPIKey |
| .model | modelNotFound, modelLoadFailed |
| .generation | generationFailed, contextTooLong |
| .storage | insufficientStorage, storageFull |
Best Practices
Memory Management
- Only load one LLM at a time to conserve memory
- Unload models when switching tasks or backgrounding the app
- Monitor ModelInfo.memoryRequired before loading
- Use quantized models (Q4) for better performance on mobile devices
Performance Optimization
- Preload commonly used models at app launch for faster response times (see the sketch below)
- Use streaming for long text generations to improve perceived performance
- Set appropriate maxTokens to balance response quality and speed
- Test on real devices, not simulators, for accurate performance metrics
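A minimal preloading sketch, using the loadModel and loadTTSVoice calls shown earlier (the IDs are the example model and voice used throughout this guide). Call it right after RunAnywhere.initialize at app launch:
import RunAnywhere

func preloadModels() {
    Task {
        do {
            // Warm up the models you expect to use first, so the first request is fast.
            try await RunAnywhere.loadModel("llama-3.2-1b-q4")
            try await RunAnywhere.loadTTSVoice("piper-en-us-amy")
        } catch {
            print("Preload failed: \(error)")
        }
    }
}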
App Lifecycle
import UIKit

// Handle app backgrounding
NotificationCenter.default.addObserver(
forName: UIApplication.didEnterBackgroundNotification,
object: nil,
queue: .main
) { _ in
Task {
await RunAnywhere.stopVAD()
try? await RunAnywhere.unloadModel()
}
}
Other SDKs
Looking for Kotlin, React Native, or Flutter? Our SDKs are available on GitHub. Full documentation is coming soon as we finalize these platforms.
Kotlin Multiplatform
Cross-platform SDK for Android and iOS using Kotlin Multiplatform (KMP)
React Native
JavaScript/TypeScript SDK for React Native applications on iOS and Android
Flutter
Dart SDK for Flutter applications with seamless platform integration
Need help? Check out the GitHub repository or contact support.
© 2025 RunAnywhere, Inc. All rights reserved.