RunAnywhere Swift SDK

Production-grade, on-device AI platform for iOS and macOS applications

SDK Version: 0.1.0 · iOS 17+ / macOS 14+ · Swift 5.9+

Introduction

The RunAnywhere Swift SDK is a production-grade, on-device AI platform for iOS and macOS applications. It provides a unified API for running AI models locally on Apple devices, offering:

  • LLM (Large Language Model) – Text generation with streaming support
  • STT (Speech-to-Text) – Audio transcription with multiple backends
  • TTS (Text-to-Speech) – Neural and system voice synthesis
  • VAD (Voice Activity Detection) – Real-time speech detection
  • Voice Agent – Complete voice pipeline orchestration

Core Philosophy

  • On-Device First: All AI inference runs locally, ensuring low latency and data privacy.
  • Plugin Architecture: Backend engines are optional modules—include only what you need.
  • Privacy by Design: Audio and text data never leaves the device unless explicitly configured.
  • Event-Driven: Subscribe to SDK events for reactive UI updates and observability (see the sketch below).
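
The event-driven design pairs naturally with Swift concurrency. This guide doesn't document the event API surface itself, so the following is a hypothetical sketch: the RunAnywhere.events stream and the .modelLoaded case are invented names, for illustration only.

swift
// Hypothetical sketch only: assumes the SDK exposes an AsyncStream of
// events (the stream name and event cases below are illustrative, not
// confirmed API).
Task {
    for await event in RunAnywhere.events {
        switch event {
        case .modelLoaded(let modelId):
            print("Model ready: \(modelId)")  // e.g. enable the Generate button
        default:
            break
        }
    }
}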

Getting Started

Installation

Add RunAnywhere to your project using Swift Package Manager. Add the following to your Package.swift:

swift
dependencies: [
    .package(url: "https://github.com/RunanywhereAI/runanywhere-sdks.git", from: "0.1.0")
]

Add the products you need to your target dependencies:

swift
.target(
    name: "YourApp",
    dependencies: [
        .product(name: "RunAnywhere", package: "runanywhere-swift"),
        .product(name: "RunAnywhereLlamaCPP", package: "runanywhere-swift"),  // For LLM
        .product(name: "RunAnywhereONNX", package: "runanywhere-swift"),      // For STT/TTS
    ]
)

Quick Start

Here's a complete example to get you started with text generation:

swift
import RunAnywhere
import RunAnywhereLlamaCPP

// 1. Initialize SDK at app launch
@main
struct MyApp: App {
    init() {
        do {
            try RunAnywhere.initialize(
                apiKey: "your-api-key",
                baseURL: "https://api.runanywhere.ai",
                environment: .production
            )

            // Register LlamaCPP module
            Task { @MainActor in
                ModuleRegistry.shared.register(LlamaCPP.self)
            }
        } catch {
            print("SDK init failed: \(error)")
        }
    }

    var body: some Scene {
        WindowGroup {
            ContentView()
        }
    }
}

// 2. Use in a SwiftUI view
struct ContentView: View {
    @State private var response = ""
    @State private var isLoading = false

    var body: some View {
        VStack {
            Text(response)
                .padding()

            Button("Generate") {
                Task {
                    await generateText()
                }
            }
            .disabled(isLoading)
        }
    }

    func generateText() async {
        isLoading = true
        defer { isLoading = false }

        do {
            // Load model if not already loaded
            if await !RunAnywhere.isModelLoaded {
                try await RunAnywhere.loadModel("my-llama-model")
            }

            // Generate text
            let result = try await RunAnywhere.generate(
                "Explain quantum computing in simple terms",
                options: LLMGenerationOptions(
                    maxTokens: 200,
                    temperature: 0.7
                )
            )

            response = result.text
            print("Generated in \(result.latencyMs)ms at \(result.tokensPerSecond) tok/s")

        } catch {
            response = "Error: \(error.localizedDescription)"
        }
    }
}

Initialization

Initialize the SDK once at app launch with your configuration:

swift
try RunAnywhere.initialize(
    apiKey: "your-api-key",       // Required (can be empty for development)
    baseURL: "https://api.runanywhere.ai",
    environment: .production       // .development | .staging | .production
)

Environment     Log Level   Telemetry   Mock Data
.development    Debug       Yes         Yes
.staging        Info        Yes         No
.production     Warning     Yes         No
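
In practice, the environment is often derived from the build configuration, so debug builds get verbose logging and mock data while release builds stay quiet. A minimal sketch using the initialize() call shown above:

swift
// A sketch: select the SDK environment from the build configuration.
#if DEBUG
try RunAnywhere.initialize(
    apiKey: "",                            // empty key is allowed in development
    baseURL: "https://api.runanywhere.ai",
    environment: .development              // Debug logging, mock data
)
#else
try RunAnywhere.initialize(
    apiKey: "your-api-key",
    baseURL: "https://api.runanywhere.ai",
    environment: .production               // Warning-level logging
)
#endif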

LLM (Language Models)

The SDK provides both simple and advanced text generation APIs with streaming support.

chat()

Simple one-liner for quick text responses:

swift
// One-liner for quick responses
let response = try await RunAnywhere.chat("What is the capital of France?")
print(response)  // "The capital of France is Paris."

generate()

Full generation with detailed metrics and options:

swift
let result = try await RunAnywhere.generate(
    "Write a haiku about Swift programming",
    options: LLMGenerationOptions(
        maxTokens: 50,
        temperature: 1.0,
        topP: 0.9,
        stopSequences: ["###"]
    )
)

print("Response: \(result.text)")
print("Model: \(result.modelUsed)")
print("Tokens: \(result.tokensUsed)")
print("Speed: \(result.tokensPerSecond) tok/s")
print("Latency: \(result.latencyMs)ms")

// For reasoning models
if let thinking = result.thinkingContent {
    print("Reasoning: \(thinking)")
}

generateStream()

Stream tokens as they are generated for better UX:

swift
let streamResult = try await RunAnywhere.generateStream(
    "Tell me a story",
    options: LLMGenerationOptions(maxTokens: 500)
)

// Display tokens as they arrive
for try await token in streamResult.stream {
    print(token, terminator: "")
    textView.text += token
}

// Get final metrics after streaming completes
let metrics = try await streamResult.result.value
print("\n\nGenerated \(metrics.tokensUsed) tokens")

System Prompts

Configure model behavior with system prompts:

swift
let options = LLMGenerationOptions(
    maxTokens: 200,
    systemPrompt: """
    You are a senior Swift developer.
    Answer questions with code examples.
    Use modern Swift conventions.
    """
)

let result = try await RunAnywhere.generate(
    "How do I parse JSON in Swift?",
    options: options
)

STT (Speech-to-Text)

Transcribe audio with support for multiple backends, languages, and streaming.

transcribe()

Basic transcription from audio data or buffer:

swift
// From Data
let audioData = try Data(contentsOf: audioFileURL)
let text = try await RunAnywhere.transcribe(audioData)

// With options
let output = try await RunAnywhere.transcribeWithOptions(
    audioData,
    options: STTOptions(
        language: "en",
        enableTimestamps: true,
        enablePunctuation: true
    )
)

print("Text: \(output.text)")
if let segments = output.segments {
    for segment in segments {
        print("[\(segment.startTime)-\(segment.endTime)]: \(segment.text)")
    }
}

// From Audio Buffer
import AVFoundation

let buffer: AVAudioPCMBuffer = ... // From microphone or file
let output = try await RunAnywhere.transcribeBuffer(buffer, language: "en")

transcribeStream()

Stream audio for real-time transcription:

swift
// Create audio stream (e.g., from microphone)
let audioStream: AsyncStream<Data> = microphoneManager.audioStream

let transcriptionStream = try await RunAnywhere.transcribeStream(
    audioStream,
    options: STTOptions(language: "en")
)

for try await partialText in transcriptionStream {
    transcriptionLabel.text = partialText
}
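
The microphoneManager above is app code, not part of the SDK. One way to produce such a stream is to wrap an AVAudioEngine tap in an AsyncStream. A minimal sketch, with the caveat that the expected sample format is an assumption; production code may need an AVAudioConverter to reach the 16 kHz mono input mentioned under STT Options:

swift
import AVFoundation

// A sketch of a microphone stream usable with transcribeStream().
// Assumption: the STT backend accepts Float32 PCM at the input node's
// native format; convert to 16 kHz mono if your backend requires it.
func makeMicrophoneStream(engine: AVAudioEngine) -> AsyncStream<Data> {
    AsyncStream { continuation in
        let input = engine.inputNode
        let format = input.outputFormat(forBus: 0)
        input.installTap(onBus: 0, bufferSize: 4096, format: format) { buffer, _ in
            guard let channel = buffer.floatChannelData?[0] else { return }
            // Copy this callback's samples into an owned Data value
            let byteCount = Int(buffer.frameLength) * MemoryLayout<Float>.size
            continuation.yield(Data(bytes: channel, count: byteCount))
        }
        continuation.onTermination = { _ in
            engine.inputNode.removeTap(onBus: 0)
        }
        try? engine.start()
    }
}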

STT Options

Configure transcription behavior:

swift
let options = STTOptions(
    language: "en",              // BCP-47 code
    detectLanguage: true,        // Auto-detect spoken language
    enablePunctuation: true,     // Add punctuation
    enableDiarization: false,    // Identify speakers
    maxSpeakers: 2,              // Max speakers to identify
    enableTimestamps: true,      // Word-level timestamps
    vocabularyFilter: ["RunAnywhere", "LlamaCPP"],
    sampleRate: 16000            // Input sample rate
)

TTS (Text-to-Speech)

Synthesize speech from text with neural voices and customization options.

synthesize()

Basic text-to-speech synthesis:

swift
// Load a TTS voice
try await RunAnywhere.loadTTSVoice("piper-en-us-amy")

// Synthesize
let output = try await RunAnywhere.synthesize(
    "Hello, welcome to RunAnywhere!",
    options: TTSOptions(rate: 1.0, pitch: 1.0)
)

// Play audio
import AVFoundation
let player = try AVAudioPlayer(data: output.audioData)
player.play()

synthesizeStream()

Stream audio synthesis for lower latency:

swift
let audioStream = await RunAnywhere.synthesizeStream(
    "This is a very long text that will be synthesized in chunks...",
    options: TTSOptions()
)

for try await audioChunk in audioStream {
    // Play chunks as they arrive for lower latency
    audioQueue.enqueue(audioChunk)
}

// Stop synthesis if needed
await RunAnywhere.stopSynthesis()

Available Voices

List and select from available TTS voices:

swift
let voices = await RunAnywhere.availableTTSVoices
for voice in voices {
    print(voice)
}

// Use specific voice
let options = TTSOptions(
    voice: "en-US-Neural2-F",
    rate: 0.8,      // 0.0-2.0, slower speed
    pitch: 1.3,     // 0.0-2.0, higher pitch
    volume: 1.0     // 0.0-1.0
)

VAD (Voice Activity Detection)

Detect when speech starts and stops in an audio stream for voice-activated interfaces.

swift
// Initialize VAD with custom configuration
let config = VADConfiguration(
    energyThreshold: 0.02,
    enableAutoCalibration: true,
    calibrationMultiplier: 2.5
)
try await RunAnywhere.initializeVAD(config)

// Detect speech in audio
import AVFoundation
let buffer: AVAudioPCMBuffer = ... // From microphone
let result = try await RunAnywhere.detectSpeech(in: buffer)

if result.isSpeech {
    print("Speech detected!")
}

// Speech Activity Callback
await RunAnywhere.setVADSpeechActivityCallback { event in
    switch event {
    case .speechStarted:
        print("User started speaking")
        startRecording()
    case .speechEnded(let duration):
        print("User stopped speaking after \(duration)s")
        stopRecording()
    }
}

// Start VAD processing
await RunAnywhere.startVAD()

// Cleanup when done
await RunAnywhere.stopVAD()
await RunAnywhere.cleanupVAD()

Voice Agent (Full Pipeline)

The Voice Agent orchestrates the complete voice interaction pipeline: VAD → STT → LLM → TTS.

swift
// Initialize Voice Agent
try await RunAnywhere.initializeVoiceAgent(
    sttModelId: "whisper-base-onnx",
    llmModelId: "llama-3.2-1b-q4",
    ttsVoice: "piper-en-us-amy"
)

// Process complete voice turn
let audioData = audioRecorder.capturedAudio
let result = try await RunAnywhere.processVoiceTurn(audioData)

if result.speechDetected {
    print("User said: \(result.transcription ?? "")")
    print("AI response: \(result.response ?? "")")

    // Play synthesized response
    if let audio = result.synthesizedAudio {
        audioPlayer.play(audio)
    }
}

// Stream voice processing
let audioStream: AsyncStream<Data> = microphoneManager.audioStream
let eventStream = await RunAnywhere.processVoiceStream(audioStream)

for try await event in eventStream {
    switch event {
    case .vadTriggered(let isSpeaking):
        updateMicrophoneUI(isActive: isSpeaking)

    case .transcriptionAvailable(let text):
        transcriptionLabel.text = text

    case .responseGenerated(let response):
        responseLabel.text = response

    case .audioSynthesized(let audio):
        audioPlayer.play(audio)

    case .error(let error):
        showError(error)

    case .processed:
        // Complete turn finished
        break
    }
}

Configuration

Detailed configuration options for all SDK features.

LLM Generation Options

swift
public struct LLMGenerationOptions: Sendable {
    let maxTokens: Int           // Default: 100
    let temperature: Float       // 0.0-2.0, default: 0.8
    let topP: Float              // 0.0-1.0, default: 1.0
    let stopSequences: [String]  // Stop generation at these strings
    let streamingEnabled: Bool   // Enable token-by-token streaming
    let systemPrompt: String?    // System prompt for behavior
}

Memory Considerations

Model Type            Typical Memory   Notes
LLM Q4 (1B)           1-2 GB           Suitable for all devices
LLM Q4 (3B)           3-4 GB           iPhone Pro / iPad
LLM Q4 (7B)           6-8 GB           M1+ Macs only
STT (Whisper Base)    150 MB           Universal
TTS (Piper)           50-100 MB        Universal
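
Before loading a large model, it can help to sanity-check its requirement against the device's RAM. A rough sketch using ProcessInfo; the half-of-physical-memory headroom factor is an arbitrary assumption, and memoryRequired is the ModelInfo field referenced under Best Practices:

swift
import Foundation

// A rough pre-load guard. Assumes memoryRequired is in bytes and that
// leaving half of physical RAM free gives the OS, the app, and inference
// scratch buffers enough headroom.
func canLoad(_ model: ModelInfo) -> Bool {
    let physicalBytes = ProcessInfo.processInfo.physicalMemory
    return UInt64(model.memoryRequired) < physicalBytes / 2
}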

Error Handling

All SDK errors conform to RunAnywhereError with detailed messages and recovery suggestions.

swift
do {
    try await RunAnywhere.loadModel("my-model")
    let result = try await RunAnywhere.generate("Hello")

} catch RunAnywhereError.notInitialized {
    showError("Please restart the app")

} catch RunAnywhereError.modelNotFound(let modelId) {
    showError("Model '\(modelId)' not found. Please download it first.")

} catch RunAnywhereError.modelLoadFailed(let modelId, let underlying) {
    print("Failed to load \(modelId): \(underlying?.localizedDescription ?? "unknown")")

} catch RunAnywhereError.generationTimeout(_) {
    showError("Request timed out. Try a shorter prompt.")

} catch RunAnywhereError.insufficientStorage(let required, let available) {
    let formatter = ByteCountFormatter()
    showError("Need \(formatter.string(fromByteCount: required)), only \(formatter.string(fromByteCount: available)) available")

} catch {
    showError(error.localizedDescription)
}

Error Categories

  • .initialization – notInitialized, invalidAPIKey
  • .model – modelNotFound, modelLoadFailed
  • .generation – generationFailed, contextTooLong
  • .storage – insufficientStorage, storageFull

Best Practices

Memory Management

  • Only load one LLM at a time to conserve memory
  • Unload models when switching tasks or backgrounding the app (see the sketch after this list)
  • Monitor ModelInfo.memoryRequired before loading
  • Use quantized models (Q4) for better performance on mobile devices
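
For example, a model switch can release the resident model before loading the next one, using the loadModel(), unloadModel(), and isModelLoaded APIs shown elsewhere in this guide:

swift
// A sketch of keeping only one LLM resident at a time.
func switchModel(to modelId: String) async throws {
    if await RunAnywhere.isModelLoaded {
        try await RunAnywhere.unloadModel()   // release the old weights first
    }
    try await RunAnywhere.loadModel(modelId)  // then load the new model
}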

Performance Optimization

  • Preload commonly used models at app launch for faster response times (see the sketch after this list)
  • Use streaming for long text generations to improve perceived performance
  • Set appropriate maxTokens to balance response quality and speed
  • Test on real devices, not simulators, for accurate performance metrics
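
Preloading can be as simple as kicking off a background task at launch; "my-llama-model" is the placeholder model id from Quick Start:

swift
// A sketch of preloading at launch so the first request skips the load cost.
Task(priority: .utility) {
    do {
        try await RunAnywhere.loadModel("my-llama-model")
    } catch {
        print("Preload failed; the model will load lazily on first use: \(error)")
    }
}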

App Lifecycle

swift
// Handle app backgrounding
NotificationCenter.default.addObserver(
    forName: UIApplication.didEnterBackgroundNotification,
    object: nil,
    queue: .main
) { _ in
    Task {
        await RunAnywhere.stopVAD()
        try? await RunAnywhere.unloadModel()
    }
}

Other SDKs

Looking for Kotlin, React Native, or Flutter? Our SDKs are available on GitHub. Full documentation is coming soon as we finalize these platforms.

  • Kotlin Multiplatform – Cross-platform SDK for Android and iOS using Kotlin Multiplatform (KMP). SDK available; docs coming soon.
  • React Native – JavaScript/TypeScript SDK for React Native applications on iOS and Android. SDK available; docs coming soon.
  • Flutter – Dart SDK for Flutter applications with seamless platform integration. SDK available; docs coming soon.

Need help? Check out the GitHub repository or contact support.

© 2025 RunAnywhere, Inc. All rights reserved.