DEVELOPERS · March 15, 2026
MetalRT Now Does Speech-to-Speech. 1.52x Faster Than mlx-audio.
MetalRT adds native speech-to-speech support. 1.68s end-to-end latency, 123 tok/s generation throughput, 1.52x faster than mlx-audio on a single M4 Max.
Tutorials, demos, and guides for building on-device AI apps
DEVELOPERS · March 13, 2026
MetalRT adds VLM support and wins every decode benchmark. 279 tok/s vision decode, 92ms time-to-output, 1.22x faster than mlx-vlm across all resolutions on a single M4 Max.
March 13, 2026
A deep-dive into how PickleRite, a pickleball performance tracker, runs a specialized LLM entirely on-device using the RunAnywhere SDK. Zero cloud costs, full offline support, complete privacy.
DEVELOPERS · March 9, 2026
MetalRT becomes the first inference engine to handle LLMs, Speech-to-Text, and Text-to-Speech on Apple Silicon. 101ms to transcribe 70 seconds of audio. 178ms to synthesize speech. 4.6x faster than Apple MLX.
DEVELOPERS · March 3, 2026
MetalRT delivers 658 tok/s decode and 6.6ms time-to-first-token, winning decode on 3 of 4 models we tested on a single M4 Max.
DEVELOPERS · February 24, 2026
We added hybrid retrieval (BM25 + vector search) to our on-device voice pipeline. Retrieval adds less than 4ms. The real cost is LLM prefill — but word-level flushing absorbs it. Sub-200ms first-audio on 5,016 chunks with zero cloud dependencies.
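For context on the fusion step, here is a minimal C++ sketch of one standard way to combine BM25 and vector-search results: reciprocal-rank fusion. The post does not say which fusion method the pipeline uses, so the method choice and every name here are illustrative, not RunAnywhere's implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <numeric>
#include <vector>

// Cosine similarity between a query embedding and a chunk embedding,
// the scoring side of the vector-search leg.
float cosine(const std::vector<float>& a, const std::vector<float>& b) {
    float dot = 0.f, na = 0.f, nb = 0.f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot += a[i] * b[i];
        na  += a[i] * a[i];
        nb  += b[i] * b[i];
    }
    return dot / (std::sqrt(na) * std::sqrt(nb) + 1e-9f);
}

// Reciprocal-rank fusion: merge the BM25 and vector ranked lists without
// normalizing their incompatible score scales. k = 60 is the customary
// smoothing constant.
std::vector<std::size_t> fuse(const std::vector<std::size_t>& bm25_ranked,
                              const std::vector<std::size_t>& vector_ranked,
                              std::size_t num_chunks, float k = 60.f) {
    std::vector<float> score(num_chunks, 0.f);
    for (std::size_t r = 0; r < bm25_ranked.size(); ++r)
        score[bm25_ranked[r]] += 1.f / (k + r + 1);
    for (std::size_t r = 0; r < vector_ranked.size(); ++r)
        score[vector_ranked[r]] += 1.f / (k + r + 1);
    std::vector<std::size_t> order(num_chunks);
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) { return score[a] > score[b]; });
    return order;
}

int main() {
    // Toy ranked lists over four chunks: top BM25 hits and top vector hits.
    std::vector<std::size_t> bm25 = {2, 0, 3};
    std::vector<std::size_t> vec  = {0, 1, 2};
    for (std::size_t id : fuse(bm25, vec, 4))
        std::cout << id << ' ';  // prints: 0 2 1 3
    std::cout << '\n';
}
```

Rank fusion is attractive on-device because it needs only the two rank orders, so neither scorer's raw scale has to be calibrated, which keeps the retrieval step cheap.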
DEVELOPERS · February 22, 2026
FastVoice achieves 63ms first-audio latency — well under the 200ms perceptual threshold — by composing STT, LLM, and TTS into a single C++ pipeline on Apple Silicon. No cloud. No network. Just speed.
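To make the composition concrete, here is a minimal, self-contained C++ sketch of the streaming handoff such a pipeline depends on, with word-level flushing between the LLM and TTS stages. Everything here (run_llm, synthesize, the callback shape) is a hypothetical stand-in, not FastVoice's actual API.

```cpp
#include <cstddef>
#include <functional>
#include <iostream>
#include <string>

// Stand-in for the TTS stage: would hand the word to the synthesizer.
void synthesize(const std::string& word) {
    std::cout << "[tts] " << word << '\n';
}

// Hypothetical token-streaming LLM: here it just replays a canned reply
// token by token, the way a real decoder surfaces partial output.
void run_llm(const std::string& /*transcript*/,
             const std::function<void(const std::string&)>& on_token) {
    for (const char* t : {"Sure", ", ", "here ", "is ", "your ", "answer."})
        on_token(t);
}

// Word-level flushing: hand every completed word to TTS the moment the
// LLM emits it, so first audio is bounded by one word's decode time
// rather than the full reply.
void respond(const std::string& transcript) {
    std::string pending;
    run_llm(transcript, [&](const std::string& token) {
        pending += token;
        std::size_t ws;
        while ((ws = pending.find(' ')) != std::string::npos) {
            if (ws > 0) synthesize(pending.substr(0, ws));
            pending.erase(0, ws + 1);
        }
    });
    if (!pending.empty()) synthesize(pending);  // trailing word
}

int main() {
    respond("transcribed user speech");  // the STT stage would feed this in
}
```

Keeping all three stages in one process, as the post describes, means the only cost between stages is a function call, which is what makes sub-100ms first-audio plausible.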
DEVELOPERS · February 21, 2026
No server. No API key. No internet. Just a phone doing things on its own.
DEVELOPERS · February 19, 2026
The rabbit hole that taught me more about Android internals than three years of app development.
DEVELOPERS · February 9, 2026
Automate web tasks with natural language using a Chrome extension powered by on-device AI. No API keys, no data leaving your browser, complete privacy.
RunAnywhere
On-device AI inference research and infrastructure. Building the fastest engines for the hardware you already own.