Stefan Schmitt

WWDC 2026: a good year to be building on-device AI

Drafted with AI; researched, edited, and fact-checked by me (how I write).

The year the whole stack moved

In most years, one or two of the things I care about move and the rest stays put. WWDC 2026 was not like that: the models moved, the runtime beneath them moved, the way I ship them moved, the app layer moved, and the tools I write all of it in moved, all within a single week. For someone whose entire product is on-device AI on Apple silicon, that is about as good as it gets.

What follows is a tour of everything from this year's conference that touches what I build, and an account of why, reservations included, I came away optimistic. The betas are available now, with general release this autumn.

WWDC 2026 drawn as a six-layer stack under the caption 'the whole stack moved,' with an upward arrow on the left. From the foundation up: Delivery (Background Assets downloads only the language you need); Runtime (MLX gets faster and reaches the Hugging Face catalogue); Models (Foundation Models, one session backing many models); Measurement (built-in token counting and an evaluations harness); App layer (SwiftData and SwiftUI quality-of-life upgrades); Tools (Xcode 27 puts coding agents in the editor).

The models matured

The Foundation Models framework received the largest set of changes this year, and most of them point in the same direction: more choice, less plumbing.

There is now a public protocol, so a single LanguageModelSession can be backed by the on-device model, Apple's larger model on Private Cloud Compute, a downloaded MLX model, or, through Swift packages from Anthropic and Google, Claude and Gemini. The on-device model now accepts images as input, and the framework ships built-in tools that the model can call: OCR, a barcode reader, and a Spotlight-backed search tool for entirely local retrieval.
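The reason one session can front several models is the familiar protocol-over-backend pattern. The following is a minimal sketch of that pattern in plain Swift: the session depends only on a protocol, so swapping the backend changes a single argument. TextGenerating, EchoModel and Session are hypothetical names, not the framework's actual protocol or types. The sketch is also synchronous for brevity, whereas a real model call would be asynchronous.

```swift
// Hypothetical sketch: TextGenerating, EchoModel and Session are illustrative names,
// not the Foundation Models framework's protocol or types.
protocol TextGenerating {
    var name: String { get }
    func respond(to prompt: String) -> String
}

// A stand-in backend that simply echoes the prompt, tagged with its own name.
struct EchoModel: TextGenerating {
    let name: String

    func respond(to prompt: String) -> String {
        "[\(name)] \(prompt)"
    }
}

// The session only knows the protocol, so any conforming backend can sit behind it.
final class Session {
    private let model: any TextGenerating
    private(set) var transcript: [(prompt: String, reply: String)] = []

    init(model: any TextGenerating) {
        self.model = model
    }

    func respond(to prompt: String) -> String {
        let reply = model.respond(to: prompt)
        transcript.append((prompt: prompt, reply: reply))
        return reply
    }
}

// Swapping backends changes one argument; the calling code stays the same.
let local = Session(model: EchoModel(name: "on-device"))
let remote = Session(model: EchoModel(name: "cloud"))
print(local.respond(to: "Summarise chapter 1"))   // prints "[on-device] Summarise chapter 1"
print(remote.respond(to: "Summarise chapter 1"))  // prints "[cloud] Summarise chapter 1"
```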

The change I keep returning to is that Private Cloud Compute is now available to developers, with no API key to store, and free within limits for smaller developers. For two years my mental model was a strict binary: either a small model on the user's device, or running my own cloud service and carrying the keys and the bill that come with it. A privacy-preserving middle tier that I did not have to build myself is genuinely welcome, provided I am honest in the interface about what runs where.
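Being honest about what runs where is easier when the routing decision and its label live together. Below is a minimal sketch, assuming a device-first policy in which the cloud tier is used only when the prompt does not fit on device and the user has allowed it. The tier names, label wording and context sizes are all hypothetical parameters, not values or APIs from Apple's frameworks.

```swift
// Hypothetical sketch: tier names, labels and context sizes are assumptions,
// not values or APIs from Apple's frameworks.
enum ExecutionTier: Equatable {
    case onDevice
    case privateCloud
    case unavailable

    // Text the interface can show so the user always knows where a request runs.
    var userFacingLabel: String {
        switch self {
        case .onDevice: return "Runs on this device"
        case .privateCloud: return "Runs on Private Cloud Compute"
        case .unavailable: return "Too long to process"
        }
    }
}

// Prefer the device; fall back to the cloud tier only when the prompt does not fit
// on device and the user has opted in.
func chooseTier(promptTokens: Int,
                onDeviceContextSize: Int,
                cloudContextSize: Int,
                cloudAllowed: Bool) -> ExecutionTier {
    if promptTokens <= onDeviceContextSize { return .onDevice }
    if cloudAllowed && promptTokens <= cloudContextSize { return .privateCloud }
    return .unavailable
}
```

Keeping the label as a property of the tier means the interface cannot show a routing decision without also showing where it ran.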

A faster, more reachable runtime

Beneath all of that sits MLX, the runtime on which I actually ship a 3B model. It gained both speed and scale this year: Metal 4 support, GPU Neural Accelerators, and training that scales across several Macs over Thunderbolt. A production runtime that gets faster with no work on my part is an upgrade I am glad to accept.

The greater reach is the more interesting part. MLXLanguageModel connects the entire mlx-community catalogue on Hugging Face (thousands of pre-quantised models) to that same session API. Prototyping a new model once meant wiring up a runtime; now it amounts to little more than changing a string, which is a genuine shortcut for the parts of the work that are purely experimental.
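When the model is just a string, validating that string is the one piece worth doing carefully. Here is a minimal sketch, assuming the usual Hugging Face `organisation/repository` form. ModelReference and the repository name in the usage line are hypothetical. The actual loading call is left out on purpose, because its signature is not given here.

```swift
// Hypothetical sketch: ModelReference is illustrative, not MLXLanguageModel's initialiser.
struct ModelReference: Equatable {
    let organisation: String
    let repository: String

    // Accepts exactly "organisation/repository" with both parts non-empty.
    init?(_ id: String) {
        let parts = id.split(separator: "/", omittingEmptySubsequences: false)
        guard parts.count == 2, !parts[0].isEmpty, !parts[1].isEmpty else { return nil }
        organisation = String(parts[0])
        repository = String(parts[1])
    }

    var id: String { "\(organisation)/\(repository)" }

    // True for repositories published under the mlx-community organisation.
    var isCommunityQuantised: Bool { organisation == "mlx-community" }
}

// Trying another model means editing this one (hypothetical) identifier.
let candidate = ModelReference("mlx-community/example-3b-4bit")
```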

Measuring what I used to estimate by hand

In my runtime field notes I admitted to two slightly embarrassing habits: counting tokens by hand through the chat template, and never having pinned down an exact processing time. This year's release quietly addresses both. The framework exposes contextSize and tokenCount(for:), and responses now carry a usage value β€” including how many input tokens were served from cache and how many output tokens were spent on reasoning. That is precisely the measurement I had been approximating in log lines.
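Once those numbers arrive as values, the log-line arithmetic becomes a couple of derived properties. The following is a minimal sketch of that arithmetic. Usage is a hypothetical struct whose fields mirror the description above; it is not the framework's type. The sketch also assumes that reasoning tokens are counted inside the output total and that a prompt must leave room in the context window for the reply.

```swift
// Hypothetical sketch: Usage mirrors the fields described in the post, but it is an
// illustrative struct, not the framework's usage type.
struct Usage {
    let inputTokens: Int
    let cachedInputTokens: Int
    let outputTokens: Int
    let reasoningTokens: Int   // assumed to be included in outputTokens

    // Fraction of the prompt served from cache (0 when there was no input).
    var cacheHitRate: Double {
        inputTokens == 0 ? 0 : Double(cachedInputTokens) / Double(inputTokens)
    }

    // Output tokens the user actually sees, once reasoning tokens are set aside.
    var visibleOutputTokens: Int { outputTokens - reasoningTokens }
}

// Check that a prompt fits the context window while reserving room for the reply.
func fitsContext(promptTokens: Int, reservedForOutput: Int, contextSize: Int) -> Bool {
    promptTokens + reservedForOutput <= contextSize
}
```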

There is also a new Swift Evaluations framework for quantifying feature quality and catching regressions as prompts change. This was the announcement I found most striking. I had already built a DSPy-inspired tuning harness for Narration Room's pipeline: golden fixtures that pair each stage's input with an expected output and thresholds, every stage scored and run several times so that I can observe the run-to-run variance, and a comparison step that flags when a prompt change quietly makes a metric worse. Apple's framework is the measurement half of that same idea, now provided natively. Mine goes a step further and also optimises (a teacher model proposes prompt rewrites and keeps only those that score better), which Apple has not shipped. But to see a first-party framework arrive at the very shape I had been running in a side project is reassuring; it suggests I was on a sensible path rather than merely indulging a preference.
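The measurement half of such a harness is small enough to sketch. Each golden fixture runs several times, the mean and spread of its scores are reported, and a candidate counts as a regression when its mean falls below the baseline by more than a tolerance. This is a minimal sketch, assuming a score in 0...1, population standard deviation, and a pass rule of mean ≥ threshold. GoldenFixture, StageResult, evaluate and isRegression are hypothetical names, not Apple's Evaluations API or the author's harness.

```swift
// Hypothetical sketch of the harness idea, not Apple's Evaluations API.
struct GoldenFixture {
    let input: String
    let expected: String
    let threshold: Double   // minimum acceptable mean score
}

struct StageResult {
    let mean: Double
    let standardDeviation: Double
    let passed: Bool
}

// Run a stage several times on one fixture and summarise the run-to-run variance.
func evaluate(_ fixture: GoldenFixture,
              runs: Int,
              stage: (String) -> String,
              score: (String, String) -> Double) -> StageResult {
    precondition(runs > 0, "need at least one run")
    let scores = (0..<runs).map { _ in score(stage(fixture.input), fixture.expected) }
    let mean = scores.reduce(0, +) / Double(runs)
    // Population variance: average squared distance from the mean.
    let variance = scores.map { ($0 - mean) * ($0 - mean) }.reduce(0, +) / Double(runs)
    return StageResult(mean: mean,
                       standardDeviation: variance.squareRoot(),
                       passed: mean >= fixture.threshold)
}

// Flag a regression when the candidate's mean drops by more than the tolerance.
func isRegression(baseline: StageResult, candidate: StageResult, tolerance: Double) -> Bool {
    candidate.mean < baseline.mean - tolerance
}
```

Scoring several runs rather than one is what separates a real regression from ordinary sampling noise.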

Smarter delivery

Shipping a multi-gigabyte model is a problem in its own right, as I described at length in an earlier post. This year, iOS 27 adds localised asset packs to Managed Background Assets, so that a user downloads only what their language requires rather than the assets for every language. My own models are not keyed by language, so it is not a perfect fit for my case, but smaller, more selective downloads are precisely the axis I had been reasoning about, and it is good to see Apple moving in that direction.
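The selection step behind "download only what your language needs" can be sketched in a few lines. It walks the user's preferred languages in order, matches on the base language, and falls back to a single default pack rather than downloading everything. This is a minimal sketch: the pack identifiers, the base-language matching rule and the English fallback are all assumptions, and it is not the Managed Background Assets API.

```swift
// Hypothetical sketch: pack identifiers and the matching rule are assumptions,
// not the Managed Background Assets API.
func selectPack(preferredLanguages: [String],
                availablePacks: [String: String],
                fallbackLanguage: String = "en") -> String? {
    for tag in preferredLanguages {
        // Compare base languages only, so "fr-CA" and "fr_FR" both match "fr".
        let base = tag.split(whereSeparator: { $0 == "-" || $0 == "_" }).first.map { String($0) } ?? tag
        if let pack = availablePacks[base.lowercased()] {
            return pack
        }
    }
    // No preferred language has a pack: fall back rather than download everything.
    return availablePacks[fallbackLanguage]
}

// English, French and German packs, matching the languages the post ships in.
let packs = ["en": "voices-en", "fr": "voices-fr", "de": "voices-de"]
```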

A more pleasant app layer

This is a collection of small improvements that add up. In SwiftData, @Attribute(.codable) is a sanctioned way to persist a Codable value type without promoting it to an entity of its own, which is the precise escape hatch I had wanted for storing a structured value alongside its row.
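The attribute is easiest to appreciate next to what it replaces. Below is a minimal sketch of the manual workaround in plain Foundation: encode the value to Data for storage, then decode it on read. ChapterTiming and StoredRow are hypothetical types, and JSON is an assumed encoding. Nothing here shows how @Attribute(.codable) works internally.

```swift
import Foundation

// Hypothetical value type that belongs to a row but does not merit its own entity.
struct ChapterTiming: Codable, Equatable {
    var startSeconds: Double
    var endSeconds: Double
    var speaker: String
}

// Hypothetical row that stores the value as encoded bytes, as a manual workaround would.
struct StoredRow {
    var title: String
    private let timingData: Data   // what would otherwise sit in a column

    init(title: String, timing: ChapterTiming) throws {
        self.title = title
        self.timingData = try JSONEncoder().encode(timing)
    }

    // Every read pays for a decode, and every schema change risks a decoding error.
    func timing() throws -> ChapterTiming {
        try JSONDecoder().decode(ChapterTiming.self, from: timingData)
    }
}
```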

SwiftUI gained a number of quality-of-life changes: Liquid Glass refinements that your views adopt automatically with no extra code, a reorderable() modifier for drag-to-reorder, AsyncImage with built-in caching, and finer toolbar control as the interface resizes. None of these is a headline in isolation, but together they remove a series of small annoyances, and that is much of what makes a year of application work feel lighter.

A genuine route to discoverability

This is one I am watching with interest. App Intents gained App Schemas and Entity Schemas, which allow an app to contribute its own content to the Spotlight semantic index and to expose natural-language actions to Siri and Apple Intelligence. For an app whose entire purpose is the material a user creates, a supported means for that content to be searchable and actionable from outside the app is exactly the sort of capability I am glad to see on the platform. I am not promising anything here; I am simply noting that the route is now a documented one.

The tools met me where I work

Xcode 27 is where I expected little and was pleasantly surprised. The most significant change for the way I work is that coding agents now live in the editor. The conversation sits in an editor pane alongside your code, a /plan command gathers context before making any changes, and the agent can launch sub-agents to explore in parallel. I already work with coding agents every day, so having Xcode meet me where I work, rather than in a separate window, is a genuine improvement.

Two further changes speak directly to my workflow. Localisation now has an agent that reads your code and sets up a String Catalog, together with a Generate Translations button that draws on project context and per-language style guidance. I ship in English, French, and German, so I welcome it, though for anything published I finish the translations by hand, because tone across three languages remains a matter of human judgement. The redesigned Organizer also adds a storage metric that breaks down how much space an app and its data occupy on the device. When an app downloads gigabytes of model weights, knowing precisely where that space goes is far from a luxury.

Dictation and voices

Speech and voices got genuine attention this year. iOS 27 introduces a smarter systemwide dictation experience built into the keyboard (handling punctuation, capitalisation, and corrections as you speak), and Siri's voice became more natural, with adjustable pace and expressivity. The catch is hardware: the more capable dictation and the expressive voices run on a larger on-device model that needs a recent device with around 12 GB of unified memory, so they are limited to the newest iPhones, iPads, and Macs.
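An app that ships its own large on-device model may want a similar hardware gate. Here is a minimal sketch using ProcessInfo.processInfo.physicalMemory, assuming the post's figure of around 12 GB is read as 12 GiB. The function name and the 5% tolerance are hypothetical choices, and they are not Apple's criteria for these features. The tolerance leaves headroom in case the reported total sits a little below the advertised size.

```swift
import Foundation

// Hypothetical sketch: the 12 GiB requirement follows the post's "around 12 GB";
// the 5% tolerance is an assumption, not an Apple-published rule.
func meetsMemoryRequirement(physicalMemoryBytes: UInt64,
                            requiredGibibytes: Double = 12,
                            tolerance: Double = 0.05) -> Bool {
    let bytesPerGibibyte = 1_073_741_824.0
    let installed = Double(physicalMemoryBytes) / bytesPerGibibyte
    return installed >= requiredGibibytes * (1 - tolerance)
}

// On a real device, pass the total the system reports.
let capable = meetsMemoryRequirement(physicalMemoryBytes: ProcessInfo.processInfo.physicalMemory)
```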

For a developer building on speech, the picture is more measured. These are system and consumer features (a better keyboard, a more natural Siri) rather than a new Speech framework to call directly. The speech-adjacent developer session this year covered generated subtitles, which build on the same on-device transcription rather than introducing a new API. What I build my own transcription on is still last year's SpeechAnalyzer, and I still ship my own synthesis rather than the system voices. So it was a substantial year for speech as a user, and a steadier one for speech as something I program against.

What it amounts to

Step back, and the theme is hard to miss: the platform has moved towards the kind of application I am already building β€” on-device first, flexible about models, mindful of privacy, with the runtime, the delivery and the tooling all advanced together. A good deal of the scaffolding I had built by hand now has an official equivalent, which is not something to regret. It is the most useful thing a platform can do: take the tedious work off your hands so that you can spend your time on the parts that are genuinely your own.

What's next

If this sort of field note is useful, the newsletter is the best way to catch the next one.