Architecture
App Use has four runtime components. Each owns exactly one job; together they let an agent describe and drive any participating app — without bespoke per-app glue.
The components at a glance
              +--------------------------------------------------+
              |                   App process                    |
              |                                                  |
+--------+    |   +------------------+    +-------------------+  |
| Agent  |--MCP-->|  AppUseEndpoint  |--->|  IAppUseSurface   |  |
+--------+    |   |  (loopback SSE)  |    |    (your code)    |  |
              |   +------------------+    +-------------------+  |
              |            |                                     |
              |            +-- writes instance file              |
              +--------------------------------------------------+
                                       |
                                       v  (file-drop / HTTP register)
              +--------------------------------------------------+
              |               aiappuse-hub binary                |
              |                                                  |
              | HubRegistrationHttpListener (Kestrel + TLS +     |
              |   bearer / Entra JWT validation)                 |
              |                                                  |
              | AppUseHubMcpServer (apps.list/launch/stop/       |
              |   app.call proxy with per-tenant + per-tool ACLs)|
              +--------------------------------------------------+
                                       ^
                                       | MCP
              +--------------------------------------------------+
              |                 App-Use Console                  |
              |                                                  |
              | Spec / Live / Drive / Audit / Tokens /           |
              | Capture tabs - multi-host Hub watcher -          |
              | sign in with Microsoft - record + replay flows   |
              +--------------------------------------------------+
The app + surface
The application implements IAppUseSurface — declaring its AppSpec (screens, elements, actions) and answering read_screen, get_value, set_value, invoke, and a few more. The surface is your code's only mandatory contribution. Everything else in the stack is provided for you.
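To make the shape of a surface concrete, here is a minimal sketch written in Python purely for illustration. It is not the SDK's API: the `AppSpec` fields, the `Surface` protocol, the method signatures, and the toy `CounterSurface` are all assumptions. Only the operation names (read_screen, get_value, set_value, invoke) and the idea of a spec listing screens, elements, and actions come from the text above.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass
class AppSpec:
    # Hypothetical shape: the text only says a spec declares
    # screens, elements, and actions.
    screens: list[str]
    elements: dict[str, list[str]]  # screen -> element ids
    actions: dict[str, list[str]]   # screen -> action names


class Surface(Protocol):
    # Illustrative stand-in for IAppUseSurface; signatures are assumed.
    def describe(self) -> AppSpec: ...
    def read_screen(self) -> dict[str, Any]: ...
    def get_value(self, element: str) -> Any: ...
    def set_value(self, element: str, value: Any) -> None: ...
    def invoke(self, action: str) -> str: ...


@dataclass
class CounterSurface:
    """A toy app: one screen, one editable element, one action."""
    values: dict[str, Any] = field(default_factory=lambda: {"count": 0})

    def describe(self) -> AppSpec:
        return AppSpec(
            screens=["main"],
            elements={"main": ["count"]},
            actions={"main": ["increment"]},
        )

    def read_screen(self) -> dict[str, Any]:
        # Return a copy so callers cannot mutate app state directly.
        return {"screen": "main", "values": dict(self.values)}

    def get_value(self, element: str) -> Any:
        return self.values[element]

    def set_value(self, element: str, value: Any) -> None:
        if element not in self.values:
            raise KeyError(f"unknown element: {element}")
        self.values[element] = value

    def invoke(self, action: str) -> str:
        if action != "increment":
            raise ValueError(f"unknown action: {action}")
        self.values["count"] += 1
        return "ok"
```

The point of the sketch is the division of labour: the class holds only app logic, with no transport, token, or consent code in sight.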
The SDK / AppUseEndpoint
The SDK hosts your surface over a loopback SSE MCP server inside the app process. It does the wire work so you don't: MCP transport, bearer minting, instance registration, consent prompts, audit, and tap streaming. You implement the methods on the surface and ignore the protocol entirely.
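Instance registration can be pictured as a file drop. The call flow later on this page says the SDK writes <AIAPPUSE_HOME>/instances/<id>.json containing the URL, token, pid, and launch manifest. The Python sketch below imitates that step and the matching discovery step. The JSON field names, the URL form, the temp-file-then-rename strategy, and both function names are assumptions made for illustration, not the SDK's actual behaviour.

```python
import json
import os
import secrets
from pathlib import Path


def write_instance_file(home: Path, instance_id: str, port: int) -> Path:
    """Drop <home>/instances/<id>.json describing a running endpoint."""
    instances = home / "instances"
    instances.mkdir(parents=True, exist_ok=True)
    record = {
        # Field names are assumed; the source lists only the kinds of data.
        "url": f"http://127.0.0.1:{port}/",
        "token": secrets.token_urlsafe(32),  # stand-in for the minted bearer
        "pid": os.getpid(),
        "launch": {"argv": ["my-app"]},      # placeholder launch manifest
    }
    path = instances / f"{instance_id}.json"
    # Write under a temporary name, then rename, so that a directory
    # watcher never reads a half-written file (a choice of this sketch).
    tmp = instances / f"{instance_id}.json.tmp"
    tmp.write_text(json.dumps(record))
    os.replace(tmp, path)
    return path


def discover_instances(home: Path) -> dict[str, dict]:
    """What a watcher might do: load every instance file it finds."""
    found = {}
    for path in sorted((home / "instances").glob("*.json")):
        found[path.stem] = json.loads(path.read_text())
    return found
```

A real watcher would also need to notice files disappearing and processes dying; the sketch covers only the happy path.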
The hub
An agent that drives many apps should not have to hold N separate MCP connections; it holds one connection to the hub, and the hub brokers each call through to the right app. The aiappuse-hub binary therefore presents many apps to one MCP client and centralises the things an operator must control once: per-tenant and per-tool ACLs, plus operator identity via Microsoft Entra.
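The brokering idea can be sketched in a few lines of Python. This is a toy model, not the hub's implementation: the `Hub` class, its method names, and the callable-per-app representation are all hypothetical, and ACL and identity checks are deliberately left out so that the routing stays visible.

```python
from typing import Any, Callable, Optional

# A registered app is modelled as a callable: (tool, args) -> result.
AppHandler = Callable[[str, dict], Any]


class Hub:
    """Toy broker: one client-facing object in front of many apps."""

    def __init__(self) -> None:
        self._apps: dict[str, AppHandler] = {}

    def register(self, app_id: str, handler: AppHandler) -> None:
        self._apps[app_id] = handler

    def apps_list(self) -> list[str]:
        # Mirrors the idea of apps.list: enumerate what is registered.
        return sorted(self._apps)

    def app_call(self, app_id: str, tool: str,
                 args: Optional[dict] = None) -> Any:
        # Mirrors the idea of app.call: forward one tool call to one app.
        handler = self._apps.get(app_id)
        if handler is None:
            raise LookupError(f"no such app: {app_id}")
        return handler(tool, args or {})
```

The client only ever talks to the one `Hub` object; adding a second or tenth app changes the registry, not the number of connections the client holds.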
The Console
A human operator needs to see what every agent is doing in every app. The App-Use Console gives that visibility, plus the ability to issue and revoke tokens, audit calls, inspect screenshots, mediate consent prompts, and record and replay flows. It subscribes to the same tap and audit surfaces over MCP that the hub exposes, so the operator sees everything live.
How a call flows
- Your app calls AppUseHost.StartAsync(opts) at startup. The SDK stands up an AppUseEndpoint on a random loopback port, mints an admin bearer, and writes <AIAPPUSE_HOME>/instances/<id>.json with the URL, token, pid, and launch manifest.
- The hub binary (running locally or remotely) watches the same directory and discovers the instance.
- An agent connects to the hub via MCP, calls apps.list, sees your app, then calls app.call <id> app.describe to retrieve your AppSpec.
- From there the agent drives the surface: read_screen, set_value, invoke, and so on.
- The router gates every call against the bearer's scope, the in-process consent prompt (for Write/Execute), the per-tenant ACL, and the per-tool ACL. Successful calls write to the tap stream and the audit log.
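The gating in the last step can be sketched as a chain of checks. The Python below is a hedged illustration only: the `Router` class, the `Access` enum, the mapping of tools to access levels, the order in which the gates run, and the log entry format are all assumptions. The text above fixes only which gates exist, that consent applies to Write/Execute, and that successful calls reach the tap stream and audit log.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Access(Enum):
    READ = "read"
    WRITE = "write"
    EXECUTE = "execute"


# Assumed mapping from tool name to the access level it needs.
TOOL_ACCESS = {
    "read_screen": Access.READ,
    "get_value": Access.READ,
    "set_value": Access.WRITE,
    "invoke": Access.EXECUTE,
}


@dataclass
class Router:
    tenant_acl: dict[str, set[str]]      # tenant -> app ids it may reach
    tool_acl: dict[str, set[str]]        # tenant -> tools it may call
    consent: Callable[[str, str], bool]  # (app_id, tool) -> user approved?
    tap: list[str] = field(default_factory=list)
    audit: list[str] = field(default_factory=list)

    def gate(self, tenant: str, scope: set[Access],
             app_id: str, tool: str) -> bool:
        needed = TOOL_ACCESS[tool]
        # Short-circuit: a later gate is consulted only if earlier ones pass,
        # so the user is never prompted for a call the scope already forbids.
        allowed = bool(
            needed in scope
            and (needed is Access.READ or self.consent(app_id, tool))
            and app_id in self.tenant_acl.get(tenant, set())
            and tool in self.tool_acl.get(tenant, set())
        )
        if allowed:
            # Only successful calls are recorded, as the text describes.
            entry = f"{tenant} {app_id} {tool}"
            self.tap.append(entry)
            self.audit.append(entry)
        return allowed
```

Every gate must pass for a call to go through; failing any one of them is enough to deny it, and read-only calls skip the consent prompt entirely.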
A note on federation
The picture above is single-hub. When hubs federate into a cluster, each hub additionally runs a separate peer listener on 8767 for hub-to-hub traffic and an admin listener on 8768 for read-only cluster admin state — both distinct from the app-registration listener on 8766. Federation traffic uses mTLS pinned to a private cluster CA, so the OS trust store is irrelevant. See Federation for the full picture.
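As a rough illustration of the three listeners and the pinned trust model, the Python sketch below records the ports named above and builds a client-side TLS context that trusts one CA file only. The `HubListeners` class, the `peer_client_context` function, and the file-path parameters are hypothetical; the hub's real configuration format is not described here.

```python
import ssl
from dataclasses import dataclass


@dataclass(frozen=True)
class HubListeners:
    registration_port: int = 8766  # app-registration listener
    peer_port: int = 8767          # hub-to-hub traffic
    admin_port: int = 8768         # read-only cluster admin state

    def validate(self) -> None:
        ports = [self.registration_port, self.peer_port, self.admin_port]
        if len(set(ports)) != len(ports):
            raise ValueError("each listener needs its own port")


def peer_client_context(cluster_ca: str, cert: str, key: str) -> ssl.SSLContext:
    """Build a context for dialling a peer hub (paths are caller-supplied)."""
    # A directly constructed SSLContext does not load the system CA bundle.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Trust only the private cluster CA.
    ctx.load_verify_locations(cafile=cluster_ca)
    # Present our own certificate so the peer can authenticate us too.
    ctx.load_cert_chain(certfile=cert, keyfile=key)
    return ctx
```

Because the context is built from scratch and loads a single CA file, certificates issued by public CAs in the OS trust store would not validate against it.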
Where to next
- The AppSpec model — the declarative shape the surface publishes.
- Glossary — every term in one place.
- Security & consent — how the router gates each call.
- Federation — one agent driving apps across many devices.