Droidrun launched in 2025 as an open-source framework that lets AI agents control real phones using natural language. No test scripts. No element locators. Just tell the AI what you want done, and it figures out the taps, swipes, and inputs.

After raising €2.1M in pre-seed funding and hitting 91.4% on Google’s AndroidWorld benchmark, Droidrun has quickly become the most-watched project in mobile AI automation. But does it live up to the hype?

This review covers everything you need to know: what it does, how it works, what it costs, and whether it’s ready for your team.

What Is Droidrun?

Droidrun is an open-source Python framework that enables large language models (LLMs) to control Android and iOS devices. The company behind it was founded in Osnabrück, Germany, and the framework is built on a simple idea: instead of writing brittle test scripts, describe what you want in natural language and let an AI agent execute it.

The company offers two products:

  • Droidrun SDK — the open-source framework you run locally (free)
  • MobileRun — a cloud platform with hosted virtual devices (waitlist, paid)

Key Features

Natural Language Control

The core value proposition. Instead of writing Appium scripts or Espresso tests, you write commands like:

from droidrun import DroidAgent
# Simplified example; `await` must run inside an async function
agent = DroidAgent(model="gemini-2.5-pro")
await agent.run("Open Instagram, go to my profile, and count my followers")

The LLM reads the screen’s accessibility tree, decides what to do, and executes. No element IDs, no XPath, no CSS selectors.

Accessibility Tree Architecture

This is what sets Droidrun apart from other AI agents. Most competitors (AppAgent, Mobile-Agent) use screenshots and computer vision to understand the screen. Droidrun reads the accessibility tree directly — the structured data layer that screen readers use.

The result: faster processing, more accurate element identification, and no dependence on visual model quality. This architectural choice is why Droidrun’s benchmark scores are dramatically higher than vision-based agents.
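
To make the idea concrete, here is a minimal sketch of how an accessibility tree could be flattened into indexed text lines that an LLM can reason over. The `UiNode` class and `flatten` function are hypothetical names invented for this illustration; they are not Droidrun's actual data model, and a real tree carries many more attributes (bounds, resource IDs, state flags).

```python
from dataclasses import dataclass, field


@dataclass
class UiNode:
    """Hypothetical, simplified accessibility node (not Droidrun's real model)."""
    class_name: str
    text: str = ""
    clickable: bool = False
    children: list["UiNode"] = field(default_factory=list)


def flatten(root: UiNode) -> list[str]:
    """Walk the tree depth-first and emit one indexed line per labelled or clickable node."""
    lines: list[str] = []

    def visit(node: UiNode) -> None:
        if node.clickable or node.text:
            flag = " [clickable]" if node.clickable else ""
            lines.append(f"{len(lines)}: {node.class_name} '{node.text}'{flag}")
        for child in node.children:
            visit(child)

    visit(root)
    return lines


# A toy two-element screen wrapped in a layout container.
screen = UiNode("FrameLayout", children=[
    UiNode("TextView", text="Settings"),
    UiNode("Button", text="Dark Mode", clickable=True),
])
print("\n".join(flatten(screen)))
```

The point of the index is that the model can answer with something like "tap element 1" instead of guessing pixel coordinates from a screenshot.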

LLM Agnostic

Droidrun works with any major LLM provider:

| Provider | Model | Best For |
| --- | --- | --- |
| Google | Gemini 2.5 Pro | Best benchmark performance |
| OpenAI | GPT-4o, GPT-4.1 | Good balance of speed and accuracy |
| Anthropic | Claude 3.5/4 | Strong reasoning on complex tasks |
| DeepSeek | DeepSeek V3 | Cost-effective alternative |
| Ollama | Llama, Mistral, etc. | Privacy: runs locally, no data leaves your machine |

Multi-Step Reasoning

Droidrun doesn’t just execute one action at a time. It plans multi-step workflows, tracks state across screens, and recovers from errors. If a button isn’t where it expected, the agent reasons about what changed and adapts.
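
The general pattern behind this kind of agent is an observe-decide-act loop. The sketch below shows that pattern with toy stand-ins for the device and the LLM. Every name here (`run_agent`, `observe`, `decide`, `act`) is hypothetical and chosen for illustration; it is not how Droidrun is implemented internally.

```python
from typing import Callable

# Hypothetical types: a screen is a list of visible labels,
# an action is a (verb, target) pair.
Screen = list[str]
Action = tuple[str, str]


def run_agent(
    observe: Callable[[], Screen],
    decide: Callable[[str, Screen, list[Action]], Action],
    act: Callable[[Action], None],
    goal: str,
    max_steps: int = 10,
) -> list[Action]:
    """Observe-decide-act loop; stops when the policy returns a 'done' action."""
    history: list[Action] = []
    for _ in range(max_steps):
        screen = observe()                       # re-read the screen every step
        action = decide(goal, screen, history)   # in a real agent, this is the LLM call
        if action[0] == "done":
            break
        act(action)
        history.append(action)
    return history


# Toy device state and rule-based policy standing in for a phone and an LLM.
state = {"screen": ["Settings"]}


def observe() -> Screen:
    return list(state["screen"])


def decide(goal: str, screen: Screen, history: list[Action]) -> Action:
    if "Dark Mode: on" in screen:
        return ("done", "")
    if "Dark Mode: off" in screen:
        return ("tap", "Dark Mode: off")
    return ("tap", "Settings")  # navigate toward the toggle


def act(action: Action) -> None:
    if action == ("tap", "Settings"):
        state["screen"] = ["Dark Mode: off"]
    elif action == ("tap", "Dark Mode: off"):
        state["screen"] = ["Dark Mode: on"]


print(run_agent(observe, decide, act, "turn on Dark Mode"))
```

Because the loop re-observes the screen before every decision, a moved or missing button shows up in the next observation, which is what gives the agent a chance to adapt. The `max_steps` cap keeps a confused policy from looping forever.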

Cross-App Automation

Unlike traditional test frameworks that test one app in isolation, Droidrun can chain actions across multiple apps. Book a hotel in one app, copy the confirmation, paste it into a messaging app — all in one workflow.

Cloud Platform (MobileRun)

MobileRun wraps the SDK into a hosted cloud service:

  • Virtual Android devices spun up on demand
  • No USB cables, no local setup
  • 50+ pre-installed apps (Instagram, Amazon, TikTok, DoorDash, etc.)
  • Credential management for secure account storage
  • Parallel execution across multiple devices
  • Auto-replay to record and re-run workflows

MobileRun is currently in waitlist mode.

Pricing

Droidrun’s pricing has two components:

SDK (free): The core framework is open-source under MIT license. You pay nothing for the code itself.

LLM API costs (variable): Every automation run sends data to your LLM provider. Based on benchmark data and community reports:

| Task Complexity | Estimated Cost | Example |
| --- | --- | --- |
| Simple (1-3 steps) | $0.01-0.03 | Open app, tap button, read text |
| Medium (4-8 steps) | $0.03-0.08 | Fill form, navigate menus, verify results |
| Complex (9+ steps) | $0.08-0.20 | Multi-app workflow with decision points |

Using cheaper models (DeepSeek, Ollama) can reduce costs significantly, but may lower success rates.
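
To see how per-task costs add up, here is a back-of-the-envelope calculation using the medium-complexity estimate of $0.03-0.08 per task. The `monthly_cost` helper and the figure of 1,000 runs per day are assumptions for illustration, and the result ignores retries, which would push the bill higher.

```python
def monthly_cost(
    runs_per_day: int, cost_low: float, cost_high: float, days: int = 30
) -> tuple[float, float]:
    """Return the (low, high) monthly LLM spend in USD for a per-task cost range."""
    total_runs = runs_per_day * days
    return (total_runs * cost_low, total_runs * cost_high)


# Assumed workload: 1,000 medium-complexity tasks per day.
low, high = monthly_cost(1000, 0.03, 0.08)
print(f"${low:,.0f} to ${high:,.0f} per month")  # prints "$900 to $2,400 per month"
```

At that volume, LLM spend alone is well above the BrowserStack plans listed below, which is why high-volume regression suites are a weaker fit.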

MobileRun Cloud: Pricing not yet public. Expected to include device time and API costs.

Cost comparison with alternatives:

  • Appium: $0 (open-source, self-hosted)
  • BrowserStack: $29-199/month for device cloud
  • MobileRun: TBD

Benchmark Performance

Droidrun’s strongest selling point is its AndroidWorld benchmark results. AndroidWorld is a benchmark created by Google Research featuring 116 diverse tasks across 20 real-world Android apps.

Droidrun’s scores over time:

| Date | Score | Model Used |
| --- | --- | --- |
| June 2025 | 63% | Gemini + Droidrun framework |
| Late 2025 | 91.4% | Gemini 2.5 Pro + enhanced reasoning |

Compared to other mobile AI agents:

| Agent | Success Rate | Approach |
| --- | --- | --- |
| Droidrun | 91.4% | Accessibility tree + LLM reasoning |
| Mobile-Use | Claims 100%* | Accessibility + vision hybrid |
| Mobile-Agent | 29% | Visual UI perception |
| AutoDroid | 14% | Minimal reasoning |
| AppAgent | 7% | Vision-based screenshots |

*Mobile-Use’s 100% claim uses a different evaluation methodology, so the two figures are not directly comparable.

Pros

  • Dramatically lower learning curve — describe tasks in English instead of writing code
  • Resilient to UI changes — the LLM adapts; no brittle locators to maintain
  • Open-source SDK — inspect, modify, contribute
  • LLM agnostic — not locked into any AI provider
  • Cross-app workflows — automate across multiple applications
  • Active development — the team ships updates frequently and engages with the community
  • Strong benchmark performance — 91.4% on AndroidWorld is leading the category

Cons

  • Non-deterministic — the same command may take different paths each run
  • Per-task LLM costs — no free lunches once you scale
  • Slower than scripted automation — LLM reasoning adds 30-90 seconds per task
  • Young project — founded 2025, still evolving rapidly
  • Cloud platform not yet available — MobileRun is waitlist-only
  • 91.4% ≠ 100% — roughly 1 in 10 complex tasks may need retry
  • Debugging is harder — when a task fails, understanding why an LLM made a wrong decision is less straightforward than debugging a script
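
How much do retries help with that failure rate? The sketch below treats the 91.4% benchmark score as a per-attempt success probability and assumes attempts are independent. Both are simplifying assumptions: a benchmark average is not a guarantee for your app, and a task that fails for a systematic reason will likely fail again. `success_within` is a hypothetical helper name.

```python
def success_within(p: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1 - (1 - p) ** attempts


for n in (1, 2, 3):
    print(n, f"{success_within(0.914, n):.4f}")
# 1 0.9140
# 2 0.9926
# 3 0.9994
```

Under those assumptions, a single retry cuts the failure rate from 8.6% to under 1%, at the price of extra LLM cost and latency on the failed attempts.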

Who Is Droidrun For?

Great fit:

  • QA teams tired of maintaining flaky Appium/Espresso scripts
  • Startups that need mobile automation without dedicated SDET resources
  • Workflow automation — automating repetitive phone tasks beyond just testing
  • Teams doing exploratory testing who want AI to find edge cases
  • Developers who want to prototype mobile automations quickly

Not the best fit:

  • Teams needing 100% deterministic test execution
  • Large organizations with mature Appium infrastructure that works well
  • Budget-sensitive teams running thousands of tests daily (LLM costs)
  • Environments that can’t send data to external LLM APIs, such as regulated industries, unless running a local model through Ollama is an option

Getting Started

# Install the SDK
pip install droidrun

# Connect your Android device (USB debugging enabled)
adb devices

# Set your LLM API key
export GOOGLE_API_KEY="your-gemini-key"

# Run your first automation
droidrun run "Open Settings and turn on Dark Mode"

The quickstart guide walks through setup in detail. Most developers report getting their first automation running in under 10 minutes.
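
If the first run fails, the most common culprit is a device that adb cannot see or has not authorized. As a small pre-flight check, the sketch below parses the output of `adb devices` and returns only the serials in the ready `device` state. The function names are hypothetical, and the sample serials are made up; the only external dependency is the `adb` binary itself.

```python
import subprocess


def parse_adb_devices(output: str) -> list[str]:
    """Return serials of devices in the ready 'device' state from `adb devices` output."""
    serials = []
    for line in output.splitlines():
        parts = line.split()
        # Device rows look like "<serial>\t<state>"; the header and daemon
        # messages have a different shape and are skipped by this check.
        if len(parts) == 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def connected_devices() -> list[str]:
    """Run `adb devices` and parse its output; requires adb on PATH."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return parse_adb_devices(result.stdout)


# Made-up sample output: one ready emulator, one phone awaiting authorization.
sample = "List of devices attached\nemulator-5554\tdevice\nR58M123ABC\tunauthorized\n\n"
print(parse_adb_devices(sample))  # prints ['emulator-5554']
```

A device listed as `unauthorized` usually means the USB debugging prompt on the phone has not been accepted yet.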

Verdict

Droidrun is the most impressive mobile AI agent framework available today. The accessibility-tree architecture is technically sound, the benchmark results are strong, and the open-source approach is the right call for building community trust.

It’s not a drop-in replacement for Appium or Espresso — the non-deterministic nature and per-task costs mean it serves different use cases. But for teams drowning in test maintenance, exploring AI-powered QA, or automating mobile workflows beyond traditional testing, Droidrun is the tool to try first.

The €2.1M pre-seed and growing community suggest this is just the beginning. Keep an eye on MobileRun’s cloud platform launch — if the pricing is competitive, it could become the default way teams interact with mobile devices programmatically.

Rating: 4.2/5 — Technically excellent, limited only by the inherent constraints of LLM-based automation and the ecosystem’s early maturity.

Frequently Asked Questions

Is Droidrun free to use?

The SDK is free and open-source. Running automations requires an LLM API key, and the provider bills you for usage: roughly $0.01-0.20 per task depending on complexity, with typical tasks around $0.02-0.08. The cloud platform (MobileRun) is a separate paid product, currently waitlist-only.

What devices does Droidrun support?

Android and iOS. On Android, it installs a companion APK that reads the screen through the Accessibility API. MobileRun’s cloud provides hosted virtual devices, so no physical hardware is needed.

How does Droidrun compare to Appium?

Appium uses scripted test automation. Droidrun uses AI agents with natural language. Appium is faster and deterministic but fragile. Droidrun is resilient and easier to use but slower. Read our full Droidrun vs Appium comparison.

What LLMs work with Droidrun?

OpenAI (GPT-4o, GPT-4.1), Anthropic (Claude), Google (Gemini 2.5 Pro), DeepSeek, and local models via Ollama. Best benchmark results used Gemini 2.5 Pro.

Is Droidrun reliable enough for CI/CD?

At 91.4% on AndroidWorld, it’s strong for exploratory testing and workflow automation. For deterministic regression testing in CI/CD, most teams pair it with traditional frameworks for critical test paths.