Droidrun Review 2026: Features, Pricing, Pros & Cons

Droidrun launched in 2025 as an open-source framework that lets AI agents control real phones using natural language. No test scripts. No element locators. Just tell the AI what you want done, and it figures out the taps, swipes, and inputs.

After raising €2.1M in pre-seed funding and hitting 91.4% on Google’s AndroidWorld benchmark, Droidrun has quickly become one of the most-watched projects in mobile AI automation. But does it live up to the hype?

This review covers everything you need to know: what it does, how it works, what it costs, and whether it’s ready for your team.

What Is Droidrun?

Droidrun is an open-source Python framework that enables large language models (LLMs) to control Android and iOS devices. The company behind it was founded in Osnabrück, Germany, and the framework is built on a simple idea: instead of writing brittle test scripts, describe what you want in natural language and let an AI agent execute it.

The company offers two products:

Droidrun SDK — the open-source framework you run locally (free)
MobileRun — a cloud platform with hosted virtual devices (waitlist, paid)

Key Features

Natural Language Control

The core value proposition. Instead of writing Appium scripts or Espresso tests, you write commands like:

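from droidrun import DroidAgent
# Simplified for illustration: check the SDK docs for the exact constructor arguments, and run this inside an async function
agent = DroidAgent(model="gemini-2.5-pro")
await agent.run("Open Instagram, go to my profile, and count my followers")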

The LLM reads the screen’s accessibility tree, decides what to do, and executes. No element IDs, no XPath, no CSS selectors.
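
To make that read, decide, execute cycle concrete, here is a minimal sketch of the loop in plain Python. It is not Droidrun's implementation: `Action`, `run_agent`, `FakePhone`, and `scripted_decide` are hypothetical names invented for this illustration, and a hard-coded function stands in for the LLM so the example runs without a phone or an API key.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str         # "tap" or "done" in this toy example
    target: str = ""  # label of the element to act on


def run_agent(goal: str,
              observe: Callable[[], List[str]],
              decide: Callable[[str, List[str], List[Action]], Action],
              execute: Callable[[Action], None],
              max_steps: int = 10) -> List[Action]:
    """Loop: read the screen, ask the model for one action, perform it."""
    history: List[Action] = []
    for _ in range(max_steps):
        screen = observe()
        action = decide(goal, screen, history)
        if action.kind == "done":
            break
        execute(action)
        history.append(action)
    return history


class FakePhone:
    """Hypothetical stand-in for a device: the screen is a list of labels."""

    def __init__(self):
        self.screen = ["Settings", "Camera"]

    def observe(self) -> List[str]:
        return list(self.screen)

    def execute(self, action: Action) -> None:
        if action.kind == "tap" and action.target == "Settings":
            self.screen = ["Display", "Sound"]
        elif action.kind == "tap" and action.target == "Display":
            self.screen = ["Dark Mode: on"]


def scripted_decide(goal, screen, history) -> Action:
    """Stand-in for the LLM call: tap the first known label that is visible."""
    for label in ("Settings", "Display"):
        if label in screen:
            return Action("tap", target=label)
    return Action("done")


phone = FakePhone()
steps = run_agent("Turn on Dark Mode", phone.observe, scripted_decide, phone.execute)
print([step.target for step in steps])  # ['Settings', 'Display']
```

In a real agent, `decide` would be the LLM call and `observe` would return the accessibility tree. The `max_steps` cap is the part worth copying: it keeps a confused model from looping forever.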

Accessibility Tree Architecture

This is what sets Droidrun apart from other AI agents. Most competitors (AppAgent, Mobile-Agent) use screenshots and computer vision to understand the screen. Droidrun reads the accessibility tree directly — the structured data layer that screen readers use.

The result: faster processing, more accurate element identification, and less dependence on the quality of a vision model. This architectural choice likely explains much of the gap between Droidrun’s benchmark scores and those of purely vision-based agents.
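
To show why structured data is easier for a model to work with than pixels, the sketch below parses a small UI hierarchy in the XML layout that Android's `uiautomator dump` command produces. That format is used here only as a familiar stand-in: the data Droidrun's companion app actually exposes may look different, and `SAMPLE_DUMP` and `clickable_elements` are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written sample in the style of a uiautomator hierarchy dump.
SAMPLE_DUMP = """
<hierarchy rotation="0">
  <node class="android.widget.FrameLayout" text="" content-desc="" clickable="false" bounds="[0,0][1080,1920]">
    <node class="android.widget.TextView" text="Display" content-desc="" clickable="true" bounds="[0,200][1080,320]" />
    <node class="android.widget.ImageButton" text="" content-desc="Search" clickable="true" bounds="[960,40][1060,140]" />
  </node>
</hierarchy>
"""

# Bounds are encoded as "[left,top][right,bottom]".
BOUNDS = re.compile(r"\[(\d+),(\d+)\]\[(\d+),(\d+)\]")


def clickable_elements(xml_text):
    """Return a label and a tap point for every clickable node."""
    elements = []
    for node in ET.fromstring(xml_text.strip()).iter("node"):
        if node.get("clickable") != "true":
            continue
        # Prefer visible text, then the accessibility description, then the class name.
        label = node.get("text") or node.get("content-desc") or node.get("class")
        left, top, right, bottom = map(int, BOUNDS.match(node.get("bounds")).groups())
        center = ((left + right) // 2, (top + bottom) // 2)
        elements.append({"label": label, "center": center})
    return elements


for element in clickable_elements(SAMPLE_DUMP):
    print(element)
```

Each element arrives with a name and exact coordinates, so the model can pick "Display" by label without having to locate it in an image first.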

LLM Agnostic

Droidrun works with any major LLM provider:

Provider	Model	Best For
Google	Gemini 2.5 Pro	Best benchmark performance
OpenAI	GPT-4o, GPT-4.1	Good balance of speed and accuracy
Anthropic	Claude 3.5/4	Strong reasoning on complex tasks
DeepSeek	DeepSeek V3	Cost-effective alternative
Ollama	Llama, Mistral, etc.	Privacy — runs locally, no data leaves your machine

Multi-Step Reasoning

Droidrun doesn’t just execute one action at a time. It plans multi-step workflows, tracks state across screens, and recovers from errors. If a button isn’t where it expected, the agent reasons about what changed and adapts.
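
A rough sketch of that recover-and-continue behavior follows, under heavy simplification. A real agent would ask the LLM to replan; here a fixed recovery action (scrolling) stands in for that reasoning, and `run_plan` and `FakeSettings` are hypothetical names that do not come from the Droidrun SDK.

```python
from typing import Callable, List


def run_plan(steps: List[str],
             observe: Callable[[], List[str]],
             tap: Callable[[str], None],
             recover: Callable[[], None],
             max_recoveries: int = 2) -> List[str]:
    """Tap each planned label in order; if one is missing, recover and look again."""
    log: List[str] = []
    for label in steps:
        recoveries = 0
        while label not in observe():
            if recoveries == max_recoveries:
                raise RuntimeError(f"could not find {label!r}")
            recover()
            recoveries += 1
            log.append(f"recovered before {label}")
        tap(label)
        log.append(f"tapped {label}")
    return log


class FakeSettings:
    """Toy screen where 'Display' starts below the fold."""

    def __init__(self):
        self.visible = ["Network", "Sound"]
        self.hidden = ["Display"]

    def observe(self) -> List[str]:
        return list(self.visible)

    def scroll(self) -> None:
        self.visible, self.hidden = self.visible + self.hidden, []

    def tap(self, label: str) -> None:
        if label == "Display":
            self.visible = ["Dark Mode"]


ui = FakeSettings()
log = run_plan(["Display", "Dark Mode"], ui.observe, ui.tap, ui.scroll)
print(log)  # ['recovered before Display', 'tapped Display', 'tapped Dark Mode']
```

The bounded `max_recoveries` matters in practice: every recovery attempt with a real LLM costs time and tokens, so an agent needs a point at which it gives up and reports failure.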

Cross-App Automation

Unlike traditional test frameworks that test one app in isolation, Droidrun can chain actions across multiple apps. Book a hotel in one app, copy the confirmation, paste it into a messaging app — all in one workflow.

Cloud Platform (MobileRun)

MobileRun wraps the SDK into a hosted cloud service:

Virtual Android devices spun up on demand
No USB cables, no local setup
50+ pre-installed apps (Instagram, Amazon, TikTok, DoorDash, etc.)
Credential management for secure account storage
Parallel execution across multiple devices
Auto-replay to record and re-run workflows

MobileRun is currently in waitlist mode.

Pricing

Droidrun’s pricing has two components:

SDK (free): The core framework is open-source under the MIT license. You pay nothing for the code itself.

LLM API costs (variable): Every automation run sends data to your LLM provider. Based on benchmark data and community reports:

Task Complexity	Estimated Cost	Example
Simple (1-3 steps)	$0.01-0.03	Open app, tap button, read text
Medium (4-8 steps)	$0.03-0.08	Fill form, navigate menus, verify results
Complex (9+ steps)	$0.08-0.20	Multi-app workflow with decision points

Using cheaper models (DeepSeek, Ollama) can reduce costs significantly, but may lower success rates.
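
To see what these per-task figures add up to at volume, here is a small back-of-the-envelope calculator. The cost ranges are copied from the table above. The daily task mix in the example is an assumption chosen purely for illustration, and `monthly_cost` is a hypothetical helper.

```python
# Per-task cost ranges in USD, taken from the pricing table in this review.
COST_RANGES = {
    "simple": (0.01, 0.03),
    "medium": (0.03, 0.08),
    "complex": (0.08, 0.20),
}


def monthly_cost(tasks_per_day, days=30):
    """Return (low, high) monthly spend for a dict of {complexity: tasks per day}."""
    low = sum(COST_RANGES[kind][0] * count for kind, count in tasks_per_day.items())
    high = sum(COST_RANGES[kind][1] * count for kind, count in tasks_per_day.items())
    return round(low * days, 2), round(high * days, 2)


# Assumed workload: 1,000 tasks a day, mostly simple and medium.
low, high = monthly_cost({"simple": 500, "medium": 400, "complex": 100})
print(f"${low:,.2f} to ${high:,.2f} per month")  # $750.00 to $2,010.00 per month
```

At that assumed volume the LLM bill alone lands between roughly $750 and $2,010 a month, which is why high-volume regression suites are a weaker fit than occasional workflows.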

MobileRun Cloud: Pricing not yet public. Expected to include device time and API costs.

Cost comparison with alternatives:

Appium: $0 (open-source, self-hosted)
BrowserStack: $29-199/month for device cloud
MobileRun: TBD

Benchmark Performance

Droidrun’s strongest selling point is its AndroidWorld benchmark results. AndroidWorld is a benchmark created by Google Research featuring 116 diverse tasks across 20 real-world Android apps.

Droidrun’s scores over time:

Date	Score	Model Used
June 2025	63%	Gemini + Droidrun framework
Late 2025	91.4%	Gemini 2.5 Pro + enhanced reasoning

Compared to other mobile AI agents:

Agent	Success Rate	Approach
Droidrun	91.4%	Accessibility tree + LLM reasoning
Mobile-Use	Claims 100%*	Accessibility + vision hybrid
Mobile-Agent	29%	Visual UI perception
AutoDroid	14%	Minimal reasoning
AppAgent	7%	Vision-based screenshots

*Mobile-Use’s 100% claim is based on a different evaluation methodology, so treat direct comparisons with caution.

Pros

Dramatically lower learning curve — describe tasks in English instead of writing code
Resilient to UI changes — the LLM adapts; no brittle locators to maintain
Open-source SDK — inspect, modify, contribute
LLM agnostic — not locked into any AI provider
Cross-app workflows — automate across multiple applications
Active development — the team ships updates frequently and engages with the community
Strong benchmark performance: 91.4% on AndroidWorld is among the highest reported scores in the category

Cons

Non-deterministic — the same command may take different paths each run
Per-task LLM costs — no free lunches once you scale
Slower than scripted automation — LLM reasoning adds 30-90 seconds per task
Young project — founded 2025, still evolving rapidly
Cloud platform not yet available — MobileRun is waitlist-only
91.4% ≠ 100%: roughly 1 in 12 benchmark tasks still fails, so plan for retries
Debugging is harder — when a task fails, understanding why an LLM made a wrong decision is less straightforward than debugging a script
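
How much do retries help with that 91.4% figure? A quick calculation gives an upper bound, on the optimistic assumption that attempts are independent and that the benchmark rate carries over to your own tasks. Neither is guaranteed: a task that fails for a systematic reason will tend to fail again.

```python
def success_with_retries(p, attempts):
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1 - (1 - p) ** attempts


for attempts in (1, 2, 3):
    print(attempts, f"{success_with_retries(0.914, attempts):.4f}")
# 1 0.9140
# 2 0.9926
# 3 0.9994
```

Under those assumptions a single retry lifts the expected pass rate above 99%, at the price of doubling the cost and latency of every task that fails the first time.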

Who Is Droidrun For?

Great fit:

QA teams tired of maintaining flaky Appium/Espresso scripts
Startups that need mobile automation without dedicated SDET resources
Workflow automation — automating repetitive phone tasks beyond just testing
Teams doing exploratory testing who want AI to find edge cases
Developers who want to prototype mobile automations quickly

Not the best fit:

Teams needing 100% deterministic test execution
Large organizations with mature Appium infrastructure that works well
Budget-sensitive teams running thousands of tests daily (LLM costs)
Environments that can’t use external LLM APIs (for example, regulated industries where running a local model through Ollama is not an option)

Getting Started

# Install the SDK
pip install droidrun

# Connect your Android device (USB debugging enabled)
adb devices

# Set your LLM API key
export GOOGLE_API_KEY="your-gemini-key"

# Run your first automation
droidrun run "Open Settings and turn on Dark Mode"

The quickstart guide walks through setup in detail. Most developers report getting their first automation running in under 10 minutes.
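
If you want to call the same command from a script or a CI job, a thin wrapper with a retry and a timeout is a sensible starting point. The sketch below assumes the `droidrun run` command shown above exits with a non-zero code when a task fails. That behavior is an assumption, not something confirmed here, so verify it against your installed version. `run_task` and `fake_runner` are hypothetical helpers, and the demo uses the fake runner so it executes without a device.

```python
import subprocess


def run_task(task, attempts=2, timeout_s=300, runner=subprocess.run):
    """Run `droidrun run <task>` until it exits with code 0 or attempts run out."""
    for _ in range(attempts):
        try:
            result = runner(["droidrun", "run", task], timeout=timeout_s)
        except subprocess.TimeoutExpired:
            continue  # count a hung run as a failed attempt
        if result.returncode == 0:
            return True
    return False


# Demo with a fake runner: it fails on the first call and succeeds on the second.
calls = []


def fake_runner(command, timeout):
    calls.append(command)
    return subprocess.CompletedProcess(command, 0 if len(calls) == 2 else 1)


passed = run_task("Open Settings and turn on Dark Mode", runner=fake_runner)
print(passed, len(calls))  # True 2
```

Dropping the `runner` argument makes the function call the real CLI through `subprocess.run`. Because agent runs are non-deterministic, it is safer to report these results separately than to let them block a release.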

Verdict

Droidrun is the most impressive mobile AI agent framework available today. The accessibility-tree architecture is technically sound, the benchmark results are strong, and the open-source approach is the right call for building community trust.

It’s not a drop-in replacement for Appium or Espresso — the non-deterministic nature and per-task costs mean it serves different use cases. But for teams drowning in test maintenance, exploring AI-powered QA, or automating mobile workflows beyond traditional testing, Droidrun is the tool to try first.

The €2.1M pre-seed and growing community suggest this is just the beginning. Keep an eye on MobileRun’s cloud platform launch — if the pricing is competitive, it could become the default way teams interact with mobile devices programmatically.

Rating: 4.2/5 — Technically excellent, limited only by the inherent constraints of LLM-based automation and the ecosystem’s early maturity.

Frequently Asked Questions

Is Droidrun free to use?

The SDK is free and open-source. Running automations requires LLM API keys with their own costs, typically $0.02-0.08 per task and up to about $0.20 for complex multi-app workflows. The cloud platform (MobileRun) is a separate paid product and is currently waitlist-only.

What devices does Droidrun support?

Android and iOS. On Android, it uses a companion APK that accesses the Accessibility API. MobileRun’s cloud provides hosted virtual devices, so you don’t need physical hardware.

How does Droidrun compare to Appium?

Appium uses scripted test automation with coded element locators. Droidrun uses AI agents with natural language commands. Appium is faster and deterministic but fragile to UI changes. Droidrun is more resilient and easier to use but slower and non-deterministic. Read our full Droidrun vs Appium comparison.

What LLMs work with Droidrun?

OpenAI (GPT-4o, GPT-4.1), Anthropic (Claude), Google (Gemini 2.5 Pro), DeepSeek, and local models via Ollama. Best benchmark results used Gemini 2.5 Pro.

Is Droidrun reliable enough for CI/CD?

At 91.4% on AndroidWorld, it’s strong for exploratory testing and workflow automation. For deterministic regression testing in CI/CD, most teams pair it with traditional frameworks for critical test paths.