There’s a type of mobile development work that doesn’t feel like engineering. A new feature ships. QA wants E2E coverage. You open the maestro/ folder, stare at an existing test for reference, copy the structure, swap out element IDs, run it, watch it fail because a timeout is too short, bump the timeout, run it again. Repeat until it passes. Submit the PR.
It’s not hard. It’s just slow and mechanical — exactly the kind of work AI should be handling.
The problem: every time I tried to get Claude to generate a Maestro test, it produced something that looked right but was wrong in subtle ways. Wrong selectors. Wrong timeout values. Missing post-login interruption handling.
So I stopped asking Claude to write Maestro tests directly. I built a skill that teaches it how to do it right — and before you ask: yes, Maestro MCP exists, and no, it wasn’t enough. More on that later.
The Setup
The skill — /create-maestro-test — works in two modes. You can describe what you want in plain English:
/create-maestro-test "Test navigating to inbox and opening the first conversation"
Or you can invoke it with no arguments and it walks you through a questionnaire: feature, action, test type (main flow vs. reusable subflow), user type, clean state, recording, build flavor.
Once it has what it needs, it doesn’t just write YAML from scratch. It reads your existing tests first — 2 or 3 similar ones — learns your element IDs, your timeout patterns, your label conventions, and builds from those. The output ends up looking like it belongs in your codebase because it actually learned from your codebase.
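To make that concrete, here is the rough shape of a flow the skill might produce for a message-sending test. Everything in it is illustrative; the app ID, element IDs, and subflow path are hypothetical stand-ins for whatever it finds in your project:

```yaml
appId: com.example.app        # hypothetical
---
- launchApp:
    clearState: true
# subflow reuse learned from neighboring tests
- runFlow: subflows/handle_post_login_interruptions.yaml
- tapOn:
    id: "conversation_input_field"   # XML view, snake_case
- inputText: "Hello from Maestro"
- tapOn:
    id: "SendMessageButton"          # Compose component, PascalCase
- extendedWaitUntil:
    visible:
      id: "message_sent_indicator"
    timeout: 20000                   # timeout pattern copied from existing tests
```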
Building a Test From Scratch, No Prior Knowledge Needed
One of the most powerful aspects of the skill is that you don’t need to know anything about the existing test infrastructure to get started. The questionnaire handles all of it.
You answer: what feature, what action, which user type, clean state or not. The skill then does the heavy lifting — it loads the shared user credentials, reads the timeout constants, searches the codebase for similar existing tests, extracts the relevant element IDs from them, and assembles a complete test that already follows your project’s conventions.
For a brand new screen with no prior tests to reference, it still works. It falls back to the Compose and XML naming conventions documented in its guides, marks any uncertain IDs with # TODO: Verify element ID, and gives you a scaffold that's already 80% right. The remaining 20% is confirming that the IDs actually exist in the app — something that takes minutes with Layout Inspector.
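A sketch of that fallback output for a hypothetical notification-settings screen; every ID below is a naming-convention guess, which is exactly what the TODO markers flag:

```yaml
appId: com.example.app        # hypothetical
---
- launchApp:
    clearState: true
- tapOn:
    id: "SettingsButton"      # TODO: Verify element ID (Compose, PascalCase guess)
- extendedWaitUntil:
    visible:
      id: "notification_settings_list"  # TODO: Verify element ID (XML, snake_case guess)
    timeout: 20000
- tapOn:
    id: "EnableNotificationsToggle"     # TODO: Verify element ID
```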
The questionnaire isn’t just a UX improvement. It’s what makes the skill usable by anyone on the team, regardless of how much they know about Maestro or the existing test suite.
The Thing That Surprised Me Most
Before building this skill, I assumed the hard part would be getting Claude to write valid YAML. It wasn’t.
The hard part was teaching it how to find the right UI components to test. Specifically, how to distinguish between XML elements and Compose components, and how to know which selector to use for each.
Maestro uses id: as the universal selector for both XML resource IDs and Compose test tags — but there's a critical distinction in how you tag Compose components. If a developer uses Modifier.testTag("ProfileCard"), Maestro will not find it. The element simply doesn't appear. We kept getting "Element not found" errors and couldn't figure out why — the Layout Inspector clearly showed the component.
The fix: developers need to use Modifier.testTagAsId("ProfileCard") instead. Once tagged with testTagAsId, it's reachable in Maestro via id: 'ProfileCard' — the same selector you'd use for an XML view. The naming convention is the only visual cue that tells you which you're dealing with:
- XML → snake_case (e.g., inbox_conversation_container)
- Compose → PascalCase (e.g., ProfileCard, SendMessageButton)
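If you're wondering where testTagAsId comes from: it isn't part of the Compose API. It's the kind of small project-level helper that wraps Compose's experimental testTagsAsResourceId semantics flag, which is what actually exposes a test tag to Maestro as a resource ID. A plausible sketch of such a helper:

```kotlin
import androidx.compose.ui.ExperimentalComposeUiApi
import androidx.compose.ui.Modifier
import androidx.compose.ui.platform.testTag
import androidx.compose.ui.platform.testTagsAsResourceId
import androidx.compose.ui.semantics.semantics

// Sketch of a plausible project helper, not a Compose built-in:
// exposes the test tag as a resource ID so Maestro's `id:` selector
// can match the node.
@OptIn(ExperimentalComposeUiApi::class)
fun Modifier.testTagAsId(tag: String): Modifier =
    semantics { testTagsAsResourceId = true }.testTag(tag)

// Usage: Modifier.testTagAsId("ProfileCard") -> reachable via id: 'ProfileCard'
```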
Once that distinction was clear and documented in the skill’s guides, the component-finding problem was effectively solved. The skill now searches the codebase for testTagAsId usages when targeting Compose screens, and falls back to the Layout Inspector instructions when nothing is found. That's the kind of thing that takes an hour to debug the first time and two seconds to fix once you know — and now nobody on the team has to rediscover it.
The Auto-Fix Loop
When a generated test fails, the skill doesn’t just report the error — it categorizes it and applies a targeted fix before retrying. There are three failure types it handles automatically, each sketched in the YAML example after this list:
Timeout / Element Not Found — The element exists but Maestro can’t find it in time or with the current selector. The skill increases the timeout to 50s, tries an alternative selector (checking whether an XML id should be a Compose testTagAsId or vice versa), and adds an extra wait before the failing step.
Element Not Tappable — The element is visible but can’t be interacted with, usually due to an overlay or an animation still in progress. The skill adds an extendedWaitUntil before the tap, tries tapping by visible text as a fallback, and checks for anything blocking the element.
Selector Ambiguity — Multiple elements match the selector. The skill adds index: 0 to target the first match and narrows the selector where possible.
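In YAML terms, the fixes are small, mechanical edits. A sketch of each, with hypothetical element IDs standing in for real ones:

```yaml
# Fix 1: Timeout / Element Not Found: bump the timeout, retry the selector
- extendedWaitUntil:
    visible:
      id: "ConversationList"  # retried as Compose PascalCase after snake_case failed
    timeout: 50000            # raised to 50s

# Fix 2: Element Not Tappable: wait out the overlay, fall back to visible text
- extendedWaitUntil:
    visible:
      id: "SendMessageButton"
    timeout: 10000
- tapOn:
    text: "Send"              # fallback when the id is covered or still animating

# Fix 3: Selector Ambiguity: pin the match
- tapOn:
    id: "inbox_conversation_container"
    index: 0                  # target the first of several matches
```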
It retries up to three times, each attempt applying a different fix strategy. After each retry it tells you what changed and why. If all three attempts fail, it hands you the full Maestro output with a clear explanation of the failure category — so you know exactly where to look, not just that something broke.
Most first-run failures fall into one of those three buckets. The auto-fix resolves the majority without any manual intervention.
In Practice
Here’s what the inbox test from this post’s intro looks like now:
/create-maestro-test "Navigate to inbox and open a conversation"
Total time from prompt to passing test: under 5 minutes, most of which is the emulator booting.
Why Not Maestro MCP?
The skill was built before Maestro MCP was available. When MCP did arrive, the reason to keep the skill became clear immediately.
MCP gives Claude knowledge of Maestro. It doesn’t give Claude knowledge of your app. It doesn’t know which element IDs your project uses, how your timeout constants are defined, or what your post-login interruption flow expects. Without that codebase-specific context, the output looks plausible but fails at runtime in exactly the ways described above — wrong selectors, mismatched timeouts, missing interruption handling.
The skill bridges that gap. It reads your codebase before generating anything, and that’s the difference between output that looks right and output that actually runs.
The Takeaway
Building this skill forced us to understand Maestro more deeply than we ever would have by just writing tests manually. The testTagAsId vs testTag distinction, the auto-fix categories, the selector priority rules — none of these would have ended up documented if we hadn't had to teach them to an AI precisely enough to generate correct output.
That’s the underrated benefit of this kind of work. You don’t just get automation. You get clarity about what you actually know — and a permanent, shareable record of it.
The questionnaire-driven approach also changed how we think about test authorship. It’s no longer a task that requires deep knowledge of the test infrastructure. Anyone who knows what they want to test can produce a valid, passing Maestro test in minutes. That’s the real unlock — not the YAML generation, but the democratization of E2E test coverage across the team.