Every QA engineer has felt this pain. You spend two days writing a test suite — thorough, clean, covering every user flow. Then the designer rounds a button's corners and changes one class name, and half your tests turn red overnight.
That's the fundamental problem with how we've been testing web applications for the past decade. Tools like Selenium and most codegen-based frameworks produce scripts: rigid sequences of DOM queries frozen at a single moment in time. They don't understand the page. They match selectors.
A human QA engineer doesn't work that way. They look at the page, read it visually, decide what to click based on what they see, and verify the result with their own eyes. They don't care what the button's class name is — they click the blue button that says "Submit."
That insight is the entire premise behind Project Hawkeye: what if tests could see?
The Visual Agent Approach
At every step of a test, Hawkeye's agent does two things simultaneously. It takes a screenshot of the current browser state and captures the page's accessibility tree — a structured representation of every interactive element. Both go to a vision-capable language model.
The screenshot gives the agent spatial understanding: layout, color, visual hierarchy, error states. If there's a red toast notification in the corner, the agent sees it. If a modal has appeared, the agent sees that too. The accessibility tree gives the agent precision — exact references to elements it can click, type into, or select from.
This dual-input design is what separates Hawkeye from both traditional testing tools, which have no vision, and pure screenshot-based agents, which can see but lack precision for interaction.
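To make the dual-input idea concrete, here is a minimal sketch in Python. It is not Hawkeye's actual code: the `Observation` class, the `e1`-style element references, the list of interactive roles, and the shape of the model payload are all assumptions made up for illustration. It only shows how a screenshot and a flattened accessibility tree could be packaged into a single model input.

```python
import base64
from dataclasses import dataclass

# Hypothetical structures for illustration only; not Hawkeye's real API.
@dataclass
class Observation:
    screenshot_png: bytes  # raw PNG bytes of the current browser state
    ax_tree: dict          # nested dict: {"role", "name", "children"}

# Assumed set of roles the agent is allowed to act on.
INTERACTIVE_ROLES = {"button", "link", "textbox", "combobox", "checkbox"}

def interactive_elements(node, out=None):
    """Walk the accessibility tree depth-first and collect actionable nodes."""
    if out is None:
        out = []
    if node.get("role") in INTERACTIVE_ROLES:
        out.append({
            "ref": f"e{len(out) + 1}",  # short handle the model can cite
            "role": node["role"],
            "name": node.get("name", ""),
        })
    for child in node.get("children", []):
        interactive_elements(child, out)
    return out

def build_model_input(goal, obs):
    """Combine the goal, the element listing, and the screenshot."""
    elements = interactive_elements(obs.ax_tree)
    listing = "\n".join(
        f'[{e["ref"]}] {e["role"]} "{e["name"]}"' for e in elements
    )
    return {
        "text": f"Goal: {goal}\nInteractive elements:\n{listing}",
        "image_base64": base64.b64encode(obs.screenshot_png).decode("ascii"),
    }
```

The model can then answer with something like "click e2", which gives the precise target, while the image supplies the context that the tree alone misses, such as a red toast in the corner.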
Writing Tests That Adapt
Instead of recording a sequence of clicks, you describe a goal in plain language: log in to the app and add the first product to the shopping cart. The agent figures out how to accomplish that goal by looking at what's actually on the screen.
You can provide checkpoints, which are intermediate milestones the agent should hit in order. Or you can leave it completely open and let the agent plan its own route. Because the test refers to a goal rather than to selectors, it doesn't break when you rename a class or move a button, and it can survive a redesigned checkout flow as long as the goal itself is still reachable.
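As a rough sketch of what a goal-based test might look like as data, consider the Python below. The `GoalTest` class and its method names are hypothetical and only illustrate the idea of a plain-language goal with optional, ordered checkpoints; Hawkeye's real test format may look quite different.

```python
from dataclasses import dataclass, field

# Hypothetical test definition; names are illustrative, not Hawkeye's format.
@dataclass
class GoalTest:
    goal: str
    checkpoints: list = field(default_factory=list)  # empty means "plan freely"
    reached: int = 0                                 # checkpoints hit so far

    def next_checkpoint(self):
        """Return the next milestone to hit, or None if there are none left."""
        if self.reached < len(self.checkpoints):
            return self.checkpoints[self.reached]
        return None

    def mark_reached(self, description):
        """Record a milestone, enforcing that checkpoints are hit in order."""
        expected = self.next_checkpoint()
        if description != expected:
            raise ValueError(f"expected {expected!r}, got {description!r}")
        self.reached += 1

    def all_checkpoints_reached(self):
        return self.reached == len(self.checkpoints)

test = GoalTest(
    goal="Log in to the app and add the first product to the shopping cart",
    checkpoints=["User is logged in", "Cart shows one item"],
)
```

Nothing in the definition mentions a selector, a class name, or a URL path, which is why a cosmetic change to the page leaves the test itself untouched.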
What Happens When a Site Fights Back
Not every run succeeds, and the agent is built with that in mind. When we ran it against a site with aggressive bot detection, it was served a CAPTCHA and could go no further. Against a site that required login credentials it didn't have, it correctly identified the blocker within two steps and stopped.
The agent doesn't crash on hostile sites. It reasons about what it sees, recognizes when the goal is blocked, and returns a clear result with its observations and reasoning. That behavior — graceful failure with evidence — is what makes it useful in a real CI pipeline rather than just a demo.
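One way to picture "graceful failure with evidence" is as a structured result that a CI job can act on. The sketch below is an assumption, not Hawkeye's actual output format: the `Status` values, the `RunResult` fields, and the exit-code mapping are all invented here to show how a blocked run could be reported differently from a failed assertion.

```python
from dataclasses import dataclass, field
from enum import Enum

# Hypothetical result shape; field names and exit codes are assumptions.
class Status(Enum):
    PASSED = "passed"    # goal reached and visually confirmed
    FAILED = "failed"    # goal attempted but the outcome was wrong
    BLOCKED = "blocked"  # goal unreachable (CAPTCHA, missing credentials)

@dataclass
class RunResult:
    status: Status
    steps_taken: int
    reasoning: str
    observations: list = field(default_factory=list)

def exit_code(result):
    """Map a result to a process exit code so CI can tell blocked from failed."""
    return {Status.PASSED: 0, Status.FAILED: 1, Status.BLOCKED: 2}[result.status]

def summarize(result):
    """Render a short, log-friendly report including the agent's evidence."""
    lines = [f"{result.status.value.upper()} after {result.steps_taken} step(s)"]
    lines.append(f"Reasoning: {result.reasoning}")
    lines.extend(f"- {obs}" for obs in result.observations)
    return "\n".join(lines)

blocked = RunResult(
    status=Status.BLOCKED,
    steps_taken=2,
    reasoning="Login form requires credentials that were not provided.",
    observations=["Login page displayed", "No credentials available"],
)
```

Keeping a separate blocked status lets a pipeline treat "the site refused us" as an environment problem rather than a regression in the application under test.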
The Bigger Picture
The goal of Project Hawkeye isn't to replace human QA. It's to automate the part of QA that's currently too brittle to automate reliably: visual verification against real, changing user interfaces. When a test passes in Hawkeye, it means an agent looked at your application the way a user would, followed a goal, and visually confirmed the outcome. That's a different kind of confidence.