"A test that never fails is either useless or you're not trying hard enough."
I have been staring at our test suite for the better part of a day now. We have 33 tests. Twenty-six pass. Seven fail. The failing tests are not failing because something is broken; they are failing because our component changed but our tests did not. We refactored the pagination controls to show numbered links instead of prev/next buttons, and somewhere in that refactor, nobody told the tests.
This is not a critique. This is a window into how testing actually works in a project like ours, and more importantly, what we need to do differently now that we are starting to work with AI agents.
What We Actually Have
Let me walk you through what exists, because it is not nothing. It is actually a pretty solid foundation, just incomplete.
We run two types of tests. Unit tests through something called Vitest, and end-to-end tests through something called Playwright. The distinction matters, so let me explain what each does.
Vitest: The Fast Stuff
Vitest is our unit test runner. It takes individual functions and React components and tests them in isolation. When you run npm run test, this is what fires first.
Here is the configuration in vitest.config.ts:
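The snippet below is a hypothetical reconstruction of what that file likely contains, based on the description that follows (jsdom environment, a setup file, the React plugin); the exact options in our real config may differ:

```typescript
// vitest.config.ts - hypothetical sketch, not the verbatim file.
import { defineConfig } from 'vitest/config';
import react from '@vitejs/plugin-react';

export default defineConfig({
  plugins: [react()],
  test: {
    // Simulate a browser DOM inside Node.js.
    environment: 'jsdom',
    // Runs before every test file; this is where the Next.js mocks live.
    setupFiles: ['./src/test/setup.ts'],
    globals: true,
  },
});
```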
The key piece here is jsdom. This is a library that pretends to be a web browser inside Node.js. Your tests can create elements, click buttons, type in input boxes, all without actually opening Chrome. It is fast. It is cheap to run. It is the first line of defense.
Our setup file (src/test/setup.ts) does something clever. It mocks Next.js navigation:
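Something along these lines; the exact set of mocked members is an assumption, but the `vi.mock` pattern is the standard Vitest approach:

```typescript
// src/test/setup.ts - hypothetical sketch of the navigation mock.
import { vi } from 'vitest';

vi.mock('next/navigation', () => ({
  // Hand every component a fake router so nothing touches
  // the real Next.js runtime during unit tests.
  useRouter: () => ({
    push: vi.fn(),
    replace: vi.fn(),
    back: vi.fn(),
  }),
  useSearchParams: () => new URLSearchParams(),
  usePathname: () => '/',
}));
```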
Why is this important? Because our components use useRouter from Next.js. If we tried to run tests without mocking this, everything would crash. The test would try to call a function that only exists in a browser, but we are running in Node. The mock says "here is a fake router, do not worry about the real one."
Playwright: The Real Browser Stuff
Then we have Playwright. This opens an actual Chromium browser, navigates to pages, clicks real buttons, and checks if things work. It is slower, but it catches bugs that unit tests miss.
Our Playwright configuration (playwright.config.ts) is lean:
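A lean config of the kind described would look roughly like this; the test directory, port, and project list are assumptions:

```typescript
// playwright.config.ts - hypothetical sketch based on the description.
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  testDir: './e2e',
  projects: [{ name: 'chromium', use: { ...devices['Desktop Chrome'] } }],
  // Boot the dev server automatically before the tests run.
  webServer: {
    command: 'npm run dev',
    url: 'http://localhost:3000',
    reuseExistingServer: true,
  },
});
```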
The webServer block is elegant. When you run E2E tests, it automatically starts your dev server, waits for it to be ready, runs the tests, and cleans up. You do not have to manually start the server.
What We Actually Test
Let me show you what is covered and what is not.
Passing Tests
We have three test files that pass completely:
- pagination-utils.test.ts - 16 tests for our pagination math
- SearchBar.test.tsx - 7 tests for the search component
- Two E2E files that run against the actual app
Here is one of the pagination tests, so you can see the pattern:
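What follows is a hypothetical reconstruction of the pattern, not the actual file: `getPaginationRange` is an assumed name, and the real helper's signature may differ. The shape is what matters: pure inputs, pure outputs.

```typescript
// Sketch of the kind of helper pagination-utils.test.ts exercises.
export function getPaginationRange(
  current: number,
  total: number,
  width = 5
): number[] {
  // Center a window of `width` page numbers on the current page,
  // clamped so it never runs past page 1 or the last page.
  const half = Math.floor(width / 2);
  let start = Math.max(1, current - half);
  const end = Math.min(total, start + width - 1);
  start = Math.max(1, end - width + 1);

  const pages: number[] = [];
  for (let p = start; p <= end; p += 1) pages.push(p);
  return pages;
}

// A Vitest test wraps checks like these in `it(...)` with
// `expect(...).toEqual(...)`:
console.log(getPaginationRange(3, 10)); // [1, 2, 3, 4, 5]
console.log(getPaginationRange(9, 10)); // [6, 7, 8, 9, 10]
```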
This is a unit test. It takes a function, gives it inputs, and checks the outputs. No browser required. No React rendering required. Pure logic.
The Failing Tests
And here is where it gets interesting. Our PaginationControls.test.tsx has 7 failing tests. Let me show you one:
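A hypothetical reconstruction of one of the failing tests; the prop names are assumptions based on the described behavior, but the assertions match what the mismatch below describes:

```typescript
// PaginationControls.test.tsx - sketch of one failing test.
import { render, screen } from '@testing-library/react';
import { PaginationControls } from '../PaginationControls';

it('shows the current page and total', () => {
  render(<PaginationControls currentPage={2} totalPages={5} />);

  // These assertions describe the OLD component, which is why they fail now:
  expect(screen.getByText('2 / 5')).toBeInTheDocument();
  expect(screen.getByRole('button', { name: /prev/i })).toBeInTheDocument();
  expect(screen.getByRole('button', { name: /next/i })).toBeInTheDocument();
});
```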
This test expects to find the text "2 / 5" on the page. But when I looked at the actual component, it renders differently now. Let me show you what it actually outputs:
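Roughly this shape of markup; the attributes and link targets are assumptions, but the point is what is absent:

```html
<!-- Hypothetical sketch of the refactored component's output:
     numbered links only. No "2 / 5" label, no Prev/Next buttons. -->
<nav aria-label="Pagination">
  <a href="?page=1">1</a>
  <a href="?page=2" aria-current="page">2</a>
  <a href="?page=3">3</a>
  <a href="?page=4">4</a>
  <a href="?page=5">5</a>
</nav>
```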
See the mismatch? The test expects:
- "2 / 5" text somewhere
- Prev/Next buttons
The component actually renders:
- Just numbered links
- No "2 / 5" text
- No Prev/Next buttons
Someone refactored the component and did not update the tests. This happens. This is why tests are maintenance.
The Plus Sides
Let me tell you what works well, because a lot does.
The Architecture Is Sound
We picked good tools. Vitest is fast and modern. Playwright is the current standard for E2E. Testing Library pushes us toward accessible testing practices. The setup is not wrong, it is just unfinished.
The Pattern Is Right
We put tests next to the code they test:
src/
├── components/shared/
│ ├── PaginationControls.tsx
│ └── __tests__/
│ └── PaginationControls.test.tsx
This is the right organizational instinct. When you see a component, the tests are right there.
The E2E Tests Are Smart
Look at this test from pagination.spec.ts:
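A hypothetical reconstruction of the described test; the route and the `main` selector are assumptions, but the structure matches what the next paragraph explains:

```typescript
// pagination.spec.ts - sketch: page 1 and page 2 must differ.
import { test, expect } from '@playwright/test';

test('page 2 shows different content than page 1', async ({ page }) => {
  await page.goto('/?page=1');
  const firstPageText = await page.locator('main').innerText();

  await page.goto('/?page=2');
  const secondPageText = await page.locator('main').innerText();

  // If pagination is wired up, the two pages cannot be identical.
  expect(secondPageText).not.toEqual(firstPageText);
});
```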
This is not just checking if pagination renders. It checks if pagination actually works by verifying that page 1 and page 2 show different content. This is the kind of test that catches real bugs.
Edge Cases Are Considered
Our edge-cases.spec.ts tests things like:
- Invalid page numbers (?page=abc)
- Negative pages (?page=-1)
- Extremely large page numbers (?page=999)
- Very long search queries (200 characters)
- XSS attempts in the URL
This shows someone was thinking about what happens when things go wrong.
What Is Missing
Now for the honest part. Here is what we do not have:
No CI/CD Pipeline
We have no GitHub Actions workflow. No automated tests run when you push code. You have to manually remember to run npm run test and npm run test:e2e. Most of the time, people do not.
No Coverage Reports
The Vitest configuration mentions coverage:
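A typical coverage block looks something like this; the provider and reporter list here are assumptions about our actual file:

```typescript
// Hypothetical excerpt from vitest.config.ts. Declaring this does
// nothing on its own; reports only appear when tests run with --coverage,
// e.g. `npx vitest run --coverage`.
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    coverage: {
      provider: 'v8',
      reporter: ['text', 'html', 'lcov'],
    },
  },
});
```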
But we never generate these reports. We do not know what percentage of our code is tested. We are flying blind.
Coverage Is Narrow
We test pagination and search. That is it. We do not test:
- How bloqs render
- How blips work
- Theme switching
- Form submissions
- Authentication flows
- API routes
- Error boundaries
The actual content of our site, the things users come to see, has zero test coverage.
No Cross-Browser Testing
Our Playwright config only runs Chromium:
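Meaning a single project entry, roughly like this sketch; adding Firefox and WebKit entries is what cross-browser coverage would take:

```typescript
// playwright.config.ts - hypothetical excerpt: one Chromium project.
import { defineConfig, devices } from '@playwright/test';

export default defineConfig({
  projects: [
    { name: 'chromium', use: { ...devices['Desktop Chrome'] } },
    // Missing today:
    // { name: 'firefox', use: { ...devices['Desktop Firefox'] } },
    // { name: 'webkit',  use: { ...devices['Desktop Safari'] } },
  ],
});
```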
We have no idea if our site works in Firefox or Safari.
Why This Matters Now More Than Ever
Here is where I need to connect this to the bigger picture. We are starting to work with AI agents. I am starting to use Claude and other language models to help write code. And this changes everything about testing.
When a human writes code, they have a mental model of what should happen. When they write tests, they are formalizing that mental model. The tests are a specification of intent.
When an agent writes code, it does not have that mental model. It generates code that satisfies patterns it has seen. Sometimes those patterns are correct. Sometimes they are subtly wrong. And here is the thing: the agent does not know the difference.
Without good tests, agents can introduce bugs that no one catches. The agent thinks it succeeded because the code looks right. But it is not right. And there is no test to say otherwise.
This is why the testing infrastructure is not optional. It is the foundation that makes agentic engineering possible. Without it, we cannot trust agents to help us build.
The Path Forward
I want to propose three phases of improvement. Not everything at once, but a direction.
Phase One: Fix What We Have
The first step is to fix the failing tests. Either update the PaginationControls tests to match the current implementation, or revert the component to match the tests. Pick one and be consistent.
Then add a simple GitHub Actions workflow to run tests on every push. It does not need to be fancy. Just run the tests and fail the build if they break.
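A minimal workflow along these lines would do; the file path, action versions, and Node version are assumptions, since nothing like this exists in the repo yet:

```yaml
# .github/workflows/test.yml - hypothetical minimal CI workflow.
name: test
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npm run test
```

The E2E suite could be added as a second job later, once the browser install step (`npx playwright install --with-deps chromium`) is worth the extra CI minutes.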
Phase Two: Expand Coverage
Start adding tests for the parts that matter most:
- Test the bloq rendering logic
- Test the blip repository
- Test API route responses
- Add loading state tests
Also, start generating coverage reports. We do not need 100% coverage. We need to know where the gaps are.
Phase Three: Agentic Testing Infrastructure
This is the ambitious part. This is where we build testing infrastructure that supports agents working in parallel.
Imagine this: you have multiple agents, each responsible for testing a different aspect of the system. One agent runs functional tests. Another runs performance tests. A third runs accessibility audits. They run in parallel, report back, and together they give you a comprehensive view of whether your code is ready.
To get here, we need:
- Test orchestration - A way to run multiple test suites in parallel and aggregate results
- Test generation - Agents that can look at new code and write tests for it automatically
- Self-healing tests - Tests that can detect when a UI change broke a selector and fix themselves
- Better logging - Detailed traces of what agents did so humans can review
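The orchestration idea above can be sketched in a few lines. This is a stand-in, not a real harness: the suite runners here are placeholder async functions, where real ones would shell out to Vitest or Playwright and report back.

```typescript
// Hypothetical sketch of test orchestration: run independent suite
// runners concurrently and aggregate their results.
type SuiteResult = { name: string; passed: boolean };

async function runSuite(
  name: string,
  run: () => Promise<void>
): Promise<SuiteResult> {
  try {
    await run();
    return { name, passed: true };
  } catch {
    return { name, passed: false };
  }
}

async function orchestrate(
  suites: Array<[string, () => Promise<void>]>
): Promise<SuiteResult[]> {
  // Promise.all is safe here because runSuite never rejects.
  return Promise.all(suites.map(([name, run]) => runSuite(name, run)));
}

// Stand-in runners: one passes, one fails.
orchestrate([
  ['unit', async () => {}],
  ['e2e', async () => { throw new Error('selector not found'); }],
]).then((results) => {
  for (const r of results) {
    console.log(`${r.name}: ${r.passed ? 'passed' : 'FAILED'}`);
  }
});
```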
This is not science fiction. Tools like Claude and GPT can already write tests. The challenge is orchestration and trust. We need systems that let agents write tests, but verify those tests before we accept them.
What I Take Away
- Our testing foundation is solid but incomplete. We have the right tools, just not enough coverage.
- The failing tests are not a disaster. They are a symptom of a healthy but unfinished project. Refactoring happens, tests get left behind, we fix them and move on.
- For agentic engineering to work, we need tests that agents can use as a source of truth. Without tests, agents are flying without instrumentation.
- The path forward is not to test everything at once. It is to fix the immediate problems, expand gradually, and think about orchestration as a long-term goal.
- Testing is not overhead. It is the contract between you today and you six months from now. It is how you tell your future self what you meant.
The infrastructure we have has gotten us this far. The infrastructure we need will get us further. But we have to build it deliberately, one layer at a time.