The problem: coding accelerates, but I can't keep up with tests. Either I review everything and we're not actually moving faster, or I don't and I can't trust the result.
The idea: executable specs
Write plain text tests, i.e. revive Cucumber/Behat (the "gherkin" language), BUT without the implementation step. An AI runs the tests in a browser, like a human QA would.
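Concretely, a spec can look like a classic Gherkin feature file, minus the step definitions. (The scenario below is a hypothetical illustration, not taken from a real project.)

```gherkin
Feature: User registration
  Scenario: A visitor signs up with a valid email
    Given I am on the homepage
    When I click "Sign up"
    And I fill in the form with a new email address and a password
    And I submit the form
    Then I see a confirmation message
```

There is no step-implementation file behind these lines: the AI interprets each step and performs it in the browser.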
I've been trying it out for 3 weeks:
1. serves as specs for new features
2. serves as acceptance criteria (Claude/Codex runs them after implementing)
3. serves as regression testing: run the full test suite every night
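For the nightly regression run, the suite can be wired into CI on a cron schedule. A sketch using GitHub Actions (the workflow name and schedule are illustrative assumptions; exspec's exact invocation in CI may differ):

```yaml
# .github/workflows/nightly-specs.yml — hypothetical nightly regression run
name: Nightly executable specs
on:
  schedule:
    - cron: "0 2 * * *"   # every night at 02:00 UTC
jobs:
  specs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      # runs Claude Code + Playwright under the hood (see below)
      - run: npx exspec
```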
`npx exspec` runs Claude Code under the hood (with your existing Pro/Max subscription) with Playwright, all built-in. So using exspec is as easy as running PHPUnit/Pest/Vitest/… tests, nothing to install.
To be clear, it comes on top of unit tests; it doesn't replace them. It's great for covering happy paths and important business logic. It increases confidence, and it also helps when specifying new features.
I am cautious about letting AI "interpret" how to execute each step. But so far I'm pleasantly surprised:
- Claude seems to respect the steps (and not fake them)
- The AI is very resilient: it can figure things out and recover (e.g. if it needs to create a user but the email is already taken, it simply retries with a different address)
- It uses the website UI and nothing else, like a human QA: it cannot edit files, touch the database, or run code
- It reports failures with plenty of detail, making it super easy to create issues from them
I'm still careful and want to test the idea at a larger scale; that's why I'm sharing it publicly.