I hacked my own site with agentic AI. You probably should too.

If Claude Code with a few markdown files and browser access can hack your app in ten minutes, what do you think someone with actual bad intentions could do in two weeks?

That was the uncomfortable thought I had this morning while drinking coffee and eating French toast.

I have been using agents more in my normal development workflow. Not as fully automated GitHub robots that disappear and come back with PRs I do not understand, but more like another set of hands inside my own process. They read code, make changes, check the browser, run specs, and then I review the work like I would with any other developer.

I wrote more about that split in “If it’s obvious an agent did the work, you’re doing it wrong.”

This morning I pointed that same workflow at security.

I created a small Claude Code pentesting agent setup for ActionVFX: a few markdown files, instructions, and skills that helped it crawl a local version of our backend API and frontend site. I wanted it to behave more like a curious attacker:
Try endpoints.
Follow weird paths.
Compare similar flows.
Write proof specs.
Run them.
Bring back something I could verify.

It found real issues.

The biggest issue was that our backend API let a logged-in user download paid ActionVFX collections without buying them first. The normal download path checked ownership. Another path looked similar but skipped that ownership check entirely. Seriously, no bueno.
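To make the shape of that bug concrete, here is a minimal Python sketch. It is not ActionVFX’s code: the `User` model, the handler names (`download_collection`, `alt_download_vulnerable`, `alt_download_fixed`), and the example URL are all hypothetical. It shows a normal path that checks ownership, a look-alike path that doesn’t, a fix that routes both through one shared check, and a tiny proof function in the spirit of a proof spec: it returns `True` only while the bypass exists.

```python
from dataclasses import dataclass, field


class Forbidden(Exception):
    """Raised when a user asks for a collection they have not purchased."""


@dataclass
class User:
    id: int
    # Hypothetical ownership record: IDs of collections this user has bought.
    purchased: set[str] = field(default_factory=set)


def signed_url(collection_id: str) -> str:
    # Hypothetical stand-in for generating a real download link.
    return f"https://downloads.example.com/{collection_id}"


def authorized_url(user: User, collection_id: str) -> str:
    # The single place where ownership is enforced.
    if collection_id not in user.purchased:
        raise Forbidden(collection_id)
    return signed_url(collection_id)


def download_collection(user: User, collection_id: str) -> str:
    # The normal path: goes through the ownership check.
    return authorized_url(user, collection_id)


def alt_download_vulnerable(user: User, collection_id: str) -> str:
    # The look-alike path: same output, but the ownership check is skipped.
    return signed_url(collection_id)


def alt_download_fixed(user: User, collection_id: str) -> str:
    # The fix: reuse the shared check instead of re-implementing (or forgetting) it.
    return authorized_url(user, collection_id)


def proves_bypass(handler) -> bool:
    """Return True if a logged-in user who owns nothing still gets a link."""
    attacker = User(id=1)  # logged in, zero purchases
    try:
        handler(attacker, "paid-collection")
    except Forbidden:
        return False
    return True


if __name__ == "__main__":
    print(proves_bypass(alt_download_vulnerable))  # True: the exploit works
    print(proves_bypass(alt_download_fixed))       # False: the fix holds
```

The part worth copying is `authorized_url`: when every download path funnels through one check, a new endpoint can’t quietly forget it, and rerunning the same proof after the fix tells you whether the hole is actually closed.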

The agent gave me the endpoint, the request, the proof spec, and the passing test showing the bug existed.

Then I let my developer agent take a swing at fixing it. I reran the proof after the change, and the exploit was gone. The app behaved the way it should have behaved in the first place.

Honestly, I felt better after that.

The best part was the output. The pentest produced a markdown file for each exploit, written clearly enough that I could feed it directly into our project workflow as specs for the dev agent. Each file had the issue, the proof, the affected flow, and enough fix direction to turn the finding into work.
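One way to keep those findings consistent is to give every write-up a fixed shape. The sketch below is hypothetical: the `Finding` fields mirror the four pieces described above (issue, proof, affected flow, fix direction), but the heading layout, the filename scheme, and the example content are assumptions for illustration, not the actual files the agent produced.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    # Hypothetical shape for one exploit write-up; field names are illustrative.
    title: str
    issue: str
    affected_flow: str
    proof: str
    fix_direction: str

    def filename(self) -> str:
        # Slugify the title: lowercase, runs of non-alphanumerics become one "-".
        slug = re.sub(r"[^a-z0-9]+", "-", self.title.lower()).strip("-")
        return f"{slug}.md"

    def to_markdown(self) -> str:
        # One heading per section so a dev agent can read the file as a spec.
        parts = [
            f"# {self.title}",
            f"## Issue\n\n{self.issue}",
            f"## Affected flow\n\n{self.affected_flow}",
            f"## Proof\n\n{self.proof}",
            f"## Fix direction\n\n{self.fix_direction}",
        ]
        return "\n\n".join(parts) + "\n"


if __name__ == "__main__":
    # Example content is illustrative only.
    finding = Finding(
        title="Paid collection download without purchase",
        issue="An alternate download path skips the ownership check.",
        affected_flow="Logged-in user -> alternate download path -> paid collection",
        proof="Proof spec requests a paid collection as a user with no purchases and gets a link.",
        fix_direction="Route every download path through the same ownership check.",
    )
    print(finding.filename())  # paid-collection-download-without-purchase.md
    print(finding.to_markdown())
```

Keeping the same sections in every file is what makes a finding easy to hand straight to a dev agent as a spec: it always knows where the proof and the fix direction live.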

The whole thing took less than an hour of work and not a whole lot of tokens. I think everyone should give it a try!