Integration Testing with Spring: Configuration Logic in Integration Tests

Gil Zilberfeld talks about an anti-pattern in Spring integration tests, where flags are used in the tests for mocking and controlling simulators
This is a short series of how to use Spring in integration testing and unit testing.

Testing a REST API
A custom configuration
Configuration logic

Now that we’ve covered some of Spring’s capabilities, we can explore possibilities beyond simple mocking. Instead of “regular” mocks (that we can set up in the integration tests), we can inject actual simulators. For our purpose, let’s define a simulator as something that has its own logic, which simulates a production component.

Simulators deserve their own topics, and indeed – their own tests. Because they have logic, they need to be tested separately. For the sake of our discussion, let’s assume they are tested and work as we want them to.

As we’ve already seen, we can inject whatever bean we want with Spring, and it’s best to use the Configuration classes to determine what to inject. A pattern I’ve started seeing misuses the capabilities of Spring’s injection, and introduces possible bugs.

The pattern goes like this. In the test class, we inject the simulator as a bean, as we usually do. We also inject a value from a property file:
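A sketch of what that injection might look like – the simulator type and property name here are hypothetical:

```java
@Autowired
private DatabaseSimulator simulator;

// a flag read from a property file
@Value("${use.simulator}")
private boolean useSimulator;
```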

You can already guess where this is going. Next, we’ll see in the test body this code:
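Something along these lines – a sketch with hypothetical names, showing the flag-checking logic inside the test body:

```java
@Test
public void getStudent_returnsName() {
    if (useSimulator) {
        // configure the simulator for this scenario
        simulator.addStudent("Lisa");
    }
    // ...call the API and assert, with the expectation
    // depending on which mode we're running in
}
```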

Integration tests are code

As applications grow, we find that we need more diversity in how we use components in the application, and simulators are no different. Even more so if they are somewhat frozen in terms of behavior, or hard to configure for the purpose of a single test, and then again for the next one. So it may seem that we need this kind of code to control our integration tests.

However, this is not the best way to achieve it. First of all, we’re spreading the configuration code across different places – the property files, in combination with the Configuration classes, maybe some Profiles, and even other places. When we keep data all over the place, it’s hard to understand the whole picture, and of course maintenance becomes a nightmare.

In addition, let’s think about the usefulness of the tests. Integration tests are supposed to catch logical errors, and now we’re introducing logic inside them. Having bugs in the integration tests lowers our trust in their results.

Even if we go over this bump, what can we learn from the integration test results? Looking at Jenkins, we’ll see the names of the integration tests, but in what mode did they run? Real or simulator? How was the simulator intended to behave? We need to go into the integration tests themselves, maybe even debug them, to see how they actually ran, and if the result is what we want.

So, what do we do?

  • Manage configuration data in Configuration classes.
  • If you have a simulator, consider it a separate Configuration.
  • Maybe running with multiple simulators is a cross-application mode that can be better managed under a specific Profile.
  • Separate integration tests that run in different configurations to different integration test classes.
  • Do not check logic flags in tests. Fit them into a consistent configuration setting mode that is separate from property files.
  • Manage configuration data as code – put it where you expect to find it, in a central place, preferably not in multiple files.

Integration Testing with Spring: A Custom Configuration

Gil Zilberfeld explains another part of Spring support for integration tests and configuration classes.
This is a short series of how to use Spring in integration testing and unit testing.

Testing a REST API
A custom configuration
Configuration logic

Here’s the situation: We have a couple of configuration files we use for integration tests. Each of them is a different set of real and mock objects. Some of the objects have behaviors set on them (using Mockito.when) in the configuration classes.

Now, one of the integration tests needs one of the mock beans in a slightly different setup. Since Spring initializes the mocks once, we have a problem: either the new integration test uses the mock object as-is, or we call Mockito.reset, and then all the other integration tests suffer, since we don’t want to rely on the order of running.

But let’s step back for a minute. How did we get here?

This application is heavily Spring-based. Everything is injected and auto-wired, every small object is a @Bean, and so integration tests rely on configuration classes that contain all the beans needed to complete a flow.

Now, here’s the issue: Everything works fine until some integration tests need a custom configuration that is slightly modified. Such a configuration is expensive to build, and of course, managing multiple configuration sets is very cumbersome. How do we do that?

This is not an ideal situation to begin with, so there are no optimal ways to solve that, but let’s go through the options.

The first solution is to add another configuration class. But when the next @Bean is introduced, we’ll need to add it across all existing configurations. It won’t be long until we are in configuration hell. This process needs to be managed, and while it is the simpler solution, it is not sustainable.

The second solution, offered by Spring, is using the @Import annotation. Much like include files, we put the common beans in the imported configuration class, and the custom configuration classes use @Import to include the common file.
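For illustration – the class and bean names here are made up – an @Import setup might look like this:

```java
// Common beans live in one configuration class...
@Configuration
public class CommonBeansConfiguration {
    @Bean
    public StudentDAO studentDAO() {
        return new StudentDAO();
    }
}

// ...and custom configurations pull them in with @Import.
@Configuration
@Import(CommonBeansConfiguration.class)
public class ApiTestConfiguration {
    @Bean
    public StudentService studentService() {
        return new StudentService();
    }
}
```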

If you’ve worked with any type of include files before, you know what a pain it is to manage them. Also, in big projects, we need to make sure that everyone who edits configurations and adds tests knows the guidelines – which @Bean goes where. And then we create hierarchies of imported configurations. We’re back at configuration hell.

Even then, there are still cases where this doesn’t solve the problem completely. We still need to support adding a special bean, right there in the middle of the hierarchy, and to do that we need to shake the entire tree.

The third option is a modified custom version of configuration. We can use Java’s inheritance to create a special configuration that extends the “common” configuration class. Then we override just the bean we want.

For example, here’s a configuration class, containing a mock for tests:
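A minimal sketch, reusing the Student mock that appears elsewhere in the series (the name "Lisa" is an assumption):

```java
@Configuration
public class MockConfiguration {
    @Bean
    public Student mockStudent() {
        Student student = Mockito.mock(Student.class);
        Mockito.when(student.getName()).thenReturn("Lisa");
        return student;
    }
}
```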

Most of our integration tests use this configuration, but we want a special case where we want another mock behavior. We can do the following:
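Something like this, assuming a base MockConfiguration as sketched earlier:

```java
@Configuration
public class OverridingMockConfiguration extends MockConfiguration {
    @Override
    @Bean
    public Student mockStudent() {
        // override just this bean; all other beans come from the base class
        Student student = Mockito.mock(Student.class);
        Mockito.when(student.getName()).thenReturn("Bart");
        return student;
    }
}
```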

With that configuration available, our special integration tests can now use the OverridingMockConfiguration. This is easier than setting up a new configuration with all other beans, and less hassle than the other options we went through in terms of management.

However, this is not hassle-free: We need to keep this “special” separate from the “regular” configurations. We don’t want other people and tests using this configuration. Also, this configuration class is susceptible to change in its base class.

Like I said, there’s no good solution. Each has a maintenance and risk cost that comes with it.

What can reduce these costs? A better architecture. If there are fewer beans to inject, configurations become smaller, and so do their variations. Minimize those, and the configuration costs and types go down.

Clean Code: The Rectangle and the Square – Part II

Gil Zilberfeld explains how clean code should read and explain the relationship between types, perhaps using class inheritance
This series is about Clean Code, SOLID principles, and all kinds of other cool stuff I talk about in my Clean Code classes.
The Rectangle and the Square Part I
The Rectangle and the Square Part II

Last time, in the first post in my new clean code series, we discussed how I torment my students with the ol’ Square and Rectangle trick in my Clean Code course, talking about Liskov Substitution Principle (LSP). At the end of the discussion, we got the audience to understand the issue, and go from “you’re doing it wrong” to “so what?”.

So let’s talk about the bigger issue at hand, because it’s not just about clean code.

Remember before clean code, where we just learned about OOP, inheritance and class derivation?

Inheritance is a feature of a programming language that conjures up a relationship – the derived class is “a kind of” the base class.

Well, it can be, but as we saw, it’s not up to the derived or base class to decide. It’s how the client code uses them that decides if they are treated the same way. Behavior is not just method signature and implementation. It’s also the usage of these methods across the class – what we expect them to do when we call them in combination.

“A kind of” is not really a programming language feature; we decide about semantics. We frequently interpret this incorrectly, and create hierarchies of classes which we see as “a kind of”, but which really are special cases that need to be treated differently.

Special relations

Ok, there must be another way. Let’s look at another solution to the problem.

Here’s another definition of a Rectangle, where the area calculation behaves consistently, based on the length of the diagonals and the angle between them:
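A possible rendition – the area of any quadrilateral is half the product of its diagonals times the sine of the angle between them, and in a rectangle both diagonals are equal:

```java
public class Rectangle {
    private final double diagonal;
    private final double angle; // angle between the diagonals, in radians

    public Rectangle(double diagonal, double angle) {
        this.diagonal = diagonal;
        this.angle = angle;
    }

    public double area() {
        // d1 * d2 * sin(angle) / 2, with d1 == d2 in a rectangle
        return diagonal * diagonal * Math.sin(angle) / 2;
    }
}
```

For example, a 3x4 rectangle has diagonals of length 5, with sin(angle) = 24/25 between them, giving an area of 12.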

For our area calculation purposes, a Square is equivalent to a Rectangle. It is literally and programmatically “a kind of” Rectangle:
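Under that definition – a sketch, assuming the diagonal-based Rectangle above – a Square is simply a Rectangle whose diagonals cross at a right angle:

```java
public class Square extends Rectangle {
    public Square(double diagonal) {
        // in a square, the diagonals always cross at 90 degrees
        super(diagonal, Math.PI / 2);
    }
}
```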

The area calculation code is similar for both Square and Rectangle of course:
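Something like this, using the diagonal-based classes sketched above:

```java
Rectangle rectangle = new Rectangle(5, Math.asin(24.0 / 25)); // a 3x4 rectangle
Square square = new Square(5 * Math.sqrt(2));                 // a square with side 5

double rectangleArea = rectangle.area(); // ~12.0
double squareArea = square.area();       // ~25.0
```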

Because the Square is a Rectangle, there can be no discrepancy between their behaviors, and that means that LSP approves it – we can replace Square with Rectangle and everything works as expected.

Something still feels wrong about this design.

Clean code is about other people

In a day-to-day conversation, would you describe a rectangle by its diagonals and the angle between them? Most people don’t.

They will look at your Rectangle code and snicker. Some will not even trust you anymore, and you’ll spend lunch eating alone. Width and height are good for most humans, diagonals and sine language – not so much.

That’s because the “width and height” design is how most people think of properties of a rectangle, while the diagonals and angle design is fit for area calculation. The former is about designing from the inside out, regardless of use, while the latter is using the code from an external client with a special use in mind.

There’s a gap between what we build and how it’s used. Plus, it took me a while to find the formula and prove to myself that it is correct; I suspect other people will require the same treatment.

This is where bugs come from.

We translate models all the time – the business, the code, the intended and actual behavior – and things get lost.

Ironically, we think of “class inheritance” as a sure way to describe the same kind of types, while forgetting that it’s really not. We’re using one language where it does not translate correctly to our understanding.

So what do we do?

  • Use ubiquitous language. Same language, models, terms in the code, the design and the requirements.
  • Use implementation inheritance less.
  • Use interface “inheritance” instead. Interfaces have less baggage than implementation.
  • If you do use inheritance, check if LSP still works. If not, you found a loop-hole. Either the types are not of the same kind, or there’s a translation problem.
  • Review and review again. Other people can confirm or disprove what you’re thinking.

Remember that clean code is about communication with other people. Use their feedback to make sure.

Clean Code: The Rectangle and the Square – Part I

Gil Zilberfeld explains how in clean code, similar things may not be what they seem
This series is about Clean Code, SOLID principles, and all kinds of other cool stuff I talk about in my Clean Code classes.
The Rectangle and the Square Part I
The Rectangle and the Square Part II

In my Clean Code class, I go through this example about the Liskov Substitution Principle (LSP), part of the SOLID material. This example, the rectangle and the square, never fails to stump people, both experienced and less so.

It starts out with the question: Is a square a rectangle? Which of course, everybody knows is true.

I then show an example of a definition of a Rectangle class:
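The canonical version looks something like this:

```java
public class Rectangle {
    private int width;
    private int height;

    public void setWidth(int width) {
        this.width = width;
    }

    public void setHeight(int height) {
        this.height = height;
    }

    public int area() {
        return width * height;
    }
}
```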

Then of a Square which inherits from Rectangle:
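A common rendition, assuming the Rectangle above, keeps width and height equal by overriding both setters:

```java
public class Square extends Rectangle {
    @Override
    public void setWidth(int width) {
        super.setWidth(width);
        super.setHeight(width);
    }

    @Override
    public void setHeight(int height) {
        super.setWidth(height);
        super.setHeight(height);
    }
}
```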

So far, so good. Clean code at its best.

Now let’s look at the client code, using the Rectangle class:
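For example (the order of the setter calls is an assumption, chosen to be consistent with the results quoted in the text):

```java
Rectangle rectangle = new Rectangle();
rectangle.setHeight(2);
rectangle.setWidth(5);
int area = rectangle.area(); // 10
```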

Which produces an area of 10. Then I describe the same usage of a Square:
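The same calls, on a Square:

```java
Rectangle rectangle = new Square();
rectangle.setHeight(2);  // sets both width and height to 2
rectangle.setWidth(5);   // sets both width and height to 5
int area = rectangle.area(); // 25
```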

Which of course produces 25. At this point, everybody understands the results, but something feels wrong. And they can’t really articulate why.

Then somebody says: “That’s not how you calculate the area of a square.”

Now we’re getting somewhere.
But the problem is still not clear, since the code is obviously correct. The design is ok. There is no bug. Clean code, we’ve already said.

What is the correct way to calculate the area of a square?

“Well, you set either the height or the width, not both”.

Which is different than calculating an area of a rectangle.

The Shape Of Things To Come

As I describe in the Clean Code class, LSP requires the ability to substitute the base class (the Rectangle in our case) with the derived class (the Square), and expect the same behavior. Obviously, this does not happen here, because the same usage leads to a different result.

Here’s the kicker: Behavior doesn’t mean just implementation. And it is not just having the right interfaces.

Behavior also includes operations across methods. For the same inputs and the same method operations, we expect both the rectangle and the square to return the same results. They don’t.

Big deal, so the code is not LSP-compliant. Is the world coming to an end?

No, it’s not. But.

It does mean that we’ll write more code to support the special cases, when we could be writing more generic, clean code. And we’ll need to unit test these other cases as well. Having different things means more code.

But there’s an even bigger issue. We’ll discuss it next time.

Integration Testing with Spring – Testing A REST API

Gil Zilberfeld talks about integration testing a REST API
This is a short series of how to use Spring in integration testing and unit testing.

Testing a REST API
A custom configuration
Configuration logic

After we understand how to use mocks in Spring in integration tests, let’s take a look at a setup for testing a REST service that uses a dependency we want to mock. API testing is a usual integration test scenario, and with those, we might need to mock dependencies buried under the API layer. Spring to the rescue.

Our StudentService contains an endpoint like the one below, and we’d like to mock the student in integration tests for both cases (either null or not):
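A sketch of such an endpoint – the path and the “unknown” fallback are assumptions, consistent with the rest of this post:

```java
@RestController
public class StudentService {

    @Autowired
    private Student student;

    @GetMapping("/student/name")
    public String studentName() {
        String name = student.getName();
        return name == null ? "unknown" : name;
    }
}
```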

We’ll use the updated configuration from last time, without any behavior setup:
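A minimal sketch of that configuration:

```java
@Configuration
public class MockInjectionConfiguration {
    @Bean
    public Student mockStudent() {
        // a plain mock, no behavior set up
        return Mockito.mock(Student.class);
    }
}
```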

Mockito’s default is to return null from methods we didn’t set behavior on with when. That means that when getName() is called, null is returned.

Now we need to setup the integration test class:
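A possible skeleton – class names like Application and StudentServiceIT are assumptions:

```java
@RunWith(SpringRunner.class)
@ContextConfiguration(classes = MockInjectionConfiguration.class)
@SpringBootTest(classes = Application.class,
        webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
public class StudentServiceIT {

    @LocalServerPort
    private int port;

    @Autowired
    private Student mockStudent;

    private final TestRestTemplate restTemplate = new TestRestTemplate();
    private String url;

    // setup and tests follow
}
```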

Integration tests are a bit more elaborate than regular unit tests, so let’s break it down, starting with the annotations.

The @RunWith and @ContextConfiguration annotations are used just like we used them before – selecting Spring as the JUnit runner, and choosing the right configuration. We’ve also let Spring Boot know which application class to run (the one that includes our service), and allowed it to select a random port, using the @SpringBootTest annotation. This port number will be injected into the class using the @LocalServerPort annotation.

In addition, we @Autowired the mockStudent. We need it for the second test (it’s not needed in the first integration test, because we’re using the default setting, returning null).

The @Before method just sets up things for the call:
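For example, building the URL from the injected port (the path is an assumption):

```java
@Before
public void setUp() {
    url = "http://localhost:" + port + "/student/name";
}
```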

After that, it’s using the TestRestTemplate to invoke the service. The first integration test checks the return value for a non-existent student:
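Something along these lines:

```java
@Test
public void unknownStudent() {
    String name = restTemplate.getForObject(url, String.class);
    assertEquals("unknown", name);
}
```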

Since the mock is just initialized, it will return null as the name, which causes the endpoint to return the “unknown” value – the case we want to cover in our first integration test.

In the second integration test, we’re also setting behavior on the mock for the getName() method:
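For example:

```java
@Test
public void existingStudent() {
    Mockito.when(mockStudent.getName()).thenReturn("Lisa");
    String name = restTemplate.getForObject(url, String.class);
    assertEquals("Lisa", name);
}
```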

That’s it so far. We may continue on integration tests more later.

Integration Testing with Spring – Mocking

Gil Zilberfeld explains how to configure Spring with mocks for integration tests
This is a short series of how to use Spring in integration testing and unit testing.

Testing a REST API
A custom configuration
Configuration logic

Let’s continue where we’ve left off – multiple configurations for integration tests.

We use different configurations when we need to inject two different sets of objects – for example, a real one and a mocked one of the same type, for different integration test purposes.

Let’s say we have a REST API that calls some component logic, which then calls the database through a DAO (data access object).

In the first set of test scenarios, I want to mock the database, and test that the logic component works correctly (similar to an isolated unit test, but through the API). In other scenarios I want to make sure the entire flow works, all the way to the database. So I’ll need two separate configurations – one that injects a mocked DAO component, and one that injects the real one.

Note that managing configurations takes some work. We usually don’t have a configuration per test class, so that means we create a test configuration that serves multiple integration tests. The configuration classes need to be maintained and kept light, so they will fit every consumer.

Configuration pitfalls

Let’s look at the configuration from last time for the mock.
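As a reminder, it looked something like this – the behavior is baked into the configuration:

```java
@Configuration
public class MockInjectionConfiguration {
    @Bean
    public Student mockStudent() {
        Student student = Mockito.mock(Student.class);
        Mockito.when(student.getName()).thenReturn("Lisa");
        return student;
    }
}
```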

Spring injects any object once on startup by default. That means that integration tests share these mocked instances. That’s an issue we need to understand.

The mockStudent injected by MockInjectionConfiguration already has its behavior set. When the first integration test runs, Spring injects it as written.

But when a second integration test tries to set behavior by using Mockito.when on getName(), it will add the behavior, not override it. And when we’re using Mockito.verify(), oh the laughs we’ll have…

The solution to this is a convention in how we write the tests. A better way is to define the configuration like this:
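The configuration injects a bare mock, with no behavior baked in:

```java
@Configuration
public class MockInjectionConfiguration {
    @Bean
    public Student mockStudent() {
        return Mockito.mock(Student.class);
    }
}
```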

The injected object is a plain basic mock. Let the integration tests define their own behavior. That means that any integration test can assume that it’s starting from scratch.

But assumptions are for fools. We need to make sure the assumption is correct, and that means that we need to reset the mock manually. For example, we can use Mockito.reset() in a setup method:
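For example:

```java
@Autowired
private Student mockStudent;

@Before
public void resetMocks() {
    // clear any behavior and recorded invocations from previous tests
    Mockito.reset(mockStudent);
}
```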

Without this, mocks can seem to behave erratically. As in, they behave exactly as we tell them, but not how we expect.

But even that may be too much work for some of us.

If we’re lazy and want to avoid that, Spring Boot can do this for us, if we declare the injection in the test with @MockBean instead of @Autowired. With @MockBean we don’t need a @Configuration class to inject the mocks: Spring automatically injects a fresh mock of the object with every test. For further setup of the mock you can use either the @Before method or the tests themselves.
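A sketch of the @MockBean version:

```java
@MockBean
private Student mockStudent; // Spring Boot injects a fresh mock for every test

@Test
public void existingStudent() {
    Mockito.when(mockStudent.getName()).thenReturn("Lisa");
    // ...call the code under test and assert
}
```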

One more thing to remember: using beans in the production code makes them easy to use in integration tests. That ease of use comes with the price of speed.

Even if your tests don’t use mocks or injections, the code-under-test may do that. Spring is slow in ramp-up and in run-time. If you write unit tests for the code, make sure it is free of beans, and inject the dependencies manually. “Regular” unit tests (the ones that don’t use Spring for injection) run much more quickly. It also makes sense to locate them separately from the integration tests so they can run separately.

In the final part, we’ll see how we set up a test for a REST API that calls a mock internally.

Integration Testing with Spring – Configurations

Gil Zilberfeld talks about configuration for mocking in integration tests and unit testing
This is a short series of how to use Spring in integration testing and unit testing.

Testing a REST API
A custom configuration
Configuration logic

First, a couple of words about Spring in general as a dependency injection framework. One of the best things in Spring is its simplicity of injection. Regardless of where you are, you pop an @Autowired annotation on a class variable (which could be the test class), and it’s ready for injection. Since it’s injecting the same instance regardless of class, setup for mocks is easy. And regardless of how many layers away the object will be injected, the test has access to it, and doesn’t need to pass it between layers, which requires less code. That is always a good thing.

On the registration side, with @Configuration classes, you can configure exactly and easily what to inject. Spring Boot allows you to do a bit more, easily bootstrapping a project, along with a bit more help for unit tests and integration tests.

All these make it easy to use (and abuse) Spring when writing integration tests. Let’s take a look at a couple of scenarios for integration testing.

For our first example, we have a REST API that internally calls a dependency. In integration testing APIs like this, we usually go all the way to the back end, but even then, we might want to mock something there, mostly to control the behavior of our integration test.

First, I want to inject a real object I create. In a @Configuration class I put under the test folder, I create this configuration:
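A minimal sketch – the Student constructor and the class name are assumptions:

```java
@Configuration
public class RealInjectionConfiguration {
    @Bean
    public Student student() {
        return new Student("Lisa");
    }
}
```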

Then in the integration test I’ll use the @Configuration class:
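A sketch of the test (class names are assumptions):

```java
@RunWith(SpringRunner.class)
@ContextConfiguration(classes = RealInjectionConfiguration.class)
public class StudentInjectionTest {

    @Autowired
    private Student student;

    @Test
    public void studentIsInjected() {
        assertEquals("Lisa", student.getName());
    }
}
```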

That is more of an explanation of how Spring works, and obviously, not that useful. In actual integration testing scenarios I use this for injecting POJOs in internal layers, or for injecting objects that will call dependencies I mock.

If I want to inject a mock instead (let’s say Student wasn’t a POJO), I’ll use a different @Configuration class:
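For example:

```java
@Configuration
public class MockInjectionConfiguration {
    @Bean
    public Student mockStudent() {
        return Mockito.mock(Student.class);
    }
}
```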

And then I can use it in a test:
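A sketch (names are assumptions):

```java
@RunWith(SpringRunner.class)
@ContextConfiguration(classes = MockInjectionConfiguration.class)
public class StudentMockTest {

    @Autowired
    private Student mockStudent;

    @Test
    public void mockedName() {
        Mockito.when(mockStudent.getName()).thenReturn("Bart");
        assertEquals("Bart", mockStudent.getName());
    }
}
```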

Now that it’s in the test, I can add mocking behavior.

That’s the basic @Configuration stuff. We’ll continue the discussion on configurations and mocking next time.

How TDD Can Conquer The World (And Why It’s Unlikely To Happen)

TDD is unlikely to win

He said: “I asked all my friends, and none of them likes TDD”.

This one I haven’t heard before, although I suspect I should have at some point. Like any practice, TDD has a social side.

I told him to find new friends, which he conveniently ignored. He then continued: “We, developers, want to move forward, build software. TDD slows us down”.

That’s true, TDD slows us down, because it forces us to think, which is important in software.
Why is that something that needs explaining? Over and over again?

Wash Your Hands

Uncle Bob uses the “doctors washing their hands” metaphor for explaining TDD. Today it’s commonplace for doctors to wash their hands before handling patients, but was that always the case? It’s not like all doctors, all over the world, switched to washing their hands one day. It was a continuous process that probably reached a critical point, when doctors realized that it’s good for their patients.

Of course, there was opposition to the idea. You’d probably hear “I need to take care of my patients, this hand-washing thing just slows me down”.

Yet hand-washing has crossed the chasm.
What would it take for TDD to get to a “washing hands” status? Thou must count to three.

Three shall be the number of the counting

  • Education

When people die, that’s a big impact. When people are saved, and live, and catch fewer diseases, that too is a big impact. When we see the correlation (or let’s say, when we’re convinced that there is one), we see motivation to change. However, we don’t always see the impact of getting quality software out the door. So we need to be educated.

It’s leadership’s job to communicate this correlation. Not only that, management needs to understand the business impact of technical debt, so developers make sure the code provides the needed business benefits. Those benefits also translate into regular working expectations.

  • Regulation

Next is regulation. I’m sure hand-washing opponents wouldn’t just jump on board without proper external incentives. As in, “you do it, or you won’t work here again”.

What’s the chance of that happening with software practices? Pretty low. As I’ve written before, I believe the software business is going to be regulated at some point, and badly at that. I don’t see how regulating software development techniques will actually make them work. For example, regulated TDD? First the regulator needs to understand it, then it needs a way to see that developers are conforming to it. Coverage, you say? Good. And then the gaming is on. I don’t see that happening soon, though we’re getting close.

Regardless, external forces saying “do this or you don’t work again” can have that effect.

  • Social pressure

That means swaying the crowds toward good development practices. It is taking place, although at a glacial speed that is really hard to see. Without proper education and/or regulation, developers are incentivized to build quickly, regardless of quality. Without proper training that quality is part of the work, this ain’t gonna happen. Developers will continue to say “TDD slows us down”. The more people say that, the fewer people will pick it up and run with it.

So the happy path to “TDD taking over the world of software” requires internal, external and social incentives.

Aligning stars is easier. Or maybe it will just take a very long time. And we need to work at it, doing the hard work, day in, day out.

Actual hard work.

Nah, nobody likes that.

Unit Testing Anti-Pattern: Leaky Mocks and Data

Unit testing anti-pattern: Leaking
This series goes through anti-patterns when writing tests. Yes, there are and will be many. 
TDD without refactoring
Logic in tests
Misleading tests
Not asserting
Code matching
Data transformation
Asserting on not null
Prefixing test names with "test"

Unit tests should be isolated from each other. That means that it doesn’t matter if they run in any specific order, alone or in a group, we expect a consistent result. If there’s a reason for failure in the tests it should be a change in functionality.

However, things get tricky if the code, or the tests have dependencies that we can’t get rid of. The tests are no longer unit tests by definition, but if they are valuable, we want to keep them. We still need to take care of the leaks between tests. The anti-pattern is not doing so.

Leakage is not just a beginner problem. In large organizations, we start testing bigger flows rather than small classes, and “other people” start writing tests for areas of code that were initially written by “us”. Our assumptions of what clean-up means may no longer hold, or worse, may not be known.

When we encounter these symptoms, they usually point to organizational dysfunctions. Conway’s law works in mysterious ways.

Let’s take mocking for example. In isolated unit tests, we create the mock, configure it, and it dies at the end of the test. Then we create another one for the next test. The tests take care of the clean-up.

The “taking care of” is not really the work of the tests. It’s just instance management – if we create the mock inside a test, it will just die at the end of the test. If we create it in the @Before method (or the equivalent, depending on the framework), the instance used in the last test gets overwritten in the current one. In a language like C++, without automatic memory management, we may still see the same effect, although the memory doesn’t just go away.

It seems that we get isolation for free, since we’re not doing anything with the mocks for that isolation we crave.

Don’t get used to those freebies

Let’s say we use Spring for dependency injection. Now, we don’t create the mocks manually – they just appear out of thin air. We assume that isolation is also taken care of, out of thin air. But if the mock is injected once, it carries its expectations and behaviors from previous tests unless it gets reset (e.g., using reset() in Mockito at clean-up time).

Mocks are not the only things that need leakage therapy.

As we’re testing flows with external data – database, cache, files, the registry – we need to make sure that data doesn’t leak to the next tests. We need to clean up after the test has completed.

Using Spring’s @Transactional in tests works, but maybe not enough. What about setup data? That needs to be cleaned before moving to the next test.

And still that may not be good enough.

What if the test crashes mid-way? Or has some unexpected behavior? We need to be sure we clean up correctly in every case. By the way, being sure is not that simple – we need to know how all the code behaves, from now until forever.

The other option is for each test to make sure it cleans everything it needs before starting. That’s about as simple as the other method.

Some of these things can be automated – base classes that contain these behaviors, clean-up scripts, etc. It comes down to “how the team works”. It assumes that the way we write code and tests is known and understood by everyone. This assumption is hardly true in most teams, and it breaks down over large codebases.

When codebases break, so do the tests.

The successful path to overcome these problems is knowledge sharing and practice policies. Design reviews, pair programming, code reviews – the things that help to create a working development process. If everyone finds their own way to clean up the leaks, it pretty much guarantees that someone, somewhere will assume that “it should work like this”. Or worse – copy the method without understanding what stands behind it. And then they start noticing weird test behavior.

Leaks should be stopped, but defining how is just the first step. The next is making these methods commonly used.


Unit Testing Implementation: The Plan

This series deals with the implementation of a unit testing process in a team or across multiple teams in an organization. Posts in the series include:
Goals
Outcomes
Leading Indicators I
Leading Indicators II
Leading Indicators III
Leadership I
Leadership II
The plan

So far, we’ve talked about the process itself, our goals and expectations, what to look for while we’re moving forward, and now it’s time we get to the good stuff.

What does an implementation plan actually look like? A good plan includes these elements:


Training

Remember that when we start, we already have a core team, usually one, who learned the ropes all by themselves. While they can be great ambassadors or mentors, they are usually not trainers. They know what they’ve encountered, and that is usually much less than skilled practitioners and trainers, who’ve seen lots of code and tests.

The other teams, the people who start from scratch, need context, focus and the quickest ramp up in order to get started. In my introductory courses, I introduce tools as well as effective practices of testing – planning, writing, maintaining, working with legacy code, etc. In addition, I expose them to design for testability and TDD. The courses are hands on, so people can practice the different topics.

Environment preparation

Apart from having the tools available on the developer machines, we need a CI server that’s configured to run the tests and report the test run results. We’d also like to have project templates (maven archetypes, makefiles, etc.) available so people won’t need to start from scratch.

All dependencies (libraries, tools, templates, examples) should be available in a central repository. On day one we want people to start committing tests that are run and reported. We don’t want to have them bump into environmental problems and extinguish their motivation.


Coaching sessions

These are sessions (1-2 hours each, tops) where an experienced coach (either external or internal) sits with one or two people and helps them plan test cases, write tests, and review tests for things they are working on. This way we transfer the knowledge of testing, as well as start to create conventions of “this is how we test”. We focus on code that’s being worked on, making it testable and proving it.

If you start out with an external coach, it will only scale up to a point. The idea is to start with a small group that can later become the mentors for new people, in a viral way. The ambassadors from the pilot stage can and should support that process.

Communities of practice

We want to continuously improve the way we test, discuss and share our experiences. As we’ve already discussed, there should be forums for discussing and practicing testing. That means we need scheduled time, when people are encouraged to attend, talk about what they did, and learn from others. Test reviews, refactoring together, learning patterns – these meetings breed stronger developers.

These COP meetings are opportunities to discuss the metrics and goals, and adapt if help is needed. They are engines for learning and improvement. They also send the message from management that testing is important. As time goes by, and fewer coaching sessions are needed, the COP takes over as the main teaching and mentoring tool.

There you have it. In the next posts, I’ll go through a case study of deploying a unit testing plan.