Leading Indicators in Unit Testing Implementation, Part III

This series deals with the implementation of a unit testing process in a team or across multiple teams in an organization. Posts in the series include:
Goals, Outcomes, Leading Indicators I, Leading Indicators II, Leading Indicators III

In the last post we talked about the failing-builds trend as an indicator of how the implementation is going.

The final metric we’ll look at, one that can indicate how our process will go, is also related to broken builds. It is the “time until broken builds are fixed”, or TUBBF. Ok, I made up this acronym. But you can use it anyway.

If we want to make sure the process is implemented effectively, knowing that builds are broken is not enough. When builds break, they need to be fixed.

No surprise there.

Remember that the long-term goal is to have working code, and for that we need people to be attentive and responsive, fixing broken builds quickly. Tracking the TUBBF can help us achieve that goal. We can infer how well people understand the importance of working code by looking at how they treat broken builds.
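To make that concrete, here is a minimal sketch of computing TUBBF from a chronological list of build results. BuildResult is a hypothetical record; most CI servers expose equivalent data (finish time plus pass/fail) through their APIs or reports, so treat this as an illustration rather than a ready-made tool.

    import java.time.Duration;
    import java.time.Instant;
    import java.util.ArrayList;
    import java.util.List;

    public class TubbfCalculator {

        // Hypothetical shape of a build result; map your CI server's data onto it
        record BuildResult(Instant finishedAt, boolean passed) {}

        // Walk the builds in chronological order and measure how long each
        // red streak lasted until the next green build
        static List<Duration> timesUntilFixed(List<BuildResult> buildsInOrder) {
            List<Duration> fixTimes = new ArrayList<>();
            Instant brokenSince = null;
            for (BuildResult build : buildsInOrder) {
                if (!build.passed() && brokenSince == null) {
                    brokenSince = build.finishedAt();       // the build just broke
                } else if (build.passed() && brokenSince != null) {
                    fixTimes.add(Duration.between(brokenSince, build.finishedAt()));
                    brokenSince = null;                     // back to green
                }
            }
            return fixTimes;
        }
    }

Plot these durations over time; if the trend goes down, broken builds are being treated with the urgency we’re after.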

Sharing is caring

One of eXtreme Programming’s principles is shared code ownership, where no single person is the caretaker of any single piece of code. When our process succeeds, we want to see the corollary – everyone is responsible for every piece of code.

With small teams this is easier to achieve. Alas, at scale it becomes harder. Teams usually specialize in their own bits of code, and that conjures the old demon of ownership. With ownership come blame and the traditional passing of the buck.

After all, our CI log says it right there: They broke the build, by committing their code that doesn’t have any resemblance or relation to our code. We can’t and won’t fix it. They broke it. They should fix it.

Then comes the next logical conclusion: if we didn’t break the build, we can continue to safely commit our code. After all, we know our code works, we wrote it.

And so every team blames the other, keeps committing unchecked changes, and the build remains red.

(By the way, maybe they did commit the last bit before the build broke, but that doesn’t mean their changes were at fault. If a build takes a long time, changes are usually collected until it starts, and the build only flags the last commit, although that last one may be innocent.)

Everybody’s Mr. Fix-It

One of the drastic measures we can take is to lock the SCM system when the build breaks. That’ll teach them collective ownership.

But that doesn’t always work. People just continue to work on local copies, believing that somebody else is working relentlessly, even as we speak, on fixing the build.

Another option is to put the training wheels on: train a team to keep the build green, without interference from other teams, by developing on team-owned branches. We track the team’s behavior on their branch, encouraging them to fix the build. They are responsible for keeping the build on their own branch green. Only when branch builds are stable and green is it ok to merge them into trunk.

The worst option, and I’ve seen it many times, is having someone else be the bad cop.

Imagine an IT/DevOps/CI master who starts each day checking all the bad news from the night, tracking down the culprits, and making them – but mostly begging them – to make amends. Apart from not making the teams responsible for their code, it doesn’t stop others from committing, because the process itself is malfunctioning.

As long as we can track the TUBBF in some manner, we can redirect the programmers’ behavior toward a stable build, and teach the responsibility of keeping it green. As we do this, we focus on the importance of shared responsibility, and collect a bonus: working, sometimes even shippable, code.

Leading Indicators in Unit Testing Implementation, Part II

This series deals with the implementation of a unit testing process in a team or across multiple teams in an organization. Posts in the series include:
Goals, Outcomes, Leading Indicators I, Leading Indicators II, Leading Indicators III

Part I was HUGE! Now, let’s look at broken builds. We want to see a decrease in their number over time.

This may sound a bit strange. Our CI is supposed to tell us when the build breaks; that’s its job. Isn’t catching more broken builds a good thing?

Unsurprisingly, the answer is “it depends”.

Since we want the earliest feedback, the answer is “yes, of course”: the CI system serves us well. However, if we don’t see a decrease in broken builds over time, that may mean the CI process is not working effectively. We should investigate.

CI PI

Let’s trace the steps leading to failing builds and see if we can improve our process.

Are all the tests passing locally before the code is committed? If not, we’re integrating code that fails tests into the trunk, and when those tests run in the CI build, they will fail there too. That’s a big no-no. We may even find out that the tests are not run locally at all. These are behaviors we’d want to improve.

If tests do run and pass locally before the code is committed, yet builds still break, that may point to issues of isolation. Tests that depend on resources available in the local environment find them there and pass. On the CI server, those resources are missing, and the tests fail. More broken builds indicate the team has not yet learned how to write isolated tests.
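Here is what the difference typically looks like, sketched with JUnit and hypothetical CustomerParser and Customer classes:

    import static org.junit.Assert.assertEquals;

    import java.io.StringReader;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.List;
    import org.junit.Test;

    public class CustomerParserTest {

        // Not isolated: passes on the developer's machine because the file
        // happens to exist there, and fails on the CI server where it doesn't
        @Test
        public void parsesCustomersFromLocalFile() throws Exception {
            List<Customer> customers = new CustomerParser()
                    .parse(Files.newBufferedReader(Paths.get("C:/data/customers.csv")));
            assertEquals(3, customers.size());
        }

        // Isolated: the input is part of the test, so it behaves the same
        // everywhere - locally and on the CI server
        @Test
        public void parsesCustomersFromInlineInput() {
            String input = "id,name\n1,Ada\n2,Linus\n3,Grace";
            List<Customer> customers = new CustomerParser().parse(new StringReader(input));
            assertEquals(3, customers.size());
        }
    }

The second test gives the same answer on every machine, which is exactly what makes a failure on the CI server worth trusting.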

There might even be a bigger issue lurking.

Trust and accelerate feedback

We want to trust the test results from the CI environment, but since the tests “work on our machine” and not on the CI server, they just got a trust downgrade. This can have a weird counter-effect on the way we run them.

Since the results we trust come from the CI, and local runs produce confusing results, we may stop running tests locally altogether, and instead run them on the CI server, making sure they run correctly there. When we do that, we make our feedback cycle longer. More importantly, we risk the tests failing for the right reason, but holding the rest of the team hostage until they are fixed.

To get the right feedback early, we need to get back to running tests locally.

We want to increase the number of isolated tests, so they can be run locally and trusted when they fail on the CI server. Isolated unit or integration tests failing before the commit are the first line of defense.

Then we want to be able to run the non-isolated tests, either locally or in as clean an environment as we can manage. The point is not to commit code until we trust it. This may require changing the available environments, modifying the tests to ensure cleanliness, pre-commit integration, or any combination of those.

Can you believe all these improvement opportunities come from a single indicator? The deeper we dig and the more questions we ask, the more opportunities we find for improving the process as a whole.

We’re not done yet.

Implementing Unit Testing – Leading Indicators (part 1)

This series deals with the implementation of a unit testing process in a team or across multiple teams in an organization. Posts in the series include:
Goals, Outcomes, Leading Indicators I, Leading Indicators II, Leading Indicators III

Now that we’ve talked about what we want to achieve, we better spread out some sensors. After all, we’re about to embark on a long and winding road. We want to know if we’re still going the right way, and if not, make a turn.

Leading indicators raise the red flag before the worst has happened. In our case, we don’t want to check in after six months and discover that no one’s written any tests for the last three months. We’d like to know sooner.

So what should we track?

If you’re thinking “coverage”, hold your horses. Coverage by itself is not a good progression metric. We sometimes confuse ease of measurement with effectiveness, and coverage is one of those cases.

Coverage as a number doesn’t have context – are we covering the right/risky/complex things? Are we exercising code but not asserting (my favorite way to game the metric)? Are we testing auto-generated code? Without the context, the coverage number has no applicable meaning.

Before we mandate tracking coverage for the entire codebase, or set a coverage goal, remember that while it is easy to count exercised code lines, the number doesn’t mean anything by itself.

So what do we look for?

The first metric to start tracking is the simplest one: the number of tests. (Actually two: tests written and tests run.)

Our first indicator of whether people are writing tests is to count the tests. For that, we need to require a convention for test location, naming, etc. Guess what? These are also needed to run the tests as part of a CI build. Once everything is set up, we can count the tests where they count (pun!).
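For example, if the CI build produces Surefire-style XML reports (as a typical Maven/JUnit build does), a small sketch like this could sum up the test count for each build. The report location and format here are assumptions; adapt them to whatever your build actually produces.

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.stream.Stream;
    import javax.xml.parsers.DocumentBuilderFactory;

    public class TestCounter {

        public static void main(String[] args) throws Exception {
            // Default to Surefire's report directory; pass another path if needed
            Path reportsDir = Paths.get(args.length > 0 ? args[0] : "target/surefire-reports");
            try (Stream<Path> files = Files.list(reportsDir)) {
                int total = files
                        .filter(p -> p.getFileName().toString().matches("TEST-.*\\.xml"))
                        .mapToInt(TestCounter::testsIn)
                        .sum();
                System.out.println("Total tests: " + total);
            }
        }

        // Each report's root element carries a "tests" attribute with the count
        private static int testsIn(Path report) {
            try {
                return Integer.parseInt(DocumentBuilderFactory.newInstance()
                        .newDocumentBuilder()
                        .parse(report.toFile())
                        .getDocumentElement()
                        .getAttribute("tests"));
            } catch (Exception e) {
                return 0;   // an unreadable report shouldn't fail the count
            }
        }
    }

Record that number per build, and the trend practically draws itself.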

Then we want to look at the trend over time. The number should go up, as people add more tests. If the trend flatlines, we need to investigate.

One thing about this metric: if you never see a drop in the number of tests, something’s wrong. That probably means the tests just stay there, and are not getting re-reviewed, replaced, or deleted. In the short term, starting out, we want to see an upward trend. But over the long haul, code changes, and so do the tests. We want to see at least some fluctuation.

So what about that coverage?

Ok, we can measure coverage. We get that out of the box, right?

But we need to know what we’re measuring. Coverage means executed lines of code. So we can look at coverage (or the lack of it) as an indicator in a context we care about.

That could be any of the following:

  • Important flows in the system
  • Buggy code
  • Code we return to over and over
  • New component we want to cover

Or any other interesting bit. When we measure coverage, we want to see a trend of increasing coverage over these areas.

Now, how do you manage that? That requires more tricks, since we want to make sure we measure the right code, and the right tests. If the code architecture already supports it, it’s easy: 75% of a library for example.

If, however, you want to measure coverage of a set of classes, excluding the other parts of the library, that requires more handling and management. Usually people don’t go there.

The funny thing is, the more specific you want to get, the less the regular tools help. And the broader numbers lose meaning.

By the way, the coverage metric should go away once you get to sufficient coverage. Once again, it’s not about the number, but about what we want to achieve – stability over time, or regression coverage. We can stop measuring then (and maybe look at other areas).

Ok, we’ll continue the indicators discussion in the next post.


Over Exposure and Privacy Issues


This conversation comes up in every training I do about testing. I know exactly when it will happen. I know what I’m going to say, what they will say, and I know no one comes out completely convinced.

Then violence ensues.

We want to test some code. However, this code started being written in the age of the dinosaurs, its size resembles some of the big ones, and the risk of changing it is the risk of feeding a Jurassic beast.

There’s no point ignoring it: we’ll need to modify the code in order to be able to test it. In fact, what’s really bothering us is that some of the information in there needs to be accessible. I suggest changing the visibility of some members.

This escalates quickly

Curious person: “What do you mean, make it public?”
Me: You know, make it public. Or add a method that will enable us to set up the class easily, so we can test it.
Curious person: “But that means breaking encapsulation”.
Me: Might be. Is that wrong?
Mildly suspicious person: “Well, the design is there for a reason. The member is private because it needs to be private, we can’t expose it”.
Me: Remember that tests are first-class users of the code? That means that the design can change to support them too.
Mildly angry person: “But you can’t expose everything, what if somebody calls it?”
Me: That’s the point, that the test can call it.
Angry person: “But you can’t ensure that only tests will call it. At some point, someone will call it and create hell on earth. We’ve all been there”.
Mildly angry Me: Yes, it might happen. We can’t predict and plan for everything; software is a risky business.
Very angry person: “THESE RULES WERE WRITTEN IN BLOOD!”

The design perspective

Let’s ignore the fact that you’re defending a horrible untested design, with the argument that changing the accessibility of one member will make it so much worse.

Encapsulation is a great concept. Having public and private members makes perfect sense. Some things we want to expose, others we want to hide.

But the water becomes murky very quickly. If we were talking black and white, only private and public, that would be ok. But for those special times, when private is just not enough…

There’s “friend” (C++).
Or “internal” (.net).
Or implicit package visibility (Java).
Not to mention “protected”.
Or reflection that bypasses everything.

It’s like we’re breaking encapsulation, but not really breaking it, so we feel better.

Let’s face it, design can serve more than one purpose, and more than one client in different ways. The language giving us tools doesn’t solve all the problems. That’s the cool thing about software, it can be molded to fit what we need it to do.

That involves living with not-so-perfect designs. It also means “changing the code just for the tests” is ok, because that’s another good purpose.
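In Java, for instance, the compromise often looks something like this. It’s a minimal sketch; PriceCalculator, its discount map and the package-private seam are all hypothetical:

    import static org.junit.Assert.assertEquals;

    import java.util.HashMap;
    import java.util.Map;
    import org.junit.Test;

    // Production class: the discount table used to be reachable only through
    // private members. Relaxing one method to package-private (no modifier)
    // lets a test in the same package set the class up, without making it
    // public to the whole world.
    class PriceCalculator {
        private final Map<String, Double> discounts = new HashMap<>();

        // package-private seam, added for the tests
        void setDiscount(String product, double discount) {
            discounts.put(product, discount);
        }

        public double priceFor(String product, double basePrice) {
            return basePrice * (1 - discounts.getOrDefault(product, 0.0));
        }
    }

    // The test lives in the same package, so it can use the seam
    public class PriceCalculatorTest {
        @Test
        public void appliesConfiguredDiscount() {
            PriceCalculator calculator = new PriceCalculator();
            calculator.setDiscount("book", 0.1);

            assertEquals(90.0, calculator.priceFor("book", 100.0), 0.001);
        }
    }

Production code in other packages still can’t touch setDiscount; the design bent just enough to serve its second client, the test.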

BUT SOMEBODY WILL CALL IT AND PEOPLE WILL DIE

Yes, somebody might call it.

That somebody might also look at the code and decide to change it, or add an accessor themselves.
It’s silly, not to mention risky, to think that a language keyword is the only thing standing between us and a disaster.

In C, everything is public; how do you prevent the “someone might call it” disaster waiting to happen there? Playing with .h files and custom linking can be a lot riskier.

We do it in the way that’s always more effective than the code itself: processes that involve humans, rather than relying on tools.

We need to understand the purpose of code – to create value for someone. If the code is beautiful but cannot be tested, you don’t get points for using the “right” encapsulation.

The value comes when functionality works. In order to know it works, we need to check it. If we’ve discussed the risks involved and decided the value comes from testing, that usually outweighs the risk of “someone might do something bad with the code” (after many have abused it already).

And if you feel this risk is not negligible – do something about it. Do code reviews, document and share the knowledge, create architecture guidelines.

But don’t rely on a language feature as the first resort.

Unit Testing Anti-Pattern: Prefixing Test names With “test”

This series goes through anti-patterns when writing tests. Yes, there are and will be many. 
TDD without refactoring, Logic in tests, Misleading tests, Not asserting, Code matching, Data transformation, Asserting on not null, Prefixing test names with "test"

History teaches us that old habits are hard to break. Much like the proverbial monkeys who haven’t been there for the first electric shock, but still wouldn’t get near the banana, the habits are even harder to break because “that’s what we do here, we can’t explain why”.

People are still writing tests that contain “test” in the name, and you know why? Because that’s how they do it here. And it makes sense, right? It’s a test!

Which begs the question: we know it’s a test, but how does a test framework know it’s a test? What makes a test different than any other method?

The olden days

In Java, when JUnit started to take shape (and later in .net and other reflection-supporting languages), before annotations were supported, the test framework needed to identify a test in some way. Up to JUnit 3, that way was to start the test method name with “test”. Convention over configuration. Beautiful.

Since JUnit 4, which took advantage of Java annotations (@Test), the requirement for a test name to start with “test” was no longer there. In NUnit and other .net frameworks, attributes played that part.
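To make the difference concrete, here is a minimal sketch; Calculator is a hypothetical class, and the two snippets represent two separate test classes:

    // JUnit 3 style: the framework finds tests by reflection, looking for
    // public methods whose names start with "test"
    import junit.framework.TestCase;

    public class CalculatorJUnit3Test extends TestCase {
        public void testAdd() {
            assertEquals(5, new Calculator().add(2, 3));
        }
    }

    // JUnit 4 and later: the @Test annotation marks the test method,
    // so the name is free to describe the behavior instead
    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    public class CalculatorJUnit4Test {
        @Test
        public void addingTwoPositiveNumbers_returnsTheirSum() {
            assertEquals(5, new Calculator().add(2, 3));
        }
    }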

In other languages that didn’t identify tests using reflection, things got more interesting.

Identifying the tests using reflection saved the need for test registration. If you look at many older C/C++ frameworks, you’ll see the registration part. In CppUnit, for example, we need to manually register each test in the test suite.

There was no way the framework could differentiate between a test and any other function automatically. In more modern frameworks, the registration is done for you by macros, so there’s no need to go find tests and tag them properly.

By the way, another advantage of prefixing tests with “test” was that it helped with collecting and identifying the tests for registration. That’s not needed anymore either, thanks to the automatic registration done by macros.

To summarize, with modern frameworks there is no need for the identification of tests by prefixing them.

Furthermore, if locating tests is your issue, modern project conventions help with that more than a name can. Maven projects, for example, tell you where to place and find the tests. The project’s file structure has much more to do with identifying and running tests than anything else.

And so ends our history lesson, and you probably want to ask, again: Then why don’t people stop doing that?

Apart from the “that’s what we do here” part, I mean. They don’t have a good explanation.

Still, there’s a good reason why you should drop the “test” prefix.

Visual noise

Code is complex enough already. Adding more code simply distracts us from understanding it. It’s true with big classes and methods, and it is also true with tests.

I always encourage writing descriptive test names (and for regular methods too). That means tests get long names, which is ok if they carry the information that describes what the test checks, and in what context.

Obviously, having the word “test” doesn’t help me understand it better. It’s just noise, a distraction from the important bits.

This is just low noise. Annoying, but bearable.

But when you look at a CI report and everything starts with “test”, it’s no longer low volume. It takes up a big portion of the screen and of the information. Again, everything in the test part of the CI report is obviously a test. The noise is amplified, and it distracts us further from identifying sibling tests and their results, looking for specific tests, and so on.

In a nutshell: prefixing tests with “test” doesn’t help anyone (unless maybe you’re dealing with legacy test frameworks), and can even cause delays when you’re trying to solve problems with failed tests.

Just don’t do it.

Unit testing Anti-pattern – Not Asserting

This series goes through anti-patterns when writing tests. Yes, there are and will be many. 
TDD without refactoring, Logic in tests, Misleading tests, Not asserting, Code matching, Data transformation, Asserting on not null, Prefixing test names with "test"

You’d think that we don’t need this kind of post in our day and age, but I still see tests written without the assert part.

This pattern, as you’d expect, appears with beginners. However, when I ask, they usually know that they should assert.

So why do they still write tests like that?

A test without an assert (containing only an action, and possibly some setup) only makes sure that the code runs without throwing an exception. In any other case, the test passes.
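It usually looks something like this – a sketch with hypothetical OrderProcessor and Order classes, assuming JUnit:

    import org.junit.Test;

    public class OrderProcessorTest {

        // Setup and an action, but no assert: the test can only fail
        // if an exception is thrown somewhere along the way
        @Test
        public void processOrder() {
            OrderProcessor processor = new OrderProcessor();
            Order order = new Order("book", 2);

            processor.process(order);   // no assert - we only know it didn't blow up
        }
    }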

Crashing isn’t everything

I think that we can agree that knowing the code doesn’t crash is valuable.

In fact, if we expand the unit test into a wider scope (like an integration or an end-to-end test), it tells us that not only does the functionality not blow up, but the wiring, configuration and the whole path don’t blow up either. This is useful to know.

And usually this is where tests without asserts start.

We’re talking about beginners. They use their new found execution engine (read: unit test framework) for running a process, and see that it doesn’t break. The new tool helps with identifying that.

There is a deeper issue, though.

Beginners who start unit testing usually don’t immediately rely on the tests as their only proof of working code. So the tests are written after the code has already been proven to work by running the application, or with some other tool.

The process is: write the code, check that it works, and then write the test. When we write the test, we already know the code works. Now we have a test for it, one that doesn’t crash.

“Ok, boss, here’s the test you wanted me to write, and it proves that my code works, happy?”

Not only is the test writing extra work, now we need the assertion on top of it. Even more work.

You know what comes after the boss looks at the code, and mentions that the test doesn’t have an assert? Asserting on not-null. It’s the easiest to add.

And this is how we get working tests without assertions that are considered “ok”.

The right way

Much like with asserting on a non-null return value, we can do better. While there is value in that kind of test, we get more value from a specific assert that tells us where the problem is.

The value of the assertion is that it specifies the behavior we expect. Experienced developers start with how they will check the behavior, before setting up the test.

Even if it’s just one side effect of the operation, the assertion tells us what to look for, and what code caused that effect. Using triangulation we can pinpoint the problem and fix it quickly.
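Continuing the earlier hypothetical OrderProcessor sketch, a specific assertion might look like this (OrderStatus is also made up for the example):

    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    public class OrderProcessorTest {

        @Test
        public void processingAnOrderMarksItAsShipped() {
            OrderProcessor processor = new OrderProcessor();
            Order order = new Order("book", 2);

            processor.process(order);

            // The assert names the one effect we expect, so a failure
            // points straight at the behavior that didn't happen
            assertEquals(OrderStatus.SHIPPED, order.getStatus());
        }
    }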

If we write the assertion, we can learn that the code needs to expose information for the tests, and possibly for other clients. It can help make the code more usable.

If tests don’t have an assert, or if the assertion is on non-null, ask “how do you know it works?” in the context of the test. Once this question is answered, you can craft the assert accordingly.


Unit Testing Anti-pattern – Asserting on Not Null

This series goes through anti-patterns when writing tests. Yes, there are and will be many. 
TDD without refactoring, Logic in tests, Misleading tests, Not asserting, Code matching, Data transformation, Asserting on not null, Prefixing test names with "test"

Catchy little title, right?

Once people understand they actually need to write an assert, this pattern appears almost immediately. (And for those who think: Who doesn’t write an assert, wait for the next post.)

Consider a test along these lines (Converter and Report here are hypothetical stand-ins for the code under test):
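    import static org.junit.Assert.assertNotNull;
    import org.junit.Test;

    public class ConverterTest {

        @Test
        public void convert() {
            Converter converter = new Converter();

            // Converter and Report are hypothetical; the point is the assert
            Report result = converter.convert("input.txt");

            assertNotNull(result);
        }
    }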

The conversion works, obviously. The test proves it.

First, let’s discuss the acceptance criteria. Is a null result indeed a sign of failure?

Because if it’s not, and the code always returns a non-null value (for example, when using the Null Object pattern), then we have ourselves a test that always passes, and that’s not of much value.

But for the sake of argument, let’s assume that null means failure.

So it’s ok to check for its non-null-ness, right?

Yes.

And, also not really interesting.

Because the client code is probably more interested in the returned value itself than in its non-null-ness. It will probably do something with it, like this (continuing the hypothetical Converter sketch):
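    // Typical client code doesn't stop at "not null" - it uses the value
    Report report = converter.convert("input.txt");
    String summary = report.getSummary();   // hypothetical usage of the result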

Because that’s what we do with results of functions, we use them for something.
Now, if the client code is going to use that value, why not the test? The test is a primary user of the code, just like production code. Then why not write the test like this (again with the hypothetical Converter, and a made-up expected summary):
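    import static org.junit.Assert.assertEquals;
    import org.junit.Test;

    public class ConverterTest {

        @Test
        public void convertProducesAReadableSummary() {
            Converter converter = new Converter();

            Report report = converter.convert("input.txt");

            // Assert on what the client actually cares about, not just non-null
            assertEquals("3 records converted", report.getSummary());
        }
    }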

Well, sure, that’s better.

But what if it actually returns null?

If the result is null, we’ll get a nice NullPointerException (Java here, but it depends on your language of choice), and that exception will point to the exact line the assertNotNull would have pointed to. It’s like magic.

So we didn’t lose information; instead, we gained confidence that the way production code is actually going to use our object works.

My guess – we’re better off.