This Isn’t Just an Article. It’s an Interactive RAG Workbench.

I did not want to open this series with a static technical article full of terms, architecture diagrams, and pipeline boxes.

Because the most interesting thing about RAG is not how many diagrams you have seen, or how many new labels you can repeat back. The interesting part is this: when a real question comes in, how does the system actually move, where does it make decisions, where does it stay conservative, and where does it spend extra effort for quality?

So for this first piece, I wanted to invert the usual order.

Not definition first.
Not architecture first.
Not lessons first.

Instead, I wanted to put a real interactive RAG workbench in front of you.

You can ask it a live question.
You can switch query_mode.
You can decide whether this run should turn on workflow, tool routing, or retry loop.

And when you send the request, you will not only get an answer. You will also see:

how the planner rewrote the question
which retrieval path it took
whether workflow actually entered the run
whether tool routing was activated
whether the retry loop really retried
what the citation checks, runtime budget, and observability layers recorded along the way

That is also the attitude I want to establish for this entire series:

Production RAG is not just a system that answers questions. It is a system whose path, constraints, and reasoning surface can be inspected.

Before we judge the answer, we should look at the path

If you think of RAG as “retrieve some documents, send them to a model, and return the answer”, you will hit a limit very quickly.

You might know whether the answer sounds correct.
But you still do not know how the system arrived there.

That becomes a real problem the moment the system starts moving toward production.

You begin to care about questions like:

Was this request running in fast or creative mode?
Why did the planner rewrite the query?
Why was workflow not used?
Why did the retry loop stop without retrying?
Why did citation checking fail even though an answer was still returned?
Why did the runtime budget prevent a more expensive route?

That is where a production RAG system really diverges from a toy one.

What this Part 1 is doing inside the larger series

This is not the overview article, and it is not the foundational teaching article either.

It is better understood as the entry workbench for the entire series.

You can think of it as:

a live demo
a RAG trace viewer
a production controls showcase
a shared reference point that later articles can keep returning to

The later pieces will unpack the design behind these capabilities:

why we need multiple query_mode paths
why workflow should not simply be on by default
when tool routing is actually worth introducing
how retry loop changes both answer quality and latency
why citation check, faithfulness, and runtime budget are not optional if you care about production RAG

But before we get there, I would rather let you watch the system once with your own eyes.

Watch what a real request actually does on its way through.

How to use this workbench

There are three ways to read this page.

1. As a normal conversational interface

Just ask questions and feel the difference between modes.

2. As a runtime control panel

Switch:

query_mode
use_workflow
use_tool_routing
use_retry_loop

Then observe how the request path changes.

3. As a trace viewer

Do not stop at the final answer.
Focus instead on:

how the planner rewrites the question
how many sources retrieval actually returned
whether workflow, tool routing, and retry loop were active
how citation checking and runtime budget shaped the result

If this piece helps you develop an intuition for those layers, then this series has started in the right place.

In the next article, I will return to the more basic — and more often misunderstood — question:

is RAG really still just chunking plus embeddings?

This Isn’t Just an Article. It’s an Interactive RAG Workbench.

This is not just an article, this is a live RAG workbench.

Ask the RAG a real question

Query mode

Advanced Runtime controls

Final answer

Request journey

Planner view