TEREO, a proof layer that shows the truth of change
When AI changes code, TEREO checks whether the change is really better and keeps only the gain that beats the noise.
TEREO is a proof layer for AI coding that pins down one small promise at a time, resists goal drift, and leaves receipts so only gain that beats noise becomes the next baseline.
- TEREO pins down the scope, promise, check, and baseline of a round before the edits begin.
- When a new problem appears, it asks whether that problem makes the current promise false before widening the task.
- A receipt records what was tested and why the verdict became keep, hold, or drop.
When AI changes code, TEREO checks if that change is really better.
keep only if gain > noise
Repo: https://github.com/kim-woojoo/tereo
What does this do?
Let me explain it in a simple way.
In normal AI coding, AI changes code, runs it again, and changes it again.
- The main agent edits the code.
- A sub-agent reviews it again.
- A harness or another verification layer may check it one more time.
The problem is this:
If there is no fixed standard in this process, the result may look good, but it is hard to know if it is really better.
As a result, several kinds of errors happen.
User error
For example, let’s say you made project A with many kinds of code in it.
The code inside the project has different roles.
- Some code is UI.
- Some code is discount calculation.
- Some code is payment.
- Some code is data flow.
- Some code is testing.
A skilled developer can read this structure and flow fairly fast.
But vibe coders or non-developers often cannot.
So every time they try to change something, problems like this happen:
- Where should I start?
- Which files are connected?
- What should I not touch?
- Is this change really going in the right direction?
Verification error
For example, think about an online shop project.
The user wants to improve the 10% coupon feature.
On the surface, it looks simple.
“Isn’t it enough if the 10% discount works?”
But in real life, it is not that simple.
For example, this can happen:
- The 10% coupon discount works.
- But the total price calculation breaks and becomes negative.
- Or the whole checkout flow breaks.
In this case, it looks like part of it worked, but in reality it is a false improvement.
In other words:
- If you only look at the one thing that improved, it looks like success.
- If you look at the core behavior of the whole system, it is failure.
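To make the false improvement concrete, here is a minimal Python sketch. The function `checkout_total` and the `store_credit` parameter are hypothetical; they are not from any real project and only stand in for "some other part of checkout". The narrow coupon check passes, while the check on core behavior fails.

```python
from decimal import Decimal


def checkout_total(item_price: Decimal, coupon_rate: Decimal, store_credit: Decimal) -> Decimal:
    """Hypothetical checkout total after an AI edit (illustration only)."""
    discounted = item_price * (Decimal("1") - coupon_rate)
    # Bug left behind by the edit: the total is never clamped at zero.
    return discounted - store_credit


def coupon_check() -> bool:
    # The one thing that was supposed to improve: $10 with a 10% coupon is $9.
    return checkout_total(Decimal("10"), Decimal("0.10"), Decimal("0")) == Decimal("9")


def core_check() -> bool:
    # Core behavior of the whole flow: a total must never be negative.
    return checkout_total(Decimal("10"), Decimal("0.10"), Decimal("15")) >= Decimal("0")


print("coupon check passes:", coupon_check())
print("core check passes:", core_check())
```

If only `coupon_check` is run, the change looks like a success. Running `core_check` next to it is what exposes the false improvement.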
Goal drift
When AI edits code, it keeps finding new problems.
For example:
- “Oh? The tax rounding looks strange too.”
- “Oh? The free shipping condition is a little off too.”
Old AI workflows easily make a compromise here.
- “Then let’s check tax too, not just the coupon.”
- “Let’s make the scope a little bigger and fix it all at once.”
- “We found the problem, so let’s fix it together.”
Then the win condition changes in the middle.
And when that happens, it becomes unclear if the original problem was really solved.
How TEREO solves this
TEREO solves these problems in a clear way.
First, let’s say we are working on project A.
TEREO looks through the whole codebase, has you choose one small topic to work on now, and first pins down a small scope for that topic.
In other words, instead of trying to fix everything at once, it makes you decide only the small truth for this round.
And it does not only ask:
“Did it get better?”
It also asks:
“Is there any core breakage that makes that improvement false?”
Also, just because a new problem is found, it does not move the standard.
It closes the current truth first, and pushes the next truth to the next round.
Let me explain that in more detail.
Example: project A
Let’s use an online shop project as an example for project A.
Here, what the user wants is “improve the coupon discount.”
TEREO finds the code related to coupons in the whole codebase.
- cart.py
- discount.py
- a few related tests
This is the scope.
So this group of code becomes topic A.
Then, for topic A, it sets the goal: what must be made true in this round.
This is the promise.
For example:
- The item price is $10.
- If a 10% coupon is applied, the total should be $9.
Then it sets the rule or command that will judge that goal.
This is the check.
For example:
- Check if the coupon result is $9.
- Check if the total is not negative.
- Check if the old payment tests still pass.
Now run this check on the current code.
Result:
- The current coupon test fails.
- The checkout core is not broken in this process.
- A negative total appears.
This result becomes the current state.
That is the baseline.
So this baseline is fixed as the current state of topic A.
To sum it up:
- promise = What must be true? In other words, the goal.
- check = How will we test it? In other words, the rule for judgment.
- baseline = The current starting state. In other words, where we are now.
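The relationship between scope, promise, check, and baseline can be sketched in Python. This is only an illustration under assumptions: the `Round` class, `record_baseline`, and the buggy `current_total` function are hypothetical names, not TEREO's actual data model. The bug is chosen so the baseline matches the example above: the coupon test fails, a negative total appears, and the old behavior without a coupon still works.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Dict, Tuple


def current_total(price: Decimal, coupon_percent: Decimal) -> Decimal:
    # Hypothetical current code: forgets to divide the percentage by 100.
    return price - price * coupon_percent


@dataclass(frozen=True)
class Round:
    scope: Tuple[str, ...]                 # which code belongs to this topic
    promise: str                           # what must be true
    checks: Dict[str, Callable[[], bool]]  # how it is tested


def record_baseline(rnd: Round) -> Dict[str, bool]:
    """Run every check once on the current code and store the results."""
    return {name: check() for name, check in rnd.checks.items()}


TEN = Decimal("10")
round_1 = Round(
    scope=("cart.py", "discount.py", "related tests"),
    promise="A $10 item with a 10% coupon totals $9.",
    checks={
        "coupon_result_is_9": lambda: current_total(TEN, TEN) == Decimal("9"),
        "total_not_negative": lambda: current_total(TEN, TEN) >= 0,
        "no_coupon_total_unchanged": lambda: current_total(TEN, Decimal("0")) == TEN,
    },
)

baseline = record_baseline(round_1)
print(baseline)
```

Making `Round` frozen is a deliberate choice in this sketch: once the round starts, the promise and the checks cannot be reassigned by accident.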
And now comes the key part.
The TEREO round 1 loop
Now that the baseline is set, it is time to start editing.
This is where the TEREO round 1 loop starts.
The goals are:
- Make the current coupon test pass.
- Remove the negative total.
- Keep the checkout core safe during the change.
Changes are made in a safe way.
If possible, change one file, or one small group of files, at a time.
After the change, run the check.
What is the result?
- No negative total. Success.
- Checkout core stays safe during the change. Success.
- But the coupon test still fails.
Then edit the file related to the failed coupon test, for example coupon.py, and fix the discount logic.
Run the check again.
Repeat until all three goals succeed.
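The edit-and-check loop above can be sketched like this. Everything here is hypothetical: `attempt_1` and `attempt_2` stand in for two successive AI edits, and `run_checks` is an assumed helper, not a TEREO command. The important point is that the same checks run unchanged after every edit.

```python
from decimal import Decimal
from typing import Callable, Dict, List, Optional

TotalFn = Callable[[Decimal, Decimal], Decimal]
TEN = Decimal("10")


def run_checks(total: TotalFn) -> Dict[str, bool]:
    """The fixed checks for this round; they do not change between edits."""
    return {
        "coupon_result_is_9": total(TEN, TEN) == Decimal("9"),
        "total_not_negative": total(TEN, TEN) >= 0,
        "no_coupon_total_unchanged": total(TEN, Decimal("0")) == TEN,
    }


def attempt_1(price: Decimal, coupon_percent: Decimal) -> Decimal:
    # First edit: clamps the total at zero, but the discount math is still wrong.
    return max(price - price * coupon_percent, Decimal("0"))


def attempt_2(price: Decimal, coupon_percent: Decimal) -> Decimal:
    # Second edit: also divides the percentage by 100.
    return max(price - price * coupon_percent / Decimal("100"), Decimal("0"))


def run_round(attempts: List[TotalFn]) -> Optional[int]:
    """Return the number of the first attempt that passes every check, if any."""
    for number, attempt in enumerate(attempts, start=1):
        results = run_checks(attempt)
        print(f"attempt {number}: {results}")
        if all(results.values()):
            return number
    return None


passing_attempt = run_round([attempt_1, attempt_2])
```

In this sketch the first attempt removes the negative total but still fails the coupon check, and the second attempt passes everything.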
And if they succeed?
This is where the judge works.
- Is the change too weak or unclear? Then hold.
- Is there no real improvement compared to the old code? Then drop.
- Is the improvement clear? Then keep.
How is the verdict recorded?
Leave a receipt.
Then that kept receipt becomes the new baseline.
That is the end of the TEREO round 1 loop.
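One way to picture the keep, hold, or drop decision is the sketch below. It rests on assumptions that are not from TEREO itself: the metric is "higher is better", each side is measured several times, and noise is taken as the larger sample standard deviation of the two sets of runs. TEREO's own judge may define gain and noise differently.

```python
from statistics import mean, stdev
from typing import Sequence


def judge(before: Sequence[float], after: Sequence[float]) -> str:
    """Hypothetical judge: compare repeated measurements before and after a change."""
    gain = mean(after) - mean(before)
    # Assumption: noise is the larger run-to-run spread (needs at least 2 runs each).
    noise = max(stdev(before), stdev(after))
    if gain <= 0:
        return "drop"  # no real improvement compared to the old code
    if gain > noise:
        return "keep"  # the improvement clearly beats the noise
    return "hold"      # positive, but too weak to tell apart from noise


baseline_runs = [0.70, 0.72, 0.71]
print(judge(baseline_runs, [0.80, 0.81, 0.79]))
print(judge(baseline_runs, [0.71, 0.72, 0.71]))
print(judge(baseline_runs, [0.70, 0.70, 0.69]))
```

With these made-up numbers, the three calls give keep, hold, and drop in that order.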
Let me explain this result a little more.
What the judge and receipt do
The judge is, simply, the part that tests if this change is really a good change.
Using the promise and the baseline as the standard, it checks many conditions and decides:
- keep
- hold
- drop
Why leave a receipt?
Because later, when you change code again, it becomes very useful evidence.
A receipt stores things like this:
- scope
- promise
- baseline.id
- check
- before/after metric
- evidence
- verdict
- why
- confidence
- win_probability
- trace
In simple words:
- What was the goal?
- What was tested?
- Why did it become keep, hold, or drop?
So a receipt is:
- evidence for deciding what to trust
- and a record of why that decision was made
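As a rough picture, a receipt could be stored as a small record like the one below. This is not TEREO's actual schema: the field types, the `baseline_id` spelling (Python names cannot contain a dot), the check command, and every value are assumptions made up for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Receipt:
    scope: List[str]
    promise: str
    baseline_id: str          # stands in for "baseline.id"
    check: str
    metric_before: float      # the "before" side of the before/after metric
    metric_after: float       # the "after" side
    evidence: str
    verdict: str              # "keep", "hold", or "drop"
    why: str
    confidence: float
    win_probability: float
    trace: List[str] = field(default_factory=list)


# All values below are made-up example data.
receipt = Receipt(
    scope=["cart.py", "discount.py", "related tests"],
    promise="A $10 item with a 10% coupon totals $9.",
    baseline_id="round-1-baseline",
    check="run the coupon, non-negative total, and old payment tests",
    metric_before=1 / 3,      # share of checks passing before the change
    metric_after=1.0,         # share of checks passing after the change
    evidence="all three checks pass on the changed code",
    verdict="keep",
    why="coupon total is $9, no negative total, checkout core unchanged",
    confidence=0.9,
    win_probability=0.9,
    trace=["edit 1: clamp total", "edit 2: fix discount math"],
)

receipt_json = json.dumps(asdict(receipt), indent=2)
print(receipt_json)
```

Writing the receipt as plain JSON keeps it readable for both a person and the next round.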
This matters because it stops AI from quietly changing the standard and making excuses for itself.
If this part makes sense, here comes the important part.
When a new problem appears
Suppose that while fixing the discount logic in coupon.py, a new problem appears:
- “Oh? The tax rounding looks strange too.”
- “Oh? The free shipping condition is a little off too.”
The problem with old AI workflows is this:
When they find a new problem, they merge the original problem and the new problem into one task.
For example:
- The original goal was to fix the 10% coupon discount.
- But during the fix, a tax rounding problem is found.
Then old AI workflows say:
- “Then let’s fix tax too.”
- “Let’s fix free shipping too this time.”
- “Let’s clean up the whole flow at once.”
On the surface, that looks smart.
But in reality, it is dangerous.
Why this is dangerous
- The original goal becomes blurry.
- The success condition changes in the middle.
- It becomes unclear what really improved.
- While fixing the new problem, the old problem may stay half-fixed and unclear.
In the end, what remains is only this:
- “A lot changed.”
But what does not remain is this:
- How it changed
And the biggest problem is self-justification.
You cannot clearly tell:
- Was it really solved?
- Or was it just covered up?
The smarter the AI gets, the more dangerous this becomes.
TEREO goes straight into this problem.
The round 1 goal stays the same until the end:
“Was the coupon bug fixed?”
Why?
Because it was locked in as the promise.
Then what about the tax rounding problem?
The judge asks this:
“Does this tax problem make the coupon success false?”
- If the answer is Yes, it stays in the same round and gets fixed there.
- If the answer is No, it is written as the next topic, topic B.
For example, if the answer is No, round 1 ends successfully.
Then round 2 starts with topic B:
“tax rounding”
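The routing rule for new problems can be sketched in a few lines. The names `Finding`, `RoundState`, and `triage` are hypothetical, and the yes or no answer to "does this make the promise false?" is supplied from outside; this sketch does not try to compute it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Finding:
    description: str
    falsifies_promise: bool  # answer to: "does this make the current promise false?"


@dataclass
class RoundState:
    promise: str
    in_round: List[str] = field(default_factory=list)     # must be fixed now
    next_topics: List[str] = field(default_factory=list)  # pushed to later rounds


def triage(state: RoundState, finding: Finding) -> None:
    """Keep the finding in this round only if it breaks the current promise."""
    if finding.falsifies_promise:
        state.in_round.append(finding.description)
    else:
        state.next_topics.append(finding.description)


state = RoundState(promise="A $10 item with a 10% coupon totals $9.")
triage(state, Finding("tax rounding looks strange", falsifies_promise=False))
triage(state, Finding("free shipping condition is off", falsifies_promise=False))

print("fix in this round:", state.in_round)
print("next topics:", state.next_topics)
```

Note that the promise itself is never touched by `triage`. A new finding can add work to the round or to the queue, but it cannot move the standard.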
When you change one file at a time, the result may be small:
- 2%
- 1.5%
If you look only once, it can feel like:
- “Maybe it got better.”
- “Maybe not.”
But if you look across many rounds:
- round 1
- round 2
- round 3
- …
then the bigger result starts to show.
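A small calculation shows how small kept gains add up. It assumes, as a simplification, that each kept gain applies to the same metric and compounds on top of the previous baseline; real gains may not combine this cleanly.

```python
from math import prod
from typing import Sequence


def cumulative_gain(gains: Sequence[float]) -> float:
    """Total relative gain when each round's gain builds on the last baseline."""
    return prod(1 + g for g in gains) - 1


two_rounds = cumulative_gain([0.02, 0.015])
ten_rounds = cumulative_gain([0.02] * 10)

print(f"after 2 rounds: {two_rounds:.2%}")
print(f"after 10 rounds of 2%: {ten_rounds:.2%}")
```

Under this assumption, the 2% and 1.5% rounds together give about 3.53%, and ten kept rounds of 2% give about 21.90%.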
Closing
TEREO’s rule is simple.
- Fix one small goal.
- Judge it again and again by the same standard.
- Keep only the real gain that beats the noise, and leave that as the next reference point.
In other words:
keep only if gain > noise
It can work in any area where improvement is possible.
The really sad thing is that TEREO has one condition:
- Computer required
If that one condition is met, you can attach it anywhere.
- many projects
- many kinds of work
- many experiments
- many automations
- many hobbies
Will TEREO become useless as AI keeps getting better?
No.
It will become even more needed.
As AI gets stronger, it makes more changes, faster, and gives more plausible reasons.
So the really important question is no longer:
“Can it change things?”
It becomes:
“What should remain?”
That is where TEREO becomes more important.
Because among the many changes made by AI, it becomes the proof layer that keeps only the real improvements that truly lead to results.
Now I am curious about your case.
- What project did you attach it to?
- What kind of work did you use it for?
- What hobby or experiment did you apply it to?
- Which changes were kept?
- Which changes were dropped?
Please share.
As more cases build up, TEREO can grow by leaving one small truth at a time.
Just like the philosophy of TEREO.
FAQ
Why not fix every nearby problem at once?
The moment the task widens, the win condition changes and it becomes harder to tell whether the original improvement was real. TEREO keeps only the problems that would make the current promise false inside the same round.
What is the difference between `check` and `judge`?
`check` is the rule and command that tests the goal, while `judge` is the decision layer that turns the result into keep, hold, or drop.
Why does TEREO leave a `receipt`?
Because later, when code changes again, the receipt becomes evidence for what was trusted, what was tested, and why the result was accepted, held, or dropped.