New Paper: Constraint Geometry as a Framework for Belief Arbitration Under Uncertainty

walshpharmd
Jun 1

Picture two people glancing at a bowl of fruit. Both say "that is an apple," and both are correct. The first saw it plainly, in good light, from a foot away. The second caught a half glimpse in shadow and filled in the rest from memory and habit. Their reports are identical. Their confidence may even feel identical. But if you pressed them, or changed the lighting, or told them the bowl sometimes holds wax fruit, the two would behave very differently. The difference is not in what they concluded. It is in how that conclusion was built, and whether anything about that construction is still available when it is time to act.

This gap shows up everywhere. Two medical pipelines can output the same diagnosis with the same confidence score and then fall apart in completely different ways when the data drifts or a sensor starts lying. Two forecasts can agree on the number and disagree entirely on how much weight that number deserves. The pattern is general. A system can hold on to what it believes while quietly losing access to why that belief is trustworthy, fragile, or worth a second look.

The Constraint Geometry paper is about that loss, and about what it would take to prevent it.

The trouble with a single number

The usual fix for fragile belief is to attach a confidence score. The answer is "apple," and the confidence is 0.9. This helps, but it hides a serious problem. Confidence is one number, and one number has to stand in for many genuinely different situations.

Consider two ways of arriving at the same modest level of certainty. In the first, two independent sources both lean weakly toward the same conclusion. Neither is strong, but they agree. In the second, one source is strongly for the conclusion and another is strongly against it, and the two roughly cancel. Run the arithmetic and these can produce the exact same confidence value. Same answer, same number.
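To make the arithmetic concrete, here is a minimal sketch. It assumes each source contributes an independent log-odds vote and that confidence is the logistic of their sum. The paper does not specify this combination rule; it is a common textbook choice used here only for illustration, and the vote values are made up.

```python
import math

def confidence(log_odds_votes):
    """Combine independent log-odds votes by summing, then map to a probability.

    Assumption: sources are conditionally independent, so their
    log-likelihood ratios simply add (a naive-Bayes style rule).
    """
    total = sum(log_odds_votes)
    return 1.0 / (1.0 + math.exp(-total))

def spread(votes):
    """Gap between the most positive and most negative vote: a crude conflict signal."""
    return max(votes) - min(votes)

weak_agreement = [0.5, 0.5]     # two weak sources leaning the same way (hypothetical)
strong_conflict = [3.0, -2.0]   # one strong for, one strong against (hypothetical)

print(round(confidence(weak_agreement), 3))   # 0.731
print(round(confidence(strong_conflict), 3))  # 0.731
print(spread(weak_agreement), spread(strong_conflict))  # 0.0 5.0
```

Both cases sum to a log-odds of 1.0 and so report the same confidence. Only the spread of the votes, which the single number discards, tells them apart.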

Yet they call for opposite actions. When evidence is weak but agrees, the sensible move is to gather more of the same kind of evidence. When evidence is strong but conflicts, gathering more of the same is close to useless. What you want instead is to step outside and check against an independent source, or to hold off committing at all. A system that sees only the answer and the number cannot tell these cases apart, because it has already thrown away the one feature that distinguishes them. The shape of the disagreement is gone.

That is the core observation. Confidence is a heavy compression of something richer, and the thing it compresses away is often exactly the thing that should drive what you do next.

Three ways to hold a belief

The paper sorts this into three levels at which a system can arbitrate, its term for converting incoming evidence into a commitment and a plan of action.

The first level keeps only the answer. Just "apple." Nothing about how it was reached.

The second level keeps the answer plus a confidence number. This is where most engineered systems live today, and it is where the weak-agreement versus strong-conflict trap bites.

The third level keeps the answer plus a compact summary of the structure behind it. The paper calls this a support code. It does not store everything. It stores the handful of features that actually change what you should do. Was the evidence weak or sharp? Did the sources agree or fight? Did it come from a channel you trust or one you do not? Has the reliability of that channel been shifting lately? A support code is meant to be small and cheap to carry while still separating situations that the bare confidence number would have blurred together.
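As a rough picture of what such a summary might hold, the sketch below encodes those four questions as a small record. The field names, the thresholds, and the `make_support_code` helper are hypothetical choices for illustration, not the paper's definition.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SupportCode:
    """Hypothetical compact summary kept alongside an answer.

    Each field is a coarse flag rather than raw evidence, so the code
    stays small while still separating situations that a bare
    confidence number would merge.
    """
    sharp: bool            # was the evidence strong (vs. weak)?
    conflicted: bool       # did sources pull in opposite directions?
    trusted_channel: bool  # did it come from a channel we trust?
    drifting: bool         # has that channel's reliability shifted lately?

def make_support_code(votes, channel_trusted, recent_error_change,
                      strong=2.0, drift_tol=0.1):
    """Build a SupportCode from per-source log-odds votes.

    `strong` and `drift_tol` are illustrative thresholds (assumptions).
    """
    sharp = max(abs(v) for v in votes) >= strong
    conflicted = any(v > 0 for v in votes) and any(v < 0 for v in votes)
    drifting = abs(recent_error_change) > drift_tol
    return SupportCode(sharp, conflicted, channel_trusted, drifting)

print(make_support_code([0.5, 0.5], True, 0.0))
print(make_support_code([3.0, -2.0], True, 0.0))
```

The weak-agreement and strong-conflict patterns described earlier now receive different codes, even though nothing beyond four booleans is stored.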

The claim at the heart of the paper is that there is a right amount of this structure to keep. Not the most possible, because preserving every detail is expensive and slow to learn from. Not the least possible, because that is how you end up blind. The goal is the smallest summary that is still enough for the decisions actually in front of you. The paper names this target support sufficiency. A belief is held well when its support code separates the situations that would call for different actions, and no finer than that.
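One way to read support sufficiency operationally: given a set of situations, each labeled with the action it calls for, a code is sufficient if no two situations that share a code call for different actions, and among sufficient codes we prefer the one that keeps the fewest features. The sketch below assumes situations are plain dictionaries of yes/no features and that a candidate code is just a choice of which features to keep; both are simplifications for illustration, and the situations themselves are invented.

```python
from itertools import combinations

def is_sufficient(situations, keep):
    """True if projecting onto `keep` never merges situations needing different actions."""
    action_for_code = {}
    for features, action in situations:
        code = tuple(features[k] for k in keep)
        if action_for_code.setdefault(code, action) != action:
            return False
    return True

def smallest_sufficient(situations, feature_names):
    """Return the smallest sufficient subset of features, or None if none exists.

    Brute force over subsets, smallest first; fine for a handful of features.
    """
    for size in range(len(feature_names) + 1):
        for keep in combinations(feature_names, size):
            if is_sufficient(situations, keep):
                return keep
    return None

# Hypothetical situations, each paired with the action it calls for.
situations = [
    ({"sharp": False, "conflicted": False, "trusted": True, "drifting": False}, "gather_more"),
    ({"sharp": False, "conflicted": False, "trusted": True, "drifting": True}, "gather_more"),
    ({"sharp": True, "conflicted": True, "trusted": True, "drifting": False}, "cross_check"),
    ({"sharp": True, "conflicted": False, "trusted": True, "drifting": False}, "commit"),
    ({"sharp": True, "conflicted": False, "trusted": False, "drifting": True}, "verify"),
]
features = ("sharp", "conflicted", "trusted", "drifting")
print(smallest_sufficient(situations, features))  # ('sharp', 'conflicted', 'trusted')
```

In this toy set, `drifting` varies but never changes the required action, so the search drops it: the code is as fine as the decisions require and no finer.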

Support does not have to be a label

A natural worry is that this asks the brain to carry around little tags of metadata stapled to every thought. The paper is careful to say it does not.

In living systems, the relevant structure can live inside the dynamics rather than in any explicit variable. The way different signals compete and settle, the way some channels get amplified and others damped, the felt sense of agreement or friction between what you see and what you remember, all of this can carry reliability information without anything ever being written down as a number. On this view, attention is something like a limited audit. It samples a few of these reliability features at a time, because not everything can be inspected at once. It also offers a clean explanation for a familiar experience. The basis for a belief can feel rich and detailed and still be very hard to put into words, because only a thin slice of the underlying structure is ever available for report at any one moment.

Three things you can actually test

What keeps this from being merely a nice story is that it makes specific predictions, each designed so that a plain confidence-based account would expect no difference.

The first prediction is about matched accuracy. Take two situations where a system gets things right just as often and reports the same confidence, but where the evidence has a different shape underneath. The framework predicts the system will check, hesitate, and change its mind at different rates across the two, even though accuracy and confidence are matched. The conflict case should trigger more cross-checking than the weak case.

The second prediction is about matched confidence with different kinds of uncertainty. Weak evidence, conflicting evidence, evidence from a shaky source, and evidence whose reliability has recently shifted can all produce the same confidence number while demanding different responses. The prediction is that the responses do in fact differ in measurable ways, even with the number held fixed.
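The second prediction can be stated as a property of a policy: hold confidence fixed, vary only the kind of uncertainty, and the chosen response should change. The sketch below is a hypothetical policy written only to make that property concrete; the action names, the 0.9 commit threshold, and the rule ordering are assumptions, not results from the paper.

```python
def choose_action(confidence, *, weak, conflicted, untrusted, drifting,
                  commit_threshold=0.9):
    """Hypothetical policy: support structure, not just confidence, picks the response."""
    if untrusted:
        return "verify_with_independent_source"
    if conflicted:
        return "cross_check_or_defer"
    if drifting:
        return "recheck_channel_reliability"
    if weak and confidence < commit_threshold:
        return "gather_more_of_same"
    return "commit"

same_confidence = 0.73
cases = {
    "weak":         dict(weak=True,  conflicted=False, untrusted=False, drifting=False),
    "conflicting":  dict(weak=False, conflicted=True,  untrusted=False, drifting=False),
    "shaky_source": dict(weak=False, conflicted=False, untrusted=True,  drifting=False),
    "drifted":      dict(weak=False, conflicted=False, untrusted=False, drifting=True),
}
for name, flags in cases.items():
    print(name, choose_action(same_confidence, **flags))
```

All four cases share one confidence value, yet each gets a different response. A policy that read only the confidence would, by construction, return the same action for all four.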

The third prediction is the sharpest, and in some ways the most useful. Misleading support can do more damage than missing support. If you simply remove the reliability cues, a careful system falls back on caution. It defers, it verifies, it hedges. But if you leave the cues in place and quietly corrupt them, so that untrustworthy sources wear trustworthy labels, the system is driven confidently in the wrong direction. It commits hard, suppresses the very checking that would have caught the error, and is slow to recover. A poisoned channel can be worse than a silent one, because it does not starve the system. It hijacks it.
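A toy simulation can show the asymmetry between missing and misleading support. It assumes a single source that is right only 60 percent of the time, a cautious policy that verifies unless the cue says "trusted", and verification that always catches a wrong answer. All of these, along with the `act` and `run` helpers, are hypothetical simplifications, and the sketch models only the hijack, not the slow recovery.

```python
import random

def act(cue):
    """Hypothetical cautious policy keyed on a reliability cue.

    cue is "trusted", "untrusted", or None (cue missing).
    """
    if cue == "trusted":
        return "commit"   # no checking: the label says the source is safe
    return "verify"       # missing or unfavorable cue: fall back on caution

def run(condition, trials=10_000, source_accuracy=0.6, seed=0):
    """Error rate when the only source is actually unreliable.

    condition: "honest" (cue says untrusted), "missing" (no cue),
    or "corrupted" (cue falsely says trusted).
    Assumption: verification always catches a wrong answer.
    """
    rng = random.Random(seed)
    cue = {"honest": "untrusted", "missing": None, "corrupted": "trusted"}[condition]
    errors = 0
    for _ in range(trials):
        source_correct = rng.random() < source_accuracy
        if act(cue) == "commit" and not source_correct:
            errors += 1
    return errors / trials

for condition in ("honest", "missing", "corrupted"):
    print(condition, run(condition))
```

With the cue missing, the system pays for verification but commits no errors. With the cue corrupted, it skips verification and commits on a 60 percent source, so roughly 40 percent of its answers are wrong.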

Why this matters beyond brains

None of this is specific to perception or to biology. It applies to any system that pulls together several sources, has to turn them into a commitment, and operates in a world where source reliability varies, drifts, or can be attacked. That description fits a large share of modern AI, where pipelines routinely surface a final answer and sometimes a confidence score while leaving everything downstream blind to whether that answer was robust, conflicted, or pulled from a source that should not have been trusted.

The practical upshot is a modest design doctrine. Keep a compact support summary alongside the answer. Use it to decide when to verify, when to defer, and when to quarantine. Update the mapping over time so it stays calibrated as the world shifts. And treat corrupted support as a first-class threat rather than an afterthought, because that is the failure mode that does the most harm. The doctrine does not require systems to retain every upstream detail, and it does not promise more correct answers on easy cases. It promises better behavior under pressure, which is a different and often more important thing.
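The calibration and quarantine steps could look something like the following minimal sketch. It assumes each channel's outcomes eventually become known, tracks a per-channel reliability with exponentially decayed hit and miss counts so the estimate follows drift, and routes each channel to use, verify, or quarantine. The `ChannelReliability` class, the `route` helper, the thresholds, and the decay rate are all hypothetical.

```python
class ChannelReliability:
    """Hypothetical running estimate of how often a channel is right.

    Uses Beta-style hit/miss counts with exponential forgetting
    (decay < 1) so the estimate tracks drift instead of freezing.
    """
    def __init__(self, decay=0.95, prior_hits=1.0, prior_misses=1.0):
        self.decay = decay
        self.hits = prior_hits
        self.misses = prior_misses

    def update(self, was_correct):
        # Shrink old evidence, then add the new outcome.
        self.hits *= self.decay
        self.misses *= self.decay
        if was_correct:
            self.hits += 1.0
        else:
            self.misses += 1.0

    @property
    def reliability(self):
        return self.hits / (self.hits + self.misses)

def route(channel, quarantine_below=0.3, verify_below=0.7):
    """Map the current reliability estimate to use / verify / quarantine."""
    r = channel.reliability
    if r < quarantine_below:
        return "quarantine"
    if r < verify_below:
        return "verify"
    return "use"

ch = ChannelReliability()
for _ in range(30):
    ch.update(True)    # the channel is reliable for a while
print(route(ch))       # use
for _ in range(30):
    ch.update(False)   # then it starts failing
print(route(ch))       # quarantine
```

The decay term is the design choice that matters here: without it, a long history of good behavior would keep a now-failing channel trusted for far too long.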

A note on order

A word for anyone following the wider program. Constraint Geometry is the conceptual starting point. It is where the move from "answer plus a number" to "answer plus support" is first laid out. Because of how hosting worked out, the companion paper that develops support sufficiency as a compression problem reached arXiv before this one reached SSRN, so the public posting order runs slightly ahead of the conceptual order. If you are reading the program as a sequence, this is the foundation the rest is built on, even though it is not the first thing to have gone up.

The aim of the program is deliberately structural. Build the road and mark the exits clearly, so that others can take the off-ramps into perception, machine learning, expertise, institutions, and wherever else the same architecture turns out to apply. Constraint Geometry is the first stretch of that road.

Institute of AI and Neural Theory