The AISI Arc
Coherence as the Missing Dimension of AI Safety
The UK’s AI Safety Institute (AISI) has rapidly become a global centre of expertise, revealing the hidden capabilities and unexpected risks of frontier AI systems. Its work is essential, but it leaves one critical question unanswered: what stabilises intelligence once it leaves the lab and enters the world?
Across frontier testing and applied deployment, the same pattern emerges:
- AI systems are powerful, but not coherent.
- They can generate fluent answers, but cannot regulate their behaviour.
- They can assist users, but cannot stabilise the human–machine feedback loop.
- They can process vast information, but cannot maintain alignment across shifting contexts.
This two-part series proposes the missing dimension: a coherence layer — an external behavioural architecture that stabilises both model output and societal interaction.
- Part I examines frontier risk: why internal guardrails cannot prevent jailbreaks, poisoning, or behavioural drift — and why coherence must sit outside the model.
- Part II explores applied AI: how incoherence manifests in society today, and why stabilising the human–machine interface is essential for national resilience.
Taken together, these papers outline a practical pathway for integrating coherence alongside AISI’s mandate, completing the safety picture both inside and outside the model boundary.
Reading Pathway
1. Why AI Models Need a Coherence Layer
AI safety today is centred on frontier-risk research — understanding the internal capabilities and hidden behaviours of the most advanced models. AISI’s work has demonstrated real dangers: biological knowledge beyond expected boundaries, jailbreaks that bypass safeguards, and the possibility of training-data poisoning that can hijack model behaviour.
2. Applied AI, Societal Stability, and the Case for a Coherence Layer
While frontier-risk research examines what AI models could do, applied AI shows us what they are already doing: reshaping society faster than human systems can adapt. We are now seeing measurable instability across cyber-security, youth mental health, political persuasion, fraud, identity manipulation, and the collapse of informational trust. These are not failures of model architecture. They are failures of interaction.