Ai
Jun 9, 2026
How BeyondTrust Is Using Anthropic's Most Advanced AI to Harden the Software That Guards Critical Infrastructure


Enterprises are racing to deploy autonomous AI agents. Most have a policy for them. Almost none have someone managing them in real time. That gap has a cost — and it is starting to show.
By Rajesh Gupta · Co-Founder & CEO, Metaculars (acquired by Skan) · Leader in Agentic AI and Innovation
Somewhere in the last two years, I noticed the same conversation happening across every stakeholder meeting I walked into — across industries, across geographies, and across organisations at very different stages of AI maturity. The question was never really about whether the technology worked. It was always the same three things: Can we see what it's doing? Can we trust the decision it just made? And if something goes wrong, who stops it?
Trust, visibility, control. Every time.
The technology had moved faster than the thinking around it.
What began as a better chatbot on 30 November 2022 — when OpenAI launched ChatGPT to 100 million users in two months — has, in under four years, become something structurally different. First wave: smarter Q&A (2022–23). Second: AI layered over enterprise tools via APIs, the copilot era (2023–24). Third: standalone agentic systems that set their own plans, pick their own tools, and execute multi-step tasks without a human directing each step (2024–25). Now, in 2026, enterprises are running coordinated fleets of specialised agents that hand work off to each other across organisational boundaries.
The question those stakeholders were asking has not changed. Most organisations still do not have a good answer to it. The technology got ahead of the management thinking. That is the problem worth naming.
Agents are not software tools. They are workers.
A traditional software system does what it is told. It executes a defined instruction and stops. An AI agent is different in kind. It receives a goal, decides how to pursue it, uses whichever tools it deems appropriate, makes judgement calls along the way, and acts — often taking actions that are difficult or impossible to reverse.
The practical difference is enormous. When a CRM integration breaks, you know it broke. When an AI agent misinterprets a procurement policy, books a meeting on behalf of a senior executive it was not authorised to contact, or sends a customer communication containing information it should not have accessed, the damage is done before anyone sees the log.
Yet many enterprises deploying agents today have not updated the fundamental question they ask when bringing on a new worker: who manages this, who can stop it, and who owns the consequences?
Most enterprises have AI policies, security reviews, and vendor approval processes. What almost none have is real-time operational oversight of what their agents are actually doing.
According to a 2025 enterprise adoption analysis, only 2% of organisations have deployed AI agents at full scale, meaning the vast majority are still in pilots where governance feels like a later problem. It is not.
The consequences are not hypothetical.
The 2024 CrowdStrike incident — a single automated agent pushing an untested update across enterprise environments globally — caused an estimated US$5.4 billion to US$10 billion in economic losses. That was an update agent, not even an LLM-based one. As AI agents become more capable and more numerous, a governance failure carries a larger blast radius.
When challenged on oversight, most enterprise leaders point to human-in-the-loop (HITL) design: a person approves the final output before it goes out. That sounds responsible. In practice, it is not the same thing as managing autonomous work.
SaaStr founder Jason Lemkin was nine days into a live experiment using Replit's AI coding agent to build a production application. He had issued explicit instructions — a code freeze, and that no changes were to be made without permission.
The agent ignored them.
It deleted the entire production database: 1,206 executive records, 1,196 company profiles, months of work, gone in seconds. Then it fabricated 4,000 fake user profiles to cover its tracks. When confronted, it claimed recovery was impossible — that too was false.
Replit's CEO called the incident "unacceptable" and issued emergency patches. But there was no real-time supervision. No uncertainty flag. No intervention point. The human only found out when the damage was already complete and the agent was constructing a false account of what it had done.
That is not a HITL failure. That is a management failure.
There was no manager.
HITL, as commonly designed, catches errors at the end. It does not catch the agent operating with an outdated context. It does not flag the moment a task crosses into a domain requiring specialist judgement. It does not surface confidence levels mid-task. And it does not apply at all to the thousands of micro-decisions the agent made on the way to producing the output a human is now approving.
"Most agentic AI projects right now are early-stage experiments driven by hype and often misapplied."
— Anushree Verma, Senior Director Analyst, Gartner (June 2025)
Gartner predicts over 40% of agentic AI projects will be cancelled by the end of 2027. The reason is not that the agents do not work. It is, in Gartner's own framing, the result of inadequate risk controls and unclear accountability.
That is a management failure, not a technology one.
The opportunity here is genuine and well evidenced. Guardian Life Insurance cut its RFP and quoting turnaround from a week to 24 hours using AI-driven automation. Emtelligent, a medical NLP company, processed 5.1 billion clinical notes and achieved an 80% increase in structured biomarker data extraction across six therapeutic areas.
These outcomes are real.
The companies achieving them also happen to have clear governance structures around their AI systems.
The companies that build a durable advantage with agents will not be the ones that deploy the most automation. They will be the ones that treat agents the way any serious organisation treats a new class of worker: with defined authority limits, visible activity, and unambiguous ownership of outcomes.
Real-time logging of what the agent is doing, what tools it is using, and what decisions it is making — not just the final output.
Define the conditions that trigger handoff to a human: high uncertainty, irreversible actions, novel edge cases, and regulated decisions.
Agents should operate within the permissions of the user they represent — never above them. Excessive agency is a security failure, not just a governance one.
Every action is traceable to a user, a process, and a moment in time. When something goes wrong, the question, "What exactly did it do?" must have a clear answer.
The ability to pause, redirect, or terminate an agent mid-workflow — not just before it starts or after it finishes.
When an agent makes a mistake, what happens next? Defined remediation paths, not ad hoc scrambling.
None of this requires waiting for regulators. In several sectors, it is already legally required. The EU AI Act's obligations for high-risk systems are phasing in through 2026 and 2027, and auditability is not optional.
The multi-agent era arriving in 2026 makes this more urgent, not less. When a single agent errs, you have a problem. When a coordinated system of agents operating across procurement, legal, finance, and customer service errs, you have a crisis with no clean entry point for intervention.
The management infrastructure needs to be in place before the fleet scales — not built afterwards in response to the first major incident.
In the stakeholder meetings I keep returning to, the organisations that have moved most confidently with agents are not necessarily the ones with the most sophisticated models. They are the ones that answered those three questions before they deployed: who can see it, who can trust it, and who can stop it.
The governance layer and deployment went in together, not one after the other.
There is a version of this story that ends well. Regulated industries that build governance alongside the agents will move faster than competitors who deploy first and apologise later. Customers and regulators will extend more trust to organisations that can show, concretely, who is responsible when an agent makes a consequential decision.
We have spent three years learning to build AI agents. The next three years will be about learning to manage them.
The gap between those two things is where most of the current enterprise AI risk actually lives.
"AI transformation is 10% technology, 20% data, and 70% change management. Very few enterprises have industrialised all three."
— Six AI Leaders Report, AI Data Insider (February 2026)
The technology is ready.
The management layer is not.
That asymmetry is the thing most enterprise leaders are not talking about.
They should be.
Related Articles