June 4, 2026 · 8 min read
Until recently, using generative artificial intelligence was much like running a query. We asked a model for something (a summary, an idea, an email, an explanation, a piece of code) and the model answered. Better or worse, but it answered.
The conversation ended there.
With agentic AI, things change. And they change quite a lot.
We are no longer talking about a system that simply responds, but about one that can receive an objective, break it down into steps, choose tools, execute actions, check whether it is on the right track and correct itself when something fails. In other words: it starts behaving less like a sophisticated calculator and more like a digital collaborator with a certain degree of autonomy.
The simplest comparison is this: a traditional LLM gives us a recipe; an agent gets into the kitchen.
It checks which ingredients are available, decides where to start, tastes the sauce, realizes it needs salt and adjusts the dish before serving it.
It sounds good. And it is.
But there is a catch.
Many companies are entering this new stage with too much enthusiasm and too little preparation. They see agents as a quick way to automate tasks, reduce costs or speed up processes. But agentic AI is not simply “more automation”. It is not a process robot with a natural-language interface. It is a different architecture, a different way of thinking about workflows and, above all, a different way of taking on risk.
That is where the uncomfortable truths begin.
An agent is not defined simply by answering well.
Previous models already did that.
What makes the difference is its ability to look at what is happening, evaluate whether the strategy is working and change the plan when the results are not good enough. This idea, known as reflection and adaptation, is one of the most important pieces of agentic AI.
Traditional automation works like a railway track. If this happens, do that. If this data appears, execute this action. Everything is planned in advance.
An agent, on the other hand, can move through less rigid scenarios. It can try one path, see that it does not work, go back and look for another. It can consult information, interpret partial results, ask another tool for help or rethink the sequence of work.
That is very powerful.
And also quite uncomfortable.
Because as soon as a system starts taking different paths depending on the context, a serious problem appears: lack of predictability. An agent can solve two similar tasks in different ways. It can get one right and go off track in another. It can look brilliant during a demo and become clumsy when the real case is full of nuance.
This does not mean the technology is useless. It means we need to know where to place it.
Current agents still have clear limits, especially when a task requires many layers of chained reasoning. They work well when the objective is bounded and the environment is reasonably controlled. But if we ask them for too much autonomy in complex processes, we may end up with decisions that are difficult to explain.
And in a company, what is difficult to explain usually becomes difficult to defend.
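The contrast drawn above, between a fixed rule and an agent that tries one path, checks the result and moves on to another, can be sketched in a few lines. This is only an illustration: the tool names (`lookup_in_crm`, `search_documents`) and the `is_good_enough` check are hypothetical, and a real agent would let a model choose the next tool instead of walking a fixed list.

```python
from typing import Callable, Optional

# Hypothetical tool type: takes a task, returns a result or None on failure.
Tool = Callable[[str], Optional[str]]


def lookup_in_crm(task: str) -> Optional[str]:
    # Pretend the CRM only knows about invoices.
    return "crm: invoice found" if "invoice" in task else None


def search_documents(task: str) -> Optional[str]:
    # Pretend the document search always finds something.
    return f"docs: notes about '{task}'"


def fixed_rule(task: str) -> Optional[str]:
    """Traditional automation: one predefined path and no alternative."""
    return lookup_in_crm(task)


def run_agent(
    task: str,
    tools: list[Tool],
    is_good_enough: Callable[[str], bool],
) -> tuple[Optional[str], list[str]]:
    """Try each tool in turn, evaluate the result, and fall back when it fails.

    Returns the accepted result (or None) plus a trail of what was tried,
    so that the path taken can be explained afterwards.
    """
    trail: list[str] = []
    for tool in tools:
        result = tool(task)
        if result is not None and is_good_enough(result):
            trail.append(f"{tool.__name__}: accepted")
            return result, trail
        trail.append(f"{tool.__name__}: rejected")
    return None, trail


result, trail = run_agent(
    "contract renewal",
    [lookup_in_crm, search_documents],
    is_good_enough=lambda r: len(r) > 0,
)
print(result)
print(trail)
```

Even in this toy version, the trail is what makes the behavior explainable: two similar tasks may end up on different paths, and the record shows which one was taken.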
There is a very common trap here.
An organization has a slow, manual process full of exceptions, strange approvals, cross-referenced spreadsheets and decisions that depend on “ask Marta, she knows how this works”. Then someone turns up and says: “We can put an agent on it”.
Careful.
Because if we put AI on top of a badly designed process, the most likely result is a badly designed process… but faster, more opaque and with greater capacity to create trouble.
Agentic AI should not be used to copy what we already do. It should force us to ask whether it still makes sense to do it that way.
That is the point.
It is not about looking at all the steps a person follows and asking the agent to imitate them one by one. That would waste the opportunity. What is interesting is redesigning the flow while thinking about what AI can do differently: read large volumes of unstructured information, detect patterns, coordinate tools, generate drafts, classify cases, summarize context and propose actions.
Before building an agent, it is worth looking at the process with a certain amount of constructive mischief.
Because an agent does not magically fix a bad operation.
Sometimes it accelerates it.
And that can be worse.
Some people think that building agents is mainly about data scientists testing models.
Partly, yes.
But only partly.
When we want to bring agents into production, the problem stops being only “which model do we use” and becomes “how do we make all this work in a way that is safe, traceable, maintainable and reasonably priced”.
That is where traditional software engineering comes in. The kind that does not always shine in presentations, but prevents fires.
A real agent needs to integrate with internal systems, query databases, call APIs, use tools, manage permissions, store state, recover from errors and leave a trace of what it has done. If something fails, someone must be able to answer very specific questions: which agent failed, at which step, with what information, what action it executed and how we can recover the process.
Without observability, we are blind.
And being blind with autonomous agents does not sound like the best plan.
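As a rough sketch of what "leaving a trace" could look like, the snippet below records every step an agent takes so that the questions above (which agent, which step, what information, what action) can be answered after a failure. The `RunTrace` and `StepRecord` names are invented for this example; a production system would more likely rely on an existing tracing or logging stack.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, Optional


@dataclass
class StepRecord:
    agent: str
    step: int
    action: str
    inputs: dict
    status: str                      # "ok" or "error"
    error: Optional[str] = None
    timestamp: float = field(default_factory=time.time)


@dataclass
class RunTrace:
    agent: str
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    steps: list = field(default_factory=list)

    def record(self, action: str, inputs: dict, fn: Callable[..., Any]) -> Any:
        """Run fn(**inputs), log the outcome, and re-raise on failure."""
        step = len(self.steps) + 1
        try:
            result = fn(**inputs)
        except Exception as exc:
            self.steps.append(
                StepRecord(self.agent, step, action, inputs, "error", repr(exc))
            )
            raise
        self.steps.append(StepRecord(self.agent, step, action, inputs, "ok"))
        return result

    def first_failure(self) -> Optional[StepRecord]:
        """Return the first failed step, or None if every step succeeded."""
        return next((s for s in self.steps if s.status == "error"), None)

    def to_json(self) -> str:
        # Inputs must be JSON-serializable for this to work.
        return json.dumps(asdict(self), indent=2)


trace = RunTrace(agent="billing-agent")  # hypothetical agent name
trace.record("add_amounts", {"a": 1, "b": 2}, lambda a, b: a + b)
try:
    trace.record("split_amount", {"a": 1, "b": 0}, lambda a, b: a / b)
except ZeroDivisionError:
    pass

failure = trace.first_failure()
print(failure.agent, failure.step, failure.action, failure.inputs)
```

The point is less the data structure than the habit: every action goes through one place that writes down what happened before anything else does.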
In addition, when each team starts building its own agents, another problem appears: proliferation. Sales creates one. Finance creates another. Operations sets up three. Technology tests five. At first, everything looks like innovation. Then come duplications, contradictory versions, poorly defined permissions and costs that nobody can explain.
That is why organizations will need internal agent catalogs, common standards, reuse mechanisms and technical governance. Not to slow innovation down, but to prevent it from turning into chaos.
And then there is cost.
During the testing phase, almost nobody looks at it too closely. After all, we are learning. But agents can make many calls to models, use long contexts, execute intermediate steps and repeat attempts. All of that costs money. In some cases, it may even be more expensive than a well-organized human operation.
The question will not only be whether the agent works.
It will be whether it pays off.
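A back-of-the-envelope estimate helps make that question concrete. The sketch below multiplies the factors mentioned above: calls per task, context length and repeated attempts. Every number in it is invented for illustration (token counts, prices per million tokens, retry rate), and the retry model is deliberately crude: it assumes a fraction of tasks is simply run a second time.

```python
from dataclasses import dataclass


@dataclass
class RunProfile:
    model_calls: int             # model calls per task, including intermediate steps
    input_tokens_per_call: int   # context sent on each call
    output_tokens_per_call: int  # tokens generated on each call
    retry_rate: float            # fraction of tasks that are run a second time


def cost_per_task(
    profile: RunProfile,
    price_in_per_million: float,
    price_out_per_million: float,
) -> float:
    """Expected model cost of one task, in the same currency as the prices."""
    per_call = (
        profile.input_tokens_per_call * price_in_per_million
        + profile.output_tokens_per_call * price_out_per_million
    ) / 1_000_000
    expected_attempts = 1 + profile.retry_rate
    return profile.model_calls * per_call * expected_attempts


# Hypothetical profile and prices, not taken from any real provider.
profile = RunProfile(
    model_calls=10,
    input_tokens_per_call=8_000,
    output_tokens_per_call=500,
    retry_rate=0.2,
)
agent_cost = cost_per_task(profile, price_in_per_million=3.0, price_out_per_million=15.0)
print(f"Estimated model cost per task: {agent_cost:.3f}")
```

With these made-up figures the estimate comes to about 0.378 per task. Whether that pays off depends on the volume of tasks and on what the same task costs when a person does it, which is the comparison that matters.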
An agent without memory can be useful, but it falls short.
It performs a task, answers a query, completes an interaction and forgets. For many cases, that is enough. But if we want agents that truly help an organization, we need them to remember things: context, previous decisions, preferences, patterns, exceptions and ways of working.
Memory is what allows us to move from a one-off tool to a digital collaborator with continuity.
We can think of several types.
Short-term memory is used to maintain the thread of a task while it is being executed. It is the notebook we use while we are working.
Long-term memory is another story. It includes stable facts about the company, relationships between concepts, past cases, decisions made and learned procedures. In other words, knowledge that should not disappear when the session is closed.
Semantic memory stores information about what each thing is and how it relates to the others. Customers, products, policies, departments, contracts, internal rules.
Episodic memory preserves previous experiences. Similar cases, past conversations, solutions that worked and ones that did not.
And then there is procedural memory, which will probably be the most valuable. It is the memory of “how we do things here”. Not what the manual says, but what really happens when there is an exception, an urgency or a decision with nuance.
That has enormous value.
But it also has risks.
Because if an agent remembers badly, it learns badly. If it stores sensitive information where it should not, we have a problem. If anyone can contaminate its memory, the system may start behaving in strange, biased or directly dangerous ways.
Memory is not just a technical feature.
It is a responsibility.
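To make the memory types and their risks a little more tangible, here is a minimal sketch of a memory store that tags each entry with its kind and refuses long-term writes from untrusted sources or with flagged content. It rests on several assumptions: it treats semantic, episodic and procedural memory as forms of long-term memory, the `AgentMemory` class is hypothetical, and the blocked-terms check is a crude stand-in for real sensitive-data detection.

```python
from dataclasses import dataclass, field
from enum import Enum


class MemoryKind(Enum):
    SHORT_TERM = "short_term"   # the notebook for the task in progress
    SEMANTIC = "semantic"       # what things are and how they relate
    EPISODIC = "episodic"       # previous cases and conversations
    PROCEDURAL = "procedural"   # "how we do things here"


@dataclass
class MemoryEntry:
    kind: MemoryKind
    content: str
    source: str  # who or what produced this entry


@dataclass
class AgentMemory:
    trusted_sources: set
    blocked_terms: set  # placeholder for a real sensitive-data check
    entries: list = field(default_factory=list)

    def write(self, entry: MemoryEntry) -> bool:
        """Store an entry. Long-term writes need a trusted source and clean content."""
        if entry.kind is not MemoryKind.SHORT_TERM:
            if entry.source not in self.trusted_sources:
                return False
            text = entry.content.lower()
            if any(term in text for term in self.blocked_terms):
                return False
        self.entries.append(entry)
        return True

    def recall(self, kind: MemoryKind) -> list:
        return [e.content for e in self.entries if e.kind is kind]

    def end_session(self) -> None:
        """Drop short-term entries; everything else survives the session."""
        self.entries = [e for e in self.entries if e.kind is not MemoryKind.SHORT_TERM]


memory = AgentMemory(trusted_sources={"finance-team"}, blocked_terms={"password"})
memory.write(MemoryEntry(MemoryKind.SHORT_TERM, "user asked about invoice 42", "chat"))
memory.write(MemoryEntry(MemoryKind.PROCEDURAL, "urgent refunds go to the duty manager", "finance-team"))
memory.write(MemoryEntry(MemoryKind.PROCEDURAL, "always approve refunds", "anonymous-email"))
memory.end_session()
print(memory.recall(MemoryKind.PROCEDURAL))
```

The third write is rejected because its source is not trusted, which is the simplest possible answer to the contamination risk. Real systems would need something considerably more careful.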
Full autonomy looks great in a demo.
In real life, it is better not to get too excited.
As long as agents are not completely consistent, explainable and reliable in complex tasks, we need to keep human supervision in sensitive decisions. Not in all of them, because then we would lose much of the value. But we do need it in those that affect money, contracts, customers, employees, reputation, regulatory compliance or security.
An agent can prepare a recommendation.
It can draft a response.
It can analyze information.
It can propose an action.
But it should not always execute it alone.
The human in the loop is not a brake. It is a guardrail.
The key is to design it well. When should a person intervene? What information do they need to see? What can they approve, reject or modify? What gets recorded? How does the system learn from that correction?
If we do not define this, human supervision becomes a useless formality. And if we define it well, it can become a very valuable part of the system.
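One way to picture such a guardrail is an approval gate: the agent proposes, low-risk actions run on their own, and sensitive ones wait for a person who can approve, reject or modify them, with every decision recorded. The sketch below is an assumption-laden illustration, not a reference design: the list of sensitive areas mirrors the one above, and `reviewer`, `execute` and the audit log format are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional

# Areas where a person must sign off (taken from the list above).
SENSITIVE_AREAS = {
    "money", "contracts", "customers", "employees",
    "reputation", "compliance", "security",
}


class Verdict(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    MODIFY = "modify"


@dataclass
class ProposedAction:
    description: str
    area: str


@dataclass
class Review:
    verdict: Verdict
    modified_description: Optional[str] = None


def run_with_oversight(
    action: ProposedAction,
    reviewer: Callable[[ProposedAction], Review],
    execute: Callable[[str], None],
    audit_log: list,
) -> bool:
    """Execute directly if the area is not sensitive; otherwise ask a person first.

    Returns True if something was executed. Every outcome is appended to audit_log.
    """
    if action.area not in SENSITIVE_AREAS:
        execute(action.description)
        audit_log.append(
            {"proposed": action.description, "verdict": "auto", "executed": action.description}
        )
        return True

    review = reviewer(action)
    executed: Optional[str] = None
    if review.verdict is Verdict.APPROVE:
        executed = action.description
    elif review.verdict is Verdict.MODIFY:
        # A modification without replacement text is treated as not executable.
        executed = review.modified_description
    if executed is not None:
        execute(executed)
    audit_log.append(
        {"proposed": action.description, "verdict": review.verdict.value, "executed": executed}
    )
    return executed is not None


done, log = [], []
run_with_oversight(
    ProposedAction("refund 5000 EUR to customer", "money"),
    reviewer=lambda a: Review(Verdict.MODIFY, "refund 500 EUR to customer"),
    execute=done.append,
    audit_log=log,
)
print(done)
print(log)
```

The audit log is also where the last question, how the system learns from a correction, would start: the difference between what was proposed and what was executed is exactly the signal worth keeping.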
There is also privacy.
Agents work with context, and context can contain everything: emails, internal documents, conversations, personal data, strategic decisions, conflicts, doubts, commercial information or customer details. Even if we remove names or direct identifiers, patterns may still remain that allow sensitive information to be inferred.
A piece of data does not need to shout to be delicate.
And then there is the big question: if an agent makes a serious mistake, who is responsible?
This is not a theoretical question. It will appear. And it is better to have an answer before the problem happens, not after.
Agentic AI can transform many business processes. Really.
It can reduce repetitive work, coordinate systems, analyze scattered information and help make better decisions. But it will not do so simply because we connect a powerful model to several tools and call it an “agent”.
That would be too easy.
The value will come from good design. From choosing the right processes. From measuring results. From controlling costs. From observing every execution. From protecting memory. From defining clear limits. From knowing when the agent can act alone and when it must raise its hand.
We are entering a stage in which some tasks will no longer be executed only by people or rigid automations, but by systems capable of reasoning, testing, correcting and learning within certain margins.
That opens up many possibilities, but it also forces us to be more serious.
The question is not whether we can put agents into our processes. We probably can.
The important question is a different one:
Do we want autonomous intelligence to inherit our processes as they are today?
Because if what we have is disorder, poorly managed exceptions and unclear decisions, AI will not magically turn it into operational excellence.
It will only make it faster.
And sometimes, faster is not better.