Let’s start by demystifying how magical AI really is. That requires looking under the hood and remembering that AI research, particularly neural networks and machine learning, has been around for decades.
The biggest change that enabled the Large Language Model (LLM) revolution of the last few years is the transformer architecture, first published by researchers at Google in 2017 and the foundation on which OpenAI produced the first generative pre-trained transformer (GPT) model in 2018. Many other neural network architectures came before it, and while they showed promise, they were limited by computational cost, data inefficiency, and an inability to retain long-range context. Recurrent and convolutional networks could recognise patterns, but they struggled to model relationships across extended sequences of information. The transformer architecture solved this by introducing self-attention, a mechanism that allows models to weigh the relevance of every token in a sequence relative to every other token. That simple yet powerful idea unlocked scalability, enabling training on very large data sets and giving rise to models capable of generalised contextual understanding and multi-domain knowledge transfer.
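To make self-attention a little less abstract, here is a minimal NumPy sketch of scaled dot-product attention, the core operation inside a transformer. It is an illustration only: real models add multiple attention heads, masking, positional information, and billions of learned parameters, and the toy dimensions and random weights below are purely assumptions for demonstration.

```python
# A toy, NumPy-only sketch of scaled dot-product self-attention.
# Dimensions and weights are invented for illustration; real transformers
# use many attention heads, masking and learned parameters at huge scale.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model) token embeddings; Wq/Wk/Wv: projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv              # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])       # relevance of every token to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the sequence
    return weights @ V                             # each output vector mixes information from all tokens

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                            # e.g. a four-token sentence
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)         # (4, 8): one context-aware vector per token
```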

This shift transformed AI from a niche research tool into a platform technology – one that now powers digital assistants, and increasingly, agents that can reason, plan, and act across digital systems. The real story of this revolution is not just text prediction; it’s the emergence of autonomous, goal-oriented agents capable of executing complex business workflows with minimal human intervention. And that’s where the enterprise impact truly begins – along with all the risks that it brings.
Under the hood, AI models don’t see words, images, or sounds – they see numbers. Every piece of input data is broken into tokens – units of text – which are converted into numerical representations called embeddings. During training, the model learns patterns and relationships between these tokens by processing massive amounts of data. Words that frequently appear in similar contexts end up close together in this numerical space.
To illustrate with a simple example of how this works: Even though cat and car are alphabetically similar, the model learns that cat is conceptually closer to dog because, in real text, they often appear in similar contexts – both associated with pets, animals, and actions like feed or walk. This process builds what’s called an embedding space, where meaning is captured by proximity between tokens. In that space, cat and dog are near each other, while car sits much farther away, reflecting semantic rather than lexical similarity.
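A toy sketch of that idea, with made-up three-dimensional vectors standing in for real embeddings (which typically have hundreds or thousands of dimensions), might look like this:

```python
# Toy embedding space: the vectors are invented to illustrate the idea, not
# taken from any real model. Proximity is measured with cosine similarity.
import numpy as np

embeddings = {
    "cat": np.array([0.90, 0.80, 0.10]),
    "dog": np.array([0.85, 0.75, 0.20]),
    "car": np.array([0.10, 0.20, 0.95]),
}

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["cat"], embeddings["dog"]))  # high: semantically close
print(cosine_similarity(embeddings["cat"], embeddings["car"]))  # much lower: semantically distant
```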
The same principle applies to multimodal models – those that can handle not just text, but also images, audio, and video. Instead of words, these models break pixels or sound waves into numerical features and map them into the same kind of embedding space. This allows the system to find relationships across different data types: a picture of a cat and the word cat end up represented close to each other in that space, even though one came from visual data and the other from language. That shared numerical understanding is what enables today’s AI systems to generate captions for images, describe videos, or even answer questions about what they “see.” In essence, whether it’s language or vision, all modern AI models work by finding structure and meaning in numbers – not by understanding the world as humans do, but by recognising the patterns that define it.
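One way to see a shared text and image embedding space in practice is with the publicly available CLIP model through the Hugging Face transformers library. The snippet below is a sketch under a few assumptions: the model weights are downloaded on first use, and “cat_photo.jpg” is a placeholder path standing in for any local image of a cat.

```python
# Sketch: scoring captions against an image inside a shared embedding space,
# using the public openai/clip-vit-base-patch32 checkpoint from Hugging Face.
# "cat_photo.jpg" is a placeholder path; weights are fetched on first run.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat_photo.jpg")
captions = ["a photo of a cat", "a photo of a car"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)    # image-to-caption similarity scores
print(dict(zip(captions, probs[0].tolist())))       # the cat caption should score far higher
```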
For many people, their first encounter with modern AI came through impressive but narrow demonstrations: an LLM writing code from a short prompt, a chatbot drafting an email, or an image generator creating photorealistic scenes from a sentence. These moments triggered the perception that AI had reached not only sentience, but also a kind of omniscience – an all-knowing digital guru that could answer any question, solve any problem, or replace entire professions overnight. That illusion of completeness is reinforced by how fluidly these models communicate. When the output sounds confident and articulate, people instinctively trust it. Yet beneath that fluency lies a statistical engine, not a sentient thinker. AI doesn’t “know”; it predicts the next most probable element. This gap between convincing language and genuine understanding is what leads to inflated, unrealistic expectations about what AI can actually do.
The Body and the Brain
When looking at AI solutions, there is a fundamental distinction that businesses must understand to successfully navigate their AI-adoption journey: the difference between AI models and AI applications.
There’s consistent evidence, both anecdotal and documented, that many enterprise decision-makers and even technical leaders conflate AI models with the end-user applications powered by those models. When asked about AI adoption, organisations often point to the applications being rolled out, such as ChatGPT or Copilot (McKinsey, 2024) – and it does make sense to measure AI adoption by how much of the workforce is effectively using such applications.
Why should the distinction even matter for decision makers, both executive and technical? It might seem irrelevant, or something only AI geeks would care about, but it matters for understanding how to use the technology to empower business transformation, as well as for understanding and managing the risks that implementing such technologies brings to the organisation. For example, conflating these two concepts could lead a business to focus on acquiring and deploying a front-end chat app instead of integrating the model at API level into internal systems and workflows. It also skews expectations: people assume that rolling out a certain app will have a bigger impact than it actually does, become frustrated, and then conclude that the AI itself is the problem rather than how it was implemented and which specific problem it was meant to solve.
When we refer to ChatGPT, Claude, or Gemini, we are referring to the entire experience – the chat interface, UI, and output – not the underlying model that powers it. Many enterprise leaders see the face of AI but not its architecture. They see the body (ChatGPT) and assume it is the brain, when in reality the brain (GPT-5) can inhabit countless other bodies – copilots, agents, analytics platforms, customer tools – each tuned for a different purpose. Failing to understand this distinction is likely part of the reason organisations are seeing rapid uptake of generative AI but not yet large-scale impact (McKinsey, 2025).
Just as the human brain provides cognition and the body provides movement, AI has its own division of roles. The model is the brain – it contains the reasoning, knowledge, and understanding that makes intelligence possible. The application is the body – it senses, communicates, and acts on the model’s behalf, turning potential into observable outcomes. A system like ChatGPT illustrates this perfectly: GPT-5 is the brain that performs the thinking, while ChatGPT is the body that gives that intelligence a voice, an interface, and a way to interact with people. Enterprises that understand this distinction design their AI strategies accordingly – building strong, reliable “brains” and pairing them with purpose-built “bodies” that fit their specific workflows, security models, and user contexts. There is a strategic difference: when ChatGPT is being rolled out, it is SaaS. When GPT-5 is being deployed in Azure AI Foundry to be used by an Agent, it is Infrastructure.
The application is responsible for engaging with the model: it gathers the user input, sends it to the model, reads the output, and decides what to show to the user. The model is only responsible for taking inputs and producing output tokens, always by predicting the next most likely token given the context. The model itself does not remember any interaction and has no concept of a conversation – that is the responsibility of the application, which implements “memory”. ChatGPT stores the conversation and reconstructs the relevant history for each new query, often through summarisation, to give the model the full context it needs to produce the expected output. In a nutshell, ChatGPT creates the illusion of a conversation – but GPT-5 on its own will not remember anything.
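As a rough sketch of what that means in practice, the loop below uses the OpenAI Python SDK to show the application, not the model, carrying the memory: the app stores every turn and resends the whole history on each request. The model name is a placeholder, and a production app would summarise or trim older turns rather than resending them forever.

```python
# Sketch: the application implements "memory" by storing the conversation and
# resending it on every call; the model itself is stateless between requests.
# Uses the OpenAI Python SDK; the model name is a placeholder assumption.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
history = [{"role": "system", "content": "You are a helpful assistant."}]

def chat(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-4o",          # placeholder model name
        messages=history,        # the entire stored conversation goes back in every time
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})  # the app remembers, not the model
    return reply

print(chat("My name is Alex."))
print(chat("What is my name?"))  # only answerable because the app resent the first exchange
```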
With this in mind, it should be easier to understand where the security risks sit. OpenAI is responsible for storing and securing the data about who you are and every conversation you have had, just as Microsoft is responsible for Copilot. ChatGPT and Copilot both use GPT-5 as the underlying model, but with different parameters, security constraints, and layers, which results in a slightly different user experience when you compare the two. In essence, though, ChatGPT and Copilot are not that different from the applications we deployed for many years before the rise of AI – they have databases, and they have a backend that runs on a server somewhere and provides APIs for their web, mobile, and desktop apps.
Developing and deploying AI Agents is therefore not so different from any other application – the AI models are just another piece of infrastructure that must be configured, deployed, secured, and governed. Furthermore, it is important for technical leaders to understand the limitations of AI models and how design choices determine how efficient and secure the solutions are. An important one is the context window.
Have you ever noticed how a very long conversation with ChatGPT or Gemini starts to become confusing, as if the app had begun to forget things you had already mentioned multiple times? AI models have a hard limit on how many tokens they can hold in context while producing a response – and both the input and the output count. Once that limit is reached, the model cannot continue to produce output. This is known as the context window, and every model has a different limit on how much context it can hold. To avoid a sudden interruption of output, chat apps such as ChatGPT, Copilot, and Gemini will periodically summarise the conversation, and that is how details you consider important (but that the summary drops) can get ‘lost in the conversation’ – leading to the experience of arguing with an app that keeps forgetting what you said yesterday in the same chat.
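A simplified sketch of that kind of housekeeping is shown below, counting tokens with the tiktoken library and compacting older turns when the history approaches a limit. The limit, the threshold, and the summarisation stub are all illustrative assumptions rather than any vendor’s actual implementation.

```python
# Sketch: keeping a chat history inside an assumed context window by counting
# tokens with tiktoken and summarising older turns. The limit, threshold and
# summarisation stub are illustrative assumptions, not a vendor implementation.
import tiktoken

ENCODING = tiktoken.get_encoding("cl100k_base")
CONTEXT_LIMIT = 8_000                      # illustrative; real models vary widely

def count_tokens(messages):
    return sum(len(ENCODING.encode(m["content"])) for m in messages)

def summarise(messages):
    # Placeholder: a real app would ask the model itself to write this summary.
    return " ".join(m["content"] for m in messages)[:500]

def compact_history(messages, keep_recent=4):
    """Replace older turns with a summary once the history nears the limit."""
    if count_tokens(messages) < CONTEXT_LIMIT * 0.8:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {"role": "system",
               "content": "Summary of earlier conversation: " + summarise(older)}
    return [summary] + recent
```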
To successfully deploy AI solutions that meaningfully impact the business, executive and technical leaders are expected not only to know what AI is, but also how to operationalise it with a clear focus on the business model, revenue streams, or efficiency outcomes. AI must not be just an IT or modernisation project; it must have a direct, measurable impact on market differentiation. Leaders are expected to understand the stack – data, models, APIs, and orchestration – to make informed vendor and integration decisions, and must also be prepared to handle ethical, regulatory, and operational risk, particularly in industries with compliance mandates (Gartner, 2025).
Context matters
Early AI coding examples looked magical and scary at the same time: a few lines of natural-language instruction, and the model produced functional scripts or API calls. As models evolved, and new AI-powered development tools such as Cursor emerged, we’ve seen AI coding agents being used to build full-scale applications much faster than before, even by people who did not have software engineering skills or experience. The initial results were astonishing and exciting, and the conclusion I’ve seen repeated multiple times was inevitable: “no developers will be needed in 6 months!” – that was about 27 months ago.
As projects grew more complex, the limits appeared – hallucinations producing code that tried to use components that don’t exist, insecure configurations leading to highly vulnerable systems, and logic errors hidden behind syntactically correct code. Memes flooded developer communities about AI creating apps with clear-text secrets exposed on the web, AI deleting failing unit tests so that “all the unit tests are passing”, and developers frustrated by AI coding agents changing things they never asked them to change.
The same pattern appeared in other creative domains: content generators producing plausible but inaccurate reports, or image models rendering extra fingers, distorted faces, or impossible shadows. These are not failures of intelligence; they are symptoms of statistical reasoning without grounding. Many factors influence such outcomes – the quality of the training data, the quality of the prompts, how much context the model can hold in its context window, and the implementation of the app itself. For me, the key factors come down to:
- How specific the problem is that the AI solution is meant to solve
- How skilled the users are in the domain
- How well the organisation understands the limitations of AI – what it can do well and what it can’t
Despite their real potential, AI Agents are a big part of the AI hype, owing to the inflated expectations discussed earlier. However, agents are also one of the keys to successfully deploying the most impactful and transformational AI technology, the kind that can take business process automation to the next level.
Ready to move beyond the hype?
Join CyberIAM and SailPoint for an exclusive Agentic AI Roundtable where we unpack what agentic AI really means for the enterprise, and how to operationalise it securely and at scale.

