May 14, 2025 · 6 min read
Large Language Models (LLMs) are evolving at a pace that surprises everyone, steadily extending their ability to complete ever longer and more complex tasks autonomously. Despite this continuous evolution, we still face four fundamental challenges for their application in organizations: orchestrating and coordinating different AIs, alignment (having a minimum degree of control over the outcome), encompassing large bodies of knowledge, and connecting with other tools.
When using an LLM, we always have the context window available, which we can imagine as a short-term memory similar to our own. Context is measured in tokens, the basic unit AI models work with, which, to simplify, we can think of as roughly equivalent to words. Like our short-term memory, context has a limit.
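To make tokens concrete, here is a minimal sketch using OpenAI's open-source tiktoken tokenizer (our choice of library here is an assumption; any tokenizer illustrates the same point): a short English sentence typically splits into slightly more tokens than words.

```python
# Minimal sketch: counting tokens with the tiktoken library (pip install tiktoken).
# Any tokenizer illustrates the same point; this one is used by several OpenAI models.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

text = "Retrieval Augmented Generation gives the model a long-term memory."
tokens = encoding.encode(text)

print(f"{len(text.split())} words -> {len(tokens)} tokens")
# Rule of thumb: an English token is roughly three quarters of a word.
```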
Retrieval Augmented Generation (RAG) is an expanded brain for AI: the long-term memory that, as in humans, can grow and scale much further. It is a technique that lets the AI store a large body of knowledge and draw on it only when it is relevant to the user's request. In recent months, LLMs capable of processing contexts of up to a million tokens have appeared, with a clear roadmap towards expanding this capacity. This formidable immediate context window raises a debate: is an external retrieval system like RAG still necessary if the AI can "remember" so much information on its own? In this article, we explore the use cases and argue that RAG is more relevant than ever, because of the scale of organizational knowledge (many millions of tokens), data governance (access and permissions), and integration with current company information (connection with tools). In fact, we already use it every day without realizing it: the latest versions of the most famous models apply RAG through web search.
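To make the idea tangible, here is a minimal, illustrative sketch of a RAG pipeline (not any specific platform's implementation): embed the documents once, retrieve the few fragments most similar to the question, and send only those fragments to the LLM. The embed and ask_llm functions are hypothetical placeholders for whichever embedding model and LLM you use.

```python
# Minimal RAG sketch (illustrative only): retrieve the most relevant fragments,
# then send just those fragments to the LLM instead of the whole knowledge base.
# `embed` and `ask_llm` are hypothetical placeholders for your own models.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: return a vector for `text` using your embedding model."""
    raise NotImplementedError

def ask_llm(prompt: str) -> str:
    """Placeholder: call your LLM of choice with `prompt`."""
    raise NotImplementedError

def answer(question: str, documents: list[str], top_k: int = 3) -> str:
    doc_vectors = np.stack([embed(d) for d in documents])        # long-term memory
    q = embed(question)
    scores = doc_vectors @ q / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q)  # cosine similarity
    )
    best = np.argsort(scores)[::-1][:top_k]                      # most relevant fragments
    context = "\n\n".join(documents[i] for i in best)
    return ask_llm(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
```

In production, the document vectors would be computed once and stored in a vector index rather than recomputed on every query; the principle, however, stays the same.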
One of the inherent challenges of very long contexts is the "attention problem." Much as a human can feel overwhelmed or lose focus when faced with an excess of simultaneous information, an AI can have difficulty discerning and prioritizing the crucial elements within a vast sea of data. Something similar happens in tasks that require weighing multiple fragments of knowledge to produce an optimal answer. In other words, the LLM may fail to focus on the most relevant knowledge, or may be unable to consider several relevant fragments at once.
This effect, sometimes known as the "needle in a haystack" problem, has been mitigated in the most advanced models, which incorporate sophisticated reasoning capabilities. These models can apply internal search strategies and "divide and conquer" approaches, analyzing manageable portions of the information and then synthesizing the findings. This improvement in managing large volumes of data in the immediate context is one of the reasons that has fueled doubts about the continued need for RAG.
However, the reality of day-to-day implementation, especially in the business environment, tips the balance in favor of RAG. Corporate knowledge bases – technical manuals, customer histories, internal policies, research, legal repositories, and much more – reach volumes that simply overwhelm even the most generous context windows. We are talking about terabytes or even petabytes of structured and unstructured information. It is precisely this scale that underscores the sustained relevance of RAG: no current context window can, by itself, encompass the entirety of the knowledge an organization has accumulated over the years.
Another fundamental aspect where RAG demonstrates its worth is in the timeliness of information. LLMs are trained with data corpora that have a cutoff date; their knowledge of the world, therefore, is not in real-time. RAG, on the other hand, can connect to live and updated data sources.
The combination of efficient RAG tools, such as specialized platforms like YedAI when focused on this dynamic knowledge management, with the ecosystem of business applications – whether through the Model Context Protocol (MCP) or through agent-to-agent communication architectures (Agent2Agent) – is what will truly make the AI we envision possible: an artificial intelligence deeply integrated into the organization, accessing its tools and data in real time and acting as a genuinely informed, up-to-date co-pilot for the worker.
One example of RAG's utility, transparent to the user, is that the main publicly accessible AI tools, such as Perplexity AI, ChatGPT, or Gemini, have already implemented their own versions of RAG based on web search. These solutions leverage the robust, proven indexing infrastructure of traditional search engines such as Google, Bing, or DuckDuckGo. This not only validates the intrinsic utility of RAG with large volumes of information, but also shows that its implementation can be simpler and more accessible than one might assume, building on existing, proven technologies without always needing additional technological development.
An inescapable pragmatic factor is the cost structure of LLM services. Providers typically charge based on the number of tokens processed, both input (the context provided to the model) and output (the response generated by the model). Consequently, continuously feeding the LLM massive contexts not only increases operational costs significantly but also raises the risk of the AI "wandering" or generating less focused responses, as it has to navigate an overly broad spectrum of information that is not always relevant to the task at hand. RAG mitigates this by pre-selecting and providing only the most pertinent information, optimizing both cost and response quality.
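A back-of-the-envelope comparison shows the order of magnitude involved; the per-token price below is purely illustrative and not any provider's actual rate.

```python
# Illustrative cost comparison (the price is made up, not a real provider rate).
PRICE_PER_1K_INPUT_TOKENS = 0.005   # assumed: $0.005 per 1,000 input tokens

full_context_tokens = 800_000       # stuffing a whole knowledge base into every prompt
rag_context_tokens = 4_000          # only the handful of retrieved fragments

full_cost = full_context_tokens / 1_000 * PRICE_PER_1K_INPUT_TOKENS
rag_cost = rag_context_tokens / 1_000 * PRICE_PER_1K_INPUT_TOKENS

print(f"Full context: ${full_cost:.2f} per query")   # $4.00
print(f"RAG context:  ${rag_cost:.4f} per query")    # $0.0200
```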
Cost reduction also makes it possible to use smarter AIs that respond better, keeping users satisfied. Although cheaper models generally answer well, they fail when an answer requires reasoning that is not found directly in the text. They also retain a certain artificiality, responding in recognizable patterns. Using reasoning models produces a more pleasant user experience, with less of an "uncanny valley" sensation (the unease people feel when something looks almost, but not quite, human).
Finally, information governance and access control are critical in any organization. With RAG, applying visibility rules and permissions is greatly simplified. Instead of trying to replicate complex access hierarchies within the LLM itself, the RAG system integrates with the client's knowledge bases, respecting the security and visibility policies already existing in those sources. This ensures that the AI only retrieves and uses information to which the user or the process invoking it has legitimate access, maintaining data integrity and confidentiality.
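In practice the mechanism is straightforward: each indexed fragment inherits the access metadata of its source document, and retrieval filters on the requesting user's permissions before anything reaches the LLM. Below is a minimal sketch of this idea; the field names are hypothetical, not those of any particular platform.

```python
# Minimal sketch of permission-aware retrieval (field names are hypothetical):
# each fragment inherits the ACL of its source document, and the filter runs
# before any text is handed to the LLM.
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    source: str
    allowed_groups: set[str] = field(default_factory=set)  # ACL copied from the source system

def retrieve_for_user(query_hits: list[Chunk], user_groups: set[str]) -> list[Chunk]:
    """Keep only the chunks the requesting user is allowed to read."""
    return [c for c in query_hits if c.allowed_groups & user_groups]

# Usage: `query_hits` would come from the vector search; the ACL filter is applied on top.
query_hits = [
    Chunk("Q3 salary bands...", "hr/salaries.pdf", {"hr"}),
    Chunk("Public holiday calendar...", "hr/calendar.pdf", {"hr", "all-staff"}),
]
print([c.source for c in retrieve_for_user(query_hits, {"all-staff"})])
# ['hr/calendar.pdf'] -- the salary document never reaches the model
```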
Implementing a Retrieval Augmented Generation (RAG) solution may seem like a technical challenge, but with YedAI, APSL Nagarro's RAG platform, you can do it easily and adapt it to your company's needs. YedAI is designed to simplify and accelerate the deployment of your specialized AI, letting you connect business knowledge sources directly to the power of Large Language Models (LLMs) securely and scalably. We also guide you through a clear process for integrating and cataloging your company's knowledge base, ingesting and using the information according to your business rules.
Data Source Connection: YedAI easily integrates your existing information repositories – from databases and internal documents to intranets and cloud systems – thanks to its versatile connectors. We handle the technical complexity.
Intelligent Processing and Vectorization: The platform optimizes your data for AI, managing chunking and the creation of high-quality vector embeddings, essential for precise and relevant information retrieval (a simplified chunking sketch follows these points).
Fluid Integration with LLMs: YedAI integrates seamlessly with the main LLMs on the market or with custom models, ensuring that RAG operates with the most advanced artificial intelligence.
Continuous Management and Optimization: We offer intuitive tools to manage the knowledge base, monitor performance, and continuously refine the relevance of responses, ensuring your RAG evolves with your needs.
Integrated Security and Governance: Implement your RAG with the confidence that data is protected, respecting access and governance policies.
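To illustrate what "chunking" means in practice – a simplified sketch under our own assumptions, not YedAI's internal logic – documents are split into overlapping fragments of bounded size so that each one can be embedded and retrieved precisely:

```python
# Simplified chunking sketch (our own illustration, not YedAI's internal logic):
# split a document into overlapping word windows so each fragment stays small
# enough to embed, index, and retrieve precisely.
def chunk(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]

# Each fragment would then be embedded and stored in the vector index.
```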
Contact us at yed.ai to learn about the integration process and get started smoothly. Yed.ai integrates with your knowledge base and makes it accessible: all your operational knowledge in a single point.
In short, far from becoming obsolete because of advances in LLM context capacity, RAG is consolidating as a strategic, complementary tool for implementing AI at scale in organizations. It makes information accessible from a single point, avoiding the need to build hundreds of tools and portals. It acts as an intelligent long-term knowledge management system, processing and deciding what specific information the AI needs to respond successfully. In this sense, RAG ensures searches across all of an organization's knowledge with relevance, timeliness, cost efficiency, data governance, and regulatory compliance. Alongside MCP and Agent2Agent, it will undoubtedly be a key piece in building truly useful and connected AI systems.