By corporate RAG, we mean the adoption of an architecture that combines large language models with data retrieval systems to provide precise answers grounded in a company's internal knowledge base. This technology overcomes the limitations of traditional LLMs (such as ChatGPT), which operate on generic, uncontrolled knowledge and therefore often give answers that are not relevant to workers' requests. Observing the difficulties our customers face in using generative AI, we designed an alternative solution built on our own RAG technology. Here's how it went.
Generative artificial intelligence is gradually entering our daily lives, changing the way we search for information or create content. Just think of the recent introduction of AI Overviews in Google searches to understand, even if only in part, the extent of this change.
But when you enter the business environment, things aren't that simple.
Companies that try to integrate generative AI tools such as ChatGPT into their internal processes end up facing a series of significant limitations.
First, there is a lack of precise context.
Staying with the example of ChatGPT, these generalist models are trained on public data and know nothing about the policies, processes, or language in use within a given company.
Consequently, the answers, however well formulated, cannot capture the real context of the request and end up being irrelevant. The model speaks well, but says little.
Second, you need to consider information security.
Every time an internal user of an organization sends a request to an external tool, they risk sharing sensitive data with a system that offers few privacy guarantees. And that is a risk no company can afford.
In addition, these tools have static knowledge. This means that they are not automatically updated to keep up with changes within a company: if a procedure changes or a new policy is published, the AI cannot know it and keeps giving answers based on outdated information.
While it is fair to say that artificial intelligence is already delivering good results in terms of productivity, it is also true that there is ample room for improvement.
Without a link to company data, without security, without verifiable sources and without continuous updating, the risk is that people will see AI as a fascinating tool, certainly, but useless.
The result? Companies could go to great lengths to adopt this new technology, only to then see their investment go up in smoke. For this reason, it is necessary to resort to alternative solutions.
And this is precisely where RAG technology comes into play.
When we use tools like ChatGPT, we get answers that are based on the vast general knowledge with which the model has been trained.
However, in a company this is not enough: users need answers that are relevant to their own reality, based on internal documents, and verifiable, all without compromising the security of the data they share.
The missing link between this need and generative AI capabilities is a RAG system.
RAG stands for 'Retrieval-Augmented Generation' and represents an evolution of the conversational artificial intelligence popularized by ChatGPT. In particular, this technology combines large language models with an intelligent search engine capable of retrieving the most relevant information from the company's knowledge base in real time.
When a user asks a question, the system does not rely only on the model's memory, but actively searches business documents for the most suitable answer.
The mechanism behind it is simple.
All documents are divided into smaller parts, such as paragraphs or sections. Each of these parts is transformed into a numerical vector that represents its semantic content and is stored in a specialized search system.
A user's question is also converted into a vector and compared against the index to find the most relevant sections. The generative model then receives these passages as context and generates a response grounded in the most relevant business data.
In this way, a RAG system combines the naturalness of ChatGPT-style answers with the precision and reliability of business sources. For example, instead of vaguely answering the question “How does remote work work in our company?”, a virtual assistant based on RAG technology retrieves the exact section of the internal policies where remote work is discussed and uses it to build its response.
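To make this flow concrete, here is a minimal sketch in Python. The document snippets, the model names ("text-embedding-3-small", "gpt-4o-mini") and the in-memory index are illustrative assumptions, not our production setup; the same steps run equally well against Azure OpenAI and a real vector store.

```python
# Minimal sketch of the RAG flow described above. The snippets, model names
# and in-memory index are illustrative assumptions, not a production setup.
from openai import OpenAI
import numpy as np

client = OpenAI()  # assumes OPENAI_API_KEY is set; Azure OpenAI works similarly

def embed(text: str) -> np.ndarray:
    # Turn a chunk of text (or a question) into a semantic vector.
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# 1. Split documents into chunks and index their vectors (toy example).
chunks = [
    "Remote work policy: employees may work remotely up to three days a week...",
    "Expense policy: travel costs must be approved in advance by the manager...",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# 2. Embed the question and retrieve the most similar chunks.
question = "How does remote work work in our company?"
q_vec = embed(question)
ranked = sorted(index, key=lambda item: cosine(item[1], q_vec), reverse=True)
context = "\n".join(chunk for chunk, _ in ranked[:2])

# 3. Hand the retrieved passages to the model as grounding context.
answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Answer using only this context:\n" + context},
        {"role": "user", "content": question},
    ],
)
print(answer.choices[0].message.content)
```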
But there's more: RAG can be configured to operate on multimodal AI models.
Although retrieval and generation are typically text-oriented, multimodal models allow these functionalities to be extended to other types of content as well, such as images, video, or audio.
This allows artificial intelligence to bring information from different sources and formats to the same semantic level, enabling a unified understanding and reasoning that cuts across content types, regardless of the origin of the data.
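As a hedged illustration of what "the same semantic level" means, a CLIP-style model can embed both text and images into one vector space. The library and model below are an illustrative choice, not necessarily the stack used in this project.

```python
# Hedged sketch, assuming a CLIP-style model via sentence-transformers
# (an illustrative choice, not necessarily this project's stack).
from sentence_transformers import SentenceTransformer, util
from PIL import Image

model = SentenceTransformer("clip-ViT-B-32")  # maps text AND images to one space

# Both modalities become vectors in the same semantic space...
image_vec = model.encode(Image.open("org_chart.png"))        # hypothetical file
text_vec = model.encode("Who reports to the head of sales?")

# ...so one similarity measure works across formats.
print(util.cos_sim(image_vec, text_vec))
```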
This brings us to the heart of this project, namely the commitment of our team to make this technology even more intelligent.
If a standard RAG setup already allows language models to generate answers based on specific data sources, our goal was to make it modular and easily reusable.
But let's go into detail.
The first major improvement concerned the user experience.
We made the system capable of holding contextual conversations, retaining the memory of previous interactions. This means the user can now ask a series of related questions and receive consistent answers, without having to repeat the same information each time.
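Conceptually, contextual conversation boils down to replaying the recent history to the model on every turn. Here is a minimal sketch; the model name and prompts are illustrative:

```python
# Minimal sketch of contextual conversation: the running history is replayed
# to the model on every turn, so follow-up questions stay coherent.
from openai import OpenAI

client = OpenAI()

class Conversation:
    def __init__(self, system_prompt: str, max_messages: int = 20):
        self.system = {"role": "system", "content": system_prompt}
        self.history: list[dict] = []   # alternating user/assistant turns
        self.max_messages = max_messages

    def ask(self, question: str) -> str:
        self.history.append({"role": "user", "content": question})
        # Trim old turns so we never exceed the model's context window.
        recent = self.history[-self.max_messages:]
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[self.system] + recent,
        )
        answer = resp.choices[0].message.content
        self.history.append({"role": "assistant", "content": answer})
        return answer

chat = Conversation("Answer only from the company documents provided.")
chat.ask("How does remote work work in our company?")
chat.ask("How many days per week does that allow?")  # "that" resolves via history
```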
We then focused on architecture.
We integrated Kernel Memory into our project, an open-source library backed by Microsoft that allows us to manage conversation memory, context data, and the semantic objects to be recalled. Thanks to this component, the system becomes more scalable and adaptable to specific use cases.
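For instance, when Kernel Memory is deployed as a web service, an application can query it over HTTP. The sketch below assumes a local deployment and the service's question-answering endpoint; the exact URL and request schema should be verified against the Kernel Memory documentation.

```python
# Hedged sketch: querying a Kernel Memory deployment over HTTP. The address
# and the /ask endpoint reflect a typical service setup; verify the exact
# request schema against the Kernel Memory docs.
import requests

KM_URL = "http://localhost:9001"  # hypothetical deployment address

response = requests.post(
    f"{KM_URL}/ask",
    json={"question": "How does remote work work in our company?"},
    timeout=30,
)
response.raise_for_status()
print(response.json())  # answer text plus the sources it was built from
```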
Finally, we thought about future implementations of our solution.
We didn't just create a working system; we built a template capable of accelerating the development of customized RAG solutions.
This is an approach that allows us to create new, tailor-made RAG projects, using an already tested structure.
As a result, implementation times and costs are reduced for our customers.
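To give an idea of what "template" means in practice, here is a hypothetical sketch: every project reuses the same pipeline shape and differs only in a small configuration. All names here are illustrative and only convey the reusable structure.

```python
# Illustrative sketch of the "template" idea: each customer project reuses the
# same pipeline, parameterized by a small config. All names are hypothetical.
from dataclasses import dataclass

@dataclass
class RagConfig:
    embedding_model: str       # which model turns text into vectors
    chat_model: str            # which model writes the final answer
    index_name: str            # where this customer's documents live
    top_k: int = 3             # how many passages to retrieve per question
    keep_history: bool = True  # enable contextual conversations

def build_pipeline(cfg: RagConfig):
    # In a real template this would wire up the embedder, the vector index
    # and the generator; here we only show the shape of the factory.
    print(f"Index '{cfg.index_name}': embed with {cfg.embedding_model}, "
          f"answer with {cfg.chat_model}, top {cfg.top_k} passages.")

# A new customer project becomes a configuration change, not a rewrite:
build_pipeline(RagConfig("text-embedding-3-small", "gpt-4o", "acme-policies"))
```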
When you decide to implement a RAG system in your company, the platform you build it on obviously makes a difference.
In our case, the choice fell on Azure OpenAI both for the performance of the integrated language model and for all the advantages offered by the Azure ecosystem.
As we said, one of the main limitations affecting the business utility of public generative AI models is related to data security. Every time a worker makes use of a tool like ChatGPT, they risk sharing sensitive information with an uncontrolled environment.
Azure OpenAI does not pose this problem, since user requests are processed directly in the customer's Microsoft cloud, without ever leaving the corporate security perimeter. This means maximum privacy protection and guaranteed compliance with regulations.
But it's not just about security.
Azure OpenAI allows us to integrate the generative component of a RAG system with all the Microsoft tools that populate the digital workplace (such as SharePoint, Teams, OneDrive and Microsoft Viva), as well as with a company's custom applications (whether web, desktop or mobile) to respond to any need, even outside the Microsoft 365 ecosystem.
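For example, a custom application can call a private Azure OpenAI deployment through the official openai Python SDK. The endpoint, key and deployment name below are placeholders:

```python
# Minimal sketch of a custom application calling a private Azure OpenAI
# deployment. Endpoint, key and deployment name are placeholders; the
# requests stay inside the customer's Azure tenant.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",  # placeholder
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

resp = client.chat.completions.create(
    model="company-gpt4o",  # the name of YOUR deployment, not the base model
    messages=[{"role": "user", "content": "Summarize our travel policy."}],
)
print(resp.choices[0].message.content)
```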
Complementing Azure OpenAI, our solution then makes use of other Azure services to expand its capabilities.
Everything thus remains in a coherent ecosystem that the internal IT department can easily manage.
Another advantage is modularity.
We chose Azure also because it allows us to develop a scalable architecture that can grow with the needs of each customer. Our solution can initially be adopted by a single team, as part of a pilot project, and then extended to the entire organization with limited effort.
On a technical level, control over the models is total.
We can decide which version to use, how to update the models, which data to make available for content generation and which to keep inaccessible. In addition, we can define language filters, such as the detection of offensive words (Bad Words Detection) or automatic moderation.
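As a toy illustration of the idea only, since the real language filters run as configurable Azure-side services rather than application code:

```python
# Toy illustration only: the real filters are configurable Azure-side
# services, not application code like this.
BAD_WORDS = {"offensive-term", "another-banned-word"}  # hypothetical list

def moderate(answer: str) -> str:
    # Withhold the answer if it contains any flagged word.
    if any(word in answer.lower() for word in BAD_WORDS):
        return "This answer was withheld by the content filter."
    return answer
```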
Finally, it is possible to collect data on how the tool is used, generating metrics that show usage patterns and reveal where its operation can be improved.
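Here is a sketch of the kind of per-interaction record that can feed such metrics; the field names are illustrative assumptions, not our production schema.

```python
# Illustrative sketch of a per-interaction record that feeds usage metrics.
import json
import time

def log_interaction(question: str, answer: str, sources: list[str], t0: float) -> None:
    record = {
        "timestamp": time.time(),
        "latency_s": round(time.time() - t0, 2),
        "question_length": len(question),
        "sources_used": sources,   # which documents grounded the answer
        "answered": bool(answer),  # did the system produce a response at all?
    }
    print(json.dumps(record))      # in production: an analytics sink, not stdout
```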
These are the advantages that led us to choose Azure for our RAG projects. And most importantly, these are the advantages that are making the difference for our customers.
Despite the complexity involved in managing sensitive business data and integrating an AI system into structured environments such as Microsoft 365, we succeeded in creating a RAG architecture that makes it easier to retrieve information in the digital workplace.
One of the main use cases concerns the consultation of company guidelines and protocols, especially in highly regulated contexts such as the healthcare sector, where the answers must be precise, contextualized and often divided into several steps.
It is also worth mentioning the challenges successfully overcome along the way, from the secure handling of sensitive data to the integration of the system into an already structured digital workplace.
After the release of the first projects, we monitored, together with our customers, the impact of this new technology on their digital workplace. The goal was to understand whether users were getting genuinely useful answers, and the results speak for themselves.
The Modern Apps team responds swiftly to IT needs where software development is the core component, including solutions that integrate artificial intelligence. The technical staff is trained specifically in delivering software projects based on Microsoft technology stacks and has expertise in managing both agile and long-term projects.