
What role does VIER play in the context of large language models?
Last updated: 05.12.2023 09:00
With the advancements of the past year, the potential for AI applications in customer service has grown tremendously. Language models are more versatile and achieve significantly better results than before, even without extensive training. Furthermore, increased alignment makes the deployment of generative language models in live settings possible.
As natural communication in human-machine interaction becomes increasingly commonplace, new applications based on LLMs are needed to continue providing optimal, customer-centric solutions. At the same time, these applications raise new challenges in security and performance.
As a provider of innovative software solutions, VIER selects and optimizes the most powerful models for specific use cases, then securely and stably deploys them in live customer environments. Since early 2023, VIER has been working with its own AI teams to develop:
VIER Model Garden
The Model Garden serves as a repository where VIER stores and provides information about LLMs that have been tested for specific use cases. It offers an overview of current developments crucial for live deployment and provides insights into the quality, response time, hosting, and costs of various models.
Why VIER Model Garden?
There are many LLM benchmarks and most new models are tested based on these benchmarks. The results of these benchmarks are summarized in LLM leaderboards, such as the Open LLM Leaderboard or the LMSYS Leaderboard, which also integrates commercial models and human evaluations.
Standard benchmarks for language models
Of course, VIER utilizes this information to stay abreast of the latest developments. However, for several reasons, benchmark results alone are far from sufficient to make a sound decision about which model to use for which use case:
VIER thoroughly tests relevant models to offer businesses the best options for each use case. Besides selecting the right model for each application scenario, several other aspects need to be considered for the secure deployment of LLMs at an enterprise level.
The VIER AI Gateway and the new way of deploying conversational AI in enterprises
The secure deployment of LLMs requires expertise in prompt engineering and systematic testing of different prompt formats against each other; together, these enable the creation of powerful applications. VIER has extensive experience in setting guardrails that keep models on track in production: ensuring, for example, that models adhere to their instructions in chat applications, do not hallucinate, and do not provide information on topics outside the intended use case. To accomplish this, VIER follows a multi-stage approach that includes fine-tuning the prompt and implementing guardrails through our flow management, blacklists, and conversation guides for the models, all unified under VIER's NEO-CAI (New Enterprise Optimized Conversational Artificial Intelligence) project.
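The blacklist guardrail described above can be sketched as a simple filter applied before and after the model call. This is a minimal illustration, not VIER's actual implementation; the blacklist entries, prompt text, and function names are hypothetical.

```python
# Illustrative blacklist of topics the assistant must not discuss
# (hypothetical entries, not an actual VIER configuration).
BLACKLIST = ["medical advice", "legal advice", "competitor pricing"]

REFUSAL = "I'm sorry, I can't help with that topic."


def violates_blacklist(text: str) -> bool:
    """Return True if the text touches any blacklisted topic."""
    lowered = text.lower()
    return any(term in lowered for term in BLACKLIST)


def guarded_reply(user_message: str, model_reply: str) -> str:
    """Apply the guardrail both to the user's input and to the model's output.

    Checking both directions means a forbidden topic is blocked whether it
    appears in the request or leaks into the generated answer.
    """
    if violates_blacklist(user_message) or violates_blacklist(model_reply):
        return REFUSAL
    return model_reply
```

A production system would layer this with flow management and conversation guides, as described above, rather than relying on string matching alone.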
To make this expertise readily available, NEO-CAI offers Retrieval Augmented Generation (RAG) in a customized variant. This allows VIER to combine the capabilities of LLMs, which provide coherent and effective responses, with query-based approaches that retrieve the correct information from existing documents. This makes it possible, for example, to completely automate FAQs or questions about product descriptions. For these applications to work optimally, three things matter: segmenting the content documents sensibly (chunking), finding a good mechanism for translating those segments into vectors (embedding), and building an application that retrieves the relevant data from the vector database for a given question and passes it to the LLM in the correct format for answer generation.
Model access is facilitated through our AI Gateway, which provides detailed privacy features alongside authentication, billing, monitoring, and management of access to the various models. This includes optional anonymization or pseudonymization of requests, ensuring that a model never receives customer-specific data such as names, customer numbers, or addresses while still delivering responses with the same naturalness as in direct communication with the selected model. Anonymization relies on internal technology from VIER Cognesys, so customer data never leaves VIER's systems.
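The pseudonymization idea can be sketched as a reversible masking step around the model call: sensitive values are swapped for placeholders before the request leaves the system, and the originals are re-inserted into the reply. The customer-number pattern, placeholder format, and function names below are illustrative assumptions, not VIER Cognesys internals.

```python
import re


def pseudonymize(text: str) -> tuple[str, dict[str, str]]:
    """Replace customer numbers with placeholders; return masked text and mapping.

    The pattern K-###### is a hypothetical customer-number format; a real
    system would also cover names, addresses, and other identifiers.
    """
    mapping: dict[str, str] = {}

    def repl(m: re.Match) -> str:
        token = f"<CUST_{len(mapping)}>"
        mapping[token] = m.group(0)
        return token

    masked = re.sub(r"\bK-\d{6}\b", repl, text)
    return masked, mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Re-insert the original values into the model's reply."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text
```

Because the mapping stays inside the gateway, the model only ever sees placeholders, yet the final answer reads as if it had received the real data.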
VIER ensures that the best available models can be safely deployed in our customers' respective use cases. For this purpose, VIER offers personalized chat solutions as well as the integration of LLMs into our products Cognitive Voice Gateway, Copilot, and Interaction Analytics.
Using LLMs safely and in compliance with data protection regulations
The development of Large Language Models (LLMs) is advancing rapidly. We are only at the beginning of a development that will change how we use information and how we communicate. VIER is ready to tackle this challenge together with our customers and leverage the potential of LLMs to improve both the customer experience and the employee experience.
VIER relies on a mix of various technologies such as the Model Garden, the AI Gateway, and NEO-CAI technology to help businesses navigate the complex landscape of LLMs. These tools enable companies to find the best models for their needs while ensuring that their applications are secure and compliant with data protection regulations.
The journey towards the widespread use of LLMs in customer applications has only just begun. If you would like to learn more about specific use cases, integrations, or testing, please feel free to contact us.
Author:

Dr. Anja Linnenbürger
Head of Research
VIER