Continuity of data with embedded AI models
When most people think of AI, they associate it with tools like ChatGPT, Mindgram, or Claude. It's no wonder: Large Language Models (LLMs), the technology behind most generative AI, are the most visible AI applications in the public eye. Yet AI is a much broader field, encompassing many different algorithms and techniques. At KOIA, we specialise in applying these diverse AI models to solve real-world problems.
A Case Study: AI in Claims and Insurance Associated Emissions Calculation
One of our most comprehensive AI projects was developed for Claims Carbon. The goal was to calculate the share of carbon emissions in vehicle collisions, estimating the contribution of specific insurance companies to air pollution. This project showcases how AI can be applied to complex, data-driven challenges in unexpected ways.
The Challenge: Data Accessibility and Compatibility
The primary hurdle we faced was the lack of readily accessible data. Our algorithms required information on each vehicle's CO2 emissions, weight, and power. While this data is available in the European Commission's database, two significant issues arose:
- The dataset is enormous and challenging to navigate.
- There's often a mismatch between how car models are listed in the database and how insurers enter them (e.g., due to typos or variations in model names).
The database offers no fuzzy or flexible search: users must enter exact keywords to find matches. This rigidity made efficient data retrieval and matching a significant challenge.
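To make the problem concrete, here is a minimal sketch of why exact-keyword lookup breaks down. The records and field names are hypothetical, invented for illustration; only the failure mode (a single typo producing no match) reflects the situation described above.

```python
# Hypothetical miniature "database" keyed on exact (make, model) pairs.
eu_database = {
    ("VOLKSWAGEN", "GOLF"): {"co2_g_km": 122, "mass_kg": 1320},
    ("SKODA", "OCTAVIA"): {"co2_g_km": 118, "mass_kg": 1360},
}

def exact_lookup(make: str, model: str):
    """Return the record only on an exact (case-insensitive) match."""
    return eu_database.get((make.upper(), model.upper()))

print(exact_lookup("Volkswagen", "Golf"))  # found
print(exact_lookup("Volkswagon", "Golf"))  # one-letter typo -> None
```

A single misspelled letter in the insurer's entry is enough for the lookup to return nothing, which is exactly the rigidity the project had to overcome.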
The Solution: Embedded AI Models and Vector Search
To overcome these obstacles, we turned to embedded AI models, specifically utilising vector search. Here's how we approached the problem:
Step 1: Data Preparation
- We extracted data from the European Union database.
- We meticulously cleaned the data, corrected typos, and filled in missing information.
- This cleaning process was crucial, as our algorithm requires 100% accurate data to function properly.
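The normalisation part of Step 1 can be sketched as follows. This is a simplified illustration with made-up correction entries; the real cleaning process was far more extensive.

```python
import re

# Hypothetical table of known spelling fixes, for illustration only.
KNOWN_FIXES = {
    "VOLKSWAGON": "VOLKSWAGEN",
    "MERCEDES BENZ": "MERCEDES-BENZ",
}

def normalise(raw: str) -> str:
    """Upper-case, collapse whitespace, strip stray dots, apply known fixes."""
    cleaned = re.sub(r"\s+", " ", raw.strip()).upper()
    cleaned = cleaned.replace(".", "")
    return KNOWN_FIXES.get(cleaned, cleaned)

print(normalise("  volkswagon "))   # VOLKSWAGEN
print(normalise("mercedes  benz"))  # MERCEDES-BENZ
```

Normalising both datasets to the same canonical form before vectorisation reduces the number of spurious mismatches the similarity step has to absorb.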
Step 2: Vector Transformation
- We converted all records from the EU database (e.g., car brand, model) into vectors.
- We performed the same vector conversion on the insurance company data.
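The idea behind Step 2 can be illustrated with a deliberately simple stand-in: character-trigram count vectors. The production system used a learned embedding model rather than trigrams, but the principle is the same - similar strings map to nearby vectors.

```python
from collections import Counter

def trigram_vector(text: str) -> Counter:
    """Map a string to a sparse vector of character-trigram counts."""
    s = f"  {text.lower()}  "  # pad so edge characters form trigrams too
    return Counter(s[i:i + 3] for i in range(len(s) - 2))

v1 = trigram_vector("Volkswagen Golf")
v2 = trigram_vector("Volkswagon Golf")  # typo, but most trigrams are shared
```

Because a one-letter typo only disturbs the few trigrams that contain it, the two vectors remain close, which is what makes the similarity step forgiving.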
Step 3: Similarity Matching
- Using cosine similarity (a standard measure of how closely two vectors point in the same direction), we compared the vectors derived from the EU Commission data with those from the insurer's database.
- For each insurer record, we looked for the closest EU record in vector space.
- If the similarity score exceeded 0.8 (80%), we treated the pair as a match, i.e. as referring to the same vehicle.
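Step 3 can be sketched end to end in a few lines. Trigram vectors stand in for the real embeddings here (the production system used a learned embedding model), but the cosine-similarity computation and the 0.8 threshold are exactly as described above.

```python
import math
from collections import Counter

def trigram_vector(text: str) -> Counter:
    """Simplified stand-in for an embedding: character-trigram counts."""
    s = f"  {text.lower()}  "
    return Counter(s[i:i + 3] for i in range(len(s) - 2))

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Standard cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

score = cosine_similarity(trigram_vector("Volkswagen Golf"),
                          trigram_vector("Volkswagon Golf"))
is_match = score > 0.8  # the 80% threshold described above
```

Here the typo only changes a few trigrams, so the score lands above the threshold and the records are matched; two unrelated vehicle names would score far lower and be rejected.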
This approach relies on an embedding model, which maps text into numerical vectors in such a way that similar strings end up close together, allowing for more flexible and forgiving comparisons than exact keyword matching.
Comparing AI Approaches: Embeddings vs. Supervised Machine Learning
During the search for the most effective solution for Claims Carbon, we also explored supervised machine-learning models using regression algorithms. This approach allowed us to determine average or expected values from more informative inputs than the vehicle name alone - for example, predicting a car's expected CO2 emissions from its weight and engine specifications.
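The regression idea can be sketched with a single feature and ordinary least squares. The training pairs below (vehicle mass in kg mapped to CO2 in g/km) are made up for illustration; the real model used more features and a proper ML framework.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Hypothetical training data: (mass_kg, co2_g_km).
masses = [1100, 1300, 1500, 1700, 1900]
co2 = [105, 118, 132, 146, 160]

a, b = fit_linear(masses, co2)

def predict(mass_kg: float) -> float:
    """Predict expected CO2 emissions (g/km) from vehicle mass."""
    return a * mass_kg + b
```

This is also where the "data continuity" advantage comes from: the fitted function returns a prediction for any input mass, even one that never appeared in the training data.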
The supervised ML model achieved 90% accuracy, slightly outperforming the embedding-based approach. It also provided better data continuity: regardless of the input values, the trained model always produces a prediction, even for vehicles it has never seen.
However, despite its higher precision, the supervised ML model has a significant drawback: verifying the correctness of its output requires an advanced understanding of machine learning models. This complexity can be a barrier for clients without specialised ML expertise.
In contrast, the embedding-based model, while slightly less accurate, is more transparent and easier for clients to understand and verify. This accessibility gives it an advantage in many real-world applications.
Conclusion: AI solutions require tailoring
Our experience with both embedding-based and machine-learning models in real-life scenarios has equipped us to introduce these technologies to new projects effectively. The key takeaway is that each project requires an individual approach tailored to the specific needs and technical capabilities of the client's team.
Machine learning models, while powerful, require careful setup and parameter tuning. Factors to consider include:
- The quantity and format of available data
- Training methodology
- Fine-tuning of multiple parameters
This complexity often necessitates an iterative process to achieve reliable results.
At KOIA, we understand that the most effective AI solution balances technical performance with practical usability. Whether you need a highly accurate predictive model or a more accessible, transparent system, we're equipped to deliver AI solutions that meet your specific requirements.
Are you looking for reliable data prediction models for your software? With our experience in both embedding-based and machine-learning approaches, we're well-positioned to help you navigate the AI landscape and find the perfect solution for your needs.
Author: Arkadiusz Szulc