The short story of how machine learning knocked rules-based systems into a cocked hat

Hashtags have become a sign of our times. On social networks they let us mark, categorize, and group content, find connections, and reach the information we need. However, social media is not the only place where grouping and categorizing information works well and brings measurable results. The idea behind hashtags, combined with machine learning algorithms, is replacing the rules-based system, the basic approach to marking transactions.

Rules-based system - the first step in data analysis

A rules-based system is the most basic method of describing transactions. It relies on simple rules that connect exact words with a category. For example, once we define the name of a gym, every transaction that includes this word is described as "fitness". Simple as that.
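
To make the idea concrete, here is a minimal sketch of such a rule table in Python. The keywords, the category names, and the function `categorize_by_rule` are hypothetical, chosen only for illustration; a production system would likely hold far more rules.

```python
from typing import Optional

# Hypothetical rule table: exact keyword -> category (illustrative values only).
RULES = {
    "CITYGYM": "fitness",
    "NETFLIX": "entertainment",
}


def categorize_by_rule(description: str) -> Optional[str]:
    """Return the category of the first rule whose keyword occurs in the description."""
    text = description.upper()
    for keyword, category in RULES.items():
        if keyword in text:
            return category
    # No rule matched: the transaction stays uncategorized.
    return None


print(categorize_by_rule("Card payment CityGym Warsaw"))
print(categorize_by_rule("Card payment City Gym Warsaw"))
```

The second call shows the rigidity of the approach: a small spelling variation ("City Gym") no longer matches the exact keyword, so someone has to add another rule by hand.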

Although this approach is used in many lending companies' solutions today, it is not flexible enough for a world in which the number of transactions and their diversity are growing rapidly. That's the moment when machine learning steps in.

Categorization – where the fun with data begins

With machine learning, it doesn't matter how many different kinds of transactions appear: the system can learn on its own and handle increasing scale. How does it look in practice? The analyst's expert knowledge is the base for the categorization, and machine learning builds on it in the next step. One of the basic examples of using machine learning for transaction categorization is finding keywords and assigning them weights for selected categories. The model then checks whether the sum of the weights is greater than a previously selected threshold. The simplicity of weighted keywords makes it easy to answer the question of why a given category was or was not assigned. This is also an advantage in regulated industries such as banking, where we were able to deploy solutions for automated income discovery, because the result is fully auditable by humans. Compared to a rigid, binary rules-based system, categorization is far more flexible and supports multi-factor assessment.
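
The weighted-keyword check can be sketched as follows. This is an illustration under stated assumptions, not the deployed model: the weights, the threshold value, and the names `KEYWORD_WEIGHTS`, `score`, and `categorize` are hypothetical, and in practice the weights would be learned from transactions labeled by analysts rather than typed in by hand.

```python
from typing import Dict, Optional, Tuple

# Hypothetical weights per category; assumed values, not taken from a real model.
KEYWORD_WEIGHTS: Dict[str, Dict[str, float]] = {
    "salary": {"SALARY": 0.8, "PAYROLL": 0.7, "BONUS": 0.3},
    "fitness": {"GYM": 0.6, "FITNESS": 0.6, "MEMBERSHIP": 0.2},
}
THRESHOLD = 0.7  # assumed value of the preselected threshold


def score(description: str, category: str) -> Tuple[float, Dict[str, float]]:
    """Sum the weights of the category's keywords found in the description.

    The matched keywords are returned as well, so a human can audit the result.
    """
    tokens = set(description.upper().split())
    matched = {kw: w for kw, w in KEYWORD_WEIGHTS[category].items() if kw in tokens}
    return sum(matched.values()), matched


def categorize(description: str) -> Optional[str]:
    """Return the single best category whose score exceeds the threshold, if any."""
    best_category, best_score = None, 0.0
    for category in KEYWORD_WEIGHTS:
        total, _ = score(description, category)
        if total > THRESHOLD and total > best_score:
            best_category, best_score = category, total
    return best_category


print(categorize("GYM MEMBERSHIP FEE"))
print(score("GYM MEMBERSHIP FEE", "fitness")[1])
```

The dictionary returned by `score` is what makes the decision explainable: it lists exactly which keywords contributed and with what weight. A description containing only "GYM" would stay below the threshold and receive no category.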

But it is not a perfect solution either. It has a few significant limitations. First, every transaction is described with only one category (with possible subcategories), so important information may be missed. Second, it is inflexible: when you need to add a new category, you have to rebuild the whole model, which can be quite challenging and consume time and resources.

So, are we locked in?

Labeling – a fresh look at categorization

Programmers discovered a while ago that composition is a much better approach than inheritance. Labeling is an implementation of that concept. When we move from categories to labels, the primary effect is that one transaction can have more than one label, just like the hashtags on Instagram mentioned earlier. What does it mean?

Let’s check one simple transaction example:

SALARY FOR __month__ FOR _owner_name_entity_ (DEDUCTION BAILIFF 600 EUR)

In this case, standard categorization has to choose between the “bailiff” and “salary” categories. When we work with labels, the transaction gets two labels, “bailiff” and “salary”, or more: for example, a “monthly” label can be added if the transaction occurs every month around the same date.
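
A minimal sketch of the difference, using the salary transaction above: instead of picking one winner, the function returns every label whose keywords appear. The keyword sets and the name `label_transaction` are hypothetical, and simple keyword matching stands in here for whatever model actually assigns the labels.

```python
from typing import Dict, List, Set

# Hypothetical keyword sets per label (illustrative only).
LABEL_KEYWORDS: Dict[str, Set[str]] = {
    "salary": {"SALARY", "PAYROLL"},
    "bailiff": {"BAILIFF"},
}


def label_transaction(description: str) -> List[str]:
    """Return all labels whose keywords occur in the description, sorted by name."""
    # Treat parentheses as separators so "(DEDUCTION" and "EUR)" become clean tokens.
    cleaned = description.upper().replace("(", " ").replace(")", " ")
    tokens = set(cleaned.split())
    return sorted(label for label, kws in LABEL_KEYWORDS.items() if tokens & kws)


tx = "SALARY FOR __month__ FOR _owner_name_entity_ (DEDUCTION BAILIFF 600 EUR)"
print(label_transaction(tx))  # ['bailiff', 'salary']
```

Because each label is evaluated independently, adding a new one in this sketch means adding one entry to `LABEL_KEYWORDS`; the existing labels are not affected.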

This approach allows us to iterate fast when introducing labels for particular countries or use cases. It is also more flexible and enables multilevel transaction summaries: you can create sets of transaction compilations connected with the labels. Of course, labeling can be used everywhere categorization is implemented.

Using labels or categories helps to measure trends and indicators in various dimensions, such as revenues, loans, lifestyle, healthcare expenses, recurring monetary obligations and much more. Identifying complex risk conditions, such as gambling or internal transfers between user accounts, becomes a matter of marking them with a "red flag". It is also possible to automatically confirm the customer's income by analyzing categories such as "salary" or "welfare". Thanks to this, credit application processing scales easily, and human resources can be focused on other tasks.
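
Once transactions carry labels, these summaries reduce to simple aggregations. The sketch below assumes each transaction is a pair of an amount and a set of labels. The label names `gambling`, `internal_transfer`, `salary`, and `welfare` follow the examples in the text, while the data layout and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Transaction = Tuple[float, Set[str]]  # (amount, labels); assumed layout

RED_FLAG_LABELS = {"gambling", "internal_transfer"}
INCOME_LABELS = {"salary", "welfare"}


def totals_by_label(transactions: List[Transaction]) -> Dict[str, float]:
    """Sum amounts per label; a transaction counts toward every label it carries."""
    totals: Dict[str, float] = defaultdict(float)
    for amount, labels in transactions:
        for label in labels:
            totals[label] += amount
    return dict(totals)


def red_flags(transactions: List[Transaction]) -> Set[str]:
    """Collect the risk labels that appear on at least one transaction."""
    found: Set[str] = set()
    for _, labels in transactions:
        found |= labels & RED_FLAG_LABELS
    return found


def confirmed_income(transactions: List[Transaction]) -> float:
    """Sum the amounts of transactions carrying at least one income label."""
    return sum(amount for amount, labels in transactions if labels & INCOME_LABELS)


history = [
    (3000.0, {"salary", "bailiff"}),
    (-50.0, {"gambling"}),
    (400.0, {"welfare"}),
]
print(totals_by_label(history))
print(red_flags(history))
print(confirmed_income(history))
```

Note that a multi-label transaction contributes to several totals at once, so per-label sums are views of the same data rather than a partition of it.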

Data – the key to unlocking the customer's potential

Transactions enriched with additional information at this level of detail form the basis for credit scoring, and this is the last step before significantly increasing the share of correct credit decisions. Artificial intelligence will not replace a reliable, classic banking assessment, but a solid banking assessment supported by advanced machine learning methods will quickly replace an assessment made without it.

*cover photo: Pietro Jeng, Unsplash
