Over the past few decades, we have witnessed the exponential rise of Machine Learning (ML), the introduction of Deep Learning Techniques and a dramatic improvement in the analytical capabilities of software. The advancements made in these technologies now enable better anomaly detection and, in turn, unlock new opportunities for Professional Services firms. But how exactly?
We can define anomaly detection as the process of recognising events or data that differ significantly from the majority. In a given dataset, it is the identification of data that present irregular patterns; these are called ‘anomalies’ or ‘outliers’. Because this is a challenging task, especially when the dataset contains millions of records (i.e. Big Data), Machine Learning and Deep Learning Techniques (DLT) are the most practical approach. A plethora of Supervised and Unsupervised Learning Techniques is available to identify outliers across different data types, for example structured, semi-structured or unstructured data.
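To make the idea concrete, here is a minimal sketch of Unsupervised Learning for outlier detection, using scikit-learn's IsolationForest. The synthetic data and the parameters are illustrative assumptions, not a description of any production setup:

```python
# A minimal sketch of unsupervised outlier detection with scikit-learn's
# IsolationForest. The data here is synthetic and purely illustrative.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
normal = rng.normal(loc=100.0, scale=10.0, size=(1000, 2))   # typical records
outliers = rng.normal(loc=250.0, scale=5.0, size=(5, 2))     # irregular records
data = np.vstack([normal, outliers])

# contamination is the expected share of anomalies; 1% is an assumption
model = IsolationForest(contamination=0.01, random_state=0)
labels = model.fit_predict(data)  # -1 = outlier, 1 = inlier

print(f"Flagged {np.sum(labels == -1)} potential outliers")
```

No labelled examples of fraud or error are needed here: the model learns what ‘typical’ looks like and flags records that sit far from it, which is what makes unsupervised techniques so useful at Big Data scale.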
The challenge of anomaly detection peaks when it comes to understanding which of the suggested outliers are indeed malicious, and which are simply a by-product of the quality of the data. Take fraud detection as an example: a malicious outlier would represent fraudulent activity, whereas a non-malicious outlier would point to poor data quality or inaccurate data entry.
Examples of this critical task include fraud detection, data-quality checks that surface abnormal or problematic records, and intrusion detection. For Professional Services firms, being able to identify hidden anomalies in corporate data, make sense of huge data volumes and surface the most valuable connections in their proper context is extremely important.
At Engine B, our primary focus is the quality of audit data, which may contain many millions of numerical and categorical values. Because we need to consider multiple attributes when detecting outliers, we concentrate on ‘multivariate’ approaches, which analyse more than one variable at a time, alongside ‘univariate’ ones, which analyse a single variable, where needed.
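The difference matters in practice: a univariate check scores each attribute on its own, while a multivariate check scores the combination of attributes and can catch records whose individual values all look normal. The sketch below contrasts a simple z-score test with a Mahalanobis-distance test; the attributes, data and thresholds are assumptions chosen for illustration:

```python
# Illustrative contrast between univariate and multivariate detection.
import numpy as np

rng = np.random.default_rng(0)
# two correlated attributes, e.g. invoice amount and its tax amount
amount = rng.normal(1000, 200, 5000)
tax = amount * 0.2 + rng.normal(0, 5, 5000)
records = np.column_stack([amount, tax])

# Univariate: flag records whose amount alone is extreme (|z| > 3)
z = (amount - amount.mean()) / amount.std()
univariate_flags = np.abs(z) > 3

# Multivariate: Mahalanobis distance catches records whose *combination*
# of attributes is unusual, even if each value looks normal on its own
mean = records.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(records, rowvar=False))
diff = records - mean
md = np.sqrt(np.einsum('ij,jk,ik->i', diff, inv_cov, diff))
multivariate_flags = md > 3.5

print(univariate_flags.sum(), "univariate vs", multivariate_flags.sum(), "multivariate flags")
```

A record with a plausible amount but an implausible tax-to-amount ratio would pass the univariate test and fail the multivariate one, which is exactly why multivariate approaches are our focus.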
Two methodologies are applied to ensure we successfully tackle the Auditor’s challenging task of identifying abnormalities in client data:

1. A traditional methodology, which locates outliers based on the Auditor’s classic risk assessment (a toy illustration follows below).

2. A modern risk assessment methodology, which ensures that deep irregularities are identified even within big data consisting of millions of records with hundreds of attributes.
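By way of illustration of the first, rule-based methodology: the sketch below flags journal entries using two classic audit red flags, suspiciously round amounts and weekend postings. These particular rules are assumptions standing in for a real risk assessment, which is always engagement-specific:

```python
# A toy illustration of rule-based outlier location. The rules shown
# (round amounts, weekend postings) are classic audit red flags used
# here as assumptions, not an actual rule set.
import pandas as pd

entries = pd.DataFrame({
    "entry_id": [1, 2, 3, 4],
    "amount": [1250.37, 50000.00, 873.12, 10000.00],
    "posted": pd.to_datetime(["2023-03-06", "2023-03-11", "2023-03-07", "2023-03-12"]),
})

round_amount = entries["amount"] % 1000 == 0          # suspiciously round values
weekend_post = entries["posted"].dt.dayofweek >= 5    # posted on Sat/Sun

entries["risk_flag"] = round_amount | weekend_post
print(entries[entries["risk_flag"]])
```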
Our software suggests possible outliers (or anomalies) for the auditor to consider. With the further use of our Knowledge Graphs, we can then examine whether a given outlier is indeed malicious or not.
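As a toy sketch of that verification step, consider a graph linking a flagged transaction to surrounding entities: if risk-labelled entities sit within a couple of hops, that is corroborating evidence worth the auditor's attention. The graph, entity names and risk labels below are all hypothetical, and this networkx snippet is far simpler than our actual Knowledge Graphs:

```python
# Hypothetical sketch: checking a flagged outlier against related entities.
import networkx as nx

g = nx.Graph()
g.add_edge("txn_4711", "vendor_A")       # flagged transaction -> its vendor
g.add_edge("vendor_A", "employee_7")     # vendor linked to a staff member
g.nodes["employee_7"]["risk"] = "conflict_of_interest"

def corroborating_evidence(graph, outlier, max_hops=2):
    """Return nearby nodes carrying a risk label, as supporting evidence."""
    nearby = nx.single_source_shortest_path_length(graph, outlier, cutoff=max_hops)
    return [n for n in nearby if graph.nodes[n].get("risk")]

print(corroborating_evidence(g, "txn_4711"))  # -> ['employee_7']
```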
The algorithms we currently use have been selected, and are applied, in such a way that together they can deal with a variety of anomalies, regardless of the domain, be it Audit, Legal Services, Tax and so on. Like a fisherman’s net with differently sized gaps that ensures fish of different sizes are caught, each of our algorithms focuses on detecting outliers in a unique fashion while emphasising different data distributions. Our methodology is then dynamically shaped according to the specific features of the audit data.
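To illustrate the ‘net’ idea, the sketch below runs several standard scikit-learn detectors, each sensitive to different data distributions, over the same data. The choice of detectors and their parameters are assumptions for the sake of the example:

```python
# Sketch of the "fisherman's net": several detectors, each sensitive to
# different distributions, run over the same data. Detector choice and
# parameters are illustrative assumptions.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.covariance import EllipticEnvelope

rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0, 1, (500, 3)), rng.normal(8, 1, (5, 3))])

detectors = {
    "isolation_forest": IsolationForest(contamination=0.01, random_state=0),
    "local_outlier_factor": LocalOutlierFactor(contamination=0.01),
    "elliptic_envelope": EllipticEnvelope(contamination=0.01),
}

# Each detector returns -1 for outliers; collect a boolean flag per record
flags = {name: det.fit_predict(data) == -1 for name, det in detectors.items()}
```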
As previously mentioned, our repertoire of algorithms surfaces the anomalies they flag in common, combined through an agreement weighting formula. These common anomalies are then further investigated through Graph Technology to find additional evidence or outliers (even from other relevant datasets). With this, the continued application of Machine Learning techniques and expert knowledge, we can determine whether these anomalies are indeed related to audit risk or not.
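Continuing the sketch above, one simple form such a combination could take is a weighted-agreement score over the per-detector flags. The weights and threshold here are assumptions; the actual agreement weighting formula is internal to our methodology:

```python
# Combine the per-detector flags from the previous sketch with an assumed
# weighted-agreement score; weights and threshold are illustrative only.
import numpy as np

weights = {"isolation_forest": 0.4, "local_outlier_factor": 0.3,
           "elliptic_envelope": 0.3}

# Weighted fraction of detectors that agree a record is anomalous
agreement = sum(w * flags[name] for name, w in weights.items())
common_anomalies = np.where(agreement >= 0.6)[0]  # agreed by most detectors

print(f"{len(common_anomalies)} records flagged for graph-based investigation")
```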