According to Gartner, an estimated 20 percent of today’s data is structured, while the remaining 80 percent is unstructured. At Engine B, the Industry Common Data Models and Knowledge Graphs we are developing are unique in that they enable the interrogation and analysis of both structured and unstructured data. But what does it mean for data to be structured or unstructured? And what are the key differences between the two when it comes to analysis, storage and processing?
Put simply, structured data is information provided in a standardised format. It resides in tables and can be described by a data model that defines each of its fields. Because it is organised into rows and columns, it is easy to search, and its fields can be mapped to other fields. Structured data is the most commonly used type of data and is often categorised as ‘quantitative data’. It is usually stored in a relational database and queried using Structured Query Language (SQL).
Some examples of structured data include names, dates and the records held in ERP (Enterprise Resource Planning) systems.
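To make the idea of rows, columns and SQL queries concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The `customers` table, its columns and its rows are hypothetical, chosen only to illustrate how a fixed set of typed fields lets a single query filter and sort the data.

```python
import sqlite3

# In-memory relational database. The "customers" table and its rows are
# hypothetical; dates are stored as ISO-8601 text so they compare correctly.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, joined TEXT)"
)
conn.executemany(
    "INSERT INTO customers (name, joined) VALUES (?, ?)",
    [("Alice", "2021-03-14"), ("Bob", "2022-07-01"), ("Carol", "2020-11-30")],
)

# Every row has the same columns, so one SQL query can filter and sort them.
rows = conn.execute(
    "SELECT name FROM customers WHERE joined >= ? ORDER BY name", ("2021-01-01",)
).fetchall()
print(rows)  # [('Alice',), ('Bob',)]
```

Because the schema guarantees that every record has a `joined` field, the query needs no special handling for missing or oddly formatted values.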
Unstructured data, by contrast, is information provided in a non-standardised format; in other words, it is everything else. It does have an internal structure, but it is not organised into rows and columns and cannot be described by a predefined data model or schema. As a result, it is much more difficult to search, manage and analyse. Unstructured data comes in many formats, such as text, images, audio and video files, graphs and so on. It is often categorised as ‘qualitative data’ and is sometimes stored in non-relational databases, commonly known as ‘NoSQL’ databases.
PDFs, emails, contracts and websites are just some examples of unstructured data.
The key differences between structured and unstructured data
1. They are displayed differently
As highlighted above, structured data is displayed in rows and columns, whilst unstructured data usually takes the form of text, images, audio, video and other media.
2. One is much easier to store than the other
Structured data is far easier to store, manage and export in relational databases (queried with SQL) and in spreadsheets such as Microsoft Excel, using data types such as numbers and dates. Unstructured data offers more freedom in storage, as it is kept in its native format, often in ‘data lakes’.
3. Structured data is easier to manage, process and protect
Legacy solutions can be used to manage structured data, whereas processing and protecting unstructured data is difficult with such systems and often requires artificial intelligence (AI) solutions instead.
4. Unstructured data is harder to analyse
Structured data is much easier to analyse with standard data analysis methods and tools. Unstructured data must be examined manually or with specialised tools, such as the analysis tools provided by ‘NoSQL’ databases.
5. A bird’s-eye view vs a deeper insight
Let’s look at a scenario-based example. A business is collecting feedback through a customer satisfaction survey, and the answers can be captured in two different ways. Answers to closed questions, such as a rating between 0 and 10, are structured data, whilst answers to open questions that require free-form text are unstructured data. Whilst the structured data gives us a bird’s-eye view of the responses and statistical information about performance, the unstructured data gives us a much deeper understanding of customer behaviour and intent.
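The survey scenario can be sketched in a few lines of Python. The responses below are hypothetical, and the keyword count is a deliberately naive stand-in for real text analysis: the point is only that the rating column can be summarised with standard statistics straight away, while the comments must be processed before anything can be counted.

```python
import re
import statistics
from collections import Counter

# Hypothetical survey responses: a 0-10 rating (structured) and a free-text
# comment (unstructured).
responses = [
    {"rating": 9, "comment": "Fast delivery, but the app kept crashing."},
    {"rating": 4, "comment": "Support was slow and the app crashed twice."},
    {"rating": 7, "comment": "Good value. Delivery was quick."},
]

# Structured field: standard statistics give the bird's-eye view immediately.
ratings = [r["rating"] for r in responses]
average = statistics.mean(ratings)
print(f"Average rating: {average:.1f}")  # Average rating: 6.7

# Unstructured field: there is no column to aggregate, so the text is first
# split into lowercase words. This naive keyword count is illustrative only.
words = Counter(
    word for r in responses for word in re.findall(r"[a-z]+", r["comment"].lower())
)
print(words["app"], words["delivery"])  # 2 2
```

Even this toy example shows the difficulty: the naive count treats "crashing" and "crashed" as unrelated words, although both describe the same customer complaint.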
6. Processing
Scaling up a relational database schema to accommodate more structured data takes a lot of processing, whereas unstructured data can be scaled with comparatively less processing.
7. Schema independence
Structured data is less flexible because it depends on a predefined schema. By contrast, unstructured data is more flexible because it is schema-independent.
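As a rough illustration of schema dependence, the sketch below uses Python’s sqlite3 module for the schema-bound side and plain JSON-style records for the schema-free side. The `feedback` table and the record fields are hypothetical, and JSON documents are strictly speaking semi-structured, but they show the kind of schema-free storage described above.

```python
import json
import sqlite3

# Schema-dependent: the table's columns are fixed when it is created.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE feedback (id INTEGER PRIMARY KEY, rating INTEGER)")

rejected = False
try:
    # A field the schema does not define is rejected until the schema is altered.
    conn.execute("INSERT INTO feedback (rating, comment) VALUES (?, ?)", (8, "Great"))
except sqlite3.OperationalError as err:
    rejected = True
    print("Rejected:", err)

# Schema-independent: each document can carry whatever fields it needs.
documents = [
    {"rating": 8},
    {"rating": 6, "comment": "Great service", "attachments": ["photo.jpg"]},
]
stored = [json.dumps(doc) for doc in documents]
print(stored[1])
```

The trade-off is that the relational table guarantees every row has the same shape, whereas any code reading the documents must cope with fields that may or may not be present.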
8. Number of joins
Structured data in a relational database often has to be joined across multiple tables to produce the end result, whereas unstructured data within a ‘NoSQL’ or graph database can often reach the same result with fewer joins.
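The difference in joins can be sketched with a small, hypothetical customer-orders example. The relational version splits the data across three tables and needs two joins to answer "what did Alice buy?"; the nested Python dictionary stands in for a document in a ‘NoSQL’ store, where related data sits in one record. Graph databases reach related data by traversing relationships instead, which this sketch does not attempt to show.

```python
import sqlite3

# Relational (normalised) layout: the information is split across three tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER REFERENCES customers(id));
    CREATE TABLE items (order_id INTEGER REFERENCES orders(id), product TEXT);
    INSERT INTO customers VALUES (1, 'Alice');
    INSERT INTO orders VALUES (10, 1);
    INSERT INTO items VALUES (10, 'Laptop'), (10, 'Mouse');
""")

# Answering "what did Alice buy?" requires two joins.
relational = [
    row[0]
    for row in conn.execute("""
        SELECT items.product
        FROM customers
        JOIN orders ON orders.customer_id = customers.id
        JOIN items ON items.order_id = orders.id
        WHERE customers.name = 'Alice'
        ORDER BY items.product
    """)
]

# Document-style layout (hypothetical): related data is nested in one record,
# so the same answer is read directly without any join.
customer_doc = {"name": "Alice", "orders": [{"id": 10, "items": ["Laptop", "Mouse"]}]}
document = sorted(item for order in customer_doc["orders"] for item in order["items"])

print(relational, document)  # ['Laptop', 'Mouse'] ['Laptop', 'Mouse']
```

Nesting avoids joins for this particular question, but it can duplicate data and make queries that cut across many records (for example, "which customers bought a mouse?") harder to express.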
It’s clear that there are vast differences between the two types of data. For professional services firms, however, being able to process and analyse large volumes of data, both structured and unstructured, is essential to making informed operational decisions, and it is often a huge challenge. This is exactly why Engine B is developing Industry Common Data Models and Knowledge Graphs: to link structured and unstructured data and enable more intelligent decision making.