Seer Data Analyst Nadya DeBeers walks us through a question we get asked almost every day: what is the difference between unit record and aggregate data?

When you’re in the field of data, you’ll often hear the terms unit record data and aggregate data.

The funny thing about frequently used words is that I suspect a fair number of the people hearing them don’t actually know what they mean.

Now, I don’t mean this harshly, because none of us can know everything! I actually think there is a very simple reason why common words are so often misunderstood: we have loads of ways to say the same thing.

Take the word exponent, for instance. An exponent tells us how many times to use a number as a factor when multiplying it by itself. For example:

4² = 4 * 4 = 16, where the exponent is 2, so 4 is used as a factor two times

Imagine sitting in a maths class at a new school where the lesson for the day is on exponents. Until now, you’ve never heard the term exponent because you’ve been using the terms index, order, or power for the same concept. Not knowing what exponent means doesn’t indicate that you can’t solve the problem or that you’re unintelligent; it just means you need to decode the language first. This is a key example of why we should never criticise or poke fun at someone for not understanding a particular term.

So, with all that said, let’s get into what unit record data and aggregate data are, whether you’re uncertain, want to solidify your understanding, or are just curious!

Unit record data

According to the Australian Bureau of Statistics (ABS), unit record data, also known as microdata, is a dataset made up of “unit records”, where each record contains information about a single “unit”. Let’s unpack this a bit.

Imagine a dataset full of census information. You might expect each row of the dataset to represent a single person. This would mean that the unit is a person, and therefore, the unit records are information about each person.
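As a minimal sketch, here is what a few rows of such a unit record dataset might look like in Python. The field names (`person_id`, `suburb`, `age`) and all the values are hypothetical, made up for illustration rather than taken from any real census.

```python
from dataclasses import dataclass

# Hypothetical unit record: one row describes exactly one person (the "unit").
@dataclass(frozen=True)
class PersonRecord:
    person_id: int
    suburb: str
    age: int

# Made-up example rows; each list element is a single unit record.
unit_records = [
    PersonRecord(person_id=1, suburb="Suburb A", age=34),
    PersonRecord(person_id=2, suburb="Suburb B", age=27),
    PersonRecord(person_id=3, suburb="Suburb A", age=61),
    PersonRecord(person_id=4, suburb="Suburb C", age=45),
]

for record in unit_records:
    print(record)
```

Because the unit is a person, no person should appear in more than one row.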

A unit can be a simple entity such as a person, or a calculated unit such as a person per week. With person per week as the unit, each row of the dataset would hold information about one person in one particular week.
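To make the person-per-week idea concrete, here is a small Python sketch. It assumes a hypothetical dataset with made-up fields `person_id`, `week` and `hours_worked`; the point is only that each row is identified by the (person, week) pair rather than by the person alone.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical unit record where the unit is "a person in a given week":
# the same person can appear in several rows, once per week.
@dataclass(frozen=True)
class PersonWeekRecord:
    person_id: int
    week: int
    hours_worked: float

# Made-up example rows for two people over two weeks.
person_week_records = [
    PersonWeekRecord(person_id=1, week=1, hours_worked=38.0),
    PersonWeekRecord(person_id=1, week=2, hours_worked=40.0),
    PersonWeekRecord(person_id=2, week=1, hours_worked=20.0),
    PersonWeekRecord(person_id=2, week=2, hours_worked=22.5),
]

# Each row is identified by the (person, week) pair, so that pair must be unique.
unit_keys = [(r.person_id, r.week) for r in person_week_records]
assert len(unit_keys) == len(set(unit_keys)), "each person-week should appear once"

# The same person appears more than once, which is expected for this unit.
rows_per_person = Counter(r.person_id for r in person_week_records)
print(rows_per_person)  # Counter({1: 2, 2: 2})
```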

Aggregate data

According to the ABS, aggregate data, also known as macrodata or tabular data, is produced by grouping information into categories and combining the values within each category, for example by counting them.

Let’s say we wanted to find the number of people living in each suburb using the unit record data from the census. We could count every row where a person lives in a particular suburb and output a new dataset in which each row no longer represents a single person, but instead holds the total number of people living in one suburb.

We can create aggregate data from the unit record data shown above by summing the number of people in each suburb.
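The same grouping step can be sketched in a few lines of Python. The records and suburb names below are hypothetical, and `people_per_suburb` is an illustrative helper rather than part of any particular tool; it groups the unit records by suburb and counts the people in each group.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical unit record: one row per person.
@dataclass(frozen=True)
class PersonRecord:
    person_id: int
    suburb: str

# Made-up unit record data.
unit_records = [
    PersonRecord(1, "Suburb A"),
    PersonRecord(2, "Suburb B"),
    PersonRecord(3, "Suburb A"),
    PersonRecord(4, "Suburb C"),
    PersonRecord(5, "Suburb A"),
]

def people_per_suburb(records):
    """Group unit records by suburb and count the people in each group."""
    return Counter(r.suburb for r in records)

aggregate = people_per_suburb(unit_records)

# Each row of the aggregate table now describes a suburb, not a person.
for suburb, count in sorted(aggregate.items()):
    print(f"{suburb}: {count}")
# Suburb A: 3
# Suburb B: 1
# Suburb C: 1
```

`Counter` is used here because counting is the simplest way to combine values within a category; summing some other numeric field per suburb would follow the same group-then-combine pattern.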

We hope this post has brought some clarity to the two types of data! Please let us know if you have any questions or feedback.

Sources

ABS Glossary: https://www.abs.gov.au/about/data-services/data-confidentiality-guide/glossary

Originally posted on Nadya DeBeers’ Medium page.