Skip to main content

Fill Missing Values with Grouping

Filling missing values with grouping is a unique function which segregates the data and fills the null or empty values in a Dataset.

Quickstart

Let's take a sample data set "Hospital" which shows the information about certain patients. You can download this sample data.

Row NumberPatient entry IDDateName of PatientBlood GroupAny prior history?Address
1000106/01/2018LukeA+Minneapolis
2000222/03/2018NguyenB+DiabeticOhio
3000322/03/2018Nguyen
4000414/09/2017JohnAB+High blood pressureSan Diego
5000506/01/2018LukeMinneapolis
6000626/11/2017AnneO-ObesityKansas city
7000706/01/2018LukeA+Minneapolis
8000806/01/2018LukeA+AlzheimerMinneapolis

In this Dataset, there are three columns with missing values - 'Blood Group', ' Prior medical history' and 'Address'. It is evident that the blood group of a person never changes, therefore we fill the missing values in the column 'Blood Group'.

Let's take the column ‘Blood Group’ and fill the missing values within the group (Name of patient) from above. Follow the steps below:

  1. Go to Transform > Column Functions.
  2. Select the Fill Missing Values function.
  3. Select Blood Group column from Fill empty cells in drop-down.
  4. Select Patient Entry ID column and ascending/descending order in order by drop-down.
  5. Select above/below from the drop-down 'with values from' to fill the missing value in the column with the value above or below it.
  6. Click on Add Group and select Name of the patient column for the grouping of the data.
  7. Click on APPLY .

How it Works

Grouping by column divides the table into groups on the basis of data in the selected column. Filling from above or below is done on the groups thus formed.

Let's take the sample Dataset of a hospital and break it down step by step.

  1. Add a grouping rule and select the column which you want to use to segregate the data. For example, let’s use the Name of the patient column. This rule divides the table based on the similar values in the same column. In this table, four groups are formed. We can visualize this as:

Division of Dataset into Groups

Values from Above / Below

When you select ‘above’ or ‘below’ from the ‘with values from’ drop-down, it fills the empty cell of the column with value in the cell just above/below it. If we fill a group from above (using default row, as ‘order by’) in the table “Hospital”, it will fill the data from above in the four groups formed separately. Similarly, on selecting ‘below’, it will fill the data from cells below and then show it together.

We can visualize fill missing values from Above as follows:

Fill missing values from Above

The resultant table will be: Final outcome after filling values from Above

Similarly, fill missing values from below can be visualized as:

Fill missing values from Below

The resultant table will be:

Final outcome after filling values from Below

Ordering

To fill from above or below, you would need to order the data first. 'Order by' option allows you to sort your data according to the specified column in ascending or descending order. By default, ordering of data is done by the row number. For example, select Patient Entry ID column and ‘ascending’ in the drop-down menu labelled ‘order by’ to sort the data in ascending order.

Alternatively, you can order them on the home screen itself. At the bottom of the screen, there is a ‘Sort by’ option. If you sort the table on the home screen, it becomes the default option in the fill missing values menu.

Sort by option at the home screen

In the above example, if you select the ‘Date’ column for ordering the data, it leads to an ambiguity as the data in that column is not unique, there is duplicate data in several cells. Since row number is a unique, duplicate value in the chosen order column get further resolved by row number.

note

The ‘Add Group' option however is optional. When grouping is not applied, whole data is treated as a single group. Thus the missing values is filled by the value above or below based on the ordering over complete data.

info

See the Window Function documentation.