Apply Filter
You can use the Apply Filter function when you want to keep or remove certain rows of data from the View based on some condition(s). These conditions are also applicable in other Tasks.
Quickstart
Let's use the same download dataset as above to reach the final output using the Apply Filter rule.
- Go to Transform > Label, Filter and Replace > Apply Filter.
- Select Keep from the Keep/Remove option to include the values which satisfy the condition;
- Select Gender from the column drop-down;
- Select is from the operator list;
- Select Value from the operand list;
- Select the operand value Female;
- Click Apply.
Fundamentals
In order to use filters effectively, let’s understand the following terminologies:
Condition
A condition is a statement against which every row of data is checked. The rows that satisfy the condition appear in the results.
Value or Column Value
A condition in Mammoth has two types of operands: Value and Common Value.
- Value
A Value
lets you build a condition attached to any value from the selected column. For example,
in the above dataset, we can filter the Gender data for one of the two values -
“Female” or "Male".
- Column Value
A Column Value
lets you build a condition connected to values in another Column.
For example, in the example dataset above, a Column Value operand
can help compare the exam marks over two semesters for all students.
If we want to select the best performing students in Sem-1, we can build a condition comparing one (Sem-1) column with another (Sem-2) using the greater-than operator. The result is below:
Student Gender English (Sem-1) English (Sem-2) Alice Female 95 90 Chuck Male 80 60
Building Complex Conditions
Let’s take a look at the example dataset again. Suppose you want to filter the Gender data to show all “Female” exam marks in English in Sem-1 greater than 70. You’ll need to build two conditions for the Task using the AND or OR operation.To build multiple conditions, click on the ‘+’ icon.
To solve the problem above, create two conditions:
- Gender is “Female”.
- Value of English (Sem-1) column is greater than 70.
Setting “AND” on both conditions to true will get the desired result.
Or perhaps you want to select the students with average marks of more than 65 but less than 80 in English across both Semesters or who scored above 90 in Sem-1. From the defined selection criteria, this requires two conditions. But the first condition needs to compare the marks for both semesters which means there is a condition within a condition, also called a nested condition. To create a nested condition, click on the ‘→’ icon.
So, to solve the above problem, you can build a nested condition as:
- English (Sem-1) is greater than 90
OR
Output of (Use '→' icon to create nested condition) -
English (Sem-1) in between 65-80
AND
- English (Sem-2) in between 65-80.
Use “OR” and “AND” as the operators in between the conditions to get the desired result.
Supported Options
-
Keep/Remove: To filter for or against of the condition statement. The Keep option retains rows satisfying the condition while the Remove option removes the columns satisfying the condition.
-
Select a column: Shows a drop-down menu of all the columns. Select a column from the menu to build your condition on.
-
Select an operator: Contains a list of operations. These operations vary as per the datatype. Here's the complete list of operations for different datatypes:
- Types of text operators:
Operation type Filters rows is matching the value is NOT not matching the value contains containing the value does NOT contain not containing the value starts with starting with the value does NOT start with not starting with the value ends with ending with the value does NOT end with not ending with the value is empty with no value in the selected column is NOT empty with value in the selected column
- Types of numeric operators:
Operation type Filters rows is matching the value is NOT not matching the value is less than containing value less than the specified value is less than or equal to containing value less than or equal to the specified value is greater than with a value greater than the specified value is greater than or equal to with a value greater than equal to the specified value is the maximum value containing the highest value for the selected column is NOT the maximum value containing values other than the highest value is the minimum value containing values other than the highest value is NOT the minimum value containing values other than the lowest value is empty with no value in the selected column is NOT empty with value in the selected column in between with values in the specified range
- Types of text operators:
Operation type Filters rows is matching the value is NOT not matching the value contains containing the value does NOT contain not containing the value starts with starting with the value does NOT start with not starting with the value ends with ending with the value does NOT end with not ending with the value is empty with no value in the selected column is NOT empty with value in the selected column
-
Select an operand: Offers two options: Value and Common Value.
- Types of date operands:
Operand type Filters rows Date matching the date Date-Time in seconds matching date and time to the second Date/Minute matching date and time to the minute Date/Hour matching date and time to the hour Year/Month matching the year and the month Year matching the year Day of Month matching the day of the month Month matching the month Weekday matching the weekday Earliest single value with the earliest date and the time Latest single value with the latest date and the time Earliest Date-Time with earliest date-time increased to a certain date-time Latest Date-Time with latest date-time decreased to a certain date-time Current Date matching the current date or the current date increased/decreased by specific days, weeks, months, or years -
Add condition: You can build complex conditions using this. The Add condition option is present as the + sign next to the condition.
-
AND/OR: Operators for complex conditions. The And operator checks for rows statisfying all the conditions. The OR operator checks for rows satifying at least any one of the statements.
- Filtering does not delete data. It only filters the data for the next step of the pipeline. Filters can be modified or deleted;
- You can make the text values case sensitive by clicking on the case-sensitive checkbox present below the last condition ;
- Filter conditions are suggested based on current data in the View. It does not account for any future data the View may get. This could affect the filter results. For example, if you need to remove all those rows with a Column Value of -(hyphen) or _ (underscore) or if your current data in the View only has Values with - , the suggestions would reflect this. You need to explicitly add _ to account for any future data with this Value.
- Apply Filter can also be applied via explore cards which filter out the Values but don’t add the rule to the Data Pipeline. From the explore card you can add it to the Pipeline and it will open the Apply Filter task pre-filled with the filter conditions matching the explore card selection.
- You can place a condition outside the parentheses by clicking on '←' icon.