Bulk Replace
This task replaces one or more groups of values in a column, each group with a single value. It can also automatically find words that have similar spellings.
When to use
When your data contains data entry errors, such as inconsistent spellings of the same value.
When you want to consolidate several values into a single category.
Quick Start
Let us start with the following data:
Year | Country |
---|---|
1970 | brazil |
1974 | west germany |
1978 | Argentina |
1982 | italy |
1986 | Argentina |
1990 | West Germany |
1994 | BRAZIL |
1998 | France |
2002 | Brazil |
2006 | Italy |
2010 | SPAIN |
2014 | Germany |
2018 | france |
The data contains multiple spelling variants for the same countries. We can fix this by doing the following.
Open the Data Preparation menu and click Find and Replace.
Click Bulk Replace.
Select the column Country.
Mammoth will list all the unique values in the column.
It will report that 4 bucket suggestions were found.
Click Autofill. This will create 4 buckets in which words with similar spellings are grouped together.
Click APPLY.
The data will now look like this:
Year | Country |
---|---|
1970 | brazil |
1974 | west germany |
1978 | Argentina |
1982 | italy |
1986 | Argentina |
1990 | west germany |
1994 | brazil |
1998 | france |
2002 | brazil |
2006 | italy |
2010 | SPAIN |
2014 | Germany |
2018 | france |
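The spelling variants of each country are now merged, but the casing still differs between countries (for example, brazil, Argentina, and SPAIN). We can then use the Text Formatting task to make the data more consistent.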
Supported Options
The following options are supported.
Column: The column on which the bulk replace operation is to be performed.
Search & Replace Groups: The words grouped together to be searched for and replaced.
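
The two options above can be pictured with a minimal Python sketch. This is only an illustration of the idea, not Mammoth's implementation or API: the function name `bulk_replace` and the dictionary layout used for the groups are assumptions made for this example.

```python
def bulk_replace(rows, column, groups):
    """Replace every value found in a group with that group's replacement.

    rows:   list of dicts, one per record
    column: name of the column to operate on
    groups: dict mapping a replacement value to the list of values it replaces
            (a hypothetical layout, chosen only for this sketch)
    """
    # Flatten the groups into a lookup: searched value -> replacement value.
    lookup = {}
    for replacement, searched_values in groups.items():
        for value in searched_values:
            lookup[value] = replacement
    # Values that belong to no group are left unchanged.
    return [{**row, column: lookup.get(row[column], row[column])} for row in rows]


rows = [
    {"Year": 1970, "Country": "brazil"},
    {"Year": 1994, "Country": "BRAZIL"},
    {"Year": 1998, "Country": "France"},
    {"Year": 2010, "Country": "SPAIN"},
]
groups = {
    "brazil": ["brazil", "BRAZIL", "Brazil"],
    "france": ["France", "france"],
}
result = bulk_replace(rows, "Country", groups)
```

Here `BRAZIL` and `France` are replaced by their group's value, while `SPAIN`, which is in no group, passes through untouched. Other columns, such as `Year`, are not modified.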
Fuzzy Suggestions & Automatic Grouping
Mammoth can automatically group words that look similar. Internally, it uses the Levenshtein distance, which counts the single-character edits needed to turn one word into another, to determine which words are close to each other.
Because similar spelling does not guarantee the same meaning, it is advisable to always verify the suggestions before applying the task.
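
To give a feel for how Levenshtein-based grouping can work, here is a minimal Python sketch. It is not Mammoth's actual implementation: the case-insensitive comparison, the `max_distance` threshold of 2, and the greedy grouping strategy are all assumptions made for illustration, and the function names are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def suggest_groups(values, max_distance=2):
    """Greedily group values whose lowercased forms are within max_distance
    of a group's first member. Both choices are assumptions of this sketch."""
    groups = []
    for value in dict.fromkeys(values):  # unique values, in original order
        for group in groups:
            if levenshtein(value.lower(), group[0].lower()) <= max_distance:
                group.append(value)
                break
        else:
            groups.append([value])
    # Only groups with more than one spelling are worth suggesting.
    return [group for group in groups if len(group) > 1]


countries = [
    "brazil", "west germany", "Argentina", "italy", "Argentina",
    "West Germany", "BRAZIL", "France", "Brazil", "Italy",
    "SPAIN", "Germany", "france",
]
suggestions = suggest_groups(countries)
```

Under these assumptions, the Quick Start data produces four suggested groups (the spellings of Brazil, West Germany, Italy, and France). `Germany` is not grouped with `west germany` because the two are five edits apart, which is the kind of case worth checking by hand before applying the task.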