Extract Text¶
The Extract Text task allows for partial extraction of values from a text column. In other words, you can extract sub-strings from text values.
Quick Start¶
Let us start with the following sample data:
URL |
---|
website1.com/Alice |
website2.com/Bob |
website1.com/Chuck |
Suppose you want to extract the name from the URL column, that is, everything following the / character. Complete the following steps to achieve this.
Open the Data Preparation menu and click on Text Function.
Select Extract Text
Choose the column URL as the column to extract text from.
Select Characters around a letter/word for the What to Extract option.
Select All characters right of letter/word.
Enter ‘/’ in the input box for letter/word.
Apply result into a new column called Name.
Click APPLY.
The resulting data appears as shown below.
URL | Name |
---|---|
website1.com/Alice | Alice |
website2.com/Bob | Bob |
website1.com/Chuck | Chuck |
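For readers who want to check the expected result outside the tool, the Quick Start extraction can be sketched in plain Python. This is only an illustration of the same idea, not Mammoth's implementation: the helper name `extract_right_of` is hypothetical, and returning an empty string when the marker is missing is an assumption.

```python
def extract_right_of(text: str, marker: str) -> str:
    """Return everything after the first occurrence of marker.

    Assumption: a value without the marker yields an empty string.
    """
    _, found, tail = text.partition(marker)
    return tail if found else ""


urls = ["website1.com/Alice", "website2.com/Bob", "website1.com/Chuck"]
names = [extract_right_of(url, "/") for url in urls]
print(names)  # ['Alice', 'Bob', 'Chuck']
```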
Supported Options¶
The following options are supported by this task:
Extract Text From: The column to extract the sub-string from.
What to Extract: This option allows you to choose one of the following options:
Characters at a certain position
Characters around a letter/word
Characters from beginning or end
Starts with
Does not start with
Ends with
Does not end with
Starts with a number
Does not start with a number
Ends with a number
Does not end with a number
Contains a number
Does not contain a number
Regular expression
All except regular expression
Parameters: A set of options for specifying the additional inputs required by the chosen What to Extract option. For more information, see Extraction Types.
Condition: See Condition
Apply result into: This option sets the destination for the results of this task. See result documentation.
Extraction Types¶
Mammoth supports the following sub-string extraction techniques. Each of these techniques can be used to achieve multiple sub-objectives and they are controlled using the parameters in the options that follow the What to Extract section.
Extracting characters from a certain position¶
By using this option, you can achieve one of the following objectives:
Extract all characters to the left of a specific position.
Extract all characters to the right of a specific position.
Extract a specific number of characters to the left of a specific position.
Extract a specific number of characters to the right of a specific position.
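The four position-based objectives map naturally onto string slicing. The sketch below assumes a zero-based position where the character at `pos` belongs to the right-hand side; how Mammoth counts positions is not stated here, so treat that convention and the helper names as assumptions.

```python
from typing import Optional


def left_of_position(text: str, pos: int, count: Optional[int] = None) -> str:
    """Characters before index pos; only the last `count` of them if given."""
    left = text[:pos]
    if count is None:
        return left
    return left[max(len(left) - count, 0):]


def right_of_position(text: str, pos: int, count: Optional[int] = None) -> str:
    """Characters from index pos onward; only the first `count` if given."""
    right = text[pos:]
    return right if count is None else right[:count]


text = "website1.com/Alice"
print(left_of_position(text, 12))      # website1.com
print(right_of_position(text, 13))     # Alice
print(left_of_position(text, 12, 3))   # com
print(right_of_position(text, 13, 2))  # Al
```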
Extracting characters around a letter/word¶
By using this option, you can achieve one of the following objectives:
Extract all characters to the left of a letter/word.
Extract all characters to the right of a letter/word.
Extract a specific number of characters to the left of a letter/word.
Extract a specific number of characters to the right of a letter/word.
The letter/word can be included in the extracted string by using the left of and including/right of and including option.
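A rough Python equivalent of these objectives, including the "and including" variant, might look like the following. The function `around_marker` and its parameters are hypothetical. Splitting on the first occurrence, returning an empty string when the letter/word is absent, and not counting the letter/word itself toward `count` are all assumptions.

```python
from typing import Optional


def around_marker(text: str, marker: str, side: str,
                  include: bool = False, count: Optional[int] = None) -> str:
    """Extract characters left or right of the first occurrence of marker."""
    head, found, tail = text.partition(marker)
    if not found:
        return ""  # assumption: no match gives an empty result
    if side == "left":
        part = head if count is None else head[max(len(head) - count, 0):]
        return part + marker if include else part
    if side == "right":
        part = tail if count is None else tail[:count]
        return marker + part if include else part
    raise ValueError("side must be 'left' or 'right'")


text = "website1.com/Alice"
print(around_marker(text, "/", "left"))                 # website1.com
print(around_marker(text, "/", "right", include=True))  # /Alice
print(around_marker(text, "/", "right", count=2))       # Al
```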
Extracting characters from beginning or end¶
By using this option, you can achieve one of the following objectives:
Extract a specific number of characters from the beginning of the string.
Extract a specific number of characters from the end of the string.
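In Python terms, both objectives are simple slices. The helper names below are illustrative only.

```python
def first_chars(text: str, count: int) -> str:
    """The first `count` characters of text."""
    return text[:count]


def last_chars(text: str, count: int) -> str:
    """The last `count` characters of text (empty when count is 0)."""
    return text[max(len(text) - count, 0):]


text = "website1.com/Alice"
print(first_chars(text, 8))  # website1
print(last_chars(text, 5))   # Alice
```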
Starts with¶
You can use this option to extract all strings that start with a specific word/number/character.
Does not start with¶
You can use this option to extract all strings that do NOT start with a specific word/number/character.
Ends with¶
You can use this option to extract all strings that end with a specific word/number/character.
Does not end with¶
You can use this option to extract all strings that do NOT end with a specific word/number/character.
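The four prefix and suffix tests above can be approximated with Python's `str.startswith` and `str.endswith`. The sketch assumes that a value passing the test is kept whole and that any other value becomes empty (`None` here); the source does not say what a non-matching row produces, and `keep_if` is a hypothetical helper.

```python
from typing import Optional


def keep_if(text: str, passes: bool) -> Optional[str]:
    """Keep the whole value when the test passes, otherwise None (assumed)."""
    return text if passes else None


values = ["website1.com/Alice", "website2.com/Bob", "website1.com/Chuck"]

starts = [keep_if(v, v.startswith("website1")) for v in values]
not_starts = [keep_if(v, not v.startswith("website1")) for v in values]
ends = [keep_if(v, v.endswith("Bob")) for v in values]
not_ends = [keep_if(v, not v.endswith("Bob")) for v in values]

print(starts)      # ['website1.com/Alice', None, 'website1.com/Chuck']
print(not_starts)  # [None, 'website2.com/Bob', None]
print(ends)        # [None, 'website2.com/Bob', None]
print(not_ends)    # ['website1.com/Alice', None, 'website1.com/Chuck']
```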
Starts with a number¶
You can use this option to extract all strings that start with a number.
Does not start with a number¶
You can use this option to extract all strings that do NOT start with a number.
Ends with a number¶
You can use this option to extract all strings that end with a number.
Does not end with a number¶
You can use this option to extract all strings that do NOT end with a number.
Contains a number¶
You can use this option to extract all strings that contain a number.
Does not contain a number¶
You can use this option to extract all strings that do NOT contain a number.
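The number-based tests can be illustrated with Python's `re` module, treating "a number" as a single digit at the relevant place. That reading, the sample values, and the helper names are assumptions made for illustration; the "does not" variants are simply the negation of each test.

```python
import re


def starts_with_number(text: str) -> bool:
    return re.match(r"\d", text) is not None


def ends_with_number(text: str) -> bool:
    return re.search(r"\d$", text) is not None


def contains_number(text: str) -> bool:
    return re.search(r"\d", text) is not None


values = ["42 Main St", "Apt 7", "Main St"]

print([v for v in values if starts_with_number(v)])   # ['42 Main St']
print([v for v in values if ends_with_number(v)])     # ['Apt 7']
print([v for v in values if contains_number(v)])      # ['42 Main St', 'Apt 7']
print([v for v in values if not contains_number(v)])  # ['Main St']
```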
Regular expression¶
You can use a regular expression (RegEx) to define a custom search pattern. It is a powerful way to find and extract sub-strings from a text column that the pre-defined options above cannot match.
Refer to this Regex documentation for detailed information on usage.
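As a small illustration of regex-based extraction, the sketch below uses Python's `re` module and returns the first match. Mammoth's regex dialect may differ from Python's, and both the choice of returning the first match and the empty result for no match are assumptions; `extract_regex` is a hypothetical name.

```python
import re


def extract_regex(text: str, pattern: str) -> str:
    """Return the first substring matching pattern, or "" if none (assumed)."""
    match = re.search(pattern, text)
    return match.group(0) if match else ""


print(extract_regex("website1.com/Alice", r"[A-Z][a-z]+$"))  # Alice
print(extract_regex("Order 1234 shipped", r"\d+"))           # 1234
```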
All except regular expression¶
Use All except regular expression when you want to extract everything except the text that matches the specified regex.
To do so:
Open the tasks menu → “Text Functions” → “Extract Text”
In the “What to extract” drop down, select “All except regular expression”
Enter the RegEx in the field below it. It should look something like this:

Fig. 116 Extracting substring with All except regular expressions¶
Click Apply to extract everything except the text that matches the regex.

Fig. 117 Extracted substring¶
Refer to this guide to learn how to create regular expressions.
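Conceptually, All except regular expression behaves like deleting every regex match from the value. A minimal Python sketch of that idea, using `re.sub` with an empty replacement, is shown below. Removing all matches rather than only the first is an assumption, and `all_except_regex` is a hypothetical name.

```python
import re


def all_except_regex(text: str, pattern: str) -> str:
    """Return text with every match of pattern removed."""
    return re.sub(pattern, "", text)


# Drop everything up to and including the first slash.
print(all_except_regex("website1.com/Alice", r"^[^/]*/"))  # Alice
# Drop all digits.
print(all_except_regex("website1.com/Alice", r"\d+"))      # website.com/Alice
```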