Extract Text
The Extract Text task extracts part of each value in a text column. In other words, it lets you pull sub-strings out of text values.
Quickstart
Let us start with the following sample data:
URL |
---|
website1.com/Alice |
website2.com/Bob |
website1.com/Chuck |
Suppose you want to extract the name from the URL column, that is, everything following the / character. Complete the following steps to achieve this.
- Go to Transform > Text Functions.
- Select the Extract Text function.
- Choose the column URL as the column to extract text from.
- Select Characters around a LETTER/WORD for the What to Extract option.
- Select All characters right of letter/word.
- Enter '/' in the input box for letter/word.
- Apply result into a new column called Name.
- Click APPLY.
The resulting data appears as shown below.
URL | Name |
---|---|
website1.com/Alice | Alice |
website2.com/Bob | Bob |
website1.com/Chuck | Chuck |
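For readers who think in code, the Quickstart transformation behaves roughly like the Python sketch below. It is an illustration only, not Mammoth's implementation: the function name `right_of` is hypothetical, and returning an empty string when the letter/word is not found is an assumption.

```python
def right_of(text: str, marker: str) -> str:
    """Return everything after the first occurrence of marker."""
    # str.partition splits the text around the first occurrence of marker.
    _, found, tail = text.partition(marker)
    # Assumption: a value without the marker yields an empty result.
    return tail if found else ""


urls = ["website1.com/Alice", "website2.com/Bob", "website1.com/Chuck"]
names = [right_of(u, "/") for u in urls]
print(names)  # ['Alice', 'Bob', 'Chuck']
```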
Supported Options
The following options are supported by this task:
- Extract Text From: The column to extract sub-string from.
- What to Extract: This option allows you to choose one of the following options:
- Characters at a certain position
- Characters around a letter/word
- Characters from beginning or end
- Starts with
- Does not start with
- Ends with
- Does not end with
- Starts with a number
- Does not start with a number
- Ends with a number
- Does not end with a number
- Contains a number
- Does not contain a number
- Regular expression
- All except regular expression
- Parameters: The additional inputs required by the chosen What to Extract option. For more information, see Extraction Types.
- Conditions: See how to apply conditions.
- Apply result into: The destination for the results of this task. See result documentation.
Extraction Types
Mammoth supports the following sub-string extraction techniques. Each technique covers several related objectives, which you control using the parameters that follow the What to Extract option.
Extracting characters from a certain position
By using this option, you can achieve one of the following objectives:
- Extract all characters to the left of a specific position.
- Extract all characters to the right of a specific position.
- Extract a specific number of characters to the left of a specific position.
- Extract a specific number of characters to the right of a specific position.
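The four position-based objectives can be sketched with Python slicing. This is a minimal sketch under stated assumptions: the helper names are hypothetical, positions are treated as 0-based indexes, and the character at the position itself goes to the right-hand side. Mammoth's own position convention may differ.

```python
def left_of_position(text, pos, count=None):
    """Characters before index pos; optionally only the last `count` of them."""
    head = text[:pos]
    if count is None:
        return head
    # Guard against count == 0, because head[-0:] would return the whole string.
    return head[-count:] if count > 0 else ""


def right_of_position(text, pos, count=None):
    """Characters from index pos onward; optionally only the first `count`."""
    tail = text[pos:]
    return tail if count is None else tail[:count]


s = "website1.com/Alice"
all_left = left_of_position(s, 12)        # 'website1.com'
all_right = right_of_position(s, 13)      # 'Alice'
three_left = left_of_position(s, 12, 3)   # 'com'
three_right = right_of_position(s, 13, 3) # 'Ali'
```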
Extracting characters around a letter/word
By using this option, you can achieve one of the following objectives:
- Extract all characters to the left of a letter/word.
- Extract all characters to the right of a letter/word.
- Extract a specific number of characters to the left of a letter/word.
- Extract a specific number of characters to the right of a letter/word.
The letter/word itself can be included in the extracted string by using the "left of and including" or "right of and including" option.
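The effect of the "and including" variants can be illustrated with a short Python sketch. The function `around` and its parameters are hypothetical names chosen for this example, and two behaviors are assumptions rather than documented facts: the first occurrence of the letter/word is used, and a value without it yields an empty string.

```python
def around(text, marker, side="right", include=False):
    """Extract the text left or right of the first occurrence of marker."""
    idx = text.find(marker)
    if idx == -1:
        return ""  # Assumption: no match gives an empty result.
    if side == "left":
        # Extend the cut past the marker when it should be included.
        end = idx + len(marker) if include else idx
        return text[:end]
    # Start at the marker when it should be included, otherwise just after it.
    start = idx if include else idx + len(marker)
    return text[start:]


s = "website1.com/Alice"
right = around(s, "/", "right")                      # 'Alice'
right_incl = around(s, "/", "right", include=True)   # '/Alice'
left = around(s, "/", "left")                        # 'website1.com'
left_incl = around(s, "/", "left", include=True)     # 'website1.com/'
```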
Extracting characters from beginning or end
By using this option, you can achieve one of the following objectives:
- Extract a specific number of characters from the beginning of the string.
- Extract a specific number of characters from the end of the string.
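In Python terms, these two objectives correspond to simple slices. The helper names below are hypothetical and the sketch is only an illustration of the idea.

```python
def first_chars(text, n):
    """Return the first n characters of text."""
    return text[:n]


def last_chars(text, n):
    """Return the last n characters of text."""
    # text[-0:] would return the whole string, so handle n == 0 explicitly.
    return text[-n:] if n > 0 else ""


s = "website1.com/Alice"
beginning = first_chars(s, 8)  # 'website1'
ending = last_chars(s, 5)      # 'Alice'
```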
Starts with
You can use this option to extract all strings that start with a specific word/number/character.
Does not start with
You can use this option to extract all strings that do NOT start with a specific word/number/character.
Ends with
You can use this option to extract all strings that end with a specific word/number/character.
Does not end with
You can use this option to extract all strings that do NOT end with a specific word/number/character.
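The four conditions above can be pictured as a test applied to each whole value. The sketch below assumes that a value meeting the condition is kept in full and that any other value produces an empty result (shown as `None`); that non-matching behavior is an assumption, and `extract_if` is a hypothetical helper.

```python
def extract_if(text, condition):
    """Keep the whole value when condition(text) is true, else None."""
    return text if condition(text) else None


values = ["website1.com/Alice", "example.org/Bob"]

# "Starts with" the word 'website'
starts = [extract_if(v, lambda s: s.startswith("website")) for v in values]
# "Does not end with" the word 'Bob'
not_ends = [extract_if(v, lambda s: not s.endswith("Bob")) for v in values]

print(starts)    # ['website1.com/Alice', None]
print(not_ends)  # ['website1.com/Alice', None]
```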
Starts with a number
You can use this option to extract all strings that start with a number.
Does not start with a number
You can use this option to extract all strings that do NOT start with a number.
Ends with a number
You can use this option to extract all strings that end with a number.
Does not end with a number
You can use this option to extract all strings that do NOT end with a number.
Contains a number
You can use this option to extract all strings that contain a number.
Does not contain a number
You can use this option to extract all strings that do NOT contain a number.
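The number-based conditions can be approximated in Python with small regular-expression checks. This sketch assumes that "a number" means a digit from 0 to 9 at the relevant place in the string; the function names are hypothetical, and the "does not" variants are simply the negation of each check.

```python
import re


def starts_with_number(s):
    # re.match only matches at the beginning of the string.
    return re.match(r"[0-9]", s) is not None


def ends_with_number(s):
    # \Z anchors the digit to the very end of the string.
    return re.search(r"[0-9]\Z", s) is not None


def contains_number(s):
    return re.search(r"[0-9]", s) is not None


values = ["42 Main St", "Apt 7", "Elm Street", "Route 66 North"]
starting = [v for v in values if starts_with_number(v)]      # ['42 Main St']
ending_num = [v for v in values if ends_with_number(v)]      # ['Apt 7']
no_number = [v for v in values if not contains_number(v)]    # ['Elm Street']
```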
Regular expression
You can use a regular expression (RegEx) to define a custom search pattern. It is a powerful way to find and extract sub-strings from a text column that none of the pre-defined options above can match.
Refer to this Regex documentation for detailed information on usage.
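As a rough illustration, the Python sketch below extracts the first sub-string that matches a pattern. It uses Python's `re` module, whose syntax may differ from the regex dialect Mammoth accepts; returning only the first match, and an empty string when nothing matches, are assumptions. `extract_regex` is a hypothetical name.

```python
import re


def extract_regex(text, pattern):
    """Return the first sub-string matching pattern, or '' if none."""
    match = re.search(pattern, text)
    return match.group(0) if match else ""


name = extract_regex("website1.com/Alice", r"[^/]+$")     # 'Alice'
order = extract_regex("Order #12345 shipped", r"\d+")     # '12345'
nothing = extract_regex("no digits here", r"\d+")         # ''
```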
All except regular expression
Use All except regular expression when you want to remove the text matching the specified regex from the extract.
This is how:
- Open the tasks menu → “Text Functions” → “Extract Text”
- In the "What to extract" drop down, select “All except regular expression”
- Enter the RegEx in the field below it.
- Click Apply. The task extracts everything except the text that matches the regex.
Refer to this guide to learn how to create regular expressions.
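Conceptually, this option is the inverse of a regex extract: the matching text is removed and the remainder is kept. A minimal Python sketch, assuming every match is removed and using Python's `re` syntax (which may differ from the dialect Mammoth accepts), looks like this. `all_except_regex` is a hypothetical name.

```python
import re


def all_except_regex(text, pattern):
    """Return text with every match of pattern removed."""
    return re.sub(pattern, "", text)


name = all_except_regex("website1.com/Alice", r"^[^/]*/")    # 'Alice'
no_digits = all_except_regex("Order #12345 shipped", r"\d+") # 'Order # shipped'
```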