Extract Text

The Extract Text task extracts part of each value in a text column. In other words, it lets you pull sub-strings out of text values.

Quick Start

Let us start with the following sample data:

URL

website1.com/Alice

website2.com/Bob

website1.com/Chuck

Suppose you want to extract the name from the URL column, that is, everything following the / character. Complete the following steps to achieve this.

  1. Go to Transform > Text Functions.

  2. Select the Extract Text function.

  3. Choose the column URL as the column to extract text from.

  4. Select Characters around a LETTER/WORD for the What to Extract option.

  5. Select All characters right of letter/word.

  6. Enter ‘/’ in the input box for letter/word.

  7. Apply result into a new column called Name.

  8. Click APPLY.

The resulting data appears as shown below.

URL                   Name

website1.com/Alice    Alice

website2.com/Bob      Bob

website1.com/Chuck    Chuck
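
For readers who want to reason about what these steps compute, the same logic can be sketched in a few lines of Python. This is only an illustration, not Mammoth's implementation: the helper name right_of is hypothetical, and both splitting on the first occurrence of the marker and returning None when the marker is absent are assumptions.

```python
def right_of(text, marker):
    """Return everything after the first occurrence of marker.

    Returns None when the marker does not occur (assumed behaviour).
    """
    _, sep, tail = text.partition(marker)
    return tail if sep else None


urls = ["website1.com/Alice", "website2.com/Bob", "website1.com/Chuck"]
names = [right_of(u, "/") for u in urls]
print(names)  # ['Alice', 'Bob', 'Chuck']
```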

Supported Options

The following options are supported by this task:

  • Extract Text From: The column to extract sub-string from.

  • What to Extract: This option allows you to choose one of the following options:

  1. Characters at a certain position

  2. Characters around a letter/word

  3. Characters from beginning or end

  4. Starts with

  5. Does not start with

  6. Ends with

  7. Does not end with

  8. Starts with a number

  9. Does not start with a number

  10. Ends with a number

  11. Does not end with a number

  12. Contains a number

  13. Does not contain a number

  14. Regular expression

  15. All except regular expression

  • Parameters: A set of options for the additional inputs required by the chosen What to Extract option. For more information, see Extraction Types.

  • Condition: See Condition

  • Apply result into: This option sets the destination for the results produced by this task. See the Result Column documentation.

Extraction Types

Mammoth supports the following sub-string extraction techniques. Each technique covers several variations, which you control through the parameters that follow the What to Extract option.

Extracting characters from a certain position

By using this option, you can achieve one of the following objectives:

  1. Extract all characters to the left of a specific position.

  2. Extract all characters to the right of a specific position.

  3. Extract a specific number of characters to the left of a specific position.

  4. Extract a specific number of characters to the right of a specific position.
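
The four position-based objectives above map naturally onto string slicing. The sketch below uses Python's 0-based indexing; the function names are hypothetical, and how Mammoth numbers positions, or whether the character at the position itself is included, is not stated here and is assumed.

```python
def left_of_position(text, pos, count=None):
    """Characters before index pos; at most count of them if count is given."""
    start = 0 if count is None else max(0, pos - count)
    return text[start:pos]


def right_of_position(text, pos, count=None):
    """Characters from index pos onward; at most count of them if count is given."""
    end = None if count is None else pos + count
    return text[pos:end]


s = "website1.com/Alice"
print(left_of_position(s, 8))       # website1
print(right_of_position(s, 13))     # Alice
print(left_of_position(s, 12, 3))   # com
print(right_of_position(s, 0, 7))   # website
```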

Extracting characters around a letter/word

By using this option, you can achieve one of the following objectives:

  1. Extract all characters to the left of a letter/word.

  2. Extract all characters to the right of a letter/word.

  3. Extract a specific number of characters to the left of a letter/word.

  4. Extract a specific number of characters to the right of a letter/word.

The letter/word itself can be included in the extracted string by using the left of and including or right of and including option.
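
As a rough illustration of these variations, the following Python sketch combines the side, the optional character count, and the optional inclusion of the letter/word in one hypothetical helper. Matching the first occurrence, returning None when the letter/word is absent, and not counting the letter/word toward the character limit are all assumptions.

```python
def around(text, marker, side, count=None, include=False):
    """Extract characters to the left or right of the first marker occurrence."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    idx = text.find(marker)
    if idx == -1:
        return None  # assumed behaviour when the marker is missing
    after = idx + len(marker)
    if side == "left":
        start = 0 if count is None else max(0, idx - count)
        end = after if include else idx
    else:
        start = idx if include else after
        end = None if count is None else after + count
    return text[start:end]


s = "website1.com/Alice"
print(around(s, "/", "left"))                   # website1.com
print(around(s, "/", "right", include=True))    # /Alice
print(around(s, "/", "left", count=3))          # com
print(around(s, ".com", "right", count=3))      # /Al
```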

Extracting characters from beginning or end

By using this option, you can achieve one of the following objectives:

  1. Extract a specific number of characters from the beginning of the string.

  2. Extract a specific number of characters from the end of the string.
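
Both objectives amount to taking a prefix or a suffix of fixed length. A minimal Python sketch, with hypothetical function names:

```python
def from_beginning(text, count):
    """First count characters of text."""
    return text[:count]


def from_end(text, count):
    """Last count characters of text (empty string when count is 0)."""
    return text[-count:] if count > 0 else ""


s = "website1.com/Alice"
print(from_beginning(s, 8))  # website1
print(from_end(s, 5))        # Alice
```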

Starts with

You can use this option to extract all strings that start with a specific word/number/character.

Does not start with

You can use this option to extract all strings that do NOT start with a specific word/number/character.

Ends with

You can use this option to extract all strings that end with a specific word/number/character.

Does not end with

You can use this option to extract all strings that do NOT end with a specific word/number/character.
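
The four prefix and suffix options can be pictured as a test applied to each value. The Python sketch below assumes that a matching value is returned whole and a non-matching value yields an empty result (None here); that output convention and the helper name keep_matching are assumptions, not documented behaviour.

```python
def keep_matching(values, test):
    """Keep each value that passes test; replace the others with None."""
    return [v if test(v) else None for v in values]


values = ["website1.com/Alice", "website2.com/Bob", "portal.org/Chuck"]

starts = keep_matching(values, lambda v: v.startswith("website"))
not_starts = keep_matching(values, lambda v: not v.startswith("website"))
ends = keep_matching(values, lambda v: v.endswith("Bob"))
not_ends = keep_matching(values, lambda v: not v.endswith("Bob"))

print(starts)      # ['website1.com/Alice', 'website2.com/Bob', None]
print(not_starts)  # [None, None, 'portal.org/Chuck']
print(ends)        # [None, 'website2.com/Bob', None]
print(not_ends)    # ['website1.com/Alice', None, 'portal.org/Chuck']
```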

Starts with a number

You can use this option to extract all strings that start with a number.

Does not start with a number

You can use this option to extract all strings that do NOT start with a number.

Ends with a number

You can use this option to extract all strings that end with a number.

Does not end with a number

You can use this option to extract all strings that do NOT end with a number.

Contains a number

You can use this option to extract all strings that contain a number.

Does not contain a number

You can use this option to extract all strings that do NOT contain a number.
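
The number-based options follow the same pattern with a digit test. In the Python sketch below, "a number" is taken to mean a single ASCII digit 0-9, and non-matching values are replaced with None; both points, along with the function names, are assumptions made for illustration.

```python
import re


def starts_with_number(text):
    return re.match(r"[0-9]", text) is not None


def ends_with_number(text):
    return re.search(r"[0-9]\Z", text) is not None


def contains_number(text):
    return re.search(r"[0-9]", text) is not None


def keep_if(values, predicate, negate=False):
    """Keep values where predicate holds (or does not hold, if negate is True)."""
    return [v if predicate(v) != negate else None for v in values]


values = ["42 Main St", "Suite 7", "Main St", "Room 3B"]
print(keep_if(values, starts_with_number))            # ['42 Main St', None, None, None]
print(keep_if(values, ends_with_number))              # [None, 'Suite 7', None, None]
print(keep_if(values, contains_number, negate=True))  # [None, None, 'Main St', None]
```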

Regular expression

You can use a regular expression (RegEx) to define a custom search pattern. It’s a powerful way to find and extract sub-strings that none of the pre-defined options above can match.

Refer to this Regex documentation for detailed information on usage.
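
To give a feel for pattern-based extraction, here is a small Python sketch using the standard re module. Mammoth's regex dialect and its handling of multiple matches are not described here, so returning only the first match (and None when nothing matches) is an assumption, as is the helper name extract_regex.

```python
import re


def extract_regex(text, pattern):
    """Return the first substring matching pattern, or None if there is none."""
    match = re.search(pattern, text)
    return match.group(0) if match else None


urls = ["website1.com/Alice", "website2.com/Bob", "website1.com/Chuck"]

# Everything after the last "/" in each value.
print([extract_regex(u, r"[^/]+$") for u in urls])  # ['Alice', 'Bob', 'Chuck']

# The first run of digits.
print(extract_regex("website1.com/Alice", r"\d+"))  # 1
```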

All except regular expression

Use All except regular expression when you want to keep everything in the value except the text that matches the specified regular expression.

This is how:

  • Open the tasks menu → “Text Functions” → “Extract Text”

  • In the “What to extract” drop down, select “All except regular expression”

  • Enter the RegEx in the field below it. It should look something like this:

Fig. 124 Extracting substring with All except regular expressions

  • Click Apply to extract everything except the text that matches the regex.

Fig. 125 Extracted substring

Refer to this guide to learn how to create regular expressions.
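
The effect of this option resembles deleting the matched text and keeping the rest. A minimal Python sketch with the standard re module follows; removing every match rather than only the first is an assumption, and all_except_regex is a hypothetical name.

```python
import re


def all_except_regex(text, pattern):
    """Return text with every match of pattern removed."""
    return re.sub(pattern, "", text)


print(all_except_regex("website1.com/Alice", r"website\d+\.com/"))  # Alice
print(all_except_regex("Order #123 shipped", r"\d+"))               # Order # shipped
```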

See also

Result Column

The result column documentation

Condition

Conditions in tasks