
Extract Text

The Extract Text task allows for partial extraction of values from a text column. In other words, you can extract sub-strings from text values.

Quickstart

Let us start with the following sample data:

URL
website1.com/Alice
website2.com/Bob
website1.com/Chuck

Suppose you want to extract the name from the URL column, that is, everything following the / character. Complete the following steps to achieve this.

  1. Go to Transform > Text Functions.
  2. Select the Extract Text function.
  3. Choose the column URL as the column to extract text from.
  4. Select Characters around a letter/word for the What to Extract option.
  5. Select All characters right of letter/word.
  6. Enter '/' in the input box for letter/word.
  7. Apply result into a new column called Name.
  8. Click APPLY.

The resulting data appears as shown below.

URL                   Name
website1.com/Alice    Alice
website2.com/Bob      Bob
website1.com/Chuck    Chuck
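For readers who think in code, the quickstart is roughly equivalent to the Python sketch below. It only illustrates the behaviour and is not how Mammoth implements the task. The function name right_of is hypothetical, and the sketch assumes that the first occurrence of the letter/word is used and that rows without it produce no value.

```python
def right_of(text, marker):
    """Return everything after the first occurrence of marker, or None if absent."""
    _before, found, after = text.partition(marker)
    # partition() returns an empty separator when marker is not found
    return after if found else None


urls = ["website1.com/Alice", "website2.com/Bob", "website1.com/Chuck"]
names = [right_of(u, "/") for u in urls]
print(names)  # ['Alice', 'Bob', 'Chuck']
```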

Supported Options

The following options are supported by this task:

  • Extract Text From: The column to extract sub-string from.
  • What to Extract: This option allows you to choose one of the following extraction types:
  1. Characters at a certain position
  2. Characters around a letter/word
  3. Characters from beginning or end
  4. Starts with
  5. Does not start with
  6. Ends with
  7. Does not end with
  8. Starts with a number
  9. Does not start with a number
  10. Ends with a number
  11. Does not end with a number
  12. Contains a number
  13. Does not contain a number
  14. Regular expression
  15. All except regular expression
  • Parameters: Set of options for specifying the additional inputs required by the chosen What to Extract option. For more information, see Extraction Types.
  • Conditions: See how to apply conditions.
  • Apply result into: This option lets you configure where the results of this task are written. See result documentation.

Extraction Types

Mammoth supports the following sub-string extraction techniques. Each technique can serve several purposes, which are controlled by the parameters that follow the What to Extract option.

Extracting characters from a certain position

By using this option, you can achieve one of the following objectives:

  1. Extract all characters to the left of a specific position.
  2. Extract all characters to the right of a specific position.
  3. Extract a specific number of characters to the left of a specific position.
  4. Extract a specific number of characters to the right of a specific position.
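The four objectives above can be pictured as string slices. The Python sketch below is only an illustration: the function name around_position is hypothetical, and it assumes positions are counted from 1 and that the character at the position itself is excluded from the result. Mammoth's actual indexing rules may differ.

```python
def around_position(text, position, side, count=None):
    """Slice text relative to a 1-based position (assumed to be excluded)."""
    idx = position - 1  # 0-based index of the character at `position`
    if side == "left":
        # All characters before the position, or only the last `count` of them
        start = 0 if count is None else max(0, idx - count)
        return text[start:idx]
    if side == "right":
        # All characters after the position, or only the first `count` of them
        end = None if count is None else idx + 1 + count
        return text[idx + 1:end]
    raise ValueError("side must be 'left' or 'right'")


s = "website1.com/Alice"  # the '/' is the 13th character
print(around_position(s, 13, "left"))      # website1.com
print(around_position(s, 13, "right"))     # Alice
print(around_position(s, 13, "left", 3))   # com
print(around_position(s, 13, "right", 2))  # Al
```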

Extracting characters around a letter/word

By using this option, you can achieve one of the following objectives:

  1. Extract all characters to the left of a letter/word.
  2. Extract all characters to the right of a letter/word.
  3. Extract a specific number of characters to the left of a letter/word.
  4. Extract a specific number of characters to the right of a letter/word.

The letter/word can be included in the extracted string by using the left of and including/right of and including option.
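As a rough illustration of these variants, including the "and including" behaviour, consider the Python sketch below. The function name around_marker and its parameters are hypothetical. The sketch assumes the first occurrence of the letter/word is used and that values without it yield nothing, which may not match Mammoth exactly.

```python
def around_marker(text, marker, side, count=None, include_marker=False):
    """Extract characters left or right of the first occurrence of marker."""
    i = text.find(marker)
    if i == -1:
        return None  # assumption: no match means no extracted value
    if side == "left":
        piece = text[:i]
        if count is not None:
            # Keep only the `count` characters closest to the marker
            piece = piece[-count:] if count > 0 else ""
        return piece + marker if include_marker else piece
    if side == "right":
        piece = text[i + len(marker):]
        if count is not None:
            piece = piece[:count]
        return marker + piece if include_marker else piece
    raise ValueError("side must be 'left' or 'right'")


s = "website1.com/Alice"
print(around_marker(s, "/", "left"))                            # website1.com
print(around_marker(s, "/", "right"))                           # Alice
print(around_marker(s, "/", "left", count=3))                   # com
print(around_marker(s, "/", "right", 2, include_marker=True))   # /Al
```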

Extracting characters from beginning or end

By using this option, one of the following objectives can be achieved.

  1. Extract a specific number of characters from the beginning of the string.
  2. Extract a specific number of characters from the end of the string.
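In code terms, these two objectives correspond to taking a prefix or a suffix of fixed length. The Python sketch below uses hypothetical function names and assumes that a value shorter than the requested length is returned whole.

```python
def from_beginning(text, count):
    """Return the first `count` characters of text."""
    return text[:count]


def from_end(text, count):
    """Return the last `count` characters of text."""
    # text[-0:] would return the whole string, so handle zero explicitly
    return text[-count:] if count > 0 else ""


s = "website1.com/Alice"
print(from_beginning(s, 8))  # website1
print(from_end(s, 5))        # Alice
```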

Starts with

You can use this option to extract all strings that start with a specific word/number/character.

Does not start with

You can use this option to extract all strings that do NOT start with a specific word/number/character.

Ends with

You can use this option to extract all strings that end with a specific word/number/character.

Does not end with

You can use this option to extract all strings that do NOT end with a specific word/number/character.
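The four prefix and suffix options above behave like filters: a value is kept whole when it satisfies the test. The Python sketch below illustrates this with the quickstart data. The helper keep_if is hypothetical, and the sketch assumes that non-matching rows produce an empty result (None here).

```python
def keep_if(text, predicate):
    """Return text unchanged when predicate(text) is true, otherwise None."""
    return text if predicate(text) else None


values = ["website1.com/Alice", "website2.com/Bob", "website1.com/Chuck"]

starts_with = [keep_if(v, lambda s: s.startswith("website1")) for v in values]
not_starts_with = [keep_if(v, lambda s: not s.startswith("website1")) for v in values]
ends_with = [keep_if(v, lambda s: s.endswith("Bob")) for v in values]
not_ends_with = [keep_if(v, lambda s: not s.endswith("Bob")) for v in values]

print(starts_with)      # ['website1.com/Alice', None, 'website1.com/Chuck']
print(not_starts_with)  # [None, 'website2.com/Bob', None]
print(ends_with)        # [None, 'website2.com/Bob', None]
print(not_ends_with)    # ['website1.com/Alice', None, 'website1.com/Chuck']
```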

Starts with a number

You can use this option to extract all strings that start with a number.

Does not start with a number

You can use this option to extract all strings that do NOT start with a number.

Ends with a number

You can use this option to extract all strings that end with a number.

Does not end with a number

You can use this option to extract all strings that do NOT end with a number.

Contains a number

You can use this option to extract all strings that contain a number.

Does not contain a number

You can use this option to extract all strings that do NOT contain a number.
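The six number-based options above can be sketched in Python as three checks plus a negation flag. This is an illustration under assumptions: "number" is taken to mean an ASCII digit 0-9, non-matching rows yield None, and the names NUMBER_CHECKS and extract_if are hypothetical.

```python
import re

# Each check reports whether a digit appears at the given place in the string
NUMBER_CHECKS = {
    "starts_with_number": lambda s: bool(re.match(r"[0-9]", s)),
    "ends_with_number": lambda s: bool(re.search(r"[0-9]\Z", s)),
    "contains_number": lambda s: bool(re.search(r"[0-9]", s)),
}


def extract_if(text, check, negate=False):
    """Return text when the named check passes (or fails, if negate is set)."""
    matched = NUMBER_CHECKS[check](text)
    return text if matched != negate else None


print(extract_if("42 Main St", "starts_with_number"))           # 42 Main St
print(extract_if("Main St 42", "starts_with_number"))           # None
print(extract_if("Main St 42", "ends_with_number"))             # Main St 42
print(extract_if("Main St", "contains_number", negate=True))    # Main St
```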

Regular expression

You can use a regular expression (RegEx) to define a custom search pattern. It is a powerful way to find and extract sub-strings from a text column that none of the predefined options above would match.

Refer to this Regex documentation for detailed information on usage.
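To give a feel for regex-based extraction, the Python sketch below pulls the first match of a pattern from each value. It uses Python's re module, whose syntax may differ from the regex dialect Mammoth accepts. The function name extract_regex is hypothetical, and returning only the first match is an assumption.

```python
import re


def extract_regex(text, pattern):
    """Return the first sub-string matching pattern, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


s = "website1.com/Alice"
print(extract_regex(s, r"[A-Za-z]+$"))  # Alice   (letters at the end)
print(extract_regex(s, r"[0-9]+"))      # 1       (first run of digits)
print(extract_regex(s, r"@"))           # None    (no match)
```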

All except regular expression

Use All except regular expression when you want to extract everything except the text that matches the specified regular expression.

This is how:

  • Open the tasks menu → “Text Functions” → “Extract Text”
  • In the "What to extract" drop down, select “All except regular expression”
  • Enter the RegEx in the field below it.
  • Click Apply. The task extracts everything except the text that matches the regex.

Refer to this guide to learn how to create regular expressions.
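Conceptually, this option is the inverse of a regex extraction: the matching text is dropped and the rest is kept. The Python sketch below illustrates the idea with re.sub. It assumes every match is removed (not only the first) and uses Python regex syntax, which may differ from Mammoth's. The function name all_except_regex is hypothetical.

```python
import re


def all_except_regex(text, pattern):
    """Return text with every sub-string matching pattern removed."""
    return re.sub(pattern, "", text)


s = "website1.com/Alice"
print(all_except_regex(s, r"[0-9]"))     # website.com/Alice
print(all_except_regex(s, r"^[^/]*/"))   # Alice
```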

info
  1. Result Column: The result column documentation
  2. Conditions: To apply conditions to extraction.