Extract JSON

The Extract JSON task allows for safe conversion of data from JSON format to a table format. This task comes with an auto detection which will pre-populate the user interface with suggestions.

You can use this function to extract data from arbitrary JSON without programming.

Quick Start

Let us start with the following sample data:

Items

{"Student": "Alice", "Science": 140, "Language": 138}

{"Student": "Frank", "Science": 122, "Arts": 102}

There are two rows in this data. Each row contains a JSON. If you want to convert this JSON into a rows vs column format, we can do the following:

  1. Go to Transform > Reshape, Group & Aggregate.

  2. Select the Extract JSON function.

  3. Select the column Items.

Mammoth will automatically make the following suggestions:

  1. The JSON is an dictionary (object).

  2. Each Key in the dictionary will be extracted as a column.

  3. Keys to be extracted are Arts, Language, Science, Student.

Finally, click APPLY. The result will appear as displayed below:

Arts

Language

Science

Student

138

140

Alice

102

122

Frank

When to use

JSON flattening might be necessary in the following situations.

  1. If you are using webhooks to bring data into Mammoth.

  2. If you have JSON files

  3. If you have an API response from a web service.

  4. One or more of the columns in your database has JSON data.

Supported Options

The following options are supported in this Task:

  1. Source Column: The column from which JSON values are to be extracted.

  2. JSON type: The type of JSON object. You can choose between two types - Dictionary and Array/List.

  3. Extract as: Extracts data either as new columns or new rows.

  4. Extract the following keys: This panel allows for the list of keys to be extracted from the JSON dictionary. Each key will become a new column.

  5. Column type: The type of the new column. If the data cannot be extracted as the option chosen here, empty values will be produced. Numeric and Text data are supported. For date type see Date Handling in JSON.

  6. Number of columns: The number of new columns to create. This option appears when the JSON type is list and it is being extracted as new columns.

  7. Keep source column: If the source column is to be kept after extraction. This option is turned off by default. See Keeping source column.

Extracting different JSON types

JSON can be either a list or a dictionary. JSON can also be extracted as new columns or rows. The following section describes how the combination of these options work.

1. Extracting Dictionary

As Columns

When extracting JSON dictionaries as new columns,

  1. One column gets created per key.

  2. All the values go into a cell below the column corresponding to that key.

  3. If a key does not have a value for a certain row, the corresponding cell remains empty.

This is the default combination of options when the JSON is detected by Mammoth as a dictionary.

To see an example of this see Quick Start.

As Rows

When extracting JSON dictionaries as new rows,

  1. Each key-value pair is extracted as a new row.

  2. Two columns will get automatically created, one for the keys and one for the values.

  3. If the data does not contain any key-value pairs, corresponding cells are left empty.

For example, if the data appears like this:

Names

{"first": "Alice", "second": "Bob", "third": "Claire"}

After applying this combination, it will appear as shown below:

Key

Value

first

Alice

second

Bob

third

Claire

2. Extracting List

When extracting JSON lists as new columns,

  1. The number of columns to be extracted can be configured.

  2. The first value in the list goes into the first column, second value goes into the second column and so on.

  3. If the list does not contain as many columns as you have chosen, the corresponding cells are left empty.

For example, if the data appears like this:

Names

["Alice", "Bob", "Chuck"]

After applying this combination, it will appear as shown below:

Item 1

Item 2

Item 3

Alice

Bob

Chuck

When extracting JSON lists as new rows,

  1. Mammoth creates two columns automatically when this option is selected. One for index of the item in the array and one for the item itself.

  2. If the array is empty, the corresponding cell will be empty.

This is the default combination of options when the JSON is detected by Mammoth as a list.

For example, if the data appears like this:

Names

["Alice", "Bob", "Chuck"]

After applying this combination, it will appear as shown below:

Index

Item

1

Alice

2

Bob

3

Chuck

To summarize JSON extraction in Mammoth:

Table 10 Extracting JSON

Type

Dictionary

List/Array

As Columns

  1. This creates one column per key chosen to be extracted.

  2. All the values will go into a cell below the column corresponding to that key.

  1. This extracts key-value pairs into individual rows.

  2. Mammoth auto-generates 2 columns, one for the Key and one for the Value respectively.

  3. All keys go into the Key column.

  4. All values go into the Value column.

As Rows

  1. One can choose how many columns should be created.

  2. Each value in the list goes into the corresponding column.

  3. Mammoth auto-generates a column to put these values in.

  1. Mammoth auto-generates 2 columns, one for the Index and one for the Item respectively.

  2. This will create as many rows as there are items in the list.

  3. Even if the list is empty, one row will still be added and the Item value will be empty.

Notes

Safe Extraction

JSON data is often non-uniform. The keys are not mandatory and it is possible that data gets mixed.

The present ways of flattening JSON involve programming. The programmer has to ensure that the code for flattening data does not throw an error if the data does not match with what they had envisoned.

Mammoth simplifies this complexity and provides safety extracting JSON. Mammoth’s algorithms safely handle the error cases and produce empty values instead of producing error conditions.

The concept of safe extraction also applies for various data types. If a value cannot be extracted safely as the given type, it produces an empty value. For example, if you want to extract a key-value pair into a numeric column and the value itself is of type text, then the output will be empty.

Auto Detection and Suggestions

Mammoth looks at some sample values in the JSON column to automatically detect JSON type and make other suggestions. Since the suggestions are made only on the sample data, you should always verify these options before applying the rule.

  1. If the JSON is detected as a dictionary, the default suggestion is extraction as new columns.

  2. If the JSON is detected as a list, the default suggestion is extraction as new rows.

Corrective measures can be applied to the suggestions if the suggestion does not appear to be what you have envisoned. Some examples of such a scenario are:

  1. If the keys are not very consistent in data, some of the keys in data might not be captured by the suggestion algorithm. You can manually add missing keys.

  2. Some keys may be less important than others and you may want to delete such keys from the Key Extraction panel when carrying out this task to reduce complexity of the resulting data.

  3. The choice of extraction as rows or columns is dependent on the data and the desired result. See the following section for more information.

Extract as Rows vs Columns

The choice of extraction as rows or columns is dependent on the data and the desired results.

For example, see the following data:

Items

{"Student": "Alice", "Science": 140, "Language": 138}

{"Student": "Frank", "Science": 122, "Arts": 102}

This can be extracted as new columns to appear as shown below:

Arts

Language

Science

Student

138

140

Alice

102

122

Frank

The same data when extracted as new rows will appear as shown below:

Key

Value

Language

138

Science

140

Student

Alice

Arts

102

Science

122

Student

Bob

Both of these extraction mechanisms have their uses.

  • If you want to know how each of the students performed, the first method is useful.

  • If you want to know the average of marks irrespective of students, second method is useful. You can filter the Student values from Key column and build summary subject-wise.

Date Handling in JSON

JSON does not have a date type but JSON data often contains date values. However, it is up to the programmers as to how they want to represent dates in JSON. Since JSON is very flexible, there is no consistent date format in JSON data.

Mammoth does not allow for date handling in Extract JSON task. You would have to perform date conversion separately using the Convert column type task.

Keeping Source Column

The costs of storage may increase exponentially when this option is turned on. This is especially true when the JSON extraction is creating new rows. However, it may be useful to use this option if the data contains both lists and dictionaries and you want to apply two different extraction techniques on the same column.

See also

Convert column type

How to convert one or more columns into a different type.

Extract Text

How to extract sub-strings from a text column into another column.

Remove Column

How to delete one more columns.