Extract JSON
The Extract JSON task allows for safe conversion of data from JSON format to a table format. This task comes with an auto detection which will pre-populate the user interface with suggestions.
You can use this function to extract data from arbitrary JSON without programming.
Quickstart
Let us start with the following sample data:
Items |
---|
{"Student": "Alice", "Science": 140, "Language": 138} |
{"Student": "Frank", "Science": 122, "Arts": 102} |
There are two rows in this data. Each row contains a JSON. If you want to convert this JSON into a rows vs column format, we can do the following:
- Go to Transform > Reshape, Group & Aggregate.
- Select the Extract JSON function.
- Select the column Items.
Mammoth will automatically make the following suggestions:
- The JSON is an dictionary (object).
- Each Key in the dictionary will be extracted as a column.
- Keys to be extracted are Arts, Language, Science, Student.
Finally, click APPLY. The result will appear as displayed below:
Arts | Language | Science | Student |
---|---|---|---|
138 | > 140 | Alice | |
> 102 | > 122 | Frank |
When to use
JSON flattening might be necessary in the following situations.
- If you are using Webhooks to bring data into Mammoth.
- If you have JSON files
- If you have an API response from a web service.
- One or more of the columns in your database has JSON data.
Supported Options
The following options are supported in this Task:
- Source Column: The column from which JSON values are to be extracted.
- JSON type: The type of JSON object. You can choose between two types - Dictionary and Array/List.
- Extract as: Extracts data either as new columns or new rows.
- Extract the following keys: This panel allows for the list of keys to be extracted from the JSON dictionary. Each key will become a new column.
- Column type: The type of the new column. If the data cannot be extracted as the option chosen here, empty values will be produced. Numeric and Text data are supported. For date type see Date Handling in JSON.
- Number of columns: The number of new columns to create. This option appears when the JSON type is list and it is being extracted as new columns.
- Keep source column: If the source column is to be kept after extraction. This option is turned off by default. See Keeping source column.
Extracting different JSON types
JSON can be either a list or a dictionary. JSON can also be extracted as new columns or rows. The following section describes how the combination of these options work.
1. Extracting Dictionary
As Columns
When extracting JSON dictionaries as new columns,
- One column gets created per key.
- All the values go into a cell below the column corresponding to that key.
- If a key does not have a value for a certain row, the corresponding cell remains empty.
This is the default combination of options when the JSON is detected by Mammoth as a dictionary.
To see an example of this see QuickStart JSON.
As Rows
When extracting JSON dictionaries as new rows,
- Each key-value pair is extracted as a new row.
- Two columns will get automatically created, one for the keys and one for the values.
- If the data does not contain any key-value pairs, corresponding cells are left empty.
For example, if the data appears like this:
Names |
---|
{"first": "Alice", "second": "Bob", "third": "Claire"} |
After applying this combination, it will appear as shown below:
Key | Value |
---|---|
first | Alice |
second | Bob |
third | Claire |
2. Extracting List
As Columns
When extracting JSON lists as new columns,
- The number of columns to be extracted can be configured.
- The first value in the list goes into the first column, second value goes into the second column and so on.
- If the list does not contain as many columns as you have chosen, the corresponding cells are left empty.
For example, if the data appears like this:
Names |
---|
["Alice", "Bob", "Chuck"] |
After applying this combination, it will appear as shown below:
Item 1 | Item 2 | Item 3 |
---|---|---|
Alice | Bob | Chuck |
As Rows
When extracting JSON lists as new rows,
- Mammoth creates two columns automatically when this option is selected. One for index of the item in the array and one for the item itself.
- If the array is empty, the corresponding cell will be empty.
This is the default combination of options when the JSON is detected by Mammoth as a list.
For example, if the data appears like this:
Names |
---|
["Alice", "Bob", "Chuck"] |
After applying this combination, it will appear as shown below:
Index | Item |
---|---|
1 | Alice |
2 | Bob |
3 | Chuck |
To summarize JSON extraction in Mammoth:
Type | Dictionary | List/Array |
---|---|---|
As Columns |
|
|
As Rows |
|
|
Notes
Safe Extraction
JSON data is often non-uniform. The keys are not mandatory and it is possible that data gets mixed.
The present ways of flattening JSON involve programming. The programmer has to ensure that the code for flattening data does not throw an error if the data does not match with what they had envisoned.
Mammoth simplifies this complexity and provides safety extracting JSON. Mammoth's algorithms safely handle the error cases and produce empty values instead of producing error conditions.
The concept of safe extraction also applies for various data types. If a value cannot be extracted safely as the given type, it produces an empty value. For example, if you want to extract a key-value pair into a numeric column and the value itself is of type text, then the output will be empty.
Auto Detection and Suggestions
Mammoth looks at some sample values in the JSON column to automatically detect JSON type and make other suggestions. Since the suggestions are made only on the sample data, you should always verify these options before applying the rule.
- If the JSON is detected as a dictionary, the default suggestion is extraction as new Columns.
- If the JSON is detected as a list, the default suggestion is extraction as new rows.
Corrective measures can be applied to the suggestions if the suggestion does not appear to be what you have envisoned. Some examples of such a scenario are:
- If the keys are not very consistent in data, some of the keys in data might not be captured by the suggestion algorithm. You can manually add missing keys.
- Some keys may be less important than others and you may want to delete such keys from the Key Extraction panel when carrying out this task to reduce complexity of the resulting data.
- The choice of extraction as rows or columns is dependent on the data and the desired result. See the following section for more information.
Extract as Rows vs Columns
The choice of extraction as rows or columns is dependent on the data and the desired results.
For example, see the following data:
Items |
---|
{"Student": "Alice", "Science": 140, "Language": 138} |
{"Student": "Frank", "Science": 122, "Arts": 102} |
This can be extracted as new columns to appear as shown below:
Arts | Language | Science | Student |
---|---|---|---|
138 | > 140 | Alice | |
> 102 | > 122 | Frank |
The same data when extracted as new rows will appear as shown below:
Key | Value |
---|---|
Language | 138 |
Science | 140 |
Student | Alice |
Arts | 102 |
Science | 122 |
Student | Bob |
Both of these extraction mechanisms have their uses.
- If you want to know how each of the students performed, the first method is useful.
- If you want to know the average of marks irrespective of students, second method is useful. You can filter the Student values from Key column and build summary subject-wise.
Date Handling in JSON
JSON does not have a date type but JSON data often contains date values. However, it is up to the programmers as to how they want to represent dates in JSON. Since JSON is very flexible, there is no consistent date format in JSON data.
Mammoth does not allow for date handling in Json Extract Task. You would have to perform date conversion separately using the Convert Columns task.
Keeping Source Column
The costs of storage may increase exponentially when this option is turned on. This is especially true when the JSON extraction is creating new rows. However, it may be useful to use this option if the data contains both lists and dictionaries and you want to apply two different extraction techniques on the same column.
- Convert Columns: How to convert one or more columns into a different type.
- Extract Text: How to extract sub-strings from a text column into another column.
- Remove Column Task: How to delete one more columns.