
Dataset

A dataset contains tabular data organized as columns and rows. You can create a dataset by sourcing data from files, databases, APIs or Webhooks. Read more about the sources of data.

You can add or replace data in a dataset. Each addition appears as a Batch in the dataset. The mechanism for adding data depends on the source of the data.

You see the data of a dataset in a View. A fresh View with no rules shows you the original dataset. You can create many Views on a dataset. Each View transforms the dataset but never changes the dataset itself.
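The relationship between a dataset and its Views can be pictured with a small sketch. This is only an illustration of the idea, not how Mammoth is implemented; the `View` class, its `add_rule` method and the sample rows are all hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Rule = Callable[[List[Row]], List[Row]]


class View:
    """Hypothetical model of a View: an ordered list of rules over a dataset."""

    def __init__(self, dataset: List[Row]) -> None:
        self._dataset = dataset          # shared with other Views, never modified
        self.rules: List[Rule] = []

    def add_rule(self, rule: Rule) -> None:
        self.rules.append(rule)

    def rows(self) -> List[Row]:
        # Work on a copy so that rules cannot alter the underlying dataset.
        result = [dict(row) for row in self._dataset]
        for rule in self.rules:
            result = rule(result)
        return result


dataset = [{"city": "Pune", "sales": 120}, {"city": "Oslo", "sales": 80}]

fresh = View(dataset)                    # no rules: shows the original dataset
filtered = View(dataset)                 # a second View on the same dataset
filtered.add_rule(lambda rows: [r for r in rows if r["sales"] > 100])
```

Here `fresh.rows()` returns the original two rows, `filtered.rows()` returns only the Pune row, and `dataset` itself still holds both rows.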

Datasets are listed in Data Library under their respective projects. Clicking on a dataset opens the Preview Panel for that dataset where you can see various properties related to the dataset, such as its Batches and Views.

note

Every column in a dataset has a well-defined data type: Numeric, Date, or Text. Mammoth respects the data types when they are available from the source (databases, APIs, etc.). For CSV and Excel files, Mammoth infers the types by analysing the uploaded file. This inference may occasionally be incorrect, in which case Mammoth prompts you for input so that the types can be corrected.
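To give a feel for why inference from a file can go wrong, here is a minimal sketch of a type-inference pass over one column of text values. It is not Mammoth's actual algorithm; the function names and the two accepted date formats are assumptions made for illustration.

```python
from datetime import datetime
from typing import List

# Assumed date formats, chosen only for this illustration.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def _is_numeric(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def _is_date(value: str) -> bool:
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def infer_column_type(values: List[str]) -> str:
    """Return "Numeric", "Date" or "Text" for a column of raw strings."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "Text"                    # nothing to go on, fall back to Text
    if all(_is_numeric(v) for v in non_empty):
        return "Numeric"
    if all(_is_date(v) for v in non_empty):
        return "Date"
    return "Text"
```

A column of postal codes such as `["02134", "10001"]` would come out as Numeric under this sketch even though it is better treated as Text, which is the kind of case where user feedback is needed.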

Let us visit each of the above ideas in some detail.

Adding datasets

You can create datasets in any project with the "Add data" button.

Mammoth supports data imports from the following sources:

Desktop

You can drag and drop files from your desktop into the Data Library. Alternatively, click on the 'Add data' icon and open the "My Desktop" window.

Mammoth supports the following file types: .csv, .tab, .tsv, .txt, .xls, .xlsx, .zip. Mammoth also supports password-protected .xlsx files.

APIs and databases

You can fetch data from various APIs and Databases. Learn more about supported APIs and Databases here.

Webhooks

You can set up an incoming webhook from the Webhooks section, which appears after you click on 'Add data' in the Data Library. Webhooks can receive data as GET parameters, POST parameters, or a JSON POST body. They can be used with popular services such as GitHub or JIRA.

For more information on Webhooks, click here.
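As a rough sketch of what a sender does, the snippet below builds a JSON POST request with Python's standard library. The URL is a placeholder (the real one comes from the Webhooks section of 'Add data'), and the payload fields are invented for illustration; the request is built but deliberately not sent.

```python
import json
import urllib.request

# Placeholder only: replace with the URL shown when you set up the webhook.
WEBHOOK_URL = "https://example.com/your-webhook-url"


def build_webhook_request(url: str, row: dict) -> urllib.request.Request:
    """Build a POST request that carries one row of data as a JSON body."""
    body = json.dumps(row).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical payload with made-up field names.
request = build_webhook_request(WEBHOOK_URL, {"issue": "DOC-12", "status": "open"})

# To actually send it, you would call: urllib.request.urlopen(request)
```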

Public files from URLs

You can ingest data from public files directly using their URLs. Enter the URL of the file under the "Fetch from URL" option in the 'Add data' menu.

Note that some files on the web cannot be fetched by external software because their owners block automated access. In such cases, you can download the file yourself and upload it to the Data Library.

Viewing dataset

You can see your dataset by clicking on Open in the Preview Panel. Mammoth creates a first View of the dataset by default.

Click on open to view dataset

If the View has been modified with rules, you can still see the original data in the dataset with the Original dataset option present at the top of the Data Pipeline or by simply creating a new View.

Adding or replacing data

You can add data to your dataset or replace its existing data. How this is configured depends on the source from which the dataset was created:

File upload and public URLs

For these datasets, you can add or replace data from the Preview Panel. When you add more data, Mammoth infers the column types of the new data and attempts an exact match with the types in the existing dataset. If the types match, the data is merged into the dataset. Otherwise, the new data appears as a separate dataset in the Data Library. You can then use the combine dataset process to merge the two datasets, which lets you map the types of the original and the uploaded data.

Add data to dataset
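The exact-match check described above can be sketched as a comparison of two column-to-type mappings. This is an illustrative assumption about what "exact match" involves (here, both column names and types must agree); the function names are hypothetical.

```python
from typing import Dict, List

Schema = Dict[str, str]  # column name -> "Numeric", "Date" or "Text"


def schema_mismatches(existing: Schema, incoming: Schema) -> List[str]:
    """List every column whose presence or type differs between the schemas."""
    problems: List[str] = []
    for column in sorted(set(existing) | set(incoming)):
        old, new = existing.get(column), incoming.get(column)
        if old is None:
            problems.append(f"{column}: not in existing dataset")
        elif new is None:
            problems.append(f"{column}: missing from new data")
        elif old != new:
            problems.append(f"{column}: {old} != {new}")
    return problems


def can_merge(existing: Schema, incoming: Schema) -> bool:
    # Merge only on an exact match; otherwise the upload becomes a new dataset.
    return not schema_mismatches(existing, incoming)


existing = {"order_id": "Numeric", "placed_on": "Date", "customer": "Text"}
same = {"order_id": "Numeric", "placed_on": "Date", "customer": "Text"}
drifted = {"order_id": "Text", "placed_on": "Date", "customer": "Text"}
```

With these sample schemas, `can_merge(existing, same)` is true, while `drifted` fails on `order_id` and would need the combine dataset process to map the types.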

Third-party connections

You can combine with or replace older data while creating a new connection.

Choose the option to replace or combine.

Mammoth currently does not allow for changing the combine or replace options once the dataset is created.

Branch out to dataset

When you work in a View, you may want to save the View's data as a dataset. Use the Branch out to dataset option to do this. The choice between combining with the existing data and replacing it is configured within the Branch out to dataset interface.

Webhooks

Webhooks typically provide one row of data at a time. Mammoth, however, handles data in Batches, so it accumulates the incoming Webhook data and groups it into a Batch. A Batch is created automatically every hour, provided data arrived during that hour. Whenever the Webhook receives data, a Refresh button appears on the Preview Panel. You can also create a Batch manually by clicking on Refresh.

Click on Refresh to add a new Batch manually
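The accumulate-then-batch behaviour can be modelled with a small buffer. This is a simplified sketch, not Mammoth's implementation: the `WebhookBuffer` class is hypothetical, and `make_batch` stands in for both the hourly job and a manual click on Refresh.

```python
from typing import Dict, List

Row = Dict[str, object]


class WebhookBuffer:
    """Hypothetical buffer that turns single webhook rows into Batches."""

    def __init__(self) -> None:
        self.pending: List[Row] = []         # rows received since the last Batch
        self.batches: List[List[Row]] = []   # Batches created so far

    def receive(self, row: Row) -> None:
        self.pending.append(row)

    @property
    def refresh_available(self) -> bool:
        # The Refresh button is shown only while unbatched data exists.
        return bool(self.pending)

    def make_batch(self) -> bool:
        """Run hourly or on Refresh; creates a Batch only if data is pending."""
        if not self.pending:
            return False
        self.batches.append(self.pending)
        self.pending = []
        return True


buffer = WebhookBuffer()
buffer.receive({"event": "push"})
buffer.receive({"event": "issue_opened"})
created = buffer.make_batch()       # two rows become one Batch
skipped = buffer.make_batch()       # nothing pending, so no empty Batch
```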

You can add or replace data from the Webhook menu in the Preview Panel.

Click on edit on dataset mode to replace or combine data

Combine with another dataset

The Combine with another dataset option lets you combine two datasets that do not share the same structure. The result can be saved as a new dataset or into one of the existing datasets.

Combine with another dataset

Understanding Batches

A dataset is made of Batches of data. A Batch is created when new data is added into a dataset. The following actions can be performed on any Batch of the data:

Previewing a Batch

You can preview the data in a Batch. Select the Batch you want to see in the Batch Table and click on Preview batches to see its data under the Preview section.

Preview Batch

Adding columns of the Batch Table into dataset

If you want to analyze your data in the context of the information in the Batch Table, you can add columns of the Batch Table to the dataset with the "Add Batch info to dataset" option. To remove the Batch columns, uncheck them under the "Add Batch info to dataset" option and click Apply.

Click on "Add Batch info to dataset" and check the desired columns.

Viewing source

You can see the origin of a Batch in the source column of the Batch Table.

Source column in the Batch Table

Deleting Batch

You can delete one or more Batches of data. Select the Batches you want to delete in the Batch Table and click on the Delete batches option.

Deleting a Batch

Suspending/unsuspending a Batch

You can suspend one or more batches to stop their data from being merged into the dataset and its corresponding Views.

To suspend a batch, select the batch and click on the “Suspend/Unsuspend” option at the top. On successful batch suspension, the batch information greys out and the State changes to “Suspended”.

Figure showing the greyed-out Suspended batch. At the top are buttons to Suspend/Unsuspend, Delete and Preview selected batches

If you wish to include data from a suspended batch, select the suspended batch (or batches) and click on the “Suspend/Unsuspend” option again. This lifts the suspension, and the data is reflected in the dataset and its corresponding Views again.

note

Make sure Auto Sync is on for the changes to automatically reflect in the Views.

When a batch is suspended or unsuspended, the relevant dataset properties update accordingly. For instance, you'll notice an additional "Suspended rows" property appear when a batch is suspended. It shows the number of rows suspended in the dataset.

dataset properties showing number of suspended rows
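One way to picture suspension is as a flag on each batch that excludes its rows from the dataset without deleting them. The sketch below is an illustrative model only; the `Batch` and `Dataset` classes and their method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Row = Dict[str, object]


@dataclass
class Batch:
    rows: List[Row]
    suspended: bool = False


@dataclass
class Dataset:
    batches: List[Batch] = field(default_factory=list)

    def rows(self) -> List[Row]:
        # Only rows from batches that are not suspended reach the dataset.
        return [row for b in self.batches if not b.suspended for row in b.rows]

    def suspended_rows(self) -> int:
        # Mirrors the "Suspended rows" property described above.
        return sum(len(b.rows) for b in self.batches if b.suspended)

    def toggle_suspend(self, index: int) -> None:
        # Same action suspends an active batch and unsuspends a suspended one.
        batch = self.batches[index]
        batch.suspended = not batch.suspended


ds = Dataset([Batch([{"id": 1}, {"id": 2}]), Batch([{"id": 3}])])
ds.toggle_suspend(0)                 # suspend the first batch
visible_while_suspended = ds.rows()
suspended_count = ds.suspended_rows()
ds.toggle_suspend(0)                 # unsuspend it again
```

While the first batch is suspended, only the row from the second batch is visible and the suspended row count is 2; after the second toggle all three rows are back.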

Similarly, you'll see information regarding pending syncs. This shows the number of rows that are yet to be synced. The number becomes zero when the data is synced.

Image showing zero pending syncs

Synchronizing Views with data

You can choose which Views are synchronized with data updates from the source. Mammoth provides the Data Sync feature to control the flow of data into individual Views of a dataset.

You can read more about the Dataflow Control here.