A Dataset contains tabular data organized as columns and rows. You can create a Dataset by sourcing data from files, databases, APIs or Webhooks. Read more about the sources of data here.
You can add data to a Dataset or replace its data. Each addition appears as a Batch in the Dataset. How data is added depends on the source of the data.
You see the data of a Dataset in a View. A new View with no rules shows the original Dataset. You can create many Views on a Dataset; each View can transform the data through its rules but never changes the underlying Dataset.
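As a rough mental model (an illustrative assumption, not a description of Mammoth's internals), a View can be thought of as an ordered list of rules applied to a copy of the Dataset's rows. The `View` class and the two rule functions below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class View:
    """Hypothetical model of a View: an ordered list of rules over a Dataset's rows."""
    rules: list = field(default_factory=list)

    def render(self, dataset_rows: list) -> list:
        # Copy every row first so that no rule can modify the Dataset itself.
        rows = [dict(row) for row in dataset_rows]
        for rule in self.rules:
            rows = rule(rows)
        return rows


def keep_large_sales(rows: list) -> list:
    # Example filter rule: keep rows with sales above 100.
    return [row for row in rows if row["sales"] > 100]


def uppercase_region(rows: list) -> list:
    # Example rule that edits values in place (safe, because render() passes in copies).
    for row in rows:
        row["region"] = row["region"].upper()
    return rows


dataset = [{"region": "East", "sales": 120}, {"region": "West", "sales": 80}]
fresh_view = View()                                      # no rules: shows the original data
report_view = View([keep_large_sales, uppercase_region])
```

Here `fresh_view.render(dataset)` returns the original rows, `report_view.render(dataset)` returns `[{"region": "EAST", "sales": 120}]`, and `dataset` itself is unchanged afterwards, so both Views can coexist on the same Dataset.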
Datasets are listed in the Data Library. Clicking a Dataset opens the Preview Panel for that Dataset, where you can see various properties of the Dataset, its Batches, and its Views.
Every column in a Dataset has a well-defined data type: Numeric, Date, or Text. When the source provides data types (for example, databases and APIs), Mammoth uses them. For CSV and Excel files, Mammoth infers the types by analysing the uploaded file. This inference is not always correct, so Mammoth prompts you for input whenever it needs you to confirm or correct a type.
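A simplified sketch of how per-column type inference could work is shown below. The date formats and the order of the checks are assumptions made for illustration; Mammoth's actual inference rules are not described here.

```python
from datetime import datetime

# Assumed date formats for this sketch only.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y")


def is_numeric(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def is_date(value: str) -> bool:
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            pass
    return False


def infer_column_type(values: list) -> str:
    """Return "Numeric", "Date" or "Text" for a column of raw string cells."""
    cells = [v.strip() for v in values if v.strip()]  # ignore blank cells
    if not cells:
        return "Text"
    if all(is_numeric(v) for v in cells):
        return "Numeric"
    if all(is_date(v) for v in cells):
        return "Date"
    return "Text"
```

For example, `infer_column_type(["12", "3.5", ""])` gives `"Numeric"` and `infer_column_type(["2024-01-31", "2024-02-01"])` gives `"Date"`. A cell such as `"03/04/2024"` matches both the day-first and the month-first format, which is the kind of ambiguity where asking the user is safer than guessing.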
Let us visit each of the above ideas in some detail.
You can create Datasets by clicking the ‘+’ icon in the Data Library.
You can add data from the following sources:
You can drag and drop files from your desktop into the Data Library. Alternatively, click the ‘+’ icon and open the “My Desktop” window.
Mammoth supports the following file types: .csv, .tab, .tsv, .txt, .xls, .xlsx, and .zip.
APIs and databases
You can fetch data from various APIs and Databases. Click here to learn more about supported APIs and Databases.
To set up an incoming Webhook, click ‘+’ in the Data Library and open the Webhooks section. Webhooks can receive data as GET parameters, POST parameters, or a JSON POST payload. They can be used with popular services such as GitHub or JIRA.
For more information on Webhooks, click here.
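The three request styles can be illustrated with Python's standard `urllib`. The URL and the record fields are hypothetical placeholders: use the URL Mammoth shows for your Webhook, and check the Webhook documentation for the exact payload it expects. The helpers below only build the requests; nothing is sent until you pass one to `urllib.request.urlopen`.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholder: copy the real URL from the Webhook you create in Mammoth.
WEBHOOK_URL = "https://example.com/webhook/your-webhook-id"

# Hypothetical single record (one row of data).
record = {"ticket": "PROJ-42", "status": "closed", "hours": 3}


def as_get(url: str, data: dict) -> Request:
    # The row is encoded as query-string (GET) parameters.
    return Request(f"{url}?{urlencode(data)}", method="GET")


def as_form_post(url: str, data: dict) -> Request:
    # The row is sent as form-encoded POST parameters.
    return Request(
        url,
        data=urlencode(data).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def as_json_post(url: str, data: dict) -> Request:
    # The row is sent as a JSON document in the POST body.
    return Request(
        url,
        data=json.dumps(data).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To deliver a row: urllib.request.urlopen(as_json_post(WEBHOOK_URL, record))
```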
Public files from URLs
Enter the URL of a file on the web under the “Fetch from URL” option in the ‘+’ menu.
Note that some files on the web cannot be accessed by external software because their owners block automated access to the data. In such cases, download the file and upload it to the Data Library.
To see your Dataset, click Open in the Preview Panel. Mammoth creates the first View of the Dataset by default.
If rules in the Data Pipeline have modified your data, you can see the original Dataset by creating a new View. Alternatively, in a View that has rules, select the Original Dataset option at the top of the Data Pipeline to see the original data.
Adding or replacing data
You can add data to your Dataset or replace its data. How this is configured depends on the source the Dataset was created from:
File upload and public URLs
For these Datasets, you can add or replace data from the Preview Panel. When you add data, Mammoth infers the column types of the new data and checks whether they exactly match the types in the existing Dataset. If they match, the new data is merged into the Dataset. Otherwise, the new data appears as a separate Dataset in the Data Library. You can then use the Combine Dataset process to merge the two Datasets, which lets you map the column types of the original Dataset to those of the newly uploaded one.
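The exact-match check can be sketched as a comparison of two column-to-type mappings. Matching columns by name is an assumption of this sketch, and the function names are hypothetical. Each reported mismatch corresponds to a column you would need to map when combining the Datasets.

```python
def type_mismatches(existing: dict, incoming: dict) -> dict:
    """Map each differing column to (existing_type, incoming_type); None means the column is absent."""
    columns = sorted(existing.keys() | incoming.keys())
    return {
        col: (existing.get(col), incoming.get(col))
        for col in columns
        if existing.get(col) != incoming.get(col)
    }


def add_data(existing: dict, incoming: dict) -> tuple:
    """Return ("merged", {}) on an exact type match, else ("new_dataset", mismatches)."""
    mismatches = type_mismatches(existing, incoming)
    if mismatches:
        return "new_dataset", mismatches
    return "merged", {}
```

For instance, if the dates in a new file were inferred as Text, `add_data({"order_date": "Date"}, {"order_date": "Text"})` returns `("new_dataset", {"order_date": ("Date", "Text")})`.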
You can choose to combine new data with older data, or to replace the older data, while creating a new connection.
Mammoth currently does not allow you to change the combine or replace option after the Dataset is created.
Branch out to Dataset
When you work in a View, you may eventually want to save the View's data into a Dataset. You do this with Branch Out to Dataset. Whether the View's data is combined with or replaces the existing data is configured in the Branch Out to Dataset interface. Read more about it here.
Webhooks typically deliver one row of data at a time, whereas Mammoth works with data in Batches. Mammoth therefore accumulates incoming Webhook data and turns it into a Batch. A Batch is created automatically every hour, provided some data arrived during that hour. Whenever the Webhook receives data, a Refresh button appears in the Preview Panel; click Refresh to create a Batch manually.
You can add or replace data from the Webhook menu in the Preview Panel.
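The hourly batching described above can be modelled as a buffer that collects rows and closes a Batch either when its hour ends or when you click Refresh. This is a hypothetical sketch: the `WebhookBuffer` class and its methods are not part of Mammoth, and the sketch assumes hours are aligned to the clock.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class WebhookBuffer:
    """Hypothetical model of hourly Webhook batching; not Mammoth's actual code."""
    pending: list = field(default_factory=list)   # rows received but not yet batched
    batches: list = field(default_factory=list)   # each Batch is a list of rows
    window_start: Optional[datetime] = None        # start of the hour the pending rows belong to

    @property
    def refresh_available(self) -> bool:
        # Mirrors the Refresh button: it appears only while unbatched rows exist.
        return bool(self.pending)

    def refresh(self) -> None:
        """Manual Refresh: turn all pending rows into a new Batch."""
        if self.pending:
            self.batches.append(self.pending)
            self.pending = []
            self.window_start = None

    def tick(self, now: datetime) -> None:
        """Called periodically; closes the Batch once its hour has ended (no empty Batches)."""
        if self.pending and now >= self.window_start + timedelta(hours=1):
            self.refresh()

    def receive(self, row: dict, at: datetime) -> None:
        """A Webhook call delivering a single row at time `at`."""
        self.tick(at)  # close the previous hour's Batch first, if it is due
        if not self.pending:
            self.window_start = at.replace(minute=0, second=0, microsecond=0)
        self.pending.append(row)
```

Rows received at 09:15 and 09:40 stay pending (and Refresh is available) until a `tick` at or after 10:00 turns them into one Batch; calling `refresh()` earlier creates the Batch immediately.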
Combine with another Dataset
Combine with another Dataset lets you combine two Datasets whose columns or column types do not match. The result can be saved as a new Dataset or into one of the existing Datasets.
A Dataset is made up of Batches of data. A Batch is created each time new data is added to a Dataset. You can perform the following actions on any Batch:
Previewing a Batch
To preview the data in a Batch, select the Batch in the Batch Table and click Preview batches. The data of that Batch appears in the Preview section.
Adding columns of the Batch Table to the Dataset
To analyze your data alongside the information in the Batch Table, add Batch Table columns to the Dataset with the “Add Batch info to Dataset” option. To remove these Batch columns, uncheck them under the “Add Batch info to Dataset” option and click Apply.
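Conceptually, adding Batch info copies the selected Batch Table columns onto every row of that Batch, so you can group or filter by them. The Batch structure, the `source` field, and the `batch_` column prefix below are assumptions made for this sketch.

```python
from collections import defaultdict


def add_batch_info(batches: list, columns: list) -> list:
    """Flatten Batches into Dataset rows, copying the chosen Batch Table columns onto each row."""
    rows = []
    for batch in batches:
        info = {f"batch_{col}": batch[col] for col in columns}  # hypothetical naming scheme
        for row in batch["rows"]:
            rows.append({**row, **info})
    return rows


def total_by(rows: list, key: str, value: str) -> dict:
    """Sum a numeric column per value of `key`, e.g. per Batch source."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


batches = [
    {"id": 1, "source": "sales_jan.csv", "rows": [{"amount": 10}]},
    {"id": 2, "source": "sales_feb.csv", "rows": [{"amount": 20}, {"amount": 5}]},
]
```

With these sample Batches, `total_by(add_batch_info(batches, ["source"]), "batch_source", "amount")` returns `{"sales_jan.csv": 10.0, "sales_feb.csv": 25.0}`. Calling `add_batch_info(batches, [])` returns the rows without any Batch columns, which mirrors unchecking them.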
To find out where a Batch came from, check the source column in the Batch Table.
You can delete one or more Batches of data. Select the Batches you want to delete in the Batch Table and click the delete batches option.
Suspending/unsuspending a Batch
You can suspend one or more Batches to stop their data from being merged into the Dataset and its Views.
To suspend a Batch, select it and click the “Suspend/Unsuspend” option at the top. Once the Batch is suspended, its information is greyed out and its State changes to “Suspended”.
To include data from a suspended Batch again, select the suspended Batch (or Batches) and click the “Suspend/Unsuspend” option again. This lifts the suspension, and the data is reflected in the Dataset and its Views once more.
Make sure Auto Sync is on so that these changes are reflected in the Views automatically.
When a Batch is suspended or unsuspended, the relevant Dataset properties update accordingly. For instance, a “Suspended rows” property appears when a Batch is suspended, showing the number of suspended rows in the Dataset.
Similarly, a pending sync count shows the number of rows that have not yet been synced to the Views; it drops to zero once the data is synced.
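How these counts relate can be sketched with a small model in which each Batch records whether it is suspended and whether the Views already reflect it. This is an illustration under assumptions: the class, function, and property names are hypothetical, and the model counts only rows that still have to be added to the Views.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    rows: int                 # number of rows in the Batch
    suspended: bool = False   # toggled by "Suspend/Unsuspend"
    synced: bool = False      # whether the Views already reflect this Batch


def toggle_suspend(batch: Batch) -> None:
    batch.suspended = not batch.suspended
    batch.synced = False  # the Views must be updated after any change of state


def sync(batches: list) -> None:
    """Auto Sync (or a manual sync) pushes every non-suspended Batch to the Views."""
    for batch in batches:
        if not batch.suspended:
            batch.synced = True


def dataset_properties(batches: list) -> dict:
    active = [b for b in batches if not b.suspended]
    return {
        "rows": sum(b.rows for b in active),
        "suspended_rows": sum(b.rows for b in batches if b.suspended),
        "pending_sync_rows": sum(b.rows for b in active if not b.synced),
    }
```

Suspending a 50-row Batch moves its rows from `rows` to `suspended_rows`; unsuspending it makes those 50 rows pending until the next sync brings the count back to zero.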
Synchronizing Views with data
When a Batch is added to or replaced in a Dataset, the Pipeline of each View is run so that the Views are synchronized with the new data. While a View is being synchronized, you cannot perform any activity in it. Data from a Dataset can be synced to its Views in the following ways:
Automatic sync is the default option for data sync. This can be changed from the Preview Panel.
When new data is added to a Dataset, you have a 30-second window in which to turn off automatic sync of the new data into the Views. Pausing the sync this way switches the Dataset to manual sync mode.
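The 30-second window can be modelled as below. The 30-second value comes from the text above; the assumption that automatic sync starts only after the window closes, and every name in the sketch, are illustrative rather than Mammoth's documented behavior. Times are plain seconds to keep the example deterministic.

```python
from dataclasses import dataclass
from typing import Optional

GRACE_SECONDS = 30  # window for pausing automatic sync, per the documentation


@dataclass
class SyncController:
    """Hypothetical model of automatic vs. manual sync for one Dataset."""
    auto_sync: bool = True
    added_at: Optional[float] = None  # when unsynced data arrived (None: nothing pending)
    synced_runs: int = 0              # how many times the View Pipelines have run

    def data_added(self, now: float) -> None:
        self.added_at = now

    def pause(self, now: float) -> bool:
        """Pause within the window; returns True if the Dataset switched to manual sync."""
        if self.added_at is not None and now - self.added_at < GRACE_SECONDS:
            self.auto_sync = False
            return True
        return False

    def tick(self, now: float) -> None:
        """In automatic mode, run the View Pipelines once the window has elapsed."""
        if self.auto_sync and self.added_at is not None and now - self.added_at >= GRACE_SECONDS:
            self.synced_runs += 1
            self.added_at = None

    def sync_now(self) -> None:
        """Manual sync: run the View Pipelines for pending data on demand."""
        if self.added_at is not None:
            self.synced_runs += 1
            self.added_at = None
```

Pausing 12 seconds after new data arrives switches the controller to manual mode, so later `tick` calls do nothing until `sync_now()` is called; an attempt to pause after 45 seconds is refused and automatic sync proceeds.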