Managing dataflows

A data flow is the movement of data within a system. In Mammoth, you build a data flow whenever you create a View over a Dataset, use a Join rule to fetch data from another View or Dataset, branch out a View as a Dataset, or export data from Mammoth into another database. In this document, each of these Datasets and Views is called a node. Data moves forward from any node that changes, whether the change is in the data itself or in a rule in a View.

Data flow control lets you hold the forward propagation of changes at any node. You can therefore work on one area of a data flow while pausing updates to other connected areas.

To see how this works, look at the following data flow. The green points mark the controls that can hold data at the entry or exit point of any node. These control points are called Data sync.

Fig. 135 How the data flow control works in Mammoth
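The hold-and-release behaviour described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `Node`, `sync_on`, `pending`, and `propagate` are hypothetical names invented for illustration and are not part of Mammoth. The sketch shows a change moving forward from a Dataset and stopping at any node whose sync is turned off.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """A Dataset or View in a toy data flow (hypothetical, not a Mammoth object)."""
    name: str
    sync_on: bool = True      # assumed stand-in for the Data sync setting
    version: int = 0          # last data version this node has applied
    pending: bool = False     # True while an update is held at this node
    downstream: list["Node"] = field(default_factory=list)


def propagate(source: Node) -> None:
    """Push a change at `source` forward, stopping wherever sync is off."""
    source.version += 1
    queue = deque((source, child) for child in source.downstream)
    while queue:
        parent, child = queue.popleft()
        if not child.sync_on:
            child.pending = True      # held here: nodes beyond it are not updated
            continue
        child.version = parent.version
        child.pending = False
        queue.extend((child, grandchild) for grandchild in child.downstream)


# A Dataset feeding two Views; one View also feeds an export.
dataset = Node("orders")
view_a = Node("orders_clean")
view_b = Node("orders_report", sync_on=False)   # sync turned off for this View
export = Node("warehouse_export")
dataset.downstream = [view_a, view_b]
view_a.downstream = [export]

propagate(dataset)
for node in (view_a, view_b, export):
    print(node.name, node.version, "pending" if node.pending else "up to date")
```

In this sketch, `orders_clean` and `warehouse_export` pick up the new version, while `orders_report` keeps its old version and is marked as pending until it is updated.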

Data Sync

Data Sync for a Dataset

You can control the data flow into multiple Views of a Dataset at the same time. To allow or disallow sync of a View from its source, change the Data sync settings in the Data Library. This is how:

When sync is turned off, the data stays in a pending state at the source, and Mammoth shows a warning like this:

Fig. 136 Pending data updates warning for a View

When you update the View with the changes, this warning goes away.

In such cases, the pending data updates also appear in the Dataflow status.

Data Sync for a View

You can also control the data flow for individual Views. The Data sync settings appear at the bottom of the data pipeline.

Fig. 137 Data sync options in Views

Here’s how you can change Data sync settings in a View:

Alternatively, you can disable data flow from all nodes for a View with the Data sync toggle in the navbar menu.

Fig. 138 Data sync toggle in navbar

When a View is out of sync with new data or pipeline updates, the system shows a warning like the following:

Fig. 139 Warning showing inconsistent data in a View

When you update the View with the changes, this warning goes away.

Data Sync for Tasks

Data sync is also available as a toggle for individual tasks in the pipeline, such as Crosstab, Join, Lookup, Branchout, and Exports to databases. Enable or disable the toggle to allow or disallow data flow from the respective Views.

Fig. 140 Data sync toggle for separate tasks

Dataflow Status

The Dataflow status is a global monitor that tracks pending updates across your workspace. It provides a summary of:

  • pending data updates,

  • pending pipeline changes,

  • pipelines in error,

  • active pipelines,

  • queued pipelines, and so on.
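As a rough illustration of what such a summary aggregates, the sketch below tallies Views by status category. It assumes a hypothetical mapping of View names to status labels; the names, labels, and helper functions are invented for illustration and do not correspond to a Mammoth API.

```python
from collections import Counter

# Hypothetical per-View states; the labels mirror the summary categories above.
view_states = {
    "orders_clean": "pending data update",
    "orders_report": "pending pipeline change",
    "sales_by_region": "active",
    "returns": "in error",
    "inventory": "queued",
    "customers": "pending data update",
}


def summarize(states: dict[str, str]) -> Counter:
    """Count how many Views fall into each status category."""
    return Counter(states.values())


def needs_attention(states: dict[str, str]) -> list[str]:
    """Return the sorted names of Views with pending updates or errors."""
    flagged = {"pending data update", "pending pipeline change", "in error"}
    return sorted(name for name, status in states.items() if status in flagged)


summary = summarize(view_states)
for status, count in sorted(summary.items()):
    print(f"{status}: {count}")
print("needs attention:", needs_attention(view_states))
```

Grouping by category gives the at-a-glance counts, while the second helper picks out the Views that are waiting on a manual update or a fix.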

Fig. 141 Data flow status

The Dataflow status keeps you on top of the Views and pipelines that need your attention. You can also use this modal to manage all pending updates from a single window, like this:

Note

Updating data in the Dataflow status modal is a manual action; it does not alter Data sync settings elsewhere in your workspace.
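The distinction in this note can be made concrete with a toy model. The sketch below assumes a hypothetical `ViewState` record with a `sync_on` flag and a counter of held updates; none of these names come from Mammoth. It shows a manual update clearing the held updates while the sync setting stays as it was, so later changes are held again.

```python
from dataclasses import dataclass


@dataclass
class ViewState:
    """Toy record for one View (hypothetical, for illustration only)."""
    name: str
    sync_on: bool
    pending_updates: int = 0


def receive_update(view: ViewState) -> None:
    """New data arrives: applied at once if sync is on, otherwise held."""
    if not view.sync_on:
        view.pending_updates += 1


def apply_pending(view: ViewState) -> int:
    """Manual update: clears held updates and leaves sync_on untouched."""
    applied = view.pending_updates
    view.pending_updates = 0
    return applied


report = ViewState("orders_report", sync_on=False)
receive_update(report)
receive_update(report)

applied = apply_pending(report)   # manual update, as from the status modal
receive_update(report)            # sync is still off, so this one is held again

print(applied, report.sync_on, report.pending_updates)
```

To have later updates flow through automatically in this model, the `sync_on` flag itself would need to be changed, which corresponds to changing the Data sync setting rather than updating manually.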