Managing dataflows
A data flow is the movement of data within a system. In Mammoth, you build a data flow whenever you create a View over a Dataset, use a Join rule to fetch data from another View or Dataset, branch out a View as a Dataset, or export data from Mammoth to another database. In this document, each of these Datasets and Views is called a node. Whenever a node changes, whether through new data or a changed rule in a View, the change moves forward to the nodes that depend on it.
Data flow control lets you hold the forward propagation of changes at any node. You can therefore work on one area of a data flow while updates to other connected areas stay paused.
To see how this works, look at the following data flow. The green points mark the controls that can hold data at the entry or exit point of a node. These control points are called Data sync.
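The hold-and-propagate behaviour described above can be pictured with a small model. This is only an illustrative sketch, not Mammoth's API: the `Node` class, its fields, and the node names are all hypothetical, and the model covers only a hold at a node's entry point.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A Dataset or View in a data flow (hypothetical model, not Mammoth's API)."""
    name: str
    sync_enabled: bool = True   # stands in for the Data sync control at the entry point
    version: int = 0            # number of changes this node has applied
    pending: int = 0            # changes held back while sync is off
    downstream: list = field(default_factory=list)

    def receive(self, changes: int = 1) -> None:
        """Apply incoming changes and pass them forward, or hold them if sync is off."""
        if not self.sync_enabled:
            self.pending += changes
            return
        self.version += changes
        for child in self.downstream:
            child.receive(changes)

    def update_now(self) -> None:
        """Manually apply held changes; the sync setting itself is left unchanged."""
        held, self.pending = self.pending, 0
        if held:
            self.version += held
            for child in self.downstream:
                child.receive(held)


# A three-node flow: dataset -> cleaned_view (sync off) -> report_view
dataset = Node("sales_dataset")
cleaned = Node("cleaned_view", sync_enabled=False)
report = Node("report_view")
dataset.downstream.append(cleaned)
cleaned.downstream.append(report)

dataset.receive()                        # new data arrives at the source
print(cleaned.pending, report.version)   # 1 0  (change is held at cleaned_view)

cleaned.update_now()                     # apply the pending change by hand
print(cleaned.pending, report.version)   # 0 1  (change has moved forward)
```

In this sketch, a node with sync turned off accumulates pending changes and nothing downstream of it moves until those changes are applied.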
Data Sync
Data Sync for a Dataset
You can control the dataflow for multiple Views of a Dataset at the same time. To allow or disallow syncing of a View from its source, change the Data sync settings in the Data Library. This is how:
When sync is turned off, the data stays in a pending state at the source, and Mammoth shows a warning like this:
When you update the View with the changes, this warning goes away.
Such pending data updates also appear in the Dataflow status.
Data Sync for a View
You can also control the dataflow for individual Views. The Data sync settings appear at the bottom of the data pipeline.
Here’s how you can change Data sync settings in a View:
Alternatively, you can disable dataflow from all nodes for a View with the Data sync toggle in the navbar menu.
When a View is out of sync with new data or pipeline updates, the system shows a warning like the following:
When you update the View with the changes, this warning goes away.
Dataflow Status
The Dataflow status is a global monitor that tracks pending updates across your workspace. It provides a summary of:
pending data updates,
pending pipeline changes,
pipelines in error,
active pipelines,
queued pipelines, and so on.
It helps you stay on top of the Views and pipelines that need your attention. You can also use this modal to manage all pending updates from a single window, like this:
Note
Updating data in the Dataflow status modal is a manual action; it does not alter Data sync settings elsewhere in your workspace.
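One way to think about the summary and the note above is as a roll-up over per-View state. The sketch below is a hypothetical model rather than Mammoth's implementation: `ViewState`, `dataflow_status`, `update_all`, the status strings, and the way items are counted are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ViewState:
    """Per-View state used for the roll-up (hypothetical, not Mammoth's data model)."""
    name: str
    sync_enabled: bool
    pending_data_updates: int = 0
    pending_pipeline_changes: int = 0
    pipeline_status: str = "idle"   # assumed values: "idle", "active", "queued", "error"


def dataflow_status(views):
    """Summarise pending work and pipeline states across all Views."""
    return {
        "pending data updates": sum(v.pending_data_updates for v in views),
        "pending pipeline changes": sum(v.pending_pipeline_changes for v in views),
        "pipelines in error": sum(v.pipeline_status == "error" for v in views),
        "active pipelines": sum(v.pipeline_status == "active" for v in views),
        "queued pipelines": sum(v.pipeline_status == "queued" for v in views),
    }


def update_all(views):
    """Manually apply every pending update; sync settings are deliberately untouched."""
    for v in views:
        v.pending_data_updates = 0
        v.pending_pipeline_changes = 0


views = [
    ViewState("cleaned_view", sync_enabled=False, pending_data_updates=2),
    ViewState("report_view", sync_enabled=True, pipeline_status="queued"),
    ViewState("export_view", sync_enabled=False,
              pending_pipeline_changes=1, pipeline_status="error"),
]

before = dataflow_status(views)
update_all(views)
after = dataflow_status(views)
sync_settings = [v.sync_enabled for v in views]
```

Here `update_all` clears the pending counts but never writes to `sync_enabled`, which mirrors the point of the note: a manual update is a one-off action, and Views with sync turned off will hold the next change again.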