Amazon Redshift is a data warehouse product that forms part of the larger cloud-computing platform Amazon Web Services. It is built on technology from the massively parallel processing (MPP) data warehouse company ParAccel and is designed to handle large-scale data sets and database migrations. Redshift uses parallel processing and compression to decrease command execution time, which allows it to perform operations on billions of rows at once.
Connecting to the Database
Mammoth lets you connect to your Redshift database and import its data.
Select API & Databases from the add menu and click on AWS Redshift.
Create a new connection and enter your database credentials: Host URL, Port, Username, and Password.
Once the connection is established, you will be presented with a list of tables and views in that database.
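For readers who want to check their credentials outside Mammoth, the following is a minimal sketch of what listing tables and views looks like from a client. It is not Mammoth's implementation. The names `RedshiftCredentials` and `list_tables_and_views` are hypothetical, the function assumes you pass in a DB-API 2.0 connection opened with a PostgreSQL-compatible driver of your choice, and the query assumes the standard `information_schema.tables` view is visible to your user. The default port of 5439 is Redshift's usual default; your cluster may use a different one.

```python
from dataclasses import dataclass


@dataclass
class RedshiftCredentials:
    """The same four fields the connection form asks for (hypothetical helper)."""
    host: str
    username: str
    password: str
    port: int = 5439  # Redshift's usual default port; change it if your cluster differs


# Assumption: the connecting user can read information_schema.tables.
LIST_TABLES_AND_VIEWS = """
SELECT table_schema, table_name, table_type
FROM information_schema.tables
WHERE table_schema NOT IN ('information_schema', 'pg_catalog')
ORDER BY table_schema, table_name
"""


def list_tables_and_views(connection):
    """Return (schema, name, type) rows from any DB-API 2.0 connection."""
    cursor = connection.cursor()
    try:
        cursor.execute(LIST_TABLES_AND_VIEWS)
        return cursor.fetchall()
    finally:
        # Always release the cursor, even if the query fails.
        cursor.close()
```

If this query returns the tables you expect, the same credentials should work in the connection form.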
After you have selected the table you want to work on, you get options to schedule data imports, as discussed in the next section.
Scheduling your Data Pulls
You can start retrieving the data immediately or at a specific time of your choice. You can also schedule the data pull so that Mammoth fetches the latest data from your database at a set interval: just once, daily, weekly, or monthly.
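To make the four interval options concrete, here is a small sketch of how the next pull time could be derived from the previous one. The function name `next_pull` and the month-end rule (clamping to the last day of a shorter month) are assumptions made for illustration; Mammoth's actual scheduling rules may differ.

```python
import calendar
from datetime import datetime, timedelta


def next_pull(last_pull, interval):
    """Return the next scheduled pull time, or None for a one-off pull."""
    if interval == "once":
        return None  # a one-off pull is never repeated
    if interval == "daily":
        return last_pull + timedelta(days=1)
    if interval == "weekly":
        return last_pull + timedelta(weeks=1)
    if interval == "monthly":
        # Move to the next calendar month, rolling over the year after December.
        year = last_pull.year + last_pull.month // 12
        month = last_pull.month % 12 + 1
        # Assumed rule: clamp the day so that Jan 31 maps to the last day of Feb.
        day = min(last_pull.day, calendar.monthrange(year, month)[1])
        return last_pull.replace(year=year, month=month, day=day)
    raise ValueError(f"unknown interval: {interval!r}")
```

For example, `next_pull(datetime(2023, 1, 31, 9, 0), "monthly")` yields 28 February 2023 at 09:00 under the clamping assumption.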
On every data pull from your database, you can either replace the older data or combine the new data with it.
If you choose the Combine with older data option, you are asked to select a unique sequence column. On each refresh, Mammoth uses this column to pick up only the rows whose value in it is greater than the values from the previous data pull.
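The difference between the two options can be sketched with plain Python lists standing in for tables. This illustrates the behaviour described above and is not Mammoth's code; the function names and the `id` column in the usage note are hypothetical.

```python
def replace_older_data(existing_rows, source_rows):
    """Replace mode: discard what was imported before and keep the fresh pull."""
    return list(source_rows)


def combine_with_older_data(existing_rows, source_rows, sequence_column):
    """Combine mode: append only rows beyond the highest value seen so far."""
    # Highest sequence value already imported; None if nothing was imported yet.
    high_water = max((row[sequence_column] for row in existing_rows), default=None)
    new_rows = [
        row for row in source_rows
        if high_water is None or row[sequence_column] > high_water
    ]
    return list(existing_rows) + new_rows
```

With `existing_rows = [{"id": 1}, {"id": 2}]` and `source_rows = [{"id": 1}, {"id": 2}, {"id": 3}]`, combining on `id` appends only the row with `id` 3. A consequence of this rule is that a source row edited in place without a larger sequence value is not picked up on refresh, so the chosen column should increase with every new row.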
Make sure that Mammoth’s public IP address is added to your database’s whitelist so that the connection is not blocked.
Mammoth’s public IP is displayed in the create connection window.