Skip to main content

How to use Concatenate Values to solve common data problems?

In Data Analytics, you are often working to find insights like - the top 10 revenue-generators, top performers in an organization, most popular electives in a college department, varieties available for a product, and so on.

Even though it’s a typical data problem, the solution may require many steps before we get to the result. But, it does not have to be this tough.

Today we are looking at an alternative way to solve these data problems using Mammoth. I’ll show you how you can achieve the results faster with a combination of efficient functions in Mammoth.

Let's look at a dataset to better understand what's going on. Suppose you have the download following dataset and you’re tasked with finding the 3 bestselling models within each brand:

Sample Data

That is, you need results like this:

Required result

The challenge:

There are three sub-tasks we are dealing with here:

  1. To know the total sales each of these brands, and by extension, each model is making
  2. To know the best-selling models among each lot
  3. To have the final top 3 models within each brand as a comma-separated list

The first part is pretty straightforward. You can find the total sales value for the entire data using a simple mathematical function.

Solving the second will require the use of grouping combined with mathematical functions.

The third bit is the trickiest. If you’ve ever tried following this in Excel , you’ll see how tricky it can be. We are essentially trying to achieve this:

Understanding the Concatenation Values function

Our goal:

Our goal will be to use Mammoth to solve the above data problem, and to take you through the detailed steps involved in each of the above-mentioned tasks.

First, let’s breakdown the requirement into simple to-dos:

  1. To calculate the total sales values for each row
  2. To rank the models in each brand as per their sales value
  3. To generate a list of top-selling models in each brand

Step 1: Calculating total sales value

First, we’ll calculate the total sales value (say ‘Revenue’) using one of the math functions in Mammoth.

  1. Navigate to the dataset
  2. Go to Data Preparation > Numeric Function > Perform a math function
  3. Write a mathematical function to calculate the total sales value. Here we are multiplying the columns - price each and quantity ordered, and getting the results into a new column ‘Revenue’. Click Apply:

Step 2: Ranking models according to their sales value

Next, we’ll try to rank the data as per the highest revenue-generating models in each brand. Now, this wouldn’t be a hassle at all if the clause ‘within each brand’ wasn’t there. We could have simply sorted the data by Revenue (descending). But since we have an extra condition here, we can’t do that. Instead, we’ll use another mathematical function here - the window function. This will help us apply grouping before computing rank.

To do this,

  1. Go to Data Preparation > Numeric Function > Window Function > Rank
  2. Use the following configuration:

Rank configuration in Window Function

This will rank the rows within the grouping, here within each brand, and limit them to the top 3 results.

Step 3: Creating final list of top-performing models

Finally, we need this list as a comma-separated list like:

Concatenated result

To get this:

  1. Go to Data Preparation > Text Function > Concatenate Values
  2. Use the following configuration:

And you’re done.

Besides this

Mammoth provides a range of data preparation options that helps you handle tricky data problems just like that. Many companies have opted for Mammoth to cut time and make their analysis efficient.

Giants like Starbucks, Nielsen, etc have discovered the power of Mammoth to solve numerous data problems. So have companies like Rethink, ILG, and Everest Detection. Check out these case studies to understand how companies are using Mammoth to solve their unique data problems.

If you want to take the tool for a spin, get in touch.