First things first: What is it?
Regression analysis is a method that helps you determine which factors have the most significant impact on an outcome.
In the context of a customer loyalty survey you’ve built using HubSpot, the outcome we are focused on is the Net Promoter Score (NPS) — the metric used to gauge how likely customers are to recommend your store. While NPS gives you an overall view of customer loyalty, regression analysis allows you to go deeper and uncover what influences that score.
For instance, your customer surveys might ask about product variety, staff responsiveness, or value for money. But simply looking at the responses to these questions doesn’t tell you which factors truly drive NPS. Regression helps you break this down and identify the most important contributors.
***We’ll use the terms “customer loyalty” and “NPS” interchangeably in this article; the same goes for “NPS survey” and “customer loyalty survey.”
What’s the challenge?
Imagine you’ve collected data from a customer loyalty survey where you asked a set of questions about different aspects of the customer experience, such as:
How do you perceive the variety of products we offer? (Rating 1–10)
How frequently do you shop at our store? (Frequency per month)
How responsive and knowledgeable did you find our staff? (Rating 1–10)
How would you rate the overall value for money? (Rating 1–10)
On a scale of 0 to 10, how likely are you to recommend our store to friends and family? (This is the NPS question)
Let’s assume you’ve collected over 100 responses. For each response, you have ratings on these different aspects and an NPS score.
If you were to look at the raw survey data, you might be able to see patterns in how customers rate their experience. For example, you might notice that customers who rated staff responsiveness highly also tended to have higher NPS scores. But is staff responsiveness really the most important factor affecting NPS? What about value for money or product variety?
This is the challenge of analyzing surveys — simply looking at averages or individual responses doesn’t give you the whole picture. It’s hard to know which factors are statistically significant drivers of customer loyalty and how much each factor influences NPS.
But isn’t customer loyalty analysis straightforward?
At first glance, it seems that way, of course. You collect feedback from your customers, calculate their Net Promoter Score (NPS), and categorize them as Promoters, Passives, or Detractors based on their likelihood to recommend your store. If your NPS is high, everything’s going well; if it’s low, you need to improve. It sounds simple enough, right?
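To see just how simple the basic calculation is, here is a minimal Python sketch. It assumes the standard NPS definition (answers of 9–10 are Promoters, 7–8 are Passives, 0–6 are Detractors, and NPS is the percentage of Promoters minus the percentage of Detractors); the function names and the ten responses are made up for illustration.

```python
# Minimal sketch: categorize 0-10 "likelihood to recommend" answers and
# compute the standard NPS (% Promoters minus % Detractors).


def categorize(score: int) -> str:
    """Map a single 0-10 answer to its NPS category."""
    if not 0 <= score <= 10:
        raise ValueError(f"score must be between 0 and 10, got {score}")
    if score >= 9:
        return "Promoter"
    if score >= 7:
        return "Passive"
    return "Detractor"


def net_promoter_score(scores: list[int]) -> float:
    """Return the NPS on a -100 to +100 scale."""
    if not scores:
        raise ValueError("need at least one response")
    categories = [categorize(s) for s in scores]
    promoters = categories.count("Promoter") / len(scores)
    detractors = categories.count("Detractor") / len(scores)
    return 100 * (promoters - detractors)


# Hypothetical survey answers, for illustration only
responses = [10, 9, 9, 8, 7, 6, 3, 10, 9, 5]
print(net_promoter_score(responses))
```

Five Promoters and three Detractors out of ten answers give an NPS of about 20. That single number is exactly where basic analysis tends to stop.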
The reality, however, is that many factors influence customer loyalty, and focusing solely on the NPS score doesn’t give you the whole picture. While calculating the NPS score is easy, figuring out why customers rate their experience the way they do and which aspects of that experience drive their loyalty (or lack thereof) is much more complex.
Most marketers stop at basic NPS analysis:
They take the average NPS and compare it to industry benchmarks.
They calculate the percentage of Promoters, Passives, and Detractors, assuming this gives them a clear snapshot of customer satisfaction.
Some might take broad, generalized actions based on the score, such as improving customer service if the NPS is low or enhancing product offerings if they believe that’s the issue.
While this type of analysis might seem sufficient, it often isn’t enough because it doesn’t explain why customers give these scores.
So, what questions should you be asking?
Well, you’ll need to be asking questions that help you go beyond basic reporting:
Are Promoters happy because of the great product variety, or is the responsive staff that matters most to them?
It’s possible that customers who gave you a high NPS score did so because they loved your wide range of products. But what if staff responsiveness played a more significant role than product variety in their overall satisfaction? If you don’t ask the right questions and analyze the data carefully, you might assume product variety is the key driver of customer happiness when, in fact, it’s the interactions with your staff that truly stand out for them.
Are Detractors dissatisfied with pricing, or is the store layout confusing?
Customers who give low NPS scores might cite pricing as a reason for dissatisfaction, but is that the primary issue? Are they frustrated because your store’s layout makes it hard to find what they’re looking for? Maybe pricing isn’t the main pain point, and improving the shopping experience would lead to more impactful changes in their satisfaction.
To illustrate how regression analysis can answer some of these questions, let’s walk through an elaborate example using customer survey data.
How is it relevant for HubSpot data?
For instance, let’s examine customer feedback data and see how regression analysis can deepen your understanding of customer loyalty survey (or similar) data.
***(By the way, Table 1 is only a small excerpt of the actual dataset we’ll be working with. Remember, this is just an illustration; the full dataset has 1,000 rows.)
Table 1: A glimpse of the customer loyalty survey dataset
Product Variety: How do customers perceive the range of products you offer?
Shopping Frequency: How often do they visit your store?
Store Atmosphere: What do they think about the overall ambiance?
Staff Responsiveness: How helpful and knowledgeable did they find your team?
Value for Money: Do they feel they’re getting a good deal?
Store Navigation: How easily can they find what they need?
Overall Satisfaction: Their general feeling about the shopping experience.
Age Range: Which age group does the customer belong to?
Notification Preferences: Are they interested in receiving updates about special offers?
Net Promoter Score (NPS): How likely are they to recommend your store to friends and family?
As you already know, the golden nugget in this dataset is the Net Promoter Score (NPS). This score tells you how likely a customer is to recommend your store, a powerful indicator of customer loyalty and potential business growth.
But here’s the exciting part: we want to uncover which factors strongly influence this score. Is it the friendly staff? The great deals? Or perhaps the wide range of products? By analyzing this data, you’ll gain valuable insights into what really matters to your customers.
As a marketer, understanding these drivers can change how you approach your marketing. As we laid out earlier, here’s what it enables you to do:
Focus your efforts on what truly impacts customer loyalty.
Customize your marketing messages.
Identify areas for improvement that will have the most significant impact on customer satisfaction/loyalty!
Essentially, you wouldn’t have to shoot arrows in the dark.
Okay, so what’s regression analysis, and how does it tie in?
Regression analysis is a statistical method that’ll help you understand how changes in one or more factors (called independent variables) affect another factor (called the dependent variable).
In our example:
The independent variables are things like product variety, store atmosphere, and staff responsiveness.
Our dependent variable is the Net Promoter Score (NPS).
Let’s break it down step by step:
Regression analysis looks at how each factor (like store atmosphere) relates to the NPS. It’s like asking, “When store atmosphere scores go up, does NPS tend to go up too?”
The analysis doesn’t just tell us if there’s a relationship; it tells us how strong that relationship is. It’s like saying, “For every point increase in store atmosphere, NPS tends to increase by X points.”
All of this information is combined into a mathematical formula, which can predict NPS based on the other factors.
We then test this formula on a portion of our data that was set aside to see how well it predicts NPS. This tells us how reliable our formula is.
Let’s say we come up with this formula:
NPS = 2 × (Staff Responsiveness) + 1.5 × (Value for Money) + …
This would mean:
Improving staff responsiveness by 1 point could increase NPS by 2 points.
Improving perceived value for money by 1 point could increase NPS by 1.5 points.
But we don’t know that yet. That’s what we’re about to find out using regression analysis.
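In code, reading the hypothetical formula this way just means comparing two predictions that differ in a single rating. This sketch uses only the two illustrative weights (2 and 1.5); the terms hidden behind “…” are unknown, so they are left out, which is the same as holding them constant. All names here are hypothetical.

```python
# Hypothetical weights from the illustrative formula; the "…" terms are
# unknown and omitted (equivalent to holding them constant).
HYPOTHETICAL_WEIGHTS = {"staff_responsiveness": 2.0, "value_for_money": 1.5}


def partial_nps(ratings: dict[str, float]) -> float:
    """Contribution of the known terms to the predicted NPS."""
    return sum(weight * ratings[name] for name, weight in HYPOTHETICAL_WEIGHTS.items())


before = {"staff_responsiveness": 6, "value_for_money": 7}
after = {"staff_responsiveness": 7, "value_for_money": 7}  # staff rating +1 point

# Only staff responsiveness changed, so the difference equals its weight.
print(partial_nps(after) - partial_nps(before))  # 2.0
```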
Perfect, so how does this work?
We’re using statistical modeling, specifically machine learning (ML) regression, to build the “formula” we mentioned earlier. Instead of manually calculating how each factor affects NPS, we let the model do the work. (Please note that the full code for this regression analysis is beyond the blog’s scope for now!)
The model analyzes customer survey data (Table 1, 1000 rows) to understand how factors like Overall Satisfaction and Value for Money influence NPS (remember the independent and dependent variables we discussed?). The model looks for patterns and assigns weights to each factor, showing how much each impacts NPS.
Instead of manually figuring out these relationships, the model automatically identifies which factors matter most and predicts how changes in these areas could affect NPS. Essentially, it’s building the “formula” for you.
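For the curious, here is a minimal sketch of what “building the formula” can look like, using ordinary least squares via NumPy’s `np.linalg.lstsq`. It is not the code behind Table 2: the three rating columns, the “true” weights, and the noise level are synthetic assumptions, chosen so you can check that the model recovers them. It also holds out 20% of the rows, mirroring the step of testing the formula on a portion of the data.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Synthetic stand-in for the survey (an assumption, not the article's data):
# 1,000 customers, three 1-10 ratings, and NPS generated from known weights.
n = 1000
X = rng.integers(1, 11, size=(n, 3)).astype(float)
true_weights = np.array([0.6, 0.4, 0.0])  # hypothetical: satisfaction, value, frequency
true_intercept = 0.5
y = true_intercept + X @ true_weights + rng.normal(0, 0.3, size=n)

# Hold out 20% of the rows to test the fitted formula on unseen customers.
idx = rng.permutation(n)
train, test = idx[:800], idx[800:]


def design(rows):
    """Prepend a column of ones so least squares also fits an intercept."""
    return np.column_stack([np.ones(len(rows)), X[rows]])


# Ordinary least squares on the training rows only.
coef, *_ = np.linalg.lstsq(design(train), y[train], rcond=None)
intercept, weights = coef[0], coef[1:]

# R² on the held-out rows: share of NPS variation the formula explains.
y_pred = design(test) @ coef
r2_test = 1 - np.sum((y[test] - y_pred) ** 2) / np.sum((y[test] - y[test].mean()) ** 2)

print("intercept:", round(float(intercept), 3))
print("weights:", np.round(weights, 3))
print("held-out R²:", round(float(r2_test), 3))
```

Because the data is simulated from known weights, the fitted weights should come out close to 0.6, 0.4, and 0.0. On real survey data you don’t know the true weights, which is exactly why the held-out check matters.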
Table 2: Results of the regression analysis
Take a look at Table 2. What do you observe?
Overall Satisfaction (0.608): For every 1-point increase in overall satisfaction, NPS tends to increase by 0.608 points. This makes Overall Satisfaction the most powerful predictor in the model.
Value for Money (0.401): A 1-point improvement in perceived value for money could increase NPS by 0.401 points. This is the second most important factor in determining NPS.
What about other factors?
Store Atmosphere and Store Navigation have minimal positive effects on NPS, with increases of just 0.003 points and 0.001 points, respectively, for each point increase in their scores.
Shopping Frequency has almost no effect (0.00003), meaning customers’ shopping frequency doesn’t significantly predict NPS in this dataset.
Fig 1: Correlation matrix between all your variables
Take a look at Fig 1. The correlation matrix shows how the different factors from the survey relate to each other and to NPS, which makes it a visually friendly way to see which factors are most strongly linked to customer loyalty. For example, the strong link between Overall Satisfaction and NPS that we saw in the regression shows up here as a very high correlation (0.82). Similarly, Value for Money (0.54) and Staff Responsiveness (0.77) also show positive correlations with NPS, suggesting these areas are worth focusing on to improve your customer loyalty.
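A correlation matrix like Fig 1 is simply a table of pairwise Pearson correlations. The sketch below computes one with NumPy’s `np.corrcoef` on synthetic data; the column names mirror the survey, but the values (and therefore the correlations) are invented and won’t match Fig 1.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Synthetic ratings for illustration only; the relationships are assumed.
n = 500
overall_satisfaction = rng.integers(1, 11, size=n).astype(float)
value_for_money = rng.integers(1, 11, size=n).astype(float)
shopping_frequency = rng.integers(0, 9, size=n).astype(float)
nps = np.clip(
    0.6 * overall_satisfaction + 0.4 * value_for_money + rng.normal(0, 0.5, n), 0, 10
)

columns = ["Overall Satisfaction", "Value for Money", "Shopping Frequency", "NPS"]
data = np.vstack([overall_satisfaction, value_for_money, shopping_frequency, nps])

# np.corrcoef treats each row as one variable and returns the
# Pearson correlation matrix (1.0 on the diagonal, symmetric).
corr = np.corrcoef(data)

# Correlation of every variable with NPS (the last column of the matrix).
for name, r in zip(columns, corr[:, -1]):
    print(f"{name:>22}: {r:+.2f}")
```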
Take a look at Table 2. What’s the R² score at the top?
The R-squared (R²) score is a way to measure how well a regression model (or “formula”) explains the variability in the data. In simple terms, it tells you how much of the variation in NPS can be explained by the factors we’re analyzing, like Overall Satisfaction and Value for Money. An R² score of 1 means the model perfectly predicts NPS, while an R² of 0 means the model explains none of the variability.
For example, if our R² score is 0.98, it means the factors in the model can explain 98% of the changes in NPS. This is super important because the higher the R² score, the more confident you can be in the model’s predictions.
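R² itself is a short calculation: one minus the ratio of the squared errors the model leaves unexplained to the total variation around the average. Here is a minimal, dependency-free sketch; the example numbers are made up.

```python
def r_squared(actual: list[float], predicted: list[float]) -> float:
    """R² = 1 - SS_res / SS_tot: the share of variation in `actual` the model explains."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)  # total variation around the mean
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained variation
    if ss_tot == 0:
        raise ValueError("R² is undefined when all actual values are identical")
    return 1 - ss_res / ss_tot


actual = [6, 8, 9, 10, 7]
perfect = actual              # a model that predicts every NPS exactly
mean_only = [8, 8, 8, 8, 8]   # a "model" that always predicts the average
print(r_squared(actual, perfect))    # 1.0
print(r_squared(actual, mean_only))  # 0.0
```

Computing R² on data the model wasn’t fitted on (the set-aside portion mentioned earlier) gives a more honest picture than computing it on the training data alone.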
You can also predict the NPS score from here on
Now that you have the formula from the model, you can also predict the NPS score for any customer. The model takes all the important factors (like Overall Satisfaction and Value for Money) and estimates how much each one impacts NPS. As we saw, if Overall Satisfaction increases by 1 point, NPS tends to go up by 0.608 points. So, if you know a customer’s satisfaction score, how they feel about value for money, and their other ratings, you can plug those into the formula (which already encodes the impact of each variable) and predict their NPS score.
It isn’t a replacement for your customer loyalty/NPS survey, but it’s a handy bonus!
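Here is that plug-in step as a small Python sketch. It uses the Table 2 coefficients quoted above; the article doesn’t quote the model’s intercept or any other terms the table may contain, so `INTERCEPT` is a hypothetical placeholder, other terms are left out, and the customer’s ratings are made up. Conveniently, the effect of a 1-point change doesn’t depend on the intercept at all.

```python
# Coefficients quoted from Table 2. Any other terms in the table are omitted,
# and the intercept is not given in the text.
COEFFICIENTS = {
    "overall_satisfaction": 0.608,
    "value_for_money": 0.401,
    "store_atmosphere": 0.003,
    "store_navigation": 0.001,
    "shopping_frequency": 0.00003,
}
INTERCEPT = 0.0  # hypothetical placeholder; use your fitted model's intercept


def predict_nps(customer: dict[str, float]) -> float:
    """Plug a customer's ratings into the linear formula."""
    return INTERCEPT + sum(coef * customer[name] for name, coef in COEFFICIENTS.items())


customer = {  # hypothetical customer
    "overall_satisfaction": 8,
    "value_for_money": 7,
    "store_atmosphere": 6,
    "store_navigation": 7,
    "shopping_frequency": 4,
}
happier = {**customer, "overall_satisfaction": 9}  # same customer, +1 satisfaction

# The difference is just the satisfaction coefficient, whatever the intercept is.
print(round(predict_nps(happier) - predict_nps(customer), 3))  # 0.608
```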
What are the different methods for regression analysis?
Now that we understand what regression analysis is, let’s explore some common types.
Think of these as different tools in your data analysis toolkit. Just as you wouldn’t use a hammer for every home repair job, different regression methods are suited for different data types and questions.
1. Linear Regression
This is the simplest and most common type of regression. It looks for a straight-line relationship between variables.
You’ll use it when:
You expect a straightforward, linear relationship between variables.
The dependent variable (like NPS) is continuous.
You want to predict a numerical outcome.
For example, you’ll use linear regression when you want to understand how an increase in customer service rating directly relates to an increase in NPS.
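For a single factor, the straight line that linear regression finds has a closed-form least-squares solution: the slope is the covariance of x and y divided by the variance of x, and the line passes through the two averages. Here is a minimal sketch with hypothetical (customer service rating, NPS) pairs that lie exactly on a line, so the answer is easy to check.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares intercept and slope for y ≈ intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must vary to estimate a slope")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope  # the line passes through (mean_x, mean_y)


# Hypothetical data lying exactly on NPS = 1 + 0.8 * service rating
service = [5, 6, 7, 8, 9, 10]
nps = [5.0, 5.8, 6.6, 7.4, 8.2, 9.0]
intercept, slope = fit_line(service, nps)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")  # intercept=1.00, slope=0.80
```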
2. Logistic Regression
You’ll use this technique when the outcome you want to predict is categorical, often binary (yes/no, true/false).
When should you use it?
When you’re predicting a binary outcome.
When you want to classify results into categories.
Say we applied logistic regression to the same data (Table 1). Instead of predicting a specific NPS score, we would transform NPS into categories, such as:
Promoters (NPS of 9–10)
Detractors (NPS of 0–6)
Passives (NPS of 7–8)
So, instead of predicting an exact NPS score (like 7 or 9), the model would predict which category a customer falls into: promoter, detractor, or passive. For example, once the model has learned its “formula,” you can bring in new (real-time) data and predict whether a customer will be a promoter or not (Promoter = 1, Not Promoter = 0). The logistic regression model would analyze factors like Overall Satisfaction and Value for Money and predict the probability of a customer becoming a promoter. Because it’s a relatively simple model, logistic regression can also be a good choice when you have limited data and mainly want to know which category a customer is likely to fall into.
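Here is a minimal sketch of the Promoter vs. Not Promoter setup. It is deliberately bare-bones: logistic regression fitted by plain gradient descent with NumPy on synthetic data (two ratings standing in for Overall Satisfaction and Value for Money). The data, learning rate, and iteration count are assumptions for illustration; in practice you would more likely reach for a library implementation such as scikit-learn’s `LogisticRegression`.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Synthetic data (assumed): two 1-10 ratings per customer and a 0-10 NPS answer.
n = 1000
X = rng.integers(1, 11, size=(n, 2)).astype(float)  # [overall satisfaction, value for money]
nps = np.clip(np.round(0.6 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(0, 1, n)), 0, 10)

# Binary target: Promoter (NPS 9-10) = 1, Not Promoter = 0.
y = (nps >= 9).astype(float)

# Standardize the features so plain gradient descent converges quickly.
mu, sigma = X.mean(axis=0), X.std(axis=0)
Z = np.column_stack([np.ones(n), (X - mu) / sigma])  # column of ones = intercept


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


# Minimize the average log-loss by gradient descent.
w = np.zeros(Z.shape[1])
for _ in range(5000):
    grad = Z.T @ (sigmoid(Z @ w) - y) / n
    w -= 0.5 * grad


def promoter_probability(ratings):
    """Predicted probability that a customer with these ratings is a Promoter."""
    z = np.concatenate([[1.0], (np.asarray(ratings, dtype=float) - mu) / sigma])
    return float(sigmoid(z @ w))


print(promoter_probability([10, 9]), promoter_probability([4, 5]))
```

The output is a probability rather than a score; turning it into a yes/no label means picking a cutoff (0.5 is a common default).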
3. Multiple Regression
Think of multiple regression as an extension of linear regression that includes two or more independent variables. You’ll want to use this technique when:
You have multiple factors influencing your outcome.
You want to understand the relative importance of different variables.
For example, you might analyze how product variety, store atmosphere, and staff responsiveness influence NPS. This is precisely what we did in our example above: we used multiple factors to predict NPS. See Fig 2.
Fig 2: Illustration of linear regression vs. multiple linear regression taken from here
4. Polynomial Regression
You’ll use polynomial regression when the relationship between variables is curvilinear (not a straight line). So, in this case, perhaps you’re trying to understand how customer satisfaction might increase with age up to a point and then decrease for older customers. The relationship isn’t straightforward here anymore.
Let’s say you’re analyzing how age affects NPS. In some cases, the relationship between age and satisfaction might not be linear:
Younger customers (18–25) might be less satisfied, as they might be looking for trendier or more cost-effective options.
Middle-aged customers (35–45) might have higher satisfaction because the products and services match their needs.
Older customers (55+) might again have lower satisfaction due to different preferences or unmet needs.
In linear regression, the model would assume a straight line: NPS increases or decreases consistently as age increases. In real-world scenarios, however, the relationship could be curved, rising with age until it peaks and then declining. See Fig 3.
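To see the difference between a straight line and a curve, the sketch below fits both to synthetic age data with NumPy’s `np.polyfit` (degree 1 for the line, degree 2 for the curve) and compares their R² scores. The inverted-U shape, with satisfaction peaking around age 40, is an invented assumption that mimics the scenario above.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# Synthetic, invented data: satisfaction peaks around age 40 (an inverted U).
age = rng.uniform(18, 70, size=300)
satisfaction = 8 - 0.004 * (age - 40) ** 2 + rng.normal(0, 0.4, size=300)


def r_squared(y, y_hat):
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)


# Degree 1 = straight line (linear regression); degree 2 = a simple curve.
line = np.polyfit(age, satisfaction, deg=1)
curve = np.polyfit(age, satisfaction, deg=2)  # coefficients [a, b, c] for a*x² + b*x + c

r2_line = r_squared(satisfaction, np.polyval(line, age))
r2_curve = r_squared(satisfaction, np.polyval(curve, age))

# The fitted parabola's vertex estimates the age at which satisfaction peaks.
peak_age = -curve[1] / (2 * curve[0])

print(f"line R² = {r2_line:.2f}, curve R² = {r2_curve:.2f}, peak age ≈ {peak_age:.0f}")
```

On this kind of data the curve explains far more of the variation than the line, which is the whole case for polynomial regression.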
Fig 3: Illustration of linear regression (or model) vs. polynomial regression taken from
5. Other techniques to read about
If the relationship between variables is curved (like in polynomial regression in Point 4) or otherwise complex, straight-line techniques like linear regression will struggle. For example, if customer satisfaction increases up to a point but then drops after a certain level, a straight-line approach won’t capture that. Similarly, improving staff responsiveness might help NPS up to a point, but once customers expect a certain level of service, further improvements won’t change their opinion much.
Three tree-based methods are popular alternatives for analyzing complex relationships, especially when the linear methods we discussed above (like linear or multiple regression) don’t fit: decision trees, random forests, and gradient boosting regression.
Decision trees: Great for simplicity and visualization, but they can be too focused on the specific data used to build them, which may reduce their ability to perform well on new or unseen data.
Random forests: More reliable and stable for complex data but computationally heavier.
Gradient boosting: Best for accuracy in tricky data, but comes with high complexity.
***We’re going to look at these techniques in our future blogs!
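Ahead of those posts, here is a taste of how a decision tree handles the plateau effect described above. The smallest possible tree makes a single split (often called a decision stump): it picks the threshold that minimizes squared error and predicts the average on each side. This pure-Python sketch uses hypothetical staff-responsiveness data; real implementations such as scikit-learn’s `DecisionTreeRegressor` grow deeper trees and add safeguards against overfitting.

```python
def fit_stump(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Depth-1 regression tree: one threshold, one mean prediction per side.

    Returns (threshold, mean_at_or_below, mean_above), choosing the split
    that minimizes the total squared error.
    """
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        mean_l, mean_r = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - mean_l) ** 2 for v in left) + sum((v - mean_r) ** 2 for v in right)
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbors
        if best is None or sse < best[0]:
            best = (sse, threshold, mean_l, mean_r)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2], best[3]


# Hypothetical plateau: NPS rises with staff responsiveness, then levels off at 8.
staff = [2, 3, 4, 5, 6, 7, 8, 9, 10]
nps = [3, 4, 5, 6, 8, 8, 8, 8, 8]
threshold, low, high = fit_stump(staff, nps)
print(threshold, low, high)  # 5.5 4.5 8.0
```

The stump predicts a flat 8.0 for every rating above 5.5, so it captures “further improvements won’t change their opinion much” without assuming any straight line.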
What regression isn’t
When it comes to regression analysis in marketing, you need to understand its true potential — and, just as importantly, what it can’t do. While regression can provide powerful insights into customer behavior and campaign performance, some common misconceptions exist about how it should be applied. Regression analysis goes beyond simple correlations and requires a nuanced understanding of your data.
Here’s what you need to be careful about:
Assuming that correlation always implies causation, without considering external factors or confounding variables.
Applying the same regression model to all scenarios without taking the specific situation into account.
Using regression to justify your marketing decisions rather than to improve your future strategies.
Treating regression analysis as a one-time activity rather than an ongoing process.
Overcomplicating your formula by including every possible marketing variable you’ve captured!
Looks great! Give me a quick summary!
Regression analysis helps you pinpoint the factors that influence customer behavior, like which aspects of your service lead to higher NPS scores. Sure, a simple “formula” works well when there are only a few variables; but, as we saw above, as complexity increases, so does the sophistication of the ML models we need to understand the relationships between the independent and dependent variables.
Techniques like linear regression, multiple regression, and more advanced ones like polynomial regression allow you to understand and predict outcomes based on customer data — think satisfaction scores, spending habits, or even how often they engage with your brand. The more advanced methods, like random forests and gradient boosting, take it a step further by digging deep into complex, non-linear relationships.
But here’s the thing: continuously running these models manually and interpreting the results can be cumbersome and challenging, especially as your dataset grows or you start adding new variables. Your analytics team will be stuck crunching numbers, creating plots, and sending you reports filled with stats, and you’ll spend hours translating that data into actionable insights. All of this ultimately (and unfortunately) leads to delayed decision-making and a less responsive marketing strategy. So, what’s the fix?
ConvertML makes your job easier and more streamlined with automated regression analysis and interpretation
This is where ConvertML comes in to make your life easier. It automates the entire regression analysis process, integrating survey data directly from HubSpot and any other sources you want to bring in (surveys from different platforms, ticketing information, or external databases) in just a few clicks.
How