Step One of Refocusing Your Organization Around CLV: De-silo Your Data

Brady Walker

The following is an excerpt from our new book, The Chance of a Lifetime: How to Use Customer Lifetime Value Reporting to Grow Your Retail Business. To read the book in its entirety, get it here.

At first glance, this might look like an intimidating step. You’ve got point-of-sale data, purchase history, online browsing behavior, demographic data, social media metrics, email engagement metrics, and so on. It’s a lot to pull together.

Prioritize Data Sources

We believe that your most valuable data by far is individual-level purchase data. Why? The actual exchange of money shows strong intention and is the fundamental starting point for how cash will flow in the future.

Within the purchase history, there are nuances of transactional behavior that tend to be highly correlated with CLV. The treasure trove of informative data points embedded in your online and offline transactional history alone includes things like:

  • What products they buy
  • Product purchase history
  • Channels of entry
  • Promotions
  • Shipping and payment methods

Other types of data that may be useful but aren't as high-priority as transactional history are things like customer demographic data (geo-location, age, gender, marital status, income, etc.) and non-purchase engagement behavior (e.g., how they interact with your loyalty program, email open and click data, site browse data).

But you don't need to unify ALL of your data before you can reap any benefit. Taking a pragmatic approach, focus strategically on the data sources that are most important for CLV modeling. You can enrich the high-priority data by adding new dimensions over time.

Let’s say you have a transactional file that looks something like this:

[Table: transactional file; columns include User ID and Order Date]

And then you have a users file that looks something like this:

[Table: users file; columns include Email address]

If we're going to create a unified profile of each user, we need some way of linking our users back to their purchase behavior. That is what we mean by "stitching it together": identifying the common identifiers that can be used across data sources.

For the users file here, that might mean adding a column for the user ID, so that each email address can be matched to its orders.
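As a rough illustration of that linking step, here is a minimal Python sketch. The field names (`user_id`, `email`, `order_date`) and the sample rows are hypothetical, and it assumes the users file already carries the user ID column described above.

```python
from collections import defaultdict

# Hypothetical sample rows; field names and values are illustrative only.
transactions = [
    {"user_id": "u1", "order_date": "2023-01-05"},
    {"user_id": "u2", "order_date": "2023-01-09"},
    {"user_id": "u1", "order_date": "2023-02-11"},
]
users = [
    {"user_id": "u1", "email": "jayne@example.com"},
    {"user_id": "u2", "email": "sam@example.com"},
]


def build_profiles(users, transactions):
    """Attach each user's order dates to their profile via the shared user_id."""
    orders_by_user = defaultdict(list)
    for row in transactions:
        orders_by_user[row["user_id"]].append(row["order_date"])
    return {
        u["user_id"]: {
            "email": u["email"],
            # Users with no orders get an empty list rather than an error.
            "order_dates": sorted(orders_by_user.get(u["user_id"], [])),
        }
        for u in users
    }


profiles = build_profiles(users, transactions)
```

The shared key does the work here: without a user ID on both sides, there is nothing reliable to join on.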

Cleanse Your Data

There are three major categories of data cleansing: data standardization, data validation, and data deduplication and consolidation.


1. Data Standardization 

Data standardization creates uniformity by grouping like values in a set. In the world of e-commerce, there are countless instances where slight variations in data will carry the same operational value.

For example, when inputting a shipping address, Jayne Dough might write out “street” for her first purchase but then, for her second purchase, she abbreviates it as “st,” thus creating two records.

Recognizing that these data points represent the same person ensures that data is organized based on standardized criteria rather than meaningless differences.
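One simple way to approach the "street" versus "st" case is to normalize addresses before comparing them. The sketch below is illustrative only: the `ABBREVIATIONS` map is an assumed, deliberately tiny lookup, and a real one would need to handle ambiguous cases (for example, "St" meaning "Saint").

```python
import re

# Assumed abbreviation map for illustration; a production list would be far longer.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}


def standardize_address(address):
    """Lowercase, strip punctuation, and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


first_purchase = standardize_address("12 Main Street")
second_purchase = standardize_address("12 Main St.")
```

After standardization, both purchases carry the same address string and can be grouped under one record.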


2. Data Validation

Data validation processes guarantee that data makes sense against all governing business rules. An obvious glitch in data might appear if, for instance, the assigned date of return is actually earlier than the date of purchase. These are the kinds of things that need to be fixed before modeling can begin.
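The return-date rule above can be written as a small check. This is a sketch under assumed field names (`order_id`, `purchase_date`, `return_date`); it only flags suspect rows and leaves the decision about how to fix them to you.

```python
from datetime import date


def find_invalid_returns(orders):
    """Return orders whose return date is earlier than the purchase date."""
    return [
        order
        for order in orders
        if order["return_date"] is not None
        and order["return_date"] < order["purchase_date"]
    ]


# Hypothetical sample data; o2 was "returned" before it was bought.
orders = [
    {"order_id": "o1", "purchase_date": date(2023, 3, 10), "return_date": date(2023, 3, 15)},
    {"order_id": "o2", "purchase_date": date(2023, 3, 10), "return_date": date(2023, 3, 1)},
    {"order_id": "o3", "purchase_date": date(2023, 3, 12), "return_date": None},
]
invalid = find_invalid_returns(orders)
```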


3. Data Deduplication and Consolidation

Data deduplication and consolidation eliminate redundant pieces of information and provide a retailer with a single definitive set of records. Should the same customer check out as a guest three times and input slightly different variations of her name or address each time, the model should recognize these variable inputs as coming from the same person and consolidate them into a single user profile.
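A minimal sketch of consolidation might look like the following. It assumes, purely for illustration, that each guest checkout includes an email address that can serve as the matching key once it is trimmed and lowercased; matching on fuzzy name or address variations alone is a harder problem and is not attempted here.

```python
def consolidate_by_email(records):
    """Merge checkout records that share a normalized email into one profile."""
    profiles = {}
    for record in records:
        key = record["email"].strip().lower()
        profile = profiles.setdefault(
            key, {"email": key, "names": set(), "order_count": 0}
        )
        profile["names"].add(record["name"])  # keep every name variant seen
        profile["order_count"] += 1
    return profiles


# Hypothetical guest checkouts: three from the same person, one from another.
guest_checkouts = [
    {"name": "Jayne Dough", "email": "Jayne@example.com"},
    {"name": "J. Dough", "email": "jayne@example.com "},
    {"name": "Jayne Dough", "email": "jayne@example.com"},
    {"name": "Sam Smith", "email": "sam@example.com"},
]
profiles = consolidate_by_email(guest_checkouts)
```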

Best Practice for Data Unification

Focus on Scalability

Making CLV usable for your organization is not a one-time exercise. You're going to want to keep those CLV scores refreshed and updated. From a de-siloing perspective, that means building scalable ETL (extract-transform-load) processes: the repeatable pipelines that move and unify your data.
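To make the idea of a repeatable ETL process concrete, here is a small sketch using only Python's standard library. The CSV layout, the `orders` table, and its column names are assumptions made for the example. The point it illustrates is that each refresh can be re-run safely, because rows are keyed on an order ID and replaced rather than duplicated.

```python
import csv
import io
import sqlite3


def extract(csv_text):
    """Read raw order rows from CSV text (a stand-in for a file or export)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Standardize user IDs and convert amounts to numbers."""
    return [
        (row["order_id"], row["user_id"].strip().lower(), float(row["amount"]))
        for row in rows
    ]


def load(conn, rows):
    """Upsert rows so that re-running a batch does not create duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id TEXT PRIMARY KEY, user_id TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn, csv_text):
    load(conn, transform(extract(csv_text)))


# Hypothetical batch, loaded twice to show the refresh is repeatable.
conn = sqlite3.connect(":memory:")
batch = "order_id,user_id,amount\no1,U1,20.00\no2,u2,35.50\n"
run_etl(conn, batch)
run_etl(conn, batch)
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```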

If you'd like to learn more about how to prep your data for predictive modeling, see our webinar, Jumping the 3 Big Hurdles to Predictive Modeling, hosted by Custora Head of Product Marketing Jordan Elkind.

And to learn more about how Customer Lifetime Value reporting can help you efficiently and effectively grow your business across customer lifecycle phases, download our new book, The Chance of a Lifetime.

