Why Your Sales Forecasting Fails: The Critical Role of Time in CRM Data and Machine Learning
TL;DR
Most sales analytics and ML models are built on static CRM data — effectively fossils of deals long closed.
This approach misses the most valuable signals: how a deal changed over time (stage shifts, close date moves, pricing updates, etc.).
Without understanding the journey of each deal, models struggle to predict which current opportunities will win or slip.
The fix? Start capturing how deals evolve over time — either through event sourcing, regular snapshots, or salvaging historical change logs (e.g., stage history in Salesforce).
With this richer data, you can build smarter models, get more reliable insights, and ultimately improve CAC, LTV, and revenue forecasting.
Introduction
One of the most valuable benefits of centralising your sales data in a CRM system is the ability to generate analytics and insights that help you answer critical questions, such as why certain deals are won, which opportunities to prioritise, and how likely deals are to close. Yet I’ve repeatedly seen organisations fail to reap these benefits because the machine learning systems designed to provide those insights fail to take one thing into account: the passage of time. In this article I’ll explain why this happens, what problem it creates, and what you can do to implement a system that provides truly meaningful insights and helps you better understand and improve CAC and LTV.
An analogy
An ammonite fossil
Bear with me: rather than dive into a load of technical jargon, I’m going to use an analogy to explain the challenge of extracting accurate and meaningful insights from sales data. I live in the UK, and most years we go as a family to the Jurassic Coast, a World Heritage Site famous for cliffs that hold fossils from the Triassic, Jurassic and Cretaceous periods. As natural erosion takes place, those fossils spill onto the beaches below. We like to go fossil hunting with the kids, and we usually find a collection of small ammonite fossils, the classic spiral-shaped sea animals. Most are about the size of a coin, and as fascinating as it is to think that each was once a little critter swimming about in the ocean, it’s very difficult for us to know how these creatures lived, because the only evidence we have that they ever existed is a piece of rock with some impressions left on it.
So unlike animals walking around today, which we can simply observe to understand their habitats and habits, fossils give us no reliable source of information, and we have to guess what the animals did by comparing them to similar animals alive today. Imagine, though, that all the cats in the world disappeared overnight and we were left with little cat fossils littered around everywhere. Would we have lost all of our knowledge about the lives of cats? No, of course not, because we have a wealth of documentary evidence in written, video and audio form that tells us everything about them.
Now, back in the commercial world, I think we should see sales processes as supporting the lifecycle of opportunities, and therefore think of deals more literally as things that have a lifespan: they’re born, they grow and develop, and eventually they die when they are closed as won or lost. The problem is that in a typical CRM system like Salesforce, the record of a closed deal only shows what the deal looked like in its deceased state. Closed opportunities form a huge fossilised graveyard of deals that have come and gone. Some had good lives and others did not, but just as our understanding of fossils from the animal kingdom is limited, so too is our understanding of the lives of closed deals.
Why this occurs
CRM systems do not provide the sales equivalent of a David Attenborough documentary detailing the roaming habits of your operational excellence project with BT over its nine-month life. So why would we expect to learn much about the fate of the deals we are working on today if we can’t understand the lives of deals from the past? The answer is simple: we can’t, and that is perhaps the primary reason why I have seen many a machine learning model fail to provide useful insights in production use.
What am I talking about? Well, we use our CRM to update opportunities. We see things like the stage a deal is currently in, when the last activity was, and the close date. When we have new information to record, we update the deal, which overwrites what was there previously, and so we lose the information that tells us how the opportunity came to be in its current state.
Sure, there may be certain fields that we can track over time, but we lack a rigorous framework that lets us get into a time machine, go back to any point in the life cycle of the deal and see what it looked like (its state) on any given day.
The problem it creates
So let’s pick a scenario. We’ll say that we have an opportunity that looks like the following:
ACME BT OpEx
- Amount: £100K
- Stage: Prospecting
- Close Date: 01/06/2025
- Account Type: Prospect
- Last Activity: 01/04/2025
- Stage Duration: 15 days
- Next Steps: Meet with John next Friday
Now I’ve taken the liberty of adding in some common custom fields, but let’s think about what happens when we’ve met with the prospect and we’ve agreed to send them a proposal for a project that we’ve identified. In that scenario our new updated opportunity might look like the following:
ACME BT OpEx
- Amount: £80K
- Stage: Proposal
- Close Date: 01/07/2025
- Account Type: Prospect
- Last Activity: 10/04/2025
- Stage Duration: 1 day
- Next Steps: Send proposal to John
- Must win deal (Yes/No): Yes
In our updated deal, the amount has changed because we’ve been able to better qualify the opportunity, and the close date has moved back by a month. The last activity date now reflects our most recent meeting, the stage has changed to Proposal to indicate that we are now sending the prospect a proposal, and the stage duration has reset accordingly. The next steps have changed, and we also have a new field, added to the CRM yesterday, that indicates whether this is a must-win deal, which in this case it is.
We have lost all the details of what the deal looked like before this last set of changes and herein lies the problem: knowing what it looked like is the detail that we need in order to infer or predict what will happen to future opportunities that have similar characteristics.
Machine learning models need to be trained, and we train them using historical data. In this case, if we want to train a model that can predict the outcome of an opportunity, then we must supply it with a dataset of deals that we have closed in the past and whose outcomes we know.
Using a dataset that consists of the fossilised remains of opportunities past is not much good, because it doesn’t help us learn the signals that indicate the final fate of each deal. What we need to know are the states that the opportunity passed through when it was “alive”, so to speak. Or, as a marketing colleague kept saying: “it’s all about the journey rather than the destination”. Every time we update our CRM and overwrite the previous details of the opportunity, we lose precious information that we could use to train more effective predictive analytics.
So think about what our example deal might look like when it’s closed. Let’s assume it’s won.
ACME BT OpEx
- Amount: £83K
- Stage: Closed - Won
- Close Date: 04/07/2025
- Account Type: Client
- Last Activity: 10/07/2025
- Stage Duration: 13 days
- Next Steps: Finalise Contracts
- Must win deal (Yes/No): Yes
If this is all we have, then we’ve lost the history of how the pricing changed, how many times the close date slipped, how frequent the activity on the deal was, and how much time was spent in each stage. We don’t even know whether the account was a first-time buyer or an existing customer, because the Account Type now reads Client. That’s a lot of useful information that we could be incorporating into the training data for our machine learning model.
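To make the lost signals concrete, here is a minimal sketch of the kind of features that become available once the intermediate states of a deal are kept. It uses the three states of the example deal; the `DealState` class, the `journey_features` function and the `as_of` observation dates are illustrative assumptions, not fields from any particular CRM.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DealState:
    as_of: date        # day this state was observed (assumed, not a CRM field)
    amount: int        # deal amount in GBP
    stage: str
    close_date: date


def journey_features(states: list[DealState]) -> dict:
    """Summarise how a deal changed across its recorded states."""
    if not states:
        raise ValueError("at least one state is required")
    ordered = sorted(states, key=lambda s: s.as_of)
    pairs = list(zip(ordered, ordered[1:]))  # consecutive (before, after) states
    return {
        "close_date_slips": sum(b.close_date > a.close_date for a, b in pairs),
        "amount_changes": sum(b.amount != a.amount for a, b in pairs),
        "stage_changes": sum(b.stage != a.stage for a, b in pairs),
        "net_amount_change": ordered[-1].amount - ordered[0].amount,
        "days_observed": (ordered[-1].as_of - ordered[0].as_of).days,
    }


# The three states of the example deal; the as_of dates are illustrative.
history = [
    DealState(date(2025, 4, 1), 100_000, "Prospecting", date(2025, 6, 1)),
    DealState(date(2025, 4, 10), 80_000, "Proposal", date(2025, 7, 1)),
    DealState(date(2025, 7, 4), 83_000, "Closed - Won", date(2025, 7, 4)),
]
features = journey_features(history)
```

With only the final record, every one of these values would be unknown. With the history, the same deal tells us that its close date slipped twice and that its amount ended £17K below where it started.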
Steps to fix it
So how do we fix this? We need to be able to reconstruct the opportunities, and everything related to them, as they were at various points while they were alive. In an ideal world we would know the state of any opportunity at any point in time. One way of achieving this is event sourcing, which tracks and stores every single change to every deal.
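As a rough illustration of the idea, the sketch below keeps an append-only list of change events and replays them to rebuild a deal as it looked on a given day. The `FieldChanged` event shape, the `state_as_of` function, the deal identifier and the event dates are all hypothetical; a real implementation would need proper timestamps and durable storage.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any


@dataclass(frozen=True)
class FieldChanged:
    deal_id: str
    changed_on: date
    field: str
    new_value: Any


def state_as_of(events: list[FieldChanged], deal_id: str, day: date) -> dict[str, Any]:
    """Replay the deal's events in date order, up to and including `day`."""
    state: dict[str, Any] = {}
    relevant = (e for e in events if e.deal_id == deal_id and e.changed_on <= day)
    # sorted() is stable, so same-day events keep the order they were logged in.
    for event in sorted(relevant, key=lambda e: e.changed_on):
        state[event.field] = event.new_value
    return state


# Illustrative event log for the example deal; events are only ever appended.
log = [
    FieldChanged("acme-bt-opex", date(2025, 3, 17), "Stage", "Prospecting"),
    FieldChanged("acme-bt-opex", date(2025, 3, 17), "Amount", 100_000),
    FieldChanged("acme-bt-opex", date(2025, 4, 10), "Stage", "Proposal"),
    FieldChanged("acme-bt-opex", date(2025, 4, 10), "Amount", 80_000),
    FieldChanged("acme-bt-opex", date(2025, 7, 4), "Stage", "Closed - Won"),
    FieldChanged("acme-bt-opex", date(2025, 7, 4), "Amount", 83_000),
]

early_april = state_as_of(log, "acme-bt-opex", date(2025, 4, 5))
mid_may = state_as_of(log, "acme-bt-opex", date(2025, 5, 15))
```

Because nothing is overwritten, the same log answers the question "what did this deal look like on day X?" for any X, which is exactly the time machine that a plain CRM record lacks.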
This may not always be practical, but we don’t need to go that far: we could instead take regular snapshots of all open opportunities and record them in a separate database.
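A snapshot job can be very small. The sketch below assumes a scheduled task that receives the current open opportunities as a list of dictionaries and appends them to a SQLite table; the table name, column names and the `take_snapshot` and `state_on` helpers are hypothetical, and fetching the deals from the CRM is left out.

```python
import sqlite3
from datetime import date


def take_snapshot(conn: sqlite3.Connection, snapshot_date: date, open_deals: list[dict]) -> None:
    """Store one row per open deal for the given day.

    Re-running the job for the same day replaces that day's rows
    instead of duplicating them; earlier days are never touched.
    """
    conn.executemany(
        "INSERT OR REPLACE INTO opportunity_snapshot "
        "(snapshot_date, deal_id, amount, stage, close_date) VALUES (?, ?, ?, ?, ?)",
        [
            (snapshot_date.isoformat(), d["deal_id"], d["amount"], d["stage"], d["close_date"])
            for d in open_deals
        ],
    )
    conn.commit()


def state_on(conn: sqlite3.Connection, deal_id: str, day: date):
    """Return the most recent snapshot taken on or before `day`, or None."""
    return conn.execute(
        "SELECT snapshot_date, amount, stage, close_date FROM opportunity_snapshot "
        "WHERE deal_id = ? AND snapshot_date <= ? "
        "ORDER BY snapshot_date DESC LIMIT 1",
        (deal_id, day.isoformat()),  # ISO dates sort correctly as text
    ).fetchone()


conn = sqlite3.connect(":memory:")  # a file path would be used in practice
conn.execute(
    "CREATE TABLE opportunity_snapshot ("
    "snapshot_date TEXT NOT NULL, deal_id TEXT NOT NULL, "
    "amount INTEGER, stage TEXT, close_date TEXT, "
    "PRIMARY KEY (snapshot_date, deal_id))"
)
take_snapshot(conn, date(2025, 4, 1), [
    {"deal_id": "acme-bt-opex", "amount": 100_000, "stage": "Prospecting", "close_date": "2025-06-01"},
])
take_snapshot(conn, date(2025, 4, 11), [
    {"deal_id": "acme-bt-opex", "amount": 80_000, "stage": "Proposal", "close_date": "2025-07-01"},
])
row = state_on(conn, "acme-bt-opex", date(2025, 4, 5))
```

The trade-off against event sourcing is resolution: changes that happen and are reversed between two snapshots are never seen, so the snapshot interval should be short relative to how quickly deals move.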
Of course the downside to these approaches is that they’re not things that can be applied retrospectively, and so you would need to implement these systems and leave them running for a period of time until sufficient historical data has accumulated to build a training set that could be used to feed your machine learning model.
Finally, consider whether there is any historical information that you can salvage. For example, Salesforce maintains a history of changes to stage, close date and amount, so with some careful data wrangling it’s possible to at least partially reconstruct the past states of an opportunity. You can then combine this with immutable data, such as the created date, and this may well be enough to teach a machine learning model about your sales process and make its insights useful.
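One way to do that wrangling is to start from the deal as it stands today and undo the recorded changes, newest first, until you reach the day you care about. The sketch below assumes a change log exported as rows with a field name, an old value and a new value; the key names and the `rewind` function are hypothetical and would need mapping onto whatever your CRM actually exports.

```python
from datetime import date
from typing import Any


def rewind(final_state: dict[str, Any], changes: list[dict[str, Any]], day: date) -> dict[str, Any]:
    """Undo, newest first, every recorded change made after `day`."""
    state = dict(final_state)  # copy, so the final record is left untouched
    for change in sorted(changes, key=lambda c: c["changed_on"], reverse=True):
        if change["changed_on"] > day:
            state[change["field"]] = change["old_value"]
    return state


# The closed deal as the CRM shows it today, plus an immutable created date.
final_state = {
    "Amount": 83_000,
    "Stage": "Closed - Won",
    "Close Date": date(2025, 7, 4),
    "Created Date": date(2025, 3, 17),  # illustrative value
}

# Illustrative change log; only tracked fields appear here.
changes = [
    {"changed_on": date(2025, 4, 10), "field": "Amount", "old_value": 100_000, "new_value": 80_000},
    {"changed_on": date(2025, 4, 10), "field": "Stage", "old_value": "Prospecting", "new_value": "Proposal"},
    {"changed_on": date(2025, 4, 10), "field": "Close Date", "old_value": date(2025, 6, 1), "new_value": date(2025, 7, 1)},
    {"changed_on": date(2025, 7, 4), "field": "Amount", "old_value": 80_000, "new_value": 83_000},
    {"changed_on": date(2025, 7, 4), "field": "Stage", "old_value": "Proposal", "new_value": "Closed - Won"},
    {"changed_on": date(2025, 7, 4), "field": "Close Date", "old_value": date(2025, 7, 1), "new_value": date(2025, 7, 4)},
]

mid_may = rewind(final_state, changes, date(2025, 5, 15))
at_start = rewind(final_state, changes, date(2025, 4, 1))
```

The reconstruction is only as complete as the log: any field that was not tracked, such as Next Steps in this example, simply keeps its final value, which is why the result should be treated as partial.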
Conclusion
If we want to understand and influence the outcomes of deals in our pipeline, we need more than just a snapshot of their final resting place. We need a living record of how they evolved — the changes, the momentum shifts, the stalls, and the signals that pointed to success or failure. Without that, we’re trying to predict the future by staring at fossils.
Whether you’re building machine learning models to forecast deal outcomes or simply trying to get better visibility into your pipeline, the key is treating deals as living processes rather than static records. That means capturing their history, not just their end states.
By shifting how we collect and structure our sales data, even with relatively simple techniques like regular snapshots or careful use of CRM audit trails, we can build a far richer, more predictive understanding of what drives revenue. And with that comes the ability to reduce your CAC, improve win rates, and better allocate your team’s energy to the deals that truly matter.
Because in the end, the journey is the insight.