Data Science Models

When product usage data explodes into tens of billions of clickstream events, basic regression techniques don't just underperform; they FAIL.

Here's what we do instead:

  • Reduce complexity: Strip out noisy user behaviours and zero in on the events that matter most. We've noticed that more than 50% of all clicks have little to no impact on conversion, so these are the first to go (a minimal pruning sketch follows this list).
  • Feature engineering: This is our secret sauce, NOT the model itself, as most would assume. Which historic timeframes to consider for an activity, sequences of actions, intelligently clubbing related behaviours together, creating new aggregated events. And we're not even past the first few hours of a "Day in the Life" of our Data Scientists ⏳ (see the feature-window sketch below).
  • Modelling: In ML parlance, predicting conversions is a "classification problem", and decision tree-based models perform best on it. Combining such models lets us handle both non-linearities and sparsity (see the modelling sketch below). Put simply, while all user behaviours are equal... some are more equal than others.
  • Explainability: Imagine a model that tells you someone is going to convert, and they actually do! Pretty cool, right? Now imagine a model that tells you someone is going to convert and also why: they've invited 5 friends, created 3 folders, interacted with your paywall twice, work in a mid-sized organisation, and are based in NYC, USA (see the explanation sketch below). The latter sounds a whole lot like Toplyne.
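
For the curious, here's roughly what the pruning step can look like in code. This is a minimal sketch in pandas and scikit-learn, not our production pipeline: the file names, column names (user_id, event, timestamp, converted), and the median cutoff are all illustrative stand-ins.

```python
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

# Hypothetical inputs: a raw clickstream table and per-user conversion labels.
clicks = pd.read_parquet("clickstream.parquet")   # columns: user_id, event, timestamp
labels = pd.read_parquet("labels.parquet")        # columns: user_id, converted (0/1)

# Per-user event counts: one column per event type.
counts = (clicks.groupby(["user_id", "event"]).size()
                .unstack(fill_value=0))

# Align counts with labels, then score each event type by how much
# information it carries about conversion.
X = counts.join(labels.set_index("user_id")["converted"], how="inner")
y = X.pop("converted")
mi = pd.Series(mutual_info_classif(X, y, discrete_features=True), index=X.columns)

# Drop the low-signal half of event types and keep the rest of the clickstream.
keep = mi[mi > mi.median()].index
clicks_pruned = clicks[clicks["event"].isin(keep)]
```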
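Feature engineering, sketched the same way: the same activities counted over several trailing time windows, plus an aggregated event that clubs related behaviours together. The cutoff date, window lengths, and event names like invite_sent are hypothetical placeholders.

```python
import pandas as pd

# clicks_pruned (user_id, event, timestamp) comes from the pruning sketch above.
cutoff = pd.Timestamp("2024-01-01")   # illustrative snapshot date we predict from

def window_counts(df, days, suffix):
    """Per-user event counts over a trailing window that ends at the cutoff."""
    recent = df[df["timestamp"].between(cutoff - pd.Timedelta(days=days), cutoff)]
    return (recent.groupby(["user_id", "event"]).size()
                  .unstack(fill_value=0)
                  .add_suffix(f"_{suffix}"))

# The same activity counted over several historic timeframes...
feats = (window_counts(clicks_pruned, 7, "7d")
         .join(window_counts(clicks_pruned, 30, "30d"), how="outer")
         .fillna(0))

# ...plus an aggregated event clubbing related (here: collaboration) behaviours.
collab = ["invite_sent", "folder_created", "comment_added"]   # hypothetical event names
cols = [f"{e}_30d" for e in collab if f"{e}_30d" in feats.columns]
feats["collaboration_30d"] = feats[cols].sum(axis=1)
```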
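Modelling, with a single gradient-boosted tree ensemble standing in for the combination of tree-based models described above. Hyperparameters are illustrative, not tuned, and the 80/20 split is just for the sketch.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# feats and y come from the sketches above; keep only users present in both.
feats, y = feats.align(y, join="inner", axis=0)
X_train, X_test, y_train, y_test = train_test_split(
    feats, y, test_size=0.2, stratify=y, random_state=0)

# Tree ensembles pick up non-linear interactions and cope with sparse count features.
model = GradientBoostingClassifier(n_estimators=300, max_depth=3, learning_rate=0.05)
model.fit(X_train, y_train)

print("holdout AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```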
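And explainability, assuming the shap package is available: SHAP values for a tree model give a per-user breakdown of which behaviours pushed the conversion score up or down, which is what turns "they will convert" into "they will convert because...".

```python
import pandas as pd
import shap   # assumes the shap package is installed

# Uses the fitted model and holdout set from the modelling sketch above.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)   # one contribution per user per feature

# Top five reasons behind the first holdout user's prediction.
contrib = pd.Series(shap_values[0], index=X_test.columns)
print(contrib.sort_values(key=abs, ascending=False).head(5))
```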