The sBG (shifted-beta-geometric) model is for discrete, contractual business settings, where CLV is easiest to calculate.
import numpy as np
import pandas as pd
from scipy.optimize import minimize
import scipy.special as sc
import os
from IPython.display import display, Image
import plotly.graph_objs as go
from sBG import compute_probabilities, log_likelihood, maximize, forecast
df = pd.read_csv('../data/sBG-1.csv')
data = (df.loc[0:3, 'Regular'] / 1000).to_list()
data
gamma, delta = maximize(data)
gamma, delta
The fitted parameters are $\gamma \approx 0.76$ and $\delta \approx 1.29$.
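The `sBG` module imported above is not shown here, so the following is only a minimal sketch of how a fit like this could be done, not the module's actual implementation. It assumes the standard sBG formulas $P(T=t) = B(\gamma+1, \delta+t-1)/B(\gamma, \delta)$ and $S(t) = B(\gamma, \delta+t)/B(\gamma, \delta)$, and maximizes the likelihood on the log scale so both parameters stay positive. The names `sbg_survival`, `sbg_neg_log_likelihood`, `fit_sbg` and the survival figures in `example_survival` are hypothetical and used for illustration only.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import betaln


def sbg_survival(t, gamma, delta):
    """sBG survival function S(t) = B(gamma, delta + t) / B(gamma, delta)."""
    t = np.asarray(t, dtype=float)
    return np.exp(betaln(gamma, delta + t) - betaln(gamma, delta))


def sbg_neg_log_likelihood(log_params, survival):
    """Negative log-likelihood of observed survival proportions under the sBG."""
    gamma, delta = np.exp(log_params)  # log scale keeps both parameters positive
    survival = np.asarray(survival, dtype=float)
    t = np.arange(1, len(survival))
    churned = survival[:-1] - survival[1:]  # share of the cohort lost in each period
    log_p = betaln(gamma + 1, delta + t - 1) - betaln(gamma, delta)  # log P(T = t)
    log_s = betaln(gamma, delta + t[-1]) - betaln(gamma, delta)      # log S(last period)
    return -(np.sum(churned * log_p) + survival[-1] * log_s)


def fit_sbg(survival):
    """Return (gamma, delta) maximizing the sBG likelihood (hypothetical helper)."""
    result = minimize(sbg_neg_log_likelihood, x0=np.zeros(2),
                      args=(survival,), method="Nelder-Mead")
    gamma_hat, delta_hat = np.exp(result.x)
    return gamma_hat, delta_hat


# Hypothetical survival proportions for periods 0..3 (illustration only)
example_survival = [1.0, 0.631, 0.468, 0.382]
gamma_hat, delta_hat = fit_sbg(example_survival)
fitted_survival = sbg_survival(np.arange(1, 4), gamma_hat, delta_hat)
```

Comparing `fitted_survival` with the observed proportions is a quick sanity check on the fit before using the parameters for forecasting.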
# Expectation
gamma / (gamma + delta)
predictions = forecast(data, 12)
df.head()
Image('../images/discrete-contractual-figure-1.png')
Customers in this cohort are living longer than the model suggests. This seems to imply negative duration dependence in this context: the longer these customers stay, the lower their probability of churning ("dying") seems to become.
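For context, the sBG assumes each customer's churn probability is constant over time, so any rise in the model's aggregate retention rate comes from heterogeneity alone. Here is a small sketch of the model-implied retention rate $r(t) = S(t)/S(t-1) = (\delta + t - 1)/(\gamma + \delta + t - 1)$, using the rounded estimates quoted above as assumed inputs. The function name `sbg_retention` is hypothetical.

```python
import numpy as np


def sbg_retention(t, gamma, delta):
    """Model-implied aggregate retention rate r(t) = S(t) / S(t-1) for t >= 1."""
    t = np.asarray(t, dtype=float)
    return (delta + t - 1) / (gamma + delta + t - 1)


# Rounded parameter estimates from the text, used here as assumed inputs
gamma_demo, delta_demo = 0.76, 1.29
periods = np.arange(1, 13)
retention = sbg_retention(periods, gamma_demo, delta_demo)
```

If observed retention rates climb faster than `retention`, that is consistent with the reading above: individual churn probabilities may themselves be falling with tenure.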
DEL
There are two ways to calculate the DEL (Discounted Expected Lifetime):
Easy Way
The equation on slide 64 is $$DEL = \sum_{t=0}^\infty \frac{S(t)}{(1+d)^t}$$
where $S(t)$ is the survival proportion at time $t$ (the percentage of customers still alive at time $t$), and $d$ is the discount rate per year (also assumed to be constant).
discount_rate = 0.1
cashflow = 100
def DEL(data, discount_rate):
    '''Function that takes in discrete time survival data, fits
    a sBG and calculates the discounted expected lifetime for 1000 periods'''
    survival = [1.0] + forecast(data, 999).to_list()
    discount = []
    for t in range(1000):
        discount.append(1 / (1+discount_rate)**t)
    return np.sum(np.array(survival) * np.array(discount))
DEL(data, discount_rate)
result_del = sc.hyp2f1(1, delta, gamma + delta, 1 / (1+discount_rate))
result_del
The closed form above, using the Gaussian hypergeometric function, is probably the more elegant way to code this.
result_del * cashflow
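The two routes agree because $S(t) = \prod_{i=0}^{t-1} \frac{\delta + i}{\gamma + \delta + i}$, so the infinite sum is exactly the series for ${}_2F_1(1, \delta; \gamma+\delta; \frac{1}{1+d})$. As a rough check, here is a self-contained sketch that compares a truncated sum with the closed form. It does not use the `sBG` module: the helper names are hypothetical, and the parameter values are the rounded estimates from the text, taken as assumptions.

```python
import numpy as np
from scipy.special import betaln, hyp2f1


def sbg_survival(t, gamma, delta):
    """sBG survival function S(t) = B(gamma, delta + t) / B(gamma, delta)."""
    t = np.asarray(t, dtype=float)
    return np.exp(betaln(gamma, delta + t) - betaln(gamma, delta))


def del_truncated(gamma, delta, d, horizon=1000):
    """DEL as a sum over t = 0 .. horizon-1 (truncation of the infinite sum)."""
    t = np.arange(horizon)
    return float(np.sum(sbg_survival(t, gamma, delta) / (1 + d) ** t))


def del_closed_form(gamma, delta, d):
    """DEL via the Gaussian hypergeometric function."""
    return float(hyp2f1(1, delta, gamma + delta, 1 / (1 + d)))


# Rounded estimates and discount rate from the text, used as assumed inputs
del_sum = del_truncated(0.76, 1.29, 0.1)
del_exact = del_closed_form(0.76, 1.29, 0.1)
```

With a 10% discount rate the terms beyond $t = 1000$ are negligible, which is why the truncated sum is a reasonable stand-in for the infinite one.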
DERL
DERL is the Discounted Expected Residual Lifetime. For instance, if we are at the end of Year $n$, what is the expected residual lifetime value of a customer who is still alive?
For the BG model, $$DERL(\gamma, \delta, d, n - 1) = \sum_{t = n}^{\infty} \frac{S(t | \gamma, \delta) / S(n-1 | \gamma, \delta)}{(1+d)^{t-n}}$$ where discounting is measured from period $n$, the customer's next renewal.
def DERL1(data, discount_rate, n):
    '''Calculates the discounted expected residual lifetime which is the expected lifetime
    given a customer has been alive for n periods or has done n-1 renewals'''
    survival = [1.0] + forecast(data, 999).to_list()
    sliced_survival = np.array(survival[n:])
    # Compute conditional survival array S(t | t > n-1)
    cond_survival = []
    for i in range(1000-n):
        cond_survival.append(sliced_survival[i]/survival[n-1])
    # Compute discount array, with discounting starting at period n
    discount = []
    for t in range(1000-n):
        discount.append(1 / (1+discount_rate)**t)
    return np.sum(np.array(cond_survival) * np.array(discount))
# DERL for customer who has survived 5 periods (renewed 4 times) - Slide 71 Lecture 6
DERL1(data, discount_rate, 5)
In other words, projecting ahead the probability that a customer stays with us for 5, 10, 20 more years (residual years), the discounted expected number of future renewals for a customer who has already renewed 4 times (survived 5 periods) is $5.68$.
def DERL2(data, discount_rate, n):
    '''Closed-form DERL using the Gaussian hypergeometric function'''
    gamma, delta = maximize(data)
    return (delta+n-1)/(gamma+delta+n-1) * sc.hyp2f1(1, delta+n, gamma+delta+n, 1 / (1+discount_rate))
DERL2(data, discount_rate, 5)
This closed form is also cleaner than using a for loop and capping at $t=1000$.
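As a sanity check that the loop-based and closed-form versions compute the same quantity, here is a self-contained sketch that evaluates both directly from $\gamma$ and $\delta$, with discounting measured from period $n$ as in the two functions above. It does not depend on the `sBG` module: the helper names are hypothetical, and the parameters are the rounded estimates from the text, taken as assumptions.

```python
import numpy as np
from scipy.special import betaln, hyp2f1


def sbg_survival(t, gamma, delta):
    """sBG survival function S(t) = B(gamma, delta + t) / B(gamma, delta)."""
    t = np.asarray(t, dtype=float)
    return np.exp(betaln(gamma, delta + t) - betaln(gamma, delta))


def derl_truncated(gamma, delta, d, n, horizon=1000):
    """DERL as a truncated sum of S(t) / S(n-1), discounted from period n."""
    t = np.arange(n, horizon)
    cond_survival = sbg_survival(t, gamma, delta) / sbg_survival(n - 1, gamma, delta)
    return float(np.sum(cond_survival / (1 + d) ** (t - n)))


def derl_closed_form(gamma, delta, d, n):
    """DERL via the Gaussian hypergeometric function."""
    prefactor = (delta + n - 1) / (gamma + delta + n - 1)
    return float(prefactor * hyp2f1(1, delta + n, gamma + delta + n, 1 / (1 + d)))


# Rounded estimates and discount rate from the text, used as assumed inputs
derl_sum = derl_truncated(0.76, 1.29, 0.1, n=5)
derl_exact = derl_closed_form(0.76, 1.29, 0.1, n=5)
derl_new_customer = derl_closed_form(0.76, 1.29, 0.1, n=1)
```

Under the sBG, a customer with more renewals behind them has a higher expected residual lifetime, so `derl_exact` should exceed `derl_new_customer`.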
Image('../images/discrete-contractual-figure-2.png')
If we want to include duration dependence in the model, we can use the Beta-discrete Weibull. However, I'm too lazy to implement that right now. I might also want to implement the continuous version of this model at some point.