Cheatsheet of LaTeX Code for the Most Popular Causal Inference and Uplift Modelling Equations
Navigation
In this blog, we summarize the most fundamental concepts and equations of causal inference and uplift modelling. Causal inference has attracted growing attention in the machine learning and statistics communities. Generally speaking, uplift modelling is a group of methods for estimating the effect of an action (the treatment) on the final outcome. The post covers some basic concepts, including ATE (Average Treatment Effect), CATE (Conditional Average Treatment Effect), and the unconfoundedness assumption, also called CIA (Conditional Independence Assumption). Let W_i denote the indicator of whether instance i is assigned to the control group (W_i=0) or the treatment group (W_i=1). We denote by Y_i(0) the potential outcome of instance i, with covariates X_i, if it is assigned to the control group (W_i=0), and by Y_i(1) the potential outcome if it is assigned to the treatment group (W_i=1).
- 1. Basic Concepts of Causal Inference
- 1.1 Average Treatment Effect(ATE)
- 1.2 Individual Treatment Effect(ITE)
- 1.3 Conditional Average Treatment Effect(CATE)
- 1.4 Propensity Score
- 1.5 Unconfoundedness Assumption(CIA)
- 2. Models
- 2.1 S-Learner
- 2.2 T-Learner
- 2.3 X-Learner
- 3. Metrics
- 3.1 Area Under Uplift Curve(AUUC)
- 3.2 QINI
1. Basic Concepts of Causal Inference
-
1.1 Average Treatment Effect(ATE)
Equation
Latex Code
\text{ATE}:=\mathbb{E}[Y(1)-Y(0)]
Explanation
Average Treatment Effect (ATE) is defined as the expectation, over the whole population, of the difference between the potential outcome under treatment Y(1) and the potential outcome under control Y(0).
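As a rough illustration, when assignment is randomized the ATE can be estimated by the difference in mean observed outcomes between the two groups. The sketch below assumes that setting; the data and the helper name `estimate_ate` are made up for this example.

```python
from statistics import mean

# Hypothetical toy data: w[i] is the treatment indicator, y[i] the observed outcome.
w = [1, 0, 1, 0, 1, 0, 1, 0]
y = [5.0, 3.0, 6.0, 2.0, 7.0, 4.0, 6.0, 3.0]

def estimate_ate(w, y):
    """Difference in mean observed outcome between treated and control units.

    Assumption: assignment is randomized, so this difference estimates
    E[Y(1) - Y(0)].
    """
    treated = [yi for wi, yi in zip(w, y) if wi == 1]
    control = [yi for wi, yi in zip(w, y) if wi == 0]
    return mean(treated) - mean(control)

ate_hat = estimate_ate(w, y)
print(ate_hat)  # 3.0 for this toy data
```

Without randomization this simple difference can be biased, which is why the unconfoundedness assumption in 1.5 matters.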
-
1.2 Individual Treatment Effect(ITE)
Equation
Latex Code
\text{ITE}_{i}:=Y_{i}(1)-Y_{i}(0)
Explanation
Individual Treatment Effect (ITE) is defined as the difference between the potential outcome under treatment Y_i(1) and the potential outcome under control Y_i(0) of the same instance i. The fundamental problem is that we can't observe Y_i(1) and Y_i(0) at the same time, because each instance i can only be assigned to either the control group or the treatment group, never both. So we can't observe the individual treatment effect (ITE) directly for any instance i.
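The following sketch illustrates the problem with simulated units for which both potential outcomes are known, something that never happens with real data. All values are hypothetical, and the line computing `y_obs` uses the standard relation Y_i = W_i Y_i(1) + (1 - W_i) Y_i(0), which is an addition to the notation of this post.

```python
# Hypothetical simulated units where BOTH potential outcomes are known.
y0 = [2.0, 3.0, 1.0, 4.0]   # Y_i(0): outcome if assigned to control
y1 = [3.0, 3.5, 4.0, 4.0]   # Y_i(1): outcome if assigned to treatment
w = [1, 0, 0, 1]            # actual assignment W_i

# True ITE, computable only because this is a simulation.
ite = [a - b for a, b in zip(y1, y0)]

# What an analyst actually observes: one potential outcome per unit.
y_obs = [wi * a + (1 - wi) * b for wi, a, b in zip(w, y1, y0)]

print(ite)    # [1.0, 0.5, 3.0, 0.0]
print(y_obs)  # [3.0, 3.0, 1.0, 4.0]
```

From `w` and `y_obs` alone, the list `ite` cannot be recovered unit by unit.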
-
1.3 Conditional Average Treatment Effect(CATE)
Equation
Latex Code
\tau(x):=\mathbb{E}[Y(1)-Y(0)|X=x]
Explanation
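Since we can't observe the ITE of instance i directly, most causal inference models estimate the conditional average treatment effect (CATE) instead: the expected treatment effect among instances that share the same covariates X=x, evaluated at x=x_{i} for instance i.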
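For a single discrete covariate, a minimal sketch of a CATE estimate is the difference in group means within each value of x. It assumes randomized assignment within each segment; the segment names, the data, and `estimate_cate` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (segment x, treatment w, observed outcome y).
rows = [
    ("new", 1, 8.0), ("new", 1, 6.0), ("new", 0, 3.0), ("new", 0, 5.0),
    ("loyal", 1, 9.0), ("loyal", 1, 11.0), ("loyal", 0, 9.0), ("loyal", 0, 10.0),
]

def estimate_cate(rows):
    """Per-segment difference in mean outcome between treated and control."""
    outcomes = defaultdict(lambda: {0: [], 1: []})
    for x, w, y in rows:
        outcomes[x][w].append(y)
    # Assumes every segment contains both treated and control rows.
    return {x: mean(groups[1]) - mean(groups[0]) for x, groups in outcomes.items()}

cate_hat = estimate_cate(rows)
print(cate_hat)  # {'new': 3.0, 'loyal': 0.5}
```

With continuous or high-dimensional covariates this table lookup no longer works, which is what the learners in section 2 address.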
-
1.4 Propensity Score
Equation
Latex Code
e(x) := P(W=1|X=x)
Explanation
The propensity score is the probability that an instance with covariates X=x is assigned to the treatment group (W=1).
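With one discrete covariate, the propensity score can be sketched as the empirical share of treated instances per covariate value. In practice a classifier such as logistic regression is commonly used instead; the data and `estimate_propensity` below are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (covariate x, treatment indicator w).
rows = [
    ("mobile", 1), ("mobile", 1), ("mobile", 1), ("mobile", 0),
    ("desktop", 1), ("desktop", 0), ("desktop", 0), ("desktop", 0),
]

def estimate_propensity(rows):
    """Empirical P(W=1 | X=x): share of treated instances per value of x."""
    counts = defaultdict(lambda: [0, 0])  # x -> [n_treated, n_total]
    for x, w in rows:
        counts[x][0] += w
        counts[x][1] += 1
    return {x: n_treated / n_total for x, (n_treated, n_total) in counts.items()}

e_hat = estimate_propensity(rows)
print(e_hat)  # {'mobile': 0.75, 'desktop': 0.25}
```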
-
1.5 Unconfoundedness Assumption(CIA)
Equation
Latex Code
\{Y_{i}(0),Y_{i}(1)\}\perp W_{i}|X_{i}
Explanation
The unconfoundedness assumption, or CIA (Conditional Independence Assumption), assumes that there are no hidden confounders: conditioned on the covariates X, the potential outcomes (Y(0),Y(1)) are independent of the treatment assignment W.
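To see why conditioning on X matters, the sketch below builds a hypothetical dataset in which x drives both the assignment and the baseline outcome, while the treatment effect is 1.0 in every stratum by construction. The naive difference in means is then misleading, and averaging the within-stratum differences recovers the effect. The data and function names are made up for this example.

```python
from statistics import mean

# Hypothetical rows: (confounder x, treatment w, observed outcome y).
# Units with x=1 are treated more often AND have higher baseline outcomes.
rows = [
    (0, 1, 2.0), (0, 0, 1.0), (0, 0, 1.0), (0, 0, 1.0),
    (1, 1, 6.0), (1, 1, 6.0), (1, 1, 6.0), (1, 0, 5.0),
]

def diff_in_means(rows):
    treated = [y for _, w, y in rows if w == 1]
    control = [y for _, w, y in rows if w == 0]
    return mean(treated) - mean(control)

def stratified_ate(rows):
    """Average the within-stratum differences, weighted by stratum size."""
    total = 0.0
    for x in {r[0] for r in rows}:
        stratum = [r for r in rows if r[0] == x]
        total += diff_in_means(stratum) * len(stratum) / len(rows)
    return total

naive = diff_in_means(rows)      # ignores x
adjusted = stratified_ate(rows)  # conditions on x
print(naive, adjusted)           # 3.0 1.0
```

The adjustment only works if every stratum contains both treated and control units, and if x really captures all confounders.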
2. Models
-
2.1 S-Learner
Equation
Latex Code
\mu(x,w)=\mathbb{E}[Y|X=x,W=w] \\ \hat{\tau}(x)=\hat{\mu}(x,1)-\hat{\mu}(x,0)
Explanation
S-Learner uses a single machine learning estimator \mu(x,w) to estimate the outcome Y directly. The treatment assignment variable W \in \{0,1\} is treated as one more input feature of the model. The CATE estimate is the difference between two outputs of the same model \hat{\mu} for the same x and different values of W, namely w=1 and w=0.
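A minimal sketch of the S-Learner idea follows. The base learner `fit_cell_means` is a hypothetical stand-in for a real regressor: it simply predicts the mean outcome of the matching feature cell. The data is made up.

```python
from collections import defaultdict
from statistics import mean

def fit_cell_means(features, targets):
    """Toy base learner: predict the mean target of the matching feature cell.

    Hypothetical stand-in for any regressor; raises KeyError on unseen cells.
    """
    cells = defaultdict(list)
    for f, t in zip(features, targets):
        cells[f].append(t)
    table = {f: mean(ts) for f, ts in cells.items()}
    return lambda f: table[f]

# Hypothetical data: covariate x, treatment w, observed outcome y.
x = ["a", "a", "a", "a", "b", "b", "b", "b"]
w = [1, 1, 0, 0, 1, 1, 0, 0]
y = [4.0, 6.0, 2.0, 2.0, 7.0, 7.0, 6.0, 8.0]

# One single model; the treatment indicator is just another feature.
mu_hat = fit_cell_means(list(zip(x, w)), y)

def tau_hat(xi):
    """CATE estimate: same model, same x, treatment feature flipped."""
    return mu_hat((xi, 1)) - mu_hat((xi, 0))

print(tau_hat("a"), tau_hat("b"))  # 3.0 0.0
```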
-
2.2 T-Learner
Equation
Latex Code
\mu_{0}(x)=\mathbb{E}[Y(0)|X=x],\mu_{1}(x)=\mathbb{E}[Y(1)|X=x],\\ \hat{\tau}(x)=\hat{\mu}_{1}(x)-\hat{\mu}_{0}(x)
Explanation
T-Learner uses two separate models: \mu_0 is fitted on the control group data (W=0) and \mu_1 on the treatment group data (W=1). The CATE estimate is the difference between the outputs of the two models \hat{\mu}_1 and \hat{\mu}_0 given the same input x.
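The two-model idea can be sketched as below. As an assumption for illustration, the base learner `fit_cell_means` just predicts the mean outcome per covariate value; any regressor could take its place. The data is hypothetical.

```python
from collections import defaultdict
from statistics import mean

def fit_cell_means(features, targets):
    """Toy base learner (hypothetical): mean target per feature value."""
    cells = defaultdict(list)
    for f, t in zip(features, targets):
        cells[f].append(t)
    table = {f: mean(ts) for f, ts in cells.items()}
    return lambda f: table[f]

# Hypothetical data: covariate x, treatment w, observed outcome y.
x = ["a", "a", "b", "b", "a", "b", "a", "b"]
w = [1, 1, 1, 1, 0, 0, 0, 0]
y = [3.0, 5.0, 9.0, 9.0, 1.0, 8.0, 3.0, 8.0]

def subset(group):
    """Covariates and outcomes of the instances with W == group."""
    xs = [xi for xi, wi in zip(x, w) if wi == group]
    ys = [yi for yi, wi in zip(y, w) if wi == group]
    return xs, ys

mu0_hat = fit_cell_means(*subset(0))  # fitted on the control group only
mu1_hat = fit_cell_means(*subset(1))  # fitted on the treatment group only

def tau_hat(xi):
    """CATE estimate: two models, same input x."""
    return mu1_hat(xi) - mu0_hat(xi)

print(tau_hat("a"), tau_hat("b"))  # 2.0 1.0
```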
-
2.3 X-Learner
Equation
Latex Code
\tilde{D}^{1}_{i}:=Y^{1}_{i}-\hat{\mu}_{0}(X^{1}_{i}),\tilde{D}^{0}_{i}:=\hat{\mu}_{1}(X^{0}_{i})-Y^{0}_{i}\\ \hat{\tau}(x)=g(x)\hat{\tau}_{0}(x) + (1-g(x))\hat{\tau}_{1}(x)
Explanation
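X-Learner builds on the T-Learner. It first fits \hat{\mu}_{0} and \hat{\mu}_{1}, then imputes the treatment effects \tilde{D}^{1}_{i} for treated instances and \tilde{D}^{0}_{i} for control instances, fits \hat{\tau}_{1} and \hat{\tau}_{0} on these imputed effects, and combines the two with a weighting function g(x) \in [0,1], for which an estimate of the propensity score is a common choice. See the paper Metalearners for estimating heterogeneous treatment effects using machine learning for more details.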
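The formula above can be traced step by step in a small sketch. Several choices here are assumptions made for illustration: the base learner `fit_line` is a one-feature least-squares line, g(x) is a constant equal to the share of treated instances, and the data is hypothetical and exactly linear so the numbers stay easy to follow.

```python
def fit_line(xs, ys):
    """Toy base learner (hypothetical): least squares for y = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

# Hypothetical data, split by group: covariates and observed outcomes.
x0, y0 = [0.0, 1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0]  # control (W=0)
x1, y1 = [0.0, 2.0], [3.0, 7.0]                      # treatment (W=1)

# Stage 1: outcome models, as in the T-Learner.
mu0_hat = fit_line(x0, y0)
mu1_hat = fit_line(x1, y1)

# Stage 2: imputed treatment effects, then one effect model per group.
d1 = [yi - mu0_hat(xi) for xi, yi in zip(x1, y1)]  # D^1_i = Y^1_i - mu0(X^1_i)
d0 = [mu1_hat(xi) - yi for xi, yi in zip(x0, y0)]  # D^0_i = mu1(X^0_i) - Y^0_i
tau1_hat = fit_line(x1, d1)
tau0_hat = fit_line(x0, d0)

# Stage 3: combine with g(x); assumed constant here (share of treated units).
g = len(x1) / (len(x0) + len(x1))

def tau_hat(x):
    return g * tau0_hat(x) + (1 - g) * tau1_hat(x)

print(tau_hat(1.0))
```

Because the toy data is exactly linear, \hat{\tau}_{0} and \hat{\tau}_{1} coincide here, so the choice of g has no effect; with noisy data or unbalanced groups the weighting matters.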
3. Metrics
-
3.1 Area Under Uplift Curve(AUUC)
Equation
Latex Code
f(t)=(\frac{Y^{T}_{t}}{N^{T}_{t}} - \frac{Y^{C}_{t}}{N^{C}_{t}})(N^{T}_{t}+N^{C}_{t})
Explanation
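The authors of the paper Causal Inference and Uplift Modeling: A review of the literature define the AUUC coefficient as the area under the uplift curve f(t). Here instances are ranked by predicted uplift in descending order, Y^{T}_{t} and Y^{C}_{t} are the sums of outcomes among the treated and control instances in the top t, and N^{T}_{t} and N^{C}_{t} are the corresponding numbers of instances.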
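A minimal sketch of the uplift curve f(t) follows. It rests on a few assumptions that are not part of the formula itself: a group with no instances among the top t contributes a rate of 0, and the area is approximated by a plain sum over t (libraries differ in how they normalize). The scored sample and `uplift_curve` are hypothetical.

```python
# Hypothetical scored sample: (predicted uplift score, treatment w, outcome y).
rows = [
    (0.9, 1, 1), (0.8, 0, 0), (0.7, 1, 1),
    (0.6, 0, 1), (0.5, 1, 0), (0.4, 0, 0),
]

def uplift_curve(rows):
    """Return f(t) for t = 1..N, ranking rows by predicted uplift (highest first)."""
    ranked = sorted(rows, key=lambda r: r[0], reverse=True)
    y_t = n_t = y_c = n_c = 0
    curve = []
    for _, w, y in ranked:
        if w == 1:
            y_t += y
            n_t += 1
        else:
            y_c += y
            n_c += 1
        # Assumption: a group with no instances yet contributes a rate of 0.
        rate_t = y_t / n_t if n_t else 0.0
        rate_c = y_c / n_c if n_c else 0.0
        curve.append((rate_t - rate_c) * (n_t + n_c))
    return curve

curve = uplift_curve(rows)
# Assumption: approximate the area with a unit-width sum over t.
auuc = sum(curve)
print(curve, auuc)
```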
-
3.2 QINI
Equation
Latex Code
g(t)=Y^{T}_{t}-\frac{Y^{C}_{t}N^{T}_{t}}{N^{C}_{t}},\\ f(t)=g(t) \times \frac{N^{T}_{t}+N^{C}_{t}}{N^{T}_{t}}
Explanation
The author of the paper Using control groups to target on predicted lift: Building and assessing uplift models defines the Qini coefficient as the area under the Qini curve, which is more suitable when the control and treatment groups have unbalanced sample sizes.
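The two quantities g(t) and f(t) can be computed in one pass over the ranked sample, as in the sketch below. The handling of empty groups (terms set to 0 while N^{C}_{t} or N^{T}_{t} is still zero) is an assumption of this example, and the data and `qini_curve` are hypothetical.

```python
# Hypothetical scored sample: (predicted uplift score, treatment w, outcome y).
rows = [
    (0.9, 1, 1), (0.8, 0, 0), (0.7, 1, 1),
    (0.6, 0, 1), (0.5, 1, 0), (0.4, 0, 0),
]

def qini_curve(rows):
    """Return (g, f) lists for t = 1..N, ranking rows by predicted uplift."""
    ranked = sorted(rows, key=lambda r: r[0], reverse=True)
    y_t = n_t = y_c = n_c = 0
    g_values, f_values = [], []
    for _, w, y in ranked:
        if w == 1:
            y_t += y
            n_t += 1
        else:
            y_c += y
            n_c += 1
        # Assumption: with no control instances yet, the control term is 0.
        g = y_t - (y_c * n_t / n_c if n_c else 0.0)
        # Assumption: with no treated instances yet, f(t) is set to 0.
        f = g * (n_t + n_c) / n_t if n_t else 0.0
        g_values.append(g)
        f_values.append(f)
    return g_values, f_values

g_values, f_values = qini_curve(rows)
print(g_values)  # [1.0, 1.0, 2.0, 1.0, 0.5, 1.0]
print(f_values)
```

Expanding the product shows that f(t) here is algebraically the same as the uplift curve in 3.1, while g(t) rescales the control outcomes to the size of the treated group.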