Cheatsheet of LaTeX Code for the Most Popular Causal Inference and Uplift Modelling Equations

rockingdingo 2024-08-25 23:05 #causal inference #uplift modelling #auuc #qini

Navigation

In this post, we summarize the most fundamental concepts and equations of causal inference and uplift modelling. Causal inference has attracted growing attention in the machine learning and statistics communities. Generally speaking, uplift modelling is a group of methods for estimating the effect of an action (the treatment) on the final outcome. The post covers some basic concepts, including the ATE (Average Treatment Effect), the CATE (Conditional Average Treatment Effect), and the unconfoundedness assumption, also called the CIA (conditional independence assumption). Let W_i denote the indicator of whether instance i is assigned to the control group (W_i=0) or the treatment group (W_i=1). We denote by Y_i(0) the potential outcome of instance i, with covariates X_i, if it is assigned to the control group (W_i=0), and by Y_i(1) the potential outcome if it is assigned to the treatment group (W_i=1).

1. Basic Concepts of Causal Inference

1.1 Average Treatment Effect(ATE)

1.2 Individual Treatment Effect(ITE)

1.3 Conditional Average Treatment Effect(CATE)

1.4 Propensity Score

1.5 Unconfoundedness Assumption(CIA)

2. Models

2.1 S-Learner

2.2 T-Learner

2.3 X-Learner

3. Metrics

3.1 Area Under Uplift Curve(AUUC)

3.2 QINI

1. Basic Concepts of Causal Inference

1.1 Average Treatment Effect(ATE)

Equation

$\text{ATE}:=\mathbb{E}[Y(1)-Y(0)]$

Latex Code

\text{ATE}:=\mathbb{E}[Y(1)-Y(0)]

Explanation

Average Treatment Effect(ATE) is defined as the expectation, over the whole population, of the difference between the potential outcome under treatment Y(1) and the potential outcome under control Y(0).
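
To make the definition concrete, here is a minimal sketch that estimates the ATE by a difference in means. It assumes a randomized experiment, and everything in it (the simulated data, the true effect of 2.0, the function names `simulate_rct` and `estimate_ate`) is hypothetical and chosen only for illustration.

```python
import random
from statistics import mean

def simulate_rct(n=10_000, true_ate=2.0, seed=0):
    """Simulate a randomized experiment (hypothetical data, not from the post)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        w = rng.randint(0, 1)        # coin-flip treatment assignment W
        y0 = rng.gauss(0.0, 1.0)     # potential outcome Y(0)
        y1 = y0 + true_ate           # potential outcome Y(1)
        y = y1 if w == 1 else y0     # only one potential outcome is observed
        data.append((w, y))
    return data

def estimate_ate(data):
    """Difference in mean observed outcome between treated and control units."""
    treated = [y for w, y in data if w == 1]
    control = [y for w, y in data if w == 0]
    return mean(treated) - mean(control)

ate_hat = estimate_ate(simulate_rct())
print(f"estimated ATE: {ate_hat:.3f}")
```

Because assignment is random here, the simple difference in means should land close to the simulated effect. With observational data this estimator is generally biased.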
1.2 Individual Treatment Effect(ITE)

Equation

$\text{ITE}_{i}:=Y_{i}(1)-Y_{i}(0)$

Latex Code
```
            \text{ITE}_{i}:=Y_{i}(1)-Y_{i}(0)
        
```
Explanation

Individual Treatment Effect(ITE) is defined as the difference between the potential outcome under treatment Y_i(1) and the potential outcome under control Y_i(0) of the same instance i. The fundamental problem is that we can never observe Y_i(1) and Y_i(0) at the same time, because each instance i is assigned to either the control group or the treatment group, never both. So the ITE cannot be observed directly for any instance i.
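
The sketch below illustrates the problem on a toy table. The three units and their potential outcomes are made up. In real data only the observed rows would exist, and the `true_ite` list could not be computed.

```python
# Hypothetical toy table: both potential outcomes are known only because we invented them.
units = [
    {"i": 0, "y0": 1.0, "y1": 3.0, "w": 1},
    {"i": 1, "y0": 2.0, "y1": 2.5, "w": 0},
    {"i": 2, "y0": 0.5, "y1": 0.0, "w": 1},
]

def observe(unit):
    """Return what an experimenter actually sees: the factual outcome only."""
    y = unit["y1"] if unit["w"] == 1 else unit["y0"]
    return {"i": unit["i"], "w": unit["w"], "y": y}

true_ite = [u["y1"] - u["y0"] for u in units]   # needs both potential outcomes
observed = [observe(u) for u in units]          # carries only one of them

print(true_ite)
for row in observed:
    print(row)
```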
1.3 Conditional Average Treatment Effect(CATE)

Equation

$\tau(x):=\mathbb{E}[Y(1)-Y(0)|X=x]$

Latex Code
```
            \tau(x):=\mathbb{E}[Y(1)-Y(0)|X=x]
        
```
Explanation

Since we can't observe the ITE of item i directly, most causal inference models instead estimate the conditional average treatment effect(CATE): the expected treatment effect among instances that share the covariates of item i (X=x_{i}).
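
As a minimal sketch, when X takes only a few discrete values and treatment is randomized, the CATE can be estimated by a difference in means inside each stratum X = x. The binary covariate, the effect sizes in `TRUE_CATE` and the helper names are assumptions made for this example.

```python
import random
from statistics import mean

TRUE_CATE = {0: 1.0, 1: 3.0}   # hypothetical effect for each covariate value x

def simulate(n=20_000, seed=1):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)                            # binary covariate X
        w = rng.randint(0, 1)                            # randomized treatment W
        y = rng.gauss(0.0, 1.0) + x + w * TRUE_CATE[x]   # observed outcome Y
        rows.append((x, w, y))
    return rows

def estimate_cate(rows, x):
    """Difference in means restricted to the stratum X = x."""
    treated = [y for xi, w, y in rows if xi == x and w == 1]
    control = [y for xi, w, y in rows if xi == x and w == 0]
    return mean(treated) - mean(control)

rows = simulate()
cate_hat = {x: estimate_cate(rows, x) for x in (0, 1)}
print(cate_hat)
```

With continuous or high-dimensional X, exact strata become empty, which is why the learners in Section 2 fit regression models instead.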
1.4 Propensity Score

Equation

$e(x) := P(W=1|X=x)$

Latex Code
```
            e(x) := P(W=1|X=x)
        
```
Explanation

The propensity score is the probability that an instance with covariates X=x is assigned to the treatment group (W=1).
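
For a discrete covariate, the propensity score can be estimated as the share of treated units within each value of x. The sketch below assumes simulated data with made-up assignment probabilities in `TRUE_PROPENSITY`. In practice a classifier such as logistic regression is commonly used instead.

```python
import random

TRUE_PROPENSITY = {0: 0.2, 1: 0.7}   # hypothetical P(W=1 | X=x)

def simulate(n=20_000, seed=2):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)
        w = 1 if rng.random() < TRUE_PROPENSITY[x] else 0
        rows.append((x, w))
    return rows

def estimate_propensity(rows, x):
    """Share of treated units among those with X = x."""
    ws = [w for xi, w in rows if xi == x]
    return sum(ws) / len(ws)

rows = simulate()
e_hat = {x: estimate_propensity(rows, x) for x in (0, 1)}
print(e_hat)
```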
1.5 Unconfoundedness Assumption

Equation

$\{Y_{i}(0),Y_{i}(1)\}\perp W_{i}|X_{i}$

Latex Code
```
            \{Y_{i}(0),Y_{i}(1)\}\perp W_{i}|X_{i}
        
```
Explanation

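The unconfoundedness assumption, also called the CIA (conditional independence assumption), states that the potential outcomes (Y(0),Y(1)) are independent of the treatment assignment W once we condition on the covariates X. In other words, there are no hidden confounders that influence both the treatment assignment and the outcome.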
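
The following sketch shows why the conditioning matters. In this hypothetical simulation a binary X drives both the treatment assignment and the outcome, so the naive difference in means is biased. Averaging the within-stratum differences, which is valid when unconfoundedness holds given X, recovers the simulated effect. All numbers and names are assumptions for illustration.

```python
import random
from statistics import mean

TRUE_ATE = 1.0   # hypothetical constant treatment effect

def simulate(n=40_000, seed=3):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)                                # observed confounder X
        w = 1 if rng.random() < (0.8 if x else 0.2) else 0   # X drives assignment
        y = 2.0 * x + TRUE_ATE * w + rng.gauss(0.0, 1.0)     # X also drives outcome
        rows.append((x, w, y))
    return rows

def diff_in_means(rows):
    treated = [y for _, w, y in rows if w == 1]
    control = [y for _, w, y in rows if w == 0]
    return mean(treated) - mean(control)

rows = simulate()
naive = diff_in_means(rows)   # ignores X, so it absorbs the confounding

# Adjust for X: average the within-stratum differences, weighted by the share of X = x.
adjusted = 0.0
for x in (0, 1):
    stratum = [r for r in rows if r[0] == x]
    adjusted += diff_in_means(stratum) * len(stratum) / len(rows)

print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}")
```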

2. Models

2.1 S-Learner

Equation

$\mu(x,w)=\mathbb{E}[Y|X=x,W=w] \\ \hat{\tau}(x)=\hat{\mu}(x,1)-\hat{\mu}(x,0)$

Latex Code
```
            \mu(x,w)=\mathbb{E}[Y|X=x,W=w] \\
            \hat{\tau}(x)=\hat{\mu}(x,1)-\hat{\mu}(x,0)
        
```
```
Explanation

S-Learner uses a single machine learning estimator \mu(x,w) to estimate the outcome Y directly. The treatment assignment variable W \in \{0,1\} is treated as just another input feature of the model. The CATE estimate is the difference between two predictions of the same model \mu for the same x, one with w=1 and one with w=0.
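
A minimal S-Learner sketch follows. It assumes a linear base model with an interaction term x*w, fit by least squares on simulated data. The base model, the data-generating process and the helper names (`solve`, `features`, `fit_mu`) are illustrative choices; any regressor that accepts W as a feature could take their place.

```python
import random

def solve(a, b):
    """Solve the linear system a @ beta = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def features(x, w):
    # W is just another input feature; the x*w term lets the effect vary with x.
    return [1.0, x, w, x * w]

def fit_mu(rows):
    """Least-squares fit of a single outcome model mu(x, w) via normal equations."""
    feats = [features(x, w) for x, w, _ in rows]
    ys = [y for _, _, y in rows]
    k = len(feats[0])
    xtx = [[sum(f[i] * f[j] for f in feats) for j in range(k)] for i in range(k)]
    xty = [sum(f[i] * y for f, y in zip(feats, ys)) for i in range(k)]
    beta = solve(xtx, xty)
    return lambda x, w: sum(b * f for b, f in zip(beta, features(x, w)))

# Hypothetical data: randomized W, true CATE equal to 0.5 + 2x.
rng = random.Random(4)
rows = []
for _ in range(5_000):
    x = rng.uniform(0.0, 1.0)
    w = rng.randint(0, 1)
    y = 1.0 + x + w * (0.5 + 2.0 * x) + rng.gauss(0.0, 0.5)
    rows.append((x, w, y))

mu_hat = fit_mu(rows)

def tau_hat(x):
    """S-Learner CATE estimate: same model, w switched from 0 to 1."""
    return mu_hat(x, 1) - mu_hat(x, 0)

print(tau_hat(0.0), tau_hat(1.0))
```

Without the interaction term, a linear S-Learner can only return the same treatment effect for every x, which is one reason flexible base models are usually preferred.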
2.2 T-Learner

Equation

$\mu_{0}(x)=\mathbb{E}[Y(0)|X=x],\mu_{1}(x)=\mathbb{E}[Y(1)|X=x], \\ \hat{\tau}(x)=\hat{\mu}_{1}(x)-\hat{\mu}_{0}(x)$

Latex Code
```
            \mu_{0}(x)=\mathbb{E}[Y(0)|X=x],\mu_{1}(x)=\mathbb{E}[Y(1)|X=x],\\
            \hat{\tau}(x)=\hat{\mu}_{1}(x)-\hat{\mu}_{0}(x)
        
```
Explanation

T-Learner uses two separate models: \mu_0 is fit on the control group data (W=0) and \mu_1 on the treatment group data (W=1). The CATE estimate is the difference between the outputs of the two models \mu_1 and \mu_0 for the same input x.
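
A minimal T-Learner sketch, assuming a one-dimensional covariate and a simple linear regression as the base model for each group. The simulated data and the helper `fit_line` are hypothetical and serve only to show the two-model structure.

```python
import random
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns the fitted function."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return lambda x: a + b * x

# Hypothetical data: randomized W, true CATE equal to 0.5 + 2x.
rng = random.Random(5)
rows = []
for _ in range(5_000):
    x = rng.uniform(0.0, 1.0)
    w = rng.randint(0, 1)
    y = 1.0 + x + w * (0.5 + 2.0 * x) + rng.gauss(0.0, 0.5)
    rows.append((x, w, y))

control = [(x, y) for x, w, y in rows if w == 0]
treated = [(x, y) for x, w, y in rows if w == 1]
mu0_hat = fit_line(*zip(*control))   # model fit on the control group only
mu1_hat = fit_line(*zip(*treated))   # model fit on the treatment group only

def tau_hat(x):
    """T-Learner CATE estimate: difference between the two fitted models."""
    return mu1_hat(x) - mu0_hat(x)

print(tau_hat(0.0), tau_hat(1.0))
```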

2.3 X-Learner

Equation

$\tilde{D}^{1}_{i}:=Y^{1}_{i}-\hat{\mu}_{0}(X^{1}_{i}),\tilde{D}^{0}_{i}:=\hat{\mu}_{1}(X^{0}_{i})-Y^{0}_{i} \\ \hat{\tau}(x)=g(x)\hat{\tau}_{0}(x) + (1-g(x))\hat{\tau}_{1}(x)$

Latex Code

\tilde{D}^{1}_{i}:=Y^{1}_{i}-\hat{\mu}_{0}(X^{1}_{i}),\tilde{D}^{0}_{i}:=\hat{\mu}_{1}(X^{0}_{i})-Y^{0}_{i}\\
\hat{\tau}(x)=g(x)\hat{\tau}_{0}(x) + (1-g(x))\hat{\tau}_{1}(x)