Cheatsheet of LaTeX Code for the Most Popular Causal Inference and Uplift Modelling Equations

This post summarizes the most fundamental concepts and equations of causal inference and uplift modelling, together with their LaTeX code. Causal inference has attracted growing attention in the machine learning and statistics communities. Generally speaking, uplift modelling is a group of methods for estimating the effect of an action (the treatment) on the final outcome. The basic concepts covered below include the ATE (Average Treatment Effect), the CATE (Conditional Average Treatment Effect), and the unconfoundedness assumption, also known as the CIA (Conditional Independence Assumption). Let W_i denote the indicator of whether instance i is assigned to the control group (W_i=0) or the treatment group (W_i=1). We denote by Y_i(0) the potential outcome of instance i, with covariates X_i, if it is assigned to the control group (W_i=0), and by Y_i(1) the potential outcome if it is assigned to the treatment group (W_i=1).

1. Basic Concepts of Causal Inference

  • 1.1 Average Treatment Effect(ATE)

    Equation


    Latex Code
                \text{ATE}:=\mathbb{E}[Y(1)-Y(0)]
            
    Explanation

    The Average Treatment Effect (ATE) is defined as the expected difference between the potential outcome under treatment, Y(1), and the potential outcome under control, Y(0), taken over the whole population.
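
    As a minimal sketch, assuming treatment is assigned completely at random (an assumption the definition itself does not make), the ATE can be estimated by the difference in mean outcomes between the two groups. The simulated data, the constant effect of 2.0, and the function name difference_in_means are illustrative choices, not part of the original post.

```python
import random
from statistics import mean


def difference_in_means(y, w):
    """Estimate the ATE as mean(Y | W=1) - mean(Y | W=0).

    Unbiased only when W is randomly assigned (assumed here).
    """
    treated = [yi for yi, wi in zip(y, w) if wi == 1]
    control = [yi for yi, wi in zip(y, w) if wi == 0]
    return mean(treated) - mean(control)


# Hypothetical simulation: every unit has a true effect of 2.0.
rng = random.Random(0)
n = 10_000
y0 = [rng.gauss(0.0, 1.0) for _ in range(n)]  # potential outcome under control
y1 = [v + 2.0 for v in y0]                    # potential outcome under treatment
w = [rng.randint(0, 1) for _ in range(n)]     # randomized assignment
# Only one potential outcome per unit is observed.
y = [a if wi == 1 else b for a, b, wi in zip(y1, y0, w)]

true_ate = mean(a - b for a, b in zip(y1, y0))
ate_hat = difference_in_means(y, w)
print(f"true ATE = {true_ate:.3f}, estimated ATE = {ate_hat:.3f}")
```

    The true ATE is only available here because the simulation generates both potential outcomes; with real data only the estimate can be computed.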

  • 1.2 Individual Treatment Effect(ITE)

    Equation


    Latex Code
                \text{ITE}_{i}:=Y_{i}(1)-Y_{i}(0)
            
    Explanation

    The Individual Treatment Effect (ITE) is defined as the difference between the potential outcome under treatment, Y_i(1), and the potential outcome under control, Y_i(0), for the same instance i. The fundamental problem is that we can never observe Y_i(1) and Y_i(0) at the same time: each instance i is assigned either to the control group or to the treatment group, never to both. As a result, the ITE can't be observed directly for any instance i.
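
    The observation problem can be written as a single identity. This is a sketch that assumes the usual consistency condition, and the symbol Y_i^{obs} for the observed outcome is a label introduced here, not notation from the original post.

```latex
Y_{i}^{\text{obs}} = W_{i}Y_{i}(1) + (1-W_{i})Y_{i}(0)
```

    For any given instance only one of the two terms is non-zero, so Y_i(1) - Y_i(0) can't be computed from observed data alone.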

  • 1.3 Conditional Average Treatment Effect(CATE)

    Equation


    Latex Code
                \tau(x):=\mathbb{E}[Y(1)-Y(0)|X=x]
            
    Explanation

    Since we can't observe the ITE of instance i directly, most causal inference models estimate the Conditional Average Treatment Effect (CATE) instead: the average effect among instances that share the same covariates X=x, evaluated at x=x_{i}.
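
    For a discrete covariate, the CATE can be illustrated by taking the difference in means within each covariate value. This is a minimal sketch that assumes random assignment within each group; the toy data, the segment labels "a" and "b", and the function name cate_by_group are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def cate_by_group(x, y, w):
    """Estimate tau(x) for each distinct value of a discrete covariate x."""
    outcomes = defaultdict(lambda: {0: [], 1: []})
    for xi, yi, wi in zip(x, y, w):
        outcomes[xi][wi].append(yi)
    return {
        value: mean(arms[1]) - mean(arms[0])
        for value, arms in outcomes.items()
        if arms[0] and arms[1]  # both arms are needed to form a difference
    }


# Hypothetical toy data: segment "a" responds to treatment, segment "b" does not.
x = ["a", "a", "a", "a", "b", "b", "b", "b"]
w = [0, 0, 1, 1, 0, 0, 1, 1]
y = [1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 2.0, 4.0]

tau_hat = cate_by_group(x, y, w)
print(tau_hat)  # {'a': 4.0, 'b': 0.0}
```

    The overall ATE in this toy data is 2.0, which hides the fact that the entire effect comes from segment "a".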

  • 1.4 Propensity Score

    Equation


    Latex Code
                e(x) := P(W=1|X=x)
            
    Explanation

    The propensity score is the probability that an instance with covariates X=x is assigned to the treatment group (W=1).
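
    When the covariate is discrete, the propensity score can be estimated as the share of treated instances within each covariate value. The sketch below assumes exactly that setting; in practice a classifier such as logistic regression is commonly used instead. The data and the name propensity_by_group are hypothetical.

```python
from collections import Counter


def propensity_by_group(x, w):
    """Estimate P(W=1 | X=x) as the treated share within each value of x."""
    n_total = Counter(x)
    n_treated = Counter(xi for xi, wi in zip(x, w) if wi == 1)
    # Counter returns 0 for values of x that have no treated instance.
    return {value: n_treated[value] / n_total[value] for value in n_total}


# Hypothetical data: "young" instances are treated more often than "old" ones.
x = ["young"] * 4 + ["old"] * 4
w = [1, 1, 1, 0, 1, 0, 0, 0]

e_hat = propensity_by_group(x, w)
print(e_hat)  # {'young': 0.75, 'old': 0.25}
```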

  • 1.5 Unconfoundedness Assumption

    Equation


    Latex Code
                \{Y_{i}(0),Y_{i}(1)\}\perp W_{i}|X_{i}
            
    Explanation

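    The unconfoundedness assumption, also called the conditional independence assumption (CIA), states that the potential outcomes (Y_i(0), Y_i(1)) are independent of the treatment assignment W_i once we condition on the covariates X_i. In other words, there are no hidden confounders that affect both the treatment assignment and the outcome.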
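
    The practical value of this assumption is that it lets the CATE be written in terms of observable quantities. The short derivation below is a sketch that additionally assumes overlap, 0 < P(W=1|X=x) < 1, and consistency of the observed outcome Y; neither is stated in the original post.

```latex
\tau(x) = \mathbb{E}[Y(1)|X=x]-\mathbb{E}[Y(0)|X=x] \\
= \mathbb{E}[Y(1)|X=x,W=1]-\mathbb{E}[Y(0)|X=x,W=0] \\
= \mathbb{E}[Y|X=x,W=1]-\mathbb{E}[Y|X=x,W=0]
```

    The second line uses unconfoundedness, and the third replaces each potential outcome with the outcome actually observed in that group.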

2. Models

  • 2.1 S-Learner

    Equation


    Latex Code
                \mu(x,w)=\mathbb{E}[Y|X=x,W=w] \\
                \hat{\tau}(x)=\hat{\mu}(x,1)-\hat{\mu}(x,0)
            
    Explanation

    S-Learner uses a single machine learning estimator \mu(x,w) to predict the outcome Y directly, with the treatment assignment variable W (0 or 1) included as one of the input features. The CATE estimate is the difference between two predictions of the same model \hat{\mu} for the same x, with the treatment feature set to w=1 and to w=0.
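
    A minimal sketch of the S-Learner recipe follows. To stay dependency-free it uses a toy lookup-table base learner (fit_mean_table, a hypothetical stand-in for any regressor) that predicts the mean outcome for each feature tuple; the data is made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def fit_mean_table(features, targets):
    """Toy base learner: predict the mean target seen for each feature tuple."""
    buckets = defaultdict(list)
    for f, t in zip(features, targets):
        buckets[f].append(t)
    table = {f: mean(ts) for f, ts in buckets.items()}
    return lambda f: table[f]


def s_learner(x, w, y):
    """Fit one model mu(x, w) and return tau_hat(x) = mu(x, 1) - mu(x, 0)."""
    mu = fit_mean_table(list(zip(x, w)), y)  # W is treated as just another feature
    return lambda xi: mu((xi, 1)) - mu((xi, 0))


# Hypothetical toy data with a discrete covariate.
x = ["a", "a", "a", "a", "b", "b", "b", "b"]
w = [0, 0, 1, 1, 0, 0, 1, 1]
y = [1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 2.0, 4.0]

tau_hat = s_learner(x, w, y)
print(tau_hat("a"), tau_hat("b"))  # 4.0 0.0
```

    With a real regressor in place of the lookup table, the model is free to ignore the treatment feature, in which case the estimated effect shrinks toward zero.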

  • 2.2 T-Learner

    Equation


    Latex Code
                \mu_{0}(x)=\mathbb{E}[Y(0)|X=x],\mu_{1}(x)=\mathbb{E}[Y(1)|X=x],\\
                \hat{\tau}(x)=\hat{\mu}_{1}(x)-\hat{\mu}_{0}(x)
            
    Explanation

    T-Learner uses two separate models: \mu_0 is fitted on the control group data (W=0) and \mu_1 on the treatment group data (W=1). The CATE estimate is the difference between the outputs of the two models, \hat{\mu}_1 and \hat{\mu}_0, for the same input x.
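
    The two-model recipe can be sketched as follows, assuming a single numeric covariate and a closed-form least-squares line as the base learner. The helper fit_line and the noiseless toy data are hypothetical; any regressor could take its place.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares line y = a + b*x for a single numeric feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return lambda x: a + b * x


def t_learner(x, w, y):
    """Fit mu_0 on control data and mu_1 on treated data; return mu_1 - mu_0."""
    control = [(xi, yi) for xi, yi, wi in zip(x, y, w) if wi == 0]
    treated = [(xi, yi) for xi, yi, wi in zip(x, y, w) if wi == 1]
    mu0 = fit_line(*zip(*control))
    mu1 = fit_line(*zip(*treated))
    return lambda xi: mu1(xi) - mu0(xi)


# Hypothetical noiseless data: control y = 1 + x, treated y = 1 + 3x,
# so the true CATE is tau(x) = 2x.
x = [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0]
w = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1.0, 2.0, 3.0, 4.0, 1.0, 4.0, 7.0, 10.0]

tau_hat = t_learner(x, w, y)
print(tau_hat(2.0))
```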

  • 2.3 X-Learner

    Equation


    Latex Code
    \tilde{D}^{1}_{i}:=Y^{1}_{i}-\hat{\mu}_{0}(X^{1}_{i}),\tilde{D}^{0}_{i}:=\hat{\mu}_{1}(X^{0}_{i})-Y^{0}_{i}\\
    \hat{\tau}(x)=g(x)\hat{\tau}_{0}(x) + (1-g(x))\hat{\tau}_{1}(x)
            
    Explanation

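    X-Learner first fits the outcome models \hat{\mu}_0 and \hat{\mu}_1 as in T-Learner. It then imputes a treatment effect for each instance: \tilde{D}^{1}_{i} for treated instances and \tilde{D}^{0}_{i} for control instances. Regressing these imputed effects on the covariates gives \hat{\tau}_1(x) and \hat{\tau}_0(x), which are combined with a weight function g(x) in [0,1], typically an estimate of the propensity score. See the paper "Metalearners for estimating heterogeneous treatment effects using machine learning" (Künzel et al., 2019) for more details.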
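
    The equations above can be sketched end to end as follows. This is an illustrative reading of the X-Learner steps, not the reference implementation from the paper cited above: the base learner fit_line is a hypothetical closed-form least-squares line, and g(x) defaults to the constant share of treated instances, which is a simplifying assumption standing in for a propensity score estimate.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares line y = a + b*x for a single numeric feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return lambda x: a + b * x


def x_learner(x, w, y, g=None):
    """Return tau_hat(x) = g(x) * tau_0(x) + (1 - g(x)) * tau_1(x)."""
    x0 = [xi for xi, wi in zip(x, w) if wi == 0]
    y0 = [yi for yi, wi in zip(y, w) if wi == 0]
    x1 = [xi for xi, wi in zip(x, w) if wi == 1]
    y1 = [yi for yi, wi in zip(y, w) if wi == 1]

    # Stage 1: outcome models, as in the T-Learner.
    mu0, mu1 = fit_line(x0, y0), fit_line(x1, y1)

    # Stage 2: imputed treatment effects D^1 (treated) and D^0 (control).
    d1 = [yi - mu0(xi) for xi, yi in zip(x1, y1)]
    d0 = [mu1(xi) - yi for xi, yi in zip(x0, y0)]
    tau1, tau0 = fit_line(x1, d1), fit_line(x0, d0)

    # Stage 3: blend the two estimates with the weight function g.
    if g is None:
        share_treated = len(x1) / len(x)  # assumption: constant propensity
        g = lambda xi: share_treated
    return lambda xi: g(xi) * tau0(xi) + (1 - g(xi)) * tau1(xi)


# Hypothetical noiseless data: control y = 1 + x, treated y = 1 + 3x,
# so the true CATE is tau(x) = 2x.
x = [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0]
w = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1.0, 2.0, 3.0, 4.0, 1.0, 4.0, 7.0, 10.0]

tau_hat = x_learner(x, w, y)
print(tau_hat(2.0))
```

    A custom weight function can be passed as g, for example g=lambda xi: 0.0 to rely only on the estimate built from treated instances.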

3. Metrics
