Local Causal Discovery for Statistically Efficient Causal Inference

Mátyás Schubert1, Tom Claassen2, Sara Magliacane1

1University of Amsterdam
2Radboud University Nijmegen

Artificial Intelligence and Statistics (AISTATS), 2026

Paper Code

Abstract

Causal discovery methods can identify valid adjustment sets for causal effect estimation for a pair of target variables, even when the underlying causal graph is unknown. Global causal discovery methods focus on learning the whole causal graph and therefore enable the recovery of optimal adjustment sets, i.e., sets with the lowest asymptotic variance, but they quickly become computationally prohibitive as the number of variables grows. Local causal discovery methods offer a more scalable alternative by focusing on the local neighborhood of the target variables, but are restricted to statistically suboptimal adjustment sets. In this work, we propose Local Optimal Adjustments Discovery (LOAD), a sound and complete causal discovery approach that combines the computational efficiency of local methods with the statistical optimality of global methods. First, LOAD identifies the causal relation between the targets and tests if the causal effect is identifiable by using only local information. If it is identifiable, it finds the possible descendants of the treatment and infers the optimal adjustment set as the parents of the outcome in a modified forbidden projection. Otherwise, it returns the locally valid parent adjustment sets. In our experiments on synthetic and realistic data, LOAD outperforms global methods in scalability, while providing more accurate effect estimation than local methods.

Statistically Efficient Causal Inference

After causal discovery, the causal effect of a treatment \(X\) on an outcome \(Y\) can be estimated by adjusting for a set of covariates according to the estimated graph [1]. There may be multiple valid adjustment sets that all yield unbiased estimates of the causal effect, but differ in their statistical efficiency, i.e., the variance of the estimates. The valid adjustment set with the lowest asymptotic variance is called the optimal adjustment set [2, 3], and can be identified graphically as the parents of the outcome and the mediators minus the forbidden nodes: \[ \text{Oset}(X,Y) = \text{Pa}(\{Y\} \cup \text{Med}(X, Y)) \setminus \underbrace{\text{PossDe}(\{Y\} \cup \text{Med}(X, Y))}_{\text{forbidden nodes}}. \]

effect estimation with different adjustment sets

The parent and canonical adjustment sets are valid, but less efficient than the optimal adjustment set. No valid adjustment set may contain the forbidden nodes.
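As a concrete illustration, the graphical criterion above can be evaluated directly when the DAG is known. Below is a minimal sketch using networkx; the graph and variable names are illustrative assumptions, not taken from the paper:

```python
import networkx as nx

def oset(G, X, Y):
    """Optimal adjustment set in a known DAG G, following the graphical
    criterion: parents of the outcome and the mediators, minus the
    forbidden nodes (their descendants, plus the treatment itself)."""
    # Mediators plus the outcome: nodes on causal paths from X to Y, X excluded.
    cn = nx.descendants(G, X) & (nx.ancestors(G, Y) | {Y})
    forbidden = {X} | set().union(*({v} | nx.descendants(G, v) for v in cn))
    parents = set().union(*(set(G.predecessors(v)) for v in cn))
    return parents - forbidden

# Illustrative DAG: X -> M -> Y, with W -> M and Z -> Y.
G = nx.DiGraph([("X", "M"), ("M", "Y"), ("W", "M"), ("Z", "Y")])
print(oset(G, "X", "Y"))  # the set {'W', 'Z'}
```

Here the mediator \(M\) is forbidden, while the parent of the mediator \(W\) and the parent of the outcome \(Z\) make up the optimal adjustment set.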

Inspired by [4], we show that after projecting out all possible descendants of the treatment, except the outcome, the optimal adjustment set can be identified as the parents of the outcome in the resulting graph. This makes it easier to find the optimal adjustment set in large graphs with many paths from the treatment to the outcome. However, the possible descendants of the treatment still have to be determined first.

modified forbidden projection

After projecting out the possible descendants of the treatment, the optimal adjustment set can be identified as the parents of the outcome in the resulting graph.
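The projection step can be sketched graphically for a fully known DAG. In this sketch (an assumption-laden illustration, not the paper's implementation), a kept node is a parent of the outcome in the projected graph exactly when it reaches the outcome through projected-out nodes only:

```python
import networkx as nx

def parents_after_projection(G, X, Y):
    """Sketch of the modified forbidden projection on a known DAG.

    Project out the descendants of the treatment X (except the outcome Y):
    a kept node V becomes a parent of Y in the projected graph iff G has a
    directed path V -> ... -> Y whose intermediate nodes were all projected
    out. These parents (excluding X itself) form the optimal adjustment set."""
    removed = nx.descendants(G, X) - {Y}
    parents = set()
    for V in set(G) - removed - {X, Y}:
        # Check for a path from V to Y that stays inside the removed nodes.
        if nx.has_path(G.subgraph(removed | {V, Y}), V, Y):
            parents.add(V)
    return parents

# Same illustrative DAG as before: X -> M -> Y, W -> M, Z -> Y.
G = nx.DiGraph([("X", "M"), ("M", "Y"), ("W", "M"), ("Z", "Y")])
```

On this graph the mediator \(M\) is projected out, so \(W\) (via the path through \(M\)) and \(Z\) (a direct parent) are the parents of \(Y\) in the projection, recovering the optimal adjustment set without ever computing the forbidden set explicitly.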

Finding the optimal adjustment set, even via the projected graph, traditionally requires first learning the estimated causal graph over all variables, which is computationally expensive and may be unnecessary. In this work, we show that the optimal adjustment set can be identified by learning only the local neighborhoods [5] around a few variables, which is both computationally and statistically efficient.

Local Tests of Causal Relations and Identifiability

We begin by learning the causal relation between the target pair \(X\) and \(Y\) from their local neighborhoods, building on results from [6], who show that if \(X \not\perp\!\!\!\perp Y \mid \text{Pa}(X)\), then \(X\) is a possible ancestor of \(Y\), and if \(X \not\perp\!\!\!\perp Y \mid \text{Pa}(X) \cup \text{Sib}(X)\), then \(X\) is also an explicit ancestor of \(Y\). We show that if neither target is an explicit ancestor of the other, then the causal effect is not identifiable. Otherwise, the explicit ancestor is established as the treatment, whose causal effect on the other target, the outcome, may be identifiable.

locally testing causal relations

The local neighborhood of \(X\) can be used to determine its causal relation to \(Y\) [6], no matter how far apart they are.
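The local test above can be illustrated on simulated data. The sketch below uses an assumed linear-Gaussian SCM (not from the paper) and partial correlation as the CI test, with an illustrative threshold in place of a proper significance test:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Illustrative linear-Gaussian SCM: Z -> X -> M -> Y, plus Z -> Y.
Z = rng.normal(size=n)
X = 0.8 * Z + rng.normal(size=n)
M = 0.9 * X + rng.normal(size=n)
Y = 0.7 * M + 0.5 * Z + rng.normal(size=n)

def partial_corr(a, b, cond):
    """Correlation of a and b after linearly regressing out cond.
    For Gaussian data, zero partial correlation corresponds to CI."""
    C = np.column_stack([np.ones(len(a))] + list(cond))
    res_a = a - C @ np.linalg.lstsq(C, a, rcond=None)[0]
    res_b = b - C @ np.linalg.lstsq(C, b, rcond=None)[0]
    return float(np.corrcoef(res_a, res_b)[0, 1])

# X and Y remain dependent given Pa(X) = {Z}: X is a possible ancestor of Y.
x_anc_y = abs(partial_corr(X, Y, [Z])) > 0.1
# Y is independent of X given Pa(Y) = {M, Z}: Y is not an ancestor of X.
y_anc_x = abs(partial_corr(Y, X, [M, Z])) > 0.1
print(x_anc_y, y_anc_x)  # True False
```

Note that only the local neighborhoods \(\text{Pa}(X)\) and \(\text{Pa}(Y)\) enter the conditioning sets, even though the dependence between \(X\) and \(Y\) travels through the distant mediator \(M\).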

Once a target is determined to be an explicit ancestor of the other, we can fully determine the identifiability of the causal effect. To do this, we establish a test based only on the local neighborhoods of the siblings of the treatment, and show that the causal effect is identifiable if and only if \[ \forall V \in \text{Sib}(X): V {\perp\!\!\!\perp}\ Y \mid \text{Pa}(V) \cup \{X\}. \]

locally testing identifiability

We can locally test the identifiability of the causal effect from the local neighborhoods of the siblings of the treatment.
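The sibling test can also be sketched on simulated data. The setting below is an assumption for illustration: suppose local discovery left the edge \(V - X\) unoriented, so \(\text{Sib}(X) = \{V\}\) with \(\text{Pa}(V) = \emptyset\), and the test reduces to checking \(V \perp\!\!\!\perp Y \mid \{X\}\):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

def partial_corr(a, b, cond):
    """Partial correlation as a CI proxy for Gaussian data."""
    C = np.column_stack([np.ones(len(a))] + list(cond))
    res_a = a - C @ np.linalg.lstsq(C, a, rcond=None)[0]
    res_b = b - C @ np.linalg.lstsq(C, b, rcond=None)[0]
    return float(np.corrcoef(res_a, res_b)[0, 1])

# Assumed sibling structure: V - X unoriented, Sib(X) = {V}, Pa(V) = {}.
V = rng.normal(size=n)
X = 0.8 * V + rng.normal(size=n)

# Case 1: V affects Y only through X. The test V ⟂ Y | {X} passes.
Y_id = 0.7 * X + rng.normal(size=n)
# Case 2: V also affects Y directly. The test fails.
Y_nonid = 0.7 * X + 0.6 * V + rng.normal(size=n)

identifiable_1 = abs(partial_corr(V, Y_id, [X])) < 0.1
identifiable_2 = abs(partial_corr(V, Y_nonid, [X])) < 0.1
print(identifiable_1, identifiable_2)  # True False
```

Only the local neighborhoods of the siblings of the treatment are needed, so the test never requires discovering the graph beyond them.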

If the causal effect turns out not to be identifiable at any of these steps, we terminate LOAD early and return the locally valid parent adjustment sets [7] for the possible treatments, i.e., targets that are possible ancestors of the other. Depending on the case, this may be one target, both targets, or neither. If a target is not a possible ancestor of the other, then we already know that its causal effect on the other is zero, without any adjustment.

Local Optimal Adjustments Discovery

If the causal effect is identifiable, we proceed to find the optimal adjustment set. To do this, we first collect the possible descendants of the treatment. We can locally test whether a node is a possible descendant of the treatment using its already learned local neighborhood. This requires only a single test per node, regardless of how far the node is from the treatment. Finally, we learn the local neighborhood of the outcome again, but this time we project out the possible descendants of the treatment by simply considering the marginal distribution over the remaining nodes. The optimal adjustment set is then identified as the parents of the outcome.

collecting possible descendants

The optimal adjustment set is the parents of the outcome after collecting and projecting out the possible descendants of the treatment.

By dragging the slider below, you can follow along with the steps of LOAD on the running example. It begins by performing local causal discovery on the targets to determine their causal relation and establish \(X\) as the treatment and \(Y\) as the outcome. Then, it confirms the identifiability of the causal effect via the siblings of \(X\), \(\{V_3\}\). LOAD proceeds to collect and project out the possible descendants of the treatment. Finally, it performs local causal discovery on the outcome \(Y\) in the projected graph and identifies the optimal adjustment set as its parents \(V_5\) and \(V_6\).

Steps of LOAD

Follow along the steps of LOAD by dragging the slider.
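Putting the steps together, the pipeline can be sketched at the oracle level on a fully known DAG. This is an illustrative simplification (graphical checks replace CI tests, and the graph is assumed rather than taken from the paper's running example):

```python
import networkx as nx

def load_sketch(G, A, B):
    """Oracle-level sketch of LOAD's steps on a known DAG. CI tests are
    replaced by graphical checks; since all edges of a DAG are oriented,
    Sib(X) is empty and the identifiability test passes trivially."""
    # Step 1: determine the causal relation between the targets.
    if B in nx.descendants(G, A):
        X, Y = A, B
    elif A in nx.descendants(G, B):
        X, Y = B, A
    else:
        return None  # neither target causes the other
    # Step 2: identifiability via Sib(X) -- vacuous here (Sib(X) is empty).
    # Step 3: collect the descendants of the treatment, except the outcome.
    removed = nx.descendants(G, X) - {Y}
    # Step 4: optimal adjustment set = parents of Y after projecting out
    # the removed nodes, i.e. kept nodes reaching Y via removed nodes only.
    oset = {V for V in set(G) - removed - {X, Y}
            if nx.has_path(G.subgraph(removed | {V, Y}), V, Y)}
    return X, Y, oset

# Illustrative DAG: X -> M -> Y, W -> M, Z -> Y.
G = nx.DiGraph([("X", "M"), ("M", "Y"), ("W", "M"), ("Z", "Y")])
print(load_sketch(G, "Y", "X"))  # treatment X, outcome Y, Oset {'W', 'Z'}
```

Note that the targets can be passed in either order: step 1 establishes which one is the treatment, mirroring how LOAD orients the target pair before anything else.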

Experiments

We evaluate LOAD on synthetic and realistic data and perform several ablation studies. We showcase only the main synthetic data results here. We compare LOAD to several global and local causal discovery methods in terms of computational efficiency (CI tests and runtime) and statistical efficiency (Oset recovery and estimation error). We evaluate all methods across different data domains and CI tests.

results for computational efficiency

Results for computational efficiency. Lower is better for both CI tests and time.

Our results show that LOAD performs on par with local methods in terms of computational efficiency and is much more scalable than global methods, both in terms of CI tests and computation time. At the same time, LOAD provides the same theoretical guarantees for finding the optimal adjustment set as global methods. By relying only on local information, LOAD can even discover the optimal adjustment set more accurately than global methods, which leads to one of the lowest estimation errors across all methods and domains.

results for statistical efficiency

Results for structural and statistical efficiency. Higher is better for the F1 score of Oset recovery, and lower is better for the intervention distance, i.e., the estimation error.

Our ablation studies show that the performance of LOAD is robust across different numbers of samples and graph densities, and remains among the best in scenarios where the causal relation between the targets is known or the causal effect is guaranteed to be identifiable.

References

  1. Perković et al. (2018) Complete graphical characterization and construction of adjustment sets in Markov equivalence classes of ancestral graphs. JMLR.
  2. Henckel et al. (2022) Graphical criteria for efficient total effect estimation via adjustment in causal linear models. Journal of the Royal Statistical Society Series B: Statistical Methodology.
  3. Rotnitzky and Smucler. (2020) Efficient adjustment sets for population average causal treatment effect estimation in graphical models. JMLR.
  4. Witte et al. (2020) On efficient adjustment in causal graphs. JMLR.
  5. Wang et al. (2014) Discovering and orienting the edges connected to a target variable in a DAG via a sequential local learning approach. Computational Statistics & Data Analysis.
  6. Fang et al. (2022) A local method for identifying causal relations under Markov equivalence. Artificial Intelligence.
  7. Maathuis et al. (2009) Estimating high-dimensional intervention effects from observational data. The Annals of Statistics.