Local Causal Discovery for Statistically Efficient Causal Inference

Mátyás Schubert1, Tom Claassen2, Sara Magliacane1

1University of Amsterdam
2Radboud University Nijmegen

Artificial Intelligence and Statistics (AISTATS), 2026

Paper Code

Abstract

Causal discovery methods can identify valid adjustment sets for causal effect estimation for a pair of target variables, even when the underlying causal graph is unknown. Global causal discovery methods focus on learning the whole causal graph and therefore enable the recovery of optimal adjustment sets, i.e., sets with the lowest asymptotic variance, but they quickly become computationally prohibitive as the number of variables grows. Local causal discovery methods offer a more scalable alternative by focusing on the local neighborhood of the target variables, but are restricted to statistically suboptimal adjustment sets. In this work, we propose Local Optimal Adjustments Discovery (LOAD), a sound and complete causal discovery approach that combines the computational efficiency of local methods with the statistical optimality of global methods. First, LOAD identifies the causal relation between the targets and tests if the causal effect is identifiable by using only local information. If it is identifiable, it finds the possible descendants of the treatment and infers the optimal adjustment set as the parents of the outcome in a modified forbidden projection. Otherwise, it returns the locally valid parent adjustment sets. In our experiments on synthetic and realistic data, LOAD outperforms global methods in scalability, while providing more accurate effect estimation than local methods.

Statistically Efficient Causal Inference

After causal discovery, the causal effect of a treatment \(X\) on an outcome \(Y\) can be estimated by adjusting for a set of covariates according to the estimated graph [1]. There may be multiple valid adjustment sets that all yield unbiased estimates of the causal effect, but differ in their statistical efficiency, i.e., the variance of the estimates. The valid adjustment set with the lowest asymptotic variance is called the optimal adjustment set [2, 3], and can be identified graphically as the parents of the outcome and the mediators minus the forbidden nodes: \[ \text{Oset}(X,Y) = \text{Pa}(\{Y\} \cup \text{Med}(X, Y)) \setminus \underbrace{\text{PossDe}(\{Y\} \cup \text{Med}(X, Y))}_{\text{forbidden nodes}}. \]

effect estimation with different adjustment sets

The parent and canonical adjustment sets are valid, but less efficient than the optimal adjustment set. No valid adjustment set may contain the forbidden nodes.
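As a concrete illustration, the graphical criterion above can be evaluated directly when the DAG is known. Below is a minimal sketch using networkx; the graph and variable names are illustrative assumptions, not taken from the paper:

```python
import networkx as nx

def oset(G, X, Y):
    """Optimal adjustment set in a known DAG G, following the graphical
    criterion: parents of the outcome and the mediators, minus the
    forbidden nodes (their descendants, plus the treatment itself)."""
    # Mediators plus the outcome: nodes on causal paths from X to Y, X excluded.
    cn = nx.descendants(G, X) & (nx.ancestors(G, Y) | {Y})
    forbidden = {X} | set().union(*({v} | nx.descendants(G, v) for v in cn))
    parents = set().union(*(set(G.predecessors(v)) for v in cn))
    return parents - forbidden

# Illustrative DAG: X -> M -> Y, with W -> M and Z -> Y.
G = nx.DiGraph([("X", "M"), ("M", "Y"), ("W", "M"), ("Z", "Y")])
print(oset(G, "X", "Y"))  # the set {'W', 'Z'}
```

Here the mediator \(M\) is forbidden, while the parent of the mediator \(W\) and the parent of the outcome \(Z\) make up the optimal adjustment set.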

Inspired by [4], we show that after projecting out all possible descendants of the treatment, except the outcome, the optimal adjustment set can be identified as the parents of the outcome in the resulting graph. This makes it easier to find the optimal adjustment set in large graphs with many paths from the treatment to the outcome. However, the possible descendants of the treatment still have to be determined first.

modified forbidden projection

After projecting out the possible descendants of the treatment, the optimal adjustment set can be identified as the parents of the outcome in the resulting graph.
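The projection step can be sketched graphically for a fully known DAG. In this sketch (an assumption-laden illustration, not the paper's implementation), a kept node is a parent of the outcome in the projected graph exactly when it reaches the outcome through projected-out nodes only:

```python
import networkx as nx

def parents_after_projection(G, X, Y):
    """Sketch of the modified forbidden projection on a known DAG.

    Project out the descendants of the treatment X (except the outcome Y):
    a kept node V becomes a parent of Y in the projected graph iff G has a
    directed path V -> ... -> Y whose intermediate nodes were all projected
    out. These parents (excluding X itself) form the optimal adjustment set."""
    removed = nx.descendants(G, X) - {Y}
    parents = set()
    for V in set(G) - removed - {X, Y}:
        # Check for a path from V to Y that stays inside the removed nodes.
        if nx.has_path(G.subgraph(removed | {V, Y}), V, Y):
            parents.add(V)
    return parents

# Same illustrative DAG as before: X -> M -> Y, W -> M, Z -> Y.
G = nx.DiGraph([("X", "M"), ("M", "Y"), ("W", "M"), ("Z", "Y")])
```

On this graph the mediator \(M\) is projected out, so \(W\) (via the path through \(M\)) and \(Z\) (a direct parent) are the parents of \(Y\) in the projection, recovering the optimal adjustment set without ever computing the forbidden set explicitly.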

Finding the optimal adjustment set, even via the projected graph, traditionally requires first learning the estimated causal graph over all variables, which is computationally expensive and may be unnecessary. In this work, we show that the optimal adjustment set can be identified by learning only the local neighborhoods [5] around a few variables, which is both computationally and statistically efficient.

Local Tests of Causal Relations and Identifiability

We begin by learning the causal relation between the target pair \(X\) and \(Y\) from their local neighborhoods, building on results from [6], who show that if \(X \not\perp\!\!\!\perp Y \mid \text{Pa}(X)\), then \(X\) is a possible ancestor of \(Y\), and if \(X \not\perp\!\!\!\perp Y \mid \text{Pa}(X) \cup \text{Sib}(X)\), then \(X\) is also an explicit ancestor of \(Y\). We show that if neither target is an explicit ancestor of the other, then the causal effect is not identifiable. Otherwise, the explicit ancestor is established as the treatment, whose causal effect on the other target, the outcome, may be identifiable.

locally testing causal relations

The local neighborhood of \(X\) can be used to determine its causal relation to \(Y\) [6], no matter how far apart they are.
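The local test above can be illustrated on simulated data. The sketch below uses an assumed linear-Gaussian SCM (not from the paper) and partial correlation as the CI test, with an illustrative threshold in place of a proper significance test:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Illustrative linear-Gaussian SCM: Z -> X -> M -> Y, plus Z -> Y.
Z = rng.normal(size=n)
X = 0.8 * Z + rng.normal(size=n)
M = 0.9 * X + rng.normal(size=n)
Y = 0.7 * M + 0.5 * Z + rng.normal(size=n)

def partial_corr(a, b, cond):
    """Correlation of a and b after linearly regressing out cond.
    For Gaussian data, zero partial correlation corresponds to CI."""
    C = np.column_stack([np.ones(len(a))] + list(cond))
    res_a = a - C @ np.linalg.lstsq(C, a, rcond=None)[0]
    res_b = b - C @ np.linalg.lstsq(C, b, rcond=None)[0]
    return float(np.corrcoef(res_a, res_b)[0, 1])

# X and Y remain dependent given Pa(X) = {Z}: X is a possible ancestor of Y.
x_anc_y = abs(partial_corr(X, Y, [Z])) > 0.1
# Y is independent of X given Pa(Y) = {M, Z}: Y is not an ancestor of X.
y_anc_x = abs(partial_corr(Y, X, [M, Z])) > 0.1
print(x_anc_y, y_anc_x)  # True False
```

Note that only the local neighborhoods \(\text{Pa}(X)\) and \(\text{Pa}(Y)\) enter the conditioning sets, even though the dependence between \(X\) and \(Y\) travels through the distant mediator \(M\).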

Once a target is determined to be an explicit ancestor of the other, we can fully determine the identifiability of the causal effect. To do this, we establish a test based only on the local neighborhoods of the siblings of the treatment, and show that the causal effect is identifiable if and only if \[ \forall V \in \text{Sib}(X): V {\perp\!\!\!\perp}\ Y \mid \text{Pa}(V) \cup \{X\}. \]

locally testing identifiability

We can locally test the identifiability of the causal effect from the local neighborhoods of the siblings of the treatment.
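The sibling test can also be sketched on simulated data. The setting below is an assumption for illustration: suppose local discovery left the edge \(V - X\) unoriented, so \(\text{Sib}(X) = \{V\}\) with \(\text{Pa}(V) = \emptyset\), and the test reduces to checking \(V \perp\!\!\!\perp Y \mid \{X\}\):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

def partial_corr(a, b, cond):
    """Partial correlation as a CI proxy for Gaussian data."""
    C = np.column_stack([np.ones(len(a))] + list(cond))
    res_a = a - C @ np.linalg.lstsq(C, a, rcond=None)[0]
    res_b = b - C @ np.linalg.lstsq(C, b, rcond=None)[0]
    return float(np.corrcoef(res_a, res_b)[0, 1])

# Assumed sibling structure: V - X unoriented, Sib(X) = {V}, Pa(V) = {}.
V = rng.normal(size=n)
X = 0.8 * V + rng.normal(size=n)

# Case 1: V affects Y only through X. The test V ⟂ Y | {X} passes.
Y_id = 0.7 * X + rng.normal(size=n)
# Case 2: V also affects Y directly. The test fails.
Y_nonid = 0.7 * X + 0.6 * V + rng.normal(size=n)

identifiable_1 = abs(partial_corr(V, Y_id, [X])) < 0.1
identifiable_2 = abs(partial_corr(V, Y_nonid, [X])) < 0.1
print(identifiable_1, identifiable_2)  # True False
```

Only the local neighborhoods of the siblings of the treatment are needed, so the test never requires discovering the graph beyond them.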

If the causal effect turns out not to be identifiable at any of these steps, we terminate LOAD early and return the locally valid parent adjustment sets [7] for the possible treatments, i.e., targets that are possible ancestors of the other. Depending on the case, this may be one target, both targets, or neither. If a target is not a possible ancestor of the other, then we already know that its causal effect on the other is zero, without any adjustment.

Local Optimal Adjustments Discovery

If the causal effect is identifiable, we proceed to find the optimal adjustment set. To do this, we first collect the possible descendants of the treatment. We can locally test whether a node is a possible descendant of the treatment using its already learned local neighborhood. This requires only a single test per node, regardless of how far the node is from the treatment. Finally, we learn the local neighborhood of the outcome again, but this time we project out the possible descendants of the treatment by simply considering the marginal distribution over the remaining nodes. The optimal adjustment set is then identified as the parents of the outcome.

collecting possible descendants

The optimal adjustment set is the parents of the outcome after collecting and projecting out the possible descendants of the treatment.

By dragging the slider below, you can follow along with the steps of LOAD on the running example. It begins by performing local causal discovery on the targets to determine their causal relation and establish \(X\) as the treatment and \(Y\) as the outcome. Then, it confirms the identifiability of the causal effect via the siblings of \(X\), \(\{V_3\}\). LOAD proceeds to collect and project out the possible descendants of the treatment. Finally, it performs local causal discovery on the outcome \(Y\) in the projected graph and identifies the optimal adjustment set as its parents \(V_5\) and \(V_6\).

Steps of LOAD

Follow along the steps of LOAD by dragging the slider.
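Putting the steps together, the pipeline can be sketched at the oracle level on a fully known DAG. This is an illustrative simplification (graphical checks replace CI tests, and the graph is assumed rather than taken from the paper's running example):

```python
import networkx as nx

def load_sketch(G, A, B):
    """Oracle-level sketch of LOAD's steps on a known DAG. CI tests are
    replaced by graphical checks; since all edges of a DAG are oriented,
    Sib(X) is empty and the identifiability test passes trivially."""
    # Step 1: determine the causal relation between the targets.
    if B in nx.descendants(G, A):
        X, Y = A, B
    elif A in nx.descendants(G, B):
        X, Y = B, A
    else:
        return None  # neither target causes the other
    # Step 2: identifiability via Sib(X) -- vacuous here (Sib(X) is empty).
    # Step 3: collect the descendants of the treatment, except the outcome.
    removed = nx.descendants(G, X) - {Y}
    # Step 4: optimal adjustment set = parents of Y after projecting out
    # the removed nodes, i.e. kept nodes reaching Y via removed nodes only.
    oset = {V for V in set(G) - removed - {X, Y}
            if nx.has_path(G.subgraph(removed | {V, Y}), V, Y)}
    return X, Y, oset

# Illustrative DAG: X -> M -> Y, W -> M, Z -> Y.
G = nx.DiGraph([("X", "M"), ("M", "Y"), ("W", "M"), ("Z", "Y")])
print(load_sketch(G, "Y", "X"))  # treatment X, outcome Y, Oset {'W', 'Z'}
```

Note that the targets can be passed in either order: step 1 establishes which one is the treatment, mirroring how LOAD orients the target pair before anything else.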

Experiments

We evaluate LOAD on synthetic and realistic data and perform several ablation studies. We showcase only the main synthetic data results here. We compare LOAD to several global and local causal discovery methods in terms of computational efficiency (CI tests and runtime) and statistical efficiency (Oset recovery and estimation error). We evaluate all methods across different data domains and CI tests.

results for computational efficiency

Results for computational efficiency. Lower is better for both CI tests and time.

Our results show that LOAD performs on par with local methods in terms of computational efficiency and is much more scalable than global methods, both in terms of CI tests and computation time. At the same time, LOAD provides the same theoretical guarantees for finding the optimal adjustment set as global methods. By relying only on local information, LOAD can even discover the optimal adjustment set more accurately than global methods, which leads to one of the lowest estimation errors across all methods and domains.

results for statistical efficiency

Results for structural and statistical efficiency. Higher is better for the F1 score of Oset recovery, and lower is better for the intervention distance, i.e., the estimation error.

Our ablation studies show that the performance of LOAD is robust across different numbers of samples and graph densities, and remains among the best in scenarios where the causal relation between the targets is known or the causal effect is guaranteed to be identifiable.

References

  1. Perković et al. (2018) Complete graphical characterization and construction of adjustment sets in Markov equivalence classes of ancestral graphs. JMLR.
  2. Henckel et al. (2022) Graphical criteria for efficient total effect estimation via adjustment in causal linear models. Journal of the Royal Statistical Society Series B: Statistical Methodology.
  3. Rotnitzky and Smucler. (2020) Efficient adjustment sets for population average causal treatment effect estimation in graphical models. JMLR.
  4. Witte et al. (2020) On efficient adjustment in causal graphs. JMLR.
  5. Wang et al. (2014) Discovering and orienting the edges connected to a target variable in a DAG via a sequential local learning approach. Computational Statistics & Data Analysis.
  6. Fang et al. (2022) A local method for identifying causal relations under Markov equivalence. Artificial Intelligence.
  7. Maathuis et al. (2009) Estimating high-dimensional intervention effects from observational data. The Annals of Statistics.