Path analysis (statistics)

In statistics, path analysis is used to describe the directed dependencies among a set of variables. This includes models equivalent to any form of multiple regression analysis, factor analysis, canonical correlation analysis, discriminant analysis, as well as more general families of models in the multivariate analysis of variance and covariance analyses (MANOVA, ANOVA, ANCOVA).

In addition to being thought of as a form of multiple regression focusing on causality, path analysis can be viewed as a special case of structural equation modeling (SEM) – one in which only single indicators are employed for each of the variables in the causal model. That is, path analysis is SEM with a structural model, but no measurement model. Other terms used to refer to path analysis include causal modeling, analysis of covariance structures, and latent variable models.

History

Path analysis was developed around 1918 by geneticist Sewall Wright, who wrote about it more extensively in the 1920s.^[1] It has since been applied to a vast array of complex modeling areas, including biology, psychology, sociology, and econometrics.^[2]

Path modeling

In the model below, the two exogenous variables (Ex₁ and Ex₂) are modeled as being correlated and as having both direct and indirect (through En₁) effects on En₂ (the two dependent or 'endogenous' variables). In most real models, the endogenous variables are also affected by factors outside the model (including measurement error). The effects of such extraneous variables are depicted by the "e" or error terms in the model.

Using the same variables, alternative models are conceivable. For example, it may be hypothesized that Ex₁ has only an indirect effect on En₂, deleting the arrow from Ex₁ to En₂; and the likelihood or 'fit' of these two models can be compared statistically.

Path tracing rules

In order to validly calculate the relationship between any two boxes in the diagram, Wright (1934) proposed a simple set of path tracing rules,^[3] for calculating the correlation between two variables. The correlation is equal to the sum of the contribution of all the pathways through which the two variables are connected. The strength of each of these contributing pathways is calculated as the product of the path-coefficients along that pathway.

The rules for path tracing are:

You can trace backward up an arrow and then forward along the next, or forwards from one variable to the other, but never forward and then back.
You can pass through each variable only once in a given chain of paths.
No more than one bi-directional arrow can be included in each path-chain.

Another way to think of rule one is that you can never pass out of one arrow head and into another arrowhead: heads-tails, or tails-heads, not heads-heads.

Again, the expected correlation due to each chain traced between two variables is the product of the standardized path coefficients, and the total expected correlation between two variables is the sum of these contributing path-chains.

NB: Wright's rules assume a model without feedback loops: the directed graph of the model must contain no cycles, i.e. it is a directed acyclic graph, which has been extensively studied in the causal analysis framework of Judea Pearl.

Path tracing in unstandardized models

If the modeled variables have not been standardized, an additional rule allows the expected covariances to be calculated as long as no paths exist connecting dependent variables to other dependent variables.

The simplest case obtains where all residual variances are modeled explicitly. In this case, in addition to the three rules above, calculate expected covariances by:

Compute the product of coefficients in each route between the variables of interest, tracing backwards, changing direction at a two-headed arrow, then tracing forwards.
Sum over all distinct routes, where pathways are considered distinct if they contain different coefficients, or encounter those coefficients in a different order.

Where residual variances are not explicitly included, or as a more general solution, at any change of direction encountered in a route (except for at two-way arrows), include the variance of the variable at the point of change. That is, in tracing a path from a dependent variable to an independent variable, include the variance of the independent-variable except where so doing would violate rule 1 above (passing through adjacent arrowheads: i.e., when the independent variable also connects to a double-headed arrow connecting it to another independent variable). In deriving variances (which is necessary in the case where they are not modeled explicitly), the path from a dependent variable into an independent variable and back is counted once only.

References

↑ Wright, S. (1921). "Correlation and causation". J. Agricultural Research. 20: 557–585.
↑ Dodge, Y. (2003) The Oxford Dictionary of Statistical Terms. OUP. ISBN 0-19-920613-9
↑ Wright, S. (1934). "The method of path coefficients". Annals of Mathematical Statistics. 5 (3): 161–215. doi:10.1214/aoms/1177732676.

Statistics

Descriptive statistics

Continuous data

Center	Mean arithmetic geometric harmonic Median Mode

Dispersion	Variance Standard deviation Coefficient of variation Percentile Range Interquartile range

Shape	Moments Skewness Kurtosis L-moments

Count data

Index of dispersion

Summary tables

Dependence

Graphics

Data collection

Study design	Population Statistic Effect size Statistical power Sample size determination Missing data

Survey methodology	Sampling Standard error stratified cluster Opinion poll Questionnaire

Controlled experiments	Design control optimal Controlled trial Randomized Random assignment Replication Blocking Interaction Factorial experiment

Uncontrolled studies	Observational study Natural experiment Quasi-experiment

Statistical inference

Statistical theory

Frequentist inference

Point estimation	Estimating equations Maximum likelihood Method of moments M-estimator Minimum distance Unbiased estimators Mean-unbiased minimum-variance Rao–Blackwellization Lehmann–Scheffé theorem Median unbiased Plug-in

Interval estimation	Confidence interval Pivot Likelihood interval Prediction interval Tolerance interval Resampling Bootstrap Jackknife

Testing hypotheses	1- & 2-tails Power Uniformly most powerful test Permutation test Randomization test Multiple comparisons

Parametric tests	Likelihood-ratio Wald Score

Specific tests

Z (normal) Student's t-test F

Goodness of fit	Chi-squared Kolmogorov–Smirnov Anderson–Darling Normality (Shapiro–Wilk) Likelihood-ratio test Model selection Cross validation AIC BIC

Rank statistics	Sign Sample median Signed rank (Wilcoxon) Hodges–Lehmann estimator Rank sum (Mann–Whitney) Nonparametric anova 1-way (Kruskal–Wallis) 2-way (Friedman) Ordered alternative (Jonckheere–Terpstra)

Bayesian inference

Correlation	Pearson product–moment Partial correlation Confounding variable Coefficient of determination

Regression analysis	Errors and residuals Regression model validation Mixed effects models Simultaneous equations models Multivariate adaptive regression splines (MARS)

Linear regression	Simple linear regression Ordinary least squares General linear model Bayesian regression

Non-standard predictors	Nonlinear regression Nonparametric Semiparametric Isotonic Robust Heteroscedasticity Homoscedasticity

Generalized linear model	Exponential families Logistic (Bernoulli) / Binomial / Poisson regressions

Partition of variance	Analysis of variance (ANOVA, anova) Analysis of covariance Multivariate ANOVA Degrees of freedom

Categorical / Multivariate / Time-series / Survival analysis

Categorical

Multivariate

Time-series

General	Decomposition Trend Stationarity Seasonal adjustment Exponential smoothing Cointegration Structural break Granger causality

Specific tests	Dickey–Fuller Johansen Q-statistic (Ljung–Box) Durbin–Watson Breusch–Godfrey

Time domain	Autocorrelation (ACF) partial (PACF) Cross-correlation (XCF) ARMA model ARIMA model (Box–Jenkins) Autoregressive conditional heteroskedasticity (ARCH) Vector autoregression (VAR)

Frequency domain	Spectral density estimation Fourier analysis Wavelet

Survival

Survival function	Kaplan–Meier estimator (product limit) Proportional hazards models Accelerated failure time (AFT) model First hitting time

Hazard function	Nelson–Aalen estimator

Test	Log-rank test

Applications

Biostatistics	Bioinformatics Clinical trials / studies Epidemiology Medical statistics

Engineering statistics	Chemometrics Methods engineering Probabilistic design Process / quality control Reliability System identification

Social statistics	Actuarial science Census Crime statistics Demography Econometrics National accounts Official statistics Population statistics Psychometrics

Spatial statistics	Cartography Environmental statistics Geographic information system Geostatistics Kriging

Category
Portal
Commons
WikiProject

External links

This article is issued from Wikipedia - version of the 11/18/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.