Wednesday, February 24, 2010

Correlation

Correlation: Indicates the relationship between two or more variables, such as smoking and lung cancer. It describes both the strength of the relationship and its direction (positive or negative).

Scale data - Pearson Product-Moment Correlation (aka Pearson Correlation)
Ordinal data - Spearman Rank-Order Correlation
Nominal data - Phi Coefficient
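
A minimal sketch of how the three coefficients relate, using made-up numbers and standard-library Python only. It relies on two standard facts: Spearman is Pearson computed on ranks, and phi is Pearson computed on 0/1 codes. The helper names pearson_r and ranks and all variable names are illustrative, and ranks assumes there are no tied values.

```python
import math

def pearson_r(xs, ys):
    """Pearson r: co-deviation of x and y over the product of their spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)

def ranks(values):
    """Rank from 1 (smallest) to n; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

# Scale data: Pearson on the raw values (made-up numbers).
hours = [1, 2, 3, 4, 5]
score = [1, 4, 9, 16, 100]
pearson = pearson_r(hours, score)

# Ordinal data: Spearman is Pearson computed on the ranks.
spearman = pearson_r(ranks(hours), ranks(score))

# Nominal (yes/no) data: phi is Pearson computed on 0/1 codes.
smoker = [1, 1, 1, 0, 0, 0, 1, 0]
cancer = [1, 1, 0, 0, 0, 0, 1, 1]
phi = pearson_r(smoker, cancer)

print(f"Pearson  r   = {pearson:.3f}")
print(f"Spearman rho = {spearman:.3f}")
print(f"Phi          = {phi:.3f}")
```

Here the scores rise with hours every time but not in a straight line, so Spearman reaches 1 while Pearson stays below 1.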

Pearson Correlation: Ranges from -1 to 1. The closer to 1 or -1, the stronger the linear relationship. At 0, there is no linear relationship at all. A scatterplot whose points fall close to a straight line indicates a strong relationship.

The correlation coefficient is written r; r is a standardized index that ranges from -1 to 1.

Important caveats about Pearson r:

1. Not all important or interesting relationships are linear (e.g., the Yerkes-Dodson Law, an inverted-U relationship between arousal and performance).

2. Watch out for spurious (counterfeit) correlations:

A. Restricted range (see handout): sampling only part of the range can give a misleading correlation, while the full range shows the true relationship.

B. Combined groups: combining groups may offset or wipe out a correlation that exists within each group separately. Breaking the data out by group (for example, by gender or another demographic) helps avoid this problem.

C. Outliers: outliers throw off the calculations. Why is there an outlier? You have to explain the outliers.
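
The caveats above can be sketched with small made-up datasets. This is only an illustration in standard-library Python: every number is invented for the example, and the helper pearson_r and the variable names are not from the lecture or handout.

```python
import math

def pearson_r(xs, ys):
    """Pearson r: co-deviation of x and y over the product of their spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)

# 1. Non-linear: an inverted U is a strong relationship, yet r is 0.
arousal = [-3, -2, -1, 0, 1, 2, 3]
performance = [9 - a * a for a in arousal]
r_inverted_u = pearson_r(arousal, performance)

# 2A. Restricted range: same data, but only the middle slice of x.
x_full = list(range(1, 21))
y_full = [x + (3 if i % 2 == 0 else -3) for i, x in enumerate(x_full)]
r_full = pearson_r(x_full, y_full)
r_restricted = pearson_r(x_full[8:12], y_full[8:12])  # x from 9 to 12

# 2B. Combined groups: r = +1 in one group, -1 in the other, 0 pooled.
group_a_x, group_a_y = [1, 2, 3, 4], [1, 2, 3, 4]
group_b_x, group_b_y = [1, 2, 3, 4], [4, 3, 2, 1]
r_a = pearson_r(group_a_x, group_a_y)
r_b = pearson_r(group_b_x, group_b_y)
r_pooled = pearson_r(group_a_x + group_b_x, group_a_y + group_b_y)

# 2C. Outliers: one extreme point turns a negative r into a strong positive r.
base_x, base_y = [1, 2, 3, 4], [2, 1, 2, 1]
r_base = pearson_r(base_x, base_y)
r_outlier = pearson_r(base_x + [20], base_y + [20])

print(f"inverted U:       r = {r_inverted_u:.2f}")
print(f"full range:       r = {r_full:.2f}")
print(f"restricted range: r = {r_restricted:.2f}")
print(f"groups A, B:      r = {r_a:.2f}, {r_b:.2f}; pooled r = {r_pooled:.2f}")
print(f"without outlier:  r = {r_base:.2f}; with outlier r = {r_outlier:.2f}")
```

In each case the single number r hides something a scatterplot would show right away, which is a good reason to plot the data before trusting the coefficient.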

Correlation does not equal causation; it only measures the degree to which two variables covary.

Correlation does not tell us which of these is true:
x -> y
y -> x
z -> x and y
coincidence
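
A small sketch of the third case (z -> x and y), assuming invented data in which x and y are each built only from z plus a leftover pattern. The coefficients 2 and 3, the leftover patterns, and the helper pearson_r are all made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson r: co-deviation of x and y over the product of their spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ssx = sum((x - mx) ** 2 for x in xs)
    ssy = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ssx * ssy)

# z is the hidden third variable that drives both x and y.
z = list(range(10))

# Leftover parts of x and y, chosen so that they are unrelated to each other.
leftover_x = [1, -1] * 5
leftover_y = [1, 1, -1, -1, 1, 1, -1, -1, 1, 1]

# Neither x nor y appears in the other's formula.
x = [2 * zi + e for zi, e in zip(z, leftover_x)]
y = [3 * zi + e for zi, e in zip(z, leftover_y)]

r_xy = pearson_r(x, y)                         # high, purely because of z
r_leftover = pearson_r(leftover_x, leftover_y)  # what remains once z is removed

print(f"r(x, y)         = {r_xy:.2f}")
print(f"r(leftover x,y) = {r_leftover:.2f}")
```

The high r between x and y comes entirely from z; r alone cannot distinguish this from x -> y or y -> x.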

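The Pearson r formula is the covariance of x and y (how much they vary together) divided by the product of their standard deviations (how much they vary separately).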
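
A worked sketch of that calculation on five made-up pairs, assuming the sample (n - 1) versions of covariance and standard deviation. The n - 1 terms cancel in the ratio, so dividing by n throughout gives the same r. Variable names are illustrative.

```python
import math

# Five made-up (x, y) pairs.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

# Covariance: how much x and y vary together.
cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Standard deviations: how much x and y each vary on their own.
sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))

r = cov_xy / (sd_x * sd_y)
print(round(r, 3))  # prints 0.775
```

Dividing by the standard deviations is what makes r a standardized index: rescaling x or y (say, inches to centimeters) changes the covariance but leaves r unchanged.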
