Friday, February 26, 2010

Correlation and Regression

Covariance uses the product of x and y deviations in the same formula where variance uses x deviations squared.

Table B3 determines whether or not the observed Pearson r is a "rare event" unlikely to have occurred by chance.

A large sample size usually makes r values significant.

The formula and calculation for comparing r's will not be required on test.

Regression

Uses the classic equation for a line, y = mx + b, but the letters are different in stats: y = bx + a, where b is the slope and a is the y-intercept.

Slope = rise over run, or (y1 - y2) / (x1 - x2)

Prediction comes from graphing the line and reading predicted (x, y) coordinates off the line.

Data that can be described exactly by a line is known as a perfectly linear relationship.

Best-fitting line is known as the regression line.

Method of least squares creates the regression line (or best-fitting line): it is the line that minimizes the overall squared distance between the regression line and the data points.

Calculation is not required for test.
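Even though the calculation isn't on the test, the least-squares line from the notes above can be sketched in a few lines of Python. This is my own illustrative sketch, not material from the lecture; variable names follow the class notation (b = slope, a = y-intercept).

```python
def regression_line(xs, ys):
    """Least-squares regression line y = b*x + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    # -- the slope that minimizes the total squared distance to the points
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x  # the line always passes through (x_bar, y_bar)
    return b, a

# Perfectly linear data following y = 2x + 1:
b, a = regression_line([1, 2, 3, 4], [3, 5, 7, 9])
print(b, a)  # 2.0 1.0
```

With perfectly linear data the line recovers the exact slope and intercept; with real scatter it gives the best-fitting compromise.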

Wednesday, February 24, 2010

Correlation

Correlation: Indicates the relationship between 2 or more variables (like smoking and lung cancer): the strength of the relationship and the direction of the relationship (positive or negative).

Scale data - Pearson Product-Moment Correlation (aka Pearson Correlation)
Ordinal data - Spearman Rank-Order Correlation
Nominal data - Phi Coefficient

Pearson Correlation: Range is from -1 to +1. The closer to -1 or +1, the stronger the relationship. At 0, no linear relationship whatsoever. A scatterplot that looks like a line indicates a strong relationship.

Correlation coefficient = r; r is a standard index ranging from -1 to +1.

Important caveats about Pearson r:

1. Not all important or interesting relationships are linear. (Yerkes-Dodson Law)

2. Watch out for spurious correlations (counterfeit correlation)

A. Restricted range (see handout) - the full range shows the relationship, while a restricted range produces a counterfeit correlation.

B. Combined groups: combining groups may offset or wipe out a correlation that exists when the groups are not combined. Breaking out groups by demographics, gender, or similar helps avoid this problem.

C. Outliers: outliers throw off calculations. Why is there an outlier? You have to explain the outliers.

Correlation does not equal causation, it equals a degree of covarying.

Correlation does not tell us which of these is true:
x -> y
y -> x
z -> x and y
coincidence

Pearson r formula is covariance divided by total variability (the product of the two standard deviations).
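That "covariance over total variability" formula can be sketched directly in Python. This is my own worked example, not from the lecture; the test values are made up to show the -1 to +1 range.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r = covariance(x, y) / (variability of x * variability of y)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0  (perfect positive)
print(pearson_r([1, 2, 3], [6, 4, 2]))  # -1.0 (perfect negative)
```

Note how the same deviation sums appear in both the numerator (as cross-products, the covariance) and the denominator (squared, the variability), which is why r is a standard index capped at -1 and +1.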

Monday, February 22, 2010

SPSS introduction

Data, Select cases - Look at a sub-set of the data

Transform, Recode into different variable - Change the data; e.g. gender coded as 1 and 2 could be recoded to 3 and 4, or grades recoded so everything above a C is 1 and everything below is 2.

Transform, Compute variable - take several variables and calculate a new variable.

Analyze is where SPSS is powerful.
  • Descriptive statistics
  • ANOVA
  • T-test
  • General linear model
  • Correlate
  • Regression
  • Nonparametric tests
  • Scale

Friday, February 19, 2010

Hypothesis testing - probability

Alpha level (z-critical), aka cut-off: the likelihood of obtaining a Type 1 error (erroneously rejecting H0 due to random sampling error); the traditional level is 0.05.

Directional: words like below or above or more or less are used.

Non-directional: words like difference or change or impact are used.

P score (z-observed), aka observed score: alpha is set by you (e.g. 0.05); the p score is the probability of obtaining your sample's observed score given that H0 (the null hypothesis) is true.

"If your P score is less probable than alpha, you have a score to reject H0 (null hypothesis)".

Decision Errors

Type 1 error is a false positive (erroneously rejecting H0) - the likelihood is alpha.
Type 2 error is a false negative (erroneously failing to reject H0) - the likelihood is called beta. (Beta is not taught in this class.)

As alpha decreases, beta increases, and vice versa.

Power: the probability that the test will lead to rejecting H0 when H0 is actually false - you rejected H0 when you should have rejected H0.

Telescope example: a Type 2 error is a telescope that doesn't have enough power to see an asteroid that exists. If it has enough power, then you correctly reject H0.

Tuesday, February 16, 2010

Hypothesis testing - Standard error of the means

"If you criticize something, you are obligated to know it better than those that espouse it."

Raw data -> Summarized, Organized, Simplified (Descriptive statistics: s, x-bar, s-squared) -> Sample to population inferences (Inferential statistics: p, z, t, F, q)

Hypothesis testing

1. Simple random sampling: used for statistical inference where populations are inaccessible, and is often more accurate. All units in the population have an equal chance of being selected.

2. Proportional Stratified Random Sample: sample maps exactly onto the population in terms of proportions of sub-groups (e.g. population has 10% seniors and sample has 10% seniors)

3. "Errors" in sampling (sampling error and non-sampling error) must be dealt with. Samples and population don't match-up. Non-sampling errors include question text and framing that creates confusion. Other things like cultural issues can cause non-sampling error.

Sampling Distributions

How do you detect how much error (sampling error) is in the sample? Use a standard-deviation-like calculation (spread of scores with respect to the mean).

By selecting multiple samples, calculating the means of those samples, then using those means in place of raw scores, you can calculate a standard-deviation-like statistic of the means called the standard error of the means (s-sub-x-bar).

Standard error of the means = sample standard deviation divided by the square root of the number of observations in the sample. (s / sqrt n)

or the theoretical version, when the population sigma is known: sigma-sub-x-bar = sigma / sqrt(N)
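The s / sqrt(n) formula above is a one-liner in Python using the standard library. This sketch and its sample data are my own, not from the lecture:

```python
from math import sqrt
from statistics import stdev  # sample standard deviation (n - 1 denominator)

def standard_error(sample):
    """Standard error of the means: s / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))

sample = [10, 12, 14, 16, 18]
print(standard_error(sample))  # ~1.414 (sqrt(2) for this sample)
```

A larger n shrinks the standard error, which is why bigger samples estimate the population mean more precisely.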

A sampling (sample of means) distribution is normally distributed when it is drawn from a normally distributed population or the size of the samples is reasonably large (at least 30).

Friday, February 12, 2010

Paper writing tips

Writing tips:
  1. Begin with the end in mind (goals of the writing)
  2. Flow with the end in mind (each sentence and paragraph has "end" purpose)
  3. Look for gaps or open space between sentences and paragraphs where the connections are weak (the reader's willingness to move on)
  4. Claims must be supported (claims are supported by logic both yours and others)
  5. Abstracts are the lean-and-mean version of "here's what we did and here's what we got." No lit review stuff in abstracts.
  6. Don't forget about the bridge from the literature and your hypotheses.

Writing abstracts:

  1. Opening (one sentence)
  2. Purpose of the study (include hypotheses)
  3. Research design/method description
  4. Results (brief description)
  5. Conclusion (one sentence)

* Don't put any sentence in the abstract that could be cut and pasted into another abstract.

Probability

P score, or probability, is the area under the distribution curve toward the tails from the z-score.
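That tail area can be computed from a z-score with the standard library's normal distribution. A sketch of my own (the function name and the one-/two-tailed handling are my additions, matching the directional vs. non-directional distinction from the earlier notes):

```python
from statistics import NormalDist

def p_from_z(z, directional=False):
    """Area under the standard normal curve beyond |z|, toward the tail(s)."""
    one_tail = 1 - NormalDist().cdf(abs(z))
    # Non-directional hypotheses split alpha across both tails, so double it.
    return one_tail if directional else 2 * one_tail

print(round(p_from_z(1.96), 4))  # two-tailed p near 0.05
```

This is why z = 1.96 is the familiar two-tailed cut-off at alpha = 0.05.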

Plotting z scores is important for conceptually understanding z-score relationships.