Anecdotal evidence: Substantiating a claim with potentially biased stories, memories, etc., instead of scientific and statistically rigorous examination.
Selection bias: Inaccuracy from studying only a specific subpopulation that differs from the whole population.
Confirmation bias: Inaccuracy from focusing on proving something you already believe.
Estimation: Using sample characteristics to generalize about the overall population.
Hypothesis testing: A scientific examination of whether an observed effect could plausibly be due to random chance.
Cross-sectional study: Snapshot of a group at a point in time.
Longitudinal study: Observes the same group repeatedly over time.
Cycle: One of the points in time a group was observed during a longitudinal study.
Respondent: A participant in a study.
Representative: The sample's makeup roughly mirrors that of the population.
Oversample: Sample has a higher percentage of a factor than would be found in the total population.
Codebook: Document describing the design of a study and the structure of its data.
Data cleaning: Validating data, finding and correcting errors, and handling missing values in a way that does not bias the data relative to the total population.
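A minimal sketch of how the terms "estimation", "representative", and "oversample" fit together. All numbers and group names here are hypothetical, chosen only so the arithmetic is easy to follow. The reweighting step is one common correction and is an assumption of this sketch, not something these notes prescribe.

```python
from statistics import mean

# Hypothetical population: 900 people in group A (value 10), 100 in group B (value 20).
population = [("A", 10)] * 900 + [("B", 20)] * 100

# Hypothetical oversample: group B is 50% of the sample but only 10% of the population.
sample = [("A", 10)] * 50 + [("B", 20)] * 50

def group_share(rows, group):
    """Fraction of rows that belong to the given group."""
    return sum(1 for g, _ in rows if g == group) / len(rows)

def group_mean(rows, group):
    """Mean value among rows that belong to the given group."""
    return mean(v for g, v in rows if g == group)

# The quantity we would like to estimate.
population_mean = mean(v for _, v in population)

# Naive estimate: ignores that group B is overrepresented in the sample.
naive_estimate = mean(v for _, v in sample)

# Reweighted estimate: each group's sample mean, weighted by its population share.
weighted_estimate = sum(
    group_share(population, g) * group_mean(sample, g) for g in ("A", "B")
)

print(population_mean, naive_estimate, weighted_estimate)
```

The naive estimate is pulled toward group B because the sample is not representative; weighting each group by its population share brings the estimate back in line with the population mean.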
Chapter 3
Probability mass function: Maps each value to its probability of occurring.
Normalization: Dividing a wide range of values by a common value (usually n or n/2) to get a smaller, comparable range of values. For a probability mass function, dividing each frequency by the sample size n makes the probabilities sum to 1.
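A minimal sketch of building a probability mass function by normalizing counts, assuming the common value is n, the sample size. The function name make_pmf and the sample values are hypothetical, used only for illustration.

```python
from collections import Counter

def make_pmf(values):
    """Map each distinct value to its relative frequency in values."""
    counts = Counter(values)   # frequency of each value
    n = len(values)            # sample size used as the normalizing constant
    return {value: count / n for value, count in counts.items()}

sample = [1, 2, 2, 3, 5]       # hypothetical observations
pmf = make_pmf(sample)

print(pmf)
print(sum(pmf.values()))       # should be 1, up to floating-point rounding
```

Dividing by n is what turns raw frequencies into probabilities: each entry lies between 0 and 1, and together they sum to 1.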