STAT 250, Fall 2022: Introduction to Biostatistics

Lecture section 001-LEC; Lab sections 001L, 002L, 003L, 004L

This online document is the official course syllabus.
It is found at http://www.stat.psu.edu/~dhunter/250/index.html
Any changes made to the syllabus after the first week of the semester will be announced in class and appear in this document in red.

COURSE SCHEDULE
       Lectures: MF 10:10-11:00 a.m., 101 Thomas
       Lab Section 001: W 12:20-1:10 p.m., 211 Keller
       Lab Section 002: W 11:15 a.m.-12:05 p.m., 211 Keller
       Lab Section 003: W 1:25-2:15 p.m., 115 Keller
       Lab Section 004: W 4:40-5:30 p.m., 115 Keller

TEACHING TEAM
    Instructor:
       David Hunter, 326 Thomas Building, dhunter@stat.psu.edu
    Teaching Assistant:
       Ann Johnston, abj5162@psu.edu
    Learning Assistants:
       Danah Altassan (Lab section 001, 12:20 p.m.), dka5259@psu.edu
       Emanuel Rios (Lab section 002, 11:15 a.m.), ebr5240@psu.edu
       Haoyue Zhang (Lab section 003, 1:25 p.m.), hbz5153@psu.edu
       Alec Jones and Jacob Berry (Lab section 004, 4:40 p.m.), arj5344@psu.edu and jab7758@psu.edu
    Guided Study Group Leader: To be determined

OFFICE HOUR SCHEDULE
Regular office hours begin the second week of classes. Additional office hours will be added once we finalize the LA and GSG leader schedules.
Day       Time                     Who             Location
Monday    11:15 a.m. - 12:15 p.m.  David Hunter    310 Thomas
Monday    2:30 p.m. - 3:30 p.m.    Alec Jones      330 Thomas
Monday    6:30 p.m. - 8:30 p.m.    Ann Johnston    320 Thomas
Thursday  10:30 a.m. - 11:30 a.m.  Jacob Berry     Zoom: See Canvas for link
Thursday  3:00 p.m. - 4:00 p.m.    Danah Altassan  Zoom: See Canvas for link
Thursday  6:00 p.m. - 7:00 p.m.    Emanuel Rios    Zoom: See Canvas for link
Thursday  6:30 p.m. - 8:30 p.m.    Ann Johnston    320 Thomas
Friday    2:00 p.m. - 3:00 p.m.    David Hunter    310 Thomas
Friday    2:00 p.m. - 3:00 p.m.    Haoyue Zhang    Zoom: See Canvas for link

REQUIRED COURSE MATERIALS

GRADING
Learning outcomes will be assessed based on performance in each of the following categories accompanied by their impact on the overall grade:

Category Percent of Total Grade
Engagement 4%
Labs 24%
Homework 12%
Midterm #1 15%
Midterm #2 20%
Final Exam 25%

Final letter grades will be determined as follows after rounding to the nearest whole number percent:
             B+: 88-89%   C+: 78-79%
A : 94-100%  B : 84-87%   C : 70-77%   D : 60-69%
A-: 90-93%   B-: 80-83%
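
As an illustration of how the category weights and letter-grade cutoffs combine, here is a minimal Python sketch. The function names and the example scores are hypothetical and not part of the official grading process. The syllabus does not say how exact .5 values are rounded, so rounding halves up is an assumption, and the table lists no grade below 60%, so the sketch only reports that case as unlisted.

```python
import math

# Category weights (percent of total grade), copied from the grading table.
WEIGHTS = {
    "Engagement": 4,
    "Labs": 24,
    "Homework": 12,
    "Midterm #1": 15,
    "Midterm #2": 20,
    "Final Exam": 25,
}

# Lower bound of each letter-grade range, copied from the cutoff table.
CUTOFFS = [
    (94, "A"), (90, "A-"),
    (88, "B+"), (84, "B"), (80, "B-"),
    (78, "C+"), (70, "C"),
    (60, "D"),
]


def overall_percent(scores):
    """Weighted average of category scores, each on a 0-100 scale."""
    return sum(WEIGHTS[cat] * scores[cat] for cat in WEIGHTS) / 100


def letter_grade(percent):
    """Round to the nearest whole percent, then look up the letter grade."""
    # Assumption: exact halves round up (93.5 -> 94).
    rounded = math.floor(percent + 0.5)
    for cutoff, letter in CUTOFFS:
        if rounded >= cutoff:
            return letter
    # The syllabus table does not list a grade below 60%.
    return "below 60% (not listed)"


# Hypothetical scores for one student.
example = {
    "Engagement": 100,
    "Labs": 90,
    "Homework": 95,
    "Midterm #1": 80,
    "Midterm #2": 85,
    "Final Exam": 88,
}
total = overall_percent(example)
print(total, letter_grade(total))  # 88.0 B+
```

Because the lookup happens after rounding, a raw score of 93.5% would count as 94% under the rounding assumption above.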

COMPONENTS OF OVERALL GRADE

Engagement: Engagement points are earned through in-class Top Hat activities as well as occasional Canvas activities. For in-class activities, each non-exam class is worth 1 point. You get credit as long as you participate in most of the questions, regardless of whether or not you answer correctly.

Labs: Labs are Wednesdays in the Keller Building. The primary goals of the labs are to teach you how to use statistical software, to reinforce ideas covered in the lectures, and to offer valuable hands-on experience doing statistics in a supervised setting. If you wake up the day of lab not feeling well, or if you need to miss a lab for some other reason, you must notify the teaching assistant in advance to arrange a makeup. Non-illness reasons require more than 24 hours' notice. Except for excused absences, the quiz must be taken in class in your appointed classroom.

Homework: Homework assignments will be roughly weekly, usually due Monday or Friday before lecture. All homework will be done through WileyPLUS, the online component of our textbook. Late homework is accepted for half credit, up until the first exam on that material, after which it is no longer accepted. You may collaborate on homework (this is encouraged!), but we recommend trying problems on your own first to prepare you for exams.

Exams: There will be two midterm exams in class and a final exam. These exams will be closed to all materials except for a calculator (not a cell phone) and one (Exam 1), two (Exam 2), or three (Final) double-sided 8.5 by 11 inch pages of notes. Exams are mandatory and must be taken at the given time. Unavoidable legitimate reasons for not being able to take the exam must be submitted to and approved by Dr. Hunter at least 24 hours before the beginning of the exam. Excuses submitted less than 24 hours before the exam might not be accepted.

OVERALL COURSE GOALS
In this course we'll learn how to effectively collect data, describe data, and use data to make inferences and conclusions about real world phenomena. After finishing this course, you should be able to:

  1. Recognize the importance of data collection and its role in determining the scope of inference.
  2. Demonstrate a solid understanding of interval estimation and hypothesis testing.
  3. Choose and apply appropriate statistical methods for analyzing one or two variables.
  4. Use technology to perform descriptive and inferential data analysis for one or two variables.
  5. Interpret statistical results correctly, effectively, and in context.
  6. Understand and critique data-based claims.
  7. Appreciate the power of data.

SPECIFIC COURSE OBJECTIVES

  1. Data Collection: By the end of the course you should be able to...
    1. Identify cases and variables in a dataset, and classify variables as categorical or quantitative.
    2. Recognize that data and knowledge of statistics allow you to investigate a wide variety of interesting phenomena.
    3. Distinguish between a sample and a population.
    4. Recognize when it is, and is not, appropriate to use sample data to infer information about a population.
    5. Recognize that not every association implies causation.
    6. Identify potential confounding variables in an observational study.
    7. Distinguish between an observational study and a randomized experiment.
    8. Recognize that only randomized experiments can lead to claims of causation and explain why randomization is important for causality.
    9. Explain how and why placebos and blinding are used in experiments.
    10. Distinguish between a completely randomized experiment and a matched pairs experiment.
    11. Design and implement a basic randomized experiment.
  2. Exploratory Data Analysis: By the end of the course you should be able to...
    1. Create (with technology) and interpret a dotplot, boxplot, or histogram, and side-by-side dotplots, boxplots, or histograms.
    2. Calculate (with technology) and interpret summary statistics for a quantitative variable, including mean, median, standard deviation, five number summary, range, and IQR, and be able to calculate and compare these within groups.
    3. Compute and interpret a z-score for an individual value.
    4. Interpret percentiles.
    5. Create (with technology) a scatterplot between two quantitative variables, and use the plot to describe the association.
    6. Explain what a positive or negative association means between two quantitative variables.
    7. Calculate (with technology) and interpret a correlation.
    8. Identify outliers (informally or formally) and explain how they affect different statistics.
    9. Realize that it is important to plot your data if any variables are quantitative.
    10. Create (with technology) bar graphs and side-by-side or segmented bar graphs for categorical variables.
    11. Create a frequency, relative frequency, or two-way table to summarize categorical variables.
    12. Use a frequency, relative frequency, or two-way table to calculate proportions, difference in proportions, odds, and odds ratios.
    13. Determine an appropriate numerical summary statistic(s) and visualization for any one or two variables being analyzed.
    14. Appreciate the power of data visualization for more than two variables.
  3. Estimation: By the end of the course you should be able to...
    1. Distinguish between a population parameter and a sample statistic, recognizing that a parameter is fixed while a statistic varies from sample to sample.
    2. Determine and define an appropriate parameter of interest, based on a question.
    3. Compute a point estimate for a parameter using an appropriate statistic from a sample.
    4. Recognize that a sampling distribution shows how sample statistics tend to vary, but that in reality a sampling distribution can never be obtained in situations where estimation is needed.
    5. Recognize that statistics from random samples tend to be centered at the population parameter.
    6. Explain how to generate a bootstrap distribution for a given sample and statistic.
    7. Use technology to generate a bootstrap distribution, and recognize that it will be centered around the sample statistic.
    8. Demonstrate an understanding of standard error as the standard deviation of the statistic.
    9. Calculate a standard error from a bootstrap distribution (using technology), and from a formula for means, difference in means, proportions, and difference in proportions.
    10. Recognize that a confidence interval will capture the true parameter for the specified percentage of all random samples.
    11. Use a bootstrap distribution to construct a 95% confidence interval using the formula statistic ± 2×SE.
    12. Use a bootstrap distribution to construct a confidence interval using percentiles of the bootstrap distribution.
    13. Use the normal or t-distribution to construct a confidence interval for a mean, proportion, difference in means, difference in proportions, or correlation using technology.
    14. Use the normal or t-distribution and the standard error formulas to construct a confidence interval using the formula statistic ± z*×SE for proportions and difference in proportions or statistic ± t*×SE for means, difference in means, and slope.
    15. Interpret a confidence interval in context.
    16. Explain how sample size affects standard error and the width of a confidence interval.
    17. Demonstrate an understanding of the central limit theorem.
    18. Determine whether the conditions are met for the chosen method to be valid.
  4. Testing: By the end of the course you should be able to...
    1. Recognize when and why statistical tests are needed.
    2. Specify null and alternative hypotheses based on a question of interest, defining relevant parameters.
    3. Demonstrate an understanding of the concept of statistical significance.
    4. Recognize that the strength of evidence against the null hypothesis depends on how unlikely it would be to get a statistic as extreme just by random chance, if the null hypothesis were true.
    5. Use technology to generate a randomization distribution, and realize that it will be centered around the null parameter value.
    6. For a given sample and null hypothesis, describe the process of creating a randomization distribution.
    7. Use a randomization distribution to calculate a p-value.
    8. Connect the definition of a p-value to the motivation behind a randomization distribution.
    9. Distinguish between one and two-tailed tests in stating the alternative hypothesis and calculating the p-value.
    10. Interpret a p-value.
    11. Make a formal decision in a hypothesis test by comparing the p-value to the significance level.
    12. State the conclusion to a hypothesis test in context.
    13. Recognize that two types of errors can occur, and interpret false positives (Type I) and false negatives (Type II) in context.
    14. Recognize a significance level as the tolerable chance of getting a false positive (making a Type I error).
    15. Explain the problem of multiple testing and publication bias.
    16. Recognize that statistical significance is not always the same as practical significance.
    17. Make a less formal statement about the strength of evidence in a p-value.
    18. Determine the decision for a two-tailed hypothesis test from the corresponding confidence interval.
    19. Use technology and the normal or t-distribution to calculate a p-value for tests for means, difference in means, proportions, difference in proportions, correlation, and slope.
    20. Use the normal or t-distribution, the standard error formulas, and the formula (statistic - null value)/SE to calculate a p-value for tests for means, difference in means, proportions, difference in proportions, correlation, and slope.
    21. Determine whether a chi-square goodness of fit test or a chi-square test for association is appropriate to answer a question of interest.
    22. State hypotheses for a chi-square goodness-of-fit test for one categorical variable and for a chi-square test for association for two categorical variables.
    23. Calculate the test statistic for a chi-square goodness-of-fit test and a chi-square test for association both with and without technology.
    24. Use a randomization distribution or a chi-square distribution to calculate a p-value for a chi-square test.
    25. State the conclusion in context for a chi-square goodness-of-fit test and a chi-square test for association.
    26. Determine whether the conditions are met to use a normal, t, or chi-square distribution for inference.
    27. Conduct a hypothesis test from start to finish for a variety of different situations.
    28. Determine whether a confidence interval, a hypothesis test, both, or neither is most appropriate for answering a question of interest.
  5. Modeling: By the end of the course you should be able to...
    1. Use technology to find the regression line for two quantitative variables, giving the equation and plotting the line on a scatterplot.
    2. Calculate predicted values from a regression equation.
    3. Interpret the slope (and intercept, when appropriate) of a regression line in context.
    4. Calculate residuals and visualize residuals on a scatterplot.
    5. Beware of extrapolating when making predictions, fitting a line to nonlinear data, and the effect of outliers.
    6. Recognize the importance of plotting your data.
    7. Check a scatterplot for obvious violations of the assumptions of simple linear regression.
    8. Construct a confidence interval and test a hypothesis about the slope in a linear regression model.
    9. Compute (with technology) and interpret R² in a regression model.
    10. Use technology to fit a multiple regression model.
    11. Interpret coefficients in a multiple regression model, recognizing that care should be taken when interpreting coefficients of predictors that are strongly associated with each other.
    12. Use a multiple regression model to make predictions.
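
To make a few of the simulation-based objectives above concrete (bootstrap confidence intervals in Estimation items 11 and 12, and a randomization p-value in Testing item 7), here is a minimal Python sketch using only the standard library. It is not the software used in the labs. The function names, the choice of the mean as the statistic, the number of resamples, and the fixed random seed are all assumptions made for illustration.

```python
import random
import statistics


def bootstrap_means(sample, reps=5000, seed=0):
    """Resample with replacement and record the mean of each resample."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(reps)]


def ci_two_se(sample, boot):
    """Approximate 95% interval: statistic +/- 2 x SE."""
    stat = statistics.fmean(sample)
    se = statistics.stdev(boot)  # SE = standard deviation of bootstrap statistics
    return stat - 2 * se, stat + 2 * se


def ci_percentile(boot, level=0.95):
    """Interval from the middle `level` share of the bootstrap statistics."""
    ordered = sorted(boot)
    tail = (1 - level) / 2
    lo_index = round(tail * len(ordered))
    hi_index = round((1 - tail) * len(ordered)) - 1
    return ordered[lo_index], ordered[hi_index]


def randomization_p_value(group_a, group_b, reps=5000, seed=0):
    """Two-tailed p-value for a difference in means by reshuffling group labels."""
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # simulates "no difference between groups"
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return extreme / reps


# Hypothetical data, for illustration only.
sample = [12, 15, 9, 20, 14, 17, 11, 16, 13, 18]
boot = bootstrap_means(sample)
print(ci_two_se(sample, boot))
print(ci_percentile(boot))
print(randomization_p_value([1, 2, 3, 4, 5, 6], [101, 102, 103, 104, 105, 106]))
```

The two intervals should be similar when the bootstrap distribution is roughly symmetric and bell-shaped. The p-value is the proportion of reshuffled differences at least as extreme as the observed one, which matches the definition in Testing item 4.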

ACADEMIC INTEGRITY POLICY.
All Penn State and Eberly College of Science policies regarding academic integrity apply to this course. See https://science.psu.edu/current-students/integrity/policies for details. Please understand that the integrity policy also applies to Top Hat participation. In particular, participating in any engagement activity in place of another person is a violation of the policy and will result in academic sanctions that could go well beyond the total value of all engagement assignments, depending on the severity of the violation.

CODE OF MUTUAL RESPECT.
As the instructor for this course, I strongly endorse the Eberly College Code of Mutual Respect and Cooperation. I intend to adhere to these tenets in my dealings with students and I hope that students will reciprocate in their interactions with all other students, teaching assistants, learning assistants, and me. The code may be found online at https://science.psu.edu/climate-and-diversity/code-mutual-respect-and-cooperation.

DISABILITY ACCOMMODATION STATEMENT.
Penn State welcomes students with disabilities into the University's educational programs. The Student Disability Resources (SDR) website at http://equity.psu.edu/sdr/disability-coordinator provides contact information for every Penn State campus. At University Park, the SDR office is in 116 Boucke. In order to receive consideration for reasonable accommodations, you must contact SDR and provide documentation as explained in the guidelines at http://equity.psu.edu/sdr/guidelines.

COUNSELING AND PSYCHOLOGICAL SERVICES STATEMENT.
Many students at Penn State face personal challenges or have psychological needs that may interfere with their academic progress, social development, or emotional wellbeing. The university offers a variety of confidential services to help you through difficult times, provided by staff who welcome all students. At University Park, Counseling and Psychological Services (CAPS) may be reached at 814-863-0395 or on the web at http://studentaffairs.psu.edu/counseling/. For emergencies 24 hours a day, 7 days a week, call the Penn State Crisis Line at 877-229-6400 or contact the Crisis Text Line by texting LIONS to 741741.

EDUCATIONAL EQUITY/REPORT BIAS STATEMENT.
Students who believe they have experienced or observed a hate crime, an act of intolerance, discrimination, or harassment that occurs at Penn State are urged to report this incident as outlined on the University's Report Bias webpage at http://equity.psu.edu/reportbias/.