6.S085 Statistics for Research Projects: IAP 2015
Instructor: Ramesh Sridharan
Contact: iap-stats at mit dot edu
Meeting time: 10am-12pm
Meeting place: 32-144
6 units (U) graded P/D/F
Class description
This class is a practical introduction to statistical modeling and experimental design, intended to provide essential skills for doing research. We'll cover basic techniques (e.g., hypothesis testing and regression models) for both traditional experiments and newer paradigms such as evaluating simulations. Students with research projects will be encouraged to share their experiences and project-specific questions.
Students are expected to attend class and participate in discussions. Coursework will consist of two "practicals"—analyzing simple datasets to solidify core concepts—and two "case studies"—critical reading assignments of actual articles. Each assignment should take roughly one hour. Students are welcome to work in groups, but each student must submit an individual write-up in his or her own words. If you do work in a group, please also indicate with whom you worked. To pass, students must get a check/check+ on all assignments.
Finally, as this class is meant to be practical, we welcome any suggestions on topics and teaching style that will help you gain more from this course.
What will you get out of this class?
By the end of the class, you'll be able to:
- Summarize and visualize data
- Choose the right statistical analyses for your experiment and data
- Analyze data using classical and modern statistical techniques
- Critique and evaluate statistical analyses
- Design a robust, statistically sound experiment
- Deal with many kinds of data: numerical and categorical, single-variable or multivariate, big or small
Schedule
Due to the snowstorm and MIT closing on Tuesday, Jan 27, all classes from then on will be pushed back by one day. Note that Practical 2 will still be due on Tuesday by email!
| Week | Date | Topic | Assignments due | Notes |
|---|---|---|---|---|
| 1 | Tue Jan 20 | Introduction to statistics terminology, exploratory data analysis, and important distributions | | [PDF] (last updated 1/19) |
| | Wed Jan 21 | Confidence intervals and hypothesis testing | Paragraph on your research interests | [PDF] (last updated 1/19) |
| | Thu Jan 22 | Linear regression | | [PDF] (last updated 1/20) |
| | Fri Jan 23 | Regression diagnostics, advanced regression topics | | [PDF] (last updated 1/25) |
| 2 | Mon Jan 26 | Nonparametric tests, model fitting | Practical 1 | [PDF] (last updated 1/25) |
| | | Categorical data | Practical 2, due by email no later than noon on Tuesday Jan 27 | [PDF] (last updated 1/26) |
| | | Experimental design | | [PDF] |
| | (32-124) | Machine learning, predictive analytics | Case study writeups | |
Practicals
Each of the practicals involves carrying out some statistical analysis on small, real-world datasets. You may use any software to complete the assignments; all the data is in comma-separated format, which should be readable by most software packages. If you do not already have a favorite, we encourage you to try out R, which is available on any Athena machine. We're also familiar with Python, MATLAB, Julia, and Excel/OpenOffice. Outside of those, we'll do our best to help, but can't promise to get you unstuck. Finally, keep in mind that in most cases, each analysis will be a single line of R code; rarely will it be more than five. Please contact us if you find yourself getting bogged down in trying to run the analyses.
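For those working in Python rather than R, here is a minimal sketch of reading comma-separated data and computing per-group summary statistics with only the standard library. The inline data, the column names `group` and `score`, and the helper `summarize_by_group` are all made up for illustration; they are not taken from the course datasets.

```python
import csv
import io
import statistics

# Hypothetical data standing in for a practical's CSV file (not a course dataset).
CSV_TEXT = """group,score
control,4.1
control,3.8
control,4.4
treatment,5.0
treatment,5.3
treatment,4.9
"""


def summarize_by_group(csv_file, group_col, value_col):
    """Return {group: (n, mean, sample standard deviation)} for one numeric column."""
    values = {}
    for row in csv.DictReader(csv_file):
        # Collect the numeric values for each group label.
        values.setdefault(row[group_col], []).append(float(row[value_col]))
    return {
        group: (len(v), statistics.mean(v), statistics.stdev(v))
        for group, v in values.items()
    }


summary = summarize_by_group(io.StringIO(CSV_TEXT), "group", "score")
for group, (n, mean, sd) in summary.items():
    print(f"{group}: n={n}, mean={mean:.2f}, sd={sd:.2f}")
```

To read from a file instead, the `io.StringIO(CSV_TEXT)` argument could be replaced with a file object such as `open("data.csv", newline="")`, where `data.csv` is a placeholder name.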
In your write-up (feel free to use bullet points/keep it brief), make sure you explain your reasoning for the tests that you ran and the parameter settings that you used. Also explain and interpret the results of any exploratory data analysis and statistical inference. Include relevant plots and output to back up your claims; however, we don't want to just see loads of print-outs! Your job is to provide succinct summaries of your analysis, not just copy-paste the computer output.
Additional pointers for those using R: This short reference card contains a quick-lookup list of a lot of common functions. If you need more extensive data manipulation, this card is also a good reference. We've also listed the key commands/syntax you'll need for the assignments here.
These assignments should be handed in at the start of class on the day they're due.
Case studies
Review two of the articles listed below, or of your own choosing. Each review should be no more than one page. Lists, bullet points, etc. are fine as long as your writing is clear. Reviews should consist of:
- Summary: What was the objective of the study? Summarize the hypothesis, design methodology, analysis approach, and major findings. (This is to check whether you understood the study.)
- Experimental Design: Was the experimental design appropriate for the study? Provide your reasoning for both sound and unsound aspects.
- Statistical Analysis: Was the statistical analysis sound? Provide your reasoning for both sound and unsound aspects.
Case study papers: pick two from this list or select your own. If you choose your own, you should be able to find at least one sound and one unsound aspect of the paper's statistical and design methodology.
- Overview of the DARPA Augmented Cognition Technical Integration Experiment (2004)
- Evaluation of Mangosteen juice blend on biomarkers of inflammation in obese subjects (2009)
- Racial Preferences in Dating (2007)
- Sex Differences in the structural connectome of the human brain (2013)
The case studies should be turned in by Friday, January 30 at 11:59 by emailing your writeups to iap-stats AT mit DOT edu.
Links and references
- Graphpad: Table of which statistical test to run for different data types.
- Onlinestatsbook: contains many of the demos used in class.
- Statnotes: Nice overview of many common methods (more methods than we will cover in class).
- 18.443 OCW Website: Contains course notes and references (Probability and Statistics, DeGroot) for applied statistics at MIT. Good if you want a little more math/derivations behind the tests.
- Introduction to the Practice of Statistics, Moore and McCabe: clear introduction to the key concepts without a lot of math (good for intuition, but you'll probably need something more detailed for your actual analysis).
- Also, note that Wikipedia is often a good starting point if you want to look up a summary about a test or distribution (the descriptions are generally quite accurate, and the references will get you more information).
Acknowledgments
- Finale Doshi-Velez for starting this class and designing the curriculum.
- George Chen for co-teaching the class in 2014 and contributing to the development of the curriculum and the notes.
- Michael Bernstein and Mary (Missy) Cummings for their syllabus suggestions.
- Bobby Gramacy for his regression notes.