This intensive laboratory course focuses on data analysis projects using real data selected by the students. The core skills center on first framing good research questions, then letting those questions guide work with data of all types and varying quality (e.g., web-scraped or clickstream-based data rather than large national surveys). That work proceeds through visualization, principled modeling and model evaluation using statistical learning techniques such as regression, classification, and clustering, and presentation of results, using "reproducible research" tools (e.g., knitr, Sweave) in the R programming language.