
Spring 2024 Seminars

Accessible Causal Inference

A PRIISM Seminar by George Perrett

Watch recording of George Perrett's seminar


Many of the most pressing questions in the social and medical sciences are causal questions. Randomized studies are presented as “the gold standard” of causal inference; however, randomization alone does not help us understand who benefits most from the treatment or intervention in question. Moreover, in many applied contexts, it is impossible to run randomized studies for crucial causal questions. Causal inference methods that leverage flexible non-parametric machine learning models have clear advantages for identifying what works for whom and for estimating causal effects in observational studies, where the treatment or intervention is not randomly assigned. While the advantages of these approaches have been demonstrated, they carry a hidden assumption: that practitioners implement them correctly. This hidden assumption is often overlooked by methodologists, and little is known about how applied practitioners use machine learning based approaches to causal inference. In this talk, George Perrett will provide an overview of causal inference methods for randomized and observational studies, describe the benefits of the flexible non-parametric methods offered by machine learning algorithms, and present new software to aid the implementation of these approaches. As part of this discussion, Perrett will present new data from a recently conducted randomized experiment that highlights the importance of making causal inference methods scaffolded and accessible.

Preferences among Queer Individuals when Asking Gender, Sex, and Sexuality Survey Questions

A PRIISM seminar by NYU's QUEER Data Lab

Watch recording of QUEER Data Lab's seminar


Academics and social scientists often ask demographic questions about sex assigned at birth, gender identity, sexual orientation, and gender expression in research studies. The phrasing of these questions, however, can fail to provide queer individuals with response options that reflect their identity or experience. This study analyzes results from two randomized experiments that test different options for asking these demographic questions among a sample of 1,473 queer-identifying adults residing in the United States. We test questions from the U.S. Census Bureau's Household Pulse Survey, recommendations from the National Academies of Sciences, Engineering, and Medicine, and questions written by the study team. We demonstrate meaningful differences across question wording and response options in respondents’ personal satisfaction and in perceived representation of members of the queer community. We conclude that careful consideration should be given to constructing identity questions, and we provide recommendations for asking demographic survey questions.

Beyond Exclusion: The Role of High-Stakes Testing on Attendance the Day of the Test

A seminar by Magdalena Bennett, assistant professor at the University of Texas at Austin

Watch recording of Magdalena Bennett's seminar


High-stakes testing plays a crucial role in many educational systems, guiding policies of accountability, resource allocation, and school choice. However, skewed patterns of attendance can undermine the tests’ effectiveness in achieving their primary objectives. Using rich administrative data from Chile, this paper explores the impact of high-stakes testing on student attendance on the day of the test. We employ an event-study framework and machine learning predictive analytics to focus on three main objectives: (i) to gauge the average effect of high-stakes tests on school attendance across various grades and performance levels, (ii) to identify schools that may be encouraging non-representative attendance patterns, and (iii) to refine existing imputation methods. Our analysis reveals a highly heterogeneous effect of high-stakes testing on attendance, particularly impacting younger students more than older ones. These results are robust against alternative explanations such as self-selection, lack of communication, and disability exemptions. Our predictive models further show a wide variance between schools in terms of observed and predicted attendance, suggesting some may strategically discourage attendance of lower-performing students.

Design Sensitivity and Its Implications for Weighted Observational Studies

A seminar by Sam Pimentel, assistant professor at UC Berkeley 

Watch recording of Sam Pimentel's seminar


Sensitivity to unmeasured confounding is not typically a primary consideration in designing weighted treated-control comparisons in observational studies. We introduce a framework allowing researchers to optimize robustness to omitted variable bias at the design stage using a measure called design sensitivity. Design sensitivity, which describes the asymptotic power of a sensitivity analysis, allows transparent assessment of the impact of different weighted estimation strategies on sensitivity. We apply this general framework to two commonly used sensitivity models, the marginal sensitivity model and the variance-based sensitivity model. By comparing design sensitivities, we interrogate how key features of weighted designs, including choices about trimming of weights and model augmentation, impact robustness to unmeasured confounding, and how these impacts may differ for the two sensitivity models. We illustrate the proposed framework on a study examining drivers of support for the 2016 Colombian peace agreement.

Read the paper

Domain Adaptation under MNAR Missingness Shift

A seminar by Tyrel Stokes, postdoc at NYU Langone

Watch recording of Tyrel Stokes' seminar


Current domain adaptation methods under missingness shift are restricted to Missing At Random (MAR) missingness mechanisms, which can be considered a more general version of the covariate shift problem. However, in many real-world examples, the MAR assumption may be too restrictive. When covariates are Missing Not At Random (MNAR) in both source and target data, the common covariate shift solutions, including importance weighting, are not directly applicable. We show that under reasonable assumptions, the problem of MNAR missingness shift can be reduced to an imputation problem. This allows us to leverage recent methodological developments in both the traditional statistics and machine/deep-learning literature for MNAR imputation to develop a novel domain adaptation procedure for MNAR missingness shift. We further show that our proposed procedure can be extended to handle simultaneous MNAR missingness and covariate shifts. We apply our procedure to Electronic Health Record (EHR) data from two hospitals in the southern and northeastern regions of the US. In this setting, we expect different hospital networks and regions to serve different populations and to have different procedures, practices, and software for inputting and recording data, causing simultaneous missingness and covariate shifts.

Fall 2023 Seminars

Measurement error and interpretability of latent class models

A seminar by Brian Flaherty, associate professor of quantitative psychology, University of Washington

Watch recording of Brian Flaherty's seminar

This seminar was co-sponsored by the Department of Biostatistics.


Latent class (LC) models identify homogeneous groups within a population. Theoretical motivations for using these models typically involve covariate or outcome differences by class. Unrestricted LC models are the most commonly used; however, they often include difficult-to-interpret classes and unreasonable measurement error estimates. Dr. Brian Flaherty will present a highly restricted LC model called many classes, restricted measurement models (MACREM). The goal of these models is to produce clearly interpretable classes with reasonable measurement error rates. Notably, the MACREM parameterization also revealed specific class-covariate associations that were not discernible in a standard LC analysis.

Inferential Artificial Intelligence (iAI): Case Studies in Computational Statistics, Machine Learning, and Global Health

A seminar by Seth Flaxman, associate professor, Department of Computer Science, Oxford

Watch Recording of Seth Flaxman's talk

This seminar was co-sponsored by the Center for Urban Science and Progress (CUSP).


Machine learning is the computational beating heart of the modern AI renaissance. Behind the hype, a range of machine learning and computational statistical methods are quietly revolutionizing our approach to difficult statistical and scientific inference problems. Dr. Flaxman presented his perspective on the emerging field of “inferential Artificial Intelligence” (iAI) through a series of case studies on important global health challenges. He conceived of iAI as a big tent, encompassing modern probabilistic programming, replicable data science workflows, methods for assessing Big Data quality, uncertainty quantification, active learning, and a range of computational and deep learning approaches to transform applied statistical analyses.

Dr. Flaxman discussed iAI in the context of his work during the COVID-19 pandemic as part of the Imperial College COVID-19 Response Team and the collaborations he is now leading through the Machine Learning & Global Health Network.

Case studies included:

  • An open source semi-mechanistic hierarchical Bayesian model of SARS-CoV-2 transmission to estimate R(t), the effectiveness of non-pharmaceutical interventions, and to characterize changed epidemiological properties of Variants of Concern (Flaxman et al, Nature 2020; Volz et al, Nature 2021; Faria et al, Science 2021; Mlcochova et al, Nature 2021; Dhar et al, Science 2021)
  • The Big Data Paradox, or: how large surveys of COVID-19 vaccine uptake missed the mark so spectacularly (Bradley et al, Nature 2021)
  • Global estimates of COVID-19-associated orphanhood and deaths of caregivers (Hillis et al, Lancet 2021; Unwin et al, Lancet Child & Adolescent Health 2022; Hillis et al, JAMA Pediatrics 2022) and ongoing work strengthening data collection to identify orphans through death certificates in Zambia, Brazil, Colombia, and Utah (Flaxman et al, Science 2023)
  • πVAE/PriorVAE: Scalable MCMC inference for computationally challenging prior choices with Bayesian deep generative modeling (Semenova et al, Royal Society Interface 2022; Mishra et al, Statistics & Computing 2022) and other neural network approaches (Giovanni et al, AAAI 2023)
  • Adaptive Learning Survey Design (A-LSD): a new survey methodology based on active learning that we are piloting to improve representativeness of vulnerable subpopulations and provide spatially fine-grained maps of food insecurity with the World Food Programme


Measuring Disparities in Automated Speech Recognition

A seminar by Allison Koenecke, Assistant Professor of Information Science at Cornell University

October 4, 2023

Watch Recording of Allison Koenecke's talk


Automated speech recognition (ASR) systems are now used in a variety of applications to convert spoken language to text, from virtual assistants to closed captioning to hands-free computing. By analyzing a large corpus of sociolinguistic interviews with white and African American speakers, we demonstrate large racial disparities in the performance of popular commercial ASR systems developed by Amazon, Apple, Google, IBM, and Microsoft. Our results point to hurdles faced by African Americans in using increasingly widespread tools driven by speech recognition technology. More generally, our work illustrates the need to develop methods to routinely audit emerging speech systems, including the underlying acoustic and language models, to ensure they are broadly inclusive.
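The abstract does not name a performance metric; word error rate (WER) is the standard one in ASR evaluation, so its use here is an assumption. WER is the word-level edit distance (substitutions, deletions, insertions) between a system transcript and a human reference transcript, divided by the number of reference words. A minimal Python sketch, with hypothetical function names, might look like this:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dp[i][j] = minimum edits turning ref[:i] into hyp[:j] (Levenshtein distance over words)
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match when cost == 0)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


def mean_wer(pairs):
    """Average WER over (reference, hypothesis) pairs, e.g. all snippets from one speaker group."""
    rates = [word_error_rate(r, h) for r, h in pairs]
    return sum(rates) / len(rates)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` counts one deletion over six reference words. A racial disparity could then be summarized as the gap in `mean_wer` between speaker groups, though an actual audit would also need to account for differences in audio conditions and content across interviews.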

Shortening and adapting in-person vocabulary assessments in the era of online data collection

A seminar by Daphna Harel, Associate Professor of Applied Statistics and Susannah Levi, Associate Professor of Communicative Sciences and Disorders, NYU

September 20, 2023


Numerous tasks have been developed to measure receptive vocabulary, many of which were designed to be administered in person by a trained researcher or clinician. However, as data collection shifts online, both because the COVID-19 pandemic shut down in-person research and because of the convenience of conducting studies remotely, new methods and measures must be employed. Even though researchers and clinicians can choose from numerous tests, it is not clear that they all perform equally well as measures of receptive vocabulary, nor that they all transfer equally well to an online or computer-based environment. These tests vary substantially in their composition, in the participant experience, and in what participants are asked to do. Because data collection places a burden on both the researcher and the participant, efficient measurement and study protocols are required. In this study, we collect data from 53 participants who completed four measures of receptive vocabulary as well as a "gold-standard" measure that requires human administration. We explore the benefits of online data collection while asking whether it is possible to obtain equally good measurements through a shorter test instead of the long-form assessment.

Spring 2023 Seminars

Big Team Science Means Big Method Opportunities

A seminar by Erin Buchanan, Professor of Cognitive Analytics at Harrisburg University of Science and Technology

April 19, 2023

Watch Recording of Erin Buchanan's talk


The big team science movement in social science has predominantly focused on large-scale replications and the diversification of participant samples represented in research. The resulting datasets provide ample evidence about the psychological phenomena under study and make it possible to test additional hypotheses. However, behind the scenes, an overlooked area of big team science is the potential for experimental methodology that is tailored to adapt to the evidence as it is collected. The Semantic Priming Across Many Languages (SPAML) project is an ongoing large-scale priming study examining the semantic facilitation effect in matched stimuli. Semantic priming occurs when the cognitive processing of a new concept is aided by previous processing of a related concept. For example, TREE is read faster when it is preceded by LEAVES than when it is preceded by SPOON. Priming is a well-studied cognitive effect that illuminates the underlying structure of concept knowledge and the processes that integrate knowledge into current awareness. In this talk, Dr. Buchanan will discuss how the SPAML project has integrated newer computational linguistic methods, power estimations, and adaptive testing to design and implement the study. Preliminary results will be highlighted in the context of these methods.

Applications of Bayesian Item Response Theory and Bounded Continuous Models

A seminar by Chelsea Parlett-Pelleriti, faculty member in the Fowler School of Engineering at Chapman University

March 29, 2023

Watch Recording of Chelsea Parlett-Pelleriti's talk


Item Response Theory (IRT) is commonly applied to validate surveys and measures and to perform computer adaptive testing. However, the IRT model structure, which combines item-level and subject-level parameters, can be applied to a broader range of models and analyses. Examples include beta regression, its zero-one-inflated extension, and ordered beta regression, all of which provide modeling frameworks for bounded continuous data. These models will be presented in a Bayesian framework, and modifications that add an IRT parameterization will be suggested. Applications to metamemory data will be shown, and the basic ideas behind using cumulative logit, inflation, and logistic models with IRT parameterization will be discussed.
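As one illustration of adding an IRT parameterization to a bounded continuous model, a beta regression can let the mean response of person $i$ on item $j$ depend on a person parameter $\theta_i$ and an item parameter $b_j$. The formulation below is an assumed, Rasch-style sketch in the common mean-precision parameterization of the beta distribution; it is not necessarily the specification presented in the talk, and the standard-normal prior on $\theta_i$ is one conventional Bayesian choice.

$$
y_{ij} \sim \operatorname{Beta}\big(\mu_{ij}\,\phi,\ (1-\mu_{ij})\,\phi\big),
\qquad
\operatorname{logit}(\mu_{ij}) = \theta_i - b_j,
\qquad
\theta_i \sim \mathcal{N}(0, 1)
$$

Under this parameterization $\mathbb{E}[y_{ij}] = \mu_{ij}$ and $\operatorname{Var}(y_{ij}) = \mu_{ij}(1-\mu_{ij})/(1+\phi)$, so $\phi$ acts as a precision parameter. Replacing $\theta_i - b_j$ with $a_j(\theta_i - b_j)$ gives a two-parameter analogue with item discriminations $a_j$. Observed responses of exactly 0 or 1 fall outside the beta distribution's support, which is the gap the zero-one-inflated and ordered beta extensions address.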

Can Foreign Aid Boost State Legitimacy During Conflict? Experimental Evidence from Education Aid in Afghanistan

A seminar by Associate Professor Dana Burde and Doctoral Candidate Rena Deitz from New York University

March 1, 2023

Watch Recording of Dana Burde and Rena Deitz's talk


Debate remains over whether foreign aid undercuts government legitimacy by making recipient governments look weak or ineffective compared with high-performing international non-governmental organizations (INGOs), or whether citizens may reward recipient governments for facilitating the work of INGOs. This debate is especially salient in conflict-affected countries, where government legitimacy is often crucial for de-escalating conflict. This paper assesses such legitimacy effects using a randomized controlled trial in which community-based education classes substantially increased access to primary education for villages in rural Afghanistan. We found that, before the Taliban takeover in August 2021, the education intervention translated into increased government legitimacy. Importantly, these effects are apparent in places where government legitimacy was low due to political factors related to ethnicity, conflict exposure, and unpopularity of the incumbent president. The findings show that foreign aid may contribute to state legitimacy, particularly in places like Afghanistan where foreign aid can be crucial for material well-being.

Authors: Dana Burde, Rena Deitz, Joel Middleton, and Cyrus Samii

Estimating Test Score Reliability From Self-Selected Test Repeaters: An Application to the Duolingo English Test

A seminar by Duolingo's J.R. Lockwood

February 15, 2023


The Duolingo English Test (DET) is a computer-adaptive assessment of English proficiency used primarily for admissions decisions by English-medium universities. Stakeholders who make high-stakes decisions about individuals using DET test scores must have confidence that the scores would not vary consequentially for the same individual across repeated test administrations. The stability, or reliability, of test scores across repeated administrations is often estimated using naturally occurring data from people who take the test on more than one occasion. Because test takers choose if and when to retake a high-stakes assessment, observed repeater samples can yield biased estimators of test score reliability without adjustments. We develop a general method for reducing bias and improving precision of test score reliability estimated from test repeaters, and demonstrate its benefits using data from repeat test takers of the DET.

This work is joint with William C. M. Belzak, Senior Assessment Scientist, Duolingo. 

Event Archive


Estimating Child Mortality From Complex Household Survey Data

A seminar by Jessica Godwin, Statistical Demographer and the Training Director for the Center for Studies in Demography & Ecology (CSDE) at the University of Washington

November 30, 2022

Watch Recording of Jessica Godwin's talk


Child mortality is a key indicator of the overall health of a population, and the UN’s Sustainable Development Goals (SDGs) call for continued decreases in child mortality and improvements in data systems for geographic subpopulations in less developed countries by the year 2030. Subnational child mortality data in low- and middle-income countries (LMICs) that can be used to quantify progress toward the SDGs are often limited to decennial censuses and national household surveys, and small sample sizes within administrative subdivisions of a country can yield imprecise estimates. We describe a methodological framework and reproducible pipeline for subnational estimation of the under-five mortality rate (U5MR) using birth histories from the Demographic & Health Surveys. We distinguish existing methods by how they address the complex survey design (design-based versus model-based), whether they are implemented at the area or unit level, whether or not spatiotemporal smoothing is used, and which methods are appropriate as area-level sample sizes decrease. Estimates for 22 countries were made in collaboration with the UN Inter-agency Group for Mortality Estimation.
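The U5MR is the probability of dying before age five. A common direct estimator splits the first 60 months of life into age bands, estimates the probability of dying within each band among children who entered it, and combines the bands as U5MR = 1 − ∏(1 − q_a). The Python sketch below assumes that band structure and a simple survey-weighted ratio for each band; it ignores variance estimation and the spatiotemporal smoothing models discussed in the talk, and the function names are hypothetical.

```python
def u5mr_from_band_probabilities(q_by_band):
    """Under-five mortality rate from age-band death probabilities.

    q_by_band: probabilities of dying within each successive age band
    (together covering birth to 60 months), each conditional on surviving
    to the start of that band. U5MR = 1 - product of (1 - q) across bands.
    """
    survival = 1.0
    for q in q_by_band:
        if not 0.0 <= q <= 1.0:
            raise ValueError("each band probability must lie in [0, 1]")
        survival *= 1.0 - q  # probability of surviving this band too
    return 1.0 - survival


def weighted_band_probability(deaths, exposures, weights):
    """Survey-weighted death probability for one band: sum(w * d) / sum(w * e).

    deaths[i] is 1 if child i died in the band (else 0), exposures[i] is the
    child's exposure to the band (1 if fully exposed), weights[i] is the
    child's survey weight.
    """
    num = sum(w * d for w, d in zip(weights, deaths))
    den = sum(w * e for w, e in zip(weights, exposures))
    if den == 0:
        raise ValueError("no weighted exposure in this band")
    return num / den
```

Multiplying the result by 1,000 expresses it per 1,000 live births. With few births in a small area, each `weighted_band_probability` rests on very little data, which is the imprecision that motivates the model-based and smoothed alternatives the abstract describes.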

The Role of Markets and Norms in Regulating Disinformation

A seminar by Ceren Budak, Associate Professor at the School of Information at the University of Michigan

November 16, 2022

Watch Recording of Ceren Budak's seminar


Regulating disinformation requires a systemic approach. In this talk, Dr. Budak used a socioeconomic theory of regulation to systematically examine the set of potential regulators in this space and focused primarily on two of them: markets and norms. Focusing on markets, she summarized results from her recent audit studies and described the role ad firms and retailers play in providing monetary support for misinformation producers. Focusing on norms, she presented findings from semi-structured qualitative interviews with Reddit moderators and characterized how COVID-19 misinformation is moderated across multiple communities and the role norms play in this process.

Using an Online Sample to Learn About an Offline Population

A seminar by Dennis Feehan, Associate Professor of Demography at the University of California, Berkeley

October 19, 2022

Watch Recording of Dennis Feehan's seminar


Online data sources offer tremendous promise to demography and other social sciences, but researchers worry that the group of people who are represented in online data sets can be different from the general population. We show that by sampling and anonymously interviewing people who are online, researchers can learn about both people who are online and people who are offline. Our approach is based on the insight that people everywhere are connected through in-person social networks, such as kin, friendship, and contact networks. We illustrate how this insight can be used to derive an estimator for tracking the digital divide in access to the Internet, an increasingly important dimension of population inequality in the modern world. We conducted a large-scale empirical test of our approach, using an online sample to estimate Internet adoption in five countries (n ≈ 15,000). Our test embedded a randomized experiment whose results can help design future studies. Our approach could be adapted to many other settings, offering one way to overcome some of the major challenges facing demographers in the information age.

Read the Paper

Effects of the Expanded Child Tax Credit on Employment and Well-Being

A seminar by Elizabeth Oltmans Ananat, Mallya Professor of Women and Economics at Barnard College, Columbia University

October 5, 2022

Watch Recording of Elizabeth Ananat's seminar


The temporary 2021 expansion of the Child Tax Credit (CTC) was intended to reduce child poverty. Simulation studies posit, however, that the payments may lower parental employment, potentially offsetting the reductions in poverty and hardship. To test empirically for the expansion's real-world impacts, we apply a series of difference-in-differences analyses. Across two datasets and several model specifications, we find very small, inconsistently signed, and statistically insignificant impacts of the CTC on both labor force participation and employment among adults living in households with children, along with large impacts on well-being. Labor supply responses to the policy change were equally negligible for households for whom the CTC’s expansion eliminated a previous work incentive.
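In its simplest two-group, two-period form, a difference-in-differences estimate compares the before-to-after change in an outcome for households exposed to the policy with the same change for a comparison group, netting out time trends the groups share. The Python sketch below computes that 2x2 estimate; the choice of comparison group (for example, households without children) and the function names are assumptions for illustration, and the authors' specifications across two datasets are richer than this.

```python
def did_estimate(treated_pre: float, treated_post: float,
                 control_pre: float, control_post: float) -> float:
    """2x2 difference-in-differences: (treated change) - (control change).

    Under the parallel-trends assumption, the control group's change stands
    in for the change the treated group would have seen without the policy.
    """
    return (treated_post - treated_pre) - (control_post - control_pre)


def mean(xs):
    return sum(xs) / len(xs)


def did_from_samples(treated_pre, treated_post, control_pre, control_post) -> float:
    """Same estimate from individual-level outcomes (e.g. 1 = employed, 0 = not)."""
    return did_estimate(mean(treated_pre), mean(treated_post),
                        mean(control_pre), mean(control_post))
```

An estimate near zero, as the abstract reports for employment, means the exposed group's change closely tracked the comparison group's change. In practice the same quantity is usually obtained as the interaction coefficient in a regression with group and period indicators, which also yields standard errors.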

Inequitable Problems Need Equitable Solutions: An Augmented Synthetic Control Analysis of the Effects of Naloxone Access Laws on Fatal Opioid Overdose

A seminar by John R. Pamplin II, incoming assistant professor at Columbia's Mailman School of Public Health in the Department of Epidemiology


More than 1,000,000 people have died in the U.S. due to drug overdose since 1999, with nearly 100,000 overdose deaths occurring in 2021 alone: more than in any year prior. Many states have enacted policy interventions designed to curb the overdose crisis, including an array of laws designed to increase public access to naloxone, a lifesaving drug that, when administered in time, can reverse the effects of an opioid overdose. However, methodological challenges, such as widespread policy co-enactment, have limited our ability to assess the effectiveness of these interventions. Even less is known about the effects of these interventions for Black people, even though overdose mortality rates have increased faster for Black people than for any other group over the last five years. In this talk, Dr. Pamplin will describe some of the common methodological challenges to assessing the effectiveness of policy interventions for the overdose crisis and ways to address them using a Heterogeneous Treatment Effect framework. Additionally, he will discuss ongoing work that aims to fill these gaps by using Augmented Synthetic Control Models, accounting for variation by county, to assess the effectiveness of state-level Naloxone Access Laws, overall and equitably across communities of varying racial composition.

Towards Explainable Deep Survival Analysis Models with Guarantees

A seminar by George Chen, assistant professor of information systems at Carnegie Mellon University's Heinz College

April 20, 2022


Survival analysis is about modeling how much time will elapse until a critical event occurs. Examples of such critical events include death, disease relapse, readmission to the hospital, device failure, a customer ending a subscription service, or a convicted criminal reoffending. Recent machine learning advances in survival analysis have largely focused on architecting deep nets to achieve state-of-the-art prediction accuracy, with very little focus on whether the learned models are easy for application domain experts to interpret. In this talk, Dr. Chen will discuss ongoing work on developing neural survival models that not only achieve prediction accuracy competitive with the state of the art but also aim to be explainable and come with statistical accuracy guarantees. Specifically, he will present a new class of scalable deep kernel survival analysis models based on automatically learning a similarity score between any two data points (e.g., patients). Dr. Chen and colleagues show experimental results on healthcare survival analysis datasets (in which they predict time until death for patients with various diseases) and also a music subscription dataset (in which they predict time until a customer ends their subscription).

Of Mis-Defined Causal Questions: The Case of Race and Multi-Stage Outcomes

A seminar by Issa Kohler-Hausmann (professor of law at Yale Law School and associate professor of sociology at Yale) and Lily Hu (PhD Candidate, Harvard, and soon to be assistant professor of philosophy at Yale University)

March 30, 2022


A number of influential causal inference researchers have asked the following question: Can we quantify an effect of race on a decision that takes place downstream of other decisions that were themselves causally affected by race? If so, how? A recent paper by such researchers addressing police use of force argued that the existence of racial discrimination at a stage prior to the decision of interest biases standard estimates of a causal effect of race on that decision. This work subsequently generated a flurry of debate among researchers about the exact conditions under which certain race-causal estimands can be properly identified. In this talk, we address conceptual questions that must be answered before the methodological questions of whether, and under what assumptions, these race-causal estimands can be identified. Without addressing these conceptual questions, it is unclear precisely which, if any, causal dependencies these estimands claim to represent. We argue that the existence of race selection in the prior stage poses a conceptual problem with the causal quantities that are the target of identification. Namely, the target causal quantities have been mis-defined in this literature. Finally, we address whether these race-causal estimands could plausibly correspond to legal concepts of discrimination.

Prioritize Patients Not Patience - Using optimal test assembly to shorten patient reported outcome measures: A case study of the PHQ-9

A seminar by Daphna Harel, associate professor of applied statistics at NYU

February 23, 2022

How can we learn more by asking less? Dr. Harel demonstrates ways to reduce the burden on survey respondents by shortening surveys without compromising the information received.

Watch Recording of Daphna Harel's seminar


Patient-reported outcome measures are widely used to assess respondent experiences, well-being, and treatment response in clinical trials and cohort-based observational studies in both medicine and psychology. However, respondents may be asked to complete many different scales in order to provide researchers and clinicians with a wide array of information about their experiences. Collecting such long and cumbersome patient-reported outcome measures may therefore burden respondents and increase research costs. Yet little research has been conducted on optimal, replicable, and reproducible methods to shorten these instruments. In this talk, Dr. Harel proposes the use of mixed integer programming through Optimal Test Assembly as a method to shorten patient-reported outcome measures. She will describe this through a case study of the Patient Health Questionnaire-9 (PHQ-9).
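Optimal test assembly typically casts item selection as a mixed integer program: a binary variable per item marks whether it is kept, a constraint fixes the short-form length, and the objective maximizes test information over a range of trait levels. For a bank as small as nine items, every candidate subset can simply be enumerated, so the Python sketch below replaces the MIP solver with exhaustive search under a maximin objective. It assumes a dichotomous two-parameter logistic (2PL) model with hypothetical item parameters, whereas PHQ-9 items have four ordered response options, so it is a simplification rather than Dr. Harel's formulation.

```python
import math
from itertools import combinations


def item_information_2pl(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at trait level theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)


def assemble_short_form(items, k, thetas):
    """Pick the k-item subset maximizing the minimum test information over `thetas`.

    `items` maps item name -> (a, b). Exhaustive search stands in for the
    mixed integer program; it is feasible only for small item banks.
    """
    best_subset, best_value = None, -math.inf
    for subset in combinations(sorted(items), k):
        # Test information is the sum of item informations at each theta.
        value = min(
            sum(item_information_2pl(t, *items[name]) for name in subset)
            for t in thetas
        )
        if value > best_value:
            best_subset, best_value = subset, value
    return best_subset, best_value
```

The maximin objective protects measurement precision at the weakest point of the chosen trait range; other common objectives target information at a single cut score or match a target information curve, and a real MIP formulation can add content constraints that brute force handles only awkwardly.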

A Multistate Approach for Mediation Analysis in the Presence of Semi-competing Risks with Application in Cancer Survival Disparities

A seminar by Linda Valeri, assistant professor in biostatistics at the Columbia University Mailman School of Public Health. 

February 2, 2022

Assistant Professor Linda Valeri talks about how new approaches to mediation analysis can help understand racial disparities in cancer survival.

Watch Recording of Linda Valeri's seminar


We provide novel definitions and identifiability conditions for causal estimands that involve stochastic interventions on non-terminal time-to-events that lie on the pathway between an exposure and a terminal time-to-event outcome. Causal contrasts are estimated in continuous time within a multistate modeling framework accounting for semi-competing risks and analytic formulae for the estimators of the causal contrasts are developed. We employ this novel methodology to investigate the role of delaying treatment uptake in explaining racial disparities in cancer survival in a cohort study of colon cancer patients.


Police Violence Reduces Civilian Cooperation and Engagement

A seminar by Desmond Ang, applied economist and assistant professor at the Harvard Kennedy School of Government

November 10, 2021


How do high-profile acts of police brutality affect public trust and cooperation with law enforcement? To investigate this question, we develop a new measure of civilian crime reporting that isolates changes in community engagement with police from underlying changes in crime: the ratio of police-related 911 calls to gunshots detected by ShotSpotter technology. Examining detailed data from eight major American cities, we show a sharp drop in both the call-to-shot ratio and 911 call volume immediately after the police murder of George Floyd in May 2020. Notably, reporting rates decreased significantly in both non-white and white neighborhoods across the country. These effects persist for several months, and we find little evidence that they were reversed by the conviction of Floyd’s murderer. Together, the results illustrate how acts of police violence may destroy a key input into effective law enforcement and public safety: civilian engagement and reporting. Joint work with Panka Bencsik, Jesse Bruhn and Ellora Derenoncourt.
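The call-to-shot ratio divides police-related 911 calls by ShotSpotter-detected gunshots for the same area and period. A change in underlying gun violence moves numerator and denominator together, while a change in civilians' willingness to contact police moves mainly the numerator. The Python sketch below pools weekly counts for one area and compares the ratio before and after a cutoff date; the function names and the `(week_start, calls, shots)` data layout are hypothetical, and the authors' actual specification is not described in the abstract.

```python
from datetime import date


def call_to_shot_ratio(calls: int, shots: int) -> float:
    """Police-related 911 calls per ShotSpotter-detected gunshot."""
    if shots <= 0:
        raise ValueError("ratio is undefined when no gunshots were detected")
    return calls / shots


def pre_post_change(weekly, cutoff: date) -> float:
    """Pooled call-to-shot ratio after `cutoff` minus the ratio before it.

    `weekly` is a list of (week_start, calls, shots) tuples for one area.
    Ratios are pooled over weeks: total calls / total shots.
    """
    pre = [(c, s) for d, c, s in weekly if d < cutoff]
    post = [(c, s) for d, c, s in weekly if d >= cutoff]
    if not pre or not post:
        raise ValueError("need observations on both sides of the cutoff")
    pre_ratio = call_to_shot_ratio(sum(c for c, _ in pre), sum(s for _, s in pre))
    post_ratio = call_to_shot_ratio(sum(c for c, _ in post), sum(s for _, s in post))
    return post_ratio - pre_ratio
```

A negative value means fewer calls per detected gunshot after the cutoff. Pooling totals rather than averaging weekly ratios keeps weeks with very few detected shots from dominating the estimate.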

Optimal Tests of the Composite Null Hypothesis Arising in Mediation Analysis

A seminar by Caleb Miles, assistant professor in the Department of Biostatistics at the Columbia University Mailman School of Public Health

October 27, 2021

Watch Recording of Caleb Miles' Seminar


The indirect effect of an exposure on an outcome through an intermediate variable can be identified by a product of regression coefficients under certain causal and regression modeling assumptions. Thus, the null hypothesis of no indirect effect is a composite null hypothesis, as the null holds if either regression coefficient is zero. A consequence is that existing hypothesis tests are either severely underpowered near the origin (i.e., when both coefficients are small with respect to standard errors) or do not preserve type 1 error uniformly over the null hypothesis space. We propose hypothesis tests that (i) preserve level alpha type 1 error, (ii) meaningfully improve power when both true underlying effects are small relative to sample size, and (iii) preserve power when at least one is not. One approach gives a closed-form test that is minimax optimal with respect to local power over the alternative parameter space. Another uses sparse linear programming to produce an approximately optimal test for a Bayes risk criterion. We provide an R package that implements the minimax optimal test.
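For readers new to the composite null, the sketch below shows the classic joint-significance ("max-p") test, the kind of conservative baseline that the proposed tests aim to improve on near the origin. It is not the speaker's minimax optimal test or R package. It assumes approximately normal estimates alpha_hat (exposure to mediator) and beta_hat (mediator to outcome) with known standard errors; the function names and numbers are illustrative.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def joint_significance_test(alpha_hat, se_alpha, beta_hat, se_beta, level=0.05):
    """Max-p test of H0: alpha * beta = 0 (no indirect effect).

    The null holds if EITHER coefficient is zero, so the test rejects only
    when both individual Wald tests reject, i.e. max(p_alpha, p_beta) < level.
    This keeps type 1 error at or below `level` across the null space, but it
    is very conservative when both coefficients are near zero.
    """
    p_alpha = two_sided_p(alpha_hat / se_alpha)
    p_beta = two_sided_p(beta_hat / se_beta)
    p_max = max(p_alpha, p_beta)
    return p_max, p_max < level


# Hypothetical estimates: both paths clearly nonzero vs. both small.
print(joint_significance_test(0.5, 0.1, 0.4, 0.1))    # rejects H0
print(joint_significance_test(0.15, 0.1, 0.15, 0.1))  # fails to reject
```

If the two estimates are independent, this test rejects with probability of only about level squared (0.0025 at level 0.05) when both true coefficients are zero, which is the loss of power near the origin that the abstract describes.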

Using Machine Learning to Increase Equality in Healthcare and Public Health

A seminar by Emma Pierson, assistant professor of computer science at the Jacobs Technion-Cornell Institute at Cornell Tech and the Technion

October 13, 2021

Watch Recording of Emma Pierson's Seminar


Our society remains profoundly unequal. Worse, there is abundant evidence that algorithms can, when improperly applied, exacerbate inequality in healthcare and other domains. This talk pursues a more optimistic counterpoint: that data science and machine learning can also be used to illuminate and reduce inequality in healthcare and public health. It does so by presenting vignettes about women's health, COVID-19, and pain.

Understanding Human Factors in Forensic Science using Item Response Theory

A seminar by Amanda Luby, assistant professor of statistics at Swarthmore College

September 15, 2021

Watch Recording of Amanda Luby's seminar


Forensic science often involves the evaluation of crime-scene evidence to determine whether it matches a known-source sample, such as determining if a fingerprint or DNA was left by a suspect. Even as forensic measurement and analysis tools become increasingly sophisticated, final source decisions are often left to individual examiners' interpretation. However, the current approach to characterizing uncertainty in forensic decision-making has largely centered on conducting error rate studies (in which examiners evaluate a set of items consisting of known-source evidence) and calculating aggregated error rates. This approach is not ideal for comparing examiner performance, as decisions are not always unanimous and error frequency is likely to vary depending on the quality of the physical evidence. Item Response Theory (IRT), a class of statistical methods used prominently in educational testing, is one approach that accounts both for differences in proficiency among participants and for varying difficulty among items. Using simple IRT models, more elaborate decision tree models, and extensions, along with data from the FBI “Black Box” and “White Box” studies, Dr. Luby and her team find that there is considerable variability in print quality assessments, inconclusive rates, perceived difficulty, and minutiae identification even when examiners largely agree on a final source decision. In this talk, Dr. Luby will review some of her team's recent advances, outline challenges in applying IRT in practice, and discuss the implications of these findings within the criminal justice system.
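The core idea of IRT, separating examiner proficiency from item difficulty, can be illustrated with the simplest such model, the Rasch (one-parameter logistic) model. This is a minimal sketch, not the team's decision-tree models or extensions; the parameter values are hypothetical, and a "correct" response is assumed to mean a decision that agrees with the known source.

```python
import math


def rasch_prob(theta: float, b: float) -> float:
    """Rasch (1PL) model: P(correct) = 1 / (1 + exp(-(theta - b))).

    theta: examiner proficiency; b: item difficulty (same latent scale).
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def log_likelihood(responses, thetas, difficulties):
    """Bernoulli log-likelihood of a 0/1 response matrix.

    responses[j][i] is 1 if examiner j reached a correct decision on item i.
    In practice thetas and difficulties are estimated jointly (for example by
    maximum likelihood or a Bayesian fit); here they are simply inputs.
    """
    ll = 0.0
    for j, row in enumerate(responses):
        for i, y in enumerate(row):
            p = rasch_prob(thetas[j], difficulties[i])
            ll += math.log(p) if y == 1 else math.log1p(-p)
    return ll


# Hypothetical values: one examiner facing an easy and a hard print.
print(round(rasch_prob(1.0, -1.0), 3))  # 0.881 (easy item)
print(round(rasch_prob(1.0, 2.0), 3))   # 0.269 (hard item)
```

The same examiner has very different success probabilities on the two items, which is why aggregated error rates that ignore item difficulty can be misleading when comparing examiners.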

Disappearing Students During COVID-19? Evidence from Large-Scale Messaging Data

A seminar by Rebecca Johnson, assistant professor in the Program in Quantitative Social Science at Dartmouth College

September 29, 2021


A large body of research documents inequalities in family-school interactions. Yet the methodologies used (either intensive ethnographic observation of families and teachers or survey-based measures that ask families to self-report their school involvement) leave gaps in our understanding of how family-school interactions affect inequality. These gaps became more apparent during COVID-19, when policy concerns about families “disappearing” from contact during virtual learning ran up against methods ill-suited to measuring such changes. The present project draws upon a randomized controlled trial of “TeacherText”, a web- and mobile-based application that makes it easier for teachers and school administrators to interact with families (e.g., auto-translations; training on positive messages). We use large-scale metadata and messaging data from the platform (~340,000 messages between 208 school staff and 4,298 parent-student dyads; 6 DC Public and Public Charter schools; messaging in 2019-2020 before and during COVID-19 online pivots), linked to administrative data from the district’s student information system (SIS), to investigate two questions about family-school interactions during virtual pivots (pre-analysis plan). First, we show that when interactions are examined longitudinally, disappearance from contact is much rarer than two other statuses: interactions both before and after the COVID-19 virtual pivot (the modal status) or no interactions in either period. Second, we use text analysis to highlight two mechanisms through which school staff continued to engage families: the use of tools to interact with many families simultaneously, and the expansion of the platform beyond academic-focused messages to messages connecting families with social services. We conclude by discussing the benefits and challenges of using “digital trace data” to measure family-school interactions. Joint work with Vicky Mei.

Reimagining the Role of Schools in Society: A Conversation and Call to Action

May 26, 2021

Watch the recording

For 200 years, we have debated the role of schools in U.S. society. Today, we face an unprecedented opportunity to reexamine long-standing assumptions and to include voices that have been marginalized in the construction of our current systems. As we struggle with the impacts of a global pandemic and ongoing racial injustice, how do we take this moment as an opportunity to re-envision the role of schools in U.S. society? How do we enact this fresh vision for the 2021-2022 school year and beyond?

Our forum aims to re-imagine a role for schools that rectifies societal inequities rather than replicates them, embraces new opportunities to meet student needs, and "builds back better" in the areas of mental health, teaching and learning, and racial and social justice. Join us for a vigorous, forward-thinking conversation and call to action at this unprecedented moment, co-sponsored by the Institute of Human Development and Social Change (IHDSC), the Institute of Education Sciences-funded Predoctoral Interdisciplinary Research Training (IES-PIRT) program, the Center for Practice and Research at the Intersection of Information, Society, and Methodology (PRIISM), and the Research Alliance for New York City Schools.

This conversation will feature research and practice ideas from a new generation of education scholars, and a moderated discussion and Q&A from a panel of seasoned education leaders.

PRIISM Data Science For Social Impact 

May 12, 2021

Watch the recording

PRIISM, funded by the Moore Sloan Data Science Environment at NYU, created a competitive social impact research fellowship program that awarded funding and provided mentorship to five NYU graduate students, with an emphasis on awarding fellowships to students from groups currently underrepresented in STEM fields. These five students were matched with a research project at NYU. As a reflection after the end of their fellowships, we organized an event with a series of short talks for the fellows to highlight the challenges and opportunities that arise when data science tools are used to understand and make a positive impact on the world around us.

The event featured the work of the following five NYU research teams:

  1. The THINK: Tracking Hope in Nairobi and Karachi project uses a regression discontinuity design to understand the effect of education access on hope, peace, and conflict among youth in Nairobi and Karachi. The PIs on this project, Elisabeth King, Dana Burde, Jennifer Hill, and Daphna Harel, mentored Dorothy Seaman, PRIISM Social Impact Research Fellow. 
  2. A Consensus Among Asset Managers on Fostering Counterintuitive Skill Development is a project that seeks to understand the organizational practices and structures asset managers need in order to make investment decisions with sustainability in mind. PI Tracy Van Holt mentored George Perrett, PRIISM Social Impact Research Fellow. 
  3. The Public Safety Lab's Jail Data Initiative is an effort to collect and match daily jail records with criminal records, providing anonymized data to research and policy communities. Anna Harvey, PI on this project, mentored Chutang Luo, PRIISM Social Impact Research Fellow. 
  4. The Háblame Bebé & Nurse-Family Partnership project examined infant brain functioning in relation to experiences of maternity leave and physiological stress. Natalie Brito, PI on this project, mentored John Zhang, PRIISM Social Impact Research Fellow. 
  5. The Segregation of the School Segregation Literature project explored the role of implicit bias in school segregation research citations by conducting a bibliometric network analysis of peer-reviewed publications. Ying Lu and L'Heureux Lewis-McCoy (PIs) mentored Evaristus Ezekwem, PRIISM Social Impact Research Fellow. 

Quasi-Experimental Methods for Estimating the Impact of Vacant Lot Remediation on Crime

A seminar by John MacDonald, professor of criminology and sociology at the University of Pennsylvania

April 28, 2021

Watch Recording of John MacDonald's seminar 


Place-based blight remediation programs have gained popularity in recent years as a crime reduction approach. This study estimated the impact of a citywide vacant lot greening program in Philadelphia on changes in crime over multiple years, and whether the effects were moderated by nearby land uses.
The vacant lot greening program was assessed using quasi-experimental and experimental designs. In the quasi-experimental analysis, entropy distance weighting was used to make control lots comparable to greened lots on pre-existing crime trends. Fixed-effects difference-in-differences models were used to estimate the impact of the vacant lot greening program in both the quasi-experimental and experimental analyses.
Vacant lot greening was estimated to reduce total crime and multiple crime subcategories in both the quasi-experimental and experimental evaluations. Remediating vacant lots had a smaller effect on crime when the lots were located near train stations and alcohol outlets, and a larger effect when they were located near areas of active businesses. There is some suggestive evidence that the effects of vacant lot greening are larger in neighborhoods with higher pre-intervention levels of social cohesion.
The findings suggest that vacant lot greening provides a sustainable approach to reducing crime in disadvantaged neighborhoods, and the effects may vary by different surrounding land uses. To better understand the mechanisms through which place-based blight remediation interventions reduce crime, future research should measure human activities and neighborly socialization in and around places before and after remediation efforts are implemented.
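The logic of a difference-in-differences comparison can be sketched in its simplest form: two groups, two periods, and optional weights on control lots. This is an illustration only, assuming a single pre-period and a single post-period; the study itself used fixed-effects models over multiple years, and entropy distance weights are not computed here (the weights and crime counts below are hypothetical inputs).

```python
def weighted_mean(values, weights):
    """Weighted average; weights need not sum to one."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def did_estimate(treated_pre, treated_post, control_pre, control_post,
                 control_weights=None):
    """Two-group, two-period difference-in-differences.

    Each list holds one outcome per lot (e.g. crime counts). Control lots may
    carry weights, such as those produced by a balancing procedure; with no
    weights every control lot counts equally.
    """
    if control_weights is None:
        control_weights = [1.0] * len(control_pre)
    treated_changes = [post - pre for pre, post in zip(treated_pre, treated_post)]
    control_changes = [post - pre for pre, post in zip(control_pre, control_post)]
    treated_change = sum(treated_changes) / len(treated_changes)
    control_change = weighted_mean(control_changes, control_weights)
    return treated_change - control_change


# Hypothetical crime counts for two greened lots and two control lots.
print(did_estimate([10, 12], [7, 9], [10, 14], [9, 11]))              # -1.0
print(did_estimate([10, 12], [7, 9], [10, 14], [9, 11], [3.0, 1.0]))  # -1.5
```

In the unweighted case with a balanced two-period panel, this difference of average changes equals the treatment coefficient from a regression with lot and period fixed effects; the weights only change how much each control lot contributes to the counterfactual trend.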

Marginal Structural Models for Causal Inference with Continuous-Time Treatments

A seminar by Liangyuan Hu, assistant professor of biostatistics in the Department of Population Health Science & Policy at Mount Sinai School of Medicine

April 14, 2021

Watch Recording of Liangyuan Hu's seminar


Public health research often involves evaluating the effects of continuous-time treatments. Causal inference has traditionally focused on estimating the causal effects of a number of treatments defined at baseline. When treatment assignment is time-dependent, the treatment is often categorized by time intervals for treatment initiation. This categorization coarsens the information on treatment initiation and fails to answer the question of the causal effect of actual treatment timing. The marginal structural model, pioneered by Robins and colleagues, has been widely used for causal inference. It is easy to implement and provides a general infrastructure for weighting-based methods that address confounding, particularly time-varying confounding. In this talk, Dr. Hu will show how the marginal structural model can be used to capture the causal effect of a continuous-time treatment when treatment initiation is either static or dynamic. Dr. Hu will derive estimation strategies amenable to marginal structural models to overcome complications frequently encountered in observational healthcare data, including incomplete treatment initiation time and censored survival outcomes. A case study applying these approaches to large-scale electronic health record data will estimate the optimal antiretroviral therapy initiation rules for patients presenting with HIV/TB coinfection and for HIV-infected adolescents. New insights that can be gained relative to findings from randomized trials will be discussed. Finally, Dr. Hu will discuss how the methods can be used and extended to address important emerging questions related to cardiovascular disease and COVID-19.
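To make the weighting idea concrete, the sketch below computes stabilized inverse-probability-of-treatment weights in discrete time and uses them to fit a saturated marginal structural model (one mean outcome per treatment regime). It assumes the per-period probabilities of the treatment actually received have already been estimated by separate models, and the regime labels and numbers are hypothetical; the talk's extensions to continuous-time initiation, incomplete initiation times, and censored survival outcomes are not shown.

```python
def stabilized_weight(num_probs, den_probs):
    """Product over periods of P(A_t = a_t | past treatment) divided by
    P(A_t = a_t | past treatment and confounders), for one person."""
    if len(num_probs) != len(den_probs):
        raise ValueError("numerator and denominator need one entry per period")
    weight = 1.0
    for p_num, p_den in zip(num_probs, den_probs):
        weight *= p_num / p_den
    return weight


def weighted_regime_means(outcomes, regimes, weights):
    """Saturated MSM E[Y^g] = mu_g, fit by weighted means within each regime g."""
    totals, norms = {}, {}
    for y, g, w in zip(outcomes, regimes, weights):
        totals[g] = totals.get(g, 0.0) + w * y
        norms[g] = norms.get(g, 0.0) + w
    return {g: totals[g] / norms[g] for g in totals}


# Hypothetical person observed for two periods.
print(stabilized_weight([0.5, 0.8], [0.25, 0.4]))  # 4.0
print(weighted_regime_means([1, 0, 1], ["early", "early", "late"], [1.0, 3.0, 2.0]))
# {'early': 0.25, 'late': 1.0}
```

The weights create a pseudo-population in which, under the usual no-unmeasured-confounding and positivity assumptions, treatment is no longer associated with the measured time-varying confounders, so simple weighted comparisons across regimes estimate marginal causal contrasts.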

Does Science Self-Correct? What We've Learned At Retraction Watch

A seminar by Ivan Oransky

April 9, 2021

Co-sponsored event with CoHRR

Ivan Oransky, MD, is co-founder of Retraction Watch, vice president of editorial at Medscape, and distinguished writer in residence at New York University's Arthur Carter Journalism Institute.  He also serves as president of the Association of Health Care Journalists. Ivan previously was global editorial director of MedPage Today, executive editor of Reuters Health, and held editorial positions at Scientific American and The Scientist. A 2012 TEDMED speaker, he is the recipient of the 2015 John P. McGovern Medal for excellence in biomedical communication from the American Medical Writers Association, and in 2017 was awarded an honorary doctorate in civil laws from The University of the South (Sewanee).

Dropping Standardized Testing for Admissions: Differential Variance and Access

A seminar by Nikhil Garg, assistant professor at Cornell Tech

March 31, 2021

Watch Recording of Nikhil Garg's seminar


The University of California suspended through 2024 the requirement that applicants from California submit SAT scores, upending the major role standardized testing has played in college admissions. We study the impact of such decisions and their interplay with other policies on admitted class composition. We consider a theoretical framework to study the effect of requiring test scores on academic merit and diversity in college admissions. The model has a college and a set of potential students. Each student has observed application components and group membership, as well as an unobserved noisy skill level generated from an observed distribution. The college is Bayesian and maximizes an objective that depends on both diversity and merit. It estimates each applicant’s true skill level using the observed features and then admits students with or without affirmative action. We characterize the trade-off between the (potentially positive) informational role of standardized testing in college admissions and its (negative) exclusionary nature. Dropping test scores may exacerbate disparities by decreasing the amount of information available for each applicant, especially those from non-traditional backgrounds. However, if there are substantial barriers to testing, removing the test improves both academic merit and diversity by increasing the size of the applicant pool. Finally, using application and transcript data from the University of Texas at Austin, we demonstrate how an admissions committee could measure the trade-off in practice to better decide whether to drop its test score requirement. Joint work with Hannah Li and Faidra Monachou. Read the full paper
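The informational side of the trade-off can be illustrated with a textbook normal-normal model: a Bayesian college combines a prior on skill with noisy signals, and dropping one signal (the test score) widens the posterior. This is a simplified sketch under assumed Gaussian priors and independent Gaussian noise, not the paper's full model; it ignores group membership, the diversity objective, and barriers to testing, and all numbers are hypothetical.

```python
def posterior_skill(prior_mean, prior_var, signals):
    """Normal-normal update of a latent skill level.

    signals: list of (observed_value, noise_variance) pairs, each assumed to
    equal true skill plus independent Gaussian noise.
    Returns (posterior_mean, posterior_variance).
    """
    precision = 1.0 / prior_var
    weighted_sum = prior_mean / prior_var
    for value, noise_var in signals:
        precision += 1.0 / noise_var      # each signal adds information
        weighted_sum += value / noise_var
    return weighted_sum / precision, 1.0 / precision


# Hypothetical applicant: a grade-based signal, with and without a test score.
with_test = posterior_skill(0.0, 1.0, [(1.0, 1.0), (1.0, 1.0)])
without_test = posterior_skill(0.0, 1.0, [(1.0, 1.0)])
print(with_test)     # posterior variance 1/3
print(without_test)  # posterior mean 0.5, variance 0.5
```

Without the test, the estimate is pulled harder toward the prior and is less certain. If a group's other application components are noisier, losing the test widens that group's posterior more, which is one way to read the abstract's point about applicants from non-traditional backgrounds.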

Statistical Learning with Electronic Health Records Data

A seminar by Jessica Gronsbell, assistant professor at the University of Toronto

March 17, 2021

Watch Recording of Jessica Gronsbell's seminar


The adoption of electronic health records (EHRs) has generated massive amounts of routinely collected medical data with potential to improve our understanding of healthcare delivery and disease processes. In this talk, Dr. Gronsbell will discuss methods that bridge classical statistical theory and modern machine learning tools in an effort to extract reliable insights from imperfect EHR data. She will focus primarily on (i) the challenges in obtaining annotated outcome data, such as presence of a disease or clinical condition, from patient records and (ii) how leveraging unlabeled examples to improve model estimation and evaluation can reduce the annotation burden.  

Revisiting the Gelman-Rubin Diagnostic

A seminar by Christina Knudson, assistant professor of statistics at the University of St. Thomas

February 24, 2021

Watch Recording of Christina Knudson's seminar


Gelman and Rubin's (1992) convergence diagnostic is one of the most popular methods for terminating a Markov chain Monte Carlo (MCMC) sampler. Since the seminal paper, researchers have developed sophisticated methods for estimating the variance of Monte Carlo averages. We show that these estimators find immediate use in the Gelman-Rubin statistic, a connection not previously established in the literature. We incorporate these estimators to upgrade both the univariate and multivariate Gelman-Rubin statistics, leading to improved stability in MCMC termination time. An immediate advantage is that our new Gelman-Rubin statistic can be calculated for a single chain. In addition, we establish a one-to-one relationship between the Gelman-Rubin statistic and effective sample size. Leveraging this relationship, we develop a principled termination criterion for the Gelman-Rubin statistic. Finally, we demonstrate the utility of our improved diagnostic via examples.
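For reference, the classic univariate Gelman-Rubin statistic compares between-chain and within-chain variability across several chains. The sketch below implements that commonly used original form (without the degrees-of-freedom correction). The improved version described in the talk replaces these variance estimates with more sophisticated Monte Carlo variance estimators and can be computed from a single chain; that upgrade is not shown here, and the draws are hypothetical.

```python
import math
from statistics import mean, variance


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for one scalar parameter.

    chains: list of m >= 2 equal-length lists of draws (after burn-in).
    Values near 1 suggest the chains agree; values well above 1 suggest
    the sampler should keep running.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [mean(c) for c in chains]
    grand_mean = mean(chain_means)
    # B: between-chain variance, scaled by chain length n.
    b = n * sum((cm - grand_mean) ** 2 for cm in chain_means) / (m - 1)
    # W: average of the within-chain sample variances.
    w = mean([variance(c) for c in chains])
    # Pooled estimate of the target variance.
    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)


# Hypothetical draws: two chains stuck in different regions.
print(gelman_rubin([[1, 2, 3, 4], [11, 12, 13, 14]]))  # far above 1
```

Because W is estimated from short, autocorrelated chains, this classic form can be unstable; that instability is what the talk's use of better Monte Carlo variance estimators targets.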

COVID Tracking as a Prism for Refracting Tech Ethics

A seminar by Laura Norén, VP of Privacy and Trust at Obsidian Security

February 10, 2021

Watch Recording of Laura Norén's seminar


COVID landed in a culture accustomed to having "an app for that" (whatever "that" may be) and has now generated hundreds of apps designed to address COVID. The technical and social variation from app to app, and from one community of engagement to the next, provides an exquisite refractory prism for reflecting on technical ethics, the "good" outcome, and the longstanding tension between utilitarian ethics (generally favored by the tech community) and virtue or duty ethics (more frequently called upon within the institutions of family and religion, and outside the US context). This talk considers standard data project management questions: what an app can or should do, who pays for it, what type of data to collect, how long to retain it, with whom to share it, which other data streams to combine with it, what types of predictions and decisions to make with it, and in what context those decisions will occur. Scoffing at the "app for that" answer is short-sighted. Unpacking the social and ethical impacts that accrue along the way is particularly important as COVID apps continue to proliferate during the vaccine rollout and hybrid open/closed urban reality. More broadly, working through these questions in a context that impacts us all provides a particularly sticky set of lessons and questions pertinent to many processes of technical intervention in social life and the public sphere.


Spatially-coupled hidden Markov models for short-term forecasting of wind speeds

A seminar by Vianey Leos Barajas, Assistant Professor at the University of Toronto, Dept. of Statistical Sciences and School of the Environment

November 18, 2020

Watch Recording of Vianey Leos Barajas's seminar


Hidden Markov models (HMMs) provide a flexible framework to model time series data where the observation process, $Y_t$, is taken to be driven by an underlying latent state process, $Z_t$. In this talk, we will focus on discrete-time, finite-state HMMs, as they provide a flexible framework that facilitates extending the basic structure in many interesting ways.

HMMs can accommodate multivariate processes by (i) assuming that a single state governs the $M$ observations at time $t$, (ii) assuming that each observation process is governed by its own HMM, irrespective of what occurs elsewhere, or (iii) striking a balance between the two, as in the coupled HMM framework. Coupled HMMs assume that each of the $M$ observation processes is governed by its own state process, giving $M$ state processes. However, the $m$th state process at time $t$, $Z_{m,t}$, depends not only on $Z_{m,t-1}$ but also on the collection of the other state processes, $Z_{-m,t-1}$. We introduce spatially-coupled hidden Markov models, whereby the state processes interact according to an imposed neighborhood structure and the observations are collected across $S$ spatial locations. We outline an application to short-term forecasting of wind speed using data collected across multiple wind turbines at a wind farm.
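The basic building block that coupled and spatially-coupled HMMs extend is the forward recursion for a single discrete-time, finite-state HMM, which yields the likelihood, the filtered state probabilities, and a one-step-ahead state forecast. This sketch assumes wind speeds have been binned into a few discrete categories and uses hypothetical probabilities. In the coupled models described above, each location's transition probabilities would also depend on neighboring states at time t-1, which this single-sequence version does not model.

```python
import math


def forward_filter(obs, init, trans, emit):
    """Scaled forward algorithm for a discrete-time, finite-state HMM.

    obs:   observation indices y_1..y_T (e.g. binned wind speeds)
    init:  init[k] = P(Z_1 = k)
    trans: trans[j][k] = P(Z_t = k | Z_{t-1} = j)
    emit:  emit[k][y] = P(Y_t = y | Z_t = k)
    Returns (log-likelihood, filtered P(Z_T = k | y_1..y_T)).
    """
    n_states = len(init)
    alpha = [init[k] * emit[k][obs[0]] for k in range(n_states)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[j] * trans[j][k] for j in range(n_states)) * emit[k][obs[t]]
                for k in range(n_states)
            ]
        scale = sum(alpha)  # P(y_t | y_1..y_{t-1})
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]  # normalize to avoid underflow
    return loglik, alpha


def forecast_next_state(filtered, trans):
    """One-step-ahead state distribution P(Z_{T+1} = k | y_1..y_T)."""
    n_states = len(filtered)
    return [sum(filtered[j] * trans[j][k] for j in range(n_states))
            for k in range(n_states)]


# Hypothetical two-state model ("calm", "windy"); observations 0 = low, 1 = high.
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.8, 0.2], [0.3, 0.7]]
loglik, filtered = forward_filter([0, 1, 1], init, trans, emit)
print(loglik, filtered, forecast_next_state(filtered, trans))
```

Normalizing at each step keeps the recursion numerically stable for long series, and the log of each normalizing constant accumulates into the log-likelihood.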

Digital Trace Data: Modes of Data Collection, Applications, and Errors

A seminar by Frauke Kreuter, Professor of Statistics and Data Science at the Ludwig-Maximilians-University of Munich

October 28, 2020

Watch Recording of Frauke Kreuter's PRIISM seminar


Digital traces, left by individuals when they act or interact online, provide researchers with new opportunities for studying social and behavioral phenomena. This talk covers digital trace data and their use in the computational social sciences. Key to the successful use of digital trace data is a clear vision of the research goals. Knowing how to match available data to research needs is just as important as evaluating data quality and respecting respondents' privacy. We will discuss inferential challenges and possible ways to deal with them, finding the right measures to ensure reproducibility and replicability, and how to create sufficient transparency when working with digital trace data. 

Bayesian Canonicalization of Voter Registration Files 

A seminar by Andee Kaplan, Assistant Professor, Colorado State University

October 14, 2020

Watch Recording of PRIISM Seminar with Andee Kaplan


Entity resolution (record linkage or deduplication) is the process of merging noisy databases to remove duplicate entities in the absence of a unique identifier. One major challenge of utilizing linked data is identifying the canonical (or representative) records without duplicate information to pass to an inferential downstream task. The canonicalization step is particularly crucial after entity resolution, as a multi-stage approach allows for multiple analyses to be performed on the same linked data. While this approach can be scalable, the uncertainty from each stage of the entity resolution process is not naturally propagated throughout the pipeline and into the downstream task. In this talk, Dr. Kaplan presented five fully unsupervised methods to choose canonical records from linked data, including a fully Bayesian approach which propagates the error from linkage through to the downstream inference. This multi-stage approach is illustrated and evaluated on simulated entity resolution data sets as well as voter registration data available from the North Carolina State Board of Elections (NCSBE). The NCSBE has released a snapshot of their voter registration databases regularly since 2005, providing a changing view of the voter registration information over time as new voters register, voters are dropped from the register, and voter information is updated. Dr. Kaplan and her team compared the proposed canonicalization methods after performing entity resolution on five snapshots and examined the relationship between demographic information and party affiliation on the resulting canonical data sets.
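As a rough illustration of what a canonicalization step does, the sketch below picks, field by field, the most common non-missing value within one cluster of linked records, breaking ties by record order. This is a simple heuristic assumed for illustration, not necessarily one of the five methods presented; unlike the Bayesian approach, it treats the linkage as fixed and does not propagate linkage uncertainty. The record fields and values are hypothetical.

```python
from collections import Counter


def canonicalize(cluster, fields):
    """Build one canonical record from records linked to the same voter.

    For each field, keep the most frequent non-missing value; ties go to
    the value that appears first in the cluster. A field with no usable
    values becomes None.
    """
    canonical = {}
    for field in fields:
        values = [r[field] for r in cluster if r.get(field) not in (None, "")]
        if not values:
            canonical[field] = None
            continue
        counts = Counter(values)
        top = max(counts.values())
        canonical[field] = next(v for v in values if counts[v] == top)
    return canonical


# Hypothetical cluster built from three registration snapshots.
cluster = [
    {"name": "Jane Doe", "party": "DEM", "zip": "27601"},
    {"name": "Jane Do", "party": "UNA", "zip": ""},
    {"name": "Jane Doe", "party": "DEM", "zip": "27601"},
]
print(canonicalize(cluster, ["name", "party", "zip"]))
# {'name': 'Jane Doe', 'party': 'DEM', 'zip': '27601'}
```

Applying a rule like this to every cluster yields one row per entity for a downstream analysis, such as relating demographics to party affiliation, but any linkage errors are silently carried forward.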

A unified framework for the latent variable approach to statistical social network analysis

A seminar by Samrachana Adhikari, Assistant Professor, NYU School of Medicine

September 30, 2020

Watch Recording of PRIISM Seminar with Samrachana Adhikari


While social network data provide new opportunities to understand complex relational mechanisms, they also present modeling challenges. Units of observation in social networks are often not independent and identically distributed, as commonly assumed in many statistical models, and hence require new tools to analyze the data, make inferences, and address issues of model selection and goodness of fit while accounting for the complex dependence structures. Many recent developments have been made in statistical methodologies to account for such complications. In particular, latent variable network models that accommodate edge correlations implicitly, by assuming an underlying latent factor, are increasing in popularity. Although these models are examples of what is a growing body of research, much of the research is focused on proposing new models or extending others. There has been very little work on unifying the models in a single framework.

In this talk, Dr. Adhikari first reviewed different latent variable network models for analyzing social network data. She then introduced a complete framework that organizes existing latent variable network models within an integrative generalized additive model, called the Conditionally Independent Dyad (CID) models. The class of CID models includes existing network models that assume dyad (or edge) independence conditional on latent variables and other components in the model. By presenting an analysis of an advice-seeking network of teachers as an example, she illustrated the utility of the proposed framework. Dr. Adhikari ended with a discussion of existing and future extensions of the proposed class of network models to incorporate multiple related networks.

Understanding reasons for differences in intervention effects across sites

A seminar by Kara Rudolph, Assistant Professor, Columbia University

September 16, 2020

Watch Recording of Kara Rudolph's seminar


Multi-site interventions are common in public health, public policy, and economics. Do we expect an intervention effect in one site to be the same as the intervention effect in another site? In many cases, we would answer “no”. First, there could be differences in site-level variables related to intervention design/implementation or contextual variables, like the economy, that would modify intervention effectiveness. Such variables suggest that the intervention either is not the same or does not work the same in the two sites. Second, there could be differences in person-level variables—population composition—across sites that also modify intervention effectiveness. There could also be differences in the mechanisms producing intermediate variables on the pathway from the intervention to the outcome. The latter two reasons could cause intervention effects to differ across sites even if the interventions are structured and implemented in an identical fashion. An example of this, which we use to motivate this work, is from the Moving to Opportunity (MTO) trial. MTO is a five-site, encouragement-design intervention in which families in public housing were randomized to receive housing vouchers and logistical support to move to low-poverty neighborhoods. When we started this work, there had been no quantitative examination of the underlying reasons for differences in MTO’s effects across sites. We propose doubly robust and efficient estimators to predict total, direct, and indirect effects of treatment in a new site based on data in source sites. The extent to which these predicted estimates correspond with the observed estimates can shed light on reasons for site differences in intervention effects.

Inferential LASSO in Single Case Experimental Design to Estimate Effect Size

A seminar by Jay Verkuilen, Associate Professor, CUNY

February 26, 2020


Single case experimental design (SCED) is widely used in areas such as behavior modification, rehabilitation medicine, or training of special participants. The general goal in most analyses of SCED data is to assess the effect size of the intervention, often for subsequent processing in, for example, a meta-analysis. The design itself is a strong one for this purpose, particularly randomized multiple baseline designs (MBD). However, SCEDs tend to be fairly small N designs due to the nature of the target population and intervention. This means that regression analysis of SCED data is already running fairly lean in terms of error df even before one factors in covariates needed to model trend, external events, or other nuisance data features. These additional variables are not typically of substantive interest but need to be accounted for properly. Linear and generalized linear models do not cope with this situation well and fail completely when the number of predictors, P, exceeds N. Many machine learning methods have been devised to address this problem, which occurs in other areas such as genetic microarray studies, text mining, and neuroscience. These methods all incorporate some kind of regularization to provide enough structure to allow estimation to proceed. They are also useful even when N > P to avoid overfitting or to help determine if a visual object on a graph is "real" or not. However, they are fundamentally predictive in nature and do not work well when the goal involves interpreting the resulting regression components. Recent research has focused on adapting methods such as tree-based analyses or regularized regression to causal effect estimation. While the talk provides a conceptual overview to the problem of high dimension broadly, I focus on the inferential LASSO (e.g., Belloni, Chernozhukov, & Hansen, 2014) as a promising new method that is a straightforward extension of multiple regression. (Joint work with Mariola Moeyaert, SUNY Albany).

Measuring Poverty

A seminar by Chaitra Nagaraja, Associate Professor, Fordham University

February 12, 2020


The development of any measure, particularly an official one, is a mixture of ideology, convenience, and chance. To illustrate this, I will use poverty measurement as an example. The primary choice in measuring poverty is between taking an absolute versus a relative approach. The former is favored by the U.S. whereas the latter is preferred by the European Union. In this talk, I will compare various measures (with a focus on the U.S.) using a historical perspective to understand the effects of those statistics on a country’s residents.


Born in the Wrong Months? The role of Kindergarten entrance age cut-off in students’ academic progress in NYC public schools

A seminar by Ying Lu, Associate Professor, New York University
November 20, 2019 


The age cut-off for public Kindergarten entrance in New York City is December 31, while the common practice in the country is to have the cut-off in September or earlier. This means that, on average, about a quarter of NYC public school Kindergarteners start formal schooling in a public school setting before turning five. Extensive research has suggested that children exhibit different social and cognitive developmental trajectories in early childhood. In particular, students who start Kindergarten at an older age (earlier birth month) are better prepared socially and cognitively for formal schooling. On the other hand, other research argues that the relative advantage of age disappears as students get older. In this paper, we use proprietary data from the NYC DOE to show how birth month plays an important role in determining the path of children’s academic progress. Following a birth cohort of students (born in 2005) from Kindergarten through 7th grade, and using discrete event history analysis, we show that students born in later months, especially those born after September 1 (entering Kindergarten before turning 5), show a higher risk of repeating grades (whether voluntarily or involuntarily) and of being classified into the special education category throughout elementary school. The academic progression gap widens further when considering other factors such as students’ race, gender, and socioeconomic backgrounds. We further use longitudinal growth curve models to explore patterns in students’ grade-level achievement over time (3rd to 7th grade Common Core tests), considering their ages at Kindergarten entrance, their experiences of academic progression (whether they had ever been held back), and the interplay of these factors with students’ demographic and socioeconomic characteristics. 
A regression discontinuity design was also employed to explore the impact of holding very young students back at earlier grades on their academic achievement trajectories.
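Discrete event history analysis of grade retention usually starts by expanding each student's record into one row per grade in which the student is still "at risk" of a first retention. The sketch below is a minimal illustration, not the study's specification. It assumes hypothetical records giving the grade of first retention (or `None` if the student was never held back), with Kindergarten coded as grade 0. The function names are illustrative.

```python
from collections import defaultdict

def person_period(records, first_grade=0, last_grade=7):
    """Expand (student_id, event_grade) records into person-period rows.

    event_grade is the grade in which the student was first held back, or None
    if the student was never held back (censored after last_grade).
    Each output row is (student_id, grade, event), with event = 1 only in the
    grade of first retention; the student leaves the risk set afterwards.
    """
    rows = []
    for student_id, event_grade in records:
        stop = last_grade if event_grade is None else event_grade
        for grade in range(first_grade, stop + 1):
            rows.append((student_id, grade, int(grade == event_grade)))
    return rows

def hazard_by_grade(rows):
    """Empirical discrete-time hazard: share of at-risk students retained per grade."""
    at_risk = defaultdict(int)
    events = defaultdict(int)
    for _, grade, event in rows:
        at_risk[grade] += 1
        events[grade] += event
    return {grade: events[grade] / at_risk[grade] for grade in sorted(at_risk)}

# Hypothetical example: s1 first held back in grade 2, s2 never held back.
rows = person_period([("s1", 2), ("s2", None)], last_grade=3)
print(hazard_by_grade(rows))
```

A logistic regression fitted to these rows, with grade indicators plus covariates such as birth month, gives the usual discrete-time hazard model.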

Sensitivity analyses for unobserved effect moderation when generalizing from trial to population

November 6, 2019 
A seminar by Elizabeth Stuart, Associate Dean and Professor at Johns Hopkins


In the presence of treatment effect heterogeneity, the average treatment effect (ATE) in a randomized controlled trial (RCT) may differ from the average effect of the same treatment if applied to a target population of interest. But for policy purposes we may desire an estimate of the target population ATE. If all treatment effect moderators are observed in the RCT and in a dataset representing the target population, then we can obtain an estimate of the target population ATE by adjusting for the difference in the distribution of the moderators between the two samples. However, that is often an unrealistic assumption in practice. This talk will discuss methods for generalizing treatment effects under that assumption, as well as sensitivity analyses for two situations: (1) where we cannot adjust for a specific moderator observed in the RCT because we do not observe it in the target population; and (2) where we are concerned that the treatment effect may be moderated by factors not observed even in the RCT. Outcome-model and weighting-based sensitivity analysis methods are presented. The methods are applied to examples in drug abuse treatment. Implications for study design and analysis are also discussed for settings where interest lies in a target population ATE.
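Take the simplest case of the adjustment described above: a single discrete moderator observed in both the trial and the target population. The target-population ATE is then the stratum-specific trial effects reweighted by the target population's stratum shares. The sketch below is minimal and makes several assumptions. The strata, outcomes, and target shares are hypothetical, and every stratum must contain both treated and control units. It does not implement the talk's sensitivity analyses.

```python
def stratum_effects(trial):
    """Treated-minus-control difference in mean outcomes within each stratum.

    trial: list of (stratum, treated, outcome) tuples from a randomized trial,
    where treated is 1 or 0. Every stratum must contain both arms.
    """
    sums = {}
    for stratum, treated, outcome in trial:
        arms = sums.setdefault(stratum, {0: [0.0, 0], 1: [0.0, 0]})
        arms[treated][0] += outcome
        arms[treated][1] += 1
    return {k: v[1][0] / v[1][1] - v[0][0] / v[0][1] for k, v in sums.items()}

def generalized_ate(effects, shares):
    """ATE for a population with the given stratum shares (shares sum to 1)."""
    return sum(effects[k] * share for k, share in shares.items())

trial = [
    ("young", 1, 3.0), ("young", 1, 5.0), ("young", 0, 1.0), ("young", 0, 1.0),
    ("old", 1, 2.0), ("old", 1, 2.0), ("old", 0, 2.0), ("old", 0, 2.0),
]
effects = stratum_effects(trial)               # {"young": 3.0, "old": 0.0}
trial_shares = {"young": 0.5, "old": 0.5}      # composition of the trial sample
target_shares = {"young": 0.2, "old": 0.8}     # hypothetical target population
print(generalized_ate(effects, trial_shares), generalized_ate(effects, target_shares))
```

The two numbers differ only because the moderator is distributed differently in the two populations. This is the gap that the sensitivity analyses address when the moderator cannot be observed in one of the samples.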

Permutation Weighting: A classification-based approach to balancing weights

October 23, 2019 
A seminar by Drew Dimmery from Facebook


This work provides a new lens through which to view balancing weights for observational causal inference: as approximating a notional target trial. We formalize this intuition and show that our approach, Permutation Weighting, provides a new way to estimate many existing balancing weights. This allows the estimation of weights through a standard binary classifier (regardless of the cardinality of the treatment). Arbitrary probabilistic classifiers may be used in this method; the hypothesis space of the classifier corresponds to the nature of the balance constraints imposed through the resulting weights. We provide theoretical results that bound bias and variance in terms of the regret of the classifier, show that these disappear asymptotically, and demonstrate that our classification problem directly minimizes imbalance. Since a wide variety of existing methods may be estimated through this regime, the approach allows for direct model comparison between balancing weights (both existing methods and new ones) based on classifier loss, as well as hyper-parameter tuning using cross-validation. We compare estimating weights with permutation weighting to minimizing the classifier risk of a propensity score model for inverse propensity score weighting and show that the latter does not necessarily imply minimal imbalance on covariates. Finally, we demonstrate how the classification-based view provides a flexible mechanism to define new balancing weights; we demonstrate this with balancing weights based on gradient-boosted decision trees and neural networks. Simulation and empirical evaluations indicate that permutation weighting outperforms existing weighting methods for causal effect estimation.
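The classification idea can be seen in a toy setting. Observed (covariate, treatment) pairs form one class. Copies with the treatment column randomly permuted form the other class, because permuting breaks any covariate-treatment dependence. The weight for each unit is the classifier's odds of the "permuted" class. The sketch below is minimal and makes strong simplifying assumptions. There is one discrete covariate, and the "classifier" is a saturated cell-frequency model rather than the logistic, boosted-tree, or neural classifiers mentioned above. The function name and data are hypothetical.

```python
import random
from collections import Counter

def permutation_weights(x, a, n_perm=500, seed=0):
    """Permutation weights for a discrete covariate x and discrete treatment a.

    Observed (x, a) pairs are class 0; pairs with a randomly permuted treatment
    column are class 1. The saturated "classifier" estimates P(class 1 | x, a)
    from cell counts, with the observed data replicated n_perm times so both
    classes have equal size. Each cell's weight is the odds p / (1 - p).
    """
    rng = random.Random(seed)
    observed = Counter(zip(x, a))
    permuted = Counter()
    shuffled = list(a)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        permuted.update(zip(x, shuffled))
    weights = {}
    for cell, n_obs in observed.items():
        p = permuted[cell] / (permuted[cell] + n_perm * n_obs)
        weights[cell] = p / (1 - p)
    return weights

# Hypothetical data: treatment is much more common when x = 1.
x = [0] * 100 + [1] * 100
a = [1] * 30 + [0] * 70 + [1] * 70 + [0] * 30
print(permutation_weights(x, a))
```

With this saturated classifier, the weights approach P(A = a) / P(A = a | X = x), the stabilized inverse propensity weights (here 0.5/0.3 and 0.5/0.7). This illustrates how an existing weighting scheme can be recovered. Swapping in a less flexible classifier changes which balance constraints the weights enforce.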

Secrecy, Criminal Justice, and Variable Importance

September 25, 2019
A seminar by Cynthia Rudin, Professor of Computer Science, Electrical and Computer Engineering, and Statistical Science at Duke University


The US justice system often uses a combination of (biased) human decision makers and complicated black-box proprietary algorithms for high-stakes decisions that deeply affect individuals. All of this is still happening, despite the fact that for several years we have known that interpretable machine learning models are just as accurate as any complicated machine learning method for predicting criminal recidivism. It is much easier to debate the fairness of an interpretable model than that of a proprietary model. In 2016, ProPublica accused the most popular proprietary model, COMPAS, of being racially biased, but their analysis was flawed and the true story is much more complicated: their analysis relied on a flawed definition of variable importance to identify race as an important variable. In this talk, I will start by introducing a very general form of variable importance, called model class reliance. Model class reliance measures how important a variable is to any sufficiently accurate predictive model within a class. I will use this and other data-centered tools to provide our own investigation of whether COMPAS depends on race, and what else it depends on. Through this analysis, we find another problem with using complicated proprietary models: they seem to be often miscomputed. An easy fix to all of this is to use interpretable (transparent) models instead of complicated or proprietary models in criminal justice.
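Model class reliance is built from a simpler quantity: the reliance of one fixed model on one variable. This is the model's loss when that variable is scrambled, divided by its original loss. The sketch below covers only that single-model building block. It uses random permutation as a Monte Carlo approximation and squared-error loss, both of which are assumptions. Model class reliance would take the minimum and maximum of this ratio over all models in the class whose loss is close to the best achievable. The data and model are hypothetical.

```python
import random

def mse(model, X, y):
    """Mean squared error of model (a function of one row) on rows X."""
    return sum((model(row) - target) ** 2 for row, target in zip(X, y)) / len(y)

def model_reliance(model, X, y, j, n_perm=50, seed=0):
    """Average loss after permuting feature j, divided by the original loss.

    A value of 1 means the model does not use feature j at all; larger values
    mean predictions degrade when the feature is scrambled.
    """
    rng = random.Random(seed)
    base = mse(model, X, y)
    column = [row[j] for row in X]
    permuted_loss = 0.0
    for _ in range(n_perm):
        rng.shuffle(column)
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        permuted_loss += mse(model, X_perm, y)
    return permuted_loss / n_perm / base

# Hypothetical data: y depends on feature 0 only.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(500)]
y = [2 * x0 + data_rng.gauss(0, 0.5) for x0, _ in X]
model = lambda row: 2 * row[0]   # uses feature 0, ignores feature 1
print(model_reliance(model, X, y, 0), model_reliance(model, X, y, 1))
```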

Health Benefits of Reducing Air Traffic Pollution: Evidence from Changes in Flight Paths

September 18, 2019 
A seminar by Augustin de Coulon, IZA Institute of Labor Economics 


This paper investigates the health externalities generated by air transportation pollution. As a source of exogenous variation, we use an unannounced five-month trial that reallocated early morning aircraft landings at London Heathrow airport. Our measure of health is the prescription of medications for conditions known to be aggravated by pollution, especially sleep disturbances. Compared to the control regions, we observe a significant and substantial decrease in prescribed drugs for respiratory and central nervous system disorders in the areas subjected to reduced air traffic between 4:30 am and 6:00 am. Our findings therefore suggest a causal influence of air traffic on health conditions.

Data Tripper: Authorship Attribution Analysis of Lennon-McCartney Songs

September 6, 2019
A seminar by Mark Glickman, Senior Lecturer in Statistics at Harvard


The songwriting duo of John Lennon and Paul McCartney, two of the founding members of the Beatles, composed some of the most popular and memorable songs of the last century. Although their songs were credited jointly to Lennon-McCartney, it is well documented that most of their songs, or portions of songs, were written primarily by one of the two. Some Lennon-McCartney songs are of disputed authorship. For Lennon-McCartney songs of known and unknown authorship written and recorded over the period 1962-66, we extracted musical features from each song or song portion. These features consist of the occurrence of melodic notes, chords, melodic note pairs, chord change pairs, and four-note melody contours. We developed a prediction model based on variable screening followed by logistic regression with elastic net regularization. We applied our model to the prediction of songs and song portions with unknown or disputed authorship.
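Two of the listed feature types can be computed from symbolic transcriptions: chord change pairs and four-note melody contours. The sketch below is minimal and rests on several assumptions. A song is represented as a list of chord symbols and a list of MIDI pitch numbers. A "chord change pair" is two consecutive, different chords. A "contour" is the up/down/same pattern across four consecutive notes. The study's actual encoding may differ. The resulting relative frequencies are the kind of predictors that would feed the screening and elastic-net logistic regression step.

```python
from collections import Counter

def chord_change_pairs(chords):
    """Count transitions between consecutive chords that actually change."""
    return Counter((a, b) for a, b in zip(chords, chords[1:]) if a != b)

def melody_contours(pitches, length=4):
    """Count the up (U) / down (D) / same (S) pattern of each window of notes.

    pitches: MIDI note numbers (e.g., 60 = middle C).
    """
    def step(a, b):
        return "U" if b > a else "D" if b < a else "S"
    windows = (pitches[i:i + length] for i in range(len(pitches) - length + 1))
    return Counter(tuple(step(a, b) for a, b in zip(w, w[1:])) for w in windows)

def relative_frequencies(counts):
    """Convert raw counts to proportions so songs of different lengths compare."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}

print(relative_frequencies(chord_change_pairs(["C", "C", "G", "Am", "F", "C"])))
print(melody_contours([60, 62, 64, 65, 64]))
```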

This talk is co-sponsored by the NYU Stern Department of Technology, Operations, and Statistics

Urban Modeling's Future - a Big Data Reality

May 8, 2019 

A seminar by Debra Laefer, Professor of Civil and Urban Engineering, New York University


Until recently, urban modeling has relied on relatively simplified models, a consequence of data collection limitations and computing barriers. Consequently, two streams of modeling have emerged. At a local level, highly detailed Building Information Modeling has dominated. At a broader scale, CityGML has been the major player. The absence of key pieces of data and major inconsistencies between the respective schemas of the two systems prevent their interoperability. While efforts continue to align the systems, recent tandem advances in remote sensing technology and distributed computing now offer a complete circumvention of those problems and have lifted previous restrictions on data acquisition and processing. This lecture will show the emerging state of the art in remote sensing and big data computing, present the clear value of such a workflow, and discuss the remaining challenges on both the remote sensing and the computing side.

How Education Systems Undermine Gender Equity

May 1, 2019

A seminar by Joseph Cimpian, Associate Professor of Economics and Education Policy, New York University


 From the time students enter kindergarten, teachers overestimate the abilities of boys in math, relative to behaviorally and academically matched girls, contributing to a gender gap favoring boys in both math achievement and confidence. Using data from numerous nationally representative studies spanning kindergarten through university level, as well as experimental evidence, I demonstrate how girls and young women face discrimination and bias throughout their academic careers and suggest that a substantial portion of the growth in the male–female math achievement gap is socially constructed. Each of the studies leads to a broader set of considerations about why females are viewed as less intellectually capable than their male peers. The studies also demonstrate that biases can be exhibited and perpetuated by members of negatively stereotyped groups (e.g., female teachers demonstrate greater bias against girls than do male teachers), and raise questions about the root causes of their biases and the long-term effects of being negatively stereotyped oneself. This research also suggests that comparing boys and girls on metrics such as standardized tests and grades may contribute to a false belief that education systems promote the success of females. Together, the studies suggest several implications for research, teacher professional development, and policy.

Modelling intergenerational exchanges using models for multivariate longitudinal data with latent variables in the presence of zero excess.

April 17, 2019

A seminar by Irini Moustaki, Professor and Deputy Head of Department (Teaching) at London School of Economics


In this talk we will discuss some preliminary results from modelling dyadic data that provide information on intergenerational exchanges in the UK. We will use longitudinal data from three waves of the UK Household Longitudinal Study to study and explain associations between exchanges of support from respondents to their parents and to their children. The data have a dyadic structure, they are collected over time, and they are multivariate because constructs of interest are measured by multiple indicators. Support is measured by a set of binary indicators of different kinds of help.
We propose two different joint models of bidirectional exchanges with support given and support received treated as a multivariate response, and covariances between responses measuring the extent of reciprocation between generations. Moreover, joint modelling of longitudinal data allows for the possibility that reciprocation may occur contemporaneously or may be postponed until the donor is in need of help or the recipient is in a position to reciprocate.

Difference-in-Differences Estimates of Demographic Processes

April 10, 2019

A seminar by Lawrence Wu, Professor of Sociology and Director of NYU Population Center, New York University


We examine difference-in-differences procedures for estimating the causal effect of treatment when the outcome is a single-decrement demographic process. We use the classic case of two groups and two periods to contrast a standard, widely used linear probability difference-in-differences estimator with an analogous proportional hazard difference-in-differences estimator. Formal derivations and illustrative examples show that the linear probability estimator is inconsistent, yielding estimates that, for example, evolve with time since treatment. We conclude that knowledge of how the data are generated is a necessary component of causal inference.
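A small numerical illustration shows why the two estimators can disagree. Suppose hazards are constant within each group and period, and treatment multiplies group B's hazard in the post period by a fixed ratio. The difference-in-differences of log hazards then recovers the log hazard ratio exactly. The difference-in-differences of event probabilities changes with the exposure time at which it is measured. This is a minimal sketch, not the talk's derivation. The exponential waiting times, the exposure-time horizon `t`, and all numeric values are assumptions.

```python
import math

def did(a_pre, a_post, b_pre, b_post):
    """Difference-in-differences: change in group B minus change in group A."""
    return (b_post - b_pre) - (a_post - a_pre)

# Hypothetical constant hazards: group effect x period effect x treatment effect,
# where treatment applies only to group B in the post period.
BASE = {"A": 0.05, "B": 0.08}
PERIOD = {"pre": 1.0, "post": 1.2}
HAZARD_RATIO = 1.5
CELLS = [("A", "pre"), ("A", "post"), ("B", "pre"), ("B", "post")]

def hazard(group, period):
    treated = group == "B" and period == "post"
    return BASE[group] * PERIOD[period] * (HAZARD_RATIO if treated else 1.0)

# Proportional-hazard DiD: on the log-hazard scale the effect is constant.
log_hazard_did = did(*(math.log(hazard(g, p)) for g, p in CELLS))

def linear_probability_did(t):
    """DiD on P(event within exposure time t) = 1 - exp(-hazard * t)."""
    return did(*(1 - math.exp(-hazard(g, p) * t) for g, p in CELLS))

print("log-hazard DiD:", log_hazard_did, "vs log(HR):", math.log(HAZARD_RATIO))
for t in (1, 5, 10, 20):
    print("t =", t, "linear probability DiD:", round(linear_probability_did(t), 4))
```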

Statistics of Police Shootings and Racial Profiling

April 3, 2019

A seminar by Gregory Ridgeway, Professor of Criminology and Statistics, University of Pennsylvania

The police are chronically a topic of heated debate. However, the statistical analyses brought to bear on questions of police fairness rarely provide clarity on, or solutions to, the problems. This talk will cover statistical methods for estimating racial bias in traffic stops, identifying problematic cops, and determining which officers are most at risk for police shootings. All of these methods have been part of investigations of police departments in Oakland, Cincinnati, and New York, and they show that statistics has an important role to play in prominent crime and justice policy questions.
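One widely cited method for estimating racial bias in traffic stops is the "veil of darkness" test associated with Grogger and Ridgeway. The idea is to compare the racial composition of stopped drivers in daylight and in darkness, restricted to clock times that fall in daylight on some dates and in darkness on others. The assumption is that officers can see a driver's race more easily in daylight. Whether the talk used exactly this form is an assumption. The sketch reduces the test to a single odds ratio on hypothetical counts. Applied analyses typically use a regression that also adjusts for clock time, day of week, and location.

```python
def veil_of_darkness_odds_ratio(day_minority, day_other, dark_minority, dark_other):
    """Odds that a stopped driver is a minority driver in daylight vs. darkness.

    Counts should come only from the inter-twilight window, so that clock time
    is comparable. A ratio above 1 is consistent with visible race affecting
    who is stopped; a ratio near 1 is not.
    """
    return (day_minority / day_other) / (dark_minority / dark_other)

# Hypothetical stop counts within the inter-twilight window.
print(veil_of_darkness_odds_ratio(300, 700, 250, 750))
```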

Statistical Intuitions and the Reproducibility Crisis in Science

February 27, 2019

A seminar by Eric Loken, Associate Professor, University of Connecticut


Science is responding well to the so-called reproducibility crisis with positive improvements in methodology and transparency. Another area for improvement is awareness of statistical issues impacting inference. We explore how some problematic intuitions about measurement, statistical power, multiple analyses, and levels of analysis can affect the interpretation of research results, perhaps leading to mistaken claims.
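A short simulation illustrates one of the problematic intuitions about statistical power. In a low-powered study, the estimates that happen to reach statistical significance systematically overstate the true effect, an exaggeration sometimes called a type M (magnitude) error. This is a minimal sketch with hypothetical values: a true effect of 0.1 estimated with a standard error of 0.1 and a two-sided 5% threshold. It is not taken from the talk.

```python
import random

def exaggeration_ratio(true_effect, se, n_sims=100_000, seed=0):
    """Simulate estimates ~ Normal(true_effect, se) and keep the 'significant' ones.

    Returns (power, average |estimate| / true_effect among significant estimates).
    """
    rng = random.Random(seed)
    significant = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, se)
        if abs(estimate) > 1.96 * se:          # two-sided p < 0.05
            significant.append(abs(estimate) / true_effect)
    if not significant:
        raise ValueError("no significant estimates; increase n_sims")
    return len(significant) / n_sims, sum(significant) / len(significant)

power, ratio = exaggeration_ratio(true_effect=0.1, se=0.1)
print(f"power ~ {power:.2f}; significant estimates overstate the effect ~{ratio:.1f}x")
```

Because the significance filter discards every estimate smaller than 1.96 standard errors, all surviving estimates in this setting are at least about twice the true effect.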

Quantitative Measures to Assess Community Engagement in Research

February 6, 2019

A seminar by Melody Goodman, Associate Professor of Biostatistics, New York University


The utility of community-engaged health research has been well established. However, measurement and evaluation of community engagement in research activities (patient/stakeholder perceptions of the benefit of collaborations that indicate how engaged the patient/stakeholder feels) has been limited. The level of community engagement across studies can vary greatly, from minimal engagement to fully collaborative partnerships. Methods for measuring the level of community engagement in research are still emerging, likely because of a lack of existing measures for assessing stakeholder engagement. There is a need to rigorously evaluate the impact of community/stakeholder engagement on the development, implementation, and outcomes of research studies, which requires the development, validation, and implementation of tools that can be used to assess stakeholder engagement.

We use community-engaged research approaches and a mixed-methods (qualitative/quantitative) study design to validate a measure of the level of community engagement in research studies from the stakeholder perspective. As part of the measurement validation process, we are conducting a series of web-based surveys of community members/community health stakeholders who have participated in previous community-engaged research studies. We examined content validity through a five-round modified Delphi process to reach consensus among experts; the participant surveys assess construct validity and internal consistency of the measure.
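Internal consistency of a multi-item measure is commonly summarized with Cronbach's alpha. This statistic compares the sum of the item variances to the variance of the total score. Whether this particular study reports alpha is an assumption. The sketch below shows the standard formula on hypothetical item scores. The input holds one list per item, and each respondent appears at the same position in every list.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for item-score columns (one list of scores per item).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(scores) for scores in zip(*items)]   # total score per respondent
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Hypothetical responses from four people to a two-item scale.
print(cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]]))
```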

Research that develops standardized, reliable, and accurate measures to assess community engagement is essential to understanding the impact of community engagement on the scientific process and scientific discovery. Implementation of gold standard quantitative measures to assess community engagement in research would make a major contribution to community-engaged science. These measures are necessary to assess associations between community engagement and research outcomes.


Decision-driven sensitivity analyses via Bayesian optimization

A seminar by Russell Steele, Associate Professor, McGill

December 5, 2018


Every statistical analysis requires at least some subjective or untestable assumptions. For example, in Bayesian modelling, the analysis requires specification of hyperparameters for prior distributions, which are intended either to reflect subjective beliefs about the model or to reflect relative ignorance about the model under a certain notion of ignorance. Similarly, causal models require assumptions about parameters related to unmeasured confounding. Violations of these untestable or subjective assumptions can invalidate the conclusions of analyses or lead to conclusions that only hold for a narrow range of choices for those assumptions. Currently, researchers compute several estimates based on either multiple “reasonable” values or a wide range of “possible” values for these inestimable parameters. Even when the dimension of the inestimable parameter space is relatively small, sensitivity analyses are generally not conducted systematically and may either waste valuable computational time on choices that lead to roughly the same inference or miss values of those parameters that would change the conclusions of the analysis.

In this talk, I will propose the use of Bayesian optimization approaches for decision-driven sensitivity analyses. We assume that a decision will be made as a function of the estimates or predictions from a particular model that relies on inestimable parameters. Rather than relying on non-systematically chosen values for the sensitivity analysis, we use a Bayesian optimization approach to identify partitions of the space of inestimable parameter values where the decision, based on the observed data and the assumed parameter values, changes. We will illustrate our proposed approach on a hierarchical Bayesian meta-analysis example from the literature.
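The goal of locating where a decision flips, rather than evaluating arbitrary values, can be illustrated in one dimension. This sketch does not implement Bayesian optimization, which targets expensive, multi-dimensional searches. It substitutes plain bisection over a single assumed parameter, and it assumes the decision changes only once on the search interval. The hypothetical decision is whether an inverse-variance pooled meta-analytic effect exceeds a threshold. The parameter varied is an assumed between-study standard deviation `tau`. All study values are made up.

```python
def pooled_mean(estimates, std_errors, tau):
    """Inverse-variance pooled effect for an assumed between-study SD tau."""
    weights = [1.0 / (se ** 2 + tau ** 2) for se in std_errors]
    return sum(w * y for w, y in zip(weights, estimates)) / sum(weights)

def decision_boundary(f, lo, hi, tol=1e-8):
    """Bisection for the point in [lo, hi] where the decision f(x) > 0 flips."""
    lo_decision = f(lo) > 0
    if lo_decision == (f(hi) > 0):
        raise ValueError("decision does not change on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) > 0) == lo_decision:
            lo = mid          # same decision as the left end: boundary is to the right
        else:
            hi = mid
    return (lo + hi) / 2

estimates = [0.5, 0.1, -0.05]      # hypothetical study effects
std_errors = [0.1, 0.3, 0.3]
threshold = 0.3                    # hypothetical rule: adopt if pooled effect > 0.3
tau_star = decision_boundary(
    lambda tau: pooled_mean(estimates, std_errors, tau) - threshold, 0.0, 10.0)
print(f"decision flips at tau ~ {tau_star:.4f}")
```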

The work that will be presented was done in collaboration with Louis Arsenault-Mahjoubi, an undergraduate mathematics and statistics student at McGill University.

Omitted and included variable bias in tests for disparate impact

A seminar by Ravi Shroff, Assistant Professor of Applied Statistics, New York University 

November 14, 2018


Policymakers often seek to gauge discrimination against groups defined by race, gender, and other protected attributes. A common strategy is to estimate disparities after controlling for observed covariates in a regression model. However, not all relevant factors may be available to researchers, leading to omitted variable bias. Conversely, controlling for all available factors may also skew results, leading to so-called "included variable bias". We introduce a simple strategy, which we call risk-adjusted regression, that addresses both concerns in settings where decision makers have clear and measurable policy objectives. First, we use all available covariates to estimate the expected utility of possible decisions. Second, we measure disparities after controlling for these utility estimates alone, omitting other factors. Finally, we examine the sensitivity of results to unmeasured confounding. We demonstrate this method on a detailed dataset of 2.2 million police stops of pedestrians in New York City.
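The first two steps of risk-adjusted regression can be sketched with ordinary least squares. First, estimate risk from all available covariates. Second, regress the decision on group membership and the risk estimate alone. The sketch uses simulated data in which decisions depend on group only through a risk-relevant covariate. As a result, the raw group gap is large while the risk-adjusted gap is near zero. Several choices here are assumptions made for brevity: linear models in both steps, a single covariate, and a continuous stand-in for the decision. The paper's sensitivity analysis for unmeasured confounding is not shown.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients with an intercept (intercept first)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def risk_adjusted_disparity(covariates, outcome, group, decision):
    # Step 1: estimate risk of the outcome from all available covariates.
    b = ols(covariates, outcome)
    risk = np.column_stack([np.ones(len(outcome)), covariates]) @ b
    # Step 2: regress the decision on group and estimated risk only.
    return ols(np.column_stack([group, risk]), decision)[1]

rng = np.random.default_rng(0)
n = 20_000
group = rng.integers(0, 2, n)
x = rng.normal(0, 1, n) + 0.8 * group                     # covariate correlated with group
outcome = 0.3 * x + rng.normal(0, 1, n)                   # e.g., whether contraband is found
decision = 0.2 + 1.5 * 0.3 * x + rng.normal(0, 0.5, n)    # depends on risk, not group

raw_gap = decision[group == 1].mean() - decision[group == 0].mean()
adjusted_gap = risk_adjusted_disparity(x, outcome, group, decision)
print(f"raw gap {raw_gap:.3f}, risk-adjusted gap {adjusted_gap:.3f}")
```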

Structural Equation Modeling in Stata

A seminar by Chuck Huber, Director of Statistical Outreach, Stata Corp

October 31, 2018

Co-sponsored with CUNY Grad Center EPSY


This talk introduces the concepts and jargon of structural equation modeling (SEM) including path diagrams, latent variables, endogenous and exogenous variables, and goodness of fit. I demonstrate how to fit many familiar models such as linear regression, multivariate regression, logistic regression, confirmatory factor analysis, and multilevel models using -sem-. I wrap up by demonstrating how to fit structural equation models that contain both structural and measurement components.

Adaptive Designs in Clinical Trials: An Introduction and Example

A seminar by Leslie McClure, Professor, Chair of the Department of Epidemiology and Biostatistics, and Associate Dean for Faculty Affairs, Drexel

October 24, 2018


Planning for randomized clinical trials relies on assumptions that are often incorrect, leading to inefficient designs that could spend resources unnecessarily. Recently, trialists have been advocating for implementation of adaptive designs, which allow researchers to modify some aspect of their trial part-way through the study based on accumulating data. In this talk, I will introduce the concept of adaptive designs and describe several different adaptations that can be made in clinical trials. I will then describe a real-life example of a sample size re-estimation from the Secondary Prevention of Small Subcortical Strokes (SPS3) study, describe the statistical impact of implementing this design change, and describe the effect of the adaptation on the practical aspects of the study.
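A sample size re-estimation replaces a design-stage assumption with information from accumulating data. The abstract does not give the details of the SPS3 re-estimation, which may have involved a different outcome type and formula. The sketch below shows a generic variance-based version for a continuous outcome under the normal approximation. A per-arm sample size is computed from an assumed standard deviation and then recomputed with a hypothetical interim estimate. All values are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_arm(delta, sigma, alpha=0.05, power=0.80):
    """Per-arm n for a two-sample comparison of means (normal approximation).

    n = 2 * (z_{1 - alpha/2} + z_{power})^2 * sigma^2 / delta^2, rounded up.
    """
    z = NormalDist().inv_cdf
    return math.ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 * sigma ** 2 / delta ** 2)

planned = n_per_arm(delta=0.5, sigma=1.0)        # sigma assumed at the design stage
re_estimated = n_per_arm(delta=0.5, sigma=1.2)   # hypothetical interim estimate of sigma
print(planned, re_estimated)
```

Because n scales with the variance, a 20% larger standard deviation than planned raises the required sample size by roughly 44%. This is the kind of planning error an adaptive re-estimation is meant to catch.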

Disrupting Education? Experimental Evidence on Technology-Aided Instruction in India

A seminar by Alejandro Ganimian, Assistant Professor of Applied Psychology and Economics, New York University 

May 2, 2018


We present experimental evidence on the impact of a personalized technology-aided after-school instruction program on learning outcomes. Our setting is middle-school grades in urban India, where a lottery provided winning students with a voucher to cover program costs. We find that lottery winners scored 0.36σ higher in math and 0.22σ higher in Hindi relative to lottery losers after just 4.5 months of access to the program. IV estimates suggest that attending the program for 90 days would increase math and Hindi test scores by 0.59σ and 0.36σ respectively. We find similar absolute test score gains for all students, but the relative gain was much greater for academically weaker students because their rate of learning in the control group was close to zero. We show that the program was able to effectively cater to the very wide variation in student learning levels within a single grade by precisely targeting instruction to the level of student preparation. The program was cost-effective, both in terms of productivity per dollar and per unit of time. Our results suggest that well-designed technology-aided instruction programs can sharply improve productivity in delivering education.
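IV estimates of this kind start from the lottery's intent-to-treat effect on test scores. They scale it by how much winning the lottery changed attendance. The sketch below is a minimal Wald (ratio) estimator on made-up data, not the study's data or exact specification. The rescaling to 90 days assumes a constant per-day effect, which is an additional assumption.

```python
from statistics import mean

def wald_iv(outcome, offer, takeup):
    """Wald/IV estimate: intent-to-treat effect divided by the first-stage effect.

    offer: 1 if the lottery was won, else 0.
    takeup: the treatment actually received (here, days attended).
    """
    def diff(values):
        won = [v for v, z in zip(values, offer) if z == 1]
        lost = [v for v, z in zip(values, offer) if z == 0]
        return mean(won) - mean(lost)
    return diff(outcome) / diff(takeup)

# Hypothetical data: test-score gains (in SD units) and days attended.
offer = [1, 1, 1, 1, 0, 0, 0, 0]
days = [90, 60, 30, 60, 0, 0, 0, 0]
score = [0.6, 0.4, 0.2, 0.4, 0.0, 0.1, -0.1, 0.0]

per_day = wald_iv(score, offer, days)
print(f"effect per day {per_day:.4f}; implied 90-day effect {90 * per_day:.2f}")
```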

BART for Causal Inference

A seminar by Jennifer Hill, Professor of Applied Statistics and Co-Director of PRIISM, New York University

April 28, 2018


There has been increasing interest in the past decade in use of machine learning tools in causal inference to help reduce reliance on parametric assumptions and allow for more accurate estimation of heterogeneous effects. This talk reviews the work in this area that capitalizes on Bayesian Additive Regression Trees, an algorithm that embeds a tree-based machine learning technique within a Bayesian framework to allow for flexible estimation and valid assessments of uncertainty. It will further describe extensions of the original work to address common issues in causal inference: lack of common support, violations of the ignorability assumption, and generalizability of results to broader populations. It will also describe existing R packages for traditional BART implementation as well as debut a new R package for causal inference using BART, bartCause.
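In the formulation commonly used for BART in causal inference, the response surface is modeled as a sum of regression trees over the covariates and the treatment indicator. Individual effects are read off by contrasting predictions under treatment and under control. The notation below is a sketch of that setup. Here $x_i$ are covariates, $z_i$ is the binary treatment, $T_j$ is the structure of tree $j$, $M_j$ is its set of leaf values, and $m$ is the number of trees.

$$
Y_i = f(x_i, z_i) + \varepsilon_i, \qquad f(x, z) = \sum_{j=1}^{m} g(x, z;\, T_j, M_j), \qquad \varepsilon_i \sim N(0, \sigma^2)
$$

$$
\tau_i = f(x_i, 1) - f(x_i, 0), \qquad \text{ATE} = \frac{1}{n} \sum_{i=1}^{n} \tau_i
$$

Each posterior draw of $f$ yields a draw of every $\tau_i$ and of the ATE, which is how uncertainty intervals for heterogeneous and average effects are obtained.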

Simulating a Marginal Structural Model

A seminar by Keith Goldfeld, Associate Professor, New York University Langone Health

February 28, 2018


In so many ways, simulation is an extremely useful tool to learn, teach, and understand the theory and practice of statistics. A series of examples (interspersed with minimal theory) will hopefully illuminate the underbelly of confounding, colliding, and marginal structural models. Drawing on the potential outcomes framework, the examples will use the R simstudy package, a tool that is designed to make data simulation as painless as possible.
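The talk's examples use the R simstudy package. The sketch below is a Python stand-in for the same kind of exercise, not simstudy code. It covers the simplest case of a marginal structural model, with a single time point. A binary confounder L drives both treatment A and outcome Y. The naive treated-versus-control difference is therefore biased. Fitting the MSM E[Y^a] = b0 + b1·a by inverse probability weighting recovers the true marginal effect. All coefficients are hypothetical. With time-varying treatments, each weight would instead be a product over time points.

```python
import random

def simulate(n, seed=0):
    """Simulate (L, A, Y): L confounds A and Y; the true marginal effect of A is 2."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l = int(rng.random() < 0.5)
        a = int(rng.random() < 0.2 + 0.6 * l)      # treatment more likely when L = 1
        y = 2.0 * a + 3.0 * l + rng.gauss(0, 1)
        rows.append((l, a, y))
    return rows

def naive_difference(rows):
    treated = [y for _, a, y in rows if a == 1]
    control = [y for _, a, y in rows if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def ipw_msm_effect(rows):
    """b1 from a weighted regression of Y on A: a difference of weighted means."""
    p_treat = {}
    for level in (0, 1):                           # P(A = 1 | L) by stratum frequency
        a_vals = [a for l, a, _ in rows if l == level]
        p_treat[level] = sum(a_vals) / len(a_vals)
    num = {0: 0.0, 1: 0.0}
    den = {0: 0.0, 1: 0.0}
    for l, a, y in rows:
        w = 1 / p_treat[l] if a == 1 else 1 / (1 - p_treat[l])
        num[a] += w * y
        den[a] += w
    return num[1] / den[1] - num[0] / den[0]

rows = simulate(50_000)
print(f"naive {naive_difference(rows):.2f}, IPW MSM {ipw_msm_effect(rows):.2f}")
```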

Graphs as Poetry

A seminar by Howard Wainer, Research Scientist, National Board of Medical Examiners

February 7, 2018


Visual displays of empirical information are too often thought to be just compact summaries that, at their best, can clarify a muddled situation. This is partially true, as far as it goes, but it omits the magic. We have long known that data visualization is an alchemist that can make good scientists great and transform great scientists into giants. In this talk we will see that sometimes, albeit too rarely, the combination of critical questions addressed by important data and illuminated by evocative displays can achieve a transcendent, and often wholly unexpected, result. At their best, visualizations can communicate emotions and feelings in addition to cold, hard facts.


Unraveling and Anticipating Heterogeneity: Single Subject Designs & Individualized Treatment Protocols

Leading experts in SSD, Causal & Bayesian Inference

November 3, 2017


This was a one-day symposium on Single Subject Designs (SSDs) and methods for their analysis. It brought together leading researchers in multilevel models, Bayesian modeling, and meta-analysis to discuss with leading practitioners who use SSDs both best practices and how to use results from single case designs to better inform larger-scale clinical trials in this field. These practitioners were drawn from special education and rehabilitation science, in particular Physical Therapy, Occupational Therapy, and Communication Sciences and Disorders.

Panel discussions were convened in which methodologists were paired with practitioners to discuss each phase of the science, from exploratory data analysis (related to designs employing graphical methods) through more general design aspects to analysis. Particular emphasis was given to research supporting Individualized Treatment Protocols. In addition, there were individual presentations on new methodology for these designs, and reports from practitioners on their ongoing clinical trials to spur additional discussion of appropriate methodology.

Introduction to Bayesian Analysis Using Stata

A seminar by Chuck Huber, Director of Statistical Outreach, Stata Corp

October 18, 2017


Bayesian analysis has become a popular tool for many statistical applications. Yet many data analysts have little training in the theory of Bayesian analysis or in the software used to fit Bayesian models. This talk provided an intuitive introduction to the concepts of Bayesian analysis and demonstrated how to fit Bayesian models using Stata. Specific topics included the relationship between likelihood functions, prior distributions, and posterior distributions; Markov chain Monte Carlo (MCMC) using the Metropolis-Hastings algorithm; and how to use Stata's bayes prefix to fit Bayesian models.
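The Metropolis-Hastings algorithm mentioned above can be written in a few lines for a one-parameter model. The sketch below is a random-walk sampler in Python, not Stata's implementation. It targets the posterior of a normal mean with known standard deviation 1 and a Normal(0, 10²) prior. Because that model is conjugate, the exact posterior mean, sum(y) / (n + 1/100), is available as a check. The data, step size, and burn-in length are hypothetical choices.

```python
import math
import random

def log_posterior(mu, data, prior_sd=10.0, sigma=1.0):
    """Log of likelihood x prior, up to an additive constant."""
    log_lik = -0.5 * sum((y - mu) ** 2 for y in data) / sigma ** 2
    log_prior = -0.5 * (mu / prior_sd) ** 2
    return log_lik + log_prior

def metropolis_hastings(data, n_iter=20_000, step=0.5, seed=1):
    """Random-walk Metropolis: propose mu' ~ N(mu, step^2), accept w.p. min(1, ratio)."""
    rng = random.Random(seed)
    mu = 0.0
    current = log_posterior(mu, data)
    draws = []
    for _ in range(n_iter):
        proposal = mu + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, data)
        # Symmetric proposal, so the acceptance ratio is just the posterior ratio.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        draws.append(mu)
    return draws

data = [2.1, 1.7, 2.5, 1.9, 2.3, 2.0, 1.8, 2.2]
draws = metropolis_hastings(data)[2000:]          # discard burn-in
print(sum(draws) / len(draws), sum(data) / (len(data) + 1 / 100))
```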

Embedding the Analysis of Observational Data for Causal Effects within a Hypothetical Randomized Experiment

Don Rubin, Professor of Statistics, Harvard

September 14, 2017 


Consider a statistical analysis that draws causal inferences using an observational data set, inferences that are presented as being valid in the standard frequentist senses; that is, an analysis that produces (a) point estimates, which are presented as being approximately unbiased for their estimands, (b) p-values, which are presented as being valid in the sense of rejecting true null hypotheses at the nominal level or less often, and/or (c) confidence intervals, which are presented as having at least their nominal coverage for their estimands. For the hypothetical validity of these statements (that is, if certain explicit assumptions were true, then the validity of the statements would follow), the analysis must embed the observational study in a hypothetical randomized experiment that created the observed data, or a subset of that data set. This is a multistage effort with thought-provoking tasks, especially in the first stage, which is purely conceptual. Other stages may often rely on modern computing to implement efficiently, but the first stage demands careful scientific argumentation to make the embedding plausible to thoughtful readers of the proffered statistical analysis. Otherwise, the resulting analysis is vulnerable to criticism for being simply a presentation of scientifically meaningless arithmetic calculations. In current practice, this perspective is rarely implemented with any rigor; for example, the first stage is often eschewed completely. Instead, analyses often appear to be conducted using computer programs run with limited consideration of the assumptions of the methods being used, producing tables of numbers with recondite interpretations, presented using jargon that may be familiar but also scientifically impenetrable. Somewhat paradoxically, the conceptual tasks, which are usually omitted in publications, would often be the most interesting to consumers of the analyses.
These points will be illustrated using the analysis of an observational data set addressing the causal effects of parental smoking on their children’s lung function. This presentation may appear provocative, but it is intended to encourage applied researchers, especially those working on problems with policy implications, to focus on important conceptual issues rather than on minor technical ones.

Multilevel modeling of single-subject experimental data: Handling data and design complexities

Mariola Moeyaert, Associate Professor, University at Albany

May 10, 2017


There has been a substantial increase in the use of single-subject experimental designs (SSEDs) over the last decade to provide detailed examination of the effects of interventions. Whereas group comparison designs focus on the average treatment effect at one point in time, SSEDs allow researchers to investigate the size and evolution of intervention effects at the individual level. In addition, SSED studies may be more feasible than group experimental studies because of logistical and resource constraints, or when studying a low-incidence or highly fragmented population.

To enhance generalizability, researchers replicate across subjects and use meta-analysis to pool effects from individuals. Our research group was one of the first to propose, develop and promote the use of multilevel models to synthesize data across subjects, allowing for estimation of the mean treatment effect, variation in effects over subjects and studies, and subject and study characteristic moderator effects (Moeyaert, Ugille, Ferron, Beretvas, & Van den Noortgate, 2013a, 2013b, 2014). Moreover, multilevel models can handle unstandardized and standardized raw data or effect sizes, linear and nonlinear time trends, treatment effects on time trends, autocorrelation and other complex covariance structures at each level.

This presentation considers multiple complexities in the context of hierarchical linear modeling of SSED studies including the estimation of the variance components, which tend to be biased and imprecisely estimated. Results of a recent simulation study using Bayesian estimation techniques to deal with this issue will be discussed (Moeyaert, Rindskopf, Onghena & Van den Noortgate, 2017).

Collaborative targeted learning using regression shrinkage

Mireille Schnitzer, Associate Professor, University of Montreal

May 3, 2017


Causal inference practitioners are routinely presented with the challenge of wanting to adjust for large numbers of covariates despite limited sample sizes. Collaborative Targeted Maximum Likelihood Estimation (CTMLE) is a general framework for constructing doubly robust semiparametric causal estimators that data-adaptively reduce model complexity in the propensity score in order to optimize a preferred loss function. This stepwise complexity reduction is based on a loss function placed on a strategically updated model for the outcome variable, assessed through cross-validation. New work involves integrating penalized regression methods into a stepwise CTMLE procedure that may allow for a more flexible type of model selection than existing variable selection techniques. Two new algorithms are presented and assessed through simulation. The methods are then used in a pharmacoepidemiology example evaluating the safety of asthma medication during pregnancy.

Remarks on the Mean-Difference Transformation and Bland-Altman Plot

Speaker: Jay Verkuilen, Associate Professor, CUNY

April 26, 2017


Tukey's mean-difference transformation and the Bland-Altman (BA) plot (e.g., Bland & Altman, 1986) are widely used in method comparison studies throughout the sciences, particularly in the health sciences. While they are intuitively appealing, easy to compute, and offer notable advantages over simply reporting coefficients such as the concordance coefficient or intraclass correlations, they exhibit unusual behavior. In particular, one often observes systematic trends in the BA plot, and the plot is highly sensitive to outliers, among other issues. The purpose of this talk is to propose and study a generative model that lays out the logic of the mean-difference transformation, and hence of the BA plot, indicating when and why systematic trends may occur. The model provides insight into when users should expect problems with the BA plot and suggests that it should not be applied in circumstances where a more informative design, such as instrumental variables, is necessary. I also suggest some improvements to the graphics based on semi-parametric regression methods and discuss how putting the BA plot in a Bayesian framework could be helpful.
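For readers unfamiliar with the transformation, here is a minimal sketch of the conventional BA computation: per-pair means and differences, the mean difference (bias), and limits of agreement at bias ± 1.96 SD. The function names are hypothetical, and the 1.96 multiplier is the usual convention rather than anything from the talk. This is not the speaker's generative model. The slope helper shows one simple way a systematic trend can appear, for example when one method reads proportionally lower than the other.

```python
import statistics

def bland_altman(method_a, method_b):
    """Mean-difference transform of paired measurements (hypothetical helper).

    Returns per-pair means, per-pair differences (a - b), the mean difference
    (bias), and conventional 95% limits of agreement (bias +/- 1.96 * SD).
    """
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    return means, diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)

def trend_slope(xs, ys):
    """Least-squares slope of ys on xs; a nonzero slope of differences on
    means is the kind of systematic trend the BA plot often displays."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Toy data: method B reads 10% lower than method A (proportional bias).
means, diffs, bias, limits = bland_altman([10, 20, 30, 40], [9, 18, 27, 36])
print(bias, limits, trend_slope(means, diffs))
```

With proportional disagreement, the differences grow with the level of measurement, so the fitted slope on the BA plot is positive even though a single bias summary hides it.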

Bayesian Causal Forests: Heterogeneous Treatment Effects from Observational Data

Carlos Carvalho, Professor, UT Austin

April 19, 2017


This paper develops a semi-parametric Bayesian regression model for estimating heterogeneous treatment effects from observational data. Standard nonlinear regression models, which may work quite well for prediction, can yield badly biased estimates of treatment effects when fit to data with strong confounding. Our Bayesian causal forests model avoids this problem by directly incorporating an estimate of the propensity function in the specification of the response model, implicitly inducing a covariate-dependent prior on the regression function. This new parametrization also allows treatment heterogeneity to be regularized separately from the prognostic effect of control variables, making it possible to informatively “shrink to homogeneity”, in contrast to existing Bayesian non- and semi-parametric approaches. Joint work with P. Richard Hahn and Jared Murray.

Log-Linear Bayesian Additive Regression Trees

Jared Murray, Assistant Professor, Carnegie Mellon University

April 5, 2017


Bayesian additive regression trees (BART) have been applied to nonparametric mean regression and binary classification problems in a range of applied areas. To date, BART models have been limited to models for Gaussian "data", either observed or latent, and with good reason: the Bayesian backfitting MCMC algorithm for BART is remarkably efficient in Gaussian models. But while many useful models are naturally cast in terms of observed or latent Gaussian variables, many others are not. In this talk I extend BART to a range of log-linear models, including multinomial logistic regression and count regression models with zero-inflation and overdispersion. Extending to these non-Gaussian settings requires a novel prior distribution over BART's parameters. Like the original BART prior, this new prior distribution is carefully constructed and calibrated to be flexible while avoiding overfitting. With this new prior distribution and some data augmentation techniques, I am able to implement an efficient generalization of the Bayesian backfitting algorithm for MCMC in log-linear (and other) BART models. I demonstrate the utility of these new methods with several examples and applications.

Agnostic Notes on Regression Adjustments to Experimental Data: Reexamining Freedman's Critique

Winston Lin, Lecturer and Research Scholar, Yale University

March 23, 2017


This talk will be mostly based on my 2013 Annals of Applied Statistics paper, which reexamines David Freedman's critique of ordinary least squares regression adjustment in randomized experiments. Random assignment is intended to create comparable treatment and control groups, reducing the need for dubious statistical models. Nevertheless, researchers often use linear regression models to adjust for random treatment-control differences in baseline characteristics. The classic rationale, which assumes the regression model is true, is that adjustment tends to reduce the variance of the estimated treatment effect. In contrast, Freedman used a randomization-based inference framework to argue that under model misspecification, OLS adjustment can lead to increased asymptotic variance, invalid estimates of variance, and small-sample bias. My paper shows that in sufficiently large samples, those problems are either minor or easily fixed. Neglected parallels between regression adjustment in experiments and regression estimators in survey sampling turn out to be very helpful for intuition.

Finding common support through largest connected components and predicting counterfactuals for causal inference

Sharif Mahmood, Kansas State University

March 22, 2017


Finding treatment effects in observational studies is complicated by the need to control for confounders. Common approaches include using prognostically important covariates to form groups of similar units containing both treatment and control units (e.g., statistical matching) and/or modeling responses through interpolation. Hence, treatment effects are only reliably estimated for a subpopulation in which a common support assumption holds, that is, one in which the treatment and control covariate spaces overlap. Given a distance metric measuring dissimilarity between units, we use techniques from graph theory to find common support. We construct an adjacency graph in which edges are drawn between similar treated and control units. We then determine regions of common support by finding the largest connected components (LCC) of this graph. We show that LCC improves on existing methods by efficiently constructing regions that preserve clustering in the data while ensuring interpretability of the region through the distance metric. We apply our LCC method to a study of the effectiveness of right heart catheterization (RHC). To further control for confounders, we implement six matching algorithms for analyses. We find that RHC is a risky procedure for patients and that clinical outcomes are significantly worse for patients who undergo RHC.
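The graph construction described above can be sketched in a few lines. This sketch assumes Euclidean distance on raw covariates and a fixed similarity threshold; both are illustrative choices, since the method allows any distance metric. An edge joins a treated and a control unit whose distance is within the threshold, and a breadth-first search returns the largest connected component. The function name and toy data are hypothetical.

```python
import math
from collections import deque

def largest_connected_component(points, treated, threshold):
    """Indices of the largest connected component of a treated-control graph.

    Assumed edge rule (illustrative): join unit i and unit j when exactly one
    of them is treated and their Euclidean distance is at most `threshold`.
    """
    n = len(points)
    adjacency = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if treated[i] != treated[j] and math.dist(points[i], points[j]) <= threshold:
                adjacency[i].append(j)
                adjacency[j].append(i)

    seen = [False] * n
    best = []
    for start in range(n):
        if seen[start]:
            continue
        # Breadth-first search collects one connected component.
        seen[start] = True
        component, queue = [], deque([start])
        while queue:
            u = queue.popleft()
            component.append(u)
            for v in adjacency[u]:
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
        if len(component) > len(best):
            best = component
    return sorted(best)

# Toy one-covariate data: a tight mixed cluster, a smaller pair, an isolated control.
pts = [(0, 0), (0.5, 0), (1, 0), (10, 0), (10.4, 0), (30, 0)]
trt = [1, 0, 1, 0, 1, 0]
print(largest_connected_component(pts, trt, 1.0))
```

Units outside the returned component, such as the isolated control at 30, would fall outside the estimated region of common support.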

Simple Rules for Decision-Making

Ravi Shroff, NYU CUSP

March 9, 2017


Doctors, judges, and other experts typically rely on experience and intuition rather than statistical models when making decisions, often at the cost of significantly worse outcomes. I'll present a simple and intuitive strategy for creating statistically informed decision rules that are easy to apply, easy to understand, and perform on par with state-of-the-art machine learning methods in many settings. I'll illustrate these rules with two applications to the criminal justice system: investigatory stop decisions and pretrial detention decisions.

Scaling Latent Quantities from Text: From Black-and-White to Shades of Gray

Patrick Perry, NYU Stern

March 1, 2017


Probabilistic methods for classifying texts according to the likelihood of class membership form a rich tradition in machine learning and natural language processing. For many important problems, however, class prediction is either uninteresting, because the class is already known, or uninformative, because it yields poor information about a latent quantity of interest. In scaling political speeches, for instance, party membership is both known and uninformative: in systems with party discipline, what is interesting is a latent trait in the speech, such as ideological position, which is often at odds with party membership. Predictive tools common in machine learning, where the goal is to predict a black-or-white class (such as spam, sentiment, or authorship), are not directly designed for the measurement problem of estimating latent quantities, especially those that are inherently unobservable through direct means.

In this talk, I present a method for modeling texts not as black or white representations, but rather as explicit mixtures of perspectives. The focus shifts from predicting an unobserved discrete label to estimating the mixture proportions expressed in a text. In this "shades of gray" worldview, we are able to estimate not only the grayness of texts but also that of the words making up a text, using likelihood-based inference. While this method is novel in its application to text, it can be situated in and compared to known approaches such as dictionary methods, topic models, and the wordscores scaling method. This new method has a fundamental linguistic and statistical foundation, and exploring this foundation exposes implicit assumptions found in previous approaches. I explore the robustness properties of the method and discuss issues of uncertainty quantification. My motivating application throughout the talk will be scaling legislative debate speeches.


Large, Sparse Optimal Matching in an Observational Study of Surgical Outcomes

How do health outcomes for newly trained surgeons' patients compare with those for patients of experienced surgeons? To answer this question using data from Medicare, we introduce a new form of matching that pairs patients of 1252 new surgeons to patients of experienced surgeons, exactly balancing 176 surgical procedures and closely balancing 2.9 million finer patient categories. The new matching algorithm (which uses penalized network flows) exploits a sparse network to quickly optimize a match two orders of magnitude larger than usual in statistical matching, and it allows extensive use of a new form of marginal balance constraint.

Generalized Ridge Regression Using an Iterative Solution

Speaker: Kathryn, postdoc at Columbia University's Earth Institute. Her PhD is in applied economics, with interests in development economics and applied statistics.


An iterative method is introduced for solving noisy, ill-conditioned inverse problems, in which standard ridge regression is just the first iteration. In addition to the regularization parameter, lambda, we introduce an iteration parameter, k, which generalizes ridge regression. The derived noise-damping filter is a generalization of the standard ridge regression (also known as Tikhonov) filter. The generalized solution performs better than the pseudo-inverse (the default OLS solution in most statistical packages), and better than standard ridge regression (L2 regularization), when the covariate or design matrix is ill-conditioned or highly collinear. A few examples are presented using both simulated and real data.
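The abstract does not state the update rule. The sketch below uses the classical iterated Tikhonov scheme as a stand-in because it has the property described above: starting from zero, the first iteration is ordinary ridge regression. Each later iteration refits the current residual with the same ridge operator. Whether this matches the speaker's filter is an assumption. The helper names are hypothetical, and the pure-Python solver is only for small illustrative problems.

```python
def _solve(m, v):
    """Solve m @ z = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z

def iterated_ridge(X, y, lam, k):
    """Iterated Tikhonov (assumed scheme): beta_0 = 0 and
    beta_j = beta_{j-1} + (X'X + lam*I)^{-1} X'(y - X beta_{j-1}).
    k = 1 gives ordinary ridge; large k approaches least squares."""
    p = len(X[0])
    # Ridge-penalized Gram matrix X'X + lam * I, reused every iteration.
    gram = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    beta = [0.0] * p
    for _ in range(k):
        resid = [yi - sum(r * b for r, b in zip(row, beta)) for row, yi in zip(X, y)]
        xtr = [sum(row[i] * ri for row, ri in zip(X, resid)) for i in range(p)]
        step = _solve(gram, xtr)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

X, y = [[1.0], [2.0]], [1.0, 2.0]
for k in (1, 2, 10):
    print(k, iterated_ridge(X, y, 5.0, k))
```

In this one-column example, each iteration halves the remaining shrinkage, giving 0.5, 0.75, and so on toward the least-squares value of 1. The iteration count therefore acts as a second tuning knob alongside lambda.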

Latent Space Models for Affiliation Networks

Catherine (“Kate”) Calder, professor of statistics, The Ohio State University


An affiliation network is a particular type of two-mode social network that consists of a set of 'actors' and a set of 'events', where ties indicate an actor's participation in an event. Methods for the analysis of affiliation networks are particularly useful for studying patterns of segregation and integration in social structures characterized by both people and potentially shared activities (e.g., parties, corporate board memberships, and church attendance). One way to analyze affiliation networks is to consider one-mode network matrices that are derived from an affiliation network, but this approach may lead to the loss of important structural features of the data. The most comprehensive approach is to study both actors and events simultaneously. Statistical methods for studying affiliation networks, however, are less well developed than methods for studying one-mode, or actor-actor, networks. In this talk, I will describe a bilinear generalized mixed-effects model, which contains interacting random effects representing common activity pattern profiles and shared patterns of participation in these profiles. I will demonstrate how the proposed model is able to capture fourth-order dependence, a common feature of affiliation networks, and describe a Markov chain Monte Carlo algorithm for Bayesian inference. I then will use the latent space interpretation of model components to explore patterns in extracurricular activity membership of students in a racially diverse high school in a Midwestern metropolitan area. Using techniques from spatial point pattern analysis, I will show how our model can provide insight into patterns of racial segregation in the voluntary extracurricular activity participation profiles of adolescents. This talk is based on joint work with Yanan Jia and Chris Browning.

Why so many research hypotheses are mostly false and how to test

Paul De Boeck, professor of quantitative psychology, The Ohio State University


From a recent Science article reporting a large number of replications of psychological studies, the base rate of the null hypothesis of no effect can be estimated. It turns out to be extremely high, which implies that many research hypotheses are false. As I will explain, they are perhaps not fully false but mostly false. A possible explanation for why unlikely hypotheses tend to be selected for empirical studies can be found in expected utility theory. It can be shown that for low to moderately high power rates, the expected utility of studies increases with the probability of the null hypothesis being true. A high probability of the null hypothesis being true can be understood as reflecting a contextual variation of effects that are in general not much different from zero. Increasing the power of studies has become a popular remedy to counter the replicability crisis, but this strategy is highly misleading if effects vary. Meta-analysis is considered another remedy, but it is a suboptimal and labor-intensive approach, and it is only a long-term method. Two more feasible methods for dealing with contextual variation will be discussed.

Be the Data and More: Using interactive, analytic methods to enhance learning from data for students

Leanna House, Associate Professor of Statistics, Virginia Tech

The Ohio State University


Datasets, no matter how big, are just tables of numbers without individuals to learn from the data, i.e., to discover, process, assess, and communicate the information in the data. Data visualizations are often used to present data to individuals, but most are created independently of human learning processes and lack transparency. To bridge the gap between people thinking critically about data and the utility of visualizations, we developed Bayesian Visual Analytics (BaVA) and its deterministic form, Visual to Parametric Interaction (V2PI). BaVA and V2PI transform static images of data into dynamic versions that respond to expert feedback. When applied iteratively, experts may explore data progressively in a sequence that parallels their personal sense-making processes. BaVA and V2PI have proven useful in both industry settings and the classroom. For example, we merged V2PI with motion detection software to create Be the Data. In Be the Data, students physically move in a space to communicate their expert feedback about data projected overhead. The idea is that participants have an opportunity to explore analytical relationships between data points by exploring relationships between themselves. This talk will focus on presenting the BaVA paradigm and its education applications.

Bayesian Inference and Stan Tutorial

Vincent Dorie, Postdoctoral Researcher, NYU PRIISM 


This two-hour session focuses on getting started with Stan and using it in your research. Stan is an open-source Bayesian probabilistic programming environment that takes much of the work out of model fitting so that researchers can focus on model building and interpretation. Topics will include an overview of Bayesian statistics, an overview of Stan and MCMC, writing models in Stan, and a tutorial session where participants can write a model on their own or develop models they have been working on independently. Stan has interfaces to numerous programming languages, but the session will focus on R.

Basing Causal Inferences about Policy Impacts on Non-Representative Samples of Sites – Risks, Consequences, and Fixes

Stephen Bell, Abt Associates Fellow 


Randomized impact evaluations of social and educational interventions, while constituting the “gold standard” of internal validity due to the lack of selection bias between treated and untreated cases, usually lack external validity. Due to cost, convenience, or local resistance, they are almost always conducted in a set of sites that are not a probability sample of the desired inference population: the nation as a whole for social programs, or a given state or school district for educational innovations. We use statistical theory and data from the Reading First evaluation to examine the risks and consequences of non-representative site selection for social experiments, asking when and to what degree policy decisions are led astray by tarnished “gold standard” evidence. We also explore possible ex ante design-based solutions to this problem and the performance of ex post methods in the literature for overcoming non-representative site selection through analytic adjustments after the fact.

Mediation: From Intuition to Data Analysis

Ilya Shpitser, Assistant Professor in the Department of Computer Science, Johns Hopkins University. 


Modern causal inference links the "top-down" representation of causal intuitions and "bottom-up" data analysis with the aim of choosing policy. Two innovations that proved key for this synthesis were a formalization of Hume's counterfactual account of causation using potential outcomes (due to Jerzy Neyman), and viewing cause-effect relationships via directed acyclic graphs (due to Sewall Wright). I will briefly review how a synthesis of these two ideas was instrumental in formally representing the notion of "causal effect" as a parameter in the language of potential outcomes, and discuss a complete identification theory linking these types of causal parameters and observed data, as well as approaches to estimating the resulting statistical parameters. I will then describe, in more detail, how my collaborators and I are applying the same approach to mediation, the study of effects along particular causal pathways. I consider mediated effects at their most general: I allow arbitrary models, the presence of hidden variables, multiple outcomes, longitudinal treatments, and effects along arbitrary sets of causal pathways. As was the case with causal effects, there are three distinct but related problems to solve: a representation problem (what sort of potential outcome an effect along a set of pathways corresponds to), an identification problem (whether a causal parameter of interest can be expressed as a functional of observed data), and an estimation problem (what good ways there are to estimate the resulting statistical parameter). I report a complete solution to the first two problems, and progress on the third. In particular, my collaborators and I show that for some parameters that arise in mediation settings, triply robust estimators exist, which rely on an outcome model, a mediator model, and a treatment model, and which remain consistent if any two of these three models are correct.
Some of the reported results are joint work with Eric Tchetgen Tchetgen, Caleb Miles, Phyllis Kanki, and Seema Meloni.

Bayes vs Maximum Likelihood: The case of bivariate probit models

Adriana Crespo-Tenorio, PhD, is on a mission to connect people’s online behavior to their offline lives.


Bivariate probit models are a common choice for scholars wishing to estimate causal effects in instrumental variable models where both the treatment and outcome are binary. However, standard maximum likelihood approaches for estimating bivariate probit models are problematic. Numerical routines in common software suites frequently generate inaccurate parameter estimates, and even when the parameters are estimated correctly, maximum likelihood routines provide no straightforward way to produce estimates of uncertainty for causal quantities of interest. In this article, we show that adopting a Bayesian approach provides more accurate estimates of key parameters and facilitates the direct calculation of causal quantities along with their attendant measures of uncertainty.

Scalable Bayesian Inference with Hamiltonian Monte Carlo

Michael Betancourt, Postdoctoral Research Associate, Warwick


The modern preponderance of data has fueled a revolution in data science, but the complex nature of those data also limits naive inferences. To truly take advantage of these data we also need tools for building and fitting statistical models that capture those complexities. In this talk I’ll discuss some of the practical challenges of building and fitting such models in the context of real analyses. I will particularly emphasize the importance of Hamiltonian Monte Carlo and Stan, state-of-the-art computational tools that allow us to tackle these contemporary data without sacrificing the fidelity of our inferences.

Improving Human Learning with Unified Machine Learning Frameworks: Towards Faster, Better, and Less Expensive Education

José González-Brenes, Pearson


Seminal results from cognitive science suggest that personalized education is effective at improving learners’ outcomes. However, the effort required for instructors to create content for each of their students can be prohibitive. Recent progress in machine learning has enabled technology that helps teachers deliver personalized education. Unfortunately, the statistical models used by these systems are often tailored to ad hoc domains and do not generalize across applications. In this talk, I will discuss my work towards the goal of a unified statistical framework of human learning. This line of work is more flexible, more efficient, and more accurate than previous technology. Moreover, it generalizes popular models from the literature. Additionally, I will outline recent progress on novel methodology for evaluating statistical models for education from a learner-centric perspective. My findings suggest that prior work often uses evaluation methods that may misrepresent the educational value of educational systems. My work offers a promising alternative that improves the evaluation of machine learning models in education.

Probabilistic Cause-of-death Assignment using Verbal Autopsies

Tyler McCormick, University of Washington, Seattle


In regions without complete-coverage civil registration and vital statistics systems there is uncertainty about even the most basic demographic indicators. In such areas the majority of deaths occur outside hospitals and are not recorded. Worldwide, fewer than one-third of deaths are assigned a cause, with the least information available from the most impoverished nations. In populations like this, verbal autopsy (VA) is a commonly used tool to assess cause of death and estimate cause-specific mortality rates and the distribution of deaths by cause. VA uses an interview with caregivers of the decedent to elicit data describing the signs and symptoms leading up to the death. This paper develops a new statistical tool known as InSilicoVA to classify cause of death using information acquired through VA. InSilicoVA shares uncertainty between cause of death assignments for specific individuals and the distribution of deaths by cause across the population. Using side-by-side comparisons with both observed and simulated data, we demonstrate that InSilicoVA has distinct advantages compared to currently available methods.

Topic-adjusted visibility metric for scientific articles

Tian Zheng, Columbia University. 


Measuring the impact of scientific articles is important for evaluating the research output of individual scientists, academic institutions and journals. While citations are raw data for constructing impact measures, there exist biases and potential issues if factors affecting citation patterns are not properly accounted for. In this talk, I present a new model that aims to address the problem of field variation and introduce an article level metric useful for evaluating individual articles’ topic-adjusted visibility. This measure derives from joint probabilistic modeling of the content in the articles and the citations amongst them using latent Dirichlet allocation (LDA) and the mixed membership stochastic blockmodel (MMSB). This proposed model provides a visibility metric for individual articles adjusted for field variation in citation rates, a structural understanding of citation behavior in different fields, and article recommendations which take into account article visibility and citation patterns. For this work, we also developed an efficient algorithm for model fitting using variational methods. To scale up to large networks, we developed an online variant using stochastic gradient methods and case-control likelihood approximation. Results from an application of our methods to the benchmark KDD Cup 2003 dataset with approximately 30,000 high energy physics papers will also be presented.

Small sample adjustments to F-tests for cluster robust standard errors

Elizabeth Tipton, Teachers College, Columbia University


Data analysts commonly ‘cluster’ their standard errors to account for correlations arising from the sampling of aggregate units (e.g., states), each containing multiple observations. When the number of clusters is small to moderate, however, this approach can lead to biased standard errors and hypothesis tests with inflated Type I error. One solution that is receiving increased attention is the use of the bias-reduced linearization (BRL). In this paper, we extend the BRL approach to include an F-test that can be implemented in a wide range of applications. A simulation study reveals that this test has Type I error close to nominal even with a very small number of clusters and, importantly, that it outperforms the usual estimator even when the number of clusters is moderate.

The Controversies over Null Hypothesis Testing and Replication

Barry Cohen, New York University


The arguments against null hypothesis significance testing (NHST) have been greatly exaggerated, and they do not apply equally to all types of psychological research. I will discuss the conditions under which NHST serves several useful purposes, which may outweigh its undeniable drawbacks. In brief, NHST works best when the null hypothesis is rarely true, the direction of the results is more important than the magnitude, extremely large samples are not used, and tiny effects have no serious consequences. Priming studies in social psychology will be used as an example of this type of research. Part of the controversy over failures to replicate notable psychological studies is related to misunderstandings and misuses of NHST. I will conclude by discussing the resistance to banning NHST and its p values in favor of reports of effect sizes and/or confidence intervals, and by describing some possible solutions to the drawbacks of NHST.

The Curious Case of the Instrumental Variable Estimator for the Complier Average Causal Effect

Russell Steele, McGill University


In randomized clinical trials, subjects often do not comply with their randomized treatment arm. Although one can still estimate without bias the causal effect of being assigned to treatment using the common Intention-to-Treat (ITT) estimator, there is now potential confounding of the causal effect of actually *receiving* treatment. Basic alternative estimators such as the per-protocol or as-treated estimators have been used, but they are generally biased for the causal effect of interest. Balke and Pearl (1997) and Angrist et al. (1996) independently proposed an instrumental variable (IV) estimator of the causal effect of receiving treatment (the Complier Average Causal Effect, or CACE) in the subpopulation of people who would comply with treatment assignment (i.e., the compliers). In this talk, I will first review the CACE and the IV estimator. I will then dissect the instrumental variable estimator in order to compare it to the per-protocol and as-treated estimators. I will show that the basic IV estimator and its confidence interval can be computed from basic summary statistics that should be reported in any randomized trial. My formulation of the IV estimator will also allow for simple sensitivity analyses that can be done using a basic Excel spreadsheet. I will then describe future directions for compliance research that I am currently working on. Most of this work appears in a recently published article in the American Journal of Epidemiology and is co-authored by Ian Shrier, Jay Kaufman and Robert Platt.
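The point that the basic IV estimate needs only summary statistics can be illustrated with the standard Wald ratio. Divide the ITT effect on the outcome by the effect of assignment on treatment receipt. This relies on the usual identifying assumptions (randomized assignment, the exclusion restriction, and no defiers). Here is a minimal sketch for a binary outcome using arm-level counts. The function name and example counts are hypothetical, and the talk's confidence-interval and sensitivity-analysis formulation is not reproduced here.

```python
def cace_wald(events_z1, n_z1, took_z1, events_z0, n_z0, took_z0):
    """Wald-type IV estimate of the CACE from arm-level counts (hypothetical helper).

    events_zk: outcome events among those assigned to arm z = k
    n_zk:      number randomized to arm z = k
    took_zk:   number in arm z = k who actually received treatment
    """
    itt = events_z1 / n_z1 - events_z0 / n_z0      # effect of assignment on the outcome risk
    uptake = took_z1 / n_z1 - took_z0 / n_z0       # effect of assignment on treatment receipt
    if uptake == 0:
        raise ValueError("assignment does not change receipt; the CACE is not identified")
    return {"itt": itt, "uptake": uptake, "cace": itt / uptake}

# Illustrative trial: 100 per arm, 80 of those assigned to treatment take it,
# nobody in the control arm does; outcome risk 30% vs 40%.
print(cace_wald(30, 100, 80, 40, 100, 0))
```

Here an ITT risk difference of -0.10 with 80% uptake scales to a complier effect of -0.125. With perfect compliance, the two estimates coincide.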

Covariate Selection with Observational Data: Simulation Results and Discussion

Bryan Keller, Teachers College, Columbia University


In an effort to protect against omitted variable bias, statisticians have traditionally favored an inclusive approach to covariate selection for causal inference, so long as covariates were measured before any treatment was administered. There are, however, three classes of variables, which, if conditioned upon, are known to degrade either the bias or efficiency of an estimate of a causal effect: non-informative variables (NVs), instrumental variables (IVs), and collider variables. The decision about whether to control for a potential collider variable must be based on theory about how the data were generated. In contrast, one need only establish a lack of association with the outcome variable in order to identify an NV or an IV. We investigate three empirical methods – forward stepwise selection, the lasso, and recursive feature elimination with random forests – for detection of NVs and IVs through simulation studies in which we judge their efficacy by (a) sensitivity and specificity in identifying true or near NVs and IVs and (b) the overall effect on bias and mean-squared error of the causal effect estimator, relative to inclusion of all pretreatment variables. Results and implications are discussed.

The End of Intelligence? What might Big Data, Learning Analytics and the Information Age Mean for how we Measure Education

Charles Lang, Postdoctoral Associate, New York University


For over a century educational measurement has developed analytical tools designed to maximize the inferential power of limited samples: a biannual state test, a regular accreditation exam, a once in a lifetime SAT. But can this methodology adapt to a world in which previous limitations on data collection have been dramatically reduced? A world with a greater variety of data formats, representing a larger number of conditions, on a finer timescale, with a larger sample of students. Starting from a methodological basis, Charles will discuss the implications that changes in data collection may have on how education is measured and the impact that this might have on the disciplines, institutions, and practitioners that utilize educational measurement.

Studying Change with Difference Scores versus ANCOVA: Issues, Perspectives and Advances

Pat Shrout, New York University 


Nearly 50 years ago, Lord (1967) described a so-called paradox in statistical analysis whereby two reasonable analyses of pre-treatment/post-treatment data lead to different results. I revisit the issues, review some of the historical discussion, and present an analysis of the alternate analyses with a causal model that distinguishes treatment effects from trait, state, and error variation. In addition to comparing numerical results from difference score and ANCOVA adjustment for pre-treatment group differences, I consider results based on propensity score adjustment.
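Lord's paradox is easy to reproduce numerically. In the sketch below, the difference-score estimate compares mean pre-to-post change across groups. The ANCOVA estimate is the group coefficient from regressing post on group and pre, computed here as the post-score mean difference adjusted by the pooled within-group slope. The toy data are hypothetical. Both groups show no average change but regress toward their own means, so the two analyses disagree. The causal model and propensity-score analysis from the talk are not reproduced.

```python
import statistics

def difference_score_effect(pre, post, group):
    """Group difference in mean change scores (post - pre)."""
    change = {g: [b - a for a, b, gi in zip(pre, post, group) if gi == g]
              for g in (0, 1)}
    return statistics.mean(change[1]) - statistics.mean(change[0])

def ancova_effect(pre, post, group):
    """Group coefficient from OLS of post on group and pre, via the identity
    (post mean diff) - (pooled within-group slope) * (pre mean diff)."""
    sxy = sxx = 0.0
    means = {}
    for g in (0, 1):
        xs = [a for a, gi in zip(pre, group) if gi == g]
        ys = [b for b, gi in zip(post, group) if gi == g]
        mx, my = statistics.mean(xs), statistics.mean(ys)
        means[g] = (mx, my)
        # Accumulate within-group cross-products for the pooled slope.
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx += sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    (mx0, my0), (mx1, my1) = means[0], means[1]
    return (my1 - my0) - slope * (mx1 - mx0)

pre = [1, 2, 3, 5, 6, 7]
post = [1.5, 2, 2.5, 5.5, 6, 6.5]
grp = [0, 0, 0, 1, 1, 1]
print(difference_score_effect(pre, post, grp), ancova_effect(pre, post, grp))
```

On these data the difference-score analysis reports no group effect (0), while ANCOVA reports an effect of 2. This is the kind of disagreement between two reasonable analyses that the talk revisits.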


Classroom Context and Observed Teacher Performance: What Do Teacher Observation Scores Really Measure?

Matthew Steinberg

March 25, 2015


As federal, state, and local policy reforms mandate the implementation of more rigorous teacher evaluation systems, measures of teacher performance are increasingly being used to support improvements in teacher effectiveness and inform decisions related to teacher retention. Observations of teachers’ classroom instruction take a central role in these systems, accounting for the majority of a teacher’s summative evaluation rating upon which accountability decisions are based. This study explores the extent to which classroom context influences measures of teacher performance based on classroom observation scores. Using data from the Measures of Effective Teaching (MET) study, we find that the context in which teachers work—most notably, the incoming academic performance of their students—plays a critical role in determining teachers’ measured performance, even after accounting for teachers’ endowed instructional abilities. The influence of student achievement on measured teacher performance is particularly salient for English Language Arts (ELA) instruction; for aspects of classroom practice that depend on a teacher’s interactions with her students; and for subject-specific teachers compared with their generalist counterparts. Further, evidence suggests that the intentional sorting of teachers to students has a significant influence on measured ELA (though not math) instruction. Implications for high-stakes teacher-accountability policies are discussed.


Mining NYPD’s 911 Call Data: Resource Allocation, Crimes, and Civic Engagement

Theo Damoulas, NYU CUSP

December 10, 2014


NYPD’s 911 calls capture some of the most interesting urban activity in New York City such as serious crimes, family disputes, bombing attacks, natural disasters, and of course prank phone calls. In this talk I will describe research in progress conducted at the Center for Urban Science and Progress at NYU, in collaboration with NYPD. The work spans multiple areas of applied statistical interest such as sampling bias, time series analysis, and spatial statistics. The domain is very rich and offers many opportunities for research in core statistical and computational areas such as causal inference, search and pattern matching algorithms, evidence and data integration, ensemble models, and uncertainty quantification. At the same time there is great potential for positively impacting the quality of life of New Yorkers, and the day-to-day operation of NYPD.

Lixing Zhu PRIISM Seminar

Lixing Zhu, Department of Mathematics, Hong Kong Baptist University

October 29, 2014


For a factor model, the covariance matrix often has no row-sparse structure, because the common factors may cause some variables to be strongly associated with many others. Under the ultra-high-dimensional paradigm, this feature makes existing methods for sparse covariance matrices not directly applicable. In this paper, for a general covariance matrix, we propose a novel approach to detect these variables, which we call pivotal variables. We then propose two-stage estimation procedures to handle ultra-high dimensionality in a factor model. In these procedures, pivotal variable detection is performed as a screening step, and existing approaches are then applied to refine the working model. Estimation efficiency can be improved under weaker assumptions on the model structure. Simulations are conducted to examine the performance of the new method.

Estimating Post-Treatment Effect Modification With Generalized Structural Mean Models

Luke Keele

February 24, 2014


In randomized controlled trials, the evaluation of an overall treatment effect is often followed by effect modification or subgroup analyses, where the possibility of a different magnitude or direction of effect for varying values of a covariate is explored. While studies of effect modification are typically restricted to pretreatment covariates, longitudinal experimental designs permit the examination of treatment effect modification by intermediate outcomes, where intermediates are measured after treatment but before the final outcome. We present a generalized structural mean model (GSMM) for analyzing treatment effect modification by post-treatment covariates. The model can accommodate post-treatment effect modification with both full compliance and noncompliance to assigned treatment status. The methods are evaluated using a simulation study that demonstrates that our approach retains unbiased estimation of effect modification by intermediate variables which are affected by treatment and also predict outcomes. We illustrate the method using a randomized trial designed to promote re-employment through teaching skills to enhance self-esteem and inoculate job seekers against setbacks in the job search process. Our analysis provides some evidence that the intervention was much less successful among subjects who displayed higher levels of depression at intermediate post-treatment waves of the study.

Didactic Talk: Causal Mediation Analysis

Luke Keele

February 25, 2014


Causal analysis in the social sciences has largely focused on the estimation of treatment effects. Researchers often also seek to understand how a causal relationship arises. That is, they wish to know why a treatment works. In this talk, I introduce causal mediation analysis, a statistical framework for analyzing how a specific treatment changes an outcome. Using the potential outcomes framework, I outline both the counterfactual comparison implied by a causal mediation analysis and exactly what assumptions are sufficient for identifying causal mediation effects. I highlight that commonly used statistical methods for identifying causal mechanisms rely upon untestable assumptions and may be inappropriate even under those assumptions. Causal mediation analysis is illustrated via an intervention study that seeks to understand whether single-sex classrooms improve academic performance.

Research Talk: The Effect of Collapsing Categories on the Estimation of the Latent Trait

Daphna Harel, NYU PRIISM

February 26, 2014


Researchers often collapse categories of ordinal data out of convenience or in an attempt to improve model performance. Collapsing categories is quite common when fitting item response theory (IRT) models, particularly when items are deemed to behave poorly. In this talk, I define the true model for the collapsed data from both a marginal and a conditional perspective and develop a new paradigm for thinking about the problem of collapsing categories. I examine collapsing through the lens of model misspecification and study the asymptotic behaviour of the parameter estimates from the misspecified model. I review and critique several current methods for deciding when to collapse categories and present simulation results on the effect of collapsing on the estimation of the latent trait.

Didactic Talk: An Introduction to Item Response Theory and Its Applications

Daphna Harel, NYU PRIISM

February 25, 2014


When a trait or construct cannot be measured directly, researchers often use multi-item questionnaires or tests to collect data that can provide insight about the underlying (or latent) trait. Item Response Theory (IRT) provides a class of statistical models that relate these observed responses to the latent trait, allowing inference to be made while still accounting for item-level characteristics. In this talk, I will introduce four commonly used IRT models: the Rasch model, the two-parameter model, the Partial Credit model, and the Generalized Partial Credit model. My comparison will focus on the interpretation of and selection amongst these four models. One common use of IRT models is to determine whether an item functions the same for all types of people. This issue of Differential Item Functioning will be explored in the case of dichotomous items for both the Rasch model and the two-parameter model. Lastly, three important summary statistics (the empirical Bayes estimator, the summed score, and the weighted summed score) will be presented, and the use of each will be explained, specifically for the Partial Credit model and the Generalized Partial Credit model.
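The four models named above can be written with one formula: the Generalized Partial Credit Model, of which the Rasch, two-parameter, and Partial Credit models are special cases. The sketch below computes category probabilities under that parameterization (slope a, step parameters b_v, no scaling constant); the function name and item parameters are hypothetical.

```python
import math


def gpcm_probs(theta, a, steps):
    """Category probabilities P(X = 0..m | theta) under the Generalized Partial Credit Model.

    theta: latent trait; a: item slope; steps: step parameters b_1..b_m.
    Special cases: one step gives the two-parameter model (Rasch if a = 1);
    several steps with a = 1 give the Partial Credit model.
    """
    # Category k has logit sum_{v <= k} a * (theta - b_v); category 0 has an empty sum.
    logits = [0.0]
    for b in steps:
        logits.append(logits[-1] + a * (theta - b))
    # Normalize with a softmax, subtracting the maximum for numerical stability.
    top = max(logits)
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


theta = 0.5  # hypothetical respondent
print("Rasch:", gpcm_probs(theta, 1.0, [0.0]))
print("2PL:  ", gpcm_probs(theta, 1.7, [0.0]))
print("PCM:  ", gpcm_probs(theta, 1.0, [-1.0, 0.0, 1.5]))
print("GPCM: ", gpcm_probs(theta, 0.8, [-1.0, 0.0, 1.5]))
```

For a dichotomous item the second probability reduces to the familiar logistic curve 1 / (1 + exp(-a(theta - b))), which is why the dichotomous models sit inside the polytomous ones.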

Research Talk: Definition and estimation of causal effects for continuous exposures: theory and applications

Ivan Diaz

February 13, 2014


The definition of a causal effect typically involves counterfactual variables resulting from interventions that modify the exposure of interest deterministically. However, this approach might yield infeasible interventions in some applications. A stochastic intervention generalizes the framework to define counterfactuals in which the post-intervention exposure is stochastic rather than deterministic. In this talk I will present a new approach to causal effects based on stochastic interventions; I will focus on an application of this methodology to the definition and estimation of the causal effect of a shift in a continuous exposure. This parameter is of general interest because it generalizes the interpretation of the coefficient in a main-effects regression model to a nonparametric model. I will discuss two estimators of the causal effect, an M-estimator and a targeted minimum loss-based estimator (TMLE), both of which are efficient in the nonparametric model. I will discuss the methods in the context of an application to the evaluation of the effect of physical activity on all-cause mortality in the elderly.
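The shift parameter can be made concrete with a plug-in (g-computation) estimator: average a fitted outcome regression q(a, w) at each person's exposure shifted by delta, and subtract its average at the observed exposure. This is a simpler swapped-in estimator, not the M-estimator or TMLE from the talk, and it assumes q is correctly specified and that shifted exposures stay within the support of the data. The outcome models and data below are hypothetical.

```python
import random


def shift_effect_plugin(q, data, delta):
    """Plug-in estimate of E[Y(A + delta)] - E[Y(A)].

    q: fitted outcome regression, q(a, w) ~ E[Y | A = a, W = w] (assumed given).
    data: observed (a, w) pairs; delta: size of the shift in the exposure.
    """
    shifted = sum(q(a + delta, w) for a, w in data) / len(data)
    observed = sum(q(a, w) for a, w in data) / len(data)
    return shifted - observed


def linear_q(a, w):
    # Hypothetical main-effects model: exposure coefficient 0.2.
    return 1.0 + 0.2 * a + 0.5 * w


def curved_q(a, w):
    # Hypothetical model with diminishing returns in the exposure.
    return 1.0 + 0.2 * a - 0.01 * a ** 2 + 0.5 * w


rng = random.Random(0)
data = [(rng.uniform(0, 10), rng.gauss(0, 1)) for _ in range(1000)]
print(shift_effect_plugin(linear_q, data, 1.0))  # equals the coefficient 0.2 (up to rounding)
print(shift_effect_plugin(curved_q, data, 1.0))  # depends on where exposures lie
```

With a linear main-effects model the estimate reduces to delta times the exposure coefficient; with a curved model the same quantity remains well defined but now averages over the observed exposure distribution.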


Income attraction: An online dating field experiment

David Ong, Assistant Professor of Economics at Peking University Business School

November 21, 2013


Marriage rates have been decreasing in the US while women’s relative wages have been increasing. Dr. Ong found the opposite pattern in China. Prior empirical studies with US marriage data indicate that women marry up (and men marry down) economically. Furthermore, if the wife earns more, couples report less happiness and greater strife, the gender gap in housework increases, and they are more likely to divorce. However, these observational studies cannot identify whether these consequences were due to men’s preference for lower-income women, women’s preference for higher-income men, or other factors. Dr. Ong complements this literature by measuring income-based attraction in a field experiment. He randomly assigned income levels to 360 unique artificial profiles on a major online dating website and recorded the incomes of the visitors across nearly 4,000 visits. He found that men of all income levels visited women’s profiles with different income levels at roughly equal rates. In contrast, women at all income levels visited men with higher incomes at higher rates, and, surprisingly, these higher rates increased with the women’s own income. Men with the highest level of income got ten times more visits than those with the lowest. He discussed how the gender difference in “income attraction” might shed light on marriage and gender wage patterns, the wage premium for married men, and other stylized facts, e.g., why the gender gap in housework is higher for women who earn more than their husbands. This is the first field experimental study of gender differences in preferences for mate income.

Front-door Difference-in-Differences Estimators: The Effects of Early In-person Voting on Turnout

Adam Glynn, Harvard University

November 7, 2013 


In this talk, Dr. Glynn discussed front-door difference-in-differences estimators that utilize mechanistic information from post-treatment variables in addition to information from pre-treatment covariates. Even when the front-door criterion does not hold, these estimators allow the identification of causal effects by utilizing assumptions that are analogous to standard difference-in-differences assumptions. He also demonstrated that causal effects can be bounded by front-door and front-door difference-in-differences estimators under relaxed assumptions. He illustrated these points with an application to the effects of early in-person voting on turnout. Despite recent claims that early in-person voting had either an undetectable effect or a negative effect on turnout in 2008, he found evidence that early in-person voting had small positive effects on turnout in Florida in 2008. Moreover, he found evidence that early in-person voting disproportionately benefits African-American turnout.
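Standard front-door adjustment, the starting point that the front-door difference-in-differences estimators relax, can be written for discrete variables as E[Y | do(a)] = sum over m of P(m | a) times the sum over a' of E[Y | m, a'] P(a'). The sketch below evaluates that formula from hypothetical probability tables; it is not the difference-in-differences version from the talk, and it presumes the front-door criterion holds.

```python
def front_door_mean(a, p_a, p_m_given_a, mean_y_given_m_a):
    """E[Y | do(A = a)] by the front-door adjustment formula, discrete A and M.

    p_a[a2] = P(A = a2); p_m_given_a[(m, a)] = P(M = m | A = a);
    mean_y_given_m_a[(m, a2)] = E[Y | M = m, A = a2].
    """
    mediator_values = {m for (m, _) in p_m_given_a}
    total = 0.0
    for m in mediator_values:
        # Effect of M on Y: average E[Y | m, a2] over the marginal distribution of A.
        inner = sum(p_a[a2] * mean_y_given_m_a[(m, a2)] for a2 in p_a)
        # Effect of A on M: weight by P(M = m | A = a).
        total += p_m_given_a[(m, a)] * inner
    return total


# Hypothetical binary treatment A, binary mediator M, and outcome means.
p_a = {0: 0.5, 1: 0.5}
p_m_given_a = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.2, (1, 1): 0.8}
mean_y = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.5, (1, 1): 0.7}

effect = front_door_mean(1, p_a, p_m_given_a, mean_y) - front_door_mean(0, p_a, p_m_given_a, mean_y)
print(f"front-door effect estimate: {effect:.2f}")
```

The inner sum deliberately averages over the marginal distribution of A rather than conditioning on a, which is what blocks the back-door path from A to Y through unobserved confounders.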

Gaussian Processes for Causal Inference

Vincent Dorie, IES Postdoctoral Fellow, PRIISM Center

October 24, 2013


This brown bag talk provided a mathematical and literature background for Gaussian Processes (GP) and discussed the use of GP in non-parametric modeling of the response surface for use in making straightforward causal comparisons. Additional topics included scalability, incorporating treatment levels as a spatial dimension, and the requirements for a fully-automated "black box" system for causal inference.

Linkage of viral sequences among HIV-infected village residents in Botswana: estimation of clustering rates in the presence of missing data

Nicole Carnegie, Harvard University

September 19, 2013


Linkage analysis is useful in investigating disease transmission dynamics and the effect of interventions on them, but estimates of probabilities of linkage between infected people from observed data can be biased downward when missingness is informative. We investigate variation in the rates at which subjects' viral genotypes link by viral load (low/high) and ART status using blood samples from household surveys in the Northeast sector of Mochudi, Botswana. The probability of obtaining a sequence from a sample varies with viral load; samples with low viral load are harder to amplify. Pairwise genetic distances were estimated from aligned nucleotide sequences of HIV-1C env gp120. It is first shown that the probability that randomly selected sequences are linked can be estimated consistently from observed data. This is then used to develop maximum likelihood estimates of the probability that a sequence from one group links to at least one sequence from another group under the assumption of independence across pairs. Furthermore, a resampling approach is developed that adjusts for the presence of correlation within individuals, with diagnostics for assessing the reliability of the method.

Sequences were obtained for 65% of subjects with high viral load (HVL, n=117), 54% of subjects with low viral load but not on ART (LVL, n=180), and 45% of subjects on ART (ART, n=126). The probability of linkage between two individuals is highest if both have HVL, and lowest if one has LVL and the other has LVL or is on ART. Linkage across groups is high for HVL and lower for LVL and ART. Adjustment for missing data increases the group-wise linkage rates by 40-100%, and changes the relative rates between groups. Bias in inferences regarding HIV viral linkage that arise from differential ability to genotype samples can be reduced by appropriate methods for accommodating missing data.
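Under the independence-across-pairs assumption stated above, a pairwise linkage probability p translates into the probability of linking to at least one of n members of another group as 1 - (1 - p)^n. The sketch below is a simplified illustration with hypothetical counts, assuming sequencing is missing completely at random within each viral-load group; it omits the resampling adjustment for within-individual correlation and is not the authors' estimator. It shows why counting unsequenced subjects raises group-wise linkage rates.

```python
def pairwise_link_rate(linked_pairs, n_seq_a, n_seq_b):
    """Share of observed cross-group sequence pairs that are genetically linked."""
    return linked_pairs / (n_seq_a * n_seq_b)


def prob_links_to_any(p_pair, group_size):
    """P(a sequence links to at least one of group_size others), pairs independent."""
    return 1.0 - (1.0 - p_pair) ** group_size


# Hypothetical counts: 6 linked pairs among 50 x 40 sequenced subjects,
# in a target group of 100 subjects of whom only 40 were sequenced.
p_hat = pairwise_link_rate(6, 50, 40)
naive = prob_links_to_any(p_hat, 40)      # counts only sequenced subjects
adjusted = prob_links_to_any(p_hat, 100)  # extends to the whole group
print(f"pairwise rate {p_hat:.4f}; observed-only {naive:.3f}; full group {adjusted:.3f}")
```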

The Inadequacy of the Summed Score (and How You Can Fix It!)

Daphna Harel, McGill University, Department of Mathematics and Statistics

October 17, 2013


Health researchers often use patient and physician questionnaires to assess certain aspects of health status. Item Response Theory (IRT) provides a set of tools for examining the properties of the instrument and for estimating the latent trait for each individual. In my research, I critically examine the usefulness of the summed score over items, and of a weighted summed score (using weights computed from the IRT model), as alternatives to both the empirical Bayes estimator and the maximum likelihood estimator for the Generalized Partial Credit Model. First, I will talk about two useful theoretical properties of the weighted summed score that I have proven as part of my work. Then I will relate the weighted summed score to other commonly used estimators of the latent trait. I will demonstrate the importance of these results in the context of both simulated and real data on the Center for Epidemiological Studies Depression Scale.
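The difference between the two scores is easiest to see on a small example. The sketch below assumes the weights are the item slope (discrimination) parameters a_j of a fitted Generalized Partial Credit Model, under which the weighted sum of item scores is a sufficient statistic for the latent trait; the slopes, response patterns, and function names are hypothetical.

```python
def summed_score(responses):
    """Unweighted sum of item scores."""
    return sum(responses)


def weighted_summed_score(responses, slopes):
    """Item scores weighted by (assumed) GPCM slope parameters a_j."""
    return sum(a * x for a, x in zip(slopes, responses))


slopes = [0.5, 1.0, 2.0]  # hypothetical discrimination parameters
patterns = [[2, 0, 1], [0, 2, 1], [1, 1, 1]]
for pattern in patterns:
    print(pattern, summed_score(pattern), weighted_summed_score(pattern, slopes))
```

All three patterns share a summed score of 3, but the weighted summed score separates them, crediting responses on the more discriminating items more heavily.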


Brown Bag talk: Information Extraction from Music Audio

Juan Bello, New York University

April 18, 2012


This talk gave an overview of a mix of concepts, problems, and techniques at the crossroads of signal processing, machine learning, and music. Dr. Bello started by motivating the use of content-based methods for the analysis and retrieval of music. Then, he introduced work in three projects being investigated at the Music and Audio Research Lab (MARL): automatic chord recognition using hidden Markov models, music structure analysis using probabilistic latent component analysis, and feature learning using convolutional neural networks. In doing so, he aimed to illustrate some of the challenges and opportunities in the field of music informatics.

The Impact of Data Science on the Social Sciences: Perspective of a Political Scientist

Drew Conway

April 4, 2012


As an emergent discipline, "data science" is by its very nature interdisciplinary. But what separates this new discipline from traditional data mining work is a fundamental interest in human behavior. Data science has been born out of the proliferation of massive records of online human behavior, e.g., Facebook, Twitter, LinkedIn, etc. It is the very presence of this data, and the accompanying tools for processing it, that have led to the meteoric rise in demand for data science. As such, principles from social science and a deep understanding of the data's substance represent core components in most data science endeavors. In this talk, Drew Conway described this and the other core components of data science through examples from his own experience, highlighting the role of social science.

Estimation of Contextual Effects through Multilevel Latent Variable Modeling with a Metropolis-Hastings Robbins-Monro Algorithm 

Ji Seung Yang 

March 20, 2012


Since human beings are social, their behaviors are naturally influenced by social groups such as one’s family, classroom, school, workplace, and country. Therefore, understanding human behaviors through not only an individual level perspective but also the lens of social context helps social researchers obtain a more complete picture of the individuals as well as society. The main theme of this talk was the definition and estimation of a contextual effect using nonlinear multilevel latent variable modeling in which measurement error and sampling error are more properly addressed. The discussion centered around an on-going research project that adopts a new algorithm, Metropolis-Hastings Robbins-Monro (MH-RM), to improve estimation efficiency in obtaining full-information maximum likelihood estimates (FIML) of the contextual effect. The MH-RM combines Markov chain Monte Carlo (MCMC) sampling and Stochastic Approximation to obtain FIML estimates more efficiently in complex models. This talk considered contextual effects not only as compositional effects but also as cross-level interactions, in which latent predictors are measured by categorical manifest variables. 

Brown Bag Discussion: Statistical modelling strategies for analyzing human movement data

Preeti Raghavan, Motor Recovery Lab, Rusk Institute, and Ying Lu, NYU PRIISM

March 20, 2012


Recent collaborations between Dr. Preeti Raghavan and Dr. Ying Lu were discussed in this talk. Using rich kinematic and EMG data collected at the Motor Recovery Lab, they were interested in movement patterns and how they change when physiology is modified by training, injury, disease, and disability. They have explored Principal Component Analysis as a tool for dimension reduction to identify common patterns. Since movement data are typically recorded over a period of time, it is important to model movement patterns over time. They discussed two approaches: treating the movement data as functional data (the functional approach) or as time series data. Accordingly, they discussed the use of functional PCA and dynamic factor analysis. Future directions for connecting EMG (muscle activity) with kinematic measures in these two contexts were also discussed.

An Introduction to Item Response Theory

Ji Seung Yang

March 19, 2012


Item Response Theory (IRT) is a state-of-the-art method that has been widely used in large-scale educational assessments. Recently there has been increased awareness of the potential benefits of IRT methodology not only in education but also in other fields such as health-related outcomes research and mental health assessment. This talk introduced the fundamentals of IRT to an audience not acquainted with IRT. In addition to the key concepts of IRT, the three most popular IRT models for dichotomously scored responses were illustrated using an empirical data example extracted from the Programme for International Student Assessment (PISA, OECD). The talk covered the principles of item analysis and of scoring people in the IRT framework, and closed with a list of advanced IRT topics to sketch out the current stream of methodological research in IRT.
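The three dichotomous models are usually taken to be the one-, two-, and three-parameter logistic models, which nest inside one another. The sketch below writes them as one function with hypothetical item parameters; it omits the scaling constant D = 1.7 that some software applies, and it is not tied to the PISA example from the talk.

```python
import math


def irt_3pl(theta, a=1.0, b=0.0, c=0.0):
    """P(correct | theta) under the three-parameter logistic (3PL) model.

    a: discrimination, b: difficulty, c: lower asymptote (pseudo-guessing).
    c = 0 gives the 2PL; additionally holding a common across items gives the 1PL.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


for theta in (-2.0, 0.0, 2.0):
    p1 = irt_3pl(theta, b=0.5)                # 1PL: common slope
    p2 = irt_3pl(theta, a=1.5, b=0.5)         # 2PL: item-specific slope
    p3 = irt_3pl(theta, a=1.5, b=0.5, c=0.2)  # 3PL: adds a guessing floor
    print(f"theta={theta:+.1f}  1PL={p1:.3f}  2PL={p2:.3f}  3PL={p3:.3f}")
```

The guessing parameter c keeps the probability of a correct answer above c even for very low theta, which matters for multiple-choice items.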

Three perspectives on item response theory

Peter Halpin, University of Amsterdam

March 6, 2012


In this talk, Dr. Halpin introduced item response theory (IRT) to a general audience through three different perspectives. First, he outlined how IRT can be motivated with reference to classical test theory (CTT). This gives us the conventional view of IRT as a theory of test scores. Second, he compared IRT and discrete factor analysis (DFA). From a statistical perspective, the differences are largely a matter of emphasis. This situates IRT in the more general domain of latent variable modelling. Third, he showed how IRT can be represented in terms of generalized (non-) linear models. This leads to the notion of explanatory IRT, or the inclusion of covariates to model individual differences. Comparison of these perspectives allows for a relatively up-to-date “big picture” of IRT.

Point process models of human dynamics

Peter Halpin, University of Amsterdam

March 5,  2012


There is an increasing demand for the analysis of intensive time series data collected on relatively few observational units. In this presentation, Dr. Halpin addressed the case of discrete events observed at irregular time points. In particular, he discussed a class of models for coupled streams of events. These models have many natural applications in the study of human behaviour, of which he emphasized relationship counselling and classroom dynamics. He summarized his own results on parameter estimation and illustrated the model using an example from postgraduate training. He also discussed ongoing developments regarding the inclusion of random, time-varying covariates with measurement error, among other topics.

Brown Bag Seminar: Model Comparison is Judgment, Model Selection is Decision Making

Jay Verkuilen, CUNY Graduate Center, Educational Psychology

February 15, 2012


Model Comparison (MC) and Model Selection (MS) are now commonly used procedures in the statistical analysis of data in the behavioral and biological sciences. However, a number of puzzling questions seem to remain largely unexamined, many of which parallel issues that have been studied empirically in the judgment and decision making literature. In general, both MC and MS involve multiple criteria and are thus likely to be subject to the same difficulties as many other multi-criteria decision problems. For example, standard MS rules based upon Akaike weights employ a variation of Luce’s choice rule. The fact that Luce’s choice rule was constructed to encapsulate a probabilistic version of the ‘independence of irrelevant alternatives’ (IIA) condition has a number of consequences for the choice set of models to be compared. Contractions and dilations of the choice set are likely to be problematic, particularly given that information criteria measure only predictive success and not other aspects of the problem that are meaningful but more difficult to quantify, such as interpretability. In addition, in many models it is not entirely clear how to properly define quantities such as sample size or the number of parameters, and there are a number of key assumptions that are likely to be violated in common models, such as that of a regular likelihood. We consider some alternative ways of thinking about the problem. We offer some examples to illustrate, one using loglinear analysis and the other a binary mixed model.
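Akaike weights have exactly the form of Luce's choice rule: each model's "strength" is exp(-AIC/2), normalized over the choice set. The sketch below, with hypothetical AIC values, shows the IIA property at issue: adding a model to the choice set changes every weight but leaves the ratio between any two existing models untouched.

```python
import math


def akaike_weights(aics):
    """Akaike weights w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2)."""
    best = min(aics)
    rel = [math.exp(-(aic - best) / 2.0) for aic in aics]
    total = sum(rel)
    return [r / total for r in rel]


# Hypothetical AIC values for three candidate models.
w3 = akaike_weights([100.0, 102.0, 110.0])
# Dilate the choice set with a fourth model.
w4 = akaike_weights([100.0, 102.0, 110.0, 99.0])
print([round(w, 3) for w in w3])
print([round(w, 3) for w in w4])
# The ratio for the first two models is unchanged (IIA): exp(1) in both sets.
print(w3[0] / w3[1], w4[0] / w4[1])
```

Because the ratios are fixed by the AIC differences alone, the absolute weight a model receives depends heavily on which other models happen to be in the set.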


Dealing with Attrition in Randomized Experiments: Non-parametric and Semi-Parametric Approaches

Cyrus Samii, New York University

December 7, 2011


Uncontrolled missingness in experimental data may undermine randomization as the basis for unbiased inference about average treatment effects. This paper reviews methods that attempt to address this problem. Dr. Samii reviewed inference with non-parametric bounds and inference with semi-parametric adjustment through inverse-probability weighting, imputation, and their combination. The analysis is rooted in the Neyman-Rubin potential outcomes model, which helps to expose the key assumptions necessary for identification and also for valid statistical inference (e.g., interval construction).
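The inverse-probability-weighting idea can be sketched with hypothetical data: estimate each unit's probability of remaining in the study from a discrete baseline covariate, then weight the observed outcomes by the inverse of that probability so that the respondents stand in for those lost. This assumes attrition is random within strata of that covariate (missing at random); the strata, outcomes, and function names are illustrative and not drawn from the talk.

```python
from collections import defaultdict


def complete_case_mean(units):
    """Mean outcome among units that were not lost to attrition."""
    ys = [y for _, y in units if y is not None]
    return sum(ys) / len(ys)


def ipw_mean(units):
    """Hajek-type IPW mean; units are (stratum, outcome or None if attrited).

    Assumes attrition is random within strata of a discrete baseline covariate.
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_total, n_observed]
    for stratum, y in units:
        counts[stratum][0] += 1
        if y is not None:
            counts[stratum][1] += 1
    num = den = 0.0
    for stratum, y in units:
        if y is None:
            continue
        n_total, n_obs = counts[stratum]
        weight = n_total / n_obs  # 1 / estimated P(observed | stratum)
        num += weight * y
        den += weight
    return num / den


# Hypothetical data: (baseline stratum, outcome); None marks attrition.
treated = [("low", 1), ("low", 2), ("low", None), ("low", None),
           ("high", 5), ("high", 7), ("high", 6), ("high", 6)]
control = [("low", 0), ("low", 1), ("low", 1), ("low", 0),
           ("high", 4), ("high", None), ("high", None), ("high", None)]

print("complete-case ATE:", complete_case_mean(treated) - complete_case_mean(control))
print("IPW ATE:", ipw_mean(treated) - ipw_mean(control))
```

Here attrition removes low-outcome treated units and high-outcome control units, so the complete-case contrast (about 3.3) overstates the effect relative to the weighted contrast (1.5).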

The Psychometrics of College Testing: Why Don't We Practice What We Teach?

Eric Loken, Pennsylvania State University

November 9, 2011


Universities with large introductory classes are essentially operating like major testing organizations. The college assessment model, however, is many decades old, and almost no attention is given to evaluating the psychometric properties of classroom testing. This is surprising considering risks in accountability, and lost opportunities for innovation in pedagogy. As used in colleges, multiple choice tests are often guaranteed to provide unequal information across the ability spectrum, and almost nothing is known about the consistency of measurement properties across subgroups. Course management systems that encourage testing from item banks can expose students to dramatically unequal assessment. Aside from issues of fairness and validity, the neglect of research on testing in undergraduate classes represents a missed opportunity to take an empirical approach to pedagogy. Years of testing have generated vast amounts of data on student performance. These data can be leveraged to inform pedagogical approaches. They can also be leveraged to provide novel assessments and tools to better encourage and measure student learning.

An "Introduction" to Respondent Driven Sampling (RDS) methodology

Krista Gile, University of Massachusetts Amherst

October 13, 2011


Krista Gile (Department of Mathematics and Statistics, University of Massachusetts Amherst) is a statistician who works closely with social and behavioral scientists in the area of RDS. RDS is an innovative sampling technique for studying hidden and hard-to-reach populations for which no sampling frame can be obtained. RDS has been widely used to sample populations at high risk of HIV infection and has also been used to survey undocumented workers and migrants.

Subsample Ignorable Likelihood for Regression Analysis with Missing Data

Roderick J. Little, University of Michigan

April 15, 2011


Two common approaches to regression with missing covariates are complete-case analysis (CC) and ignorable likelihood (IL) methods. Dr. Little reviewed these approaches and proposed a hybrid class, subsample ignorable likelihood (SSIL) methods, which apply an IL method to the subsample of observations that are complete on one set of variables but possibly incomplete on others. Conditions on the missing data mechanism are presented under which SSIL gives consistent estimates, but both CC and IL are inconsistent. He motivated and applied the proposed method using data from the National Health and Nutrition Examination Survey, and illustrated properties of the methods by simulation. Extensions to non-likelihood analyses are also mentioned. (Joint work with Nanhua Zhang)

Confronting selection into and out of social settings: Neighborhood change and children's economic outcomes

Pat Sharkey, NYU Sociology

March 23, 2011


Selection bias continues to be a central methodological problem facing observational research estimating the effects of social settings on individuals. This article develops a method to estimate the impact of change in a particular social setting, the residential neighborhood, that is designed to address non-random selection into a neighborhood and non-random selection out of a neighborhood. Utilizing matching to confront selection into neighborhood environments and instrumental variables to confront selection out of changing neighborhoods, the method is applied to assess the effect of a decline in neighborhood concentrated disadvantage on the economic fortunes of African American children living within changing neighborhoods. Substantive findings indicate that a one standard deviation decline in concentrated disadvantage leads to increases in African American children's adult economic outcomes, but no effects on educational attainment or health.

Modelling Birthweight in the Presence of Gestational Age Measurement Error: A Semi-parametric Multiple Imputation Model

Russ Steel, McGill University

March 2, 2011


Gestational age is an important variable in perinatal research, as it is a strong predictor of mortality and other adverse outcomes, and is also a component of measures of fetal growth. However, gestational ages measured using the date of the last menstrual period (LMP) are prone to substantial errors. These errors are apparent in most population-based data sources, which often show such implausible features as a bimodal distribution of birth weight at early preterm gestational ages (≤ 34 weeks) and constant or declining mean birth weight at postterm gestational ages (≥ 42 weeks). These features are likely consequences of errors in gestational age. Gestational age plays a critical role in measurement of outcome (preterm birth, small for gestational age) and is an important predictor of subsequent outcomes. It is important in the development of fetal growth standards. Therefore, accurate measurement of gestational age, or, failing that, a reasonable understanding of the structure of measurement error in the gestational age variable, is critical for perinatal research. In this talk, Dr. Steel discusses the challenges in adjusting for gestational age measurement error via multiple imputation. In particular, he emphasizes the tension between flexibly modelling the distribution of birthweights within a gestational age and allowing for gestational age measurement error. He discusses strategies for incorporating prior information about the measurement error distribution and averaging over uncertainty in the distribution of the birthweights conditional on the true gestational age.


Didactic Talk: Using Multilevel Data to Control for Unobserved Confounders: Fixed and Random Effects Approaches

Professor Jack Buckley, New York University 

November 3, 2010

Methods Lecture: An Empirical Model for Strategic Network Formation

Guido Imbens, Harvard University

October 29, 2010

Co-sponsored with the NYU Department of Economics


Dr. Guido Imbens and his team develop and analyze a tractable empirical model for strategic network formation that can be estimated with data from a single network at a single point in time. They model network formation as a sequential process in which, in each period, a single randomly selected pair of agents has the opportunity to form a link. Conditional on such an opportunity, a link will be formed if both agents view the link as beneficial to them. They base their decision on their own characteristics, the characteristics of the potential partner, and features of the current state of the network, such as whether the two potential partners already have friends in common. A key assumption is that agents do not take into account possible future changes to the network. This assumption avoids complications arising from multiple equilibria and also greatly simplifies the computational burden of analyzing these models. They use Bayesian Markov chain Monte Carlo methods to obtain draws from the posterior distribution of interest. The team applies their methods to a social network of 669 high school students with, on average, 4.6 friends. They then use the model to evaluate the effect of an alternative assignment to classes on the topology of the network. This is joint work with Nicholas Christakis, James Fowler, and Karthik Kalyanaraman.

Brown Bag: Informal discussion of the methodology associated with a work in progress

Pat Shrout, New York University

October 27, 2010


Pat Shrout presented a work in progress that examines lagged effects of conflict in intimate couples on same-day closeness. The data were derived from daily diaries and are therefore more intensive (dense) than traditional longitudinal data. Dr. Shrout discussed open issues arising in model selection, which highlight the tension among model choice, substantive questions, interpretation, and causality.

Statistics in Society Lecture: Forecasting Large Panel Data with Penalized Least-Squares

Jianqing Fan, Professor of Finance and Professor of Statistics, Princeton University

September 17, 2010

Co-sponsored by the Stern IOMS-Statistics Group 


Large panel data arise in many diverse fields, such as economics, finance, meteorology, energy demand management, and ecology, where spatial-temporal data are collected. Neighborhood correlations allow us to better forecast future outcomes, yet neighborhood selection becomes an important and challenging task. In this talk, Dr. Fan introduced penalized least squares for selecting the neighborhood variables that have an impact on forecasting power. He also introduced an iterative two-scale approach and showed how the inherent error (noise level) can be estimated in high-dimensional regression problems, which serves as a benchmark for forecasting errors. The techniques were illustrated by forecasting US house price indices at various Core Based Statistical Area (CBSA) levels.
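To make the idea of selecting variables by penalized least squares concrete, here is a minimal coordinate-descent sketch for the L1 (lasso) penalty, which minimizes (1/2n)·‖y − Xβ‖² + λ·‖β‖₁. The choice of the lasso penalty, the function names, and the fixed number of sweeps are assumptions for illustration; the penalty and algorithm used in the talk may differ. Coefficients shrunk exactly to zero correspond to neighborhood variables that are dropped.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning 0 inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X: list of rows (n x p), y: list of n responses, lam: penalty level >= 0.
    """
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta starting at zero
    for _ in range(n_sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(c * c for c in col) / n
            if z == 0:
                continue  # an all-zero column never enters the model
            # Correlation of column j with the partial residual that
            # excludes coordinate j's current contribution.
            rho = sum(c * r for c, r in zip(col, resid)) / n + z * beta[j]
            new = soft_threshold(rho, lam) / z
            delta = new - beta[j]
            if delta != 0:
                resid = [r - c * delta for r, c in zip(resid, col)]
            beta[j] = new
    return beta
```

With `lam = 0` the updates reduce to ordinary least squares on each coordinate; increasing `lam` zeroes out weakly predictive variables first.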

An Introduction to Multiple Imputation: A More Principled Missing Data Solution

Jennifer Hill, Professor of Applied Statistics, New York University

May 5, 2010

Brown Bag Talk - Variable Selection For Linear Mixed Effect Models

Ying Lu, Assistant Professor of Applied Statistics, New York University

March 24, 2010


Mixed effect models are fundamental tools for the analysis of longitudinal, panel, and cross-sectional data. They are widely used in the social, medical, and biological sciences. However, the complex nature of these models makes variable selection and parameter estimation challenging. In this talk, Dr. Lu proposed a simple iterative procedure that estimates and selects fixed and random effects for linear mixed models. In particular, she proposed using the partial consistency property of the random effect coefficients to select groups of random effects simultaneously via a data-oriented penalty function (the smoothly clipped absolute deviation, or SCAD, penalty). She showed that the proposed method is a consistent variable selection procedure and possesses the oracle property. Simulation studies and a real data analysis were also presented to examine the empirical performance of the procedure.
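The SCAD penalty named above behaves like the lasso near zero but levels off for large coefficients, so large effects are not over-shrunk. A minimal sketch of the penalty function itself follows. The default `a = 3.7` is a common conventional choice and is an assumption here, and the group-wise selection of random effects described in the talk is not shown.

```python
def scad_penalty(theta, lam, a=3.7):
    """Smoothly clipped absolute deviation (SCAD) penalty for one coefficient.

    Linear (lasso-like) for |theta| <= lam, quadratic taper for
    lam < |theta| <= a * lam, and constant beyond a * lam. Requires a > 2.
    """
    if lam <= 0 or a <= 2:
        raise ValueError("need lam > 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return (a + 1) * lam * lam / 2
```

The three pieces join continuously at `lam` and `a * lam`, and the constant tail is what gives SCAD its nearly unbiased estimates for large coefficients.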

Statistical Methods for Sampling Hidden Networked Populations

Mark S. Handcock

February 12, 2010


Part of the Stern IOMS-Statistics Seminar Series, this talk provides an overview of probability models and inferential methods for the analysis of data collected using Respondent Driven Sampling (RDS). RDS is an innovative sampling technique for studying hidden and hard-to-reach populations for which no sampling frame can be obtained. RDS has been widely used to sample populations at high risk of HIV infection and has also been used to survey undocumented workers and migrants. RDS solves the problem of sampling from hidden populations by replacing independent random sampling from a sampling frame with a referral chain of dependent observations: starting with a small group of seed respondents chosen by the researcher, the study participants themselves recruit additional survey respondents by referring their friends into the study. As an alternative to frame-based sampling, the chain-referral approach employed by RDS can be extremely successful at recruiting respondents. Current estimation relies on sampling weights estimated by treating the sampling process as a random walk on a graph, where the graph is the social network of relations among members of the target population. These estimates rest on strong assumptions that allow the sample to be treated as a probability sample. In particular, the current estimator assumes a with-replacement sample or a small sample fraction, whereas in practice samples are drawn without replacement and often include a large fraction of the population. A large sample fraction, combined with different mean nodal degrees for infected and uninfected population members, induces substantial bias in the estimates. Dr. Handcock introduces a new estimator that accounts for the without-replacement nature of the sampling process and removes this bias. He then briefly introduces a further extension that uses a parametric model for the underlying social network to reduce the bias induced by the initial convenience sample.
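Under the random-walk approximation described above, a respondent's chance of being recruited is taken to be proportional to their network degree, so each respondent is weighted by the inverse of their reported degree. A minimal sketch of that inverse-degree prevalence estimator (often called RDS-II or the Volz-Heckathorn estimator) follows. It is the with-replacement style of estimator whose bias the talk addresses, not Dr. Handcock's new estimator, and the function name is hypothetical.

```python
def rds_ii_estimate(infected, degrees):
    """Inverse-degree weighted prevalence estimate from an RDS sample.

    infected: 0/1 outcome for each respondent.
    degrees: each respondent's reported network degree (must be positive).
    """
    if len(infected) != len(degrees) or not degrees:
        raise ValueError("need matching, non-empty inputs")
    if any(d <= 0 for d in degrees):
        raise ValueError("degrees must be positive")
    # High-degree people are easier to reach on a random walk, so they
    # are down-weighted in proportion to their degree.
    weights = [1.0 / d for d in degrees]
    return sum(w * y for w, y in zip(weights, infected)) / sum(weights)
```

For example, with outcomes `[1, 0, 1]` and degrees `[2, 4, 4]` the estimate is 0.75, compared with an unweighted sample proportion of about 0.67, because the low-degree infected respondent counts double.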

The fifth PRIISM-organized Statistics in Society lecture

Mark S. Handcock, Department of Statistics, University of California, Los Angeles

February 11, 2010

Co-sponsored by the Stern IOMS-Statistics Group


In many situations, information from a sample of individuals can be supplemented by information from population-level data on the relationship between the explanatory variables and the dependent variables. Sources of population-level data include a census, vital events registration systems, and other governmental administrative record systems. These sources, however, contain too few variables to estimate demographically interesting models. Thus, in a typical situation, estimation is done using sample survey data alone, and the information from complete enumeration procedures is ignored. Sample survey data, however, are subject to sampling error and to bias due to nonresponse, whereas population-level data are comparatively free of sampling error and typically less biased by the effects of nonresponse.

In this talk, Dr. Handcock reviewed statistical methods for incorporating population-level information and showed that doing so can lead to statistically more accurate estimates and better inference. Population-level information can be incorporated via constraints on functions of the model parameters. In general, these constraints are nonlinear, which makes maximum likelihood estimation more difficult. He presented an alternative approach that exploits the notion of an empirical likelihood, and he gave an application to demographic hazard modeling that combines panel survey data with birth registration data to estimate annual birth probabilities by parity.


Fixed Effects Models in Causal Inference: A work-in-progress

Michael Sobel, Columbia University

December 9, 2009

This talk focused on a work in progress that clarifies the role of fixed effects models in causal inference. Dr. Sobel made explicit both the assumptions researchers implicitly make when using such models and the quantity actually being estimated, both of which are commonly misunderstood by those who use this strategy to identify causal effects.

Does Special Education Actually Work?

Michael Foster, Professor of Maternal and Child Health in the School of Public Health, University of North Carolina, Chapel Hill

October 1, 2009

This talk explored the efficacy of current special education policies while highlighting the role of new methods in causal inference in answering this question. The talk was jointly sponsored by the Departments of Teaching and Learning and Applied Psychology and by the Institute for Human Development and Social Change. The lecture was followed by a reception celebrating the official launch of the PRIISM Center.


This presentation assesses the effect of special education on school dropout (that is, the timing of a significant interruption in schooling) for children at risk for emotional and behavioral disorders (EBD). The analysis assesses the extent to which involvement in special education services raises the likelihood of an interruption in schooling in the presence of time-dependent confounding by aggression. Using a child's observed school interruption time and history of special education and aggression, this strategy for assessing causal effects (which relies on g-estimation) relates the observed timing of school interruption to the counterfactual, that is, to what would have occurred had the child never been involved in special education. The analysis uses data on 1,089 children collected by the Fast Track project. Subject to important assumptions, the results indicate that involvement in special education services reduces time to school interruption by a factor of 0.64 to 0.93. These findings call the efficacy of special education services into question and suggest that more research should be devoted to developing effective school-based interventions for children with emotional and behavioral problems.
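The link between observed and counterfactual interruption times can be sketched with one common parameterization of a structural nested accelerated failure time model: the counterfactual time without any special education is U(ψ) = ∫ exp(ψ·A(t)) dt over the observed follow-up, where A(t) is 1 while the child receives services. G-estimation then searches for the ψ that makes U(ψ) conditionally independent of treatment given covariate history; that search and the treatment model are not shown. The parameterization, sign convention, and function name below are assumptions, not necessarily those used in the study.

```python
import math


def counterfactual_time(intervals, psi):
    """Map an observed treatment history to counterfactual untreated time.

    intervals: list of (duration, treated) pairs covering the observed time
               to school interruption, with treated in {0, 1}.
    psi: causal parameter; each unit of treated time counts as exp(psi)
         units of untreated time under this assumed parameterization.
    """
    if any(d < 0 or a not in (0, 1) for d, a in intervals):
        raise ValueError("durations must be non-negative and treated 0/1")
    return sum(d * math.exp(psi * a) for d, a in intervals)
```

With `psi = 0` (no treatment effect) the counterfactual time equals the observed time; for a child who is always treated, observed time equals U(ψ)·exp(−ψ), so exp(−ψ) plays the role of a time-scaling factor.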

Weather & Death in India: Mechanisms and Implications for Climate Change

Michael Greenstone

May 5, 2009


Is climate change truly a matter of life and death? Dr. Michael Greenstone discusses revelatory new research on the impact of variations in weather on well-being in India. The results indicate that high temperatures dramatically increase mortality rates; for example, one additional day with a mean temperature above 32° C, relative to a day in the 22° to 24° C range, increases the annual mortality rate by 0.9% in rural areas. This effect appears to be related to substantial reductions in the income of agricultural laborers due to these same hot days. Finally, combining the estimated temperature-mortality relationship with state-of-the-art climate change projections reveals a substantial increase in mortality due to climate change, one that greatly exceeds the expected impact in the US and other developed countries. Co-sponsored by the Global MPH program, the NYU Steinhardt School of Culture, Education and Human Development, and the NYU Environmental Studies program. Presented as part of the ongoing series Statistics in Society, organized by PRIISM.

Data analysis in an 'expanded field'

Mark Hansen, UCLA

February 12, 2009

Mark Hansen, a UCLA statistician with joint appointments in Electrical Engineering and Design/Media Art, gave a talk entitled "Data analysis in an 'expanded field'" that examined the interface between statistics, computing, and society. Dr. Hansen is perhaps best known locally for co-creating the art installation "Movable Type" in the New York Times Building here in Manhattan. However, his research reaches far beyond this realm, drawing on fields as diverse as information theory, numerical analysis, computer science, and ecology. For instance, Hansen served as Co-PI for the Center for Embedded Networked Sensing (CENS), an NSF Science and Technology Center that describes itself as "a major research enterprise focused on developing wireless sensing systems and applying this revolutionary technology to critical scientific and societal pursuits. In the same way that the development of the Internet transformed our ability to communicate, the ever decreasing size and cost of computing components is setting the stage for detection, processing, and communication technology to be embedded throughout the physical world and, thereby, fostering both a deeper understanding of the natural and built environment and, ultimately, enhancing our ability to design and control these complex systems."


Red State, Blue State, Rich State, Poor State: Why Americans Vote the Way They Do

Andrew Gelman, Professor in the Departments of Statistics and Political Science at Columbia University

October 14, 2008


Andrew Gelman's new book, "Red State, Blue State, Rich State, Poor State: Why Americans Vote the Way They Do," is receiving tremendous critical praise. Gelman has recently been featured on several radio programs, including WNYC's Leonard Lopate Show, and his talk will draw from the book.