Spring 2022 Seminars
Towards Explainable Deep Survival Analysis Models with Guarantees
A seminar by George Chen, assistant professor of information systems at Carnegie Mellon University's Heinz College
April 20, 2022
Survival analysis is about modeling how much time will elapse until a critical event occurs. Examples of such critical events include death, disease relapse, readmission to the hospital, device failure, a customer ending a subscription service, or a convicted criminal reoffending. Recent machine learning advances in survival analysis have largely focused on architecting deep nets to achieve state-of-the-art prediction accuracy, with very little focus on whether the learned models are easy for application domain experts to interpret. In this talk, Dr. Chen will discuss ongoing work on developing neural survival models that not only achieve prediction accuracy competitive with the state-of-the-art but also aim to be explainable and come with statistical accuracy guarantees. Specifically, he presents a new class of scalable deep kernel survival analysis models based on automatically learning a similarity score between any two data points (e.g., patients). Dr. Chen and colleagues show experimental results on healthcare survival analysis datasets (in which they predict time until death for patients with various diseases) and also a music subscription dataset (in which they predict time until a customer ends their subscription).
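To make the idea of similarity-weighted survival estimation concrete, here is a minimal sketch of a kernel-weighted Kaplan-Meier estimator. It is not the method from the talk: it assumes a single numeric feature and a fixed Gaussian kernel standing in for the learned similarity score, and the function name, argument names, and `bandwidth` parameter are hypothetical.

```python
import math


def kernel_survival(train, x_query, t, bandwidth=1.0):
    """Kernel-weighted Kaplan-Meier estimate of P(T > t | x_query).

    train: list of (x, time, event) tuples, where event is 1 if the critical
    event was observed at `time` and 0 if the record was censored at `time`.
    The Gaussian kernel is an assumption; a deep kernel model would learn
    the similarity between data points instead.
    """
    # Similarity of every training point to the query point.
    weights = [
        math.exp(-((x - x_query) ** 2) / (2 * bandwidth ** 2))
        for x, _, _ in train
    ]
    # Distinct observed event times up to and including t.
    event_times = sorted(
        {time for _, time, event in train if event == 1 and time <= t}
    )
    surv = 1.0
    for s in event_times:
        # Total similarity weight still at risk just before time s.
        at_risk = sum(
            w for w, (_, time, _) in zip(weights, train) if time >= s
        )
        # Similarity weight of the points whose event occurs at time s.
        died = sum(
            w
            for w, (_, time, event) in zip(weights, train)
            if time == s and event == 1
        )
        if at_risk > 0:  # guard against weights underflowing to zero
            surv *= 1.0 - died / at_risk
    return surv
```

When every training point has the same feature value as the query, all weights equal 1 and the estimate reduces to the ordinary Kaplan-Meier curve. Points that are more similar to the query contribute more to the estimate, which is what makes a prediction traceable to specific comparable patients.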
Of Mis-Defined Causal Questions: The Case of Race and Multi-Stage Outcomes
A seminar by Issa Kohler-Hausmann (professor of law at Yale Law School and associate professor of sociology at Yale) and Lily Hu (PhD Candidate, Harvard, and soon to be assistant professor of philosophy at Yale University)
March 30, 2022
A number of influential causal inference researchers have asked the following question: Can we quantify an effect of race on a decision that takes place downstream of other decisions that were themselves causally affected by race? If so, how? A recent paper by such researchers addressing police use of force argued that the existence of racial discrimination at a stage prior to the decision of interest biases standard estimates of a causal effect of race on that decision. This work subsequently generated a flurry of debate among researchers about the exact conditions under which certain race-causal estimands can be properly identified. In this talk, we address conceptual questions that must be answered prior to methodological ones of whether and under what assumptions these race-causal estimands can be identified. Without addressing these conceptual questions, it is unclear precisely which, if any, causal dependencies these estimands claim to represent. We argue that the existence of race selection in the prior stage poses a conceptual problem with the causal quantities that are the target of identification. Namely, the target causal quantities have been mis-defined in this literature. Finally, we address whether these race-causal estimands could plausibly correspond to legal concepts of discrimination.
Prioritize Patients Not Patience - Using optimal test assembly to shorten patient reported outcome measures: A case study of the PHQ-9
A seminar by Daphna Harel, associate professor of applied statistics at NYU
February 23, 2022
How can we learn more by asking less? Dr. Harel demonstrates ways to reduce the burden on survey respondents by shortening surveys without compromising the information received.
Patient-reported outcome measures are widely used to assess respondent experiences, well-being, and treatment response in clinical trials and cohort-based observational studies in both medicine and psychological studies. However, respondents may be asked to respond to many different scales in order to provide researchers and clinicians with a wide array of information regarding their experiences. Therefore, collecting such long and cumbersome patient-reported outcome measures may burden respondents and increase research costs. However, little research has been conducted on optimal, replicable, and reproducible methods to shorten these instruments. In this talk, Dr. Harel proposes the use of mixed integer programming through Optimal Test Assembly as a method to shorten patient-reported outcome measures. She will describe this through a case study of the Patient Health Questionnaire - 9.
A Multistate Approach for Mediation Analysis in the Presence of Semi-competing Risks with Application in Cancer Survival Disparities
A seminar by Linda Valeri, assistant professor in biostatistics at the Columbia University Mailman School of Public Health
February 2, 2022
Assistant Professor Linda Valeri talks about how new approaches to mediation analysis can help understand racial disparities in cancer survival.
We provide novel definitions and identifiability conditions for causal estimands that involve stochastic interventions on non-terminal time-to-events that lie on the pathway between an exposure and a terminal time-to-event outcome. Causal contrasts are estimated in continuous time within a multistate modeling framework accounting for semi-competing risks and analytic formulae for the estimators of the causal contrasts are developed. We employ this novel methodology to investigate the role of delaying treatment uptake in explaining racial disparities in cancer survival in a cohort study of colon cancer patients.
Fall 2021 Seminars
Police Violence Reduces Civilian Cooperation and Engagement
A seminar by Desmond Ang, applied economist and assistant professor at the Harvard Kennedy School of Government
November 10, 2021
Join PRIISM and Assistant Professor Desmond Ang to learn how to use statistical methods to analyze impacts of police violence on civilian engagement and reporting.
How do high-profile acts of police brutality affect public trust and cooperation with law enforcement? To investigate this question, we develop a new measure of civilian crime reporting that isolates changes in community engagement with police from underlying changes in crime: the ratio of police-related 911 calls to gunshots detected by ShotSpotter technology. Examining detailed data from eight major American cities, we show a sharp drop in both the call-to-shot ratio and 911 call volume immediately after the police murder of George Floyd in May 2020. Notably, reporting rates decreased significantly in both non-white and white neighborhoods across the country. These effects persist for several months, and we find little evidence that they were reversed by the conviction of Floyd’s murderer. Together, the results illustrate how acts of police violence may destroy a key input into effective law enforcement and public safety: civilian engagement and reporting. Joint work with Panka Bencsik, Jesse Bruhn and Ellora Derenoncourt.
Optimal Tests of the Composite Null Hypothesis Arising in Mediation Analysis
A seminar by Caleb Miles, assistant professor in the Department of Biostatistics at the Columbia University Mailman School of Public Health
October 27, 2021
Join PRIISM and Assistant Professor Caleb Miles to learn about mediation analysis and how it can be used in statistics.
The indirect effect of an exposure on an outcome through an intermediate variable can be identified by a product of regression coefficients under certain causal and regression modeling assumptions. Thus, the null hypothesis of no indirect effect is a composite null hypothesis, as the null holds if either regression coefficient is zero. A consequence is that existing hypothesis tests are either severely underpowered near the origin (i.e., when both coefficients are small with respect to standard errors) or do not preserve type 1 error uniformly over the null hypothesis space. We propose hypothesis tests that (i) preserve level alpha type 1 error, (ii) meaningfully improve power when both true underlying effects are small relative to sample size, and (iii) preserve power when at least one is not. One approach gives a closed-form test that is minimax optimal with respect to local power over the alternative parameter space. Another uses sparse linear programming to produce an approximately optimal test for a Bayes risk criterion. We provide an R package that implements the minimax optimal test.
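The conservativeness near the origin can be illustrated with one familiar existing test, the joint significance test, which rejects only when both coefficient tests reject. The sketch below is an illustration under stated assumptions, not the proposed method: it assumes the two z-statistics are independent, normally distributed with unit variance, and centered at the hypothetical noncentrality values `delta_a` and `delta_b`.

```python
from statistics import NormalDist


def joint_test_rejection_prob(delta_a, delta_b, alpha=0.05):
    """Rejection probability of the joint significance test.

    delta_a, delta_b: true coefficients divided by their standard errors
    (assumed known here purely for illustration).
    """
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha / 2)

    def two_sided_rejection(delta):
        # P(|Z + delta| > critical) for a standard normal Z.
        upper = 1 - std_normal.cdf(critical - delta)
        lower = std_normal.cdf(-critical - delta)
        return upper + lower

    # Independence assumption: both tests must reject.
    return two_sided_rejection(delta_a) * two_sided_rejection(delta_b)
```

Under these assumptions, the rejection probability when both coefficients are zero is alpha squared (0.0025 for alpha = 0.05), far below the nominal level, while it approaches alpha when one coefficient is zero and the other is large. That gap is the room for power improvement the abstract refers to.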
Using Machine Learning to Increase Equality in Healthcare and Public Health
A seminar by Emma Pierson, assistant professor of computer science at the Jacobs Technion-Cornell Institute at Cornell Tech and the Technion
October 13, 2021
Join PRIISM and Dr. Emma Pierson to get a glimpse into how machine learning can reduce inequality in healthcare.
Our society remains profoundly unequal. Worse, there is abundant evidence that algorithms can, improperly applied, exacerbate inequality in healthcare and other domains. This talk pursues a more optimistic counterpoint -- that data science and machine learning can also be used to illuminate and reduce inequality in healthcare and public health -- by presenting vignettes about women's health, COVID-19, and pain.
Understanding Human Factors in Forensic Science using Item Response Theory
A seminar by Amanda Luby, assistant professor of statistics at Swarthmore College
September 15, 2021
Join PRIISM and Dr. Amanda Luby to learn how using Item Response Theory can improve the analysis and interpretation of forensic science.
Forensic science often involves the evaluation of crime-scene evidence to determine whether it matches a known-source sample, such as determining if a fingerprint or DNA was left by a suspect. Even as forensic measurement and analysis tools become increasingly sophisticated, final source decisions are often left to individual examiners' interpretation. However, the current approach to characterizing uncertainty in forensic decision-making has largely centered around conducting error rate studies (in which examiners evaluate a set of items consisting of known-source evidence) and calculating aggregated error rates. This approach is not ideal for comparing examiner performance, as decisions are not always unanimous and error frequency is likely to vary depending on the quality of the physical evidence. Item Response Theory (IRT), a class of statistical methods used prominently in educational testing, is one approach that accounts for differences in proficiency among participants and additionally accounts for varying difficulty among items. Using simple IRT models, more elaborate decision tree models, and extensions, along with data from the FBI “Black Box” and “White Box” studies, Dr. Luby and her team find that there is considerable variability in print quality assessments, inconclusive rates, perceived difficulty, and minutiae identification even when examiners largely agree on a final source decision. In this talk, Dr. Luby will review some of these recent advances, outline challenges in applying IRT in practice, and discuss the implications of these findings within the criminal justice system.
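As a rough illustration of how IRT separates participant proficiency from item difficulty, the sketch below uses the Rasch model, the simplest IRT model. Whether the talk uses exactly this model is an assumption, and the function and parameter names are hypothetical.

```python
import math


def rasch_probability(proficiency, difficulty):
    """Probability of a correct response under the Rasch model.

    proficiency: latent skill of the examiner (higher is more proficient).
    difficulty: latent difficulty of the item (higher is harder).
    """
    # Logistic function of the gap between proficiency and difficulty.
    return 1.0 / (1.0 + math.exp(-(proficiency - difficulty)))
```

An examiner whose proficiency equals an item's difficulty has a 50% chance of a correct response. Raising proficiency or lowering difficulty increases that chance, so an aggregated error rate mixes together two effects that the model estimates separately.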
Disappearing Students During COVID-19? Evidence from Large-Scale Messaging Data
A seminar by Rebecca Johnson, assistant professor in the Program in Quantitative Social Science at Dartmouth College
September 29, 2021
Join PRIISM as we learn from Dr. Rebecca Johnson how her randomized controlled trial study of TeacherText, an online application, investigates questions about family-school interactions during remote instruction.
A large body of research documents inequalities in family-school interactions. Yet the methodologies used---either intensive ethnographic observation of families and teachers or survey-based measures that ask families to self-report their school involvement---create gaps in our understanding of how family-school interactions impact inequality. These gaps became more apparent during COVID-19, as policy concerns that emerged about families “disappearing” from contact during virtual learning confronted methods ill-suited to measuring these changes. The present project draws upon a randomized controlled trial of “TeacherText”, a web and mobile-based application that makes it easier for teachers and school administrators to interact with families (e.g., auto-translations; training on positive messages). We use large-scale metadata and messaging data from the platform (~340,000 messages between 208 school staff and 4,298 parent-student dyads; 6 DC Public and Public Charter schools; messaging in 2019-2020 before and during COVID-19 online pivots), linked to administrative data from the district’s student information system (SIS), to investigate two questions about family-school interactions during virtual pivots (pre-analysis plan). First, we show that when examining interactions longitudinally, disappearance from contact is much rarer than two other statuses: interactions both before and after the COVID-19 virtual pivot (modal status) or no interactions in either period. Then, we use text analysis to highlight two mechanisms for how school staff continued to engage families: the use of tools to simultaneously interact with many families and the platform expanding beyond academic-focused messages to messages connecting families with social services. Concluding, we discuss the benefits and challenges of using “digital trace data” to measure family-school interactions. Joint work with Vicky Mei.
Reimagining the Role of Schools in Society: A Conversation and Call to Action
May 26, 2021
For 200 years, we have debated the role of schools in U.S. society. Today, we face an unprecedented opportunity to reexamine long-standing assumptions and to include voices that have been marginalized in the construction of our current systems. As we struggle with the impacts of a global pandemic and ongoing racial injustice, how do we take this moment as an opportunity to re-envision the role of schools in U.S. society? How do we enact this fresh vision for the 2021-2022 school year and beyond?
Our forum aims to re-imagine a role for schools that rectifies societal inequities rather than replicates them, embraces new opportunities to meet student needs, and "builds back better" in the areas of mental health, teaching and learning, and racial and social justice. Join us for a vigorous, forward-thinking conversation and call to action at this unprecedented moment, co-sponsored by the Institute of Human Development and Social Change (IHDSC), the Institute of Education Sciences-funded Predoctoral Interdisciplinary Research Training (IES-PIRT) program, the Center for Practice and Research at the Intersection of Information, Society, and Methodology (PRIISM), and the Research Alliance for New York City Schools.
This conversation will feature research and practice ideas from a new generation of education scholars, and a moderated discussion and Q&A from a panel of seasoned education leaders.
PRIISM Data Science For Social Impact
May 12, 2021
PRIISM, funded by the Moore Sloan Data Science Environment at NYU, created a competitive social impact research fellowship program that awarded funding and provided mentorship to five NYU graduate students, with an emphasis on awarding fellowships to students from groups currently underrepresented in STEM fields. These five students were matched with a research project at NYU. As a reflection after the end of their fellowships, we organized an event with a series of short talks for the fellows to highlight the challenges and opportunities that arise when data science tools are used to understand and make a positive impact on the world around us.
The event featured the work of the following five NYU research teams:
- The THINK: Tracking Hope in Nairobi and Karachi project uses a regression discontinuity design to understand the effect of education access on hope, peace, and conflict among youth in Nairobi and Karachi. The PIs on this project, Elisabeth King, Dana Burde, Jennifer Hill, and Daphna Harel, mentored Dorothy Seaman, PRIISM Social Impact Research Fellow.
- A Consensus Among Asset Managers on Fostering Counterintuitive Skill Development project tries to understand the role of organizational practices and structures needed for asset managers to make investment decisions with sustainability in mind. PI Tracy Van Holt mentored George Perrett, PRIISM Social Impact Research Fellow.
- The Public Safety Lab's Jail Data Initiative is an effort to collect and match daily jail records with criminal records, providing anonymized data to research and policy communities. Anna Harvey, PI on this project, mentored Chutang Luo, PRIISM Social Impact Research Fellow.
- The Háblame Bebé & Nurse-Family Partnership project examined infant brain functioning in relation to experiences of maternity leave and physiological stress. Natalie Brito, PI on this project, mentored John Zhang, PRIISM Social Impact Research Fellow.
- The Segregation of the School Segregation Literature project presented the role of implicit bias in school segregation research citations by conducting a bibliometric network analysis of peer-reviewed publications. Ying Lu and L'Heureux Lewis-McCoy (PIs) mentored Evaristus Ezekwem, PRIISM Social Impact Research Fellow.
Spring 2021 Seminars
Quasi-Experimental Methods for Estimating the Impact of Vacant Lot Remediation on Crime
A seminar by John MacDonald, professor of criminology and sociology at the University of Pennsylvania
April 28, 2021
Can investments in blight remediation programs reduce crime rates? Join us and John MacDonald to learn how to apply quasi-experimental and experimental approaches to examine whether vacant lot greening programs can provide a sustainable approach to reducing crime in disadvantaged neighborhoods.
Place-based blight remediation programs have gained popularity in recent years as a crime reduction approach. This study estimated the impact of a citywide vacant lot greening program in Philadelphia on changes in crime over multiple years, and whether the effects were moderated by nearby land uses.
The vacant lot greening program was assessed using quasi-experimental and experimental designs. Entropy distance weighting was used in the quasi-experimental analysis to match control lots to be comparable to greened lots on pre-existing crime trends. Fixed-effects difference-in-differences models were used to estimate the impact of the vacant lot greening program in quasi-experimental and experimental analyses.
Vacant lot greening was estimated to reduce total crime and multiple subcategories in both the quasi-experimental and experimental evaluations. Remediating vacant lots had a smaller effect on reducing crime when they were located near train stations and alcohol outlets. The crime reductions from vacant lot remediations were larger when they were located near areas of active businesses. There is some suggestive evidence that the effects of vacant lot greening are larger when located in neighborhoods with higher pre-intervention levels of social cohesion.
The findings suggest that vacant lot greening provides a sustainable approach to reducing crime in disadvantaged neighborhoods, and the effects may vary by different surrounding land uses. To better understand the mechanisms through which place-based blight remediation interventions reduce crime, future research should measure human activities and neighborly socialization in and around places before and after remediation efforts are implemented.
Marginal Structural Models for Causal Inference with Continuous-Time Treatments
A seminar by Liangyuan Hu, assistant professor of biostatistics in the Department of Population Health Science & Policy at Mount Sinai School of Medicine
April 14, 2021
In this seminar, Liangyuan Hu explains how causal inference models can help improve health treatments for HIV, COVID-19 and cardiovascular diseases.
Public health research often involves evaluating the effects of continuous-time treatments. Causal inference has traditionally focused on the estimation of causal effects of a number of treatments defined at baseline. In the case where treatment assignment is time-dependent, the treatment is often categorized in terms of time intervals for treatment initiation. This categorization can lead to the coarsening of information on treatment initiation and fails to answer the question of the causal effect of actual treatment timing. The marginal structural model, pioneered by Robins and colleagues, has been widely used for causal inference. It is easy to implement and provides a general infrastructure for weighting-based methods to address confounding, particularly time-varying confounding. In this talk, Dr. Hu will show how the marginal structural model can be used to capture the causal effect of the continuous-time treatment when treatment initiation is either static or dynamic. Dr. Hu will derive estimation strategies amenable to marginal structural models to overcome complications frequently encountered in observational healthcare data, including incomplete treatment initiation time and censored survival outcomes. A case study applying these approaches to large-scale electronic health record data will estimate the optimal antiretroviral therapy initiation rules for patients presenting with HIV/TB coinfection and HIV-infected adolescents. New insights that can be gained relative to findings from randomized trials will be discussed. Finally, Dr. Hu will discuss how the methods can be used and extended to address important emerging questions related to cardiovascular disease and COVID-19.
Does Science Self-Correct? What We've Learned At Retraction Watch
A seminar by Ivan Oransky
April 9, 2021
Co-sponsored event with CoHRR
Ivan Oransky, MD, is co-founder of Retraction Watch, vice president of editorial at Medscape, and distinguished writer in residence at New York University's Arthur Carter Journalism Institute. He also serves as president of the Association of Health Care Journalists. Ivan previously was global editorial director of MedPage Today, executive editor of Reuters Health, and held editorial positions at Scientific American and The Scientist. A 2012 TEDMED speaker, he is the recipient of the 2015 John P. McGovern Medal for excellence in biomedical communication from the American Medical Writers Association, and in 2017 was awarded an honorary doctorate in civil laws from The University of the South (Sewanee).
Dropping Standardized Testing for Admissions: Differential Variance and Access
A seminar by Nikhil Garg, assistant professor at Cornell Tech
March 31, 2021
Nikhil Garg talks about the impacts of changing standardized test score requirements for college admission.
The University of California suspended through 2024 the requirement that applicants from California submit SAT scores, upending the major role standardized testing has played in college admissions. We study the impact of such decisions and their interplay with other policies on admitted class composition. We consider a theoretical framework to study the effect of requiring test scores on academic merit and diversity in college admissions. The model has a college and a set of potential students. Each student has observed application components and group membership, as well as an unobserved noisy skill level generated from an observed distribution. The college is Bayesian and maximizes an objective that depends on both diversity and merit. It estimates each applicant’s true skill level using the observed features and then admits students with or without affirmative action. We characterize the trade-off between the (potentially positive) informational role of standardized testing in college admissions and its (negative) exclusionary nature. Dropping test scores may exacerbate disparities by decreasing the amount of information available for each applicant, especially those from non-traditional backgrounds. However, if there are substantial barriers to testing, removing the test improves both academic merit and diversity by increasing the size of the applicant pool. Finally, using application and transcript data from the University of Texas at Austin, we demonstrate how an admissions committee could measure the trade-off in practice to better decide whether to drop its test score requirement. Joint work with Hannah Li and Faidra Monachou.
Statistical Learning with Electronic Health Records Data
A seminar by Jessica Gronsbell, assistant professor at the University of Toronto
March 17, 2021
In this day and age, electronic healthcare data is greatly underutilized. In this seminar, Jessica Gronsbell gives us an in-depth look into how state-of-the-art statistical techniques can help improve healthcare delivery and understanding of disease development.
The adoption of electronic health records (EHRs) has generated massive amounts of routinely collected medical data with potential to improve our understanding of healthcare delivery and disease processes. In this talk, Dr. Gronsbell will discuss methods that bridge classical statistical theory and modern machine learning tools in an effort to extract reliable insights from imperfect EHR data. She will focus primarily on (i) the challenges in obtaining annotated outcome data, such as presence of a disease or clinical condition, from patient records and (ii) how leveraging unlabeled examples to improve model estimation and evaluation can reduce the annotation burden.
Revisiting the Gelman-Rubin Diagnostic
A seminar by Christina Knudson, assistant professor of statistics at the University of St. Thomas
February 24, 2021
This seminar by Christina Knudson, an expert in generalized linear mixed models and MCMC methods, takes an in-depth look into new connections between the Gelman-Rubin statistic and Monte Carlo variance estimators.
Gelman and Rubin's (1992) convergence diagnostic is one of the most popular methods for terminating a Markov chain Monte Carlo (MCMC) sampler. Since the seminal paper, researchers have developed sophisticated methods for estimating variance of Monte Carlo averages. We show that these estimators find immediate use in the Gelman-Rubin statistic, a connection not previously established in the literature. We incorporate these estimators to upgrade both the univariate and multivariate Gelman-Rubin statistics, leading to improved stability in MCMC termination time. An immediate advantage is that our new Gelman-Rubin statistic can be calculated for a single chain. In addition, we establish a one-to-one relationship between the Gelman-Rubin statistic and effective sample size. Leveraging this relationship, we develop a principled termination criterion for the Gelman-Rubin statistic. Finally, we demonstrate the utility of our improved diagnostic via examples.
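For reference, the classical multi-chain Gelman-Rubin statistic can be sketched as follows. This is the textbook 1992-style computation for a scalar quantity, not the improved statistic described in the talk; in particular, this classical form needs at least two chains. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def gelman_rubin(chains):
    """Classical potential scale reduction factor for equal-length chains.

    chains: list of m >= 2 lists, each holding n >= 2 draws of one scalar.
    Assumes the within-chain variance is nonzero.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(chain) != n for chain in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    # W: average of the within-chain sample variances.
    within = mean([variance(chain) for chain in chains])
    # B / n: sample variance of the chain means.
    between_over_n = variance([mean(chain) for chain in chains])
    # Pooled estimate of the target variance.
    pooled = (n - 1) / n * within + between_over_n
    return sqrt(pooled / within)
```

Values close to 1 suggest the chains are sampling from the same distribution, while values well above 1 indicate that the chain means still disagree. The choice of threshold for stopping is exactly the kind of ad hoc decision the talk's termination criterion is meant to replace.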
COVID Tracking as a Prism for Refracting Tech Ethics
A seminar by Laura Norén, VP of Privacy and Trust at Obsidian Security
February 10, 2021
In this seminar Laura Norén unpacks the social and ethical impacts around hundreds of apps designed to address COVID.
COVID landed in a culture accustomed to having "an app for that" - whatever "that" may be - and has now generated hundreds of apps designed to address COVID. The technical and social variation from app to app and from one community of engagement to the next provides an exquisite refractory prism for reflection about technical ethics, the "good" outcome, and the longstanding tension between utilitarian ethics (generally favored by the tech community) and virtue or duty ethics (more frequently called upon within the institutions of family, religion, and/or outside the US context). In this talk, standard data project management questions about what an app can/should do, who pays for it, what type of data to collect, how long to retain it, with whom to share it, which other data streams should be combined, what types of predictions and decisions to make with it, and what context these decisions will occur in are considered. Scoffing at the "app for that" answer is short-sighted. Unpacking the social and ethical impacts that accrue along the way is particularly important as COVID apps continue to proliferate during the vaccine rollout and hybrid open/closed urban reality. More broadly, working through these questions in a context that impacts us all provides a particularly sticky set of lessons and questions pertinent to many processes of technical intervention in social life and the public sphere.