- Location: Virtual
- Series/Type: DLSPH Event
- Dates: February 14, 2022, from 3:30pm to 4:30pm
Join us at the Data Science Applied Research and Education Seminar (ARES) with:
Dr. Rebecca Hubbard
Professor of Biostatistics
Department of Biostatistics, Epidemiology & Informatics
University of Pennsylvania Perelman School of Medicine
Free Event | Registration Required
Talk Title: Using electronic health records to accelerate research without sacrificing scientific rigor
Abstract: Opportunities to use “real world data,” including electronic health records and medical claims, have exploded over the past decade. Such data sources facilitate research in a naturalistic setting that can potentially proceed much more quickly than research relying on primary data collection. However, using data that were not collected for research purposes comes at a cost, and naïve use of such data without considering their complexity and imperfect quality can lead to biased inference. Real-world data frequently violate the assumptions of standard statistical methods, and it is not practicable to develop new methods to address every possible complication arising in their analysis. The scientist is faced with a quandary: how to effectively utilize real-world data to advance research without compromising best practices for principled data analysis. Data science, bridging scientific domain expertise with technical facility in working with complex data, offers a solution to this quandary. In this talk, I will use examples from my research on methods for the analysis of electronic health record (EHR)-derived data to illustrate approaches to leveraging a scientific understanding of the data generating mechanism to improve the analysis of real-world data. Drawing on this understanding, I will discuss approaches to identify, use, and develop principled methods for EHR data analysis. The overarching goal of this presentation is to raise awareness of challenges associated with the analysis of EHR data and demonstrate how a principled approach can be grounded in an understanding of the scientific context and data generating process.