TY - JOUR
T1 - Automated identification of implausible values in growth data from pediatric electronic health records
AU - Daymont, Carrie
AU - Ross, Michelle E.
AU - Localio, A. Russell
AU - Fiks, Alexander G.
AU - Wasserman, Richard C.
AU - Grundmeier, Robert W.
N1 - Funding Information:
This project is supported by the Health Resources and Services Administration (HRSA) of the US Department of Health and Human Services (HHS) under grant number R40MC24943 and title “Primary Care Drug Therapeutics CER in a Pediatric EHR Network,” number UB5MC20286 and title “Pediatric Primary Care EHR Network for CER,” and number UA6MC15585 and title “National Research Network to Improve Child Health Care.” Funding was also provided by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) under the Best Pharmaceuticals for Children Act. This information, content, and conclusions are those of the authors and should not be construed as the official position or policy of, nor should any endorsements be inferred by, HRSA, HHS, NICHD, or the US government.
Publisher Copyright:
© The Author 2017.
PY - 2017/11/1
Y1 - 2017/11/1
N2 - Objective: Large electronic health record (EHR) datasets are increasingly used to facilitate research on growth, but measurement and recording errors can lead to biased results. We developed and tested an automated method for identifying implausible values in pediatric EHR growth data. Materials and Methods: Using deidentified data from 46 primary care sites, we developed an algorithm to identify weight and height values that should be excluded from analysis, including implausible values and values that were recorded repeatedly without remeasurement. The foundation of the algorithm is a comparison of each measurement, expressed as a standard deviation score, with a weighted moving average of a child's other measurements. We evaluated the performance of the algorithm by (1) comparing its results with the judgment of physician reviewers for a stratified random selection of 400 measurements and (2) evaluating its accuracy in a dataset with simulated errors. Results: Of 2 000 595 growth measurements from 280 610 patients 1 to 21 years old, 3.8% of weight and 4.5% of height values were identified as implausible or excluded for other reasons. The proportion excluded varied widely by primary care site. The automated method had a sensitivity of 97% (95% confidence interval [CI], 94-99%) and a specificity of 90% (95% CI, 85-94%) for identifying implausible values compared to physician judgment, and identified 95% (weight) and 98% (height) of simulated errors. Discussion and Conclusion: This automated, flexible, and validated method for preparing large datasets will facilitate the use of pediatric EHR growth datasets for research.
AB - Objective: Large electronic health record (EHR) datasets are increasingly used to facilitate research on growth, but measurement and recording errors can lead to biased results. We developed and tested an automated method for identifying implausible values in pediatric EHR growth data. Materials and Methods: Using deidentified data from 46 primary care sites, we developed an algorithm to identify weight and height values that should be excluded from analysis, including implausible values and values that were recorded repeatedly without remeasurement. The foundation of the algorithm is a comparison of each measurement, expressed as a standard deviation score, with a weighted moving average of a child's other measurements. We evaluated the performance of the algorithm by (1) comparing its results with the judgment of physician reviewers for a stratified random selection of 400 measurements and (2) evaluating its accuracy in a dataset with simulated errors. Results: Of 2 000 595 growth measurements from 280 610 patients 1 to 21 years old, 3.8% of weight and 4.5% of height values were identified as implausible or excluded for other reasons. The proportion excluded varied widely by primary care site. The automated method had a sensitivity of 97% (95% confidence interval [CI], 94-99%) and a specificity of 90% (95% CI, 85-94%) for identifying implausible values compared to physician judgment, and identified 95% (weight) and 98% (height) of simulated errors. Discussion and Conclusion: This automated, flexible, and validated method for preparing large datasets will facilitate the use of pediatric EHR growth datasets for research.
UR - http://www.scopus.com/inward/record.url?scp=85032943891&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85032943891&partnerID=8YFLogxK
U2 - 10.1093/jamia/ocx037
DO - 10.1093/jamia/ocx037
M3 - Article
C2 - 28453637
AN - SCOPUS:85032943891
SN - 1067-5027
VL - 24
SP - 1080
EP - 1087
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 6
ER -