Background & Aims
Electronic health record (EHR) data have promised a revolution in understanding how clinical conditions relate to one another and to many other factors, such as social determinants of health. Central to the use of EHR data for research is the use of clinical coding systems, such as the International Classification of Diseases, 10th Revision (ICD-10), to classify conditions and states of health. These codes are used singly or in related groups to form what are termed ‘condition definition files’ (CDFs), which are of interest primarily for the role they play in creating cohorts for research. It has been proposed that researchers studying low back pain (LBP) in EHR data should use a group of multiple codes. We asked whether large-scale data provide evidence to further support this recommendation, or whether code disaggregation might be appropriate for some research questions.
Methods
U.S. Centers for Medicare & Medicaid Services (CMS) 5% standard analytical files from 2017 (2017 CMS-5) were used for this study, as previously reported. Data were extracted from the Master Beneficiary Summary and Carrier files to construct working files comprising a beneficiary identifier, demographics, and all ICD-10 codes included in one or more claims for that beneficiary during the study year. Beneficiaries who were not alive for the entire year were excluded because death is a potential competing risk. Those included were over 65 years of age and were 12-month participants in Medicare Part B but not Medicare Part C. Data were segmented into cohorts affected or not affected by specific ICD-10 coded conditions, e.g., low back pain, sciatica without LBP, etc. Diagnostic rate estimates were determined for all ICD-10 codes used in the sample; in aggregate, these formed a ‘clinical profile’ for each condition. The clinical profiles were compared using correlation and regression analysis.
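The profile construction and comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names (`has_code`, `clinical_profile`, `r_squared`), the prefix-matching rule for codes, and the toy claims data are all hypothetical, and R² is taken here as the squared Pearson correlation between two profiles over their shared codes.

```python
from collections import Counter


def has_code(codes, prefix):
    """True if any ICD-10 code in the set starts with the given prefix."""
    return any(code.startswith(prefix) for code in codes)


def clinical_profile(claims, index_prefix):
    """Diagnostic rate of every other code among beneficiaries with the index condition.

    `claims` maps a beneficiary identifier to the set of ICD-10 codes
    seen on that beneficiary's claims during the study year (assumed layout).
    """
    all_codes = {code for codes in claims.values() for code in codes}
    cohort = [codes for codes in claims.values() if has_code(codes, index_prefix)]
    if not cohort:
        raise ValueError("no beneficiaries match the index condition")
    counts = Counter(code for codes in cohort for code in codes)
    return {
        code: counts[code] / len(cohort)
        for code in all_codes
        if not code.startswith(index_prefix)
    }


def r_squared(profile_a, profile_b):
    """Squared Pearson correlation of two profiles over the codes they share."""
    codes = sorted(set(profile_a) & set(profile_b))
    xs = [profile_a[c] for c in codes]
    ys = [profile_b[c] for c in codes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a profile has zero variance; R^2 is undefined")
    return (sxy * sxy) / (sxx * syy)


# Toy data (fabricated for illustration only, not drawn from CMS files).
TOY_CLAIMS = {
    "b1": {"M54.5", "I10", "E11.9"},
    "b2": {"M54.5", "I10"},
    "b3": {"M54.5", "F32.9"},
    "b4": {"M54.31", "I10", "F32.9"},
    "b5": {"M54.32", "I10"},
    "b6": {"M54.30", "E11.9", "F32.9"},
    "b7": {"I10"},
}

lbp_profile = clinical_profile(TOY_CLAIMS, "M54.5")
sciatica_profile = clinical_profile(TOY_CLAIMS, "M54.3")
print(r_squared(lbp_profile, sciatica_profile))
```

Each profile omits its own index code, so the comparison runs only over codes present in both profiles.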
Results
A total of 1.4 million beneficiary records were included in the study. For the LBP-like conditions, i.e., sciatica, lumbago with sciatica, and the dorsalgias, approximately half of the patients were also diagnosed with LBP; overlap among the ‘LBP-like’ conditions themselves was far lower, e.g., 8-2-16%. To avoid confounding, we eliminated the beneficiaries with LBP from each of the LBP-like condition cohorts, creating, for example, a cohort with sciatica but without a diagnosis of LBP. We then compared the LBP (M54.5) clinical profiles, for males and females separately, with those of sciatica (M54.3-), lumbago with sciatica (M54.4-), other dorsalgia (M54.89), and dorsalgia, unspecified (M54.9), all with LBP excluded. This demonstrated that the clinical profiles of these LBP-like conditions, while correlated with the LBP clinical profiles, were not identical. R² values for these conditions compared with LBP ranged from 0.781 to 0.963, meaning that the share of variance left unexplained, i.e., the ‘experimental error’ attributable to code aggregation, could be as high as 22% (1 − 0.781 ≈ 0.22).
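The overlap estimate, the exclusion step, and the arithmetic behind the 22% figure can be illustrated as follows. This is a hedged sketch: the helper names (`overlap_fraction`, `cohort_without`, `unexplained_share`) and the toy claims are assumptions introduced for illustration, and only the two R² bounds (0.781 and 0.963) come from the results above.

```python
def has_code(codes, prefix):
    """True if any ICD-10 code in the set starts with the given prefix."""
    return any(code.startswith(prefix) for code in codes)


def overlap_fraction(claims, prefix, other_prefix):
    """Share of the `prefix` cohort that also carries a code under `other_prefix`."""
    cohort = [codes for codes in claims.values() if has_code(codes, prefix)]
    both = [codes for codes in cohort if has_code(codes, other_prefix)]
    return len(both) / len(cohort)


def cohort_without(claims, prefix, excluded_prefix):
    """Beneficiaries with the `prefix` condition and no `excluded_prefix` diagnosis."""
    return sorted(
        bid
        for bid, codes in claims.items()
        if has_code(codes, prefix) and not has_code(codes, excluded_prefix)
    )


def unexplained_share(r2):
    """Fraction of variance not explained by the regression, i.e. 1 - R^2."""
    return 1.0 - r2


# Toy data (fabricated for illustration only).
TOY_CLAIMS = {
    "b1": {"M54.31", "M54.5"},
    "b2": {"M54.30"},
    "b3": {"M54.5"},
    "b4": {"M54.32", "M54.5", "I10"},
    "b5": {"M54.31", "I10"},
}

print(overlap_fraction(TOY_CLAIMS, "M54.3", "M54.5"))  # sciatica cohort also coded with LBP
print(cohort_without(TOY_CLAIMS, "M54.3", "M54.5"))    # sciatica without LBP
print(round(unexplained_share(0.781), 3), round(unexplained_share(0.963), 3))
```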
Conclusions
Pain scientists using ICD-10 codes to ascertain specific cohorts should be aware that condition definition files have a major impact on cohort composition. For low back pain, depending on the nature of the research question, some studies may need single ICD-10 codes, whereas others may require code aggregation. Code aggregation would be appropriate, for example, in an investigation of condition prevalence that is intended to provide a comprehensive appraisal including common variants. Single codes are appropriate when the research question requires as little variance as possible in the study population. Our data indicate that low back pain defined using the single principal ICD-10 code differs from the related conditions in its associated comorbidities. Taking the U.S. data sample as a whole, we find that large-scale appraisals of data variability provide important perspectives on data science methods.
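To make the effect of the condition definition on cohort composition concrete, the sketch below builds a cohort twice from the same toy claims, once with a single-code definition and once with an aggregated one. The dictionary layout, the `build_cohort` helper, and the toy data are hypothetical; the aggregated list simply groups the codes compared in this study and is not a published or recommended definition.

```python
# Two illustrative condition definitions, expressed as ICD-10 code prefixes.
CDF_SINGLE = ["M54.5"]
CDF_AGGREGATED = ["M54.5", "M54.3", "M54.4", "M54.89", "M54.9"]


def build_cohort(claims, prefixes):
    """Beneficiaries with at least one code matching any prefix in the definition."""
    return sorted(
        bid
        for bid, codes in claims.items()
        if any(code.startswith(prefix) for code in codes for prefix in prefixes)
    )


# Toy data (fabricated for illustration only).
TOY_CLAIMS = {
    "b1": {"M54.5"},
    "b2": {"M54.41"},
    "b3": {"M54.9", "I10"},
    "b4": {"I10"},
    "b5": {"M54.5", "M54.31"},
}

single_cohort = build_cohort(TOY_CLAIMS, CDF_SINGLE)
aggregated_cohort = build_cohort(TOY_CLAIMS, CDF_AGGREGATED)

# Beneficiaries who enter the cohort only under the aggregated definition.
added_by_aggregation = sorted(set(aggregated_cohort) - set(single_cohort))
print(single_cohort, aggregated_cohort, added_by_aggregation)
```

In this toy example the aggregated definition admits beneficiaries who never received the principal LBP code, which is the kind of added heterogeneity a low-variance study design would want to avoid.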
References
1. Hoy DG, Smith E, Cross M, Sanchez-Riera L, Buchbinder R, Blyth FM, Brooks P, Woolf AD, Osborne RH, Fransen M, Driscoll T, Vos T, Blore JD, Murray C, Johns N, Naghavi M, Carnahan E, March LM. The global burden of musculoskeletal conditions for 2010: an overview of methods. Ann Rheum Dis. 2014 Jun;73(6):982-9. doi: 10.1136/annrheumdis-2013-204344. Epub 2014 Feb 18. PMID: 24550172.
2. Schrepf A, Phan V, Clemens JQ, Maixner W, Hanauer D, Williams DA. ICD-10 Codes for the Study of Chronic Overlapping Pain Conditions in Administrative Databases. J Pain. 2020 Jan-Feb;21(1-2):59-70. doi: 10.1016/j.jpain.2019.05.007. Epub 2019 May 30. PMID: 31154033; PMCID: PMC8177096.
3. Hogans B, Siaton B, Sorkin J. Diagnostic rate estimation from Medicare records: Dependence on claim numbers and latent clinical features. J Biomed Inform. 2023 Sep;145:104463. doi: 10.1016/j.jbi.2023.104463. Epub 2023 Jul 28. PMID: 37517509; PMCID: PMC10576984.