Challenges and best practices for digital unstructured data enrichment in health research: A systematic narrative review


Sedlakova, Jana; Daniore, Paola; Horn Wintsch, Andrea; Wolf, Markus; Stanikić, Mina; Haag, Christina; Sieber, Chloé; Schneider, Gerold; Staub, Kaspar; Alois Ettlin, Dominik; Grübner, Oliver; Rinaldi, Fabio; von Wyl, Viktor (2023). Challenges and best practices for digital unstructured data enrichment in health research: A systematic narrative review. PLOS Digital Health, 2(10):e0000347.

Abstract

Digital data play an increasingly important role in advancing health research and care. However, most digital data in healthcare are in an unstructured and often not readily accessible format for research. Unstructured data are often found in a format that lacks standardization and needs significant preprocessing and feature extraction efforts. This poses challenges when combining such data with other data sources to enhance the existing knowledge base, which we refer to as digital unstructured data enrichment. Overcoming these methodological challenges requires significant resources and may limit the ability to fully leverage the potential of such data for advancing health research and, ultimately, prevention and patient care delivery. While prevalent challenges associated with unstructured data use in health research are widely reported across the literature, a comprehensive interdisciplinary summary of such challenges and possible solutions to facilitate their use in combination with structured data sources is missing. In this study, we report findings from a systematic narrative review on the seven most prevalent challenge areas connected with digital unstructured data enrichment in the fields of cardiology, neurology and mental health, along with possible solutions to address these challenges. Based on these findings, we developed a checklist that follows the standard data flow in health research studies. This checklist aims to provide initial systematic guidance to inform early planning and feasibility assessments for health research studies aiming to combine unstructured data with existing data sources. Overall, the generality of reported unstructured data enrichment methods in the studies included in this review calls for more systematic reporting of such methods to achieve greater reproducibility in future studies.

Statistics

Downloads

6 downloads since deposited on 03 Nov 2023
6 downloads in the past 12 months

Additional indexing

Contributors: University of Zurich Digital Society Initiative (UZH-DSI) Health Community
Item Type: Journal Article, not_refereed, original work
Communities & Collections: 04 Faculty of Medicine > Epidemiology, Biostatistics and Prevention Institute (EBPI)
07 Faculty of Science > Institute of Geography
04 Faculty of Medicine > Institute of Evolutionary Medicine
04 Faculty of Medicine > Institute of Implementation Science in Health Care
06 Faculty of Arts > Zurich Center for Linguistics
06 Faculty of Arts > Linguistic Research Infrastructure (LiRI)
04 Faculty of Medicine > Institute of Biomedical Ethics and History of Medicine
08 Research Priority Programs > Digital Society Initiative
Dewey Decimal Classification: 610 Medicine & health
910 Geography & travel
Language: English
Date: 11 October 2023
Deposited On: 03 Nov 2023 11:22
Last Modified: 25 Mar 2024 07:42
Publisher: Public Library of Science (PLoS)
ISSN: 2767-3170
Additional Information: Previously published as a working paper on medRxiv, No. 22278137: https://doi.org/10.1101/2022.07.28.22278137
OA Status: Gold
Free access at: Publisher DOI. An embargo period may apply.
Publisher DOI: https://doi.org/10.1371/journal.pdig.0000347
Related URLs: https://www.zora.uzh.ch/id/eprint/219781/
https://doi.org/10.1101/2022.07.28.22278137
PubMed ID: 37819910
Project Information:
  • Funder: Digital Society Initiative, University of Zurich, Switzerland
  • Grant ID:
  • Project Title:
  • Content: Published Version
  • Language: English
  • Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)