Background and purpose Manual annotation and categorization of non-standardized text (“free-text”) of drug orders entered into electronic health records is a labor-intensive task. However, standardization is required for drug order analyses and has implications for clinical decision support. Machine learning could help to speed up manual labelling efforts. The objective of this study was to analyze the performance of deep machine learning methods to annotate non-standardized text of drug order entries with their therapeutically active ingredients.
Materials and methods The data consisted of drug orders entered between August 2009 and April 2014 into the electronic health records of inpatients at a large tertiary care academic medical center. We manually annotated the most frequent order entry patterns with the active ingredient they contain (e.g. “Prograf”⟵“Tacrolimus”). To augment the training dataset, we heuristically included additional orders by means of character sequence comparisons. Finally, we trained character-level recurrent deep neural networks and used them to classify non-standardized text of drug order entries according to their active ingredients.
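The augmentation step can be illustrated with a minimal sketch. The abstract does not state which character sequence comparison was used, so the similarity measure (`difflib.SequenceMatcher.ratio`), the threshold of 0.8, and the function name `propagate_labels` are assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def propagate_labels(labeled, unlabeled, threshold=0.8):
    """Assign each unlabeled order text the ingredient of its most similar
    labeled pattern, provided the similarity reaches the threshold.

    labeled:   dict mapping order pattern -> active ingredient
    unlabeled: iterable of order texts without annotation
    threshold: assumed cut-off; not taken from the study
    """
    augmented = {}
    for text in unlabeled:
        best_label, best_score = None, 0.0
        for pattern, ingredient in labeled.items():
            # Case-insensitive character-level similarity in [0, 1].
            score = SequenceMatcher(None, text.lower(), pattern.lower()).ratio()
            if score > best_score:
                best_label, best_score = ingredient, score
        if best_label is not None and best_score >= threshold:
            augmented[text] = best_label
    return augmented


# Hypothetical example data, not drawn from the study.
labeled = {"Prograf": "Tacrolimus", "Aspirin": "Acetylsalicylic acid"}
unlabeled = ["Prograff", "NaCl 0.9%"]
augmented = propagate_labels(labeled, unlabeled)
```

A stricter threshold adds fewer but cleaner training examples; labels propagated this way would still need manual review before being trusted.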
Results A total of 26,611 distinct order patterns were considered in our study, of which the most frequent 7.6% (2,028) had been annotated with one of 558 distinct ingredients, leaving 24,583 unlabeled observations. Character-level recurrent deep neural networks achieved a Mean Reciprocal Rank (MRR) of 98% and outperformed the best representative baseline, a trigram-based Support Vector Machine, by 2 percentage points.
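For readers unfamiliar with the metric, the Mean Reciprocal Rank averages, over all test entries, the reciprocal of the position at which the correct ingredient appears in the classifier's ranked predictions. The following sketch shows the standard definition; the function name and the example data are hypothetical, and entries whose correct ingredient is missing from the ranking are assumed to contribute zero.

```python
def mean_reciprocal_rank(ranked_predictions, true_labels):
    """Compute MRR for ranked candidate lists.

    ranked_predictions: list of lists, each ordered from most to least likely
    true_labels:        list of correct labels, aligned with the rankings
    """
    if not true_labels:
        raise ValueError("true_labels must not be empty")
    total = 0.0
    for ranking, truth in zip(ranked_predictions, true_labels):
        if truth in ranking:
            # Ranks are 1-based: a top-ranked hit contributes 1.0.
            total += 1.0 / (ranking.index(truth) + 1)
        # A missing label contributes 0 (assumed convention).
    return total / len(true_labels)


# Hypothetical rankings: correct at rank 1, at rank 2, and absent.
rankings = [
    ["Tacrolimus", "Sirolimus"],
    ["Ibuprofen", "Paracetamol"],
    ["Heparin", "Warfarin"],
]
truths = ["Tacrolimus", "Paracetamol", "Insulin"]
mrr = mean_reciprocal_rank(rankings, truths)
```

An MRR close to 1 means the correct ingredient is almost always the top-ranked prediction.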
Conclusion Character-level recurrent deep neural networks can be used to map non-standardized text of drug order entries to their active ingredients, outperforming other representative techniques. While machine learning might help to facilitate categorization tasks, a considerable amount of manual labelling and reviewing work is still required to train such systems.