Header

UZH-Logo

Maintenance Infos

Data-driven information extraction and enrichment of molecular profiling data for cancer cell lines


Smith, Ellery; Paloots, Rahel; Giagkos, Dimitris; Baudis, Michael; Stockinger, Kurt (2024). Data-driven information extraction and enrichment of molecular profiling data for cancer cell lines. Bioinformatics advances, 4(1):vbae045.

Abstract

Motivation
With the proliferation of research means and computational methodologies, published biomedical literature is growing exponentially in numbers and volume. Cancer cell lines are frequently used models in biological and medical research that are currently applied for a wide range of purposes, from studies of cellular mechanisms to drug development, which has led to a wealth of related data and publications. Sifting through large quantities of text to gather relevant information on cell lines of interest is tedious and extremely slow when performed by humans. Hence, novel computational information extraction and correlation mechanisms are required to boost meaningful knowledge extraction.

Results
In this work, we present the design, implementation, and application of a novel data extraction and exploration system. This system extracts deep semantic relations between textual entities from scientific literature to enrich existing structured clinical data concerning cancer cell lines. We introduce a new public data exploration portal, which enables automatic linking of genomic copy number variants plots with ranked, related entities such as affected genes. Each relation is accompanied by literature-derived evidences, allowing for deep, yet rapid, literature search, using existing structured data as a springboard.

Availability and implementation
Our system is publicly available on the web at https://cancercelllines.org.

Abstract

Motivation
With the proliferation of research means and computational methodologies, published biomedical literature is growing exponentially in numbers and volume. Cancer cell lines are frequently used models in biological and medical research that are currently applied for a wide range of purposes, from studies of cellular mechanisms to drug development, which has led to a wealth of related data and publications. Sifting through large quantities of text to gather relevant information on cell lines of interest is tedious and extremely slow when performed by humans. Hence, novel computational information extraction and correlation mechanisms are required to boost meaningful knowledge extraction.

Results
In this work, we present the design, implementation, and application of a novel data extraction and exploration system. This system extracts deep semantic relations between textual entities from scientific literature to enrich existing structured clinical data concerning cancer cell lines. We introduce a new public data exploration portal, which enables automatic linking of genomic copy number variants plots with ranked, related entities such as affected genes. Each relation is accompanied by literature-derived evidences, allowing for deep, yet rapid, literature search, using existing structured data as a springboard.

Availability and implementation
Our system is publicly available on the web at https://cancercelllines.org.

Statistics

Citations

Dimensions.ai Metrics

Altmetrics

Downloads

1 download since deposited on 25 Apr 2024
1 download since 12 months
Detailed statistics

Additional indexing

Item Type:Journal Article, refereed, original work
Communities & Collections:07 Faculty of Science > Institute of Molecular Life Sciences
Dewey Decimal Classification:570 Life sciences; biology
Scopus Subject Areas:Life Sciences > Structural Biology
Life Sciences > Molecular Biology
Life Sciences > Genetics
Physical Sciences > Computer Science Applications
Uncontrolled Keywords:Computer Science Applications, Genetics, Molecular Biology, Structural Biology
Language:English
Date:16 March 2024
Deposited On:25 Apr 2024 07:26
Last Modified:30 Jun 2024 01:41
Publisher:Oxford University Press
ISSN:2635-0041
OA Status:Gold
Free access at:Publisher DOI. An embargo period may apply.
Publisher DOI:https://doi.org/10.1093/bioadv/vbae045
PubMed ID:38560553
Project Information:
  • : FunderH2020
  • : Grant ID863410
  • : Project TitleINODE - INODE - Intelligent Open Data Exploration
  • Content: Published Version
  • Language: English
  • Licence: Creative Commons: Attribution 4.0 International (CC BY 4.0)