The complexity of cancer care makes extracting information for retrospective research difficult. As patients progress from diagnosis to treatment and subsequent monitoring, encounters with multiple specialists generate a rich set of clinical notes. For patients undergoing lengthy or multimodal treatment (e.g., a combination of surgery, chemotherapy, and radiotherapy), hundreds or even thousands of notes can accumulate over the course of the cancer journey. Reviewing these notes is laborious interpretive work, often requiring many hours from medical professionals who must read through collections of notes and prepare summarized abstractions in spreadsheets or databases. The process is also brittle: reviews conducted for one study may miss items of interest to later studies. Although ad hoc devices for collecting this information, such as the oncologic history, have emerged informally, they are not universally used, and they are not guaranteed to be accurate or complete.
The Cancer Deep Phenotype Extraction (DeepPhe) project is developing informatics solutions to overcome these inefficiencies. Unlike prior work that applies Natural Language Processing (NLP) techniques to individual cancer documents, DeepPhe combines details from multiple documents to form longitudinal summaries. It pairs classic and state-of-the-art NLP techniques for extracting individual concepts with a rich information model, techniques for care episode classification, and Ontology-Based Summarization, which resolves coreference across documents and summarizes diagnoses, treatments, responses, and temporal relationships as needed to support retrospective research. We expect DeepPhe to be used by clinicians, or by researchers with appropriate permissions to read notes that have been de-identified by honest brokers or through other appropriate means.
The Cancer Deep Phenotype Extraction for the Cancer Registry (DeepPhe-CR) project uses NLP and Ontology-Based Summarization (OBS) to extract key cancer attributes from clinical notes, with the goal of using those attributes to make cancer registry data abstraction both more effective and more efficient. Achieving this goal requires clear architectures and interfaces that allow documents to be submitted to DeepPhe-CR and relevant results to be retrieved and incorporated into cancer registry workflows.