Department of Machine Learning, Moffitt Cancer Center & Research Institute,
Electrical Engineering Department, University of South Florida.
Department of Machine Learning, Moffitt Cancer Center & Research Institute,
Electrical Engineering Department, University of South Florida.
Departments of Machine Learning and Neuro-Oncology, Moffitt Cancer Center & Research Institute,
Electrical Engineering Department and Morsani College of Medicine, University of South Florida
(coming soon)
Cancer datasets contain multiple data modalities recorded at different scales and resolutions in space and time. The goal of this project is to predict cancer recurrence, survival, and distant metastasis for lung, gastrointestinal (GI), and head and neck (HNSCC) cancers using radiological images, pathology slides, molecular data, and electronic health records (EHRs). BRIGHT (short for Bayesian Hierarchical Graph Neural Networks) models can learn from multimodal, multi-scale, and heterogeneous datasets to predict clinical outcomes in cancer care settings.
BRIGHT framework for multimodal learning on cancer data. The project involves four learning components (hierarchical, Bayesian, hybrid, and graph structure learning), undertaken in six sequential steps or tasks (pre-training, structure learning, task-specific supervised learning, uncertainty propagation, fine-tuning, and longitudinal retrospective analysis).
BRIGHT models address these multimodal learning challenges by combining graph neural networks (GNNs) with other deep learning models in a hierarchical, Bayesian approach, as shown in Fig. 6. The project has four architectural components that will be undertaken across six sequential tasks. The hierarchical component integrates multi-scale data; the Bayesian framework places probability distributions over learnable parameters to improve model robustness and explainability; graph learning handles data heterogeneity, missing samples, and missing modalities; and the hybrid learning component uses a combined self-supervised and supervised training scheme to extract and learn contextualized features from the input data. These components will be implemented as a framework consisting of the six sequential steps or tasks of the project, given below.
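To illustrate the Bayesian component described above, the sketch below places a Gaussian distribution over each weight of a linear prediction head and uses Monte Carlo sampling to obtain both a mean prediction and a per-patient uncertainty estimate. This is a minimal NumPy sketch with toy dimensions, not the project's implementation; the class name and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

class BayesianLinear:
    """Linear layer with a Gaussian distribution over each weight.

    Predictions are made by repeatedly sampling weights (Monte Carlo),
    which yields a predictive mean and a per-patient uncertainty.
    """

    def __init__(self, in_dim, out_dim):
        # Variational parameters: mean and log-std of each weight (toy init).
        self.w_mu = rng.normal(0.0, 0.1, size=(in_dim, out_dim))
        self.w_log_sigma = np.full((in_dim, out_dim), -2.0)

    def predict(self, x, n_samples=100):
        sigma = np.exp(self.w_log_sigma)
        outs = []
        for _ in range(n_samples):
            # Draw one weight sample from N(mu, sigma^2) and apply it.
            w = self.w_mu + sigma * rng.normal(size=self.w_mu.shape)
            outs.append(x @ w)
        outs = np.stack(outs)  # shape: (n_samples, n_patients, out_dim)
        # Predictive mean and spread across the sampled weight settings.
        return outs.mean(axis=0), outs.std(axis=0)

# Toy fused embeddings for 4 patients, d = 32.
x = rng.normal(size=(4, 32))
layer = BayesianLinear(32, 1)
mean, std = layer.predict(x)
print(mean.shape, std.shape)  # (4, 1) (4, 1)
```

The spread of the sampled predictions (`std`) is what makes the model's confidence explicit, which is the robustness/explainability benefit the Bayesian framework targets.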
Modality-Specific Model Selection:
We investigated various modality-specific models for our data modalities: radiology, pathology, -omics, and EHR. Our criteria for evaluating the representativeness or optimality of these modality-specific models include their predictive performance against ground-truth overall survival (OS) and progression-free survival (PFS) data. Based on this preliminary analysis, we selected (1) two variants of Robust and Efficient MEDical Imaging with Self-supervision (REMEDIS) models for the radiology and histopathology images, (2) GatorTron for the EHR data (including clinical notes, lab tests, and radiology/pathology/surgery reports), and (3) Self-Normalizing Networks (SNNs) for -omics data. Our experiments with pre-trained versions of these models showed that REMEDIS and GatorTron extract strong embeddings for squamous cell carcinoma without any fine-tuning or transfer learning. SNNs, on the other hand, always required fine-tuning, likely because of the complexity and variability of the -omics datasets.
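The embedding-extraction step can be sketched as follows. The encoder call here is a stand-in (the real pipeline uses pretrained REMEDIS, GatorTron, and fine-tuned SNN checkpoints, which are not reproduced here); the sketch only illustrates mapping each modality to a fixed-size embedding and recording which modalities are present for each patient, so that later stages can handle missing modalities.

```python
import numpy as np

rng = np.random.default_rng(1)
EMBED_DIM = 32  # shared embedding size across modalities

def encode_stub(raw):
    """Placeholder for a pretrained encoder (REMEDIS / GatorTron / SNN)."""
    return rng.normal(size=EMBED_DIM)

MODALITIES = ["radiology", "pathology", "ehr", "omics"]

def embed_patient(record):
    """Return per-modality embeddings and a presence mask.

    A missing modality gets a zero vector and mask = 0, so downstream
    models (e.g. the graph-learning component) can ignore it.
    """
    embeddings, mask = {}, {}
    for m in MODALITIES:
        if record.get(m) is not None:
            embeddings[m] = encode_stub(record[m])
            mask[m] = 1
        else:
            embeddings[m] = np.zeros(EMBED_DIM)
            mask[m] = 0
    return embeddings, mask

# Example patient with no radiology scan available.
emb, mask = embed_patient({"pathology": "slide.svs",
                           "ehr": "clinical note text",
                           "omics": [0.1, 0.2]})
print(mask)  # {'radiology': 0, 'pathology': 1, 'ehr': 1, 'omics': 1}
```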
Prediction of OS in Lung Cancer Patients:
Our preliminary experiments focused on predicting OS for lung squamous cell carcinoma patients using a BRIGHT model and a set of multimodal and uni-modal MLPs. The data were collected at Moffitt (103 patients) and consisted of EHR data (including age at diagnosis, gender, ethnicity, race, smoking status, year of diagnosis, vital status (alive/dead), and tumor cellularity), pathology images, and -omics data (RNA-Seq expression and protein expression). We generated pathology embeddings with REMEDIS and EHR embeddings with GatorTron, without fine-tuning either model. For the -omics data, we trained SNNs to generate RNA-Seq expression and protein expression embeddings. We used the same embedding size, d = 32, for all modalities. Models were evaluated with the concordance index (C-index) under 10-fold cross-validation.
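For reference, the C-index used for evaluation can be computed as below: among all comparable patient pairs (where the patient with the shorter observed time actually experienced the event), it is the fraction whose predicted risks are correctly ordered, with ties counted as one half. This is a generic, minimal implementation of Harrell's C-index, not the project's evaluation code.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times:  observed times (time of event, or of censoring)
    events: 1 if the event (e.g. death) was observed, 0 if censored
    risks:  predicted risk scores (higher = worse prognosis)
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # The pair (i, j) is comparable only if i's event occurred
            # before j's observed time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # correctly ordered
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tie in predicted risk
    return concordant / comparable

# Toy example: shorter survival receives higher predicted risk.
c = concordance_index(times=[5, 10, 15], events=[1, 1, 1],
                      risks=[0.9, 0.5, 0.1])
print(c)  # 1.0
```

A C-index of 1.0 means the model ranks every comparable pair correctly, while 0.5 corresponds to random ordering; cross-validated C-indices are averaged over the 10 folds.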
Integrating data at the highest available resolution across time points can reveal insights about the disease that remain indiscernible when each modality is considered in isolation. The BRIGHT project aims to bring together the entire spectrum of the disease and to understand each patient's genetic, physiological, and psychosocial circumstances in a unified framework. Our framework will support patients through cancer prevention, early detection, and treatment via informed clinical trials and oncology practice in personalized settings.
@misc{bright-website,
  title  = {{Bayesian Hierarchical Graph Neural Networks}},
  author = {Waqas, Asim and Tripathi, Aakash and Rasool, Ghulam},
  year   = {2023},
  note   = {Available at: \url{https://lab.moffitt.org/rasool/bright/}}
}