St. Jude Children's Research Hospital scientists have built a cloud-based, data-sharing ecosystem that is a model for how to accelerate genomic data-sharing to advance treatment and improve long-term outcomes for pediatric cancers. The platform, St. Jude Cloud, is detailed today in the journal Cancer Discovery.
St. Jude Cloud includes more than 12,000 whole genomes plus whole exome and whole transcriptome data from more than 10,000 childhood cancer patients and long-term survivors. The platform also includes genomic data from more than 800 young people with sickle cell disease. Additional genomic data from the St. Jude clinical genomics program is added regularly rather than withheld until publication.
"St. Jude Cloud is a treasure trove of data for the global scientific community, and its data-sharing ecosystem removes barriers to discovery by researchers in that community," said Jinghui Zhang, Ph.D., chair of the St. Jude Department of Computational Biology.
Along with 1.25 petabytes of harmonized raw and published genomic data, St. Jude Cloud features three interconnected applications that let users seamlessly explore, analyze and visualize data. The platform launched in 2018 to accelerate data-sharing and research on pediatric cancer, which remains the leading cause of death by disease among U.S. children.
St. Jude scientists built St. Jude Cloud in partnership with DNAnexus and Microsoft. Zhang; James R. Downing, M.D., St. Jude president and CEO; and Keith Perry, St. Jude chief information officer, are corresponding authors of the research.
The publication comes as data-sharing models shift from centralized data centers to a more decentralized network serving specific research communities. "We feel our experience building St. Jude Cloud, including the challenges involved, is beneficial in guiding the discussions going forward," Zhang said.
St. Jude Cloud: Growth
St. Jude Cloud has grown in size, scope and function since it debuted with 5,000 whole genomes and whole exomes plus 1,200 RNA-seq datasets.
"Data sharing is especially important for pediatric cancer, where more than half of patients have rare tumors driven by distinct mutations," Zhang said. "These tumors are understudied because it is difficult to accumulate the necessary patient tumor samples." Data-sharing platforms offer a promising solution.
The platform now attracts about 10,000 unique monthly users worldwide. Visitors can browse and access raw St. Jude pediatric cancer data and explore published data from St. Jude and institutions in the U.S., Europe and China. Investigators can also upload their own data to analyze and explore alongside St. Jude Cloud data sets.
"The goal is to remove barriers and enable researchers with little to no formal computational training to perform sophisticated genomic analysis," said Clay McLeod of St. Jude Computational Biology. He and Alexander Gout, Ph.D., of Computational Biology, are the first authors.
St. Jude Cloud: Design
Three applications provide the infrastructure.
· Genomics Platform gives registered users access to whole genome sequencing data from St. Jude research studies and clinical genomics programs. Data are harmonized, a process that integrates data from different sources, so researchers can perform innovative analysis in the cloud without downloading the data. The platform also includes data analysis pipelines and validated tools to classify cancer risk associated with germline DNA variations, to predict gene-fusion mutations, to detect variations in non-coding genomic regions and more.
· Pediatric Cancer Knowledgebase (PeCan) provides researchers with open access to annotated and curated published research. The information includes mutations identified at cancer diagnosis and relapse, cancer-causing germline mutations and gene expression. Data can be explored using ProteinPaint, an interactive visualization tool created at St. Jude. ProteinPaint allows users to rapidly navigate through the genome and identify genetic changes linked to cancer. Another resource, PeCanPIE, helps researchers sift through millions of genetic variants to identify those involved in inherited cancers. The app integrates published data from the St. Jude-Washington University Pediatric Cancer Genome Project, the National Cancer Institute's Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative, the German Cancer Research Center, Shanghai Children's Medical Center and the University of Texas Southwestern Medical Center.
· Visualization Community helps researchers integrate and visualize a variety of genomic data to better understand the molecular mechanisms that cause and drive cancer. Users can explore published pediatric cancer landscape maps that integrate genomic and epigenetic data. Users can also explore clinical information for insight into cancer subtypes or conduct pan-cancer analysis. Also available are ProteinPaint and GenomePaint, tools researchers can use to visualize gene expression, genomic variants and tumor samples across the genome or just in protein-coding regions. Visualizations of additional datasets are also available.
To demonstrate the power of the St. Jude Cloud ecosystem, the developers used the platform to classify 135 childhood cancer subtypes based on gene expression. St. Jude Cloud lets researchers use the platform's data and gene expression maps to classify and study tumors as well as upload data from their own samples for analysis.
Zhang and her colleagues have also used the platform to study mutation rates and mutational signatures (patterns) in 35 pediatric cancer subtypes. The mutational signatures included ones associated with ultraviolet radiation and cancer therapy.
The other authors are Xin Zhou, Andrew Thrasher, Delaram Rahbarinia, Samuel Brady, Michael Macias, Kirby Birch, David Finkelstein, Jobin Sunny, Rahul Mudunuri, Brent Orr, Madison Treadway, Arthur Chiao, Andrew Swistak, Stephanie Wiggins, Scott Foy, Jian Wang, Edgar Sioson, Shuoguo Wang, J. Robert Michael, Yu Liu, Xiaotu Ma, Aman Patel, Michael Edmonson, Mark Wilkinson, Andrew Frantz, Ti-Cheng Chang, Liqing Tian, Shaohua Lei, Irina McGuire, Nedra Robison, Darrell Gentry, Xing Tang, Lance Palmer, Gang Wu, Ed Suh, Leigh Tanner, James McMurry, Matthew Lear, Alberto Pappo, Zhaoming Wang, Carmen Wilson, Yong Cheng, Mitch Weiss, Gregory Armstrong, Leslie Robison, Yutaka Yasui, Kim Nichols, David Ellison, Charles Mullighan, Suzanne Baker, Michael Dyer, Scott Newman and Michael Rusch, of St. Jude; Bob Davidson, Tracy Ard, Chaitanya Bangur and Geralyn Miller, of Microsoft Research; S. M. Ashiqul Islam and Ludmil Alexandrov, of Moores Cancer Center, University of California San Diego; Christopher Meyer, Naina Thangaraj, Pamella Tater, Vijay Kandali, Singer Ma, Tuan Nguyen, Omar Serang and Richard Daly, of DNAnexus; and Soheil Meshinchi, of Fred Hutchinson Cancer Research Center, University of Washington.
The research was funded as a St. Jude Blue Sky Initiative, an institutional program that supports transformational projects, and by the Microsoft AI for Good program and ALSAC, the St. Jude fundraising and awareness organization.