Turning a Research Challenge into a Computational Solution

Cancer researchers aren’t short on data, but they are often overwhelmed by it. The Cancer Genome Atlas (TCGA) contains more than 2.5 petabytes of genomic, clinical and imaging data across 33 cancer types. For those in the medical and research communities, especially those without formal computational training, making practical use of that data remains a significant challenge.

As a result, a valuable resource remains underutilized.

Dr. Jane Shen-Gunther, a gynecologic oncologist and researcher with expertise in computational genomics, encountered these limitations firsthand. She recognized the pressing need for a more accessible, unified way to retrieve and analyze genomic and imaging data from multiple sources.

“The TCGA, The Cancer Imaging Archive (TCIA) and the Genomic Data Commons (GDC) are essentially three goldmines of cancer data,” she said. “However, the data have been underutilized by researchers due to data access barriers. I wanted to break down this barrier.”

Shen-Gunther partnered with Wolfram Consulting Group to design the TCGADataTool, a Wolfram Language–based interface that simplifies access to TCGA cancer datasets.


Designing a Research-Ready Tool with Wolfram Consulting

Drawing on prior experience with the consulting team, Shen-Gunther worked with Wolfram developers to design a custom paclet to streamline data access and clinical research workflows while remaining usable for individuals without extensive programming experience.

Shen-Gunther explained, “Mathematica can handle almost all file types, so this was vital for the success of the project. It also has advanced machine learning functions (predictive modeling), statistical functions that can analyze, visualize, animate and model the data.”

TCGA user interface

Interface and Data Retrieval — The guided interface (TCGADataToolUserInterface) allows users to select datasets, review available properties and launch key functions without writing code. Built-in routines support batch retrieval of genomic data from the GDC and imaging data from TCIA, including scans and histological slides associated with TCGA studies.

Data Preparation and Modeling — Processing functions like cleanRawData and pullDataSlice prepare structured inputs for analysis by standardizing formats and isolating relevant variables. Modeling tools enable users to identify potential predictors, visualize candidate features, generate design matrices and build models entirely within the Wolfram environment.
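
To make the idea of a design matrix concrete, here is a minimal Python sketch of turning cleaned clinical records into model-ready numeric rows. It is not the paclet's implementation and does not reproduce cleanRawData or pullDataSlice, whose signatures are not described here; the function name design_matrix and the field names age and stage are hypothetical, chosen only for illustration.

```python
def design_matrix(records, numeric, categorical):
    """Build (column_names, rows) from a list of dict records.

    Numeric fields are copied as floats. Each categorical field is
    one-hot encoded, with its first (sorted) level dropped as the
    reference category. An intercept column of 1.0 is prepended.
    """
    # Sorted distinct levels for every categorical variable.
    levels = {c: sorted({r[c] for r in records}) for c in categorical}

    columns = ["intercept"] + list(numeric)
    for c in categorical:
        columns += [f"{c}={lvl}" for lvl in levels[c][1:]]

    rows = []
    for r in records:
        row = [1.0] + [float(r[n]) for n in numeric]
        for c in categorical:
            row += [1.0 if r[c] == lvl else 0.0 for lvl in levels[c][1:]]
        rows.append(row)
    return columns, rows


# Hypothetical records (made-up values, not TCGA data).
records = [
    {"age": 54, "stage": "I"},
    {"age": 61, "stage": "III"},
    {"age": 47, "stage": "II"},
]
columns, rows = design_matrix(records, numeric=["age"], categorical=["stage"])
```

Dropping one level per categorical variable keeps the indicator columns from being collinear with the intercept, which is a common convention for regression-style models.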

TCGA data visualization tools

Visualization Tools — The paclet includes support for swimmer plots, overall survival plots and progression-free survival plots, helping researchers visualize clinical outcomes and stratify cases by disease progression or treatment response.
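
Overall survival plots are commonly drawn from a Kaplan-Meier estimate, although the method the paclet uses is not stated here. The Python sketch below shows how such a curve can be computed from follow-up times and censoring flags. The function name kaplan_meier and the sample values are assumptions made for illustration only.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each patient
    events:    1 if the event (e.g. death) was observed, 0 if censored
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    deaths = Counter(t for t, e in zip(durations, events) if e)
    exits = Counter(durations)  # everyone leaving the risk set at time t

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk patients surviving time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Events and censored cases at time t both leave the risk set.
        at_risk -= exits[t]
    return curve


# Hypothetical follow-up data in months (made-up values, not TCGA data).
durations = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(durations, events)
```

Censored patients lower the number at risk without producing a step in the curve, which is why the estimate drops only at times 5, 8, 12 and 20 in this example.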

By combining domain-specific requirements with Wolfram Language’s built-in support for data processing and analysis, the paclet lets researchers without programming experience work directly with TCGA genomic and imaging datasets, using computational methods that would otherwise require advanced technical skills.


Moving from Complexity to Capability

The TCGADataTool was developed in response to specific challenges Shen-Gunther encountered while working with genomic and imaging data. She emphasized the importance of designing a tool that could support clinical research workflows without requiring a background in programming, and she saw the result as both accessible and technically robust.

In meeting that goal, Wolfram Consulting Group delivered a solution tailored to the needs of domain experts, extending the reach of high-throughput data into contexts where traditional software workflows often fall short.

“The Wolfram team carefully and thoughtfully developed technical solutions at every step and created a beautiful, easily accessible product,” said Shen-Gunther. “Their expertise in data science and user-interface development was essential to the success of the paclet.”

Rather than offering a generalized platform, Wolfram Consulting delivered a focused, researcher-specific application that addresses both technical complexity and day-to-day usability. That model—identifying key obstacles, understanding the research workflow and delivering a targeted solution—can be extended to other biomedical contexts where access to large datasets is essential but often limited by tool complexity.

If your team is facing similar data access or analysis challenges, contact Wolfram Consulting Group for a solution tailored to your research environment.