Urbana, Ill. Artificial intelligence (AI) tools are quickly becoming ubiquitous among cancer researchers because they can efficiently process large quantities of data. These data can include multi-omic sequencing results that capture information about the genome, the epigenome, and proteins, as well as mutations in oncogenes known to cause cancer.

Deep learning, a type of AI using neural networks, is a state-of-the-art technique suitable for large datasets. However, Jun Song, Cancer Center at Illinois (CCIL) scientist and Founder Professor of Physics, explains that there is a danger of overfitting when the dataset is small.

Overfitting occurs when a network fits the specific data it was trained on so closely that it cannot reliably predict subsequent, unseen data, because the features it learned do not generalize.

“For example, you can train a dataset to classify a specific subtype of cancer, but when you try to apply the model to a new dataset with a similar, but different patient cohort, that model might not work very well,” Song said.
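The gap Song describes can be illustrated with a minimal sketch on made-up data. Everything here is an assumption for illustration and not taken from Song's work: the single synthetic feature, the 25% label noise, and the nearest-neighbour "memorizer" standing in for an over-flexible model. The model scores perfectly on the cohort it was trained on and noticeably worse on a new cohort drawn the same way.

```python
import random

def make_cohort(n, rng, noise=0.25):
    """Synthetic 'patients' (hypothetical): one feature x in [0, 1].
    The true subtype is 1 when x > 0.5; a fraction `noise` of labels is flipped."""
    cohort = []
    for _ in range(n):
        x = rng.random()
        label = 1 if x > 0.5 else 0
        if rng.random() < noise:
            label = 1 - label  # simulate measurement or annotation error
        cohort.append((x, label))
    return cohort

def memorizer(train):
    """1-nearest-neighbour classifier: it reproduces every training label,
    including the noisy ones, which is what overfitting looks like."""
    def predict(x):
        return min(train, key=lambda point: abs(point[0] - x))[1]
    return predict

def accuracy(predict, cohort):
    return sum(predict(x) == y for x, y in cohort) / len(cohort)

rng = random.Random(0)             # fixed seed so the sketch is repeatable
train = make_cohort(30, rng)       # small training cohort
new_cohort = make_cohort(500, rng) # similar but different patients

model = memorizer(train)
train_acc = accuracy(model, train)
new_acc = accuracy(model, new_cohort)
print("accuracy on training cohort:", train_acc)
print("accuracy on new cohort:", new_acc)
```

Because the memorizer has also learned the flipped labels, its errors only show up once it is evaluated on patients it has not seen.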

To solve this problem, Song’s lab focuses on statistical modeling and on interpreting what AI algorithms are actually learning. In doing so, Song seeks to understand why certain therapeutic strategies work, and why others do not, for cancers including melanoma, glioma, and estrogen-receptor-positive breast cancer.


In particular, Song lab researchers develop methods for extracting biological features learned by AI models. In the context of cancer research, these features include genetic mutations, of which a small subset may contribute to oncogenesis and the survival and proliferation of cancer cells. This information can help classify different cancer subtypes and even predict survival probabilities.

Earlier this year, Song published a paper in Cell Reports, in collaboration with Northwestern University researchers, that identified shared mechanisms underlying the genesis of leiomyoma, a type of benign uterine tumor. Leiomyomas occur more frequently in African American women and can lead to minor and major complications, including infertility, excessive bleeding, and implantation failure.

The tools Song used in this project were tensor decomposition techniques that can extract the locations of aberrant epigenomic modifications recurring across patients. Instead of looking at a single modification, the technique integrates all the epigenomic modifications profiled in the study.
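As a rough illustration of the idea, and not the method used in the paper, the sketch below arranges synthetic data as a patients × loci × marks tensor and fits a rank-1 decomposition by alternating updates. The tensor layout, the sizes, the planted loci 2 and 5, and the function name `rank1_decompose` are all assumptions made for this example. Loci whose signal recurs across patients and across marks receive the largest loadings in the locus factor.

```python
import random

def rank1_decompose(T, iters=25):
    """Rank-1 approximation T[p][l][m] ~ a[p] * b[l] * c[m], fitted by
    alternating updates. Factors are normalized, so overall scale is dropped."""
    P, L, M = len(T), len(T[0]), len(T[0][0])
    a, b, c = [1.0] * P, [1.0] * L, [1.0] * M

    def normalize(v):
        norm = sum(x * x for x in v) ** 0.5
        return [x / norm for x in v]

    for _ in range(iters):
        # Update each factor in turn while holding the other two fixed.
        a = normalize([sum(T[p][l][m] * b[l] * c[m]
                           for l in range(L) for m in range(M))
                       for p in range(P)])
        b = normalize([sum(T[p][l][m] * a[p] * c[m]
                           for p in range(P) for m in range(M))
                       for l in range(L)])
        c = normalize([sum(T[p][l][m] * a[p] * b[l]
                           for p in range(P) for l in range(L))
                       for m in range(M)])
    return a, b, c

# Synthetic data (hypothetical): weak background everywhere, plus a strong
# signal at two loci that recurs in every patient and every mark.
rng = random.Random(0)
n_patients, n_loci, n_marks = 6, 10, 3
recurrent_loci = {2, 5}

tensor = [[[rng.uniform(0.0, 0.2) + (1.0 if l in recurrent_loci else 0.0)
            for _ in range(n_marks)]
           for l in range(n_loci)]
          for _ in range(n_patients)]

patient_f, locus_f, mark_f = rank1_decompose(tensor)

# Rank loci by their loading in the locus factor and keep the strongest.
top_loci = sorted(range(n_loci), key=lambda l: -locus_f[l])[:len(recurrent_loci)]
print("loci with the largest loadings:", sorted(top_loci))
```

Using all marks jointly means a locus stands out only when its signal is consistent across the whole tensor, which is the contrast with analyzing one modification at a time.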

Image of Jun Song

“There is no one AI model that is generally applicable to everything. When analyzing biological data, one must explore all options and find the most suitable for the task at hand,” Song said. “My students learn skills in all aspects of statistical AI. Computers do what you tell them to do — they can’t do the thinking for you. So, my students learn the foundational statistical knowledge to use the diverse tools well.”

Song’s dedication to training students at the interface of AI and cancer research extends to encouraging underrepresented students to pursue scientific research. A new program, FUTURE-MINDS-QB, bridges students from master’s programs at Fisk University to several doctoral programs at the University of Illinois Urbana-Champaign. It provides training in biomedical data science and quantitative biology, including applications of computational methods such as AI and their close interplay with quantitative cancer biology.

The program seeks to ameliorate the pronounced disparities in the quantitative sciences and to accelerate the completion of PhD degrees by underrepresented trainees in this field at the University of Illinois. Song serves as the Program Director, with CCIL program leader Stephen A. Boppart serving as principal investigator (PI) along with Fisk University PIs Lee E. Limbird and Lei Qian.

“There is a gap in training students in the applications of AI to cancer research. Given the great strength of the University of Illinois in computer science and quantitative biology, the FUTURE-MINDS-QB program looks to increase the number of underrepresented students who apply state-of-the-art data analysis techniques and other quantitative approaches to cancer research,” Song said.

– Written by the CCIL Communications Team

Jun Song is a Cancer Center at Illinois researcher in the Cancer Measurement Technology and Data Science (CMD) Program, Founder Professor of Physics, and Program Director of the Fisk-UIUC Training of Under-represented Minds in Data Science and Quantitative Biology (FUTURE-MINDS-QB) Program. He is also affiliated with the Carl R. Woese Institute for Genomic Biology. Read more about Song’s research.

The paper, “Epigenomic tensor predicts disease subtypes and reveals constrained tumor evolution,” is available online.

DOI: 10.1016/j.celrep.2021.108927