Urbana, Ill. In bioinformatics, machine learning (ML) tools are used to solve problems in molecular biology and genetics. In healthy cells, genes — the carriers of hereditary information — are switched “on” or “off” to carry out specific tasks. Bioinformatics researchers can decode DNA using artificial intelligence (AI) to understand why some of these switches occur inappropriately, leading to disease.

At the Cancer Center at Illinois (CCIL), one such expert is Saurabh Sinha, Illinois professor of computer science. Although his initial research in the field of computational genomics was applied to embryonic development, over time Sinha found himself drawn to opportunities and challenges in cancer research.

Sinha’s longer-term work in ML and cancer began when the National Institutes of Health (NIH) awarded a Center of Excellence to the University of Illinois, with the Mayo Clinic as partner, as a part of the broader Big Data to Knowledge (BD2K) initiative. At the time of its inception, Sinha played a key role as co-director and research lead. The BD2K center would eventually deliver its final product: KnowEnG (pronounced “knowing”), a web platform and suite of multiple analytical tools, including ML and data mining, that provides heavy-duty bioinformatics analysis on the cloud.

Now, Sinha personally focuses on developing AI tools for testing cancer cell response to treatments, but his group is also interested in biophysical modelling. Whereas basic machine learning attempts to teach a computer to accurately perform a task, biophysical modelling has the computer draw on human understanding, codified into mathematical models, to analyze data and provide a physical basis for scientific predictions.


The BD2K center additionally led to the development of InPheRNo, an AI tool that computationally recovers networks from transcriptomics data and looks for effects that differ between these networks.

“You could, for example, use InPheRNo to understand subtypes of a cancer, and the differences between them, like with the different types of breast cancer,” Sinha said. “It would also be useful for categorizing subtypes of cancer to better treat patients. Mechanistic researchers (like me) always want to know what control mechanisms make one subtype different from the other.”

Sinha is also researching colorectal cancer in an ongoing collaboration with Steven M. Offer (PhD), assistant professor of pharmacology at the Mayo Clinic. Working with cell lines at various stages of invasiveness, the scientists want to better understand why colorectal cancer cells become so aggressive and metastasize.

“We know that the switching on and off of genes changes these cells to make them more aggressive, but the question is: which genes are responsible for such changes?” Sinha said.

Offer’s lab sequences the DNA and provides multiomics data covering the different stages of aggressiveness, while Sinha’s lab supplies the computational expertise, integrating and analyzing the data to find the genes responsible.

“We integrate different views of the same thing to build a holistic picture and compare between the different states. In this case, the AI used probabilistic modelling to integrate the different data and rank the molecules by how likely they are to cause the progression of these states. Then, we took the highest ranked molecules, knocked them out, and observed whether it made a difference in the cells,” Sinha said.

Further, Sinha is working with fellow CCIL members Hee-Sun Han and Prasanth Kumar V. Kannanganattu on a CCIL seed grant-funded project that aims to create a spatially resolved interactome map of breast cancer progression. Briefly, the project relies on single-cell transcriptomics: measuring the activity level, or “on” and “off” state, of every gene in each cell of a biological sample. The research team is taking this work a step further by developing spatial transcriptomics, named 2020 Method of the Year by Nature Methods, which gives the researchers the exact spatial coordinates of each cell relative to the positions of other cells.

“I’m excited to exploit the incredible revolution happening in machine learning today and adapt it to advance the human understanding of biology. Building black box models is great, but my personal excitement is to marry these tools with the language of science,” Sinha said.


– Written by the CCIL Communications Team

Saurabh Sinha is a Founder Professor in computer science and Willett Faculty Scholar. Sinha is affiliated with the Carl R. Woese Institute for Genomic Biology and Carle Illinois College of Medicine. Click here to read more about Sinha’s research.

Find out more about CCIL seed grants.