Author: Jovana Drinjakovic

Donnelly Centre researchers have developed a deep learning algorithm that can track proteins, to help reveal what makes cells healthy and what goes wrong in disease.

Yeast cells (purple) with DNA-containing nuclei (pink) and a protein (green) that resides in the cell’s waste compartment, or vacuole.

“We can learn so much by looking at images of cells: how does the protein look under normal conditions and do they look different in cells that carry genetic mutations or when we expose cells to drugs or other chemical reagents? People have tried to manually assess what’s going on with their data but that takes a lot of time,” says Benjamin Grys, a graduate student in molecular genetics and a co-author on the study.

Dubbed DeepLoc, the algorithm can recognize the patterns that proteins make in the cell better and much faster than the human eye or previous computer vision-based approaches. In the cover story of the latest issue of Molecular Systems Biology, teams led by Professors Brenda Andrews and Charles Boone of the Donnelly Centre and the Department of Molecular Genetics also describe DeepLoc’s ability to process images from other labs, illustrating its potential for wider use.

From self-driving cars to computers that can diagnose cancer, artificial intelligence (AI) is shaping the world in ways that are hard to predict, but for cell biologists, the change could not come soon enough. Thanks to new and fully automated microscopes, scientists can collect reams of data faster than they can analyze it.

“Right now, it only takes days to weeks to acquire images of cells and months to years to analyze them. Deep learning will ultimately bring the timescale of this analysis down to the same timescale as the experiments,” says Oren Kraus, a lead co-author on the paper and a graduate student co-supervised by Andrews and Professor Brendan Frey of the Donnelly Centre and the Department of Electrical and Computer Engineering. Andrews, Boone and Frey are also Senior Fellows at the Canadian Institute for Advanced Research.

Similar to other types of AI, in which computers learn to recognize patterns in data, DeepLoc was trained to recognize the diverse shapes made in cells by glowing proteins, which are labeled with a fluorescent tag that makes them visible. But unlike conventional computer vision, which requires detailed instructions, DeepLoc learns directly from image pixel data, making it more accurate and faster.

Grys and Kraus trained DeepLoc on the teams’ previously published data, which show the area of the cell occupied by each of more than 4,000 yeast proteins, or three quarters of all proteins in yeast. This dataset remains the most complete map showing the exact position of the vast majority of proteins in any cell. When it was first released in 2015, the analysis was done with a complex computer vision and machine learning pipeline that took months to complete. DeepLoc crunched the data in a matter of hours.

DeepLoc was able to spot subtle differences between similar images. The initial analysis identified 15 different classes of proteins, each representing distinct neighbourhoods in the cell; DeepLoc identified 22 classes. It was also able to sort cells whose shape changed due to a hormone treatment, a task that the previous pipeline couldn’t complete.