22 March 2022

Deep learning guided protein design: The guide of the guide


A new ISBUC collaboration project aims to develop a machine learning platform that can predict protein structure, function and dynamics.

Image by Mike Mackenzie

A problem with specificity

Machine learning can do a lot. Feed AlphaFold 2 a protein sequence and it will predict the protein's structure with a remarkable level of accuracy. To a limited extent, that structure even tells you what the protein's function is. 'AlphaFold 2 is extremely good at extracting the structure from a protein sequence. Based on that, we can say this is a CRISPR, that is a lipase,' explains Associate Professor Nikos Hatzakis from the Department of Chemistry. In other words, we can assign a generic function. 'But AlphaFold will give very little information on how it functions: its specificity, its turnover rate.' For modelling purposes, this leaves much to be desired, argues Wouter Boomsma, Associate Professor in the Department of Computer Science. To be truly useful, a model needs to capture how a variation in a protein's sequence will affect its dynamics and function, and no current model can do this. As a result, today's protein engineers often find themselves designing in the dark.

For example, lipases are a common ingredient in many detergents, where they break down fat molecules. It is clearly advantageous for these lipases to be thermostable, so that they keep working at high temperatures without falling apart. Traditionally, to increase the thermostability of a lipase, an engineer would start playing around with the enzyme's DNA sequence, making mutation after mutation until they eventually hit on one with the desired effect. It is not completely blind: 'they have some idea of where to start playing around with the sequence, but it is design by brute force and a hideous amount of work,' says Nikos.

An ISBUC collaboration is born

A few years ago, Nikos attended an ISBUC Principal Investigator Day, where he heard Wouter give an introduction to his research. Wouter has spent his career developing machine learning models of the relationship between a protein's sequence and its function. 'I saw his presentation and knew I needed to talk to this guy,' recalls Nikos, who has spent his career studying the relationship between a protein's dynamics and its function through single-molecule techniques such as FRET and turnover studies. 'We said five words to each other and we knew that there was a perfect match of what he does and what I do,' says Nikos. Straight afterwards, the two researchers organized a one-day workshop where they mapped out the big ideas that would drive their collaboration over the coming years: the research questions and their approach. From day one, their goal was the same: to produce a machine learning platform that could predict a protein's structure, function and dynamics. Needless to say, it is an ambitious goal. 'Ten years ago, this would not have been possible, but in the last few years we have reached the technical maturity to actually be able to do this,' says Nikos.

Nikos and Wouter spent time up front mapping out how they would realise this goal. They worked out what data and partners they would need and discussed which funding instruments they might apply for. Soon after their one-day workshop, they took the first concrete step: Nikos and Wouter arranged for Jacob Kæstel-Hansen to complete his Master's project at Novozymes. This research project won Jacob the 2021 Junior ISBUC Flash Talk and secured him a position as a PhD student in Nikos' lab. It also laid the foundation for the present project, which has now been funded by a prestigious Villum Synergy grant.

How does their approach differ from current methods?

The key difference between their proposed approach and previous machine learning platforms will be the inclusion of protein dynamics data acquired through single-molecule experiments. They have already used this approach successfully to develop a machine learning model that can identify potentially malfunctioning proteins based on their movements. They do this by creating a fingerprint of each protein molecule's individual dynamics and comparing it to the fingerprints of similar molecules. Now, the plan is to develop this approach further and create a detailed machine learning model that makes more specific connections between dynamics and function.

One of the big challenges they face will be to limit the number of costly experiments needed to characterize a system. To overcome this, Nikos and Wouter plan to develop machine learning techniques that combine data from low-cost, high-throughput experiments, which give a coarse-grained description of the system, with a few strategically chosen high-cost experiments that probe the exact functional trait of interest.

Collaborative short-cuts to success

To develop their model, Nikos and Wouter will have access to an ample pool of lipase mutants through a collaboration with Novozymes, with whom both researchers have already worked extensively for a number of years. This collaboration means that the team will immediately have access to samples and data for many mutated lipases, saving them the years of development work it would take to create the mutants themselves. In addition, they will be supplied with mutant variants of the enzyme cytochrome P450 oxidoreductase (POR) through a collaboration with doctors at the University Children's Hospital in Bern. This enzyme plays a vital role in regulating metabolism, and mutations in its gene can lead to various metabolic diseases.

These samples will be used to develop a machine learning model that predicts enzyme structure, function and dynamics from sequence. The model will be tested and retrained using data on the known functional effects of each mutation. This will produce a second set of predictions, which will then be tested against data from single-molecule experiments. These data are expensive in both time and cost, but they will give the researchers a much more specific benchmark against which to test predictions about dynamics and function, which is exactly what is missing from current models.

Why single-molecule experiments?

Returning to the example of lipases: when you do an experiment in a test tube, what you see is the average rate at which the enzymes break down fat. Based on that, you might say the enzyme breaks down a hundred fat molecules a second. But single-molecule experiments might reveal that the enzyme spends 50% of its time on its target doing nothing, so when it is actually working it breaks down fat at a rate of two hundred molecules a second. Without single-molecule experiments you miss these details, and they matter to protein engineers. In the example above, an engineer who wanted to design a more efficient lipase based on averages might try to increase its catalytic activity. But they would be in the dark as to which component of the average activity to target: the working hours (time spent active) or the working efficiency (turnover rate when active).
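The arithmetic behind this example can be sketched in a few lines. This is a toy two-state picture, assumed purely for illustration: each enzyme is either active or idle, so the bulk rate is simply the fraction of time spent active multiplied by the turnover rate while active. The class `EnzymeKinetics` and its field names are hypothetical, and the numbers are the illustrative ones from the text, not measured data.

```python
from dataclasses import dataclass


@dataclass
class EnzymeKinetics:
    """Hypothetical toy model: an enzyme is either active or idle."""
    active_fraction: float   # fraction of time spent turning over substrate (0 to 1)
    active_turnover: float   # fat molecules broken down per second while active

    def ensemble_rate(self) -> float:
        # A bulk (test-tube) measurement only sees the product of the two factors.
        return self.active_fraction * self.active_turnover


# Two illustrative lipases that a bulk assay cannot tell apart.
steady = EnzymeKinetics(active_fraction=1.0, active_turnover=100.0)
bursty = EnzymeKinetics(active_fraction=0.5, active_turnover=200.0)

print(steady.ensemble_rate(), bursty.ensemble_rate())  # 100.0 100.0
```

Both enzymes report 100 molecules a second on average, yet only the second one could be improved by raising its "working hours"; the first is already active all the time, so only its "working efficiency" is left to engineer. Under this toy model, that distinction is invisible without single-molecule data.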

Associate Professors Nikos Hatzakis and Wouter Boomsma

The advantages of collaborating within UCPH

Nikos and Wouter's Villum project represents an integrative approach to structural biology that makes clear the value of interdisciplinary collaboration. Their partnerships with Novozymes and the University Children's Hospital in Bern have saved them years of start-up research. But more importantly, it is their partnership with each other that will drive this project, a collaboration that has benefited enormously from being based at the same university.

Collaborations with overseas partners are important, explains Nikos. 'People from abroad have a completely different mentality and a different way of thinking, so that is educative.' But there are many advantages to collaborating here in Copenhagen. For example, the pair were able to hold an in-person meeting within weeks of first meeting. 'When you want to brainstorm, I think physical meetings are a bit better,' says Nikos. 'You can see the body language of a person, you can see them talking but also see them thinking. It facilitates a more interactive discussion than Zoom.' Being in the same city also makes it easier to exchange physical materials between their labs and to co-supervise students. This is vital for training young interdisciplinary researchers who can bridge the gaps between their supervisors' specialties. What's more, 'the students themselves know each other, so sometimes things happen under the radar of the PIs,' says Nikos. This creates a solid foundation for collaboration, as it is often the young researchers who spend the most time together. And it doesn't hurt that when 'a student accidentally destroys a sample, they can contact the students at the other lab and organize to pick up a new sample the next day,' says Nikos.

ISBUC will be following Wouter and Nikos' Villum Synergy project over the next few years and will report back regularly. In the meantime, if you have any questions or would like to find out more about the project, please contact Nikos Hatzakis (hatzakis@chem.ku.dk) or Wouter Boomsma (wb@di.ku.dk).