I am a 27-year-old mathematician, currently focused on developing novel predictive algorithms to identify immunogenic targets for cancer vaccines.
In my spare time, I enjoy puzzles, games, crosswords, and middle distance running.
POEM is a machine learning algorithm designed to predict vaccination targets for cancer and pathogens by integrating mechanistic modelling of the class I antigen processing pathway. This approach overcomes limitations of existing data-driven models.
The algorithm outperforms state-of-the-art methods across diverse benchmarks, including cancer and SARS-CoV-2 datasets, and has attracted significant interest from leading pharmaceutical companies such as BioNTech and GSK.
This project involved developing a mechanistic systems biology model to represent the complex biochemistry of the class I antigen processing pathway. The model uses a system of ordinary differential equations to simulate key immunological processes.
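To give a flavour of what an ODE model of this kind looks like, here is a deliberately small sketch. It is not POEM's model: the five species (cytosolic peptide C, ER peptide E, free MHC-I M, peptide-MHC-I complex B in the ER, surface complex S), the mass-action reactions and every rate constant are hypothetical choices made purely for illustration, integrated with a hand-written fixed-step Runge-Kutta (RK4) scheme.

```python
from dataclasses import dataclass

@dataclass
class Rates:
    """Hypothetical rate constants (arbitrary units) for a toy pathway."""
    k_prod: float = 1.0   # peptide production in the cytosol
    k_deg: float = 0.5    # cytosolic peptide degradation
    k_tap: float = 0.5    # TAP transport, cytosol -> ER
    k_er: float = 0.1     # first-order loss of unbound ER peptides (assumed)
    k_syn: float = 1.0    # MHC-I synthesis in the ER
    k_md: float = 0.05    # degradation of empty MHC-I
    k_on: float = 1.0     # peptide-MHC-I association
    k_off: float = 0.1    # peptide-MHC-I dissociation
    k_exp: float = 0.2    # export of loaded complexes to the surface
    k_int: float = 0.05   # internalisation of surface complexes

def rhs(y, r):
    """Right-hand side dy/dt for the state y = [C, E, M, B, S]."""
    C, E, M, B, S = y
    bind = r.k_on * E * M - r.k_off * B   # net peptide-MHC-I binding flux
    return [
        r.k_prod - (r.k_tap + r.k_deg) * C,
        r.k_tap * C - r.k_er * E - bind,
        r.k_syn - r.k_md * M - bind,
        bind - r.k_exp * B,
        r.k_exp * B - r.k_int * S,
    ]

def rk4(y0, r, dt=0.01, t_end=200.0):
    """Integrate the ODEs with the classical fixed-step Runge-Kutta scheme."""
    y = list(y0)
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(y, r)
        k2 = rhs([yi + 0.5 * dt * ki for yi, ki in zip(y, k1)], r)
        k3 = rhs([yi + 0.5 * dt * ki for yi, ki in zip(y, k2)], r)
        k4 = rhs([yi + dt * ki for yi, ki in zip(y, k3)], r)
        y = [yi + dt / 6.0 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
    return y

C, E, M, B, S = rk4([0.0, 0.0, 0.0, 0.0, 0.0], Rates())
print(f"surface peptide-MHC-I at t = 200: {S:.3f}")
```

A realistic pathway model would more likely use an adaptive stiff solver (for example SciPy's `solve_ivp` with `method="LSODA"` or `"BDF"`); the hand-rolled RK4 loop only keeps the sketch dependency-free.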
Parameters were estimated using Bayesian inference with Markov chain Monte Carlo (MCMC), together with supervised learning techniques. The resulting model outperformed data-driven approaches in predicting antigen presentation, addressing limitations of the training data widely used by other algorithms.
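A one-parameter toy shows the Bayesian step in miniature. Everything here is an assumption for illustration: the model is simple exponential decay dy/dt = -k y with y(0) = 1, the observations are synthetic data generated inside the script with Gaussian noise, the prior on k is flat on (0, 5), and the sampler is plain random-walk Metropolis-Hastings, which is only one of many MCMC schemes and not necessarily the one used for POEM.

```python
import math
import random

def model(k, t):
    """Solution of the toy ODE dy/dt = -k*y with y(0) = 1."""
    return math.exp(-k * t)

def log_posterior(k, times, obs, sigma):
    """Gaussian log-likelihood (up to a constant) plus a flat prior on 0 < k < 5."""
    if not 0.0 < k < 5.0:
        return -math.inf
    sse = sum((y - model(k, t)) ** 2 for t, y in zip(times, obs))
    return -sse / (2.0 * sigma ** 2)

def metropolis(times, obs, sigma, n_iter=20_000, step=0.05, k0=1.0, seed=0):
    """Random-walk Metropolis-Hastings sampler for the single rate k."""
    rng = random.Random(seed)
    k, lp = k0, log_posterior(k0, times, obs, sigma)
    samples = []
    for _ in range(n_iter):
        proposal = k + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        lp_new = log_posterior(proposal, times, obs, sigma)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            k, lp = proposal, lp_new
        samples.append(k)
    return samples

# Synthetic data from a known rate, so the sampler's answer can be checked.
data_rng = random.Random(42)
true_k, sigma = 0.7, 0.05
times = [0.25 * i for i in range(20)]
obs = [model(true_k, t) + data_rng.gauss(0.0, sigma) for t in times]

samples = metropolis(times, obs, sigma)
posterior = samples[2_000:]  # discard burn-in
k_mean = sum(posterior) / len(posterior)
print(f"posterior mean of k: {k_mean:.3f} (true value {true_k})")
```

For a many-parameter ODE model the same accept/reject loop applies with a vector-valued proposal, although adaptive or gradient-based samplers usually mix far better.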
Nucleated cells present peptides on their surface bound to proteins known as MHC class I (MHC-I). The binding affinity and stability of peptide-MHC-I complexes are key to the efficacy of the resulting immune response, so predicting these quantities is desirable. Predictive models are trained on peptide ligands eluted from the surface of cells and identified by mass spectrometry. However, humans have up to six different types (alleles) of MHC-I, so in such multi-allelic data it is non-trivial to identify which allele each peptide was bound to.
To address this, I fine-tuned a large protein language model (ESM-2) to predict p(MHC-I | peptide), the probability of each MHC-I allele given a peptide. The resulting model could accurately deconvolute multi-allelic mass spectrometry datasets. This work was carried out at Synteny Biotechnology and is therefore not public.
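Once a model outputs a score per allele for each peptide, one simple way to deconvolute a sample is to restrict to the alleles that donor actually carries, renormalise, and take the most probable allele. The sketch below assumes such per-allele logits are already available; since the fine-tuned ESM-2 model is not public, the logits here are made up, and the `deconvolute` helper and its 0.5 confidence threshold are hypothetical choices rather than the method used at Synteny.

```python
import math
from typing import Dict, List, Optional

def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Numerically stable softmax over a dict of allele -> logit."""
    top = max(logits.values())
    exps = {allele: math.exp(v - top) for allele, v in logits.items()}
    total = sum(exps.values())
    return {allele: e / total for allele, e in exps.items()}

def deconvolute(peptide_logits: Dict[str, Dict[str, float]],
                sample_alleles: List[str],
                min_prob: float = 0.5) -> Dict[str, Optional[str]]:
    """Assign each eluted peptide to its most probable allele among those the
    sample carries; None means no allele is confident enough (hypothetical rule)."""
    assignments: Dict[str, Optional[str]] = {}
    for peptide, logits in peptide_logits.items():
        # Renormalise over the sample's own alleles only.
        restricted = {a: logits[a] for a in sample_alleles if a in logits}
        if not restricted:
            assignments[peptide] = None
            continue
        probs = softmax(restricted)
        best = max(probs, key=probs.get)
        assignments[peptide] = best if probs[best] >= min_prob else None
    return assignments

# Made-up logits for illustration only; a trained model would supply these.
sample = ["HLA-A*02:01", "HLA-B*07:02", "HLA-C*07:01"]
logits = {
    "pep1": {"HLA-A*02:01": 3.0, "HLA-B*07:02": 0.0, "HLA-C*07:01": 0.0},
    "pep2": {"HLA-A*02:01": 0.1, "HLA-B*07:02": 0.0, "HLA-C*07:01": 0.2},
    "pep3": {"HLA-A*02:01": 0.0, "HLA-B*07:02": 2.0, "HLA-A*24:02": 10.0},
}
result = deconvolute(logits, sample)
print(result)
```

Note how `pep3` is assigned to HLA-B*07:02 even though its highest raw score is for an allele the sample does not carry: restricting to the donor's genotype is what makes the per-peptide predictions usable for deconvolution.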
As part of a short project for the Oxford Mathematics course "Topics in Computational Biology", I validated the consistency of two different methods for simulating the movement of a dimer. First, I considered a mesoscopic description of a dimer as two spheres moving under the action of a potential. I solved this system of stochastic differential equations analytically to find the expected bond length. I then compared this to a molecular dynamics simulation, replacing Brownian dynamics with a theoretical heat bath model. Simulating this with an event-driven timestep yielded results consistent with the mesoscopic model.
I no longer have the source code used for this study, but pseudocode for simulating the dimer can be found in the report (Algorithm 1).
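As an illustration of the mesoscopic side of that comparison, here is a minimal sketch. It assumes the simplest possible potential, a harmonic spring U = (κ/2)|x2 - x1|^2 with zero rest length between two spheres of equal diffusion coefficient D, which may well differ from the potential used in the report. Under that assumption the separation r = x2 - x1 obeys dr = -(2Dκ/k_BT) r dt + sqrt(4D) dW, a three-dimensional Ornstein-Uhlenbeck process with stationary distribution N(0, (k_BT/κ) I). The bond length |r| is then Maxwell-distributed with mean sqrt(8 k_BT/(πκ)). The code integrates both spheres with Euler-Maruyama and compares the time-averaged |r| with that formula.

```python
import math
import random

def simulate_dimer(kappa=1.0, D=1.0, kBT=1.0, dt=0.01, n_steps=200_000,
                   burn_in=10_000, seed=1):
    """Euler-Maruyama Brownian dynamics of two spheres joined by a harmonic
    spring (assumed potential). Returns the time-averaged bond length |x2 - x1|."""
    rng = random.Random(seed)
    x1 = [0.0, 0.0, 0.0]
    x2 = [1.0, 0.0, 0.0]
    mobility = D / kBT                 # Einstein relation
    noise = math.sqrt(2.0 * D * dt)    # scale of each Brownian increment
    total, count = 0.0, 0
    for step in range(n_steps):
        r = [b - a for a, b in zip(x1, x2)]
        # U = (kappa/2)|r|^2: force on sphere 1 is +kappa*r, on sphere 2 is -kappa*r.
        x1 = [a + mobility * kappa * ri * dt + noise * rng.gauss(0.0, 1.0)
              for a, ri in zip(x1, r)]
        x2 = [b - mobility * kappa * ri * dt + noise * rng.gauss(0.0, 1.0)
              for b, ri in zip(x2, r)]
        if step >= burn_in:
            total += math.sqrt(sum((b - a) ** 2 for a, b in zip(x1, x2)))
            count += 1
    return total / count

def expected_bond_length(kappa=1.0, kBT=1.0):
    """Mean of |r| when r ~ N(0, (kBT/kappa) I_3), i.e. the Maxwell mean."""
    return math.sqrt(8.0 * kBT / (math.pi * kappa))

sim = simulate_dimer()
exact = expected_bond_length()
print(f"simulated <|r|> = {sim:.3f}, analytic = {exact:.3f}")
```

The small residual gap comes from sampling noise and the O(dt) bias of Euler-Maruyama, which slightly inflates the stationary variance.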
For my Master's dissertation, I investigated methods of improving the efficiency of molecular dynamics simulations. I proposed a new method that divides the simulation domain into regions, using coarser Langevin dynamics away from regions of interest and more computationally expensive molecular dynamics in regions requiring more detail (e.g. near protein-DNA binding sites). Regrettably, I no longer have the full Fortran 90 source code I wrote for this project.
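The core idea of switching dynamics by region can be shown in a few lines. This is a toy under stated assumptions, not the dissertation's method: a single particle in a hypothetical harmonic trap, with a hypothetical region of interest 0.5 <= x < 1.5 in which friction is switched off. The integrator is the BAOAB Langevin splitting with a position-dependent friction γ(x). Where γ = 0 its O-step is the identity and the remaining kick-drift-kick steps are exactly velocity Verlet, i.e. plain molecular dynamics; outside the region the particle feels Langevin friction and noise. Because the noise satisfies fluctuation-dissipation wherever γ > 0, the long-run averages should still match equipartition, <x^2> = k_BT/κ and <v^2> = k_BT/m.

```python
import math
import random

def friction(x, region=(0.5, 1.5), gamma_out=1.0):
    """Hypothetical switch: zero friction (plain MD) inside the region of
    interest, Langevin friction gamma_out everywhere else."""
    lo, hi = region
    return 0.0 if lo <= x < hi else gamma_out

def force(x, kappa=1.0):
    """Force from an assumed toy potential U(x) = kappa * x**2 / 2."""
    return -kappa * x

def hybrid_baoab(n_steps=400_000, dt=0.05, kBT=1.0, m=1.0, seed=3):
    """BAOAB Langevin splitting with position-dependent friction.

    Where friction(x) == 0 the O-step is the identity, so B-A-A-B is exactly
    velocity Verlet, i.e. deterministic molecular dynamics."""
    rng = random.Random(seed)
    x, v = 0.0, 0.0
    sum_x2 = sum_v2 = 0.0
    for _ in range(n_steps):
        v += 0.5 * dt * force(x) / m      # B: half kick
        x += 0.5 * dt * v                 # A: half drift
        c = math.exp(-friction(x) * dt)   # O: exact Ornstein-Uhlenbeck velocity update
        v = c * v + math.sqrt((1.0 - c * c) * kBT / m) * rng.gauss(0.0, 1.0)
        x += 0.5 * dt * v                 # A: half drift
        v += 0.5 * dt * force(x) / m      # B: half kick
        sum_x2 += x * x
        sum_v2 += v * v
    return sum_x2 / n_steps, sum_v2 / n_steps

mean_x2, mean_v2 = hybrid_baoab()
print(f"<x^2> = {mean_x2:.3f}, <v^2> = {mean_v2:.3f} (equipartition target: 1 for both)")
```

The region is placed away from the trap minimum on purpose: every orbit of the harmonic trap passes through x = 0, so the particle cannot become stuck on a purely deterministic orbit inside the frictionless region. In a genuine hybrid scheme the computational saving would come from the Langevin region using a coarser, cheaper description; this sketch only illustrates stitching the two dynamics together.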
Please feel free to contact me via email.