Abstract
Genome editing offers the unprecedented ability to precisely modify genetic material, signaling a new era in our approach to understanding life and treating genetic diseases. Recent advances in genome editing technologies have greatly broadened their applicability. Among these technologies, prime editing stands out for its ability to install precise edits in the genome while minimizing the double-strand breaks typically associated with conventional CRISPR-Cas9 editing. This thesis focuses on improving the efficiency of prime editing through in-depth analyses of (i) prime editing guide RNA (pegRNA) design and (ii) the influence of local chromatin characteristics, presented in two comprehensive studies. Our first study (Nature Biotechnology, 2023) involved an extensive evaluation of prime editing efficiency across a large number of human pathogenic mutations. Using a high-throughput screening approach, we analyzed over 92,000 pegRNA designs and uncovered key sequence-context determinants of efficient prime editing. This work led to the creation of "PRIDICT" (PRIme editing guide RNA preDICTion), a machine learning model that predicts the editing efficiency of pegRNA designs and thereby simplifies the optimization of prime editing experiments. Building on these findings, the second study (preprint, 2023; under review) expanded the scope to a wider range of edit types (up to 15 base pairs in length) and examined the efficiency of prime editing in mismatch repair (MMR)-proficient and MMR-deficient cellular contexts. Through the analysis of a library of over 20,000 diverse pegRNAs, this study refined our understanding of the prime editing process and improved our predictive capabilities with the development of PRIDICT2.0.
Furthermore, to complement our high-throughput screens for optimal pegRNA designs, we assessed the impact of local chromatin context on prime editing efficiency. We employed a method known as TRIP (Thousands of Reporters Integrated in Parallel), which allowed us to study editing outcomes at more than a thousand different genomic locations. This provided a systematic assessment of how chromatin context influences prime editing, and showed that editing is most efficient at sites whose chromatin characteristics resemble those of promoters or actively transcribed genes. The insights gained from this analysis, together with publicly available chromatin datasets, led to the development of ePRIDICT, a model that predicts the influence of chromatin context on prime editing efficiency. In conclusion, this thesis takes an interdisciplinary approach that combines biological experimentation with computational methods to push the boundaries of prime editing. By utilizing high-throughput screening and a variety of machine learning methods, we have improved the efficiency of prime editing and broadened its potential applicability, paving the way for its more effective use in both research and therapeutic contexts.