Project description: LC-MS/MS-based identification of HLA-peptides is poised to provide a deep understanding of the rules underlying antigen presentation. However, a key obstacle limiting the utility of MS data is the ambiguity arising from the co-expression of multiple HLA alleles. Here, we introduce a strategy for profiling the HLA ligandome one allele at a time. By using cell lines expressing a single HLA allele, optimizing immunopurifications, and developing a novel spectral search algorithm, we identified thousands of peptides bound to 16 different HLA class I alleles. These data enabled the discovery of novel binding motifs, and an integrative analysis quantifying the contribution of factors critical to epitope presentation, such as protein cleavage and gene expression. We trained neural network prediction algorithms with our large dataset (>24,000 peptides) and outperformed algorithms trained on datasets of peptides with measured affinities. We thus demonstrate a scalable strategy for systematically learning the rules of endogenous antigen presentation.