Project description: Co-fractionation mass spectrometry (CF-MS) is a technique with the potential to characterise endogenous, unmanipulated protein complexes on an unprecedented scale. However, this potential has been offset by a lack of guidelines for best-practice CF-MS data collection and analysis. To establish such guidelines, this study uses libraries of gold standard Saccharomyces cerevisiae complexes with very high proteome coverage to thoroughly evaluate novel and published yeast CF-MS datasets. These evaluations rely on a new method for identifying gold standard complexes in CF-MS data, Reference Complex Profiling, together with the Extending ‘Guilt-by-Association’ by Degree (EGAD) R package, and are reinforced by concurrent analyses of published human data. Evaluation of data collection designs, which involve fractionation of cell lysates, shows that near-maximum recall of complexes can be achieved with fewer samples than published studies have used. Distributing sample collection across orthogonal fractionation methods, rather than collecting a single high-resolution dataset, leads to particularly efficient recall. Evaluation of 17 similarity scoring metrics, which are central to CF-MS data analysis, shows that two metrics rarely used in past CF-MS studies (Spearman and Kendall correlations) and the recently introduced Co-apex metric frequently maximise recall, whereas a popular metric, Euclidean distance, delivers poor recall. The common practice of integrating external genomic data into CF-MS data analysis is also evaluated, revealing that it may improve the precision and recall of known complexes but is generally unsuitable for predicting novel complexes in model organisms. For studies of non-model organisms that rely on orthologous genomic data, the results indicate that particular subsets of fractionation profiles (e.g. the lowest abundance quartile) should be excluded to minimise false discovery.
Together, these guidelines identify avenues for precise, sensitive and efficient CF-MS studies of known complexes, and for effective predictions of novel complexes for orthogonal experimental validation.
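The abstract compares similarity metrics without defining them. The following is a minimal Python sketch, using only the standard library, that shows how four of the compared metrics (Pearson, Spearman and Kendall correlations, and Euclidean distance) score pairs of fractionation profiles. It is not the study's implementation. The profiles are hypothetical toy data, it assumes one non-negative intensity per fraction with no missing values, and Co-apex is omitted because the abstract does not give its definition.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall tau-b over all pairs of fractions, with tie correction."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both profiles: excluded from tau-b entirely
        elif dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    n_cd = concordant + discordant
    return (concordant - discordant) / math.sqrt((n_cd + ties_x_only) * (n_cd + ties_y_only))


def euclidean(x, y):
    """Euclidean distance between raw intensity profiles (lower = more similar)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical profiles across 8 fractions.
protein_a = [0, 1, 4, 9, 4, 1, 0, 0]
protein_b = [0, 10, 40, 90, 40, 10, 0, 0]  # same elution shape as A, 10x abundance
protein_c = [0, 0, 0, 1, 4, 9, 4, 1]       # same abundance as A, peak shifted

if __name__ == "__main__":
    for name, other in [("B", protein_b), ("C", protein_c)]:
        print(
            f"A vs {name}: pearson={pearson(protein_a, other):.3f} "
            f"spearman={spearman(protein_a, other):.3f} "
            f"kendall={kendall_tau_b(protein_a, other):.3f} "
            f"euclidean={euclidean(protein_a, other):.1f}"
        )
```

In this toy case, all three correlations treat A and B as perfectly similar (1.0), because they share an elution shape. On raw intensities, Euclidean distance instead ranks B (about 96.5) as farther from A than the peak-shifted C (about 12.7), because it is sensitive to absolute abundance. This illustrates one general property of the metrics. It does not explain the study's recall results.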