Dataset Information

A machine learning toolkit for genetic engineering attribution to facilitate biosecurity.


ABSTRACT: The promise of biotechnology is tempered by its potential for accidental or deliberate misuse. Reliably identifying telltale signatures characteristic to different genetic designers, termed 'genetic engineering attribution', would deter misuse, yet is still considered unsolved. Here, we show that recurrent neural networks trained on DNA motifs and basic phenotype data can reach 70% attribution accuracy in distinguishing between over 1,300 labs. To make these models usable in practice, we introduce a framework for weighing predictions against other investigative evidence using calibration, and bring our model to within 1.6% of perfect calibration. Additionally, we demonstrate that simple models can accurately predict both the nation-state-of-origin and ancestor labs, forming the foundation of an integrated attribution toolkit which should promote responsible innovation and international security alike.

SUBMITTER: Alley EC 

PROVIDER: S-EPMC7722865 | biostudies-literature | 2020 Dec

REPOSITORIES: biostudies-literature

Publications

A machine learning toolkit for genetic engineering attribution to facilitate biosecurity.

Alley EC, Turpin M, Liu AB, Kulp-McDowall T, Swett J, Edison R, Von Stetina SE, Church GM, Esvelt KM

Nature Communications, 2020 Dec 8, issue 1



Similar Datasets

| S-EPMC9410911 | biostudies-literature
| S-EPMC6939162 | biostudies-literature
| S-EPMC6150731 | biostudies-literature
| S-EPMC4571720 | biostudies-literature
| S-EPMC8213174 | biostudies-literature
| S-EPMC1847811 | biostudies-literature
| S-EPMC7435601 | biostudies-literature
| S-EPMC6858556 | biostudies-literature
| S-EPMC6410806 | biostudies-literature
| S-EPMC7687896 | biostudies-literature