Microsoft Research and University of Washington demonstrate a fully working automated DNA storage system
With our ever-expanding storage needs, we are slowly reaching the limits of what can be stored on magnetic and optical media. To overcome this limitation, the concept of storing data on organic molecules such as DNA strands was proposed as early as 1988. Now, Microsoft Research's Karin Strauss, along with Prof. Luis Ceze's team from the Molecular Information Systems Laboratory (MISL) at the University of Washington (UW), has successfully demonstrated large-scale random access of information stored in synthetic DNA. Their findings were published in the journal Nature Biotechnology.
DNA, short for deoxyribonucleic acid, contains the entire blueprint required to create life as we know it. It also provides an extremely dense and long-lasting medium for storing data. Imagine storing 1 exabyte (1 billion GB... yes, 1 billion) of data in one cubic millimeter of DNA that has a half-life of over 500 years in harsh environments. As long as there is DNA-based life, which is basically everything, there will always be ways of storing highly dense data.
DNA is a biochemical molecule made of building blocks called nucleotide bases: Adenine (A), Guanine (G), Thymine (T), and Cytosine (C). The order of these nucleotides ultimately determines the traits expressed in an organism, such as height and skin color. To store data in DNA, binary code has to be translated from 0s and 1s into A, T, G, or C, and these bases have to be decoded back into binary for reading. This presents a problem for randomly accessing storage blocks in the DNA: without a way to target a single block, everything has to be read to find one file, a throwback to the days of having to read an entire tape just to access one file stored at a particular location.
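To make the translation step concrete, here is a minimal Python sketch that maps every two bits to one base and back. The mapping table and the function names `encode` and `decode` are assumptions chosen for illustration; this is not the encoding the team used, and practical schemes add redundancy and error correction on top.

```python
# Hypothetical 2-bits-per-base table, chosen only for illustration.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Translate bytes into a string of bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Translate a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

With this table each byte becomes four bases, so the two-byte input above yields an eight-base strand.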
To circumvent this problem, the team devised a method of accessing specific regions of DNA using short DNA strands called primers. Primers bind specifically to target sequences, and by using a widely used method called Polymerase Chain Reaction (PCR), these targets can be amplified and decoded. The team successfully retrieved 35 distinct files from synthetic DNA, with sizes ranging from 29 KB to 44 MB, for a total of 200 MB of HD video, audio, text, and images. The team has since surpassed 400 MB of retrieved data.
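The idea of primer-based addressing can be illustrated with a toy model in Python. This is only a sketch under strong assumptions: strands are plain strings, each file's strands begin with that file's primer sequence, and a prefix match stands in for PCR amplification. The names `write_file` and `read_file` and the primer sequences are hypothetical, and the model ignores strand ordering, errors, and chemistry entirely.

```python
def write_file(pool: list[str], primer: str, chunks: list[str]) -> None:
    """Prefix each chunk with the file's primer and add it to the shared pool."""
    for chunk in chunks:
        pool.append(primer + chunk)


def read_file(pool: list[str], primer: str) -> list[str]:
    """Select only strands carrying the primer (a stand-in for PCR) and strip it."""
    return [strand[len(primer):] for strand in pool if strand.startswith(primer)]


pool: list[str] = []
write_file(pool, "ACGTACGT", ["CAGA", "CGGC"])  # hypothetical primer for file 1
write_file(pool, "TTGACCAA", ["GGGG"])          # hypothetical primer for file 2

print(read_file(pool, "ACGTACGT"))  # ['CAGA', 'CGGC']
```

The point of the sketch is that one file is recovered from the mixed pool without decoding the strands that belong to other files.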
Being able to write to and read from biomolecules such as DNA is an extremely encouraging prospect, but it comes with its own set of challenges. As any molecular biologist would attest, sequencing a DNA strand requires controlled protocols, and there is always some error rate involved. On top of that, the cost of DNA sequencing is still out of reach for many. Cells in the body have enzymatic mechanisms to prevent the incorporation of errors during DNA replication, but it remains to be seen whether these mechanisms can be exploited to minimize data errors while reading from and writing to the molecule.