DNA data storage is one of the most promising new technologies poised to revolutionize the world of data storage. On a scale of 1 to 10, this is an 11½ for interesting. Using DNA as a storage device seems to make sense on a surface level, but what exactly is DNA data storage?
DNA data storage involves creating synthetic DNA and encoding binary data in its sequence of bases. It allows significantly more digital data to be stored in a given space than traditional drive-based storage methods.
This article will look at the history of DNA data storage and where the current technology stands. We will also look at some benefits and downsides of DNA data storage so that you will be able to understand what the future holds for DNA-based data storage.
What is DNA?
DNA, or deoxyribonucleic acid, is the hereditary material in humans and almost all other living organisms. It carries the instructions for how to build life.
DNA can replicate, or make copies of itself. Most DNA is found in the cell nucleus (nuclear DNA), and a small amount is found in the mitochondria (mitochondrial DNA, or mtDNA).
DNA code is made up of four chemical bases: adenine (A), guanine (G), cytosine (C), and thymine (T). The order, or sequence, of these bases determines the information available for building and maintaining a living organism.
Each base is attached to a sugar molecule and a phosphate molecule, and together these form a nucleotide. Nucleotides are arranged in two long strands that form a spiral called a double helix, which looks like a twisted ladder. Bases on opposite strands pair up (A with T, and C with G) to form base pairs, the rungs of the ladder. Because of this pairing, each strand of the double helix can serve as a template for duplicating the sequence of bases.
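The pairing rule (A with T, C with G) is what lets one strand act as a template for the other. As a minimal sketch of that idea, the Python below derives the partner strand from a given sequence. The function names are illustrative only, and the example ignores the chemistry of real replication.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base that pairs with each position of the strand."""
    return "".join(COMPLEMENT[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own direction.

    The two strands of a double helix run in opposite directions,
    so the partner is conventionally written reversed.
    """
    return complement(strand)[::-1]


template = "ATGC"
print(complement(template))          # TACG
print(reverse_complement(template))  # GCAT
```

Applying complement twice returns the original sequence, which is why either strand is enough to rebuild the other.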
How Does DNA Data Storage Work?
The field of DNA data storage can be traced back to the discovery of the DNA double helix in the 1950s. While DNA data storage may make sense as an abstract idea, understanding how it works is a different matter.
DNA data storage works by writing data to, and reading it from, synthetic DNA molecules. In nature, DNA already acts as the storage medium for the genetic information of living cells. Current techniques use chemical synthesis to create DNA strands whose base sequence encodes the information.
Digital data is made up of the 0s and 1s of binary code. A DNA strand, however, is written with four bases (A, G, C, and T), so each position can represent two bits instead of one. Combined with the tiny size of the molecule, this means you can fit more data in a smaller space. The encoded DNA can then be stored and later retrieved, sequenced, and decoded.
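To make the four-letter idea concrete, here is a minimal Python sketch that maps every pair of bits to one base and back again. The particular assignment (00 to A, 01 to C, 10 to G, 11 to T) is an assumption chosen for illustration. Real encoding schemes differ and usually add constraints and redundancy.

```python
# Assumed mapping for illustration: each base carries two bits.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Encode bytes as a string of bases, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(sequence: str) -> bytes:
    """Decode a string produced by bytes_to_dna back into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = b"Hi"
strand = bytes_to_dna(message)
print(strand)                # CAGACGGC
print(dna_to_bytes(strand))  # b'Hi'
```

Two bytes become eight bases, so a sequence in this scheme is half as long as the equivalent string of bits.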
DNA can also be stored without the usual overhead costs associated with current data storage technology. Data centers have substantial energy requirements, which are only going up. On the other hand, storing DNA requires minuscule amounts of energy, and you don’t have to worry about mechanical failure with DNA.
The Synthesis of DNA for Data Storage
The synthesis of DNA for data storage purposes has been the focus of much research. Current methods using chemical processes are likely not the best way to efficiently create long strands of DNA, which are vital for practical usage of DNA as a storage medium. While chemical processes have been successful, they are likely to be superseded as new techniques are refined.
The silicon chip industry uses a photolithographic approach to manufacture delicate circuitry and processors. Recent research at Harvard has successfully used this light-directed technique to produce usable DNA for storage.
6 Steps for DNA Storage, Retrieval, and Decoding
- Data is transformed into binary data.
- Binary data is encoded into corresponding DNA sequences.
- DNA strands with those sequences are synthesized.
- Synthesized DNA samples are preserved.
- Data is retrieved by sequencing the DNA.
- DNA base sequences are decoded to get the original binary data.
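The six steps can be sketched in software, with the physical steps (synthesis, preservation, and retrieval) stood in for by shuffling a list. This is a simplified illustration under stated assumptions, not a real pipeline: the two-bits-per-base mapping, the fragment size, and the short address prefix on each fragment are all hypothetical choices. The address is there because strands in a tube have no inherent order.

```python
import random

BASES = "ACGT"       # assumed mapping: position 0..3 stands for two bits
INDEX_BASES = 4      # hypothetical address length: 4 bases = up to 256 fragments
PAYLOAD_BYTES = 4    # hypothetical payload size per fragment


def encode_number(value: int, n_bases: int) -> str:
    """Write an integer as n_bases bases, most significant base first."""
    return "".join(BASES[(value >> (2 * i)) & 3] for i in reversed(range(n_bases)))


def decode_number(sequence: str) -> int:
    """Read a base string back into an integer."""
    value = 0
    for base in sequence:
        value = value * 4 + BASES.index(base)
    return value


def encode(data: bytes) -> list:
    """Steps 1-2: turn bytes into a list of addressed base sequences."""
    starts = range(0, len(data), PAYLOAD_BYTES)
    if len(starts) > 4 ** INDEX_BASES:
        raise ValueError("too much data for the address length")
    fragments = []
    for index, start in enumerate(starts):
        chunk = data[start:start + PAYLOAD_BYTES]
        payload = "".join(encode_number(byte, 4) for byte in chunk)
        fragments.append(encode_number(index, INDEX_BASES) + payload)
    return fragments


def decode(fragments: list) -> bytes:
    """Step 6: order fragments by address, then turn bases back into bytes."""
    ordered = sorted(fragments, key=lambda f: decode_number(f[:INDEX_BASES]))
    out = bytearray()
    for fragment in ordered:
        payload = fragment[INDEX_BASES:]
        for i in range(0, len(payload), 4):
            out.append(decode_number(payload[i:i + 4]))
    return bytes(out)


pool = encode(b"DNA data storage")
random.shuffle(pool)   # stands in for steps 3-5: strands come back unordered
print(decode(pool))    # b'DNA data storage'
```

The sketch assumes every fragment is read back without errors. A practical system would also need redundancy to cope with missing or damaged strands.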
Benefits of DNA Data Storage
At its core, DNA is already a data storage device for our genome, and it has already been proven to be a stable form of information storage and copying. If this weren’t the case, organisms on Earth wouldn’t exist.
More than this, DNA has clear advantages over current storage methods. It can hold much more data, is straightforward to read, and can last for a very long time. The amount of data produced worldwide each year increasingly exceeds our ability to store it all using current methods.
The Size of Big Data
According to some estimates, the so-called digital universe doubles in size every two years. To get a sense of the scale of big data, consider that Google processes around 20 petabytes of content per day. One petabyte equals 1,024 terabytes, and Google is only one of many organizations processing data, so you can imagine just how much data is being processed around the world daily.
The amount of data being stored has been going up steadily since the information explosion of the 1940s.
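The figures above can be put into more familiar units with a little arithmetic. The short Python sketch below uses only the numbers quoted in the text (20 petabytes per day, 1,024 terabytes per petabyte, and doubling every two years). The function name is illustrative.

```python
TB_PER_PB = 1024         # conversion used in the text
PETABYTES_PER_DAY = 20   # figure quoted in the text

terabytes_per_day = PETABYTES_PER_DAY * TB_PER_PB
petabytes_per_year = PETABYTES_PER_DAY * 365


def growth_factor(years: float, doubling_period: float = 2.0) -> float:
    """How many times larger the data volume is after the given number of years."""
    return 2 ** (years / doubling_period)


print(terabytes_per_day)    # 20480
print(petabytes_per_year)   # 7300
print(growth_factor(10))    # 32.0
```

Doubling every two years means a 32-fold increase over a decade, which is why storage costs become a concern so quickly.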
There has been a realization by the major players in the data sphere that the growing costs of data storage are increasingly unsustainable. Given this situation, DNA’s ability to store so much information in such a small area without ongoing maintenance costs is very appealing.
High Information Density
Research has revealed that DNA has a very high information density, which means you can store more data in a smaller space while still being readable. DNA also exceeds the information density of current data storage methods.
According to the Wyss Institute, “DNA is at least 1000-fold denser than the most compact solid-state hard drive and at least 300-fold more durable than the most stable magnetic tapes.”
Current estimates state that you could fit all of the data currently in existence in just a few grams of DNA. This incredible statement is promising on many levels, especially given the power that digital data storage currently requires.
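A rough upper bound on DNA's density can be worked out from basic chemistry. The Python sketch below assumes two bits per nucleotide and an average mass of about 330 grams per mole for a nucleotide in a single strand. Both values are assumptions made for this back-of-the-envelope calculation, not figures from the sources cited here.

```python
AVOGADRO = 6.022e23        # nucleotides per mole
GRAMS_PER_MOLE = 330.0     # assumed average mass of one nucleotide in a single strand
BITS_PER_NUCLEOTIDE = 2    # four possible bases means two bits per position

nucleotides_per_gram = AVOGADRO / GRAMS_PER_MOLE
bytes_per_gram = nucleotides_per_gram * BITS_PER_NUCLEOTIDE / 8
exabytes_per_gram = bytes_per_gram / 1e18

print(f"{exabytes_per_gram:.0f} exabytes per gram (theoretical upper bound)")
```

This comes to roughly 450 exabytes per gram. Practical systems would fall well short of that bound, because they need addressing, error correction, and many physical copies of each strand.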
DNA Data Is Easy To Read
Advances in DNA sequencing continue today, which means that reading DNA data is getting less expensive and easier to perform.
DNA differs from other forms of data storage in that its format will not go out of date. Computers and other technology will become obsolete at some point, while we can hope that DNA will be around for many more billions of years.
DNA’s nucleotide code offers a compatible interface to computers because sequences of bases can represent letters, symbols, and digits. Successful experiments with DNA data storage often involve encoding the full text of a book or using DNA to store all the notes of a song that can be played back. Given that the data stored on DNA is easy to read, the importance of DNA data storage is clearly understandable.
DNA Data Storage Stability
DNA is an almost perfect storage medium because of its stability, and it can be copied with very few errors over millions of generations.
It is also highly durable and can be stored in a cool, dark place and read later. On the other hand, data centers require incredibly complicated and powerful cooling systems requiring vast amounts of energy, and many centers are built in naturally cold areas due to this requirement.
Furthermore, DNA requires virtually zero maintenance to maintain its integrity. Consider that remains preserved in cold, stable conditions for hundreds of thousands of years, and in a few cases more than a million years, can still produce readable DNA, so this advantage is relevant to DNA-based data storage.
Issues With DNA Data Storage
Despite the advances and successes, the major issue with DNA data storage remains the cost. Current methods will set companies and individuals back about $3,500 per megabyte of information.
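To see what that price means at larger scales, the Python sketch below multiplies it out. It uses the $3,500 per megabyte figure from the text and the 1,024-based unit conversion used earlier. The function name is illustrative, and the result assumes the cost scales linearly with the amount of data.

```python
USD_PER_MEGABYTE = 3500   # figure quoted in the text
MB_PER_GB = 1024          # same 1,024-based convention as the petabyte figure


def synthesis_cost_usd(megabytes: float) -> float:
    """Cost of writing the given amount of data, assuming linear scaling."""
    return megabytes * USD_PER_MEGABYTE


for label, size_mb in [("1 MB", 1), ("1 GB", MB_PER_GB), ("1 TB", MB_PER_GB ** 2)]:
    print(f"{label}: ${synthesis_cost_usd(size_mb):,.0f}")
```

At this rate a single gigabyte costs about $3.6 million to write, and a terabyte about $3.7 billion.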
On top of this, current chemical synthesis methods still have drawbacks like synthetic errors and low yield. The chemical synthesis of the DNA also produces toxic byproducts. Inefficient and harmful production methods are not compatible with the rising awareness of green or clean data storage.
Another issue is that encoding data in DNA has been incredibly slow, millions of times slower than the microsecond timescales of a silicon memory chip. There has been progress as the technology evolves, though, and new techniques that could write information into DNA at megabit-per-second speeds show promise.
DNA Storage Errors
Another major issue with DNA data storage is errors. Producing DNA with information in the right area without any missing data is vital to making it a suitable storage medium. Ensuring that the DNA is synthesized correctly has proven to be a challenge.
Errors in conventional digital storage are relatively easy to spot, but errors in DNA sequences are harder to handle. Besides substitutions, where one base is swapped for another, bases can be accidentally inserted or deleted during synthesis, which shifts every base that follows.
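The difference between a substituted base and a deleted base is easy to demonstrate. The Python sketch below uses an assumed two-bits-per-base mapping (A=00, C=01, G=10, T=11), chosen only for illustration, and decodes the same short strand after each kind of error.

```python
# Assumed mapping for illustration: each base carries two bits.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}


def decode(sequence: str) -> bytes:
    """Decode bases into bytes, dropping any incomplete byte at the end."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    usable = len(bits) - len(bits) % 8
    return bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))


def count_wrong_bytes(expected: bytes, actual: bytes) -> int:
    """Count bytes that differ, treating missing bytes as wrong."""
    mismatches = sum(a != b for a, b in zip(expected, actual))
    return mismatches + abs(len(expected) - len(actual))


original = "CAGACGGCCAGACGGC"                     # decodes to b"HiHi"
substituted = original[:5] + "T" + original[6:]   # one base changed
deleted = original[:5] + original[6:]             # one base lost

print(decode(original))      # b'HiHi'
print(decode(substituted))   # b'HyHi'
print(decode(deleted))       # b'He!'
print(count_wrong_bytes(decode(original), decode(deleted)))  # 3
```

The substitution damages one byte, while the deletion corrupts everything after it. This is one reason insertions and deletions are harder to correct than simple substitutions.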
With the implementation of techniques from the computer chip world, successful experiments have been performed despite errors.
Although there are still many challenges with DNA data storage, it could allow exponentially growing amounts of data to be stored efficiently and help reduce the environmental impact of needing so many massive, power-hungry data centers.
So there are tradeoffs with DNA data storage at this time, but they are likely to improve dramatically. The technology is likely to be future-proof, stable, and far more durable than current drive-based storage methods.
As the field advances, enzyme-based approaches have shown promise for synthesizing DNA. These processes can produce long strands of DNA with fewer errors, fewer toxic byproducts, and in a fraction of the time, making them the likely future of DNA storage.
- Wyss Institute: DNA Data Storage
- Wyss Institute: Enzymatic DNA synthesis sees the light
- Nature: Photon-directed multiplexed enzymatic DNA synthesis for molecular digital data storage
- IBM: Field testing for cosmic ray soft errors in semiconductor memories
- Barthel et al, Enhancing Terminal Deoxynucleotidyl Transferase Activity on Substrates with 3′ Terminal Structures for Enzymatic De Novo DNA Synthesis
- Popular Mechanics: DNA Is Millions of Times More Efficient Than Your Computer’s Hard Drive
- The University of Texas at Austin: College of Natural Sciences: Power of DNA to Store Information Gets an Upgrade
- Microsoft Blog: Microsoft and University of Washington researchers set a record for DNA storage
- Zwolenski and Weatherill, The Digital Universe
- National Library of Medicine: Francis Crick Papers: The Discovery of the Double Helix, 1951-1953
- NIH: National Human Genome Research Institute: Base Pair
- US Office of Science: United States Data Center Energy Usage Report
- Hiyoshi et al, Does a DNA-less cellular organism exist on Earth?
- Oracle: What is Big Data?
- Forbes: A Very Short History Of Big Data
- Heather and Chain, The sequence of sequencers: The history of sequencing DNA
- New York Times: Using the Weather to Cool Data Centers
- Scientific American: DNA: The Ultimate Data-Storage Solution
- Computer Weekly: Green storage: The state of energy-efficient technology
- Nucleic Acids Research: Uncertainties in synthetic DNA-based data storage