Computing

DNA can now store images, video and other types of data

Tiny test tubes of DNA might one day replace sprawling data-storage centers

All of the data from more than 600 smartphones — 10,000 gigabytes worth — could be stored on the tiny pink smear of DNA at the end of this test tube.

Tara Brown Photography/University of Washington

By Kathiann Kowalski

May 4, 2016 at 6:00 am

With a smartphone, you can look up facts, stream videos, check out Facebook, read tweets and listen to music. But all of those data aren’t stored on your phone. They are kept somewhere else, perhaps half a world away. For now, companies like Microsoft, Amazon and Facebook store those data on magnetic tapes or other media. It’s an ever-growing library of data that takes up lots of space in sprawling data centers. And even the best storage media last only a few decades at most. Then they need to be replaced. But there may be a better way to keep and guard information, some researchers say. Store and retrieve it — with DNA.

DNA holds the genetic information that tells each cell inside a living being what to do. Each side of a DNA molecule’s twisted, ladder-like structure is made of four chemical building blocks. They’re called nucleotides and are known as A, T, C and G. (The letters stand for adenine, thymine, cytosine and guanine.) In various combinations, these letters spell out the code for our genes.

Computers currently store data as series of 0s and 1s. But data also can be written using the four building blocks of DNA, says Luis Ceze. As a computer architect at the University of Washington in Seattle, he studies how computers and data systems should be designed and function. Labs can make strands of synthetic DNA, one nucleotide block at a time. Combinations can be developed, as a code, to stand for numbers, letters or other digital information. Later, other lab equipment can translate those building blocks along a strand of DNA. In that way, they can decode the original data.

Why bother? DNA can hold lots of information in a tiny space. In theory, a volume of DNA the size of a sugar cube could hold as much data as a Walmart-sized storage center. Plus, Ceze says, unlike magnetic tape, DNA can last unchanged for thousands of years.

Work on DNA data storage started years ago. Ceze’s team has just added what’s known as “random access” to the method. It offers a way to find a specific file. Each data file gets its own unique “address.” It works in much the same way as a house number, street name and zip code guide a mail carrier to your home. The researchers add those digital addresses to each DNA strand holding data for a particular file.

Ceze’s team, which included people from Microsoft, reported its new advance on April 6 in Atlanta, Ga. The advance was presented at the International Conference on Architectural Support for Programming Languages and Operating Systems.

A borrowed tool

To search for a specific file in a large quantity of DNA, the Seattle team uses a tool called PCR. It’s short for polymerase (Puh-LIM-er-ase) chain reaction. Here’s how PCR works: DNA goes into a test tube, along with strings of nucleotides known as primers. Each primer is chosen to match the address sequences at the ends of selected DNA strands. Single nucleotides and a few other things are in the mix, too. The test tube then goes into a machine that heats and cools the soup of genetic material over and over.

Testtube-data — In a lab at the University of Washington, Lee Organick prepares samples of DNA with data files for storage (as Luis Ceze looks on). Tara Brown Photography/University of Washington

Heating up double-stranded DNA separates it into single threads. After the sample cools down, the primers seek out and bind to the ends of the specific strands that scientists are interested in. Single nucleotides in the mix then bind to the rest of the strand.

Each time the heating and cooling cycle repeats, it’s like pressing start on a copying machine; the PCR duplicates DNA. These cycles repeat over and over and over, making millions of copies of the target DNA. Scientists describe this as “amplifying” the DNA.

PCR will copy desired snippets of DNA so many times that soon they greatly outnumber all of the rest of the genetic material in a sample.

Many scientists already use PCR. It’s used to copy the DNA found at a crime scene, for instance. That lets forensic scientists work with the DNA and compare it to other samples, such as one from a suspect. Similarly, environmental scientists might use PCR to amplify the foreign DNA they find in a river in hopes of matching it to a particular species of fish.

Making lots of copies of a specific bit of DNA can now help pick out a data file, Ceze says.

He compares the idea to trying to get a bowl of alphabet soup with only certain letters. Picking out individual letters would take a really long time. But suppose you were able to selectively copy, over and over, just the letters you liked. Eventually, nearly every scoop you took out of the bowl would contain just what you wanted. Likewise, PCR can make sure that the DNA picked out after the process is pretty much just what you had been looking for. Then lab equipment can read that DNA to decode its stored data.

PCR is a pretty standard tool in genetics research. But borrowing that tool to find specific DNA data files didn’t happen until Ceze took a break from his regular work and spent time in a microbiology lab. There, he learned about PCR. And that led to the team’s idea for random access. “You see two things, and then all of a sudden you see that they could be connected,” he explains.

Avoiding errors

Making and copying large amounts of DNA is “hard to control exactly,” Ceze says. So his team also built in a way to deal with errors. When data have been encoded into the fake DNA, overlapping parts of each section will go onto three separate DNA strands. In order to decode a file, a computer needs data from at least two of the three strands. That way, even if one strand has errors, the other two strands will still have saved the data.

The new system also doesn’t require the same accuracy for all types of data. Relaxing standards for some types of material makes it easier to store large files. For example, text files might require a very high level of precision. In contrast, most people won’t notice if a few pixels are off in yet another picture of their cat.

In lab tests, the system worked very well. The researchers successfully coded video files of people talking about war crimes in the African nation of Rwanda. When they later searched for those files, they found them easily. The group also encoded and reconstructed four image files.

Dean Tullsen is a computer-science engineer at the University of California, San Diego. He chaired the meeting’s session in which Ceze’s group described its new system for DNA storage and file retrieval. He says that it’s not clear whether or when DNA data storage might become common. But the University of Washington team has “shown some very exciting potential,” he says. “The best part of the work is that they have actually stored some pictures in synthesized DNA” in the lab, he adds. The team then “read the data back out with no errors.”

Of course, one of those pictures showed a cat.

Power Words

(for more about Power Words, click here)

cell The smallest structural and functional unit of an organism. Typically too small to see with the naked eye, it consists of watery fluid surrounded by a membrane or wall. Animals are made of anywhere from thousands to trillions of cells, depending on their size. Some organisms, such as yeasts, molds, bacteria and some algae, are composed of only one cell.

chemical A substance formed from two or more atoms that unite (become bonded together) in a fixed proportion and structure. For example, water is a chemical made of two hydrogen atoms bonded to one oxygen atom. Its chemical symbol is H₂O. Chemical can also be an adjective that describes properties of materials that are the result of various reactions between different compounds.

DNA (short for deoxyribonucleic acid) A long, double-stranded and spiral-shaped molecule inside most living cells that carries genetic instructions. In all living things, from plants and animals to microbes, these instructions tell cells which molecules to make.

DNA sequencing The process of determining the exact order of the paired building blocks — called nucleotides — that form each rung of a ladder-like strand of DNA. There are only four nucleotides: adenine, cytosine, guanine and thymine (which are abbreviated A, C, G and T). And adenine always pairs up with thymine; cytosine always pairs with guanine.

engineer A person who uses science to solve problems. As a verb, to engineer means to design a device, material or process that will solve some problem or unmet need.

environmental science The study of ecosystems to help identify environmental problems and possible solutions. Environmental science can bring together many fields including physics, chemistry, biology and oceanography to understand how ecosystems function and how humans can coexist with them in harmony. People who work in this field are known as environmental scientists.

forensics The use of science and technology to investigate and solve crimes.

gene (adj. genetic) A segment of DNA that codes, or holds instructions, for producing a protein. Offspring inherit genes from their parents. Genes influence how an organism looks and behaves.

molecule An electrically neutral group of atoms that represents the smallest possible amount of a chemical compound. Molecules can be made of single types of atoms or of different types. For example, the oxygen in the air is made of two oxygen atoms (O₂), but water is made of two hydrogen atoms and one oxygen atom (H₂O).

microbiology The study of microorganisms, principally bacteria, fungi and viruses. Scientists who study microbes and the infections they can cause or ways that they can interact with their environment are known as microbiologists.

nucleotides The four chemicals that, like rungs on a ladder, link up the two strands that make up DNA. They are: A (adenine), T (thymine), C (cytosine) and G (guanine). A links with T, and C links with G, to form DNA. In RNA, uracil takes the place of thymine.

pixel Short for picture element. A tiny area of illumination on a computer screen, or a dot on a printed page, usually placed in an array to form a digital image. Photographs are made of thousands of pixels, each of different brightness and color, and each too small to be seen unless the image is magnified.

polymerase chain reaction (PCR) A biochemical process that repeatedly copies a particular sequence of DNA. A related, but somewhat different technique, copies genes expressed by the DNA in a cell. This technique is called reverse transcriptase PCR. Like regular PCR, it copies genetic material so that other techniques can identify aspects of the genes or match them to known genes.

random Something that occurs haphazardly or without reason, based on no intention or purpose.

random access The process of storing or retrieving a particular data file directly no matter where it’s stored in a medium, instead of having to encode or decode the whole body of data.

smartphone A cell (or mobile) phone that can perform a host of functions, including search for information on the Internet.

species A group of similar organisms capable of producing offspring that can survive and reproduce.

synthetic An adjective that describes something that did not arise naturally, but was instead created by people. Many have been developed to stand in for natural materials, such as synthetic rubber, synthetic diamond or a synthetic hormone. Some may even have a chemical makeup and structure identical to the original.

This is one in a series presenting news on technology and innovation, made possible with generous support from the Lemelson Foundation.