FEATURE 12 January 2016
The DNA disk
Recent research highlights the field of genomics as one of the fastest-growing data generators in the world. But could it also solve the data storage crisis? By Bronwen Morgan
The amount of data produced worldwide in 2011 (1.8 ZB) would have filled 57.5 billion 32GB Apple iPads: enough to build a Great Wall of China twice as tall as the original. This figure reached 2.8 ZB in 2012, and is predicted to reach 40 ZB by 2020.
One of the most prolific, and fastest-growing, data generators in the world today is the field of genomics: the quantity of genetic data being produced on a daily basis is currently doubling every seven months, and by 2025, genome scientists will have overtaken YouTube and Twitter, as well as reigning science data kings astronomy and physics, as the leaders in data production.
But in an ironic twist, the same field could also offer a solution to the approaching data storage crisis. According to Nick Goldman, research group leader at the European Bioinformatics Institute (EBI), the 2.8 ZB of data that would have resulted in an even bigger ‘Great Wall of iPads’ could be stored in just one cubic metre of DNA.
The idea for this novel storage medium emerged – as so many do – in a bar, where Goldman and his collaborator Ewan Birney, director of the EBI, were pondering their own data storage issues.
It was around five years ago and they had just come out of a meeting to discuss whether or not the Institute would continue to be able to cope – not just physically, but financially – with the ever-increasing volume of information that was being produced. Much of the Institute’s activity, Goldman explains, involves storing “seriously large amounts” of genomics data, and making it available to other scientists to download. “We’re not as big as CERN,” he says, “but we’re up there with the major scientific set-ups.” For a publicly funded institution, the high cost of running the hard disks to store this data was becoming a significant problem.
“We were sitting in the bar afterwards, and we said: ‘If only we didn’t have to share all this information on hard disks, which cost a lot and go wrong and need replacing and require electricity and cooling systems. Wasn’t there some other way we could store all this information?’ We were just sort of joking around. And then we realised that the DNA, which was the source of the information that had become our headache, is actually a digital information storage medium itself.”
Goldman and Birney reasoned that if genome information can be stored inside a cell as DNA, rather than as a digital file, then DNA could in principle be used to store any other type of digital information.
“We realised from what we know from our everyday life handling DNA sequences and experiments studying DNA that we have all the technologies that we need to use DNA to store information,” Goldman says. “Basically you need to be able to write a message into your medium, you need to store it or move it around or make copies, and then you need to be able to read it back.”
The process works by converting binary code – the zeros and ones that are the basis of anything stored or transmitted digitally – into the four letters that make up DNA: A, C, G and T. A file of data stored as binary code will be turned into a long string of letters, and this long string is then broken into shorter sections with indexing information attached. This allows it to be put together again correctly when read back by the computer program.
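The steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the EBI team's actual encoding: the two-bits-per-letter mapping, the four-letter index, the 20-letter payload and every function name here are assumptions chosen for clarity.

```python
import random

# Hypothetical mapping: two bits per DNA letter (an assumption, not the EBI scheme).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

INDEX_LEN = 4  # letters reserved for the index (hypothetical; allows 4**4 = 256 fragments)


def bytes_to_dna(data: bytes) -> str:
    """Turn binary data into one long string of A, C, G and T."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(dna: str) -> bytes:
    """Reverse of bytes_to_dna."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def index_to_dna(index: int) -> str:
    """Write a fragment number as INDEX_LEN letters (a base-4 number)."""
    letters = []
    for _ in range(INDEX_LEN):
        index, digit = divmod(index, 4)
        letters.append("ACGT"[digit])
    return "".join(reversed(letters))


def dna_to_index(dna: str) -> int:
    """Read a fragment number back from its letters."""
    value = 0
    for base in dna:
        value = value * 4 + "ACGT".index(base)
    return value


def encode(data: bytes, payload_len: int = 20) -> list:
    """Break the long string into short fragments, each prefixed with its index."""
    dna = bytes_to_dna(data)
    chunks = [dna[i:i + payload_len] for i in range(0, len(dna), payload_len)]
    if len(chunks) > 4 ** INDEX_LEN:
        raise ValueError("message too long for this toy index size")
    return [index_to_dna(i) + chunk for i, chunk in enumerate(chunks)]


def decode(fragments: list) -> bytes:
    """Use the index to put the fragments back in order, then read the data."""
    ordered = sorted(fragments, key=lambda frag: dna_to_index(frag[:INDEX_LEN]))
    return dna_to_bytes("".join(frag[INDEX_LEN:] for frag in ordered))


if __name__ == "__main__":
    message = b"Shall I compare thee to a summer's day?"
    fragments = encode(message)
    random.shuffle(fragments)  # fragments in a test tube have no inherent order
    assert decode(fragments) == message
```

Because every fragment carries its own index, the order in which fragments are read back does not matter, which is why the shuffled list still decodes correctly. A practical scheme would also need to cope with synthesis and reading errors, which this sketch ignores.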
The physical version of this DNA code is then produced – in a process comparable to inkjet printing, says Goldman – by a company in California, and shipped to the Institute in test tubes. The DNA itself is essentially a powder, which can be suspended in a liquid solution and, when freeze-dried, becomes a film on the surface of the test tube, much as salt water leaves a residue when left to evaporate.
He’s keen to point out that while they are effectively repurposing DNA, at no point does the process go anywhere near the genome of a living organism: “That worries some people, but really we’re just using some chemical molecules that have very convenient properties and that we are good at manipulating in laboratories.”
The initial test of the process, in 2013, successfully stored around 1MB of digital information: all of Shakespeare’s sonnets, an audio clip of Martin Luther King’s ‘I have a dream’ speech and a copy of Watson and Crick’s classic paper on the structure of DNA.
Goldman and Birney have subsequently worked in collaboration with artist Charlotte Jarvis to store a recording of a string quartet – around 2.5MB of information – in DNA, which was then dissipated via children’s bubble mix as part of a multi-sensory installation. “In theory you could recover the music [from the bubble mix] if you have a lab handy,” says Goldman.
Both of these trials have been genuine proof of principle that what started as a fun ‘Friday afternoon science project’ could be a viable alternative for the storage of digital information. The most appropriate application of DNA storage, says Goldman, would be for companies that need to archive a lot of information in the long term, rather than as a replacement for memory systems like chips or hard disks in personal computers.
So far Goldman has been approached by film industry production companies looking to store archived films, energy companies looking to store data from geological surveys, and CERN, among other interested parties. At the moment, he says, the bottleneck to the method going mainstream is the synthesis: actually making the strands of DNA to their designs.
This apparently requires enormous improvements in order for the idea to be economically viable. But, he adds, this type of large-scale improvement is not unusual in genomics – the technology used to read a DNA sequence has improved “by a million-fold” in the past 10 years.
And saving space isn’t the only benefit offered by the method, he says: DNA, and reading it, will never become obsolete. “DNA will last a really long time, and we know that from regular research studying evolution,” he says. “I work with people that have taken samples from 700,000-year-old dead horses and recovered DNA successfully. And there will always be a reader: today’s magnetic tapes won’t work in 20 years because there won’t be a reader, but there will always be readers for DNA because we will always want to read that information.
“The technology we use to do it will change, but the basic concept will be the same.”