A new computer file format that allows DNA samples to be processed 30 times faster than with existing systems has been developed by teams from UNSW and the Garvan Institute of Medical Research.
The SLOW5 format was specifically designed to more efficiently analyze nanopore sequences, which provide a more complete view of genetic variation.
The improved efficiency not only helps medical experts analyze individual DNA samples much faster and provide faster and better healthcare to patients, but also allows more samples to be processed in a given period of time.
The research behind the development has been published in Nature Biotechnology, but the software has already been made available as open source and has been downloaded more than 1000 times in just a few weeks.
The complex nature of DNA nanopore sequencing means that huge amounts of data are created, which must then be stored and properly analyzed.
This data was routinely saved in a file format called FAST5, with such complex information often yielding files around 1.3 terabytes in size, roughly equivalent to 650 hours of HD video.
Historically, it took computers about two weeks to process such large FAST5 files and analyze the human genome information they contained.
But Dr Hasindu Gamaarachchi, formerly a UNSW PhD student working under Prof. Sri Parameswaran in the School of Computing and Engineering, has now created a file format designed for efficient and scalable analysis of nanopore signal data.
The new SLOW5 file format not only significantly reduces file sizes, but also allows the exact same information to be processed in about 10.5 hours, more than 30 times faster than with FAST5.
The key to this is that the SLOW5 format, unlike FAST5, allows for efficient parallel computing, in which multiple processors simultaneously run many smaller analyses broken out of a much larger and more complex data set.
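The idea of splitting one large data set into independently readable pieces can be illustrated with a small sketch. It is a toy example only: the record layout, the index and the function names (`build_blob`, `mean_signal`, `analyse_all`) are assumptions made for illustration and do not reflect the actual SLOW5 layout or its software library. Each record can be located from an index without reading any other record, so every worker can fetch and analyze its own read.

```python
from concurrent.futures import ThreadPoolExecutor
import struct


def build_blob(reads):
    """Pack each read's signal as 16-bit integers and note where it starts."""
    blob = bytearray()
    index = {}
    for read_id, signal in reads.items():
        payload = struct.pack(f"<{len(signal)}h", *signal)
        index[read_id] = (len(blob), len(payload))  # (offset, length in bytes)
        blob += payload
    return bytes(blob), index


def mean_signal(blob, offset, length):
    """Decode a single record on its own and return its mean signal value."""
    samples = struct.unpack(f"<{length // 2}h", blob[offset:offset + length])
    return sum(samples) / len(samples)


def analyse_all(blob, index, workers=4):
    """Hand each worker a different record; no worker waits on another's data."""
    read_ids = list(index)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda rid: mean_signal(blob, *index[rid]), read_ids)
        return dict(zip(read_ids, results))


# Hypothetical reads with made-up signal values.
reads = {"read_a": [100, 102, 98], "read_b": [500, 500], "read_c": [7]}
blob, index = build_blob(reads)
means = analyse_all(blob, index)
```

The sketch shows the access pattern rather than the performance gain: Python threads do not speed up pure-Python arithmetic, and a real pipeline would read from disk and use compiled code or separate processes.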
“You can think of it like trying to dig a really big hole with ten people, but there’s only one shovel they have to share,” said Dr Gamaarachchi, lead author of the paper, who is now a genomic computational systems engineer at the Garvan Institute.
“It was like that with FAST5. But with SLOW5 everyone has their own shovel and they can all dig at the same time and get the job done much faster.
“The FAST5 format is slow because the data is not accessible in parallel. It is based on the hierarchical data format that was designed in the 1990s to work on machines that at the time only had a single processor, rather than modern machines that include multiple processors.
“The hierarchical data format is also generic, while SLOW5 is purpose built. So in terms of the digging analogy, it’s like we’re also providing a shovel specifically designed for the type of soil.
“And because the new SLOW5 can be accessed in parallel by multiple processors at the same time, processing time has been reduced by a factor of 30,” he said.
“So instead of taking about two weeks to process data from a human genome, the time has now fallen to less than half a day.”
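The quoted figures are consistent with one another, as a quick check shows. The sketch assumes "about two weeks" means 14 full days of computing and uses the 10.5-hour figure reported for SLOW5; both are rounded numbers from the article, not measurements.

```python
HOURS_PER_DAY = 24

fast5_hours = 14 * HOURS_PER_DAY  # "about two weeks", assumed to be 14 full days
slow5_hours = 10.5                # "about 10.5 hours"

speedup = fast5_hours / slow5_hours
print(f"{fast5_hours} h / {slow5_hours} h = {speedup:.0f}x")  # 336 h / 10.5 h = 32x
```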
Nanopore sequencing itself offers a more complete view of genetic variations and the possibility of reconstructing complex genomes.
A nanopore is a nanometre-scale hole through which an ionic current passes, with changes in the current measured as biological molecules move through the pore. These changes are then recorded and translated to identify the molecule and any base modifications.
Nanopore sequencing is used to identify a range of diseases and also helps medical professionals analyze DNA samples in greater detail to potentially offer more personalized medicine, particularly in the treatment of various cancers.
The new SLOW5 file format can now help medical professionals diagnose diseases faster and ensure that patients are prescribed specific targeted drugs – often the most effective treatment – much faster than before.
Dr Ira Deveson, Head of Genomics Technologies at the Garvan Institute and co-author of the paper, said, “SLOW5 has removed one of the major bottlenecks to the use of nanopore sequencing, a new technology that has countless potential applications in clinical genetics, agriculture and other areas of biosciences.
“With the development of SLOW5, our ability to process nanopore sequencing data can now keep pace with our ability to generate it. This will open the door to many new applications in medical science for this exciting emerging technology.”