How Big Data and Genomics Are Unlocking Cures for Rare Diseases


Genomics and big data are rapidly converging to advance the field of precision medicine.

Precision medicine uses genetics and other patient information to design targeted treatments for diseases. One of the key ingredients behind precision medicine is genomics, which, combined with advanced, high-speed computing, is helping bring new treatments to market faster.

We have all had family members or friends affected by diseases such as cancer, diabetes, and heart failure. The main goal behind genomics is to accelerate the search for treatments for these and other diseases. Genomics is the area within molecular biology that deals with genomes (an organism’s complete set of DNA). Big data, on the other hand, deals with the analysis of the massive datasets generated across fields such as business, sports, and healthcare.

Since virtually every human disease has roots in our genes, genomics is being used to gain valuable insight into the potential causes of these diseases. The central process in genomics is sequencing: determining the order of the DNA bases that make up a genome. The human genome contains over 3 billion of these genetic letters. Over the past 10 years, numerous genome-sequencing projects and thousands of other genomics-related studies have been launched.
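As a toy illustration of what "determining the order of DNA bases" produces: sequencing ultimately yields long strings of the letters A, T, G, and C, and one of the simplest first analyses is a base-composition count. The fragment below is made up for illustration; a real human genome runs to roughly 3 billion bases.

```python
from collections import Counter

# A toy DNA fragment (illustrative only; real genomes are billions of bases long).
fragment = "ATGCGATACGCTTGACCA"

# Count how often each base appears in the sequence.
counts = Counter(fragment)

# GC content (fraction of G and C bases) is a common summary statistic.
gc_content = (counts["G"] + counts["C"]) / len(fragment)

print(counts)
print(round(gc_content, 3))  # 0.5 for this fragment
```

The same idea, applied at genome scale across millions of sequencing reads, is where the storage and compute demands discussed below come from.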

The University of California, Santa Cruz (UCSC) is conducting one such study. The university has been a leader in genomics since 1985, and in 2014 it formally established the UCSC Genomics Institute to “unlock the world’s genomic information to drive targeted treatment of diseases.” The institute has several projects underway, including an important study on childhood cancer called the Treehouse Childhood Cancer Project. Every day, 43 children are diagnosed with cancer in the U.S.; of those, 8 will die from the disease within 5 years, and 26 will suffer significant long-term effects and may die within the next 30 years. Remarkably, since 1980 only 3 new drugs have been approved for treating childhood cancer, versus several dozen drugs for adult cancers over the same period. The study’s goal is to identify cases where a drug approved for adults is predicted to work on a subset of pediatric cancer patients.


On the big data front, there are many exciting new developments, the most impactful being the move to cloud computing. Historically, individual researchers would download DNA data and analyze it on local hardware. The enormous growth of biomedical data, however, has made that approach impractical in terms of both storage and computing capacity.

The explosion in data generated by genomics-driven research is staggering. As labs adopt newer and faster equipment to decode DNA, exabytes of data are being generated. According to a widely cited PLOS Biology study, genomics could be the largest field in big data by 2025. With the raw sequence data for a single human genome occupying roughly 100 GB of storage, and with numerous large studies currently under way, it is easy to see how that could happen. Compared with other data-intensive domains such as social media (Twitter) and online video (YouTube), genomics is on track to create far more data requiring careful analysis.
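To make the scale concrete, here is a back-of-the-envelope calculation using the 100 GB-per-genome figure above. The one-billion-genome target is a hypothetical round number chosen for illustration, not a figure from the study:

```python
# Assumption from the article: ~100 GB of raw sequence data per genome.
GB_PER_GENOME = 100

# Hypothetical scenario: one billion genomes sequenced (illustrative only).
genomes_sequenced = 1_000_000_000

total_gb = GB_PER_GENOME * genomes_sequenced
total_exabytes = total_gb / 1e9  # 1 exabyte = one billion gigabytes

print(total_exabytes)  # 100.0 exabytes of raw data alone
```

Even before counting the intermediate and derived data that analysis pipelines produce, raw sequence alone reaches the exabyte range.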


Source: PLOS Biology

To meet this challenge, technologies such as security and storage need to be improved dramatically, and powerful new analysis tools that leverage the cloud’s computing power need to be made readily available to make sense of the mountains of data being produced at a rapid pace.
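The standard way such analysis tools attack "mountains of data" is map-reduce-style parallelism: split the data into shards, analyze each shard independently, then merge the results — which is exactly the pattern cloud platforms scale out across many machines. Below is a minimal single-machine sketch of the idea, with toy data and threads standing in for cluster nodes (the shard size and data are illustrative):

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_bases(chunk: str) -> Counter:
    """Map step: analyze one shard of the data independently."""
    return Counter(chunk)

# Toy stand-in for a large DNA dataset, split into fixed-size shards.
sequence = "ATGC" * 1000
shards = [sequence[i:i + 400] for i in range(0, len(sequence), 400)]

# In the cloud, each shard would go to a different machine; here, threads.
with ThreadPoolExecutor() as pool:
    partial_counts = list(pool.map(count_bases, shards))

# Reduce step: merge the per-shard results into one answer.
total = sum(partial_counts, Counter())
print(total)
```

The appeal of the cloud is that the map step parallelizes almost perfectly, so adding machines shortens the analysis roughly in proportion.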

A good example is Google Genomics, a cloud computing service with APIs that let scientists move DNA data into Google’s data centers and run experiments using the company’s database technology. It also offers sophisticated data-processing and analysis tools such as BigQuery, GATK, Apache Spark, Cloud Dataflow, and Grid Engine clusters, along with virtual machines, scalable storage, and fully managed SQL and NoSQL databases such as Bigtable and Datastore. Google has been an important player in this field and has partnered with multiple organizations involved in genomics studies, such as the Broad Institute of MIT and Harvard. The company is not alone: Amazon Web Services and IBM are also active here, as are several other cloud service providers.

There’s no doubt that cloud computing is significantly impacting genomics. Technology is enabling much faster sequencing and analysis of DNA at a more reasonable cost, and ultimately (and hopefully) faster discovery of the genetic causes of various hard-to-cure diseases. Sequencing the first human genome, completed in 2003, took roughly 15 years and $3 billion; today it costs around $1,000 and takes less than a day. Better and more effective treatments have already been discovered for several cancers, such as melanoma. Big data for genomics is an important area that needs further exploration and investment. We will likely see many new developments in this exciting field, and ultimately faster development of treatments for these diseases.
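Taking the article’s figures at face value, the improvement in sequencing can be expressed as fold reductions (a rough calculation, treating "15 years" and "less than a day" as exactly 15 years and 1 day):

```python
# Figures from the article: first genome ~$3 billion over ~15 years;
# today ~$1,000 in under a day.
cost_then, cost_now = 3_000_000_000, 1_000
days_then, days_now = 15 * 365, 1

cost_fold = cost_then / cost_now   # how many times cheaper
speed_fold = days_then / days_now  # how many times faster

print(f"{cost_fold:,.0f}x cheaper, {speed_fold:,.0f}x faster")
```

A three-million-fold cost reduction in under two decades far outpaces even Moore’s law, which is why sequencing capacity, not sequencing itself, is becoming the bottleneck.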


BitNavi is a blog conceived by Karl Motey in the heart of Silicon Valley, dedicated to emerging technologies and strategic business issues challenging the industry.
