Big Data in Healthcare

Big Data on the rise

The data generated from the use of the Internet, social media, healthcare records, and purchasing history – herein referred to as ‘big data’ – is increasing at an exponential rate worldwide.  Both the veracity and velocity of data collection continue to rise with advancements in technology and with the use of more diverse forms of those technologies.1  This meteoric increase in big data has been made possible by an equally rapid increase in the number of people globally with access to the Internet and mobile technologies.  In 2014, over 2 billion people worldwide had access to the Internet and over 5 billion people had a mobile phone.  Within the next 4 years, over 5 billion people are expected to have Internet access.

The result of this dramatic increase in Internet access is a production of data 44 times greater than seen in 2009.2

The amount of data being generated is truly staggering and continues to grow each day as more and more people gain Internet access and adopt “smart” technologies including smartphones and smartwatches.  To put this amount of data into context, research has indicated that if one were to collate all of the data from recorded history through the year 2003, one would have approximately 5 billion gigabytes of data.  In 2011, that same volume of data was generated every two days.  Just four years later, in 2015, that same amount of data was generated every ten seconds.  And it continues to increase with every minute of every day (Figure 1).
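To make the comparison above concrete, the short sketch below converts the quoted figures into daily generation rates. It uses only the numbers stated in the text (5 billion gigabytes, every two days in 2011, every ten seconds in 2015); the function and variable names are illustrative, and the result is simple arithmetic on those figures rather than an independent measurement.

```python
# Figures quoted in the text; everything else is unit conversion.
BASELINE_GB = 5e9  # approx. all data from recorded history through 2003, in gigabytes
SECONDS_PER_DAY = 24 * 60 * 60


def gb_per_day(period_seconds: float) -> float:
    """Daily rate if BASELINE_GB is generated once every `period_seconds`."""
    return BASELINE_GB * SECONDS_PER_DAY / period_seconds


rate_2011 = gb_per_day(2 * SECONDS_PER_DAY)  # same volume every two days
rate_2015 = gb_per_day(10)                   # same volume every ten seconds

# How many times faster data was generated in 2015 than in 2011.
growth_factor = rate_2015 / rate_2011

print(f"2011: {rate_2011:.2e} GB per day")
print(f"2015: {rate_2015:.2e} GB per day")
print(f"Implied increase: {growth_factor:,.0f}-fold")
```

Taken at face value, the two figures imply that the rate of data generation rose by a factor of roughly 17,000 in four years.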

Figure 1
Data Generation Every Minute, 2012-2016

With more and more people throughout the world connected to the Internet every day, the generation of data is growing at a nearly exponential rate. There are now more mobile devices in the world than there are people. With all of this connectivity, just how much data is generated every minute? The numbers are astonishing and are brilliantly illustrated in this graphic by DOMO, Inc.

As this graphic illustrates, people are using Internet-connected devices more and more in their daily lives. Data scientists are attempting to harness the power of the compiled “big data” from those devices for a number of purposes, from business initiatives to improving healthcare.


DOMO, Inc. Graphic used with permission. Available at:

As Dr. Michiel Ringkjøbing-Elema explains, the availability of data is on the rise and presents opportunities to improve both our clinical trials and ultimately, patient care


Big Data
Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. And the amount of data is ever expanding.  This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals to name a few. This data is “big data”.
While the rise in big data generation and availability is exciting, it means relatively little unless one is able to understand and create value from it.  Over the past few years, the value created from big data has continued to evolve as analytical techniques capable of both managing and making sense of the vast quantity of information available have been developed.  Many companies, including the ones depicted here, have tapped into the power of big data to advance their respective businesses and promote innovation in how one makes use of the ever-growing mountains of available digital information.
Predicting treatment response for depression
One way in particular that big data can potentially benefit the healthcare industry is through the identification of biomarkers.  This will aid the treatment of patients by reducing the number of failed treatments, giving more specificity to the prescription process.  If patients receive the right treatment first, they may be more willing to continue treatment and have better long-term outcomes.  Further, big data may also help us to improve the way in which clinical trials are conducted by improving efficacy and potentially reducing discontinuation from studies which, in many trials, is due to an overall lack of effect from the medication being studied.  Through these efforts, new treatments may be developed more quickly and efficiently, ultimately leading to better health for those suffering from depression and other psychiatric disorders.


Creating value from big data

Big Data analytics
Over recent years, the dramatic increase in the number of users and volume of generated data has prompted innovation from both the private and public sector, primarily in the development of methods capable of making sense of such large quantities of data.  Several companies, including giants such as Uber and Facebook, have been able to create value from the data generated, translating user activity into profitable gains for their companies.  As these companies and researchers have acknowledged, however, the sheer volume of data exceeds the capacity of the tools currently available to analyze it,4 limiting the ability to answer questions based on big data in a cost- or time-effective manner.  As such, and given the exceptional rate at which technologies capable of collecting data have proliferated in the past 20 years – especially in the past 5 years – methodologies and systems able to analyze such large quantities of data are still evolving.

While big data has proven profitable in the private sector, such seemingly unfathomable volumes of data raise the question of how one can extract meaningful information on a population and even an individual level in sectors such as the healthcare industry to improve patient outcomes, reduce costs, and increase quality of life.5  In other words, what can big data do for us to improve health for both healthy individuals and those in need of care?

Big data provides an opportunity to use mountains of information from thousands – even millions – of patients to better understand and treat medical illnesses

Systems biology
In order to treat medical diseases, one must have a thorough understanding of what is occurring both chemically and biologically in the human body.  As such, and before discussing the role big data can play in the healthcare industry, it is prudent to comment on the possibilities big data provides in modeling chemical and biological processes.  Using so-called “systems biology,” researchers have begun to explore and better understand how the interactions of biological processes shape human behavior as well as the development and course of various diseases.  Such an understanding would revolutionize modern medicine as it would allow physicians and researchers to better treat a disease, from the development of medication to the treatment of a patient in the clinic.  Fortunately, such a revolution in disease understanding has already been made possible.  Big data, and subsequent methodologies to analyze the data, allow researchers to study the interactions of the so-called “omics” (epigenomics, genomics, and proteomics to name a few) in order to better understand how the body functions from a biological perspective.6  This is of vital importance since it is widely accepted that the different biological processes studied through the various “omics” interact with one another rather than acting as independent constructs.  Having the ability to assess these interactions has led to breakthroughs in our understanding of diseases, and could potentially lead to advancements in treatments for countless diseases.7

Computer modeling of diseases and the ability to test the interactions of a wide range of “omics” data allow research scientists to formulate and test hypotheses in a much more efficient manner, ultimately saving valuable time and money.  Progressed further, this technology could lead to advancements in the drug development process, including in target validation, and could ultimately lead to improved quality of life for patients by getting safe and effective treatments to market that target the underlying biology of the disease.

Systems medicine
While the possibilities of big data are vast and include many public health arenas, its utility in influencing healthcare may be one of the most encouraging,10 including the potential for advancements in drug discovery and development.  Within the healthcare industry, researchers hope that big data can serve to effect positive change in ways which were previously not possible. 

Big data allows physicians to use technology to enhance the treatment of their patients by improving outcomes while reducing time and costs.17

There are a number of possibilities and questions surrounding the use of big data in healthcare, not least of which is how such quantities of data can be brought to the level of the individual patient.11,12  How can big data be used to impact one person seeking treatment?  Can it be used to personalize care?13  With a rapid increase in the number of devices which can collect data, including watches and mobile phones, more data than ever is being generated and used for a multitude of purposes, from promoting healthy behaviors14 to monitoring and improving our understanding of the course of progressive diseases like Parkinson’s15 and Alzheimer’s.16

Coupled with advancements in the Internet, mobile phones, and wearable devices is the digitization of medical records.  According to the US Centers for Disease Control and Prevention, in 2013 nearly 80% of physicians used some form of electronic medical record system, an increase from 18% in 2001.18  In 2014, 3 out of 4 hospitals had adopted electronic health record systems, a nearly 8-fold increase from 2008.19  With more and more physicians and hospitals using electronic medical records, researchers will have the ability to assess more data and consequently better understand population health.

While clearly advantageous for public health analytics, what benefit could all of this data have for an individual patient?  Imagine if a doctor could input a patient’s medical history, including their laboratory values, diagnosis, and family history, and obtain a recommendation for care drawn from a database of thousands of scientific journal articles written on the topic, coupled with data from tens of thousands of other patients similar to the patient in question.  By translating knowledge gained from systems biology into so-called “systems medicine,” and through the use of cognitive-computing systems such as that being developed through the IBM Watson project, healthcare professionals have begun to see this possibility become a reality.20

As Dr. Torbjörn Hägglöf of IBM Watson Health explains, there are ongoing efforts to harness the power of big data to aid in patient care decision-making

Computer-assisted healthcare: IBM Watson
IBM Watson is a technology platform that uses a combination of natural language processing and machine learning to garner insights into a particular query from large amounts of unstructured data.22  IBM Watson first came to media fame in 2011 when it competed against human contestants on the American television quiz show Jeopardy!, where it demonstrated an ability to answer nuanced questions in a remarkably accurate manner.  In fact, the IBM Watson system quite easily defeated two of the show’s most celebrated winners over a multi-episode competition.

In recent years, Watson has been adapted and re-invented in an effort to transform the healthcare industry, where Watson has been directed to assist physicians with both diagnosis and treatment planning for one of the leading causes of death worldwide, cancer.

The “Watson for Oncology” cognitive-computing system harnesses the power of big data to provide an evidence-based treatment plan for each patient.23

Oncologists from the hospital and doctors from some of the most prominent cancer research and treatment institutions in the world continually “teach” Watson based on clinical outcomes so that Watson “learns” for future cases.

How it works
With so much information available for Watson to sort through in order to answer a specific question or give a treatment recommendation, how does it determine the best answer to a question or solution to a problem?  Watson uses a multi-faceted process to interpret information and ultimately answer a question.  First, Watson determines what type of question is being asked and, more importantly, what the question is asking for by breaking down the question into parts of speech.  Watson then scans its database of information, coming up with thousands of possible solutions.  Where Watson excels – and differentiates itself from simple computers – is in the next step, where Watson tests hypotheses and evidence, developing both pro and con evidence for the thousands of potential solutions gathered in the previous step.  In the final step, Watson ranks the possible solutions based on its hypothesis and evidence-testing as well as on previous experience, ultimately providing a percentage score of how likely it is that the answer provided is correct.  All of this is done in a matter of minutes.
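The four steps described above can be illustrated with a deliberately simplified sketch. This is not IBM Watson's implementation: the evidence store, the keyword matching, the stopword list, and the smoothed confidence formula are all assumptions made for illustration, and the treatment names and statements are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical toy evidence store: answer -> list of (statement, supports_answer).
EVIDENCE = {
    "treatment A": [
        ("trial shows treatment A improves survival in stage II disease", True),
        ("guideline recommends treatment A for stage II disease", True),
        ("treatment A linked to toxicity in elderly patients", False),
    ],
    "treatment B": [
        ("treatment B showed no survival benefit in stage II disease", False),
        ("case series suggests treatment B helps stage IV disease", True),
    ],
}

STOPWORDS = {"what", "which", "is", "the", "a", "for", "of", "in", "best", "treatment"}


@dataclass
class Candidate:
    answer: str
    pro: int = 0
    con: int = 0

    @property
    def confidence(self) -> float:
        # Assumed scoring rule: smoothed share of supporting evidence, in percent.
        return 100 * (self.pro + 1) / (self.pro + self.con + 2)


def is_relevant(text: str, keywords: set[str]) -> bool:
    """A statement is relevant if it mentions every keyword of the question."""
    return keywords <= set(text.lower().split())


def analyze_question(question: str) -> set[str]:
    """Step 1: reduce the question to keywords (a stand-in for real parsing)."""
    words = (w.strip("?.,").lower() for w in question.split())
    return {w for w in words if w and w not in STOPWORDS}


def generate_candidates(keywords: set[str]) -> list[Candidate]:
    """Step 2: every answer with at least one relevant statement is a candidate."""
    return [
        Candidate(answer)
        for answer, statements in EVIDENCE.items()
        if any(is_relevant(text, keywords) for text, _ in statements)
    ]


def weigh_evidence(candidate: Candidate, keywords: set[str]) -> None:
    """Step 3: tally pro and con statements that are relevant to the question."""
    for text, supports in EVIDENCE[candidate.answer]:
        if is_relevant(text, keywords):
            if supports:
                candidate.pro += 1
            else:
                candidate.con += 1


def answer(question: str) -> list[Candidate]:
    """Step 4: rank candidates by confidence, highest first."""
    keywords = analyze_question(question)
    candidates = generate_candidates(keywords)
    for candidate in candidates:
        weigh_evidence(candidate, keywords)
    return sorted(candidates, key=lambda c: c.confidence, reverse=True)


ranked = answer("Which treatment is best for stage II disease?")
for c in ranked:
    print(f"{c.answer}: {c.confidence:.0f}% ({c.pro} pro, {c.con} con)")
```

In this toy example, only statements about stage II disease count as evidence, so the candidate with two supporting statements is ranked above the one with a single opposing statement. A real system would replace each step with far more sophisticated language analysis and learned scoring.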

Clinical Use
As noted earlier, the clinical utility of systems medicine, and cognitive-computing systems like IBM Watson, is exciting.

Systems like IBM’s Watson for Oncology have a number of benefits for the healthcare industry.  First and foremost, they provide a completely objective assessment of a patient based on their full medical and social history.  As any physician knows, there is far too much data for a physician to obtain and assess for each individual patient, along with new scholarly articles which the physician may not have the time or access to read.  Further, these systems “learn” from each patient who has a successful or failed treatment along with every article and textbook written on the disease of interest, allowing confidence that the data utilized in providing treatment recommendations is not based on static or outdated information.

From a hospital administration perspective, these systems allow for a virtual, collaborative effort between physicians and researchers worldwide.  They also serve to fill gaps caused by healthcare shortages, particularly in areas of the world where specialized physicians are in short supply.  Within treatment for cancers in the United States, for example, the latest report from the American Society of Clinical Oncology notes that while the number of cancer cases is growing, the clinical workforce is aging and exists largely in metropolitan areas, facts which may adversely impact the ability of the medical community to meet the clinical demand for care.24  Having such diagnostic and treatment recommendation technologies available through Watson or similar systems in underserved or vulnerable populations such as patients in rural, prison, or refugee settings would provide physicians who do not have the support of a large team of colleagues with the ability to obtain a more comprehensive assessment of these patients.

While the above-noted benefits of systems medicine approaches propagated by programs like Watson for Oncology clearly have the potential to advance clinical practice and help patients, a number of challenges are also noteworthy.25  For example, what happens when the clinical recommendation of the treating physician or team of physicians conflicts with that of a system such as IBM Watson?26  Such systems are meant to serve as a guidance mechanism for the physician, not as the definitive solution to treatment or diagnosis.  With such a powerful system providing a conflicting (suggested) treatment course, however, the physician and patient may feel conflicted about what constitutes the correct path to health.  Such discussions need to take place among the physician, patient, and family in order to determine what will have the greatest likelihood of achieving a better quality of life for the patient.

Ethics in Big Data-Assisted Healthcare
Ethical questions, in terms of the way in which data is obtained and utilized, are at the forefront of big data discussions and will likely remain until core principles of use are properly implemented throughout the healthcare industry.27,28

Acknowledging and addressing ethical questions that may arise through the use of big data and systems like IBM Watson is of the utmost urgency

In a broader sense of ethical concerns surrounding big data, the question of who “owns” the data that is gathered through the use of the Internet, mobile and wearable devices, and healthcare information poses a very real and valid ethical question.29,30  With so much data available from so many sources, this question can be difficult to answer.  Can the data gathered through these devices be shared with others?  Who is responsible for the data?  To what extent do people have the “right to be forgotten”, or in other words, to what extent can ordinary people control the access and sharing of their data?31

Discussions surrounding the proper use of big data prompt strong opinions from citizens and policy makers, from private companies and public sector agencies, and from physicians within the healthcare industry.  Further discussions will be required – and soon – in order to ensure that the massive volumes of data being obtained throughout the world are utilized in a manner which is beneficial to society, but also accords with a set of best ethical principles. This will help to ensure the greatest degree of mutual benefit for those wishing to access and use the data as well as those from whom the data is collected.  



The rapid expansion of Internet access and mobile technologies worldwide provides opportunities to revolutionize healthcare in ways that were not possible 15 years ago.  More people than ever use smartphones, wear smartwatches, and have regular and reliable access to the Internet.  These technologies generate vast troves of data, the volume of which continues to grow each day.

It is no surprise that so-called big data is at the forefront of medicine, and cognitive-computing technologies such as those seen with IBM Watson have already started to unlock the power of this data in an effort to aid physicians in the diagnosis and treatment of patients fighting cancer.  With further advancements in this and similar technologies, these systems could be expanded to combat other deleterious and complex diseases, including mental health disorders.  With a growing need for healthcare services worldwide, multi-faceted collaborations between business and the healthcare industry are required in order to ensure the health of the global population.  Harnessing the power of big data may help to fill gaps in care while ensuring better health and quality of life for patients throughout the world. 


  1. Targio Hashem IA, Yaqoob I, Anuar NB, et al. The rise of “big data” on cloud computing: Review and open research issues. Information Systems 2015;47:98-115.
  2. Khan N, Yaqoob I, Targio Hashem IA, et al. Big Data: Survey, Technologies, Opportunities, and Challenges. Scientific World Journal 2014;2014:712826.
  3. Smolan R, Erwitt J. The Human Face of Big Data. Sausalito, CA: Against All Odds Productions; 2012.
  4. Marx V. Biology: The big challenges of big data. Nature 2013;498:255-60.
  5. Tan SS, Gao G, Koch S. Big Data and Analytics in Healthcare. Methods Inf Med 2015;54(6):546-7.
    In this article, the authors discuss how the rapid expansion of data collection within the healthcare industry, and the ability of healthcare professionals to access and leverage this “big data”, could improve healthcare outcomes while simultaneously reducing costs. They discuss the challenges inherent with the use of big data in the healthcare industry, in particular in processing and analyzing such vast quantities of data. The article presents various approaches ranging from efficient methods of processing large clinical data to predictive models that could generate better predictions from healthcare data.
  6. Alyass A, Turcotte M, Meyre D. From big data analysis to personalized medicine for all: challenges and opportunities. BMC Med Genomics 2015;8:33. 
  7. Schneider HC, Klabunde T. Understanding drugs and diseases by systems biology? Bioorg Med Chem Lett. 2013;23(5):1168-76.
  8. Berg EL. Systems biology in drug discovery and development. Drug Discov Today 2014;19(2):113-25.
    This article discusses how the data generated through systems biology omics-based efforts is being used to integrate diverse data types in order to connect molecular and pathway information to predict disease outcomes. The author argues that the creation of better human disease biology models may aid in the prediction of how drugs work in the body, enable the opportunity for more personalized medicine approaches to treatment, increase the success rate of treatments, and potentially find new uses for existing medicines.
  9. Butcher EC, Berg EL, Kunkel EJ. Systems biology in drug discovery. Nat Biotechnol 2004;22(10):1253-9.
  10. Raghupathi W, Raghupathi V. Big data analytics in healthcare: promise and potential. Health Inf Sci Syst 2014;2:3. 
  11. Bender E. Big data in biomedicine: 4 big questions. Nature 2015;527(7576):S19. 
  12. Weber GM, Mandl KD, Kohane IS. Finding the missing link for big biomedical data. JAMA 2014;311(24):2479-80. 
  13. Chawla NV, Davis DA. Bringing big data to personalized healthcare: a patient-centered framework. J Gen Intern Med 2013;28 Suppl 3:S660-5.
  14. Dallery J, Kurti A, Erb P. A New Frontier: Integrating Behavioral and Digital Technology to Promote Health Behavior. Behav Anal 2014;38(1):19-49.
  15. Oung QW, Muthusamy H, Lee HL, et al. Technologies for Assessment of Motor Disorders in Parkinson's Disease: A Review. Sensors (Basel) 2015;15(9):21710-45.
  16. Geerts H, Dacks PA, Devanarayan V, Haas M, Khachaturian Z, Gordon MF, Maudsley S, Romero K, Stephenson D; Brain Health Modeling Initiative (BHMI). Big data to smart data in Alzheimer's disease. The brain health modeling initiative to foster actionable knowledge. Alzheimers Dement 2016;12(9):1014-21.
    This article discusses how the increase in the amount of "big-data" databases can potentially advance CNS research and drug development. The authors argue that while big data are potentially valuable, analytical methods that go beyond retrospective data-driven associations with various clinical phenotypes are needed in order to make practical use of them. The authors argue that mechanism-based modeling and simulation approaches, where existing knowledge is formally integrated using complexity science and quantitative systems pharmacology can be combined with data-driven analytics to generate knowledge for drug discovery programs, validating appropriate targets for medicines, and optimizing clinical development opportunities. With this information, the authors hope that advances for the treatment of Alzheimer’s disease will also be realized in order to decrease the burden of this often disabling disease.
  17. Roski J, Bo-Linn GW, Andrews TA. Creating value in health care through big data: opportunities and policy implications. Health Aff (Millwood) 2014;33(7):1115-22.
  18. Hsiao C-J, Hing E. Use and characteristics of electronic health record systems among office-based physician practices: United States, 2001–2013. NCHS data brief, no 143. Hyattsville, MD: National Center for Health Statistics; 2014. Available at 
  19. Charles D, Gabriel M, Searcy T. Adoption of electronic health record systems among U.S. non-federal acute care hospitals: 2008-2014. The Office of the National Coordinator for Health Information Technology. ONC Data Brief No. 23; April 2015. Available at 
  20. Wolkenhauer O, Auffray C, Jaster R, Steinhoff G, Dammann O. The road from systems biology to systems medicine. Pediatr Res 2013;73:502-7.
  21. Doyle-Lindrud S. Watson will see you now: a supercomputer to help clinicians make informed treatment decisions. Clin J Oncol Nurs 2015;19(1):31-2.
    This article discusses the advent of the IBM Watson supercomputing system in collaboration with cancer care providers and how it aids clinicians in making treatment decisions. Clinically, the physician is able to input all relevant clinical information about a patient into the IBM Watson system, allowing the system to review that data against the latest evidence and treatment guidelines. Watson has the ability to standardize care and accelerate the approval process, a benefit to the healthcare provider and the patient.
  22. IBM. What is IBM Watson? Available at
  23. IBM. Watson for Oncology. Available at
  24. American Society of Clinical Oncology. The state of cancer care in America, 2016. Available at
  25. Capobianco E. Ten challenges for systems medicine. Front Genet 2012;3:193.
  26. Fischer T, Brothers KB, Erdmann P, Langanke M. Clinical decision-making and secondary findings in systems medicine. BMC Med Ethics 2016;17(1):32.
  27. Chatellier G, Varlet V, Blachier-Poisson C; participants of Giens XXXI, Round Table No. 6. "Big data" and "open data": What kind of access should researchers enjoy? Therapie 2016;71(1):97-105, 107-14.
    The authors of this article note that while advances in big data have the potential to benefit how healthcare is delivered, various ethical questions must also be addressed.  This article analyses the opportunities and challenges related to the use of open and/or "big data", from the viewpoint of pharmacologists and representatives of the pharmaceutical and medical device industry.  Some of the core topics discussed in this article are determining who could (or should) have access to which data, how to combine collective interest and protection of personal data, and how to finance in the long term both operating costs and database interrogation.
  28. Tractenberg RE, Russell AJ, Morgan GJ, et al. Using ethical reasoning to amplify the reach and resonance of professional codes of conduct in training big data scientists. Sci Eng Ethics 2015;21(6):1485-507. 
  29. European Data Protection Supervisor. Meeting the challenges of big data. Opinion, 7/2015. 19 November 2015. Available at
  30. Kostkova P, Brewer H, de Lusignan S, et al. Who owns the data? Open data for healthcare. Front Public Health. 2016;4:7.  
  31. Payne D. Google, doctors, and the "right to be forgotten". BMJ 2015;350:h27