How the use of data science in healthcare will lead to high-value, personalized care

Atul Butte, MD, PhD

Chief Data Scientist, University of California Health System (UC Health)

10 February 2021 | 16min

Quick Takes

  • The role of Chief Data Scientist is emerging as a key member of the C-Suite in healthcare systems

  • The use of patient data will play an integral role in achieving high-value, personalized care

  • For healthcare organizations to succeed, they must respect, use, and innovate with their data

The role of Chief Data Scientist is emerging as a key member of the C-Suite in healthcare systems as we embrace digital transformation and the use of data science in healthcare.

Atul Butte, MD, PhD, shares his insights on data science and its role in the evolution of healthcare systems toward more personalized, value-based healthcare.

The merging of physician and data scientist in the healthcare industry

HT: One of the things that’s unique about your experience is that you are both a physician and a data scientist. As a Chief Data Scientist yourself, why do you think this role is important for hospitals? What are the tasks and responsibilities of the Chief Data Scientist that enable a more data-driven, outcomes-based business model?

Atul Butte: It’s still rather new for a health system to have a Chief Data Scientist role. Really, the goal is to look at various sources of data to improve patient outcomes.

Data sources are vast in the healthcare industry. I deal with a lot of data from input sources such as biomedical research data, clinical data, claims data, and the bills we send to payers. Then we have output sources as well, which include reports on quality: Where are we wasting money? How do we negotiate better with payers to reach the value of care we want to deliver?

It is up to us now, as Chief Data Scientists, to put that data together to improve public health. Does this drug seem to work for our patients? We know it worked in clinical trials, but does it work in our patients? Does that medical device seem to be improving outcomes? We might be ordering three or four types of drugs, or five or six types of medical devices; which one of those is the best one for our patients? The Chief Data Scientist role could also include a medical research effort.

Then, finally, a new responsibility is preparing the organization for digital health itself. In the United States, we can now write prescriptions for actual smartphone applications, and these are now FDA-approved. Thinking through what a digital therapeutic looks like is another component that could fall under the Chief Data Scientist role.

I think the role is still rare, but I think it’s going to be increasingly common. I do believe the practice of medicine is a data-oriented profession. It’s a data-oriented industry. The sooner we learn that, the sooner we can use all this advanced data analytics to actually improve what we do.

Healthcare data science and the role it plays in machine learning and precision medicine

HT: In one of your TED talks, you mentioned that data is key to realizing the dream of precision medicine. But also that “data doesn’t do anything by itself”. How can we use data and statistics, and why is it so important that it’s shared to achieve this overarching goal?

Atul Butte: We have enormous amounts of data in medicine and the biomedical sciences now. There are two big inputs that I think of. One is genomic or molecular measurements, which we are getting more and more of from medical research. For example, cancer genomes are being measured, and children with rare diseases are getting their genomes sequenced, so the molecular world is bringing a lot of data to us.

Also, electronic health record systems are bringing enormous amounts of data. Everything we’re doing and measuring on patients is now captured in electronic medical records. Together, those two inputs are going to feed into what we call precision medicine.

To me, precision medicine is the customization of the care we deliver to a patient based on measurements from that patient. Those aren’t just molecular measurements; they could also be behavioral measurements, such as a patient preferring one kind of side effect over another.

We’re going to get a lot of measurements from patients, but we can’t do it with measurements from just that one patient; we need those measurements on a lot of people. More importantly, we have to remember what we did to those patients. What worked and what didn’t?

We can’t do it with just the measurements. We need health informatics, databases, tools, and artificial intelligence to then know, with those measurements, what the right, safest, most cost-effective next step is for this patient. That, to me, is precision medicine.

Obviously, we cannot get to precision medicine without having a lot of data. Because this is patient data, we’ve got to operate on it in a safe, respectful manner, but that’s how we’re going to get to a more value-based, precise medical delivery system.

HT: What would you say are the biggest challenges from a data perspective for precision medicine? Is there any data that’s missing that we need in order to be able to do that?

Atul Butte: There are still a lot of challenges in achieving precision medicine. We don’t get molecular measurements on every patient, or even on every patient with cancer. We’re still missing a lot of measurements, but we’ll get there. I think we’re getting cheaper and more effective tools to diagnose patients earlier and earlier in the disease process.

There’s also a challenge in data harmonization. When we measure something at one hospital, how do we know it’s the same entity as what is measured at another hospital? How do we harmonize those data elements?

Actually, one challenge of data harmonization that is unique to the United States is that we have a competitive healthcare system: health systems make billions of dollars and actually compete with each other for patients. So bulk data sharing across health systems is a challenge, because those are actual patients, customers, and clients.

From the technology world, it sometimes seems that it should be easy to put data sets together, but from the economic, political, and sociological perspective, it’s much harder. Data harmonization isn’t just a technical problem; it’s also a challenge because, in the United States, we don’t necessarily all want to share data with each other when we’re competing for the same patients.

Another element that’s still missing is patient-reported data. After we discharge a patient from the hospital following an intensive procedure, what actually happens next to that patient? How often do we hear back that they’re happy with the outcome of surgery, or that they’re not having side effects from a particular drug?

There’s an enormous part of patient care that obviously rests with the patient, but right now we have no good tools to reliably collect that kind of data, those patient-reported outcomes, nor would we, as a health system, know exactly what to do with that data. There are still many challenges, even just on the data side, to getting to precision medicine.

I think even the artificial intelligence and machine learning tools in the healthcare industry are still in their infancy. We need to build and train those tools to turn these data elements into actual recommendations for physicians, nurses, and the whole healthcare enterprise.

Data collection and data analysis across the entire patient journey are key to delivering patient-defined outcomes

HT: How do we collect data from patients once they are discharged, for example, to know if they are happy with the outcome? How do you envision this happening in the future?

Atul Butte: We need to get better at getting more data from patients, especially after we’re done seeing them in an encounter, whether it’s a visit in the clinic or a surgical procedure. But how do we convince patients to actually give us that data? How do we even reach those patients? I think it requires changing our mindset about what it means to relate to a patient.

Think about other industries. Think about a company like Apple, which knows the name of every single iPhone user, the billions of them out there. Think about a car company like Tesla, which knows every single driver of every single one of its cars.

Here in the pharmaceutical and medical device world, I would bet that most pharmaceutical companies, biotechs, and device manufacturers have no idea who is actually using their products. They might know the health systems, and they might know the doctors, but right now there’s no connection to the patients. It’s intermediated.

I think that should change in the future. It’s going to be in the interest of all of us to know everyone who is being treated. Certainly, pharmaceutical companies are going to want to know exactly who is on their drugs. I don’t mean the regulatory studies that the FDA requires a company to do; I mean that it’s going to be in these companies’ interest to know who their customers are and to study where drugs are failing, not just where they are working.

For us in the healthcare sector, we have to get better and better at really understanding our patients. Why do they join our health system? Why do they leave our health system? Of course, we send out surveys to make sure that their last encounter was pleasant, but I think we can go beyond that to understand why they are missing appointments, or why their diabetes or blood pressure is still out of control.

We have to get better at understanding those patients, and that can mean more technological tools, too, to understand what are known as the social determinants of health. Is it about the area they’re living in? Access to transportation? Access to groceries and healthier food options? All of those are going to have to be part of our equation for the right way to treat patients. All of that still goes back to precision medicine. We cannot think about patients properly without really taking into account the social determinants of health.

HT: How do we change the cultural mindset to get patients to share their data, and health systems to use more patient-reported data to drive clinical decisions?

Atul Butte: There is going to have to be a change on the patient side as well as the health system side. I can talk about the health system side in particular. Right now, I don’t think we’re clearly indicating to patients that we want all of their data. In fact, I think we indicate the opposite. The devices that patients use, whether spirometers to measure their breathing or blood glucose meters to measure their blood sugar, generate a lot of data. Exercise and fitness trackers generate a lot of data, too.

Right now, with few exceptions, health systems don’t really take in that data. Beyond that, we don’t have the tools in the health system to immediately parse through all that data and come up with suggested recommendations. Having that much data from patients can seem overwhelming to physicians.

We have to be more accepting as health systems and understand that we want to see that data. Maybe we could be a little more targeted: “These are the 5 or 10 questions we’d love to ask you about your disease 30 to 60 days after surgery.” And we should have some way to deliver those questions to the smartphone or computer that patients are using.

At the same time, I think patients are willing to share. Patients who are ill seem to be the most willing to share, to make sure that their diseases are studied and best treated. We just have to be better as health systems at tapping into that generosity, that data generosity. Patients want someone to listen. We have to be better at listening, not just to them but also to all of their technological devices.

HT: There’s the data generosity side, but then there’s always this fear you read or hear about, which suggests that patients don’t want to share data because it’s so personal and private. How do you balance the two opposing ideas? In your experience, are patients willing to share their data?

Atul Butte: It is a concerning time for patients, especially with respect to their data. There are news items all the time talking about breaches and misuse of data in many spheres, not just health data, but also credit card data and financial data. It’s a challenging time to keep data secure. 

Beyond that, there’s an element of privacy: not wanting to share intimate details of one’s condition with others. I think, though, that patients trust the health system. Patients trust doctors and nurses and have trusted doctors and nurses for hundreds, if not thousands, of years. Breaches and break-ins to databases by malevolent parties are a fact of life. We, as health systems, have to convince patients that we are beyond the state of the art in making sure that all the firewalls, encryption, and cybersecurity elements we know of are in place. They should also trust that we want to keep those details private.

There are also regulated uses of data. The phrase I use is safe, respectful, and regulated use of data. In the United States, we’ve had HIPAA for more than 20 years. We can always argue that it could be better, that it could cover more or maybe less, but it is the law, and it has been for 20 years. A generation of physicians and researchers understands what HIPAA allows and doesn’t allow. At the same time, we’re going to have to continually justify and defend why we capture this data and what we’re doing with it. That is our responsibility as health systems and as Chief Data Scientists.

HT: Do you know of any health systems that are collecting this data electronically from patients after they’ve been discharged, or do you have examples of how it’s being done?

Atul Butte: Yes, indeed, there are ways for health systems to tap into patients and their desire to share more about what’s going on in their lives, especially after we’ve seen them. The simplest way to think about it is through what are called net promoter scores: Would a patient recommend us to others?

There are companies that run these in the United States; Press Ganey is one that many health systems use. We can tap into those scores to see if someone was happy or unhappy with the encounter, but that’s still at a very transactional level. Sometimes we want to get to more detailed data elements: How is the patient actually doing with their disease? Not how easy it was to find a parking spot.

Modern electronic health record systems, including Epic and, I think, Cerner as well, now have ways to send questions directly to patients through their smartphone apps. These are called patient-reported outcomes. Many consortia are now setting up standard, validated patient-reported outcome questions that we can draw from. I think we’re going to see more of those implemented by health systems, through the apps that electronic health record vendors give us to distribute to our patients.

One health system I know that does this really well is Geisinger, in central Pennsylvania. I’m on their scientific advisory board, and I’ve seen how Geisinger uses their data and outcomes in many different ways. That includes recruiting patients to provide genetic data, so Geisinger can better understand how their patients might fare in the future with diseases they might not even have had yet. It also includes understanding how well patients are responding to surgeries, so much so that Geisinger can now even offer a money-back guarantee if patients are not happy with their procedures or outcomes. I think that’s an example, taken to an extreme, of a health system that really seems to understand its patients better, and one we can all strive toward.

Key recommendations for data usage in healthcare transformation and value-based care

HT: What would be your top three recommendations to give to healthcare leaders when it comes to using their data to achieve more personalized and value-based care?

Atul Butte:

1. Respect your data. Understand that this data is sensitive, that it comes from patients, and that it has to be protected. You cannot treat it callously, and you have to ensure that those who are touching and using the data have the utmost integrity. You have to continually defend and ensure that.
2. Actually use your data. In any industry, including healthcare, decisions are sometimes made from the gut or from the heart; instead, you might need to start making decisions using data. You might not recognize you have that data. You might not recognize that particular claims data here or clinical data there are in your databases. So understanding the data, the catalog of data, the inventory of data you have, and how to use it is going to be more and more important.
3. Innovate with your data. Future innovations in healthcare are going to be data-driven. What does it mean now that patients can see their medical records on their smartphones? What does it mean that healthcare providers, such as physicians and radiologists, are now starting to adopt AI tools that help them read and diagnose radiological films or CAT scans faster and faster? The way I really like to frame it is that either we as a healthcare sector are going to be inventing these tools, or we’re going to end up buying them. Which side would you rather be on, especially as an academic medical center? I think it’s our duty to innovate with this data in a safe, respectful, and regulated way.

HT: We all have a vision of data-driven healthcare: a patient walks into their doctor’s office and pulls up their Fitbit data, their tracked nutrition, and their genomic data, and the doctor uses all of this to put together a preventative, more personalized health plan for that patient. When do you see this coming together and really working?

Atul Butte: I think it’s going to happen in prototypes first. I’m going to guess it will be at academic medical centers and other places at the cutting edge within two to three years. In general practice, it might take a lot longer than that, probably 5 to 10 years. It won’t have everything, but it’ll have a lot more. I think we’re going to get there. That’s going to be a way for patients to really get incentivized to share more, because their doctors are doing something with their data. I think it’s going to be a feed-forward loop.

Atul Butte, MD, PhD, is the Priscilla Chan and Mark Zuckerberg Distinguished Professor and inaugural Director of the Bakar Computational Health Sciences Institute at the University of California, San Francisco (UCSF). Dr. Butte is also the Chief Data Scientist for the entire University of California Health System and has authored over 200 publications, with research repeatedly featured in the New York Times, Wall Street Journal, and Wired Magazine. Dr. Butte was elected into the National Academy of Medicine in 2015, and in 2013, he was recognized by the Obama Administration as a White House Champion of Change in Open Science for promoting science through publicly available data. Dr. Butte is also a founder of three investor-backed data-driven companies: Personalis (IPO, 2019), providing medical genome sequencing services; Carmenta (acquired by Progenity, 2015), discovering diagnostics for pregnancy complications; and NuMedii, finding new uses for drugs through open molecular data. Dr. Butte trained in Computer Science at Brown University, worked as a software engineer at Apple and Microsoft, received his MD at Brown University, trained in Pediatrics and Pediatric Endocrinology at Children's Hospital Boston, then received his PhD from Harvard Medical School and MIT.