Whether people like it or not, consumers are already turning to artificial intelligence for medical advice whenever they feel the need.
This shift is fundamentally reshaping access to medical information. According to figures OpenAI published in January, more than 40 million people ask ChatGPT health-related questions every day, and roughly one in four of the service’s approximately 800 million regular users submits a medical query at least once a week. Yet with this new convenience comes a significant concern: the answers are not always reliable.
The debate over how artificial intelligence should be deployed, regulated, and evaluated in clinical settings often overlooks an obvious reality: direct consumer use of these systems has already arrived. “Too often people are using this as an expert and not as an assistant,” American Medical Association CEO John Whyte said in an interview with Axios.
On one point, experts broadly agree: artificial intelligence should not replace a physician, at least not yet. The more practical question is how useful such a tool can be when a doctor is unavailable or when a person simply does not have one.
“We have made access to medical information and medical judgment in this country so difficult, and ChatGPT makes it so easy,” said Ashish Jha, dean of Brown University’s School of Public Health and the former White House COVID-19 response coordinator under President Biden. Expecting such tools to be as accurate as a physician, he added, is “absurd, given how much more convenient they are.”
The risks, however, are obvious. “I think there is a possibility that bad things could happen. … Is it dangerous? I believe the current status quo is also dangerous,” said Bob Wachter, chair of the Department of Medicine at the University of California, San Francisco, and author of the book “A Giant Leap: How AI is Transforming Healthcare and What That Means for Our Future.” “The question is what you would have done without it,” he added.
A recent study published in the journal Nature found that ChatGPT underestimated the severity of medical emergencies in roughly half of the test cases researchers presented to it.
Karan Singhal, who leads OpenAI’s medical AI initiatives, said the latest GPT-5 models correctly route emergency cases in nearly 99 percent of situations. In real-world use, he noted, health-related conversations in ChatGPT typically unfold across several stages: the model asks clarifying questions, gathers additional context, and only then formulates its response.
Attention is now turning to what new state and federal rules and restrictions may emerge to govern the use of artificial intelligence in healthcare.
“In the United States, we do not regulate the availability of information,” said David Blumenthal, former president of the Commonwealth Fund. However, he added, rating organizations may emerge to assess the reliability of different chatbots for specific tasks.
Several conclusions emerge from conversations with experts.
First, artificial intelligence performs better at some tasks than at others. According to Whyte, chatbots can be useful, for instance, in explaining test results or helping patients compile a list of questions for their doctor before an appointment.
But that does not mean users employ them in that way. Jha—who believes large language models are still “not ready for full-scale use” in diagnosing diseases—argues that people will continue using them to try to understand the causes of their symptoms simply because “they were already doing that with Google, and this is much better than Google.”
Ultimately, he says, “we still do not have a truly clear understanding of what these tools are good for—and what they are not.”
Second, the outcome depends heavily on the initial prompt, and ordinary users often struggle to phrase it well.
“The way a patient’s question is phrased can lead to different answers from a language model,” says Monica Agrawal of Duke University. If someone provides incomplete context, shares subjective impressions, or starts from a mistaken assumption, the model may reinforce the misunderstanding rather than correct it, a failure a physician would be more likely to catch.
Third, the problem may lie in how the answers are delivered. “What worries me is that some of these models speak with a level of confidence that is not actually justified,” Jha notes.
Models are also often designed to tell users what they expect to hear, Agrawal adds. Where a physician might challenge a statement or express doubt, an AI system does not always push back.
“If you tell me, ‘I have a headache,’ I’m not going to reply, ‘Oh, I think you have a migraine.’ I’m going to say, ‘Tell me more,’” Wachter explains. “The tools do not yet do this naturally—although I suspect future consumer systems will.”
Finally, most people using artificial intelligence lack the knowledge required to recognize an error. According to Wachter, there is a noticeable gap between the professional use of these tools and the way ordinary people rely on them.
For physicians, such systems can be extraordinarily useful. The average patient, however, often lacks the medical training needed to recognize when a model’s answer does not apply to their situation—or simply appears questionable.
At the same time, the models are continuously retrained—and overall they are steadily improving.