AI and Mental Health Care: Issues, Challenges, and Opportunities

QUESTION 7: Can AI help address disparities in access to mental health care and the shortage of mental health providers?


Background

Mental health care systems worldwide face persistent challenges, including severe provider shortages and unequal access to care.108 Over half of U.S. counties lack psychiatric services, with wait times often exceeding a month for high-risk patients.109 In Alabama, the provider-to-patient ratio is 1 to 1,200. Only 27.2 percent of estimated mental health needs are met nationwide.110 These gaps are compounded by poor Medicaid coverage, a concentration of providers in urban areas, and the underrepresentation of clinicians from diverse backgrounds.111 Marginalized communities face additional barriers, including geographic isolation, cost, stigma, and a lack of culturally informed care.112

Policy and technology have begun to shift this landscape. The 2008 Mental Health Parity and Addiction Equity Act limited insurer restrictions on coverage of mental health services, improving reimbursement options.113 The COVID-19 pandemic further accelerated change by expanding telehealth and normalizing digital mental health tools, aided by the temporary suspension of state licensing requirements that allowed patients to access providers across state lines. Many of these flexibilities have since lapsed, reintroducing regulatory barriers that can limit scalability of telehealth services. Smartphone ownership in the United States now exceeds 97 percent, enabling broader access to remote care.114 Telehealth has reduced transportation barriers and improved access to culturally matched providers, which correlates with higher engagement and satisfaction.115 More recently, the growth of digital tools, ranging from mindfulness apps to medication reminders, has extended mental health support beyond traditional settings. Whether such tools primarily serve as gateways to human care or function as stand-alone substitutes remains unclear.116

From 2019 to 2021, consumer-facing digital mental health apps grew by over 50 percent, appealing to patients by lowering costs and increasing care options.117 AI-powered tools now occupy a growing share of this space. Purpose-built technologies like chatbots, digital triage systems, and symptom monitors aim to address provider shortages and improve access.118 AI chatbots such as Woebot and Wysa deliver automated interventions designed to reduce mild-to-moderate anxiety and depression in underserved groups.119 Using natural-language processing, triage systems may be able to prioritize high-risk patients, shorten response times, and optimize clinician focus.120 Algorithms deployed on social media platforms could detect acute distress, prompting earlier intervention.121 Digital phenotyping tools can passively monitor indicators like mobility, speech, and sleep through smartphones to predict relapse and depression risk, supporting proactive care for patients with limited clinician contact.122

However, these benefits come with potential risks. Models trained on narrow populations can misread culturally specific expressions of distress, delaying appropriate care.123 Historical biases, such as the overdiagnosis of conduct disorders in youth of color, can be embedded in training data and perpetuated by algorithms.124 Similar biases have been found in broader medical AI systems, such as underestimating care needs in low-income or nonwhite patients when using spending as a proxy for health and underprescribing medication by gender.125 No uniform standards govern bias auditing or demographic reporting in mental health AI tools.

Equity concerns have also emerged. A recent systematic review highlights the potential risks of creating a two-tier mental health system in which disadvantaged groups disproportionately receive AI-only services, while more privileged groups maintain access to human care.126 AI-driven tools could inadvertently displace in-person services in rural or low-resource settings, exacerbating existing inequities.127 Additionally, older adults, low-income populations, and ethnic minorities tend to trust digital tools less and use them at lower rates. Forcing these populations to substitute digital tools for human therapy could be particularly harmful. Currently, no large-scale deployment of AI mental health tools has undergone systematic evaluation for equity outcomes, nor have federal or state agencies established frameworks to ensure equity measures are included in AI implementation.

Digital equity also shapes access. Not all populations have reliable Internet, up-to-date devices, or the digital literacy needed to use AI tools effectively. Engagement drops where infrastructure is lacking.128 Pilot programs that pair AI systems with community training and simplified user interfaces show promise in underserved areas.129 However, lack of systematic evaluation leaves uncertainty about which implementation strategies improve engagement, reduce disparities, or ensure sustained use across different populations.

Responses


Marian Croak
 

AI holds significant potential in addressing the persistent disparities in mental health access. Many of these disparities stem from factors such as enduring cultural taboos, financial barriers, and a critical shortage of therapists with both timely availability and cultural sensitivity toward underserved communities.

The use of generative AI for direct therapeutic interventions remains a complex area requiring more scientific research and regulation.130 Additionally, if one assumes that body language, pauses in speech, intonation, and physical appearance account for more communication than words, chatbots and even advanced therapeutic bots still miss valuable communication.131 More research is needed to understand whether these limitations are surmountable.

In the meantime, lower-risk AI tools should be evaluated to see whether they can provide more immediate help to expand the availability of therapists. Theoretically, these lower-risk tools can alleviate clinicians’ administrative burdens, enabling them to see more clients, and can also support the training of new providers who are equipped to meet the needs of various communities.

First, to reduce their workloads, therapists are increasingly employing purpose-built and general AI tools for several relatively low-risk tasks. These include automating scheduling and appointment communications, transcribing sessions and assisting with clinical documentation, and simplifying billing through insurance verification and claims processing. AI also streamlines the initial patient intake by automatically gathering necessary demographic and medical information, as well as the reason for the visit. Additionally, chatbots are being used to efficiently address common patient inquiries (FAQs) about mental health services.

Although these AI tools address relatively low-risk administrative functions, their deployment still requires a degree of human oversight to ensure accuracy and mitigate potential privacy and security vulnerabilities. As these technologies become increasingly integrated into the daily practice of therapists, empirical verification will be essential to confirm their purported benefit of reducing workload and thereby enhancing practitioner capacity to serve more clients.

Although more rigorous scientific evidence is still needed, another promising trend that has a high potential for rapidly scaling the training of new therapists is the use of collaborative AI–human interaction training tools. Examples of these approaches include:

  • Creating learning tools to explain different therapeutic approaches and modalities as well as potential diagnoses and possible interventions.

  • Dynamically adapting learning material to the personal learning needs and goals of new therapists.

  • Using AI tools to track trainee progress, monitor improvements, refine supervisor feedback, and suggest specific areas where additional focus is needed.

More research is needed to understand the efficacy of these training practices and how they are used in practice.

Finally, to truly expand access to AI mental health services, greater cultural sensitivity and competency are essential. While clients tend to be more satisfied and engaged with culturally similar therapists, cultural competence and adaptability also can enhance the therapeutic relationship.132 As AI models advance in cultural awareness and context sensitivity, they will play an increasingly significant role in training both new and seasoned therapists to effectively extend their practices to wide-ranging communities.

 


Alison Darcy
 

According to the popular view, AI will reinforce bias and exacerbate existing disparities in access to quality mental health care. Bias in AI or in any service is indeed a key consideration, though it is neither new nor unique to AI. All science is subject to the “garbage in, garbage out” principle; that is, conclusions are only as good as the data from which they were derived, and in health care research, bad data can have real consequences for patients. In my early research at Stanford we discovered, for example, that diagnostic instruments used to detect cases of eating disorders reflected attributes that male patients did not endorse (e.g., fear of weight gain among boys with anorexia) because they had been developed with predominantly female participants.133 This has real consequences, leaving people undiagnosed, untreated, or given the wrong treatment. Thus, the issue of bias in data is not abstract, and many researchers and commentators have demonstrated instances of real bias in AI today.134

However, it is important to note that implicit bias exists among human health care workers. While the risk of further embedding bias in AI is indisputable, such risk needs to be considered in the context of the institutional biases and significant disparities that are baked into today’s system. AI could play a crucial role in addressing and mitigating disparities in mental health access if developed correctly and intentionally.

For example, throughout our research program, which, at the time of writing, includes eighteen randomized trials and twenty-five published, peer-reviewed studies in the scientific literature, we have employed recruitment practices to deliberately oversample for diversity in our participant group. This enables us to compare findings across demographic groups to ensure equitable effects—something that most studies are underpowered to do if they employ only the minimal acceptable recruitment practice of recruiting a sample that is demographically representative of the population.

This has resulted in some early findings that support the potential usefulness of purpose-built chatbots for mental health among typically underserved or underestimated communities. For example, in a cluster analysis study to examine latent usage patterns among users of Woebot, we identified a group of users who appeared to use the app in a way that gave rise to the steepest symptom reductions over the shortest period of time, relative to the other groups.135 Those “efficient users” were more likely to be younger (average age of 36), uninsured, non-Hispanic Black males. This group reported greater affinity for the chatbot (on the Bond subscale of the Working Alliance Inventory) and saw the largest symptom reductions in depression and stress despite using the app for shorter periods of time overall.
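For illustration only, a cluster analysis of latent usage patterns might be sketched along the following lines in Python; the data file, feature names, and outcome columns are hypothetical, and this is not the published analysis.

# Illustrative sketch: k-means clustering of usage patterns, then a
# descriptive comparison of symptom change across the resulting groups.
# The file and column names are hypothetical.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

df = pd.read_csv("usage_features.csv")  # one row per user (hypothetical export)
features = ["sessions_per_week", "median_session_minutes", "days_active"]

X = StandardScaler().fit_transform(df[features])
df["cluster"] = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Compare symptom change across the latent usage groups.
df["symptom_change"] = df["baseline_score"] - df["followup_score"]
print(df.groupby("cluster")[["symptom_change"] + features].mean())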

As an academic health researcher and clinician, I worry that we are developing our own biases and holding AI to different, higher standards than any other health intervention. Any AI should be evaluated by the same standards as other DMHIs (digital mental health interventions), as many of us argue in this publication. This does not mean we should not include the special considerations that AI presents, but rather that they must be grounded in the ultimate truth of symptom improvement.

Many scientific and clinical leaders have studied the issue of embedding diversity, equity, inclusion, and belonging into the DMHIs that we are creating in health care settings, including our team. A robust strategy begins with an organizational commitment to embedding these values into the fabric of the organization such that they are then imprinted into the essence of the intervention or product itself. At Woebot we appointed an individual with a demonstrated skill set in filling recruitment pipelines with diverse candidates to lead recruitment efforts, and we actively measured and rewarded success in this area. We never appointed an individual because they were a “diverse” candidate; rather, by ensuring that the pipeline was inclusive, we always hired the very best person for each role.

A later initiative established a clinical diversity advisory board that met every six weeks, created a full charter to produce thought leadership, and helped further the organizational vision to create an inclusive intervention experience that reflects the lived experience of all individuals who use our products, as well as the diverse perspectives of clinicians, corporate partners, and policymakers.

Methodological considerations in evidence generation

Existing frameworks like PIDAR (Partner, Identify, Demonstrate, Access, Report) and RE-AIM (Reach, Effectiveness, Adoption, Implementation, and Maintenance) provide guidance, emphasizing diverse partnerships, technology access, digital literacy, and data practices. Despite general support for DMHI efficacy, little is known about outcomes across diverse subgroups, and trials often lack diverse samples or a focus on minority populations. Scholars have also noted that sociodemographic data are underreported in DMHI clinical trials. To genuinely assess DMHI impact on health equity, evidence-generation methodologies must be carefully designed, prioritizing the inclusion of individuals with lived experience and fostering diverse research teams to broaden the pool of future researchers.

Recruiting diverse samples is crucial for evidence generation. Tactics include culturally sensitive materials, community outreach, partnerships with diverse health care systems, resources for limited Internet access, and diverse research teams. At Woebot Health we collaborated with the Scripps Translational Science Institute to successfully recruit a diverse sample, reaching our target of approximately 50 percent from historically underrepresented groups, including racial/ethnic minorities and rural residents. Real-world partnerships like this are vital, and clinician-referred recruitment should be managed carefully to avoid biases. “Opt-out” methods for study invitations from health systems or patient registries could increase diversity. Recruitment targets should be based on mental health problem prevalence rates in specific sociodemographic groups, and oversampling for specific groups may be necessary to achieve sufficient sample sizes for examining group differences.

Thoughtfully designed and consistently implemented sociodemographic surveys are essential. They should include culturally sensitive and inclusive response options for race, ethnicity, sexual orientation, gender identity, and social determinants of health like food and housing insecurity. Electronic responses and privacy assurances promote honest disclosure. Challenges exist in collecting comprehensive data outside research settings, requiring stakeholder agreement and user testing to consider assessment burden, privacy, and cultural appropriateness.

Analyzing and reporting sociodemographic characteristics and outcomes across subgroups are integral to understanding the impact of DMHIs on health equity. DMHI research should report sample sociodemographics, and consumer-based data should be presented with sociodemographic breakdowns. Ideally, outcomes are reported by sociodemographic groups or included as covariates. For small subgroups, we suggest exploratory, descriptive, and hypothesis-generating approaches (e.g., within-group and between-group effect sizes). Researchers could also analyze structural factors that contribute to inequities.
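As a minimal sketch of what such exploratory subgroup reporting could look like (the data file and column names below are hypothetical):

# Descriptive within-group pre/post effect sizes (Cohen's d) reported by
# sociodemographic group. Illustration only; column names are hypothetical.
import numpy as np
import pandas as pd

def cohens_d(pre, post):
    # Mean change divided by the pooled standard deviation of pre and post.
    pooled_sd = np.sqrt((pre.std(ddof=1) ** 2 + post.std(ddof=1) ** 2) / 2)
    return (pre.mean() - post.mean()) / pooled_sd

df = pd.read_csv("trial_outcomes.csv")  # hypothetical trial export
for group, sub in df.groupby("race_ethnicity"):
    d = cohens_d(sub["phq9_baseline"], sub["phq9_week8"])
    print(f"{group}: n={len(sub)}, within-group d={d:.2f}")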

Finally, we advocate for assessing a multitude of secondary outcomes, aside from efficacy and safety, that include engagement, satisfaction, and feasibility, reflecting both researcher and participant priorities. Qualitative data are hugely valuable for understanding the metrics and can be used to triangulate diverse user experiences.

As many of us argue in this publication, we should not hold AI to a different standard than already exists for the systematic exploration of the usefulness of any other device or mental health intervention, and so here we draw from the significant and growing field of DMHI science to outline best practices. My team has written extensively on this in the peer-reviewed literature.136 A robust strategy for addressing health equity includes the following.

Equity-informed implementation science in real-world deployments

A vital consideration when evaluating mental health technology is how it is actually put into use. Commercial deployments are where the “rubber hits the road” and real-world effects are observed, whether on individuals, families, communities, or broader health care systems. This is where we see whether these technologies truly make a difference.

Solid implementation science can help guide how well these mental health technologies are adopted, how much they cost (health economics and outcomes research), and what stands in the way of adoption. Collecting these real-world data is crucial for understanding how effective these tools are and for making sure they're actually helping to close health equity gaps rather than making them worse.

Health care organizations that are buying or using DMHI tools should demand that manufacturers show how their products are designed in line with health equity and responsible AI principles. They can use existing frameworks to guide them in asking questions about who these tools reach, whether they are specifically designed for vulnerable groups, and how outcomes are monitored to ensure fairness. Some organizations are even creating their own frameworks to ensure equity.

AI may be ideally positioned to collect data on social determinants of health because chatbots are already conversing with individuals in the naturalistic setting of their home. This information can help health care systems offer the right support and resources to those who need them most, potentially leading to earlier help and better-tailored interventions, benefiting underserved people, manufacturers, and health care systems.

Finally, we point to new roles for which AI may be ideally suited, such as “digital navigation”: guiding patients through their treatment options, improving their decision-making, and potentially matching them with options that fit their lived experience, all of which can improve access and the speed of support.

Real-world data from a deployment designed specifically to address health care disparities

We partnered with Virtua Health, a midsize health care provider in New Jersey that wanted to explore whether Woebot could play a role in addressing health equity in its large and unevenly distributed provider system. Early data suggested that the follow-up rate with Woebot was approximately four times greater than the referral completion rate for traditional behavioral health sources in the primary care setting; individuals had, on average, more than three times as many encounters with Woebot as with traditional referrals, and more than three-quarters of those encounters occurred outside clinic hours. Participants using Woebot saw a full category reduction in symptoms of depression (measured by the PHQ-9) and anxiety (GAD-7) in eight weeks and completed routine patient-reported outcome assessments 85 percent of the time. Anecdotally, physicians reported that social determinants of health domains emerged far more often in patients’ conversations with Woebot than they had anticipated because, they said, they often do not have time to assess them in the context of a typical encounter. Our data scientists have shown that Woebot can identify social determinants of health domains at approximately four times the frequency that the literature suggests is detected in health care settings using self-report instruments. In addition to supporting patients in identifying social determinants, connecting people to services is another example of how AI can support a whole-person approach while creating data collection and reimbursement opportunities for the health system.

In conclusion, we should resist the temptation to think of patient-facing chatbots as replacing the role of therapists, because to do so misses the point entirely. We have an opportunity to deploy AI in ways that fill the many gaps that our fragmented health care system creates today, supporting better outcomes for everyone in ways that elevate the role of the human clinician and offer an invaluable window into the lived experience of patients. After all, mental health doesn’t stop once we leave the clinic.

 


Arthur Kleinman
 

This question is potentially the most troubling regarding the use of AI-driven interventions in the mental health field. The presence of disparities and issues of efficiency will drive AI creators to argue for its adoption and claim that bots can replace human beings and are cheaper and more efficient. I feel this logic is basic to the use of AI in business settings and will be carried over to health care as health care is further privatized and its business interests prioritized. (Look at what private equity companies do when they acquire health care assets: strip them of things that can be monetized and sold off and emphasize efficiency to such a degree that quality care is worsened.) From my perspective, the real question is how to prevent the use of AI in mental health care from following this logic. If we begin with the central value of care and list its human qualities, then we will always conclude that AI interventions are appropriate when they augment, not replace, human caregivers. AI can be used today to contribute to clinical diagnoses, refine history taking, improve differential diagnosis, and add to the factors that go into making clinical judgment more useful, but AI cannot substitute for human clinical judgment and care.

Today, we can more usefully speak of culturally informed care than culturally competent care. AI can contribute significantly to culturally informed care delivered by mental health professionals. It can do this by providing information about cross-cultural differences, differing religious beliefs and practices, and patterns of behavior that may amplify or disguise symptoms. In the same way, AI can contribute to the care of underserved populations by improving understanding of the influence of structural factors like poverty on the course of disease and outcome. This kind of information, drawn from what LLMs can provide, should identify many culturally relevant factors that clinicians can consider.

To assume that any policies can prevent AI from exacerbating existing health care inequalities would be extraordinarily naive. Health and social inequality run throughout our health care system in the United States and throughout systems all over the world. These are the same inequalities that policymakers in our country have been unable to control because their causes are so fundamental to the political economy of our society. The goal should be that AI does not make existing inequalities worse. That AI in and of itself might improve health and social inequalities seems highly unlikely, because it is not going to be creating structural transformation. Nonetheless, keeping this as a desirable goal for AI would be a positive step forward if it can be stated in practical terms with specific guidelines.137

 


Kacie Kelly
 

What are the most promising applications of purpose-built AI in expanding access to mental health services?

Severe mental health workforce shortages leave vast gaps in access to quality mental health services. Purpose-built AI, and digital mental health tools more broadly, can alleviate this burden by bolstering the existing mental health workforce and helping to make it more effective and efficient, expanding access to mental health services.

AI can optimize the existing workforce by improving screening and diagnosis and by matching patients to the right provider more quickly. Difficulty finding a mental health provider (e.g., due to insurance, specialty, geography, cultural fit) is a critical barrier. By connecting patients to the right provider sooner, AI can help deploy our overstretched mental health workforce more effectively while alleviating barriers to care.

AI can help stratify patients by risk, helping crisis response teams, health systems, and providers reach more patients in need. By identifying individuals with lower-acuity needs, AI can also support stepped care models—where those individuals are directed to less-intensive services, such as peer support, nonspecialist providers, or even AI-powered therapeutics—freeing up clinical capacity for higher-acuity cases. AI can also assist with clinical note-taking and electronic health record (EHR) documentation, a burden that is well documented to contribute to clinician burnout.138 Finally, AI can enhance traditional forms of measurement-informed care with early risk detection, measurement of outcomes, and quality monitoring.
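For illustration only, a rule-based routing step of the kind described above might be sketched as follows; the thresholds and destinations are placeholders, not clinical guidance, and any deployment would sit behind clinician review.

# Illustrative stepped-care routing rule. Thresholds are placeholders for
# demonstration only, not clinical guidance.
def route_patient(phq9: int, gad7: int, acute_risk: bool) -> str:
    if acute_risk:
        return "crisis team / immediate clinician review"
    if phq9 >= 15 or gad7 >= 15:
        return "licensed clinician"
    if phq9 >= 10 or gad7 >= 10:
        return "nonspecialist provider or group program"
    return "self-guided digital tool or peer support, with ongoing monitoring"

print(route_patient(phq9=8, gad7=6, acute_risk=False))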

How can AI supplement the work of mental health professionals or substitute for them when access is unavailable or limited?

AI-enabled digital mental health tools may supplement care provided by mental health professionals. These tools can serve multiple roles across the care journey: as an adjunct to care, as a lead-up to care for someone who is waiting to be seen by a clinician, or as a follow-up to treatment.139 This is especially important given the persistent workforce shortages and geographical inequities in mental health access. Many Americans live in areas with no access to mental health providers. More than 60 percent of people receiving mental health care do so from their primary care physician (PCP).140 While PCPs may be able to offer screening and medications, patients who seek care in primary care offices receive little to no behavioral therapy or education outside integrated behavioral health care models.

AI can help fill this gap. High-quality, evidence-informed, culturally and linguistically competent AI-enabled digital therapeutics can expand access to quality mental health care to those who may not otherwise receive it. Likewise, AI-enabled digital therapeutics can optimize the mental health workforce, particularly primary care providers who are often providing mental health care. This offers important opportunities for improving health equity, potentially enabling more personalized, culturally relevant, and language-accessible care at scale for underserved populations.

AI therapy chatbots are still in their nascency but show potential for increasing access. While promising, these tools must be deployed with caution to ensure transparency about when users are interacting with AI, clear guardrails around scope and safety, and feedback loops that enable clinician oversight when appropriate. Human connection remains essential, and AI should be designed to complement—not replace—the work of trained professionals where possible.141 AI can also play important roles in risk stratification, measurement, and monitoring.

How can AI be integrated into existing health care systems to support, rather than replace, human providers?

AI is creating new opportunities to expand and improve traditional forms of measurement in mental health care, including measurement-informed care (MIC), which leverages repeated, systematic use of validated measures to inform treatment decisions and monitor progress over time.142 Traditional measurement in mental health is based on the use of self-reported assessment tools, such as the GAD-7 and PHQ-9. The subjective nature of such measurement tools may lead to both over- and underreported symptoms—or may miss important aspects of daily functioning that are of primary importance to patients.143

AI can enhance traditional MIC in several ways. Traditional assessments are designed to be brief to accommodate clinical workflows and represent snapshots in time.144 AI enables a more expansive view that brings in “real-world” data from commonly owned digital devices, such as smartphones and wearables, to provide information about behavior, cognition, and mood. These data can be used in conjunction with traditional self-report assessments to detect changes in mental health conditions or predict relapse.145 A recent systematic review found that physiological and behavioral data collected through digital phenotyping methods (e.g., mobility, location, phone use, call log, heart rate) can be used to detect and predict changes in symptoms of patients with mental health conditions, allowing for intervention even before an adverse event occurs.146 Additionally, the increasing use of telemedicine for mental health visits has produced new sources of data, including videos from patient visits, audio recordings, eye movements, and so on.
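As a minimal sketch of one way such passive data might be paired with self-report, the following compares weekly sensor aggregates against each person's own rolling baseline and flags large deviations as a prompt for a check-in; the file, column names, and thresholds are hypothetical.

# Personal-baseline deviation flagging on weekly passive-sensing aggregates.
# Illustration only; column names and thresholds are hypothetical.
import pandas as pd

df = pd.read_csv("phenotyping_weekly.csv")  # one row per user-week
df = df.sort_values(["user_id", "week"])

def flag_deviation(series, window=8, z_thresh=2.0):
    baseline = series.rolling(window, min_periods=4)
    z = (series - baseline.mean()) / baseline.std()
    return z.abs() > z_thresh

for col in ["gps_radius_km", "sleep_hours", "outgoing_calls"]:
    df[col + "_flag"] = df.groupby("user_id")[col].transform(flag_deviation)

# A week with several flagged signals could trigger a PHQ-9 check-in or a
# clinician review rather than an automated intervention.
flag_cols = [c for c in df.columns if c.endswith("_flag")]
df["check_in"] = df[flag_cols].sum(axis=1) >= 2
print(df.loc[df["check_in"], ["user_id", "week"]])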

Natural language processing (NLP) may be used more widely in the future of clinical mental health.147 For instance, NLP technology is being tested to analyze data in electronic health records (EHRs), including clinical measures, clinician notes, comorbid conditions, and sociodemographic factors, to predict symptoms of severe mental illness and suicidal ideation and attempt.148 For example, the Department of Veterans Affairs (VA) and scientists from the National Institute of Mental Health (NIMH) developed an expansive suicide mortality risk–prediction algorithm using Veterans Health Administration (VHA) electronic health records, enabling the VA to provide a more targeted, enhanced outreach and care program for veterans identified as being at high risk of suicide.149 More recently, the VA added NLP to tap into unstructured EHR data, such as clinical notes, improving the accuracy of this risk-prediction algorithm by an additional 19 percent. This demonstration showed that supplementing predictive models with NLP can improve their overall performance.150
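For illustration only (this is not the VA's model), the general idea of supplementing structured EHR features with simple text features from clinical notes might be sketched as follows; the data and column names are hypothetical.

# Structured EHR features plus bag-of-words features from clinical notes,
# combined in a single classifier. Illustration only; data are hypothetical.
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

df = pd.read_csv("ehr_cohort.csv")  # hypothetical cohort extract
structured = csr_matrix(df[["prior_attempts", "inpatient_days", "phq9_last"]].values)
notes = TfidfVectorizer(max_features=5000).fit_transform(df["note_text"])
X = hstack([structured, notes]).tocsr()

X_tr, X_te, y_tr, y_te = train_test_split(X, df["outcome_label"], random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("AUC with note features:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))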

While promising, more research is needed to better understand how digital phenotyping and NLP incorporated into MIC might affect patient engagement and treatment efficacy.

What challenges arise in ensuring AI tools provide culturally competent and context-sensitive care?

Though AI offers the potential to expand mental health care access and improve care quality, caution must be applied to ensure it does not exacerbate disparities. Culture, beliefs, identity, and life experiences all impact how we perceive and experience mental health conditions, as well as the treatments, coping mechanisms, and supports that are effective for each individual.151 Additionally, these identities mediate how we communicate and express symptoms of mental health needs, suggesting implications for what is considered an appropriate use of AI systems.

Despite the need for culturally relevant tailoring, research on NLP rarely discusses the role of demographic differences in language expression.152 In one example, researchers found an association between depression and certain language features, including “I-usage.” A recent study examining the performance of language models used to detect signs of depression in social media posts found that this association between “I-usage” and depression held for white individuals but not for Black individuals.153 Researchers then tested how race affected the performance of language-based depression models. The model performed poorly for Black individuals, meaning AI applications designed to detect mental health conditions in social media posts, clinical notes, or other text could miss or misinterpret language patterns from different racial or cultural groups. If we embrace technologies such as these without ensuring safety and efficacy across populations, we risk exacerbating existing disparities.
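A minimal sketch of the kind of disaggregated evaluation this implies, reporting performance separately by group rather than only in aggregate, follows; the scored dataset and column names are hypothetical.

# Report a text classifier's performance separately by group instead of a
# single aggregate number. Illustration only; columns are hypothetical.
import pandas as pd
from sklearn.metrics import roc_auc_score, recall_score

df = pd.read_csv("depression_model_scores.csv")  # group, label, score
for group, sub in df.groupby("group"):
    auc = roc_auc_score(sub["label"], sub["score"])
    sensitivity = recall_score(sub["label"], (sub["score"] >= 0.5).astype(int))
    print(f"{group}: n={len(sub)}, AUC={auc:.2f}, sensitivity={sensitivity:.2f}")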

This example is a cautionary tale about the need to include diverse groups of people when developing and testing AI applications. Rather than ignoring differences between populations, a proactive recognition of difference can help overcome these gaps. This should include training and testing on a representative and diverse population and incorporating a variety of perspectives in the development of AI. For instance, researchers at the University of Texas at Austin and Cornell University are working to develop models that could identify social risk factors for Black youth, with the goal of more accurately identifying suicide risk in this population. Moreover, we must understand that the very statistical underpinnings of AI look for an optimal pattern or norm and thus may inherently create a bias against difference.154 When not intentionally designed to include “edge” cases, AI is less effective for those existing outside the norm (e.g., someone who is neurodivergent or has a disability).155

How can AI tools help bridge the gap for underserved populations without exacerbating disparities?

Many of the improvements to mental health care delivery that AI may bring could also help bridge gaps for underserved populations; however, the potential for bias must be addressed to ensure the technology’s success. We cannot assume AI will improve measurement, prediction, or care for all. Instead, we must intentionally design it to do so and monitor its impact.

The critical importance of training and testing AI systems on diverse and representative populations is well-known. The “intelligence” gained from AI must be understood within the limitations of the training data used and the potential for bias.

We must also address whether a specific AI application is safe and accurate for all people who might use it. Aggregate outcomes obscure the potential inaccuracy for specific groups.156 Thus, outcomes must be disaggregated, especially for populations that experience mental health or other health disparities.157 This is particularly true for health systems using AI.

One of the primary ways algorithmic bias is introduced is through a mismatch between what an algorithm is intended to predict and what it actually predicts.158 For instance, a landmark study found that by conflating prediction of future health care needs with prediction of future health care costs, a widely deployed health care algorithm led to “enormous racial bias” affecting “important medical decisions for tens of millions of people every year.”159 The bias arose because using health care costs instead of needs overlooked barriers to access that make some populations less likely to get the health care they need. Mental health applications are particularly vulnerable to the use of proxy measures because of the relatively subjective nature of many mental health symptoms and diagnoses; the lack of true biometric data makes finding precise measures rather than proxies particularly challenging in mental health care. The Algorithmic Bias Playbook offers a process to identify and mitigate “label” bias in algorithms used in systems (e.g., health systems, insurance companies). The process involves identifying all algorithms used in AI/predictive technologies, articulating their ideal and actual targets, and evaluating bias risk.160
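As a minimal sketch of one step in such an audit, the following compares a direct measure of need across groups at the same risk score, in the spirit of the playbook; the data file and column names are hypothetical. The design choice is to evaluate the proxy label against the ideal target rather than to evaluate the model in isolation.

# Simple label-bias check: at the same risk score, does measured need differ
# by group? If so, the proxy target (e.g., predicted cost) understates need
# for some groups. Illustration only; columns are hypothetical.
import pandas as pd

df = pd.read_csv("algorithm_scores.csv")  # group, risk_score, need_index

# Bin patients into risk-score deciles and compare average measured need
# within each decile across groups; systematic gaps suggest label bias.
df["score_decile"] = pd.qcut(df["risk_score"], 10, labels=False)
print(df.pivot_table(index="score_decile", columns="group", values="need_index"))

# Also inspect the group mix among those above a program-referral cutoff.
cutoff = df["risk_score"].quantile(0.97)
print(df.loc[df["risk_score"] >= cutoff, "group"].value_counts(normalize=True))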

AI should be employed specifically to detect and redress bias and health disparities. AI could even be used to reveal phenomena driving health disparities that human beings are unable to detect. For instance, NLP is an emerging approach to screen for and identify stigmatizing language in EHRs, automatically alerting clinicians and their supervisors.161 One promising study showed AI was able to decipher previously unexplained disparities in knee pain across patients from underserved populations with diseases such as osteoarthritis. With the increased precision and ability for AI to see what human beings cannot, AI may “better capture [need and] potentially redress disparities in access to treatments.”162 As science on brain function and mental health continues to advance, AI may facilitate more precise diagnosis that more accurately screens and detects mental health issues across populations.

 


Daniel Barron
 

AI carries the potential to ameliorate mental health care access disparities and provider shortages, if we deploy it strategically to perform specific, well-defined clinical jobs that currently stretch our resources thin. Elizabeth Yong and colleagues highlight AI’s attractiveness for enhancing accessibility, particularly for underserved communities, by taking on certain tasks.163 Promising areas include AI tools automating administrative jobs (freeing up clinician time for tasks that they are uniquely qualified to perform), delivering scalable psychoeducation for defined conditions (a specific content delivery job), or supplementing diagnostic capabilities in underserved regions by assisting local providers with specific analytical tasks (see Table 1 for further discussion).

AI can act as a supplement to mental health professionals by handling such defined tasks or, in limited, validated cases, as a substitute when access for a specific job is otherwise nonexistent. Liana Spytska emphasizes AI as a complement for certain jobs, not a full substitute for tasks requiring deep human connection.164 As with any technology in health care, integrating AI into existing health care systems should focus on tools that support human providers within their current workflows for defined tasks.

Making sure AI tools deliver culturally competent care when doing their designated tasks demands thoughtful design and training on diverse datasets relevant to the job. (The same case is often made for medical student training.) Rinad Bakhti and colleagues found culturally relatable and coproduced DMHIs (a specific type of AI-driven job) showed higher engagement.165 The FAITA-Mental Health framework includes “cultural sensitivity” for evaluating AI tools performing specific mental health tasks.166 To genuinely bridge gaps for underserved populations, we must choose AI tools for tasks that are most needed and can be effectively handled by AI in those specific contexts, without inadvertently creating a “second-class” standard of care that deepens disparities.

Endnotes