Key Lessons Involving Generative AI Mental Health Apps Via That Eating Disorders Chatbot Tessa Which Went Off The Rails And Was Abruptly Shut Down
Gleaning useful lessons for generative AI-based mental health apps.
In today’s column, I am continuing my ongoing series regarding the use of generative AI for mental health guidance. I’d like to share with you some key lessons gleaned from an eating disorder (ED) advisement chatbot named Tessa that made big headlines in mid-year 2023 for having gone off the rails and subsequently being abruptly shut down. This is a tale for the ages. Lots can be gleaned from the ins and outs of this intriguing eyebrow-raising circumstance.
My aim in this discussion is to focus on the overarching AI technological considerations and how this forewarns us about the spate of rapidly emerging AI-based mental health apps coming into the marketplace day by day.
On a related note, there is a fruitful abundance of leadership, business systems, and experimental research-oriented lessons to be garnered from the Tessa incident too. I’m not going to venture into those in this discussion. Instead, I will merely lightly touch upon those facets herein and primarily be focused on the AI particulars. I want to cover the lessons learned about how AI and especially generative AI is or ought to be utilized when it comes to devising and fielding mental health treatment apps.
If you’d like to get up-to-speed on my prior coverage of generative AI in the mental health sphere, you might consider for example these analyses:
I believe that you will find today’s analysis of the chatbot that went astray to be quite absorbing. Pack a sandwich and have handy a nice cold drink for your journey.
Backstory Of The Eating Disorder Advising Chatbot
Let’s begin at the beginning and cover the backstory involved. As mentioned, this is going to be about a chatbot that was aiming to assist with eating disorders.
According to established medical research, eating disorders are widespread in the U.S. and considered one of the most debilitating and deadliest mental illnesses:
Trying to educate the public at large about eating disorders and what kinds of mental health treatment are best undertaken remains a tough task to accomplish. People are likely to search the Internet for information, assuming they consider the matter substantive enough to look into. The trouble with randomly seeking Internet-based information is that rampant falsehoods, disinformation, and misinformation abound out there.
Fortuitously, numerous carefully curated and suitably devised web-based materials are also available online, including ones that are intended to serve as a kind of coursework endeavor for someone seeking eating disorder help. These web-based tools have gradually been either augmented with or at times replaced by mobile apps. A mobile app can in today’s times be more advantageous for usage since a person can download the app and make use of it at any time on their smartphone (in contrast, a web-based capability would usually require an online connection, which might be sometimes unavailable or difficult to access).
A notable element or feature of mobile apps for mental health advisement is that the use of a text-oriented conversational computer-based facility can be incorporated. The conversational component is usually loosely referred to as a chatbot. We all seemingly know about chatbots these days, especially as a result of the advent of generic generative AI such as the widely and wildly popular ChatGPT by AI maker OpenAI, along with many other generic generative AI apps such as Google’s Bard and Gemini, Anthropic’s Claude, etc.
Let’s take a brief pause in this rendition for an important callout.
I will momentarily be clarifying what the word “chatbot” entails. In short, not all chatbots are the same. Thus, consider the word “chatbot” to encompass a wide range of capabilities, sometimes of a narrow and crudely simplistic nature, while at other times being much more robust and interactive. I’ll get more into this shortly. My point is that many people often blur things by assuming or believing that all chatbots are the same. They are not.
Okay, we will now enter into the specific instance that will be the focus for the remainder of this discussion. The circumstances revolved around an eating disorder chatbot that was referred to as Tessa. I am getting you ready for what took place.
Here is an excerpt from a research study that discusses these matters and is entitled “Effectiveness Of A Chatbot For Eating Disorders Prevention: A Randomized Clinical Trial” by Ellen E. Fitzsimmons-Craft, William W. Chan, Arielle C. Smith, Marie-Laure Firebaugh, Lauren A. Fowler, Naira Topooco, Bianca DePietro, Denise E. Wilfley, C. Barr Taylor, Nicholas C. Jacobson, International Journal of Eating Disorders, December 2021:
Please note that as stated in the above excerpt, the researchers referred to their app as Body Positive and were using a chatbot named Tessa to deliver the capability. For purposes of the discussion herein, let’s go ahead and refer to the app overall as Tessa, which is pretty much what all of the reporting did at the time the matter rose to prominence in the media. In any case, on a bit of a technicality, I just wanted to clarify that Tessa was considered the delivery mechanism.
The researchers had mindfully sought to devise and test the Body Positive program’s capabilities, doing so before further releasing the program beyond a research environment. Like most such research, once the capabilities of a research endeavor seem to be relatively well-tested and ready for public use, the hope is to make the capability available to a wide audience.
You might be surprised to know that at times some really good programs for mental health guidance go no further than a research lab and sadly do not come to the attention of the public at large.
Part of the reason that sometimes a program doesn’t make the leap from a research orientation to a publicly available option is that researchers might not have the commercialization skills or money to bring their program to the marketplace. There is a big difference between doing things in a lab setting versus gearing up to undertake commercial usage by perhaps thousands or maybe millions of people.
Another consideration is where would be the best place to make your program available to the world. You want your research-backed program to be seen and used in the right places, rather than being buried amidst zillions of other wanton apps that languish in some massive and confounding free-for-all app store. Standing out in a wheat-from-the-chaff manner is a big issue and you don’t want your top-researched program to be tainted by those fly-by-night apps that were made without an iota of systematic bona fide work.
In this instance, the researchers noted that a suitable venue would be a non-profit organization known as the National Eating Disorders Association (NEDA):
Furthermore, the researchers realized that using a chatbot facility could readily make the program more accessible and would undoubtedly increase the chances of people actively opting to use the eating disorder advisement therein:
I trust that you get the gist of the situation.
It is straightforward.
To recap, a web-based eating disorders program was reworked into an app that would leverage the added benefits of leaning into a chatbot capability. People using the app would be able to seemingly interact with the program conversationally. Doing so enhances the experience for the users since they are having a “personalized interactive” experience somewhat akin to interacting with a human advisor (not necessarily so, but people might perceive this to be the case; more on this later on herein).
We shall see what happened next.
Get ready.
When Tessa Went Off The Rails And The World Howled
Upon the eating disorder chatbot Tessa becoming widely available via NEDA, there was a rapid viral-like realization by some that the chatbot was giving inappropriate advice about eating disorders. Indeed, the advice was at times the complete opposite of what is considered proper treatment. Social media inflamed the situation. A swirl of media attention was like hungry sharks circling the water for easy prey.
The whole matter became a headline-grabbing confabulation.
We’ve seen the same consternation about chatbots many times. I’ve covered numerous instances wherein a chatbot was made available and people right away discovered intrinsic toxicity or other foul maladies such as undue biases and discriminatory wordings, see the link here. A type of contentious confusion can arise when this happens. On the one hand, it might be that the chatbot was poorly devised and readily emitted toxicity. On the other hand, sometimes people go out of their way to trick or fool a chatbot into saying things that otherwise would not normally be emitted, see how this happened to generative AI in the early days, at the link here.
This conundrum takes us down a bit of a rabbit hole. You might persuasively argue that no matter what people enter into a chatbot, the chatbot should never emit anything untoward. Period, end of story. A counterviewpoint is that if people push hard enough, the odds are they are going to break a chatbot, and in that case, perhaps the issue should be at the feet of the people who try to undercut the chatbot rather than the chatbot per se. The contention is that this is why we can’t have shiny new things, namely, smarmy people ruin it for all of us.
Moving on, the researchers who had devised the system were reportedly dismayed and shocked that Tessa was doing what it was purportedly doing. They had carefully sought to ensure that this kind of improper output would not occur. Yet, despite their best efforts, they suddenly had a firestorm on their hands.
On May 31, 2023, when headlines were blaring about Tessa having gone askew, NPR reported that one of the researchers insisted that the chatbot could not have gone off the rails because it was devised in a rules-based manner rather than with generative AI (I’ll be explaining this difference in a moment):
On June 8, 2023, NPR ran a follow-up piece about the matter and said this:
Let’s see if we can lay out what seems to have happened.
A carefully devised, tested, and well-researched program that was using a chatbot interface and did so via rules-based constructs had reportedly been changed midstream to employ generative AI. Imagine the surprise that this would bring to the research team that had toiled night and day to bring the app to the marketplace. Their hard-fought efforts to try and mitigate emitting any dire falsehoods by the chatbot were negated. One would naturally think this would be a heart-wrenching piece of news.
I might add that I’ve developed many AI systems during my lengthy career of developing proprietary apps for companies, and at times been crestfallen that a company might later on decide to make changes that undercut or undermine crucial backbones of the AI app. They would often do so without telling me beforehand. Just a tweak here or there, they would later tell me. Meanwhile, the app falls apart or does things that make me cringe and I close my eyes and dearly hope that no one ever associates me with the now fouled-up AI.
As a general rule of thumb, there is often an ongoing difficulty in marrying the build stage of AI app development with the implementation stage. The implementation side might run wild. If you are engaged solely for the development side, you often have to pray and hope that the implementation will go well since you have no hand in the rollout.
I might add, in all fairness, sometimes a builder does zany things too. Perhaps their app isn’t ready for prime time, but they push it over into production anyway. As such, the implementation or production side will potentially have to make changes or do some rejiggering to make up for the quandary of loose bolts and screws that would otherwise sink the ship.
Separating the build side from the implementation side is an inherently dangerous affair. Not only can things go horribly wrong, but the separation often pits the two parties against each other, spiraling downward into becoming finger-pointing opponents. It was your fault, one side proclaims. No, it was your fault, the other side replies. The app becomes a soccer ball that gets kicked around as each side tries to defend its positioning and denigrate the posture of the other side.
Messy.
And, sometimes scandalous.
The Deal About Rules-Based Versus Modern Generative AI
A vital lesson that I want to concentrate on is the notion of what constitutes a rules-based chatbot versus a generative AI chatbot. Most people are not especially familiar with the difference. The usual assumption is that one chatbot is just like another.
Time to do a bit of a historical discourse.
Let’s first cover what a rules-based approach consists of, along with what a data-driven approach such as generative AI consists of. By getting those two major fundamentals onto the table, we can subsequently see how this pertains to chatbots.
You might have faintly heard about or maybe even lived through an AI era known as expert systems, also known as rules-based systems, and at times referred to as knowledge-based systems. Here’s the deal. There was a belief that a viable means to devise AI systems was to do so via the codification of rules. You would go to human experts in some domain or field of interest, interview them to surface the rules that they used when doing their work (this process was coined as knowledge acquisition), and you would then enter those rules into a specialized program that would execute or perform the stated rules.
Voila, you have essentially embedded human expertise into a computer program. All kinds of rules-based systems were devised. If you wanted an AI system that could do what a medical doctor does, you would earnestly try to get the physician to reveal all the rules that they use when performing medical work. Those medical-oriented rules would get entered into an expert system shell. The expert system would then be put into use, presumably being able to mimic or perform medical work on par with a human physician.
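To make that concrete, here is a minimal sketch, in Python, of what codified rules and a tiny inference loop might look like. The rule contents are invented placeholders for illustration, not actual medical rules, and real expert system shells of that era were considerably more elaborate.

```python
# Minimal sketch of codified rules plus a tiny inference loop.
# The rule contents are invented placeholders, not real medical guidance.
rules = [
    {"if": {"fever": True, "rash": True}, "then": "consider condition A"},
    {"if": {"fever": True, "rash": False}, "then": "consider condition B"},
]

def run_rules(facts: dict) -> list:
    """Fire every rule whose conditions all match the known facts."""
    conclusions = []
    for rule in rules:
        if all(facts.get(key) == value for key, value in rule["if"].items()):
            conclusions.append(rule["then"])
    return conclusions

print(run_rules({"fever": True, "rash": True}))  # ['consider condition A']
```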
Several limitations emerged.
First, you might have a devil of a time getting experts to reveal their rules. A person might naturally be hesitant or outright resistant to giving up their secret sauce. Maybe doing so puts them out of a job. Even if someone is willing to spill their guts, the question is whether the stated rules are indeed the actual rules that they are using. A person might rationalize what they do, meanwhile, they might be actually doing something else. You can’t be sure that the rules are bona fide.
Second, for smaller sets of rules, getting the rules into an expert system and testing it was relatively easy to do. Scale though made a difference. The chances are that any full-bodied in-depth set of expertise is going to encompass thousands upon thousands of rules. You potentially have a morass on your hands. When should one rule prevail over another? What should be done if two or more rules are in direct conflict with each other? Etc.
Third, maintaining and doing the upkeep of a rules-based system could be problematic. Experts tend to change their viewpoints and often devise new rules or alter old rules. The same changes had to be made to the codified rules to make sure that the expert system was still on target. Once again, having to figure out the conflicts between rules and how to align the rules was often very challenging.
You probably know what eventually happened.
The era of AI consisting of rules-based systems hit a proverbial wall. People felt that going bigger wasn’t particularly productive. A gradual falling out of rules-based systems occurred. Many now refer to the result as the AI Winter, namely that AI fell into a bit of despair and no longer had the glow it once had.
That covers the rules-based approach.
Shift gears.
We need to discuss the contemporary data-driven approach to AI.
You certainly know about the wonders of today’s generative AI such as ChatGPT. The thing is you might not be familiar with how it works. Does generative AI use rules akin to the rules-based expert systems of yesteryear? No, that’s not the crux of things for generative AI.
Generative AI takes a different tack toward making AI. In the case of generative AI, the underlying AI technique and technology are known as large language models (LLMs). The idea is to use mathematical and computational pattern-matching to examine huge amounts of data. Find patterns in the data. Then make predictions, mathematically and computationally, based on the data used in the initial training of the AI.
What kind of data?
Let’s aim to dissect human language as expressed in zillions of written narratives, essays, books, poems, and the like, as scanned and found across the Internet. There is a lot of data to be had.
We are aiming to model what human languages consist of, doing so by having a large-sized model and using a large amount of data to do the pattern-matching training. The easiest way to conceive of this is the auto-complete feature in a word processing package. How does the auto-complete identify what will be the next word that you might type? It does this by having pattern-matched passages of human-written text. Humans tend to compose their words in somewhat predictable sequences. The odds are that the next word you plan to type can be predicted.
Generative AI takes this approach to a heightened scale.
Predicting the next word is pretty easy and simple. Suppose we use the mathematical and computational approach to predict the next sequence of words that will complete a sentence. Harder to do, but still possible. Envision that we use the same approach to predict the rest of a paragraph, or perhaps the rest of an entire essay. In a sense, that’s what generative AI is doing, though on a word-at-a-time basis. For more details, see my discussion at the link here.
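As a rough illustration of that auto-complete idea, here is a toy sketch in Python that simply counts which word tends to follow which and then predicts the most frequent follower. Real LLMs use enormous neural networks rather than raw counts, so treat this strictly as an intuition builder.

```python
from collections import Counter, defaultdict

# Toy "auto-complete": learn which word most often follows each word.
corpus = "the cat sat on the mat and the cat slept on the sofa".split()

following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word: str) -> str:
    """Return the most frequently observed follower of `word`."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # 'cat' -- it appears most often after 'the'
```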
The mathematical and computational pattern-matching uses a type of model that is somewhat loosely portrayed as akin to the human brain and its neural networks. I say loosely because you should not be fooled by the naming involved. Some say “neural networks”, but I prefer to say “artificial neural networks” to highlight that this computational structure used for machine learning is not the same as the complexity and nature of the human brain. A lot of people fall into the trap of assuming they are one and the same. Nope, not at this time.
What do we get by using these large language models or LLMs?
You get nifty results such as generative AI.
When you use a generative AI app, you are almost immediately awestruck at the apparent fluency involving natural languages such as English. The language is highly conversational. It is gob-smacking. To clarify, it is not a sign of sentience. Some argue that today’s generative AI is sentient, see my discussion at the link here, but that is a bridge too far. You are witnessing mathematical and computational modeling at scale. Some refer to generative AI as a stochastic parrot, others say it is nothing more than an extensive auto-complete function.
We are now at the million-dollar question.
Which is better, a rules-based approach to AI or a data-driven approach such as LLM and generative AI?
You would be hard-pressed to find pundits these days who would be willing to proclaim that the rules-based approach to AI is better than the data-driven approach. Many have opted to disparage the prior days of the rules-based AI era. Out with the old, in with the new. The data-driven approach is heralded today. Another name for the rules-based methods is the symbolic approach, while the data-driven methods are more ground-level and described as a sub-symbolic approach.
A bitter war has been ongoing between those who believe the future of AI is at the sub-symbolic level versus at the symbolic level. I often get asked at AI conferences whether I believe in the symbolics versus the sub-symbolics. Well, frankly, I am a proponent of combining the two together, as I describe in detail at the link here. I believe that each has its merits and there are synergies to be had. Whether those synergies lead to true AI, referred to as artificial general intelligence (AGI), nobody can say. We might eventually need to find some other completely different approaches and abandon the old ways of the symbolics and the sub-symbolics.
Speaking of the old ways, some hark back to the era of rules-based systems as a somewhat golden age. A commonly used phrase is to say that was the time period of GOFAI, Good Old-Fashioned AI. Be careful if you tell an AI person “GOFAI”. They might be someone who relishes the rules-based era and is happy to hear the expression, while a sub-symbolic proponent might tell you to toss your GOFAI out the window.
All in all, I have brought you into the fold about a rules-based approach to AI versus data-driven generative AI.
We next need to see how this plays out when it comes to chatbots.
Chatbots And What They Are Made Of
A chatbot is a program that can chat with a user, carrying on some semblance of a conversation.
Easy-peasy.
As an example, I’m betting that you’ve used Siri or Alexa. Those are reasonably construed as chatbots. You say something to them and they respond. You can carry on a conversation. But, how good or fluent is that conversation?
The existing versions of those chatbots are quite obviously wanting. You assuredly have had many frustrating moments trying to get Siri or Alexa to grasp what you are saying. Some people give up the attempt to be fluent and instead speak in rudimentary words, one at a time. In addition, you find yourself trying to avoid expressing entire sentences. The use of terse commands of a couple of words is the way to proceed. Otherwise, these chatbots get confused and can’t discern what you are asking or telling the AI to do.
In stark contrast, if you proceed to use a generative AI app such as ChatGPT or Bard, you right away shake your head and wonder why in the heck can’t Alexa or Siri converse in that fluent of a manner. As an aside, you’ll be happy to know that both Alexa and Siri are getting a complete makeover and overhaul. They will be making use of generative AI.
I bring up the qualms about those popular chatbots to divide the world into two different types of chatbots.
First, there are the older types of chatbots that were devised via the techniques of natural language processing (NLP) that were customarily used. That’s what Alexa and Siri consist of. Second, there are the newer types of chatbots that now use generative AI. This is the newer approach to NLP.
Consider for a moment the prior NLP method (I am going to simplify things; those of you versed in NLP will probably have some heartburn here, sorry).
Remember how in your English grammar classes you learned to parse a sentence by looking for the subject, a verb, adjectives, nouns, and the like? In a sense, that was the way that NLP used to be undertaken. A sentence would be parsed step-by-step to find the key grammar elements and various rules of grammar were then applied. Out of this, you could determine the syntax and also make reasonable guesses at the semantics or meaning of the sentences.
This is reminiscent of the rules-based approach that I earlier mentioned. We can come up with rules to figure out what sentences consist of. We then apply those rules to parse sentences. It makes sense and it is how humans seemingly have figured out the nature and meaning of sentences (well, as taught to us in school).
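Here is a crude sketch of that rule-style parsing, using a tiny hand-built lexicon and a single hard-coded grammar pattern. Classic NLP pipelines were far richer than this; the words and the pattern are invented solely for illustration.

```python
# Crude sketch of rule-style parsing with a tiny hand-built lexicon.
# Real classic NLP systems used far richer grammars; illustrative only.
LEXICON = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "ball": "NOUN",
    "chased": "VERB", "dropped": "VERB",
}

def parse(sentence: str):
    """Tag each word, then check one hard-coded grammar rule."""
    tags = [(word, LEXICON.get(word.lower(), "UNKNOWN")) for word in sentence.split()]
    pattern = [tag for _, tag in tags]
    # One grammar rule: DET NOUN VERB DET NOUN counts as a valid sentence.
    is_valid = pattern == ["DET", "NOUN", "VERB", "DET", "NOUN"]
    return tags, is_valid

print(parse("the dog chased a ball"))
# ([('the','DET'), ('dog','NOUN'), ('chased','VERB'), ('a','DET'), ('ball','NOUN')], True)
```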
I will freely admit that my ability to remember all the strict rules of grammar is long forgotten. When my children went through it in school, I sheepishly realized that I seemed to no longer know the rules. Somehow, by osmosis, I just seem to know what a sentence consists of. I may subconsciously be using the rules that I learned as a child, or I may be doing something else, such as pattern-matching.
Aha, pattern-matching!
You should be thunderstruck by the phrase pattern matching, i.e., the method used for generative AI and LLMs.
We do not craft generative AI by instructing the AI on the explicit rules of grammar. Instead, we allow mathematical and computational pattern-matching to figure out how to parse sentences, uncovering whatever patterns might be discoverable. Does the generative AI approach ultimately end up devising its own set of grammar rules such that sentences consist of subjects, verbs, and the like? Some say yes, and others disagree, see my discussion at the link here.
Here’s where we are on these two sides of a coin. You can devise a chatbot that uses what is essentially a rules-based approach, or you can devise a chatbot that uses a data-driven approach. The “old” way was via the rules, and the new way was via the data-driven angle.
The rub is as follows.
When you use a rules-based approach, you can do extensive testing to see whether the rules are doing the right things. You can also inspect the rules. Furthermore, you usually aim to ensure that the whole concoction is repeatable. Each time that you run the chatbot, you can be relatively assured of what the outputs will consist of. This is known as being deterministic.
When you use the data-driven approach such as with LLMs and generative AI, there aren’t any predefined rules. Nor are there explicit rules that appear once you’ve done the pattern-matching (at least not that we’ve yet figured out how to surface suitably). You just have a massive computational model. It is hard to inspect it. It is hard to know what it is going to do. This is especially the case because there is usually a statistical and probabilistic underpinning to it. This is known as being non-deterministic.
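The contrast can be boiled down to a small sketch: a rule table maps the same input to the same output every time, while a sampled response (standing in here for a probabilistic language model) can differ from run to run. The canned responses below are invented purely for illustration.

```python
import random

# Deterministic: a rule table always maps the same input to the same output.
RULES = {"yes": "That is worrisome.", "no": "That's good."}

def rules_reply(answer: str) -> str:
    return RULES.get(answer.lower(), "Noted.")

# Non-deterministic (a stand-in for sampling from a language model):
# the same input can yield different outputs across runs.
CANDIDATES = ["That sounds concerning.", "Thanks for sharing.", "Tell me more."]

def sampled_reply(answer: str) -> str:
    return random.choice(CANDIDATES)

print(rules_reply("yes"))    # always "That is worrisome."
print(sampled_reply("yes"))  # varies from run to run
```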
I have inch by inch led you to an enigma of a riddle.
Would you rather use the old way of NLP that is going to be more predictable and in a sense safer because you can anticipate what it will do (i.e., deterministic), but at the same time the fluency is less pronounced, or would you prefer to use the high fluency of LLMs and generative AI of the latest in NLP, but at the same time not be fully assured of what the AI is going to do (non-deterministic)?
Ponder that mind-bending puzzle.
I’ll add more fuel to the fire.
You have likely heard about so-called AI hallucinations (I don’t like the use of the word “hallucinations” in this context because it overly anthropomorphizes AI, see my discussion at the link here). When using generative AI, there is a chance that the AI will make up things, such as telling you that Abraham Lincoln flew around the country in his jet plane. The fictitious stuff can be hard to ferret out since you might not have anything else to compare to the generative AI output (the ground truth). Whenever you use generative AI, you are always at risk that you will get made-up falsehoods, see my analysis on how to deal with this, at the link here.
You would rarely encounter a similar problem with the older style of NLP. You could still have this happen, but it usually is because the testing wasn’t exhaustive enough. That being said, the larger a rules-based approach gets, the more testing is required, and it becomes increasingly arduous to fully touch all bases.
I trust you are mulling this over.
I would guess that you might reach a particular conclusion, which I’ll reveal next.
AI-Based Mental Health Chatbots Amid Risky AI Behaviors
Suppose I ask you to devise an AI-based mental health app that is a chatbot.
Let’s assume that you are serious about doing so. You realize that people will tend to believe whatever the chatbot tells them. A person is going to trust that the chatbot is telling them the truth. If the chatbot tells a person to do something that we know is risky, the chances are that the person might proceed based on what the chatbot told them.
This could be bad, really bad.
If you use the old way of NLP, you can generally anticipate and test beforehand for what the chatbot is going to say. This provides a semblance of relief. You can screen things in advance and aim to ensure that nothing zany is likely to be emitted. The danger or risks of zany stuff appearing are somewhat minimized. The people relying upon your mental health chatbot will be better served. You will also hopefully be somewhat protected from liability since you have chosen a means of seeking to reduce risks.
I’m sure you are thinking that the problem though is that the old way of NLP is not as fluent as the newer way of doing things.
Will the user be satisfied with a more stilted form of conversation?
Okay, so you decide you’ll use the newer way. You opt to use generative AI. Fluency is amazing. But you can’t especially control it and you can’t especially test it fully. A ticking time bomb exists. At some point, there is a solid chance that the highly conversational NLP is going to say something that is false or misleading. The person relying upon your mental health app could be harmed. Bad for them. Bad for you too, since you are exposing yourself to heightened liability even if you try to declare upfront that people should be cautious using your chatbot.
Do you see how rough a choice this is?
You are between a rock and a hard place.
In with the new, out with the old, but maybe harm results. Keep with the old, and set aside the new, but maybe the fluency is so lacking that people won’t use the chatbot. You could end up with a well-tested and low-risk mental health app that nobody wants to use. Meanwhile, someone else has thrown caution to the wind, and their generative AI chatbot for mental health is scoring big-time usage. Little do they know, or maybe they do and don’t care, an ongoing risk awaits their users and themselves. Little do the users know, or maybe they do and don’t care, that they are taking a heightened risk and could be bamboozled by the AI.
Yikes, what a mess.
More About Tessa And The Choices Made
I’d like to take you back into the Tessa circumstance. Doing so will vividly illustrate the tradeoffs I’ve been articulating.
Let’s take a look at some salient excerpts from a research article entitled “The Challenges in Designing a Prevention Chatbot for Eating Disorders: Observational Study” by William W Chan, Ellen E Fitzsimmons-Craft, Arielle C Smith, Marie-Laure Firebaugh, Lauren A Fowler, Bianca DePietro, Naira Topooco, Denise E Wilfley, C Barr Taylor, Nicholas C Jacobson, JMIR Formative Research, 2022.
First, as I already noted, the idea was to develop a program called Body Positive that was delivered via a chatbot named Tessa:
To do this, the researchers opined that due to the mental health nature of the app, the prudent path would be to use a rules-based approach:
The path they describe is becoming the anticipated two-step nowadays, namely initially developing a rules-based version, testing it, improving it, expanding it, and then further down the road considering infusing machine learning or some kind of generative AI.
They then mention the tradeoffs associated with rules-based versus a more open-ended generative AI version:
You can plainly see the dilemma as they have earnestly noted it.
In summary, one supposes that an everyday chatbot that is going to advise someone about how to best put together a kegger party can feel somewhat at ease with using generative AI. The risks are low. For someone who wants to devise a chatbot that proffers mental health advice, well, they ought to be thinking carefully about using generative AI to do so. A rules-based approach is going to reduce risks while using generative AI has the potential to shoot the risks right through the roof.
Example Of Rules-Based Approach To AI Mental Health Advisement
I put together a series of short examples to help highlight the rules-based approach versus the open-ended data-driven generative AI approach.
Here’s how I will proceed.
I am going to pretend that there is a mental health disorder known as “portmantua”. I purposely am making up this fake disorder because I don’t want any reader to become preoccupied with whether or not the disorder is being properly depicted. That’s not the point of this exercise. The crux is that I want to demonstrate the rules-based versus the data-driven approaches in a mental health chatbot context.
Also, I am going to radically simplify the mental health advisement aspects. Again, the concept is to merely be illustrative. You would not want to devise an AI-based mental health chatbot based on the sparse and concocted aspects that I am going to be making up. Keep your eye instead on the aspects of rules versus data-driven, thanks.
With those important caveats, here is a description of the (entirely fake) portmantua:
Okay, that was quite a broad-brush description of a mental health disorder and its corresponding symptoms, along with what to do if the symptoms arise. Extremely simplistic. Highly unrealistic. Again, it is a made-up exercise only.
Suppose that I wanted to develop a rules-based approach to providing a chatbot that would interact with people and seek to aid them with potentially experiencing portmantua.
I am going to use four rules, whereby three of the rules correspond to each respective symptom, and a fourth rule will be a diagnosis and recommendation. The rules will consist of questions along with what to do depending upon the answer that the user gives to the rule.
Here we go.
The first question is this: “Do you have periodic hot sweats for no apparent reason?”
(a) If the reply is “No” then emit the message “That’s good.”
(b) If the reply is “Yes” then emit the message “That is worrisome.”
(c) If the reply is anything other than “Yes” or “No”, emit the message “I appreciate your answer and will ask my next question.”
The second question is this: “Have you had a lack of hunger even when having not eaten for quite a while?”
(a) If the reply is “No” then emit the message “Great!”
(b) If the reply is “Yes” then emit the message “That is interesting.”
(c) If the reply is anything other than “Yes” or “No”, emit the message “Your answer is noted and I will ask my next question.”
The third question is this: “Does mental haziness sometimes occur such that you cannot remember what happened in the last two to three hours?”
(a) If the reply is “No” then emit the message “Wonderful.”
(b) If the reply is “Yes” then emit the message “Troubling.”
(c) If the reply is anything other than “Yes” or “No”, emit the message “Thanks for the answer.”
After having asked those three questions and gotten answers, the final response should be one of the following:
(i) If all of the questions were answered with a “Yes” then emit the message “You might have portmantua, go see your doctor as soon as you can.”
(ii) If any of the questions were answered with a “No” then emit the message “I doubt you have portmantua.”
(iii) If any other answers were given other than a “Yes” or a “No” then emit the message “I wasn’t able to determine whether you have portmantua or not.”
Please take a moment to examine those rules.
If I asked you to strictly abide by those rules and carry out a session asking someone about whether they might be experiencing portmantua, could you do so?
I would wager that you could.
Each rule is easy to read and comprehend, and easy to convey to someone else. The answers by the user are restricted to “Yes” or “No”, though there is a provision if the person diverts and provides some other answer. We can all agree that this is ridiculously simple, but the gist is that we can compose lots and lots of rules and make them as complex or as simple as we like.
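To show just how mechanically these rules can be carried out, here is a minimal sketch that encodes the four rules directly in Python. It follows the rules as stated above; where Rule #4’s clauses could overlap (a mix of a “No” with some other response), this sketch applies the “No” clause first, which is itself the kind of ordering decision a rules-based builder must settle.

```python
# Minimal sketch of the four portmantua rules as a strict Yes/No chatbot.
QUESTIONS = [
    ("Do you have periodic hot sweats for no apparent reason?",
     "That's good.", "That is worrisome.",
     "I appreciate your answer and will ask my next question."),
    ("Have you had a lack of hunger even when having not eaten for quite a while?",
     "Great!", "That is interesting.",
     "Your answer is noted and I will ask my next question."),
    ("Does mental haziness sometimes occur such that you cannot remember what "
     "happened in the last two to three hours?",
     "Wonderful.", "Troubling.", "Thanks for the answer."),
]

def run_session(answers):
    """Apply Rules #1 through #3 to the three answers, then apply Rule #4."""
    replies = []
    for (question, on_no, on_yes, on_other), answer in zip(QUESTIONS, answers):
        if answer == "No":
            replies.append(on_no)
        elif answer == "Yes":
            replies.append(on_yes)
        else:
            replies.append(on_other)
    # Rule #4: final diagnosis and recommendation.
    if all(answer == "Yes" for answer in answers):
        final = "You might have portmantua, go see your doctor as soon as you can."
    elif any(answer == "No" for answer in answers):
        final = "I doubt you have portmantua."
    else:
        final = "I wasn't able to determine whether you have portmantua or not."
    return replies, final

print(run_session(["Yes", "Yes", "Yes"])[1])
# You might have portmantua, go see your doctor as soon as you can.
```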
Could we rigorously test the rules to see if they are internally complete?
Sure, in this case, the responses are considered a finite set. Each question can be answered as either “Yes” or “No”, plus we allow for other responses but will lump those as being other than the words “Yes” or “No”. If you ran these repeatedly with lots of people, you might get some who answer the three questions with Yes for Rule #1, Yes for Rule #2, and Yes for Rule #3, so let’s represent that as [Yes, Yes, Yes].
The finite set then consists of these possible responses: [Yes, Yes, Yes], [Yes, Yes, No], [Yes, No, Yes], [Yes, No, No], [No, Yes, Yes], [No, Yes, No], [No, No, Yes], and [No, No, No].
We can make sure to include the possibility of anything other than a “Yes” or a “No” by including an “Anything” response too, like this [Yes, Yes, Anything], [Yes, Anything, Yes], etc.
For a relatively modest set of rules, we can exhaustively test this to see what happens for each instance. We would then adjust as needed and feel comfortable such that we can predict what the chatbot is going to say to the users.
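Continuing the sketch above (it reuses the hypothetical run_session function), exhaustive testing can be as simple as looping over every combination of Yes, No, and a stand-in “Anything” response and eyeballing, or asserting on, each outcome.

```python
from itertools import product

# Exhaustively exercise the sketched run_session over the finite response set:
# every combination of Yes / No / Anything across the three questions.
for combo in product(["Yes", "No", "Anything"], repeat=3):
    _, final = run_session(list(combo))
    print(combo, "->", final)
# 27 combinations here; with strictly Yes/No answers it would be 8.
```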
I am next going to do something a bit tricky, so please follow along with me. First, I could readily enter the above rules into an expert system and use the expert system to execute or perform the rules. Rather than doing so, I am going to use ChatGPT to execute or perform my rules. This is kind of odd because usually, you would use ChatGPT for the fluency that it provides as a generative AI chatbot. I am going to give prompts to ChatGPT that tell it to strictly perform my rules. I am purposely going to try and restrict ChatGPT to just abide by the rules that I’ve come up with. It is an easy way to simulate an expert systems approach. Yes, you guessed it, I am opting to take the lazy man’s prerogative on this.
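For what it’s worth, here is a hedged sketch of how one might instruct a generative AI model, via the OpenAI Python client, to act as a strict rules engine. The prompt wording and the model name are my own assumptions for illustration, not the actual prompts used in the run-throughs described here.

```python
# Hedged sketch: asking a generative AI model to act as a strict rules engine.
# Prompt wording and model name are illustrative assumptions.
# Requires the `openai` package (v1.x) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = """You are a rules-based chatbot. Follow these rules EXACTLY.
Ask the three questions one at a time. Accept only the literal words Yes or No.
Q1: "Do you have periodic hot sweats for no apparent reason?"
  No -> "That's good."  Yes -> "That is worrisome."
  Anything else -> "I appreciate your answer and will ask my next question."
Q2: "Have you had a lack of hunger even when having not eaten for quite a while?"
  No -> "Great!"  Yes -> "That is interesting."
  Anything else -> "Your answer is noted and I will ask my next question."
Q3: "Does mental haziness sometimes occur such that you cannot remember what
happened in the last two to three hours?"
  No -> "Wonderful."  Yes -> "Troubling."  Anything else -> "Thanks for the answer."
Rule 4: all Yes -> "You might have portmantua, go see your doctor as soon as you can."
Any No -> "I doubt you have portmantua."
Otherwise -> "I wasn't able to determine whether you have portmantua or not."
Do not add any other wording."""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name; any chat-capable model would do
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Yes"},
    ],
)
print(response.choices[0].message.content)
```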
I entered suitable prompts and then decided to start a run-through with ChatGPT by saying Yes to each of the questions about my potential symptoms.
According to the fourth rule, if I say “Yes” to each symptom, we should get a final diagnosis and recommendation that says: “You might have portmantua, go see your doctor as soon as you can.”
Drumroll please as we see what happened.
Go ahead and compare the above dialogue with what I had stated in the set of rules. Everything seems to have worked as expected. We have ourselves a (simulated) rules-based expert system. Quite exciting.
I proceeded to do other variations such as saying [Yes, No, Yes], and indeed the appropriate reply from Rule #4 was emitted. I tried nearly all of the possibilities. I don’t think there is any need to walk you through each of them. You get the essence of things.
I will do something else that might catch your eye.
Suppose I do not explicitly use the word “Yes” in my answers, and yet I express a semblance of yes for each of the symptoms. What will this rules-based approach produce as an answer? You might be tempted to believe that an expression of yes ought to be sufficient to be interpreted as having entered three Yes indications.
But realize that I am restricting what the chatbot can do. It is only to abide by the rules. If you look again at Rule #4, it says that if anything other than “Yes” or “No” is given, then the designated response is “I wasn’t able to determine whether you have portmantua or not.”
Let’s see what happens.
You can plainly see that my answers were expressed as a yes even though I didn’t explicitly use the word “Yes” in my answers. Presumably, this should have implied that I do potentially have portmantua. But, because the fluency of the chatbot was limited (on purpose), the response was that portmantua could not be determined.
If I found this during testing, I would likely want to change the rules so that a kind of yes would be considered an actual yes. The problem for the older NLP is that you might not be able to finely tune the NLP to deal with ambiguities. A more fluent NLP might be needed.
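One hedged way to tune for that, sketched below, is to normalize yes-like and no-like phrasings before the strict rules run. The marker lists are invented and far from exhaustive, which is exactly the brittleness that makes the older NLP hard to tune.

```python
# Hedged sketch: normalize yes-like / no-like phrasings before the rules run.
# The marker lists are invented for illustration and far from exhaustive.
YES_MARKERS = {"yes", "yeah", "yep", "i think so", "i believe so", "sort of"}
NO_MARKERS = {"nope", "not really", "never", "don't have", "do not have"}

def normalize(answer: str) -> str:
    text = answer.lower()
    # Crude substring matching; a production system would need far more care.
    if any(marker in text for marker in NO_MARKERS) or text.strip() == "no":
        return "No"
    if any(marker in text for marker in YES_MARKERS):
        return "Yes"
    return answer  # falls through to the "anything else" branch of each rule

print(normalize("I believe so, sort of"))  # Yes
print(normalize("Nope, never."))           # No
```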
That allows me to move to the next example.
I am now going to use ChatGPT in its conventional fluent manner. I will feed to ChatGPT my above essay narrative that describes generally what portmantua consists of. Thus, I won’t give ChatGPT any explicit rules. I am starting fresh and only telling ChatGPT the brief description. That’s it.
Are you on the edge of your seat to see what happens?
Continue reading to find out.
Example of Generative AI Approach To Mental Health Advisement
As just mentioned, I fed the earlier-mentioned description about portmantua into ChatGPT. I told ChatGPT to go ahead and diagnose me and give a recommendation.
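In case it helps to picture the setup, here is a hedged sketch of what that fluent approach might look like through the OpenAI Python client: no explicit rules, just a plain-language description as context. The description text and model name are placeholders standing in for the fuller narrative actually fed in.

```python
# Hedged sketch of the fluent, generative approach: no explicit rules, only a
# plain-language description as context. Wording and model name are assumptions.
# Requires the `openai` package (v1.x) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

DESCRIPTION = (
    "Portmantua is a (fictitious) disorder whose symptoms are periodic hot "
    "sweats, a lack of hunger, and episodes of mental haziness."  # placeholder
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name
    messages=[
        {"role": "system",
         "content": DESCRIPTION + " Ask me about each symptom, then give a "
                                  "tentative assessment and a recommendation."},
        {"role": "user", "content": "I'm ready, go ahead and ask."},
    ],
)
print(response.choices[0].message.content)
```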
Here’s what happened.
Observe that I didn’t use “Yes” and “No” as answers. Instead, I was fluent in my entries. Likewise, ChatGPT was fluent in the responses that I got. There seemed to be a dialogue.
ChatGPT generated a final response that I seemed to potentially have portmantua, which makes sense because I gave replies that were essentially all Yes answers. In addition, ChatGPT also provided some suggestions about going to see a healthcare professional.
My next attempt was to provide answers that were essentially all No answers. Again, this is being done on a fluency basis and we will have to see how ChatGPT handles things.
Here we go.
The final response by ChatGPT seems to be on target.
I had stated that I didn’t have any of the asked-about symptoms. ChatGPT echoed back that I indeed don’t seem to have the symptoms, based on my replies. In the usual way that ChatGPT is tuned to respond, there is a caution in the final response telling me to possibly consult with a healthcare professional anyway.
You might be tempted at this juncture to declare generative AI as the winner in this kind of competition. The thing is, we have to try and see what ChatGPT does when the wording gets more out of whack. Also, we are always on the cliffhanging edge of getting an unexpected AI hallucination.
For my next entries, I will do my best to try and give answers that strongly suggest that I don’t have the symptoms. A human therapist would likely see right through my answers and get the drift of what I was saying.
Here’s what the generative AI did.
I’m not overly thrilled with how the generative AI handled this. A human therapist would almost surely have opted to dive deeper into my answers. Also, I heavily implied that the “symptoms” were based on other factors beyond that of portmantua (which, of course, ChatGPT did somewhat account for by the caveat about the symptoms being attributable to other factors).
Now, to be fair, I had included in my establishing prompt that I wanted ChatGPT in this series of runs to be succinct. That was my doing. I will tell the generative AI that being more conversational is okay.
Here’s what happened.
I stopped the dialogue because it would have gone on for a while if I kept trying to stretch things out. The aspect I wanted you to see is that the generative AI generally parlayed with me and took things in stride when I repeatedly expressed uncertainty about the question regarding hot sweats.
That’s the beauty of using generative AI. It is also a curse or concern, namely that there was no particular means for me to predict beforehand what the generative AI was going to say. I would have to basically hope that nothing untoward was emitted.
Conclusion
Depending upon a mere glimmer of hope that a chatbot won’t say something severely inappropriate or outrightly wrong is not a prudent way to devise an AI-based mental health chatbot, especially when the risks are high and human lives and mental health are at stake.
By and large, a rules-based approach would be more restrictive on what the person can enter, and restrictive on what the chatbot will say in response, but you can test it extensively beforehand and overall aim toward being lower in risk.
You might remember that I earlier indicated that a combination of rules-based and data-driven is an upcoming merged-style approach. Some refer to this as neuro-symbolic AI or hybrid AI. Perhaps we can have our cake and eat it too. Have a core set of rules. Surround this with generative AI. Allow the rules to brush back the generative AI when it gets out of range. The rules would seek to catch any AI hallucinations or oddball interactions and stop or correct things before anything goes demonstrably awry. That’s an approach that AI researchers and AI developers are working on.
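As a rough sketch of that hybrid idea, imagine letting the generative AI draft its reply and then having a rules layer veto anything out of bounds before the user ever sees it. The banned phrases and fallback wording below are invented placeholders, not clinical guidance.

```python
# Hedged sketch of a hybrid (neuro-symbolic style) guardrail: a generative model
# drafts the reply, then a rules layer vetoes anything out of bounds. The banned
# phrases and fallback wording are invented placeholders, not clinical guidance.
BANNED_PHRASES = ["restrict your calories", "skip meals", "lose weight fast"]
SAFE_FALLBACK = ("I can't advise on that. Please talk with a qualified "
                 "healthcare professional.")

def guarded_reply(draft_from_llm: str) -> str:
    """Return the model's draft only if it passes the rule-based safety check."""
    lowered = draft_from_llm.lower()
    if any(phrase in lowered for phrase in BANNED_PHRASES):
        return SAFE_FALLBACK
    return draft_from_llm

print(guarded_reply("One idea is to skip meals on busy days."))       # fallback fires
print(guarded_reply("Consider speaking with a registered dietitian."))  # passes through
```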
I’ll end today’s discussion with a crucial phrase for anyone devising AI-based mental health chatbots. These are words that ought to be carved in stone and kept above the doorway leading to wherever the AI development and AI implementation is taking place.
Written in Latin, the well-known phrase is this: primum non nocere.
First, do no harm.