New ChatGPT model reduces unsafe replies by up to 80% – Digital Watch Observatory

ChatGPT now encourages real-world connection and offers grounding steps for distress, while ongoing evaluation will refine safeguards and taxonomies.
OpenAI has updated ChatGPT’s default model after working with more than 170 mental health clinicians to help the system better spot distress, de-escalate conversations and point users to real-world support.
The update routes sensitive exchanges to safer models, expands access to crisis hotlines and adds gentle prompts to take breaks, aiming to reduce harmful responses rather than simply offering more content.
Measured improvements are significant across three priority areas: severe mental health symptoms such as psychosis and mania, self-harm and suicide, and unhealthy emotional reliance on AI.
OpenAI reports that undesired responses fell by 65 to 80 percent in production traffic and that independent clinician reviews show significant gains compared with earlier models. At the same time, rare but high-risk scenarios remain a focus for further testing.
The company used a five-step process to shape the changes: define harms, measure them, validate approaches with experts, mitigate risks through post-training and product updates, and keep iterating.
Evaluations combine real-world traffic estimates with structured adversarial tests. Improved ChatGPT safeguards are in place now, and further refinements are planned as understanding and measurement methods evolve.
The Digital Watch is an initiative of the Geneva Internet Platform, supported by the Swiss Confederation and the Republic and Canton of Geneva. The GIP is operated by DiploFoundation.

source

Jesse

https://playwithchatgtp.com