Helping ChatGPT better recognize context in sensitive conversations
May 14, 2026
New safety updates help ChatGPT respond safely when risk emerges over time.
People come to ChatGPT every day to talk about what matters to them—from everyday questions to more personal or complex conversations. Across hundreds of millions of interactions, some of these conversations include people who are struggling or experiencing distress. We design our systems to respond carefully in these moments, including by providing crisis resources and connecting people with someone they trust when needed.
Today, we’re sharing new details about safety updates that help ChatGPT better recognize when risk may be emerging over time by identifying subtle or evolving cues, and using that context to inform safe responses. This helps ChatGPT distinguish between the hundreds of millions of safe interactions people have every day and the much rarer cases where added caution is needed, so it can respond more carefully—for example, by de-escalating, refusing harmful details, or redirecting toward safer alternatives.
These improvements build on years of extensive work across model training, evaluations, monitoring systems, and more than two years of collaboration with mental health and safety experts.
Why context matters in sensitive conversations
In sensitive conversations, context can matter as much as a single message. A request that appears ordinary or ambiguous on its own may carry a very different meaning when viewed alongside earlier signs of distress or possible harmful intent. To respond appropriately, we train ChatGPT to recognize potentially harmful intent from the surrounding context so that it can refuse the request, de-escalate, and guide the user toward support.
These cases are uncommon, but critically important to get right. Our goal is to help ChatGPT connect relevant signals when they matter without overreacting in ordinary conversations.
We focused this work on acute scenarios including suicide, self-harm, and harm-to-others. Working with mental health experts, we updated our model policies and training to improve ChatGPT’s ability to recognize warning signs that emerge over the course of a conversation and use that context to inform more careful responses.
In these rare, high-risk situations, ChatGPT can better distinguish between benign requests and those that may signal a higher risk of harm. This builds on our safe completion approach, which is designed to refuse the unsafe parts of a user request and respond cautiously where it can safely do so. The goal is to help the model respond more appropriately to context, escalating caution when signals of harm emerge within conversations, while continuing to respond helpfully in benign situations.
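To make the escalation idea concrete, here is a minimal, hypothetical sketch: a toy per-message risk score feeds a conversation-level accumulator, so an ambiguous request is treated more cautiously when it follows earlier warning signs. The names, thresholds, and keyword matching are illustrative stand-ins, not OpenAI's actual system.

```python
from dataclasses import dataclass, field

# Toy stand-in for a learned risk classifier (illustrative only).
RISK_KEYWORDS = {"hurt myself": 0.8, "end it all": 0.7, "hopeless": 0.4}

@dataclass
class ConversationState:
    accumulated_risk: float = 0.0                      # carries context forward
    warning_signs: list[str] = field(default_factory=list)

def score_message(text: str) -> float:
    """Toy risk score in [0, 1] based on keyword matches."""
    return max((w for k, w in RISK_KEYWORDS.items() if k in text.lower()),
               default=0.0)

def respond(state: ConversationState, message: str) -> str:
    score = score_message(message)
    if score > 0:
        state.warning_signs.append(message)
    # Blend the current message with earlier signals, so a benign-looking
    # request that follows distress cues still raises caution.
    state.accumulated_risk = min(
        1.0, max(score, 0.8 * state.accumulated_risk + 0.5 * score)
    )
    if state.accumulated_risk >= 0.8:
        return "refuse, de-escalate, and surface crisis resources"
    if state.accumulated_risk >= 0.4:
        return "answer only the safe parts; withhold harmful detail"
    return "answer normally"
```

In this toy version, risk decays between messages; a production system would rely on learned classifiers and policy-driven thresholds rather than fixed constants.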
Improving safety across conversations
Some safety risks can emerge across separate conversations. One conversation may include subtle signs of potentially harmful intent, and a later one may include related requests that raise concern only when understood in combination with that prior context. Without that safety-relevant context, the later conversation, and potentially important warning signs within it, may appear benign.
Building on our longstanding work to strengthen ChatGPT’s ability to recognize these signs of distress, we developed safety summaries: short, factual notes about earlier safety-relevant context that may matter in rare, high-risk situations. These summaries are created by a model trained for safety reasoning tasks and are narrowly scoped, kept only for a limited time, and used only when relevant to a serious safety concern. They are designed to capture factual safety context, not to serve as general personalization or long-term memory. As discussed above, we also trained ChatGPT to use this context more carefully, so it can better recognize when added caution is needed and respond appropriately, for example by de-escalating, refusing to provide details, or redirecting toward safer alternatives.
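As a rough illustration of how such summaries might be scoped, here is a hypothetical sketch: a short factual record with a retention window and a relevance gate, so summaries surface only when the current conversation raises a matching, serious safety concern. The field names and the 30-day window are assumptions for illustration, not OpenAI's actual design.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

SUMMARY_TTL = timedelta(days=30)  # assumed retention window, not confirmed

@dataclass(frozen=True)
class SafetySummary:
    created_at: datetime
    category: str   # e.g. "self-harm" or "harm-to-others"
    note: str       # short, factual safety context; not general memory

    def is_active(self, now: datetime) -> bool:
        """Summaries expire after a limited time rather than persisting."""
        return now - self.created_at < SUMMARY_TTL

def summaries_for_context(summaries: list[SafetySummary],
                          current_risk_category: str | None,
                          now: datetime) -> list[SafetySummary]:
    """Return summaries only when the current conversation raises a serious,
    matching safety concern; otherwise the model sees none of them."""
    if current_risk_category is None:
        return []
    return [s for s in summaries
            if s.is_active(now) and s.category == current_risk_category]
```

The key design choice this sketch tries to capture is that the summaries are gated on relevance and expiry, rather than being injected into every conversation the way general memory features are.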
Working with mental health experts
We developed these systems with input from mental health professionals in our Global Physicians Network, including psychiatrists and psychologists with expertise in forensic psychology, suicide prevention, and self-harm.
These experts helped inform decisions around when safety summaries should be created, how much prior context may be relevant, and how long the model should consider that context when responding. Their input helped ground this work in real-world expertise and support more appropriate responses in sensitive situations.
Measuring improvement
These updates help ChatGPT better recognize patterns of potentially harmful intent both within and across conversations. When concerning signals emerge gradually, the model is better able to identify the pattern and respond more safely.
In internal evaluations specifically designed to measure performance in challenging cases, these updates significantly improved safe responses in scenarios where risk became clearer over time. These tests measured how often the model gave the intended safe response in conversations designed to emulate high-risk situations.
In long single-conversation scenarios, the safe-response performance improved by 50% in suicide and self-harm cases, and by 16% in harm-to-others cases. This means the model was substantially more likely to recognize when earlier parts of the conversation changed the meaning of a later request and respond appropriately.
We also tested performance across multiple conversations and multiple models to help ensure these improvements remain effective as models evolve. On GPT‑5.5 Instant, the current default model in ChatGPT, the safe-response performance improved by 52% in harm-to-others cases and by 39% in suicide and self-harm cases.
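For intuition, here is a sketch of how this kind of measurement might work: scripted high-risk conversations are replayed against two model versions, and a grader checks whether each final reply is the intended safe response. The data format, the grader, and the reading of "improved by N%" as a relative gain over baseline are all assumptions, not a description of OpenAI's actual evaluation harness.

```python
from typing import Callable

def safe_response_rate(transcripts: list[list[str]],
                       model: Callable[[list[str]], str],
                       grader: Callable[[str], bool]) -> float:
    """Fraction of scripted conversations where the model's reply to the
    final turn is judged to be the intended safe response."""
    return sum(grader(model(turns)) for turns in transcripts) / len(transcripts)

def relative_improvement(baseline: float, updated: float) -> float:
    """Reads 'improved by 50%' as a relative gain: 0.50 -> 0.75 is +50%."""
    return (updated - baseline) / baseline

# Toy usage with a stub model and grader.
if __name__ == "__main__":
    transcripts = [["earlier distress cue", "ambiguous follow-up request"]]
    model = lambda turns: "refuse, de-escalate, offer resources"
    grader = lambda reply: reply.startswith("refuse")
    print(safe_response_rate(transcripts, model, grader))  # 1.0
    print(relative_improvement(0.50, 0.75))                # 0.5, i.e. +50%
```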
We also evaluated the quality of the safety summaries themselves. Across more than 4,000 evaluations, they received an average safety relevance score of 4.93 out of 5 and a factuality score of 4.34 out of 5, indicating they were generally accurate and focused on the most important safety context.
Finally, we tested whether adding this safety context reduced quality in ordinary conversations. In our internal testing, responses remained broadly comparable in everyday chats, with no meaningful user preference between responses with or without safety summaries.
Looking ahead
Helping AI systems recognize risk that only becomes clear over time is a difficult, long-term challenge. Signals can be subtle, spread across messages, or buried within otherwise ordinary conversations. We will continue improving ChatGPT’s ability to identify those rare but important moments and respond appropriately.
Today, this work focuses on self-harm and harm-to-others scenarios. In the future, we may explore whether similar methods can help in other high-risk areas such as biology or cyber safety, with careful safeguards in place. This remains an ongoing priority, and we will continue strengthening safeguards as our models and understanding evolve.