Model releases: 500 articles archived under #model-release OpenAI official-blog 13d ago A near-autonomous AI chemist improves a challenging reaction in medicinal chemistry OpenAI and Molecule.one show how a near-autonomous AI chemist using GPT-5.4 improved a key drug-making reaction, advancing medicinal chemistry research. 24 Hugging Face official-blog 13d ago GLM-5.2: Built for Long-Horizon Tasks Team Article Published June 17, 2026 Z.AI (zai-org) We're introducing GLM-5.2, our latest flagship model for long-horizon tasks. It marks a substantial leap in long-horizon task capability over its… 18 r/LocalLLaMA community 13d ago It looks like Rio 3.5 397B could've simply been a semi-failed embezzling of funding Here is the chain of events: The model training received funding of R$500K (about $100K USD). The initial model documentation claimed that it was developed on top of Qwen 3.5 397B with fancy training and great improvements. It was discovered that the model was a cheap, simple… 30 Hugging Face Daily Papers research 13d ago Text-Vision Co-Instructed Image Editing Abstract A unified text-visual image editing framework is presented that combines semantic intent from textual instructions with spatial guidance from visual prompts to achieve more precise and faithful image manipulation. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Existing… 16 Hugging Face Daily Papers research 13d ago Learning from the Self-future: On-policy Self-distillation for dLLMs Abstract d-OPSD introduces a novel on-policy self-distillation framework for diffusion language models by adapting self-teacher construction and supervision mechanisms to match the non-autoregressive nature of diffusion models. 
Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 29 Hugging Face Daily Papers research 13d ago LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling Abstract Parallel loop Transformers achieve better code generation performance with two loops due to refined representations, while additional loops cause diminishing returns and increased positional mismatch costs. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Looped… 5 Hugging Face Daily Papers research 13d ago Variable-Width Transformers Abstract A novel transformer architecture with nonuniform width allocation across layers achieves better performance and efficiency compared to uniform designs by utilizing a parameter-free residual resizing mechanism. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Scaling model… 5 arXiv — Machine Learning research 13d ago Noise-Driven Escape from Metastable Phases explains Grokking in Deep Neural Networks arXiv:2606.17120v1 Announce Type: new Abstract: Deep neural networks (DNNs) exhibit first order phase transitions under variations of the L2 regularization strength, with each transition marking the onset of a new learnable feature. Below a critical regularization strength, all… 34 arXiv — Machine Learning research 13d ago The Discrete-Log Clock: How a Transformer Learns Modular Multiplication arXiv:2606.17399v1 Announce Type: new Abstract: When small transformers grok modular multiplication, prior work reports that the learned embedding has a "dense" Fourier spectrum requiring all frequencies. This contrasts with modular addition, where only a sparse set of key… 11 arXiv — Machine Learning research 13d ago Credibility-Weighted Pricing of Autonomous Vehicle Liability Under Operational Design Domain Shift arXiv:2606.17451v1 Announce Type: new Abstract: Automated Driving System deployments create a foundational ratemaking challenge: sparse experience, shifting operational design domains, and non-stationary risk across software releases. 
We propose a hierarchical Bayesian… 22 arXiv — NLP / Computation & Language research 13d ago Looped World Models arXiv:2606.18208v1 Announce Type: cross Abstract: Current world models face a fundamental tension: faithful long-horizon simulation demands deep computation, but deeper models are expensive to deploy and prone to compounding errors. We resolve this by introducing Looped World… 19 arXiv — NLP / Computation & Language research 13d ago Self-Generated Error Training for Token Editing in Diffusion Language Models arXiv:2606.17175v1 Announce Type: new Abstract: Token-to-token (T2T) editing lets LLaDA2.1 revise committed tokens during block-diffusion decoding. The released recipe trains this editor on random vocabulary corruptions, but at inference the editor sees the model's own fluent,… 25 arXiv — NLP / Computation & Language research 13d ago MLLP-VRAIN UPV system for the IWSLT 2026 Simultaneous Speech Translation task arXiv:2606.17255v1 Announce Type: new Abstract: This work describes the participation of the MLLP-VRAIN research group in the shared task of the IWSLT 2026 Simultaneous Speech Translation track. Our submission utilizes the recently released Parakeet and Qwen 3.5 models to create… 20 arXiv — NLP / Computation & Language research 13d ago ReproRepo: Scaling Reproducibility Audits with GitHub Repository Issues arXiv:2606.18237v1 Announce Type: new Abstract: Reproducing research results from papers and released code is central to scientific progress. 
Existing works have introduced benchmarks to evaluate whether LLM agents can assist with reproducibility, but they are difficult to scale… 36 arXiv — NLP / Computation & Language research 13d ago A Red-Team Study of Anthropic Fable 5 & Opus 4.8 Models arXiv:2606.18193v1 Announce Type: cross Abstract: We evaluate the adversarial robustness of two frontier large language models (LLMs) developed by Anthropic, Fable 5 and Opus 4.8, against four families of automated jailbreak attack across 7 826 harmful intents spanning a… 6 Vercel — AI dev-tools 13d ago Introducing Vercel Connect Giving your agents access to your tools, data, and services is what makes them useful. As agents perform deeper work across systems, authenticating and authorizing that access becomes central to your application architecture. Today, agent access is usually granted through… 21 Vercel — AI dev-tools 13d ago Introducing eve Today, we are proud to introduce eve, an open-source agent framework for building, running, and scaling agents. eve is designed around the idea that building an agent should mean defining what it does without assembling all of the pieces that it needs to run in production.… 15 Hacker News — AI on Front Page community 13d ago US holds off blacklisting DeepSeek, more than 100 firms deemed security risks https://archive.ph/MlU1U Comments URL: https://news.ycombinator.com/item?id=48565498 Points: 332 # Comments: 358 28 Simon Willison community 13d ago NetNewsWire Status I find this inspiring. Brent Simmons retired a year ago, and his retirement project is making one piece of software really, really good - free from any commercial pressure. The software is NetNewsWire, first released in 2002 and made open source in 2018. 
I've… 14 Hugging Face Daily Papers research 13d ago ChLogic: Evaluating Robustness of Logical Reasoning in Chinese Expressions Abstract ChLogic benchmark reveals persistent performance gaps between English and Chinese logical reasoning in large language models, influenced by surface realization differences and translation artifacts. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Large language models… 37 Hugging Face Daily Papers research 13d ago Show the Signal, Hide the Noise: Spectral Forcing for Pixel-Space Diffusion Abstract Spectral Forcing, a time-conditional 2D-DCT low-pass operator, improves diffusion model efficiency by explicitly separating signal from noise in pixel-space models. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Pixel-space diffusion models are trained on full-bandwidth… 32 Hugging Face Daily Papers research 13d ago ProCUA-SFT Technical Report Abstract Training computer-use agents using a large-scale synthetic dataset with automated task generation and verification achieves significantly improved performance on desktop interaction benchmarks. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Training computer-use agents… 4 Hugging Face Daily Papers research 13d ago OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation Abstract OPD-Evolver is a self-evolving agent framework that combines slow-fast co-evolution with on-policy self-distillation to enhance memory management and policy learning across multiple domains. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Memory has become a standard… 28 Hugging Face Daily Papers research 13d ago Beyond Monolingual Deep Research: Evaluating Agents and Retrievers with Cross-Lingual BrowseComp-Plus Abstract Research agents face significant challenges when evidence is in a different language than the query, with performance degrading even when gold evidence is provided directly. 
Generated by Qwen/Qwen2.5-Coder-32B-Instruct Deep research agents are increasingly evaluated on… 28 Hugging Face Daily Papers research 13d ago A Gradient Perspective on RLVR Stability and Winner Advantage Policy Optimization Abstract Training instability in reinforcement learning with verifiable rewards is analyzed through token-level gradient dynamics, leading to a stable policy optimization method that updates only on positive-advantage completions. Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 20 Hugging Face Daily Papers research 13d ago Looped World Models Abstract Looped World Models introduce iterative latent state refinement through shared transformer blocks, achieving 100x parameter efficiency while adapting computational depth to prediction complexity. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Current world models face a… 14 Hugging Face Daily Papers research 13d ago TRIAGE: Dialectical Reasoning for Explainable Risk Prediction on Irregularly Sampled Medical Time Series with LLMs Abstract A framework called TRIAGE is proposed to improve clinical early warning systems by training large language models to generate dialectical reasoning for continuous risk scoring with better calibration and interpretability. Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 29 Hugging Face Daily Papers research 13d ago Aligning Quantum Operators with Large Language Models Abstract Large language models can be adapted to understand quantum operators by mapping unitary matrices into their latent space, enabling quantum circuit synthesis and language-conditioned gate constraint specification. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Can Large… 19 r/LocalLLaMA community 13d ago Someone awhile ago did a quant shootout for Qwen3.6, I did shoddy math on it (again) submitted by /u/Diablo-D3 25 Vercel — AI dev-tools 13d ago Introducing eve, an open-source agent framework eve is now available in public preview. 
eve is an open-source framework for building, running, and scaling agents. An agent is just a directory of files, and production comes built in: Durable execution, Sandboxed compute, Human-in-the-loop approvals, Subagents, Evals. The smallest… 31 OpenAI official-blog 13d ago Introducing LifeSciBench Introducing LifeSciBench, an expert-authored, expert-reviewed benchmark for evaluating how AI systems handle real-world life science research tasks and decisions. 19 Hugging Face Daily Papers research 13d ago Attacks on Machine-Text Detectors Retain Stylistic Fingerprints Abstract Machine-text detection remains challenging despite evasion techniques, but stylistic features can provide robust defense when analyzed across multiple documents rather than individual instances. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Despite considerable progress… 17 Vercel — AI dev-tools 13d ago Vercel for Enterprise Apps and Agents Today we are introducing Vercel for Enterprise Apps and Agents, a platform that gives your entire company the ability to ship with AI safely, behind your access and security boundaries. Over the past year, employees across Vercel shipped hundreds of agents and internal apps.… 34 Ars Technica — AI news-outlet 13d ago Trump admin tries to block Clean Air Act lawsuit over xAI's gas turbines NAACP lawsuit says xAI uses gas turbines without permits for Grok data center. 19 OpenAI Python SDK releases dev-tools 13d ago v2.42.0 2.42.0 (2026-06-16) Full Changelog: v2.41.1...v2.42.0 Features api: admin spend_alerts (6134198) api: manual updates (f337bf4) api: update OpenAPI spec or Stainless config (7015158) Build System fix release workflow permissions (#3389) (a526ee8) Use CI environment for… 38 Simon Willison community 13d ago datasette 1.0a34 Release: datasette 1.0a34 Quoting the release notes: The big feature in this alpha is tools to insert, edit and delete rows within the Datasette interface. 
These features are available on table pages, and edit and delete are also available as action items on the row page. The… 36 r/LocalLLaMA community 13d ago GLM-5.2 is now 1st on Design Arena — ahead of the now unavailable Claude Fable 5. https://x.com/Designarena/status/2066940737011560652 submitted by /u/Recoil42 36 Ars Technica — AI news-outlet 13d ago Anthropic "pauses" token-based billing for its Claude Agent SDK Move originally planned for Monday would have heavily increased power users' costs. 21 r/LocalLLaMA community 13d ago Is Le Gros Chaton opensource? so i keep hearing about le gros chaton, the upcoming mistral model that allegedly destroys claude mythos, gpt-5.5, my sleep schedule, and possibly the french economy. people say it has 1b context, self-improves in real time, writes perfect code, and only hallucinates in elegant… 38 Hacker News — AI on Front Page community 13d ago GrapheneOS has been ported to Android 17 Article URL: https://discuss.grapheneos.org/d/36469-grapheneos-has-been-ported-to-android-17-and-official-releases-are-coming-soon Comments URL: https://news.ycombinator.com/item?id=48561654 Points: 273 # Comments: 110 16 r/LocalLLaMA community 13d ago GLM-5.2 just dropped open weights and it already looks weirdly strong for coding GLM-5.2 just released and the early numbers look pretty insane. 1M context window, open weights, MIT license, two reasoning effort modes, and it is already showing up near the top of coding arenas. I know every new model gets hyped for 24 hours, but this one actually looks worth… 28 r/LocalLLaMA community 13d ago GLM 5.2 API is live, weights are on HF, and ollama has it already GLM 5.2 dropped on Friday locked behind the GLM Coding Plan. That was annoying if you just wanted to test it without subscribing to another IDE tier. Two hours ago today they opened the API and pushed weights to HuggingFace under MIT. Ollama already has it. 
So now you can… 15 TechCrunch — AI news-outlet 13d ago Android 17 launches with new multitasking tools as Google expands Gemini features Google has released Android 17 and Wear OS 7, introducing new multitasking features, parental controls, security tools, and smartwatch upgrades. The launch is also accompanied by a Pixel Drop that brings Google’s latest AI models to its devices. 9 Hacker News — AI on Front Page community 13d ago GPT‑NL: a sovereign language model for the Netherlands Article URL: https://www.tno.nl/en/digital/artificial-intelligence/gpt-nl/ Comments URL: https://news.ycombinator.com/item?id=48559188 Points: 206 # Comments: 203 15 r/LocalLLaMA community 13d ago Mistral - New family of open-weight models @ July Tweet: https://xcancel.com/arthurmensch/status/2066913353860018596#m submitted by /u/pmttyji 9 Hugging Face Daily Papers research 13d ago You Don't Need Strong Assumptions: Visual Representation Learning via Temporal Differences Abstract Temporal Difference in Vision (TDV) presents a novel self-supervised learning approach for video data that eliminates traditional inductive biases by leveraging causal relationships between past and future frames. Generated by Qwen/Qwen2.5-Coder-32B-Instruct Progress in… 30 Simon Willison community 13d ago datasette-tailscale 0.1a0 Release: datasette-tailscale 0.1a0 A very experimental alpha plugin which lets you do this: datasette tailscale mydata.db \ --ts-authkey tskey-auth-xxxx --ts-hostname datasette-preview This starts a localhost Datasette server with a Tailscale sidecar that connects it to your… 10 Simon Willison community 13d ago Quoting Georgi Gerganov I can 100% attest to the fact that Qwen3.6-27B is a very capable local model for coding tasks. Over the last month and a half I've been using it almost daily, either on my M2 Ultra or on my RTX 5090 box. 
I use it for small mundane tasks at ggml-org - nothing really impressive,… 9 Hugging Face Daily Papers research 13d ago Track2View: 4D-Consistent Camera-Controlled Video Generation via Paired 3D Point Tracks Abstract Track2View generates novel camera viewpoints from videos by using 3D point tracks to establish explicit spatiotemporal correspondences, achieving superior visual quality and camera accuracy compared to existing methods. Generated by Qwen/Qwen2.5-Coder-32B-Instruct… 9 r/MachineLearning community 13d ago [ECCV 2026] Final Decisions [D] ECCV 2026 final decisions are expected to be released on June 17, 2026. Since there was no exact release time specified, results will likely roll out within 48 hours. This thread is for everyone to share updates, discuss outcomes, and support each other through the decisions.… 26