Model releases

500 articles archived under #model-release

OpenAI official-blog 13d ago

A near-autonomous AI chemist improves a challenging reaction in medicinal chemistry

OpenAI and Molecule.one show how a near-autonomous AI chemist using GPT-5.4 improved a key drug-making reaction, advancing medicinal chemistry research.

24
Hugging Face official-blog 13d ago

GLM-5.2: Built for Long-Horizon Tasks

Team Article, published June 17, 2026, by Z.AI (zai-org). We're introducing GLM-5.2, our latest flagship model for long-horizon tasks. It marks a substantial leap in long-horizon task capability over its…

18
r/LocalLLaMA community 13d ago

It looks like Rio 3.5 397B could've simply been a semi-failed embezzling of funding

Here is the chain of events: The model training received funding of R$500K (about $100K USD). The initial model documentation claimed that it was developed on top of Qwen 3.5 397B with fancy training and great improvements. It was discovered that the model was a cheap, simple…

30
Hugging Face Daily Papers research 13d ago

Text-Vision Co-Instructed Image Editing

AI-generated summary (Qwen/Qwen2.5-Coder-32B-Instruct): A unified text-visual image editing framework is presented that combines semantic intent from textual instructions with spatial guidance from visual prompts to achieve more precise and faithful image manipulation. Abstract: Existing…

16
Hugging Face Daily Papers research 13d ago

Learning from the Self-future: On-policy Self-distillation for dLLMs

AI-generated summary (Qwen/Qwen2.5-Coder-32B-Instruct): d-OPSD introduces a novel on-policy self-distillation framework for diffusion language models by adapting self-teacher construction and supervision mechanisms to match the non-autoregressive nature of diffusion models. Abstract: …

29
Hugging Face Daily Papers research 13d ago

LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling

AI-generated summary (Qwen/Qwen2.5-Coder-32B-Instruct): Parallel loop Transformers achieve better code generation performance with two loops due to refined representations, while additional loops cause diminishing returns and increased positional mismatch costs. Abstract: Looped…

5
Hugging Face Daily Papers research 13d ago

Variable-Width Transformers

AI-generated summary (Qwen/Qwen2.5-Coder-32B-Instruct): A novel transformer architecture with nonuniform width allocation across layers achieves better performance and efficiency compared to uniform designs by utilizing a parameter-free residual resizing mechanism. Abstract: Scaling model…

5
arXiv — Machine Learning research 13d ago

Noise-Driven Escape from Metastable Phases explains Grokking in Deep Neural Networks

arXiv:2606.17120v1 Announce Type: new Abstract: Deep neural networks (DNNs) exhibit first order phase transitions under variations of the L2 regularization strength, with each transition marking the onset of a new learnable feature. Below a critical regularization strength, all…

34
arXiv — Machine Learning research 13d ago

The Discrete-Log Clock: How a Transformer Learns Modular Multiplication

arXiv:2606.17399v1 Announce Type: new Abstract: When small transformers grok modular multiplication, prior work reports that the learned embedding has a "dense" Fourier spectrum requiring all frequencies. This contrasts with modular addition, where only a sparse set of key…

11
arXiv — Machine Learning research 13d ago

Credibility-Weighted Pricing of Autonomous Vehicle Liability Under Operational Design Domain Shift

arXiv:2606.17451v1 Announce Type: new Abstract: Automated Driving System deployments create a foundational ratemaking challenge: sparse experience, shifting operational design domains, and non-stationary risk across software releases. We propose a hierarchical Bayesian…

22
arXiv — NLP / Computation & Language research 13d ago

Looped World Models

arXiv:2606.18208v1 Announce Type: cross Abstract: Current world models face a fundamental tension: faithful long-horizon simulation demands deep computation, but deeper models are expensive to deploy and prone to compounding errors. We resolve this by introducing Looped World…

19
arXiv — NLP / Computation & Language research 13d ago

Self-Generated Error Training for Token Editing in Diffusion Language Models

arXiv:2606.17175v1 Announce Type: new Abstract: Token-to-token (T2T) editing lets LLaDA2.1 revise committed tokens during block-diffusion decoding. The released recipe trains this editor on random vocabulary corruptions, but at inference the editor sees the model's own fluent,…

25
arXiv — NLP / Computation & Language research 13d ago

MLLP-VRAIN UPV system for the IWSLT 2026 Simultaneous Speech Translation task

arXiv:2606.17255v1 Announce Type: new Abstract: This work describes the participation of the MLLP-VRAIN research group in the shared task of the IWSLT 2026 Simultaneous Speech Translation track. Our submission utilizes the recently released Parakeet and Qwen 3.5 models to create…

20
arXiv — NLP / Computation & Language research 13d ago

ReproRepo: Scaling Reproducibility Audits with GitHub Repository Issues

arXiv:2606.18237v1 Announce Type: new Abstract: Reproducing research results from papers and released code is central to scientific progress. Existing works have introduced benchmarks to evaluate whether LLM agents can assist with reproducibility, but they are difficult to scale…

36
arXiv — NLP / Computation & Language research 13d ago

A Red-Team Study of Anthropic Fable 5 & Opus 4.8 Models

arXiv:2606.18193v1 Announce Type: cross Abstract: We evaluate the adversarial robustness of two frontier large language models (LLMs) developed by Anthropic, Fable 5 and Opus 4.8, against four families of automated jailbreak attack across 7 826 harmful intents spanning a…

6
Vercel — AI dev-tools 13d ago

Introducing Vercel Connect

Giving your agents access to your tools, data, and services is what makes them useful. As agents perform deeper work across systems, authenticating and authorizing that access becomes central to your application architecture. Today, agent access is usually granted through…

21
Vercel — AI dev-tools 13d ago

Introducing eve

Today, we are proud to introduce eve, an open-source agent framework for building, running, and scaling agents. eve is designed around the idea that building an agent should mean defining what it does without assembling all of the pieces that it needs to run in production.…

15
Hacker News — AI on Front Page community 13d ago

US holds off blacklisting DeepSeek, more than 100 firms deemed security risks

https://archive.ph/MlU1U · Comments URL: https://news.ycombinator.com/item?id=48565498 · Points: 332 · Comments: 358

28
Simon Willison community 13d ago

NetNewsWire Status

I find this inspiring. Brent Simmons retired a year ago, and his retirement project is making one piece of software really, really good - free from any commercial pressure. The software is NetNewsWire, first released in 2002 and made open source in 2018. I've…

14
Hugging Face Daily Papers research 13d ago

ChLogic: Evaluating Robustness of Logical Reasoning in Chinese Expressions

AI-generated summary (Qwen/Qwen2.5-Coder-32B-Instruct): ChLogic benchmark reveals persistent performance gaps between English and Chinese logical reasoning in large language models, influenced by surface realization differences and translation artifacts. Abstract: Large language models…

37
Hugging Face Daily Papers research 13d ago

Show the Signal, Hide the Noise: Spectral Forcing for Pixel-Space Diffusion

AI-generated summary (Qwen/Qwen2.5-Coder-32B-Instruct): Spectral Forcing, a time-conditional 2D-DCT low-pass operator, improves diffusion model efficiency by explicitly separating signal from noise in pixel-space models. Abstract: Pixel-space diffusion models are trained on full-bandwidth…

32
Hugging Face Daily Papers research 13d ago

ProCUA-SFT Technical Report

AI-generated summary (Qwen/Qwen2.5-Coder-32B-Instruct): Training computer-use agents using a large-scale synthetic dataset with automated task generation and verification achieves significantly improved performance on desktop interaction benchmarks. Abstract: Training computer-use agents…

4
Hugging Face Daily Papers research 13d ago

OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation

AI-generated summary (Qwen/Qwen2.5-Coder-32B-Instruct): OPD-Evolver is a self-evolving agent framework that combines slow-fast co-evolution with on-policy self-distillation to enhance memory management and policy learning across multiple domains. Abstract: Memory has become a standard…

28
Hugging Face Daily Papers research 13d ago

Beyond Monolingual Deep Research: Evaluating Agents and Retrievers with Cross-Lingual BrowseComp-Plus

AI-generated summary (Qwen/Qwen2.5-Coder-32B-Instruct): Research agents face significant challenges when evidence is in a different language than the query, with performance degrading even when gold evidence is provided directly. Abstract: Deep research agents are increasingly evaluated on…

28
Hugging Face Daily Papers research 13d ago

A Gradient Perspective on RLVR Stability and Winner Advantage Policy Optimization

AI-generated summary (Qwen/Qwen2.5-Coder-32B-Instruct): Training instability in reinforcement learning with verifiable rewards is analyzed through token-level gradient dynamics, leading to a stable policy optimization method that updates only on positive-advantage completions. Abstract: …

20
Hugging Face Daily Papers research 13d ago

Looped World Models

AI-generated summary (Qwen/Qwen2.5-Coder-32B-Instruct): Looped World Models introduce iterative latent state refinement through shared transformer blocks, achieving 100x parameter efficiency while adapting computational depth to prediction complexity. Abstract: Current world models face a…

14
Hugging Face Daily Papers research 13d ago

TRIAGE: Dialectical Reasoning for Explainable Risk Prediction on Irregularly Sampled Medical Time Series with LLMs

AI-generated summary (Qwen/Qwen2.5-Coder-32B-Instruct): A framework called TRIAGE is proposed to improve clinical early warning systems by training large language models to generate dialectical reasoning for continuous risk scoring with better calibration and interpretability. Abstract: …

29
Hugging Face Daily Papers research 13d ago

Aligning Quantum Operators with Large Language Models

AI-generated summary (Qwen/Qwen2.5-Coder-32B-Instruct): Large language models can be adapted to understand quantum operators by mapping unitary matrices into their latent space, enabling quantum circuit synthesis and language-conditioned gate constraint specification. Abstract: Can Large…

19
r/LocalLLaMA community 13d ago

Someone awhile ago did a quant shootout for Qwen3.6, I did shoddy math on it (again)

Submitted by /u/Diablo-D3

25
Vercel — AI dev-tools 13d ago

Introducing eve, an open-source agent framework

eve is now available in public preview. eve is an open-source framework for building, running, and scaling agents. An agent is just a directory of files, and production comes built in: durable execution, sandboxed compute, human-in-the-loop approvals, subagents, and evals. The smallest…

31
OpenAI official-blog 13d ago

Introducing LifeSciBench

Introducing LifeSciBench, an expert-authored, expert-reviewed benchmark for evaluating how AI systems handle real-world life science research tasks and decisions.

19
Hugging Face Daily Papers research 13d ago

Attacks on Machine-Text Detectors Retain Stylistic Fingerprints

AI-generated summary (Qwen/Qwen2.5-Coder-32B-Instruct): Machine-text detection remains challenging despite evasion techniques, but stylistic features can provide robust defense when analyzed across multiple documents rather than individual instances. Abstract: Despite considerable progress…

17
Vercel — AI dev-tools 13d ago

Vercel for Enterprise Apps and Agents

Today we are introducing Vercel for Enterprise Apps and Agents, a platform that gives your entire company the ability to ship with AI safely, behind your access and security boundaries. Over the past year, employees across Vercel shipped hundreds of agents and internal apps.…

34
Ars Technica — AI news-outlet 13d ago

Trump admin tries to block Clean Air Act lawsuit over xAI's gas turbines

NAACP lawsuit says xAI uses gas turbines without permits for Grok data center.

19
OpenAI Python SDK releases dev-tools 13d ago

v2.42.0

2.42.0 (2026-06-16). Full Changelog: v2.41.1...v2.42.0. Features: api: admin spend_alerts (6134198); api: manual updates (f337bf4); api: update OpenAPI spec or Stainless config (7015158). Build System: fix release workflow permissions (#3389) (a526ee8); Use CI environment for…

38
Simon Willison community 13d ago

datasette 1.0a34

Release: datasette 1.0a34. Quoting the release notes: The big feature in this alpha is tools to insert, edit and delete rows within the Datasette interface. These features are available on table pages, and edit and delete are also available as action items on the row page. The…

36
r/LocalLLaMA community 13d ago

GLM-5.2 is now 1st on Design Arena — ahead of the now unavailable Claude Fable 5.

https://x.com/Designarena/status/2066940737011560652 · Submitted by /u/Recoil42

36
Ars Technica — AI news-outlet 13d ago

Anthropic "pauses" token-based billing for its Claude Agent SDK

Move originally planned for Monday would have heavily increased power users' costs.

21
r/LocalLLaMA community 13d ago

Is Le Gros Chaton opensource?

so i keep hearing about le gros chaton, the upcoming mistral model that allegedly destroys claude mythos, gpt-5.5, my sleep schedule, and possibly the french economy. people say it has 1b context, self-improves in real time, writes perfect code, and only hallucinates in elegant…

38
Hacker News — AI on Front Page community 13d ago

GrapheneOS has been ported to Android 17

Article URL: https://discuss.grapheneos.org/d/36469-grapheneos-has-been-ported-to-android-17-and-official-releases-are-coming-soon · Comments URL: https://news.ycombinator.com/item?id=48561654 · Points: 273 · Comments: 110

16
r/LocalLLaMA community 13d ago

GLM-5.2 just dropped open weights and it already looks weirdly strong for coding

GLM-5.2 just released and the early numbers look pretty insane. 1M context window, open weights, MIT license, two reasoning effort modes, and it is already showing up near the top of coding arenas. I know every new model gets hyped for 24 hours, but this one actually looks worth…

28
r/LocalLLaMA community 13d ago

GLM 5.2 API is live, weights are on HF, and ollama has it already

GLM 5.2 dropped on Friday locked behind the GLM Coding Plan. That was annoying if you just wanted to test it without subscribing to another IDE tier. Two hours ago today they opened the API and pushed weights to HuggingFace under MIT. Ollama already has it. So now you can…

15
TechCrunch — AI news-outlet 13d ago

Android 17 launches with new multitasking tools as Google expands Gemini features

Google has released Android 17 and Wear OS 7, introducing new multitasking features, parental controls, security tools, and smartwatch upgrades. The launch is also accompanied by a Pixel Drop that brings Google’s latest AI models to its devices.

9
Hacker News — AI on Front Page community 13d ago

GPT‑NL: a sovereign language model for the Netherlands

Article URL: https://www.tno.nl/en/digital/artificial-intelligence/gpt-nl/ · Comments URL: https://news.ycombinator.com/item?id=48559188 · Points: 206 · Comments: 203

15
r/LocalLLaMA community 13d ago

Mistral - New family of open-weight models @ July

Tweet: https://xcancel.com/arthurmensch/status/2066913353860018596#m · Submitted by /u/pmttyji

9
Hugging Face Daily Papers research 13d ago

You Don't Need Strong Assumptions: Visual Representation Learning via Temporal Differences

AI-generated summary (Qwen/Qwen2.5-Coder-32B-Instruct): Temporal Difference in Vision (TDV) presents a novel self-supervised learning approach for video data that eliminates traditional inductive biases by leveraging causal relationships between past and future frames. Abstract: Progress in…

30
Simon Willison community 13d ago

datasette-tailscale 0.1a0

Release: datasette-tailscale 0.1a0. A very experimental alpha plugin which lets you do this:

    datasette tailscale mydata.db \
      --ts-authkey tskey-auth-xxxx --ts-hostname datasette-preview

This starts a localhost Datasette server with a Tailscale sidecar that connects it to your…

10
Simon Willison community 13d ago

Quoting Georgi Gerganov

I can 100% attest to the fact that Qwen3.6-27B is a very capable local model for coding tasks. Over the last month and a half I've been using it almost daily, either on my M2 Ultra or on my RTX 5090 box. I use it for small mundane tasks at ggml-org - nothing really impressive,…

9
Hugging Face Daily Papers research 13d ago

Track2View: 4D-Consistent Camera-Controlled Video Generation via Paired 3D Point Tracks

AI-generated summary (Qwen/Qwen2.5-Coder-32B-Instruct): Track2View generates novel camera viewpoints from videos by using 3D point tracks to establish explicit spatiotemporal correspondences, achieving superior visual quality and camera accuracy compared to existing methods. Abstract: …

9
r/MachineLearning community 13d ago

[ECCV 2026] Final Decisions [D]

ECCV 2026 final decisions are expected to be released on June 17, 2026. Since there was no exact release time specified, results will likely roll out within 48 hours. This thread is for everyone to share updates, discuss outcomes, and support each other through the decisions.…

26
