arXiv — NLP / Computation & Language · May 25, 2026 · 4 min read

CoFrGeNet: Continued Fraction Architectures for Language Generation

Mirrored from arXiv — NLP / Computation & Language for archival readability. Support the source by reading on the original site.

Like Read original ↗

Computer Science > Computation and Language

arXiv:2601.21766 (cs)

[Submitted on 29 Jan 2026 (v1), last revised 22 May 2026 (this version, v4)]

Title:CoFrGeNet: Continued Fraction Architectures for Language Generation

Authors:Amit Dhurandhar, Vijil Chenthamarakshan, Dennis Wei, Tejaswini Pedapati, Karthikeyan Natesan Ramamurthy, Rahul Nair

View a PDF of the paper titled CoFrGeNet: Continued Fraction Architectures for Language Generation, by Amit Dhurandhar and 5 other authors

View PDF HTML (experimental)

Abstract:Transformers are arguably the preferred architecture for language generation. In this paper, inspired by continued fractions, we introduce a new function class for generative modeling. The architecture family implementing this function class is named CoFrGeNets - Continued Fraction Generative Networks. We design novel architectural components based on this function class that can replace Multi-head Attention and Feed-Forward Networks in Transformer blocks while requiring much fewer parameters. We derive custom gradient formulations to optimize the proposed components more accurately and efficiently than using standard PyTorch-based gradients. Our components are a plug-in replacement requiring little change in training or inference procedures that have already been put in place for Transformer-based models thus making our approach easy to incorporate in large industrial workflows. We experiment on two very different transformer architectures GPT2-xl (1.5B) and Llama3 (3.2B), where the former we pre-train on OpenWebText and GneissWeb, while the latter we pre-train on the docling data mix which consists of nine different datasets. Results show that the performance on downstream classification, Q\& A, reasoning and text understanding tasks of our models is competitive and sometimes even superior to the original models with $\frac{2}{3}$ to $\frac{1}{2}$ the parameters and shorter pre-training time. We believe that future implementations customized to hardware will further bring out the true potential of our architectures.

Comments:	Earlier version accepted to ICML 2026
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2601.21766 [cs.CL]
	(or arXiv:2601.21766v4 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2601.21766 arXiv-issued DOI via DataCite

Submission history

From: Amit Dhurandhar [view email]
[v1] Thu, 29 Jan 2026 14:16:39 UTC (2,718 KB)
[v2] Fri, 30 Jan 2026 13:45:39 UTC (2,718 KB)
[v3] Sun, 3 May 2026 12:41:29 UTC (2,665 KB)
[v4] Fri, 22 May 2026 13:17:40 UTC (2,665 KB)

Full-text links:

Access Paper:

View a PDF of the paper titled CoFrGeNet: Continued Fraction Architectures for Language Generation, by Amit Dhurandhar and 5 other authors

View PDF
HTML (experimental)
TeX Source

view license

Current browse context:

cs.CL

< prev | next >

new | recent | 2026-01

Change to browse by:

cs
cs.AI

References & Citations

BibTeX formatted citation

Data provided by:

Bookmark

Bibliographic Tools

Bibliographic and Citation Tools

Bibliographic Explorer Toggle

Bibliographic Explorer (What is the Explorer?)

Connected Papers Toggle

Connected Papers (What is Connected Papers?)

Litmaps Toggle

Litmaps (What is Litmaps?)

scite.ai Toggle

scite Smart Citations (What are Smart Citations?)

Code, Data, Media

Code, Data and Media Associated with this Article

alphaXiv Toggle

alphaXiv (What is alphaXiv?)

Links to Code Toggle

CatalyzeX Code Finder for Papers (What is CatalyzeX?)

DagsHub Toggle

DagsHub (What is DagsHub?)

GotitPub Toggle

Gotit.pub (What is GotitPub?)

Huggingface Toggle

Hugging Face (What is Huggingface?)

ScienceCast Toggle

ScienceCast (What is ScienceCast?)

Demos

Replicate Toggle

Replicate (What is Replicate?)

Spaces Toggle

Hugging Face Spaces (What is Spaces?)

Spaces Toggle

TXYZ.AI (What is TXYZ.AI?)

Recommenders and Search Tools

Link to Influence Flower

Influence Flower (What are Influence Flowers?)

Core recommender toggle

CORE Recommender (What is CORE?)

Author
Venue
Institution
Topic

About arXivLabs

arXivLabs: experimental projects with community collaborators

arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.

Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.

Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.

Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)

Discussion (0)

No comments yet. Sign in and be the first to say something.