arXiv — NLP / Computation & Language · · 4 min read

CoFrGeNet: Continued Fraction Architectures for Language Generation

Mirrored from arXiv — NLP / Computation & Language for archival readability. Support the source by reading on the original site.

Computer Science > Computation and Language

arXiv:2601.21766 (cs)
[Submitted on 29 Jan 2026 (v1), last revised 22 May 2026 (this version, v4)]

Title:CoFrGeNet: Continued Fraction Architectures for Language Generation

View a PDF of the paper titled CoFrGeNet: Continued Fraction Architectures for Language Generation, by Amit Dhurandhar and 5 other authors
View PDF HTML (experimental)
Abstract:Transformers are arguably the preferred architecture for language generation. In this paper, inspired by continued fractions, we introduce a new function class for generative modeling. The architecture family implementing this function class is named CoFrGeNets - Continued Fraction Generative Networks. We design novel architectural components based on this function class that can replace Multi-head Attention and Feed-Forward Networks in Transformer blocks while requiring much fewer parameters. We derive custom gradient formulations to optimize the proposed components more accurately and efficiently than using standard PyTorch-based gradients. Our components are a plug-in replacement requiring little change in training or inference procedures that have already been put in place for Transformer-based models thus making our approach easy to incorporate in large industrial workflows. We experiment on two very different transformer architectures GPT2-xl (1.5B) and Llama3 (3.2B), where the former we pre-train on OpenWebText and GneissWeb, while the latter we pre-train on the docling data mix which consists of nine different datasets. Results show that the performance on downstream classification, Q\& A, reasoning and text understanding tasks of our models is competitive and sometimes even superior to the original models with $\frac{2}{3}$ to $\frac{1}{2}$ the parameters and shorter pre-training time. We believe that future implementations customized to hardware will further bring out the true potential of our architectures.
Comments: Earlier version accepted to ICML 2026
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2601.21766 [cs.CL]
  (or arXiv:2601.21766v4 [cs.CL] for this version)
  https://doi.org/10.48550/arXiv.2601.21766
arXiv-issued DOI via DataCite

Submission history

From: Amit Dhurandhar [view email]
[v1] Thu, 29 Jan 2026 14:16:39 UTC (2,718 KB)
[v2] Fri, 30 Jan 2026 13:45:39 UTC (2,718 KB)
[v3] Sun, 3 May 2026 12:41:29 UTC (2,665 KB)
[v4] Fri, 22 May 2026 13:17:40 UTC (2,665 KB)
Full-text links:

Access Paper:

Current browse context:

cs.CL
< prev   |   next >
Change to browse by:

References & Citations

Loading...

BibTeX formatted citation

loading...
Data provided by:

Bookmark

BibSonomy Reddit
Bibliographic Tools

Bibliographic and Citation Tools

Bibliographic Explorer Toggle
Bibliographic Explorer (What is the Explorer?)
Connected Papers Toggle
Connected Papers (What is Connected Papers?)
Litmaps Toggle
Litmaps (What is Litmaps?)
scite.ai Toggle
scite Smart Citations (What are Smart Citations?)
Code, Data, Media

Code, Data and Media Associated with this Article

alphaXiv Toggle
alphaXiv (What is alphaXiv?)
Links to Code Toggle
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub Toggle
DagsHub (What is DagsHub?)
GotitPub Toggle
Gotit.pub (What is GotitPub?)
Huggingface Toggle
Hugging Face (What is Huggingface?)
ScienceCast Toggle
ScienceCast (What is ScienceCast?)
Demos

Demos

Replicate Toggle
Replicate (What is Replicate?)
Spaces Toggle
Hugging Face Spaces (What is Spaces?)
Spaces Toggle
TXYZ.AI (What is TXYZ.AI?)
Related Papers

Recommenders and Search Tools

Link to Influence Flower
Influence Flower (What are Influence Flowers?)
Core recommender toggle
CORE Recommender (What is CORE?)
About arXivLabs

arXivLabs: experimental projects with community collaborators

arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.

Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.

Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.

Discussion (0)

Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.

Sign in →

No comments yet. Sign in and be the first to say something.

More from arXiv — NLP / Computation & Language