arXiv — NLP / Computation & Language · · 4 min read

Lexicons and grammars for language processing: industrial or handcrafted products?

Mirrored from arXiv — NLP / Computation & Language for archival readability. Support the source by reading on the original site.

Computer Science > Computation and Language

arXiv:2606.03412 (cs)
[Submitted on 2 Jun 2026]

Title:Lexicons and grammars for language processing: industrial or handcrafted products?

Authors:Eric Laporte
View a PDF of the paper titled Lexicons and grammars for language processing: industrial or handcrafted products?, by Eric Laporte
View PDF
Abstract:During the recent years, the use of linguistic data for language processing increased progressively. Such data are now commonly called language resources. Most of the language resources used for this purpose are collections of texts as the Brown Corpus and the Penn Treebank, but electronic lexicons (WordNet, FrameNet, VerbNet, ComLex, Lexicon-Grammar...) and formal grammars (TAG...) developed recently. Most processes of construction of lexicons and grammars are manual, whereas the construction of corpora has always been highly automated. However, more and more specialists of language processing realize that the information content of lexicons and grammars is richer than that of corpora, and hence the former make more elaborate processing possible. The difference in construction time is likely to be connected with the difference in information content: the handcrafting of lexicons and grammars by linguists would make them more informative than automatically generated data. This situation can evolve into two directions: either specialists of language technology get progressively used to handling manually constructed resources, which are more informative and more complex, or the process of construction of lexicons and grammars is automated and industrialized, which is the mainstream perspective. Both evolutions are already in progress, and a tension exists between them. The relation between linguists and computer scientists depends on the future of these evolutions, since the first implies training and hiring numerous linguists, whereas the other depends essentially on solutions elaborated by computer engineers. The aim of this article is to analyse practical examples of the language resources in question, and to discuss about which of the two trends, handcrafting or generating industrially, or a combination of both, can give the best results or is the most realistic.
Subjects: Computation and Language (cs.CL)
ACM classes: I.7.0
Cite as: arXiv:2606.03412 [cs.CL]
  (or arXiv:2606.03412v1 [cs.CL] for this version)
  https://doi.org/10.48550/arXiv.2606.03412
arXiv-issued DOI via DataCite (pending registration)
Journal reference: Léxico e gramática: dos sentidos à construção da significação, Cultura acadêmica, 2009, Trilhas Lingüísticas, 16, pp.51-84

Submission history

From: Eric Laporte [view email]
[v1] Tue, 2 Jun 2026 09:55:44 UTC (374 KB)
Full-text links:

Access Paper:

    View a PDF of the paper titled Lexicons and grammars for language processing: industrial or handcrafted products?, by Eric Laporte
  • View PDF

Current browse context:

cs.CL
< prev   |   next >
Change to browse by:
cs

References & Citations

Loading...

BibTeX formatted citation

loading...
Data provided by:

Bookmark

BibSonomy Reddit
Bibliographic Tools

Bibliographic and Citation Tools

Bibliographic Explorer Toggle
Bibliographic Explorer (What is the Explorer?)
Connected Papers Toggle
Connected Papers (What is Connected Papers?)
Litmaps Toggle
Litmaps (What is Litmaps?)
scite.ai Toggle
scite Smart Citations (What are Smart Citations?)
Code, Data, Media

Code, Data and Media Associated with this Article

alphaXiv Toggle
alphaXiv (What is alphaXiv?)
Links to Code Toggle
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub Toggle
DagsHub (What is DagsHub?)
GotitPub Toggle
Gotit.pub (What is GotitPub?)
Huggingface Toggle
Hugging Face (What is Huggingface?)
ScienceCast Toggle
ScienceCast (What is ScienceCast?)
Demos

Demos

Replicate Toggle
Replicate (What is Replicate?)
Spaces Toggle
Hugging Face Spaces (What is Spaces?)
Spaces Toggle
TXYZ.AI (What is TXYZ.AI?)
Related Papers

Recommenders and Search Tools

Link to Influence Flower
Influence Flower (What are Influence Flowers?)
Core recommender toggle
CORE Recommender (What is CORE?)
About arXivLabs

arXivLabs: experimental projects with community collaborators

arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.

Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.

Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.

Discussion (0)

Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.

Sign in →

No comments yet. Sign in and be the first to say something.

More from arXiv — NLP / Computation & Language