r/LocalLLaMA · 5 min read

Book Review: Domain-Specific Small Language Models by Guglielmo Iozzia


Review by u/skiata

I came across Domain-Specific Small Language Models (https://www.manning.com/books/domain-specific-small-language-models) when I attended the author's ACM Tech Talk (https://learning.acm.org/techtalks) on June 25--a book tour for nerds, I suppose.

My background and orientation

It's useful to have an idea of a reviewer's orientation towards the book to help calibrate the review. So real quick:

  • I am an AI time-traveller: I founded my first company in 1999, was involved with LingPipe, an early open source NLP toolkit, and have built more than 50 but fewer than 500 (depending on how you count) AI systems spanning legal, defense and finance, and have done research for DARPA, NIH and so on.
  • I work with SLMs (small language models) all the time.
  • I have nothing to do with the publisher, Manning. Bought the book like a regular schmoe.
  • I don't know Guglielmo Iozzia, but technically speaking he is clearly a brother from another Nonna and I get where he is coming from.

TL;DR

Not a beginner book, but accessible to a manager familiar with the LLM space. A recipe book that dives into details on an important topic, with a good overview. Useful thoughts/discussions will follow.

Review

This book argues that SLMs (small language models) are the wave of the future, so pull your head out of OpenAI's *** (generalist LLMs) and get with the program of creating specialized SLMs fine-tuned to the needs at hand.

The best lines came from Iozzia's talk:

The book argues a paradigm shift ...

  • from renting intelligence to owning it
  • from general capability to specific mastery
  • from centralized intelligence to distributed intelligence

Iozzia provides a general framework for approaching domain-specific language models (honestly, 'small' is irrelevant) and backs it with sufficient juice to make this an argument from example rather than from principles, popularity or hipness.

Excellent. My kind of book.

The book would have "fit better" a year ago, when fine-tuning was top of mind for LLM practitioners: the vibe back then was "how to fine-tune", while the current vibe is "is fine-tuning worth it? Probably not". But I don't let breathless predictions of generalist AGI and massive IPOs dictate my engineering decisions and neither should you.

I rather appreciated the stance on AGI, I quote:

In early 2023, large tech organizations started rushing to “win” the LLM race and reach so-called AGI (artificial general intelligence), fueled by daily hype. That push continued through 2024 and early 2025 and led to larger and larger language models, based on the assumption that more data and more compute (and lately also time-scale compute) would make these models reason like humans across a wide range of tasks, rather than excel at a single narrow task (or a small set), as with today’s ML/AI. The reality is that, because of their architecture, language models based on Transformer variants won’t converge to AGI. They are, however, useful for narrow but nontrivial tasks when tuned on high-quality domain-specific datasets or integrated into a broader system.

I guess he, with me, will be the first against the wall when AGI happens.

The particular use cases (pharma and general multi-agent toy systems) don't matter; the architectures and laundry lists of libraries do. In particular, we have:

  1. How to fine-tune
  2. How to quantize
  3. RAG
  4. Graph-DBs
  5. Parameter optimization
  6. Multi-agent
  7. Production deployment
  8. Run on your laptop (underrated exercise IMHO)
  9. A rather enjoyable Formula-1 analogy in chapter 13.

None of it in great detail, but enough to get started. Perfect. That is where the value is--get control, get visibility into what your LMs are doing and tune the crap out of them.

Criticisms

Over half the book is recipes, and a minor criticism is that the LLM universe has moved considerably since some of the recipes were written. Unsolvable, but the value remains because even 2-year-old frameworks are a useful starting place if you happen to want to build a RAG-graph-db multi-agent SLM system.

More seriously, Iozzia fails to convey how hard it is to fine-tune an LM, Small or Large. It is akin to going to the dealership and buying a Miata vs building your own race car. It is 10 to 100 times the effort in my experience. A fine-tuned model may well fix your problems, but you are going to have to work for it.

Relatedly, the skills necessary to fine-tune are rare. It is like building AI systems at the turn of the century (ha, just made a bunch of people feel very old).

There is limited discussion of evaluation harnesses (3.4, 4.1, ...) in a tactical role. Evaluation functions as the spine of any serious project; it is not an add-on. I'd have organized the entire book around evaluation because it guides so many decisions.
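To make "evaluation as the spine" concrete, here is a minimal sketch of an evaluation harness in Python. Nothing in it comes from the book: the labeled examples, the two stand-in "models" (plain functions rather than real LMs) and the `evaluate` helper are all hypothetical, chosen only to show how one fixed eval set can compare a baseline against a tuned candidate.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical labeled examples (prompt, expected label); not from the book.
EVAL_SET: List[Tuple[str, str]] = [
    ("aspirin inhibits pain", "drug"),
    ("BRCA1 is linked to breast cancer", "gene"),
    ("ibuprofen reduces inflammation", "drug"),
    ("TP53 mutations are common in tumors", "gene"),
]


def baseline_model(prompt: str) -> str:
    """Stand-in for an untuned model: always answers 'drug'."""
    return "drug"


def tuned_model(prompt: str) -> str:
    """Stand-in for a tuned model: a toy rule that spots gene-like tokens."""
    for word in prompt.split():
        if word[0].isupper() and any(ch.isdigit() for ch in word):
            return "gene"
    return "drug"


def evaluate(model: Callable[[str], str],
             eval_set: List[Tuple[str, str]]) -> Dict[str, object]:
    """Run the model over every example and collect the misses."""
    failures = []
    for prompt, expected in eval_set:
        got = model(prompt)
        if got != expected:
            failures.append((prompt, expected, got))
    accuracy = 1 - len(failures) / len(eval_set)
    return {"accuracy": accuracy, "failures": failures}


if __name__ == "__main__":
    for name, model in [("baseline", baseline_model), ("tuned", tuned_model)]:
        report = evaluate(model, EVAL_SET)
        print(name, report["accuracy"], len(report["failures"]))
```

The design point is that the eval set and the scoring function stay fixed while the model changes, so every fine-tuning, quantization or prompting decision can be judged against the same numbers and the same list of failures.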

There is talk of how SLMs address regulatory issues, but I don't see any details. How does having a fine-tuned LM help when facing the FDA? I'd really appreciate some pointers there.

Structured decoding and learning get little discussion despite the book covering Manim Python (Ch.3/7), SMILES strings and protein/antibody sequences (Ch.8). Chapter 13 has a good discussion of using CodeAgent (actions as Python) vs ToolCallingAgent (actions as JSON). In fairness, Iozzia notes the value of determinism and directs one to validate formats and data ranges, but <soapbox> a) there are trivial ways to achieve valid syntax (e.g., llguidance) and b) I'd argue that the lack of verifiable quality in structured output semantics is a huge problem fundamentally blocking LM adoption, S or not. </soapbox>
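As a rough illustration of the "validate formats and data ranges" advice, here is a minimal Python sketch using only the standard library. The tool-call schema, field names and dose limits are hypothetical, invented for this example and not taken from the book or from any agent framework.

```python
import json
from typing import List

# Hypothetical schema for a JSON tool call; names and limits are illustrative.
REQUIRED_FIELDS = {"tool": str, "compound": str, "dose_mg": (int, float)}
DOSE_RANGE_MG = (0.0, 1000.0)  # assumed bounds: lower exclusive, upper inclusive


def validate_tool_call(raw: str) -> List[str]:
    """Return a list of problems; an empty list means the output passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]

    problems: List[str] = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            problems.append(f"missing field: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(data[name], bool) or not isinstance(data[name], expected):
            problems.append(f"wrong type for {name}")

    dose = data.get("dose_mg")
    if isinstance(dose, (int, float)) and not isinstance(dose, bool):
        low, high = DOSE_RANGE_MG
        if not low < dose <= high:
            problems.append(f"dose_mg out of range: {dose}")
    return problems


if __name__ == "__main__":
    print(validate_tool_call('{"tool": "dose", "compound": "aspirin", "dose_mg": 500}'))
    print(validate_tool_call('{"tool": "dose", "compound": "aspirin", "dose_mg": 50000}'))
```

Checks like these catch malformed JSON and out-of-range values, but an output that passes says nothing about whether 500 mg is the right answer. That unverified semantic gap is the problem raised in the soapbox above.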

Conclusion

If you have any creative role in LM systems then you owe yourself exposure to the ideas in this book, even if just to disagree with them. There are management-level chapters and you can full-on geek out on running code, so there is something for everybody. AI hype is real; this book is about system building independent of that hype.

