arXiv — NLP / Computation & Language · · 4 min read

Improving Quantized Model Performance in Qualitative Analysis with Multi-Pass Prompt Verification

Mirrored from arXiv — NLP / Computation & Language for archival readability. Support the source by reading on the original site.

Computer Science > Computation and Language

arXiv:2605.20193 (cs)
[Submitted on 4 Apr 2026]

Title:Improving Quantized Model Performance in Qualitative Analysis with Multi-Pass Prompt Verification

View a PDF of the paper titled Improving Quantized Model Performance in Qualitative Analysis with Multi-Pass Prompt Verification, by Aisvarya Adeseye and 2 other authors
View PDF HTML (experimental)
Abstract:Quantized Large Language Models (LLMs) are used more often in qualitative analysis because they run fast and need fewer computing resources. This study examines how different lower bits quantization levels (8-bit, 4-bit, 3-bit, and 2-bit) and quantization types affect the performance of LLaMA-3.1 (8B) on qualitative analysis. The study uses expert and non-expert responses from 82 interview transcripts. Low-bit models often produce higher levels of hallucinations and unstable results, especially when reading non-expert language with unclear terms. To improve performance, we propose a quantization-aware multi-pass prompt verification method. This method guides the model through controlled steps that reduce hallucinations. It removes unreliable content and passes the results to the next transcript after verification, improving accuracy. To validate performance, human coders analyzed transcripts using NVivo and BF16 LLaMA. BF16 LLaMA-3.1 produced high-precision output but had semantic drift and hallucination. These errors were corrected manually. The corrected BF16 output and NVivo human coding were combined to create a gold-standard ground truth (GSGT) for thematic extraction and frequency analysis. The results show that 8-bit models stay closest to the GSGT. The 4-bit models lose accuracy but become stable when the proposed method is applied. The 3-bit and 2-bit models drop in performance because of heavy compression, but they improve with the proposed prompt design and verification. The study also finds that models at the same bit level behave differently depending on quantization type. Overall, the method helps low-resource LLMs become more stable, accurate, and suitable for qualitative research at lower cost.
Comments: Accepted to publish in 12th Intelligent Systems Conference 2026; 3-4 September 2026 in Amsterdam, The Netherlands
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as: arXiv:2605.20193 [cs.CL]
  (or arXiv:2605.20193v1 [cs.CL] for this version)
  https://doi.org/10.48550/arXiv.2605.20193
arXiv-issued DOI via DataCite

Submission history

From: Aisvarya Adeseye Mrs [view email]
[v1] Sat, 4 Apr 2026 04:50:03 UTC (8,180 KB)
Full-text links:

Access Paper:

    View a PDF of the paper titled Improving Quantized Model Performance in Qualitative Analysis with Multi-Pass Prompt Verification, by Aisvarya Adeseye and 2 other authors
  • View PDF
  • HTML (experimental)
  • TeX Source

Current browse context:

cs.CL
< prev   |   next >
Change to browse by:

References & Citations

Loading...

BibTeX formatted citation

loading...
Data provided by:

Bookmark

BibSonomy Reddit
Bibliographic Tools

Bibliographic and Citation Tools

Bibliographic Explorer Toggle
Bibliographic Explorer (What is the Explorer?)
Connected Papers Toggle
Connected Papers (What is Connected Papers?)
Litmaps Toggle
Litmaps (What is Litmaps?)
scite.ai Toggle
scite Smart Citations (What are Smart Citations?)
Code, Data, Media

Code, Data and Media Associated with this Article

alphaXiv Toggle
alphaXiv (What is alphaXiv?)
Links to Code Toggle
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub Toggle
DagsHub (What is DagsHub?)
GotitPub Toggle
Gotit.pub (What is GotitPub?)
Huggingface Toggle
Hugging Face (What is Huggingface?)
ScienceCast Toggle
ScienceCast (What is ScienceCast?)
Demos

Demos

Replicate Toggle
Replicate (What is Replicate?)
Spaces Toggle
Hugging Face Spaces (What is Spaces?)
Spaces Toggle
TXYZ.AI (What is TXYZ.AI?)
Related Papers

Recommenders and Search Tools

Link to Influence Flower
Influence Flower (What are Influence Flowers?)
Core recommender toggle
CORE Recommender (What is CORE?)
About arXivLabs

arXivLabs: experimental projects with community collaborators

arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.

Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.

Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.

Discussion (0)

Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.

Sign in →

No comments yet. Sign in and be the first to say something.

More from arXiv — NLP / Computation & Language