Training Dynamics of Neural Software Defect Predictors under Coupled Data-Quality Issues
Abstract:
Context: Software defect prediction (SDP) supports maintenance decisions such as testing prioritization, release-risk assessment, and quality monitoring. However, metric-based SDP datasets often contain coupled data-quality issues, especially class imbalance and class overlap. Prior work has mainly measured their impact through endpoint performance, while recent evidence suggests that such issues may also appear in neural training dynamics (gradients, weights, biases, and error trajectories). These studies, however, examine each issue in isolation, leaving open how the internal training patterns of neural networks manifest when data-quality issues are coupled.
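To make "training dynamics" concrete, the sketch below records per-epoch loss, gradient norm, weight norm, and bias for a model trained by full-batch gradient descent. It is a minimal illustration under stated assumptions, not the authors' logging setup: a single logistic unit stands in for the MLP so the example needs only the Python standard library, and the function name `train_and_log`, the learning rate, and the toy data are all hypothetical.

```python
import math
import random


def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_and_log(X, y, epochs=20, lr=0.1, seed=0):
    """HYPOTHETICAL helper: full-batch gradient descent on one logistic unit
    (a stand-in for the MLP), logging dynamics signals once per epoch."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w = [rng.uniform(-0.1, 0.1) for _ in range(d)]
    b = 0.0
    log = []
    for epoch in range(epochs):
        grad_w, grad_b, loss = [0.0] * d, 0.0, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # dLoss/dlogit for sigmoid + binary cross-entropy
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
            pc = min(max(p, 1e-12), 1 - 1e-12)  # clip only to avoid log(0)
            loss -= yi * math.log(pc) + (1 - yi) * math.log(1 - pc)
        grad_w = [g / n for g in grad_w]
        grad_b /= n
        # signals describe the parameters at the start of this epoch (before the update)
        log.append({
            "epoch": epoch,
            "loss": loss / n,
            "grad_norm": math.sqrt(sum(g * g for g in grad_w) + grad_b ** 2),
            "weight_norm": math.sqrt(sum(wj * wj for wj in w)),
            "bias": b,
        })
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return log


# toy, linearly separable data (illustrative only)
X = [[0.0, 1.0], [1.0, 0.0], [0.2, 0.9], [0.9, 0.1]]
y = [0, 1, 0, 1]
history = train_and_log(X, y, epochs=50, lr=0.5)
for row in history[::10]:
    print(row)
```

In a real MLP the same per-epoch record would typically be kept per layer, so that trajectories can be compared across conditions and seeds.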
Objective: We investigate how training-dynamics patterns arising from class imbalance, class overlap, and their coupling can be characterized under interaction-aware conditions in deep-learning-based SDP.
Method: We conduct a controlled intervention study on class-level UBD datasets, training a fixed MLP under imbalance-only, overlap-only, and joint conditions across five seeds. Training dynamics are logged per epoch; fidelity is monitored via coupling ratios. Patterns are characterized using effect sizes, trajectories, sensitivity analyses, and rule-based classification.
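One way to picture the imbalance-only, overlap-only, and joint conditions is as seeded transformations of a base dataset. The sketch below is an assumption-laden illustration, not the study's protocol: imbalance is induced by downsampling the defective class to a target fraction, overlap by shifting a share of defective instances toward the clean-class centroid, and the joint condition applies both. The helper names, the 0.1 target fraction, and the 0.5/0.8 overlap parameters are hypothetical, and the abstract's "coupling ratios" are not reproduced here because their definition is not given.

```python
import random


def induce_imbalance(X, y, minority_ratio, rng):
    """HYPOTHETICAL: downsample defective (label 1) rows to ~minority_ratio of the data."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    # keep every clean row; pick k defective rows so that k / (k + len(neg)) ~= ratio
    k = round(minority_ratio * len(neg) / (1 - minority_ratio))
    k = max(1, min(k, len(pos)))
    keep = neg + rng.sample(pos, k)
    rng.shuffle(keep)
    return [X[i] for i in keep], [y[i] for i in keep]


def induce_overlap(X, y, fraction, strength, rng):
    """HYPOTHETICAL: move a fraction of defective rows toward the clean-class centroid."""
    neg_rows = [x for x, label in zip(X, y) if label == 0]
    centroid = [sum(col) / len(neg_rows) for col in zip(*neg_rows)]
    pos_idx = [i for i, label in enumerate(y) if label == 1]
    chosen = set(rng.sample(pos_idx, round(fraction * len(pos_idx))))
    X_new = []
    for i, x in enumerate(X):
        if i in chosen:
            x = [xj + strength * (cj - xj) for xj, cj in zip(x, centroid)]
        X_new.append(list(x))
    return X_new, list(y)


def make_condition(X, y, condition, seed):
    """Build one (condition, seed) dataset; parameters are illustrative assumptions."""
    rng = random.Random(seed)
    if condition == "imbalance":
        return induce_imbalance(X, y, 0.1, rng)
    if condition == "overlap":
        return induce_overlap(X, y, 0.5, 0.8, rng)
    if condition == "joint":
        Xo, yo = induce_overlap(X, y, 0.5, 0.8, rng)
        return induce_imbalance(Xo, yo, 0.1, rng)
    raise ValueError(f"unknown condition: {condition}")


# synthetic, balanced two-feature data (illustrative only)
base_rng = random.Random(42)
X = ([[base_rng.gauss(0, 1), base_rng.gauss(0, 1)] for _ in range(100)]
     + [[base_rng.gauss(3, 1), base_rng.gauss(3, 1)] for _ in range(100)])
y = [0] * 100 + [1] * 100

results = {}
for condition in ("imbalance", "overlap", "joint"):
    for seed in range(5):  # five seeds, as in the abstract
        Xc, yc = make_condition(X, y, condition, seed)
        results[(condition, seed)] = (len(yc), sum(yc) / len(yc))
        print(condition, seed, results[(condition, seed)])
```

Seeding each condition separately keeps the five replicates reproducible while still varying which instances are removed or shifted.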
Expected contribution: The study will produce an interaction-aware empirical protocol and a candidate taxonomy of training-dynamics patterns for coupled data-quality issues in metric-based SDP.
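A candidate taxonomy built from effect sizes and rule-based classification might be sketched as follows. This is an illustration only: the abstract does not name the effect-size statistic, so Cliff's delta (a common non-parametric choice for small samples such as five seeds) is an assumption, as are the magnitude thresholds shown, the function `classify_pattern`, its margin, and its pattern labels, none of which come from the paper. The per-seed values are made-up placeholders, not results.

```python
def cliffs_delta(a, b):
    """Cliff's delta: P(a > b) - P(a < b) over all cross-group pairs, in [-1, 1]."""
    greater = sum(1 for x in a for z in b if x > z)
    less = sum(1 for x in a for z in b if x < z)
    return (greater - less) / (len(a) * len(b))


def magnitude(delta):
    """Label |delta| using commonly used thresholds (0.147, 0.33, 0.474)."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"


def classify_pattern(d_imbalance, d_overlap, d_joint, margin=0.147):
    """HYPOTHETICAL rule set comparing the joint effect with the strongest
    single-issue effect; labels are illustrative, not the paper's taxonomy."""
    strongest_single = max(abs(d_imbalance), abs(d_overlap))
    if magnitude(d_joint) == "negligible" and strongest_single < margin:
        return "no detectable effect"
    if abs(d_joint) > strongest_single + margin:
        return "amplified under coupling"
    if abs(d_joint) < strongest_single - margin:
        return "attenuated under coupling"
    return "comparable to strongest single issue"


# placeholder per-seed values of one dynamics signal (e.g. final-epoch gradient norm);
# made-up numbers for illustration, NOT results from the study
baseline = [0.10, 0.12, 0.11, 0.09, 0.13]
imbalance = [0.15, 0.17, 0.14, 0.16, 0.18]
overlap = [0.11, 0.10, 0.12, 0.13, 0.09]
joint = [0.25, 0.27, 0.22, 0.26, 0.30]

deltas = {name: cliffs_delta(vals, baseline)
          for name, vals in [("imbalance", imbalance), ("overlap", overlap), ("joint", joint)]}
label = classify_pattern(deltas["imbalance"], deltas["overlap"], deltas["joint"])
print(deltas, {k: magnitude(v) for k, v in deltas.items()}, label)
```

Comparing the joint effect against the strongest single-issue effect, rather than against their sum, is one simple way to flag interaction; other rules (for example, additive baselines) are equally plausible.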
| Subjects: | Machine Learning (cs.LG) |
| Cite as: | arXiv:2606.24968 [cs.LG] |
| | (or arXiv:2606.24968v1 [cs.LG] for this version) |
| | https://doi.org/10.48550/arXiv.2606.24968 (arXiv-issued DOI via DataCite) |
Submission history
From: Emmanuel Charleson Dapaah
[v1] Tue, 23 Jun 2026 10:08:55 UTC (124 KB)