r/MachineLearning · · 1 min read

Should I Commit and Publish the Results? [R]

Mirrored from r/MachineLearning for archival readability. Support the source by reading on the original site.

Hello Reddit

I've been working on QSPR (Quantitative Structure-Property Relationship) analysis for chemical compounds mentioned in the Jean-Claude Bradley Open Melting Point Dataset. Basically the idea is to see how accurate a model can predict melting points of compounds using only topological indices. After some work on the topological indices (feature engineering), each compound was represented by 26 features.

I trained a random forest model on the data and got a test r2 score of 0.66 (which is pretty respectable, given the constraints). However, the file size of the model was around 1.23GB. I didn't like it being that big, so I opened up PyTorch to build a custom deep learning architecture that could make predictions as accurately as the random forest but with much smaller file size.

After around 2 weeks of research, I build a 270,000 learnable parameter model (1.3-1.4MB according to torchinfo) that got an r2 score 0f 0.6399.

Given all this context, I wanted to ask the following question:
Should I commit and work on publishing the results, or should I keep working on improving the model?

Note: I'm obligated by my university to not give out intricate details of my research before publication, so please forgive me if such details are required for a high quality answer.

However, I can give out the metrics achieved by my little deep learning model. Here it is:

=== Evaluation Metrics (Expected Value) ===
R² Score : 0.639910
MAE : 41.246754
MSE : 2989.062744
RMSE : 54.672322
NRMSE : 0.083469

MAPE : 11.69%

The unit for MAE, MSE, RMSE and NRMSE is Kelvin (K).

submitted by /u/AgiGamesYT
[link] [comments]

Discussion (0)

Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.

Sign in →

No comments yet. Sign in and be the first to say something.

More from r/MachineLearning