Model Card for aedupuga/multioutput-regression-models
Model Description
This model card describes multi-output regression models trained on the aedupuga/2025-scaffold-strucutres dataset. The models predict structural properties of DNA sequences from the sequence itself and from derived features such as length and GC content.
- Model developed by: Anuhya Edupuganti
- Model type: Multi-output regression models (e.g., Ridge, Elastic Net, etc.)
Model Sources
Direct Use
- These models can be used to predict structural properties of new DNA sequences. The inputs must be the one-hot encoded sequence together with length_bp, GC_content, and AT_content, in the same format as the training data.
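As a minimal sketch of how such an input vector might be assembled, the snippet below one-hot encodes a sequence and appends the three scalar features. The helper names, the `ACGT` column order, zero-padding to a fixed `max_len`, and expressing GC/AT content as fractions (rather than percentages) are all assumptions made for illustration; the actual preprocessing must match whatever was used for the training data.

```python
# Hypothetical helpers: the alphabet order, padding scheme, and feature
# scaling are assumptions and may differ from the original pipeline.
ALPHABET = "ACGT"

def one_hot_encode(sequence, max_len):
    """Flatten a DNA sequence into a one-hot vector of length 4 * max_len.

    Positions beyond the end of the sequence are zero-padded.
    """
    seq = sequence.upper()
    if len(seq) > max_len:
        raise ValueError("sequence is longer than max_len")
    vector = []
    for i in range(max_len):
        base = seq[i] if i < len(seq) else None
        vector.extend(1.0 if base == letter else 0.0 for letter in ALPHABET)
    return vector

def build_features(sequence, max_len):
    """Return [one-hot sequence..., length_bp, GC_content, AT_content]."""
    seq = sequence.upper()
    length_bp = len(seq)
    gc = (seq.count("G") + seq.count("C")) / length_bp if length_bp else 0.0
    at = (seq.count("A") + seq.count("T")) / length_bp if length_bp else 0.0
    return one_hot_encode(seq, max_len) + [float(length_bp), gc, at]

features = build_features("ACGTGC", max_len=8)
print(len(features))  # 4 * 8 one-hot entries plus 3 scalar features
```

A fixed `max_len` is used here because the regressors expect every input row to have the same number of columns.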
Bias, Risks, and Limitations
- The models are trained on a specific dataset and may not generalize well to sequences with significantly different characteristics.
Training Data:
The models were trained on the original split of the aedupuga/2025-scaffold-strucutres dataset, which contains input features such as sequence, length_bp, and GC_content, and the target variables mfe_energy, num_pairs, stem_len_mean, num_stems, num_hairpins, and num_internal_loops.
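To illustrate what fitting one model to all six targets at once looks like, here is a minimal sketch using scikit-learn's `Ridge`, which accepts a two-dimensional target array directly. The data is synthetic stand-in data, and the feature count, `alpha` value, and train/test slicing are assumptions; this is not the author's training script.

```python
import numpy as np
from sklearn.linear_model import Ridge

TARGETS = ["mfe_energy", "num_pairs", "stem_len_mean",
           "num_stems", "num_hairpins", "num_internal_loops"]

# Synthetic stand-in data (assumption): 200 samples, 35 features, 6 targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 35))
W = rng.normal(size=(35, len(TARGETS)))
Y = X @ W + 0.1 * rng.normal(size=(200, len(TARGETS)))

# Simple slicing stands in for the dataset's own split.
X_train, X_test = X[:160], X[160:]
Y_train, Y_test = Y[:160], Y[160:]

# Ridge handles a (n_samples, n_targets) target array natively.
model = Ridge(alpha=1.0)
model.fit(X_train, Y_train)

Y_pred = model.predict(X_test)  # one column per target
print(Y_pred.shape)
```

Estimators without native multi-output support can be wrapped in `sklearn.multioutput.MultiOutputRegressor`, which fits one copy of the estimator per target.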
Evaluation Data:
The models were evaluated on a test set using Mean Absolute Error (MAE) per target variable, overall Mean Squared Error (MSE), and overall R² score. Training and prediction times were also recorded. The results are shown below:
| Model | MAE mfe_energy | MAE num_pairs | MAE stem_len_mean | MAE num_stems | MAE num_hairpins | MAE num_internal_loops | Overall MSE | Overall R² | Training Time (s) | Prediction Time (s) |
|---|---|---|---|---|---|---|---|---|---|---|
| Elastic Net Regression | 52.2463 | 26.3104 | 0.1252 | 11.8249 | 6.3626 | 10.4233 | 1106.2239 | 0.8269 | 37.8951 | 0.1341 |
| Gradient Boosting Regressor | 93.8605 | 62.1286 | 0.1196 | 19.5217 | 8.1710 | 13.7088 | 8056.4655 | 0.6355 | 1064.1454 | 0.1443 |
| Hist Gradient Boosting Regressor | 92.7948 | 119.0514 | 0.0946 | 38.9378 | 14.5386 | 17.8690 | 22401.1595 | 0.8354 | 2276.7718 | 0.0563 |
| LGBM Regressor | 101.9928 | 118.4306 | 0.0983 | 40.1437 | 14.6493 | 17.4871 | 23866.9475 | 0.8261 | 110.6146 | 2.5872 |
| Ridge Regression | 53.3069 | 25.6544 | 0.0840 | 11.3940 | 5.6798 | 9.2607 | 1260.7624 | 0.9157 | 7.0636 | 0.1231 |
| Lasso Regression | 67.2767 | 31.4870 | 0.1252 | 13.1588 | 6.8547 | 11.1387 | 1823.6267 | 0.8248 | 51.8693 | 0.1273 |
| MLP Regressor | 113.6003 | 76.1115 | 1.7845 | 19.9199 | 9.2259 | 13.7948 | 5507.4949 | -34.3923 | 68.6558 | 0.1359 |

Values are rounded to four decimal places.
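The three reported metrics can be sketched with NumPy as follows. The `evaluate` function is a hypothetical helper, and two choices in it are assumptions rather than facts from this card: overall MSE is taken as the mean over all samples and targets, and overall R² as the unweighted mean of the per-target R² values (which matches the default `multioutput="uniform_average"` of scikit-learn's `r2_score`).

```python
import numpy as np

TARGETS = ["mfe_energy", "num_pairs", "stem_len_mean",
           "num_stems", "num_hairpins", "num_internal_loops"]

def evaluate(y_true, y_pred):
    """Return (MAE per target, overall MSE, overall R^2) for 2-D arrays.

    Assumes overall MSE averages over all samples and targets, and
    overall R^2 is the unweighted mean of the per-target R^2 values.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred

    mae_per_target = np.mean(np.abs(errors), axis=0)
    overall_mse = float(np.mean(errors ** 2))

    ss_res = np.sum(errors ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    r2_per_target = 1.0 - ss_res / ss_tot
    overall_r2 = float(np.mean(r2_per_target))

    return dict(zip(TARGETS, mae_per_target.tolist())), overall_mse, overall_r2

# Synthetic example: every prediction is off by exactly 0.1.
rng = np.random.default_rng(0)
y_true = rng.normal(size=(50, len(TARGETS)))
y_pred = y_true + 0.1
mae, mse, r2 = evaluate(y_true, y_pred)
print(mae, mse, r2)
```

Under these assumptions, the MSE is dominated by targets with large numeric ranges, while the averaged R² weights every target equally, so a single poorly fit low-variance target can pull the overall R² strongly negative even when the MSE looks moderate.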
Model Card Contact
Anuhya Edupuganti (Carnegie Mellon University) - [email protected]