Model Card for aedupuga/multioutput-regression-models

Model Description

This model card describes a set of multi-output regression models trained on the aedupuga/2025-scaffold-strucutres dataset. Each model predicts several structural properties of a DNA sequence at once, using the sequence itself and its summary features.

  • Model developed by: Anuhya Edupuganti
  • Model type: Multi-output regression models (Ridge, Lasso, Elastic Net, Gradient Boosting, Hist Gradient Boosting, LGBM, and MLP regressors)

Model Sources

Direct Use

  • These models can be used to predict structural properties of new DNA sequences. The inputs must be the one-hot encoded sequence together with length_bp, GC_content, and AT_content, in the same format as the training data.
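As a rough illustration of the input format described above, the sketch below builds a feature vector from a raw sequence. The base order `ACGT`, zero-padding to a fixed `max_len`, the helper name `encode_sequence`, and expressing GC_content and AT_content as fractions are all assumptions made for illustration. The card does not specify them, so they should be checked against the actual training pipeline before use.

```python
import numpy as np

ALPHABET = "ACGT"  # assumed base order; must match whatever the training code used


def encode_sequence(seq: str, max_len: int) -> np.ndarray:
    """One-hot encode a DNA sequence and append length_bp, GC_content, AT_content."""
    seq = seq.upper()
    if len(seq) > max_len:
        raise ValueError("sequence longer than max_len")
    # One row per position, one column per base; unused positions stay zero (padding).
    one_hot = np.zeros((max_len, len(ALPHABET)), dtype=np.float32)
    for i, base in enumerate(seq):
        j = ALPHABET.find(base)
        if j >= 0:  # unknown symbols (e.g. N) are left as all-zero rows
            one_hot[i, j] = 1.0
    length_bp = len(seq)
    # Assumed convention: contents are fractions in [0, 1], not percentages.
    gc = (seq.count("G") + seq.count("C")) / length_bp if length_bp else 0.0
    at = (seq.count("A") + seq.count("T")) / length_bp if length_bp else 0.0
    return np.concatenate([one_hot.ravel(), [length_bp, gc, at]]).astype(np.float32)


features = encode_sequence("ACGTGC", max_len=8)
print(features.shape)  # (35,) = 8 positions * 4 bases + 3 summary features
```

The resulting vector would be stacked with others into a 2-D array before being passed to a model's `predict` method.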

Bias, Risks, and Limitations

  • The models are trained on a specific dataset and may not generalize well to sequences with significantly different characteristics.

Training Data:

The models were trained on the original split of the aedupuga/2025-scaffold-strucutres dataset. The dataset contains input features such as sequence, length_bp, and GC_content, and the target variables mfe_energy, num_pairs, stem_len_mean, num_stems, num_hairpins, and num_internal_loops.
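To make the multi-output setup concrete, here is a minimal sketch of ridge regression fitted to all six targets at once using the closed-form solution. It is not the author's training code: the data is synthetic, the helper `fit_ridge` and the `alpha` value are hypothetical, and a library estimator would normally be used in place of this hand-written solver.

```python
import numpy as np

TARGETS = ["mfe_energy", "num_pairs", "stem_len_mean",
           "num_stems", "num_hairpins", "num_internal_loops"]


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge: one weight column per target, intercept unpenalized."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    # Solve (Xc^T Xc + alpha * I) W = Xc^T Yc for all targets in one call.
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ Yc)
    b = y_mean - x_mean @ W
    return W, b


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                # synthetic stand-in for the feature matrix
true_W = rng.normal(size=(10, len(TARGETS)))  # synthetic ground-truth weights
Y = X @ true_W + 0.01 * rng.normal(size=(200, len(TARGETS)))

W, b = fit_ridge(X, Y, alpha=1e-3)
pred = X[:5] @ W + b  # one row per sample, one column per target
print(pred.shape)  # (5, 6)
```

Because the targets share the same feature matrix, a single linear solve yields the weights for every target, which is one reason linear multi-output models train quickly.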

Evaluation Data:

The models were evaluated using Mean Absolute Error (MAE) per target variable, Overall Mean Squared Error (MSE), and Overall R2 score on a test set. The results of this evaluation are below:

MAE per target (values rounded to four decimal places):

| Model | mfe_energy | num_pairs | stem_len_mean | num_stems | num_hairpins | num_internal_loops |
|---|---|---|---|---|---|---|
| Elastic Net Regression | 52.2463 | 26.3104 | 0.1252 | 11.8249 | 6.3626 | 10.4233 |
| Gradient Boosting Regressor | 93.8605 | 62.1286 | 0.1196 | 19.5217 | 8.1710 | 13.7088 |
| Hist Gradient Boosting Regressor | 92.7948 | 119.0514 | 0.0946 | 38.9378 | 14.5386 | 17.8690 |
| LGBM Regressor | 101.9928 | 118.4306 | 0.0983 | 40.1437 | 14.6493 | 17.4871 |
| Ridge Regression | 53.3069 | 25.6544 | 0.0840 | 11.3940 | 5.6798 | 9.2607 |
| Lasso Regression | 67.2767 | 31.4870 | 0.1252 | 13.1588 | 6.8547 | 11.1387 |
| MLP Regressor | 113.6003 | 76.1115 | 1.7845 | 19.9199 | 9.2259 | 13.7948 |

Overall metrics and timing (values rounded to four decimal places):

| Model | Overall MSE | Overall R2 | Training Time (s) | Prediction Time (s) |
|---|---|---|---|---|
| Elastic Net Regression | 1106.2239 | 0.8269 | 37.8951 | 0.1341 |
| Gradient Boosting Regressor | 8056.4655 | 0.6355 | 1064.1454 | 0.1443 |
| Hist Gradient Boosting Regressor | 22401.1595 | 0.8354 | 2276.7718 | 0.0563 |
| LGBM Regressor | 23866.9475 | 0.8261 | 110.6146 | 2.5872 |
| Ridge Regression | 1260.7624 | 0.9157 | 7.0636 | 0.1231 |
| Lasso Regression | 1823.6267 | 0.8248 | 51.8693 | 0.1273 |
| MLP Regressor | 5507.4949 | -34.3923 | 68.6558 | 0.1359 |
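The three metrics reported above can be computed as in the sketch below. It assumes that Overall MSE is averaged over all samples and targets and that Overall R2 is the unweighted mean of the per-target R2 values, which matches scikit-learn's default `uniform_average` behaviour. The card does not state which averaging was used, and the helper name `evaluate` is hypothetical. Under that assumption, MSE is dominated by the targets with the largest numeric scale while the averaged R2 is scale-free, which is one possible reason the two metrics rank the models differently.

```python
import numpy as np


def evaluate(y_true, y_pred, target_names):
    """Return per-target MAE, overall MSE, and uniformly averaged R2."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    # One MAE value per target column.
    mae = dict(zip(target_names, np.abs(err).mean(axis=0)))
    # Average squared error over all samples and all targets (raw units).
    mse = float((err ** 2).mean())
    # Per-target R2, then an unweighted mean across targets.
    ss_res = (err ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    r2 = float((1.0 - ss_res / ss_tot).mean())
    return mae, mse, r2


# Toy data: target "a" is predicted perfectly, target "b" is off by 2 on two samples.
y_true = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
y_pred = np.array([[1.0, 12.0], [2.0, 18.0], [3.0, 30.0]])
mae, mse, r2 = evaluate(y_true, y_pred, ["a", "b"])
print(mae, mse, r2)
```

A single badly fitted target can pull the averaged R2 far below zero even when the overall MSE looks moderate, so the per-target MAE values are worth reading alongside the two aggregate numbers.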

Model Card Contact

Anuhya Edupuganti (Carnegie Mellon University) - [email protected]
