SentenceTransformer based on answerdotai/ModernBERT-base

This is a sentence-transformers model fine-tuned from answerdotai/ModernBERT-base on the json dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and similar tasks.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: answerdotai/ModernBERT-base
Maximum Sequence Length: 512 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
Training Dataset:
- json

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
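
The Pooling module above has only pooling_mode_mean_tokens enabled, so a sentence embedding is the average of its token embeddings. The following is a minimal pure-Python sketch of that idea, not the library's implementation. The function name mean_pool, the toy 2-dimensional vectors, and the guard against an all-padding input are illustrative assumptions.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask value is 1."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for k in range(dim):
                summed[k] += vec[k]
    count = max(count, 1)  # assumption: avoid dividing by zero when every token is padding
    return [s / count for s in summed]


# Toy input: three tokens with 2-dimensional vectors; the last token is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

In the real model each token vector has 768 dimensions, and the padding token is ignored in the same way.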

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub (model ID as listed on the model page)
model = SentenceTransformer("wwydmanski/modernbert-pubmed-v0.1")
# Run inference
sentences = [
    'EHL tendon reconstruction',
    'A Combined Surgical Approach for Extensor Hallucis Longus Reconstruction: Two Case Reports. ',
    'Flexor tendon reconstruction. ',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
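
For semantic search, the similarity scores can be used to rank candidate texts against a query. The sketch below shows the ranking step in plain Python, with toy 3-dimensional vectors standing in for the 768-dimensional embeddings. The helper names cosine and rank_documents are hypothetical and are not part of the Sentence Transformers API.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs):
    """Return (index, score) pairs sorted by descending cosine similarity."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    order = sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


# Toy vectors (assumption): stand-ins for real embeddings.
query = [1.0, 0.0, 1.0]
docs = [[0.0, 1.0, 0.0], [2.0, 0.0, 2.0], [1.0, 1.0, 0.0]]
ranking = rank_documents(query, docs)
print([i for i, _ in ranking])  # [1, 2, 0]
```

With the real model, embeddings[0] from the snippet above could serve as the query and embeddings[1:] as the documents.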

Evaluation

Metrics

Triplet

Dataset: triplet-dev
Evaluated with TripletEvaluator

Metric	Value
cosine_accuracy	0.887
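
As a rough illustration of the metric, cosine accuracy can be read as the fraction of triplets in which the anchor is more similar to the positive than to the negative. The sketch below assumes a strict comparison and takes precomputed cosine similarities as input. The function name and the toy scores are made up for illustration, and the evaluator's exact tie handling may differ.

```python
def cosine_accuracy(pos_sims, neg_sims):
    """Fraction of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    if len(pos_sims) != len(neg_sims):
        raise ValueError("need one positive and one negative score per triplet")
    correct = sum(1 for p, n in zip(pos_sims, neg_sims) if p > n)
    return correct / len(pos_sims)


# Toy scores (assumption): four triplets, two ranked correctly.
pos = [0.9, 0.4, 0.7, 0.2]
neg = [0.1, 0.5, 0.3, 0.2]
accuracy = cosine_accuracy(pos, neg)
print(accuracy)  # 0.5
```

A value of 0.887 would then mean that about 89 of every 100 dev triplets are ordered correctly.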

Training Details

Training Dataset

json

Dataset: json
Size: 10,053 training samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
details	min: 4 tokens, mean: 8.86 tokens, max: 34 tokens	min: 4 tokens, mean: 21.84 tokens, max: 62 tokens	min: 3 tokens, mean: 13.65 tokens, max: 50 tokens

Samples:

anchor	positive	negative
`COM-induced secretome changes in U937 monocytes`	`Characterization of calcium oxalate crystal-induced changes in the secretome of U937 human monocytes.`	`Monocytes.`
`Metamaterials`	`Sound attenuation optimization using metaporous materials tuned on exceptional points.`	`Metamaterials: A cat's eye for all directions.`
`Pediatric Parasitology`	`Parasitic infections among school age children 6 to 11-years-of-age in the Eastern province.`	`[DIALOGUE ON PEDIATRIC PARASITOLOGY].`

Loss: MultipleNegativesRankingLoss with these parameters:

{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
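
To make the two parameters concrete: this loss treats each anchor's own positive as the correct class among all candidates in the batch, and applies cross-entropy to the cosine similarities multiplied by scale. The sketch below works on a precomputed similarity matrix and is an illustration under those assumptions, not the library code. The function name mnr_loss and the toy matrices are hypothetical.

```python
import math


def mnr_loss(sim_matrix, scale=20.0):
    """Mean cross-entropy over rows of a cosine-similarity matrix.

    sim_matrix[i][j] is the similarity between anchor i and candidate j.
    Candidate i is assumed to be anchor i's positive; every other column
    (other positives, plus any explicit negatives) acts as a negative.
    """
    total = 0.0
    for i, row in enumerate(sim_matrix):
        logits = [scale * s for s in row]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(sim_matrix)


# Toy matrices (assumption): 2 anchors, 4 candidates (2 positives + 2 negatives).
uniform = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5]]
separated = [[0.9, 0.1, 0.0, 0.1], [0.1, 0.8, 0.2, 0.0]]
print(mnr_loss(uniform), mnr_loss(separated))
```

When all candidates look equally similar, the loss equals the log of the number of candidates; it falls toward zero as each positive separates from the rest.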

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 64
per_device_eval_batch_size: 64
learning_rate: 0.0002
num_train_epochs: 2
lr_scheduler_type: cosine_with_restarts
warmup_ratio: 0.1
bf16: True
batch_sampler: no_duplicates
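
To give a feel for how learning_rate, warmup_ratio, and lr_scheduler_type interact, here is a sketch of a linear warmup followed by cosine decay. It assumes 106 total optimizer steps (the last step in the training log), a warmup length of ceil(0.1 × 106) = 11 steps, and a single cosine cycle, since lr_scheduler_kwargs is empty. The function lr_at_step is hypothetical and only approximates what the trainer's scheduler does.

```python
import math


def lr_at_step(step, base_lr=2e-4, total_steps=106, warmup_ratio=0.1, num_cycles=1):
    """Learning rate after `step` optimizer steps: linear warmup, then cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # 11 under the stated assumptions
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    if progress >= 1.0:
        return 0.0
    # With num_cycles=1 this is an ordinary cosine decay; more cycles would restart it.
    factor = 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0)))
    return base_lr * max(0.0, factor)


for s in (0, 5, 11, 58, 106):
    print(s, lr_at_step(s))
```

Under these assumptions the rate peaks at 0.0002 once warmup ends and decays to zero by the final step.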

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 64
per_device_eval_batch_size: 64
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 0.0002
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 2
max_steps: -1
lr_scheduler_type: cosine_with_restarts
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss	triplet-dev_cosine_accuracy
0	0	-	0.457
0.0189	1	5.2934	-
0.0377	2	5.2413	-
0.0566	3	4.9969	-
0.0755	4	4.5579	-
0.0943	5	3.9145	-
0.1132	6	3.3775	-
0.1321	7	2.8787	-
0.1509	8	3.0147	-
0.1698	9	2.7166	-
0.1887	10	2.7875	-
0.2075	11	2.3848	-
0.2264	12	2.1921	-
0.2453	13	1.7009	-
0.2642	14	1.7649	-
0.2830	15	1.7948	-
0.3019	16	1.5384	-
0.3208	17	1.6039	-
0.3396	18	1.3364	-
0.3585	19	1.3852	-
0.3774	20	1.2427	-
0.3962	21	1.3216	-
0.4151	22	1.4202	-
0.4340	23	1.2754	-
0.4528	24	1.281	-
0.4717	25	1.1709	0.815
0.4906	26	1.2363	-
0.5094	27	1.2169	-
0.5283	28	1.1495	-
0.5472	29	1.0066	-
0.5660	30	1.0478	-
0.5849	31	1.1511	-
0.6038	32	0.9992	-
0.6226	33	1.095	-
0.6415	34	1.1699	-
0.6604	35	0.9866	-
0.6792	36	1.1303	-
0.6981	37	1.1126	-
0.7170	38	0.889	-
0.7358	39	1.0355	-
0.7547	40	1.0129	-
0.7736	41	1.118	-
0.7925	42	0.8494	-
0.8113	43	1.0829	-
0.8302	44	0.8751	-
0.8491	45	0.8115	-
0.8679	46	0.8579	-
0.8868	47	1.1111	-
0.9057	48	0.9032	-
0.9245	49	1.0394	-
0.9434	50	0.9691	0.862
0.9623	51	1.023	-
0.9811	52	0.9465	-
1.0	53	0.6713	-
1.0189	54	0.9773	-
1.0377	55	0.8693	-
1.0566	56	0.7187	-
1.0755	57	0.805	-
1.0943	58	0.728	-
1.1132	59	1.0967	-
1.1321	60	0.7036	-
1.1509	61	0.8213	-
1.1698	62	0.57	-
1.1887	63	0.7006	-
1.2075	64	0.5091	-
1.2264	65	0.5758	-
1.2453	66	0.4484	-
1.2642	67	0.397	-
1.2830	68	0.6172	-
1.3019	69	0.513	-
1.3208	70	0.4447	-
1.3396	71	0.3205	-
1.3585	72	0.5881	-
1.3774	73	0.2543	-
1.3962	74	0.3648	-
1.4151	75	0.4849	0.876
1.4340	76	0.3455	-
1.4528	77	0.3424	-
1.4717	78	0.224	-
1.4906	79	0.18	-
1.5094	80	0.2255	-
1.5283	81	0.3024	-
1.5472	82	0.1835	-
1.5660	83	0.1946	-
1.5849	84	0.1958	-
1.6038	85	0.1568	-
1.6226	86	0.1626	-
1.6415	87	0.1774	-
1.6604	88	0.1934	-
1.6792	89	0.2426	-
1.6981	90	0.2958	-
1.7170	91	0.1606	-
1.7358	92	0.2281	-
1.7547	93	0.1786	-
1.7736	94	0.2241	-
1.7925	95	0.1909	-
1.8113	96	0.236	-
1.8302	97	0.1332	-
1.8491	98	0.1247	-
1.8679	99	0.156	-
1.8868	100	0.2152	0.889
1.9057	101	0.1549	-
1.9245	102	0.2226	-
1.9434	103	0.21	-
1.9623	104	0.2139	-
1.9811	105	0.1864	-
2.0	106	0.0719	0.887
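
The Epoch column can be reproduced from the Step column: step 53 corresponds to epoch 1.0 and step 106 to epoch 2.0, so each epoch spans 53 optimizer steps. The short sketch below checks that arithmetic. The constant and function names are illustrative, and the closing remark about devices is an inference from the numbers, not something the card states.

```python
STEPS_PER_EPOCH = 53      # read off the log: step 53 is epoch 1.0
TRAINING_SAMPLES = 10053  # from the Training Dataset section


def epoch_at(step):
    """Fractional epoch reached after `step` optimizer steps, rounded as in the log."""
    return round(step / STEPS_PER_EPOCH, 4)


print(epoch_at(1), epoch_at(25), epoch_at(100))  # 0.0189 0.4717 1.8868

# Average number of samples consumed per optimizer step.
samples_per_step = TRAINING_SAMPLES / STEPS_PER_EPOCH
print(round(samples_per_step, 1))
```

The samples-per-step figure comes out at roughly three times the per-device batch size of 64, which may indicate multi-device training, though the card does not say so. The accuracy column is filled at steps 0, 25, 50, 75, 100, and the final step, which is consistent with evaluation every 25 steps.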

Framework Versions

Python: 3.12.3
Sentence Transformers: 3.3.1
Transformers: 4.48.0.dev0
PyTorch: 2.5.1
Accelerate: 1.2.1
Datasets: 3.2.0
Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model Information

Model ID: wwydmanski/modernbert-pubmed-v0.1
Model size: 0.1B params (Safetensors)
Tensor type: F32
Base model: answerdotai/ModernBERT-base (this model is a fine-tune of it)
Evaluation results (self-reported): Cosine Accuracy on triplet dev, 0.887