Elvis-t9 committed on
Commit fbf453a · verified · 1 Parent(s): a14932e

Update README.md

Files changed (1)
  1. README.md +10 -7
README.md CHANGED
@@ -5,13 +5,11 @@
   </a>
 </div>
 
-# Introduction
+# A New Frontier in Code Retrieval via Adaptive Cross-Attention Pooling
 
 
 
-## C2LLM: Advanced Code Embeddings for Deep Semantic Understanding
-
-**C2LLMs (Code Contrastive Large Language Model)** is a powerful new model for generating code embeddings, designed to capture the deep semantics of source code.
+**C2LLMs (Code Contrastive Large Language Models)** are powerful new models for generating code embeddings, designed to capture the deep semantics of source code.
 
 #### Key Features
 
@@ -27,7 +25,7 @@ C2LLM is designed to be a go-to model for tasks like code search and Retrieval-A
 
 ## Usage (**HuggingFace Transformers**)
 
-```plain
+```Python
 from transformers import AutoModel, AutoTokenizer
 import torch
 
@@ -36,6 +34,9 @@ model_path = "codefuse-ai/C2LLM-7B"
 
 # Load the model
 model = AutoModel.from_pretrained(model_path, torch_dtype=torch.bfloat16, trust_remote_code=True)
+# Prepare your custom instruction
+instruction = "xxxxx"
+
 # Prepare the data
 sentences = ['''int r = (int) params >> 8 & 0xff;
 int p = (int) params & 0xff;
@@ -63,6 +64,8 @@ return new RangeInfo(inclusive ? tempTo : tempTo + 1, tempFrom + 1, true);
 return new RangeInfo(tempFrom, inclusive ? tempTo + 1 : tempTo, false);
 }''']
 
+sentences = [instruction+sentence for sentence in sentences]
+
 # Get the embeddings
 embeddings = model.encode(sentences)
 ```
@@ -113,7 +116,7 @@ embeddings = model.encode(sentences)
 
 ## Evaluation (**MTEB**)
 
-```plain
+```python
 from sentence_transformers import SentenceTransformer
 from mteb.models import ModelMeta
 from mteb.cache import ResultCache
@@ -141,4 +144,4 @@ If you find this project helpful, please give it a star. It means a lot to us!
 
 ## Correspondence to
 
-Jin Qin ([email protected]), Zihan Liao ([email protected]), Ziyin Zhang ([email protected]), Hang Yu ([email protected]), Peng Di ([email protected])
+Jin Qin ([email protected]), Zihan Liao ([email protected]), Ziyin Zhang ([email protected]), Hang Yu ([email protected]), Peng Di ([email protected])
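
The main behavioral change in this commit is instruction prefixing: one task instruction is prepended to every input before `model.encode` is called. The README leaves the instruction itself as a placeholder (`"xxxxx"`). The sketch below shows only that prefix-then-encode-then-compare flow; the bag-of-characters `encode` and the cosine ranking are illustrative stand-ins for the real model, not part of the commit.

```python
import math

# Toy stand-in for model.encode: a 26-dim bag-of-letters embedding,
# used only to make the prefixing flow runnable end to end.
def encode(texts):
    def embed(text):
        vec = [0.0] * 26
        for ch in text.lower():
            if "a" <= ch <= "z":
                vec[ord(ch) - ord("a")] += 1.0
        return vec
    return [embed(t) for t in texts]

# Cosine similarity, the usual way embedding vectors are compared.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# The pattern introduced by this commit: one instruction, prepended
# to every sentence before encoding. The instruction text here is
# made up; the README uses a placeholder.
instruction = "Represent this code for retrieval: "
sentences = ["int r = (int) params >> 8 & 0xff;",
             "return new RangeInfo(tempFrom, tempTo, false);"]
prefixed = [instruction + sentence for sentence in sentences]

embeddings = encode(prefixed)
score = cosine(embeddings[0], embeddings[1])
assert -1.0 <= score <= 1.0
```

Because the same instruction is prepended to every input, it shifts all embeddings consistently, letting one checkpoint serve different tasks (search queries vs. corpus entries, for example) without retraining.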