|
|
--- |
|
|
library_name: transformers |
|
|
tags: |
|
|
- gpt |
|
|
- byte-tokenization |
|
|
- mobile |
|
|
- embedded |
|
|
- onnx |
|
|
license: cc-by-nc-4.0 |
|
|
datasets: |
|
|
- custom |
|
|
- web |
|
|
language: en |
|
|
widget: |
|
|
- text: "In order to make pancakes, you need to" |
|
|
- text: "Once upon a time" |
|
|
--- |
|
|
|
|
|
<p align="center"> |
|
|
<img src="logo.png" alt="IJK Technology" width="150"> |
|
|
</p> |
|
|
|
|
|
<h1 align="center">IJK Technology β ByteGPT-small</h1> |
|
|
|
|
|
|
|
|
**ByteGPT-small** is a small GPT-style language model trained with byte-level tokenization inspired by the ByT5 paper. It is designed for compute- and memory-constrained devices such as mobile phones and embedded systems.
|
|
|
|
|
## Overview
|
|
- **Model Type:** GPT-style causal language model |
|
|
- **Tokenizer:** Byte-level tokenization (from ByT5) |
|
|
- **Intended Use:** Edge devices, mobile phones, embedded systems |
|
|
- **Size:** Small (initial prototype) |
|
|
- **Training:** Custom-trained from scratch |
|
|
|
|
|
## Why Byte Tokenization?
|
|
Byte tokenization offers several advantages for small-scale, efficient models: |
|
|
|
|
|
1. **Reduced Memory Footprint:** |
|
|
Byte-level tokenization shrinks the embedding layer to a few hundred entries (one per byte value, plus special tokens), rather than the tens of thousands required by subword vocabularies, making the model suitable for devices with limited RAM.
|
|
|
|
|
2. **No External Dependencies:** |
|
|
Unlike subword tokenizers (e.g., SentencePiece, BPE), byte tokenization requires no external libraries: a few lines of plain Python suffice (see the sketch after this list).
|
|
|
|
|
3. **Robustness to Noise:** |
|
|
Byte-level models degrade gracefully on misspellings, typos, and rare strings that subword vocabularies would split into unknown or poorly trained tokens.
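
The first two points are easy to demonstrate in plain Python. Below is a minimal sketch of the idea, not the exact ByteGPT tokenizer (which may reserve IDs for special tokens); the embedding width of 512 is a hypothetical value for illustration:

```python
# One token per UTF-8 byte; the round trip is lossless even for
# accented characters.
text = "Byte tokenization handles café and naïve just fine."
token_ids = list(text.encode("utf-8"))           # each ID is in 0..255
assert bytes(token_ids).decode("utf-8") == text  # lossless round trip

# The embedding table needs only ~256 rows instead of tens of thousands.
print(256 * 512)     # 131,072 embedding parameters (byte vocabulary)
print(50_000 * 512)  # 25,600,000 embedding parameters (typical BPE vocabulary)
```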
|
|
|
|
|
## Future Plans
|
|
This is the **first** in a series of models. Its small size limits how useful it is on its own, but it lays the foundation for future versions. Upcoming releases will include:
|
|
|
|
|
- **Larger Models:** Scaled-up versions with better performance |
|
|
- **Distilled Models:** Using GRPO distillation to create highly efficient small models
|
|
- **Benchmark Results:** Comparative performance on mobile devices |
|
|
|
|
|
## Usage
|
|
|
|
|
### Quick Start (with `transformers`)
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True is required because the model class ships as
# custom code inside the repository
model = AutoModelForCausalLM.from_pretrained("ijktech/ByteGPT-small", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("ijktech/ByteGPT-small")

input_text = "What is the capital of France?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
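
Alternatively, the high-level `pipeline` API should work with the same repository. A minimal sketch, assuming the repo's custom code registers for the standard `text-generation` task:

```python
from transformers import pipeline

generator = pipeline("text-generation", model="ijktech/ByteGPT-small", trust_remote_code=True)
print(generator("Once upon a time", max_new_tokens=50)[0]["generated_text"])
```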
|
|
|
|
|
### Tokenizer |
|
|
|
|
|
The tokenizer is byte-level and compatible with Hugging Face's `AutoTokenizer`:
|
|
|
|
|
```python
tokenizer = AutoTokenizer.from_pretrained("ijktech/ByteGPT-small")
```
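
Since every token is a single byte, ASCII text encodes to roughly one ID per character. A quick round-trip check (the exact count may include a few special tokens):

```python
ids = tokenizer("Hello!", return_tensors="pt")["input_ids"]
print(ids.shape[1])  # roughly one token per byte of input
print(tokenizer.decode(ids[0], skip_special_tokens=True))  # Hello!
```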
|
|
|
|
|
### ONNX |
|
|
|
|
|
The model is also available in ONNX format and can be used with ONNX Runtime:
|
|
|
|
|
```python
import onnxruntime as ort
import torch

# Create an ONNX Runtime session
ort_session = ort.InferenceSession("model.onnx")

# Helper function to generate text with the ONNX model.
# `model` and `tokenizer` come from the Quick Start snippet above;
# `model.block_size` is the model's maximum context length.
def generate_with_onnx(prompt_ids, max_new_tokens=50, temperature=1.0):
    input_ids = prompt_ids.clone()

    for _ in range(max_new_tokens):
        # Keep only the last block_size tokens if the input is too long
        if input_ids.shape[1] > model.block_size:
            input_ids = input_ids[:, -model.block_size:]

        # Run inference
        ort_inputs = {"input": input_ids.cpu().numpy()}
        logits = ort_session.run(None, ort_inputs)[0]

        # Take only the last position's predictions
        logits = torch.from_numpy(logits)[:, -1, :]

        # Apply temperature scaling
        if temperature != 1.0:
            logits = logits / temperature

        # Sample the next token from the softmax distribution
        probs = torch.nn.functional.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)

        # Append the new token
        input_ids = torch.cat([input_ids, next_token], dim=1)

    return input_ids

# Test the generation
prompt = "Hello"
prompt_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
generated_ids = generate_with_onnx(prompt_ids)
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(f"Generated text: {generated_text}")
# Generated text: Hello everyone!
# A dinner is only available for St. Loui
```
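
The helper samples with `torch.multinomial` after temperature scaling: values below 1.0 sharpen the distribution toward the most likely bytes, while values above 1.0 flatten it. Replacing the sampling step with `logits.argmax(-1, keepdim=True)` gives deterministic greedy decoding instead.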
|
|
|
|
|
### Android Usage |
|
|
|
|
|
We've just released an Android SDK. You can find the SDK on our [GitHub](https://github.com/ijktech/ByteGPT-Android). |
|
|
|
|
|
The SDK can be included in your Android project by adding the following to your `build.gradle` file: |
|
|
|
|
|
```gradle
repositories {
    maven {
        url = uri("https://raw.githubusercontent.com/ijktech/ByteGPT-Android/maven-repo")
    }
}

dependencies {
    implementation("com.github.ijktech:ByteGPT-Android:1.0.9")
}
```
|
|
|
|
|
|
|
|
### iOS Usage |
|
|
|
|
|
Coming Soon! |
|
|
|
|
|
|
|
|
## License
|
|
**CC-BY-NC-4.0**: Free for non-commercial use.
|
|
|
|
|
**Commercial Use**: Contact IJK Technology Ltd for licensing at [[email protected]](mailto:[email protected]).
|
|
|
|
|
## About IJK Technology Ltd
|
|
IJK Technology Ltd (IJKTech) develops innovative machine learning models optimized for on-device inference. Our focus is on efficiency, privacy, and usability across mobile and embedded platforms. |