How to add TXT and CSV files and customize embedfile?

by lbarasc - opened Jan 26

Discussion

lbarasc

Jan 26

Hi, I just discovered your Embedfile tool, and this is really huge!

I want to use it, but I have some questions about it.

For example, if I want to add a text file, I do:

all-minilm-l6-v2.f16.embedfile.exe import --embed text mytest.txt mybase.db

Can you tell me the characteristics of the .txt file (encoding: UTF-8? line breaks: CR+LF? ...)?

If I want to add a CSV file:

all-minilm-l6-v2.f16.embedfile.exe import --embed text mytest.csv mybase.db

Can you specify the CSV format (separator character, encoding, number of columns, column names, ...)?

Last question: how do I create my own Embedfile.exe with my own .gguf model? Can I simply concatenate embedfile.exe and the .gguf in binary, or is something else needed?

Thank you for your help, and for your tool.

Sincerely,

Lionel.

jglowa

Aug 11

@lbarasc Here are answers to your questions, generated with ChatGPT from the embedfile source code:

TXT File Input Characteristics

Encoding:

UTF-8 is required. Internally, the source reads the file with `lines_read(?)` from sqlite-lines, which assumes UTF-8-encoded text.
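If your file was saved in another encoding (for example Windows-1252, common on older Windows setups), a minimal Python sketch like this can re-encode it to UTF-8 before import. The helper name `to_utf8` and the `cp1252` default are assumptions; set `source_encoding` to whatever your file actually uses.

```python
from pathlib import Path


def to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Re-encode src as UTF-8 into dst, keeping line endings byte-for-byte.

    source_encoding is an assumption: change it to match your input file.
    Working on raw bytes avoids Python's newline translation, and the plain
    "utf-8" codec writes no BOM.
    """
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))
```

Usage: `to_utf8(Path("mytest.txt"), Path("mytest-utf8.txt"))`, then run the import on `mytest-utf8.txt`.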

Line Breaks:

`lines_read` handles both LF (`\n`) and CR+LF (`\r\n`) line endings.
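To make the rule concrete, here is a small Python illustration of line splitting that accepts both endings. It is not the sqlite-lines code; in particular, ignoring a final trailing newline (rather than producing one extra empty entry) is an assumption of this sketch.

```python
def split_lines(data: bytes) -> list:
    """Split UTF-8 bytes into lines, accepting LF or CR+LF endings.

    Illustration only, not the sqlite-lines implementation.
    """
    lines = data.decode("utf-8").split("\n")
    # Assumption: a newline at the very end of the file does not start a new entry.
    if lines and lines[-1] == "":
        lines.pop()
    # Remove the CR left over from CR+LF endings.
    return [line[:-1] if line.endswith("\r") else line for line in lines]
```

Note that a blank line in the middle of the file still comes out as an empty string, which matches the "empty lines" remark below.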

Format:

One logical entry per line.
No special header or metadata required.
Empty lines may be treated as empty strings (not skipped unless handled downstream).
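Since each line becomes one entry, text that contains line breaks (such as multi-line paragraphs) would be split into several entries. This sketch writes one entry per line and drops blank entries; both the whitespace collapsing and the blank filtering are choices of this example, not something embedfile is known to require.

```python
def write_entries(entries, path):
    """Write each entry on its own line (UTF-8, LF) and return how many were written."""
    count = 0
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for entry in entries:
            # Collapse every whitespace run (including \n and \r) to one space,
            # so a multi-line entry stays on a single line.
            flat = " ".join(entry.split())
            if flat:  # skip entries that are empty or whitespace-only
                f.write(flat + "\n")
                count += 1
    return count
```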

CSV File Input Characteristics

Encoding:

Same as TXT: UTF-8. The CSV virtual table (sqlite-csv) reads the file directly through its `filename` argument and performs no transcoding.

Character Separator:

Default: comma (`,`). The code does not specify a custom separator, so only standard comma-separated files are supported.
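If your file uses another separator (for example `;`, which Excel typically produces with French regional settings), Python's standard `csv` module can rewrite it as comma-separated UTF-8. A sketch; the function name and its defaults are illustrative.

```python
import csv


def convert_delimiter(src, dst, src_delimiter=";", src_encoding="utf-8"):
    """Rewrite a delimited text file as a standard comma-separated UTF-8 CSV.

    The ';' delimiter and the input encoding are assumptions: set them to
    match your file (e.g. src_encoding="cp1252" for older Windows exports).
    """
    with open(src, newline="", encoding=src_encoding) as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        reader = csv.reader(fin, delimiter=src_delimiter)
        # Comma delimiter; fields containing commas or quotes get quoted.
        writer = csv.writer(fout, lineterminator="\n")
        for row in reader:
            writer.writerow(row)
```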

Header:

Required. The file is opened with `CREATE VIRTUAL TABLE temp.source USING csv(filename="%w", header=yes)`, so the first row is always read as column names. If your CSV lacks a header, the import will fail or misinterpret the first row.
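For a CSV without a header, you can prepend one before importing. The column names passed in are hypothetical; the one you want to embed should match your `--embed` argument.

```python
import csv


def add_header(src, dst, columns):
    """Copy a header-less UTF-8 CSV to dst, writing `columns` as the first row."""
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        writer = csv.writer(fout, lineterminator="\n")
        writer.writerow(columns)
        for row in csv.reader(fin):
            writer.writerow(row)
```

Usage: `add_header("mytest.csv", "mytest-header.csv", ["id", "text"])`, then import with `--embed text`.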

Column Names:

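Safest as plain SQLite identifiers (letters, digits, underscores), to avoid quoting problems.
Avoid duplicate column names.
Must match the --embed COLUMN name. SQLite itself compares ASCII identifiers case-insensitively, but spelling the name exactly as it appears in the header is the safest choice.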
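A quick pre-flight check along these lines can catch header problems before running the import. The identifier pattern (letters, digits, underscores, not starting with a digit) is an assumption that follows the guidance above; the case-insensitive duplicate check reflects how SQLite compares column names.

```python
import csv
import re

# Assumed "plain identifier" rule: letters, digits, underscores, no leading digit.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def check_header(path, embed_column):
    """Return a list of problems found in a UTF-8 CSV header (empty if none)."""
    with open(path, newline="", encoding="utf-8") as f:
        header = next(csv.reader(f), [])
    problems = []
    for name in header:
        if not IDENTIFIER.fullmatch(name):
            problems.append(f"not a plain identifier: {name!r}")
    seen = set()
    for name in header:
        key = name.lower()  # "Text" and "text" name the same SQLite column
        if key in seen:
            problems.append(f"duplicate column: {name!r}")
        seen.add(key)
    if embed_column not in header:
        problems.append(f"--embed column {embed_column!r} not found in header")
    return problems
```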

Column Count:

There is no hard limit, but embedfile reads the rows with `SELECT * FROM temp.source`, so every row should have the same number of columns as the header.
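To find rows whose field count differs from the header, a short script is enough. The reported numbers come from `csv.reader.line_num`, i.e. the physical line on which each offending row ends; the function name is illustrative.

```python
import csv


def inconsistent_rows(path):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        bad = []
        for row in reader:
            if len(row) != len(header):
                bad.append((reader.line_num, len(row)))
    return bad
```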

Creating Your Own `embedfile` With a Custom Model

You can use a process similar to llamafile's:

Option 1: `zipalign` approach (like llamafile)

zipalign -j0 embedfile model.gguf .args

`.args` should contain the CLI arguments, one per line, for example:

-m
model.gguf
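As a sketch (Python, helper names hypothetical), the `.args` file and the zipalign call can be generated like this. It assumes embedfile reads `.args` the same way llamafile does. In llamafile's own workflow the executable is copied first (for example `cp embedfile my-embedfile`), because zipalign adds the files to the binary it is given.

```python
import shlex
from pathlib import Path


def write_args_file(args, path=".args"):
    """Write a llamafile-style .args file: one CLI argument per line, LF endings."""
    Path(path).write_bytes("".join(arg + "\n" for arg in args).encode("utf-8"))


def zipalign_command(binary, model, args_file=".args"):
    """Return the zipalign call from the answer above, quoted for a POSIX shell."""
    return shlex.join(["zipalign", "-j0", binary, model, args_file])
```

Usage: `write_args_file(["-m", "model.gguf"])`, then run the string returned by `zipalign_command("my-embedfile", "model.gguf")`.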

Option 2: Environment or CLI flag

You can also just do:

embedfile -m ./my-model.gguf import --embed text input.csv output.db

This is functionally equivalent, but less portable than a self-contained binary, because the .gguf file has to be shipped alongside the executable.
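For batches of files, a small wrapper can build that same command line. It relies on the `-m` flag working as described above; `import_argv` and `run_import` are hypothetical helpers, not part of embedfile.

```python
import shutil
import subprocess


def import_argv(binary, model, input_path, db_path, embed_column="text"):
    """Build the Option 2 command line, passing the model explicitly with -m."""
    return [binary, "-m", model, "import", "--embed", embed_column, input_path, db_path]


def run_import(binary, model, input_path, db_path, embed_column="text"):
    """Run one import; raises CalledProcessError if embedfile exits non-zero."""
    # shutil.which checks PATH, or the given path directly if it contains a directory.
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary} not found")
    subprocess.run(import_argv(binary, model, input_path, db_path, embed_column), check=True)
```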

Summary Table

| Format | Encoding | Line breaks | Notes |
|---|---|---|---|
| `.txt` | UTF-8 | LF or CR+LF | One entry per line. Read with `lines_read()`. |
| `.csv` | UTF-8 | LF or CR+LF | Comma-separated, header required. No support for custom delimiters. |
| `.json` / `.ndjson` | UTF-8 | LF or CR+LF | Structured parsing via `json_each()` and `lines_read()`. |
| `.db` | SQLite database | n/a | You must provide `--table NAME`. Currently not implemented. |
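For the `.json` / `.ndjson` row, a file in the expected shape (one UTF-8 JSON object per line) can be written as below. The keys `id` and `text` are hypothetical; presumably the key you embed must match `--embed`, as with CSV columns.

```python
import json


def write_ndjson(records, path):
    """Write one JSON object per line, UTF-8 encoded, with LF line endings."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for record in records:
            # ensure_ascii=False keeps accented characters as real UTF-8 text
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```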
