diff --git a/README.md b/README.md
index 8436f5d..a34687a 100644
--- a/README.md
+++ b/README.md
@@ -7,6 +7,7 @@ tags:
 - sentence-transformers
 - sentence-similarity
 - feature-extraction
+- text-embeddings-inference
 ---
 
 # Qwen3-Embedding-8B
@@ -197,6 +198,29 @@ print(scores.tolist())
 
 📌 **Tip**: We recommend that developers customize the `instruct` according to their specific scenarios, tasks, and languages. Our tests have shown that in most retrieval scenarios, not using an `instruct` on the query side can lead to a drop in retrieval performance of approximately 1% to 5%.
 
+### Text Embeddings Inference (TEI) Usage
+
+You can run TEI either on NVIDIA GPUs:
+
+```bash
+docker run --gpus all -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7.2 --model-id Qwen/Qwen3-Embedding-8B --dtype float16
+```
+
+Or on CPU devices:
+
+```bash
+docker run -p 8080:80 -v hf_cache:/data --pull always ghcr.io/huggingface/text-embeddings-inference:cpu-1.7.2 --model-id Qwen/Qwen3-Embedding-8B --dtype float16
+```
+
+Then generate embeddings by sending an HTTP POST request:
+
+```bash
+curl http://localhost:8080/embed \
+    -X POST \
+    -d '{"inputs": ["Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: What is the capital of China?", "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: Explain gravity"]}' \
+    -H "Content-Type: application/json"
+```
+
 ## Evaluation
 
 ### MTEB (Multilingual)
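As a companion to the curl example in the added TEI section, the same request body can be built programmatically. This is a minimal Python sketch, not part of the model's API: the helper name `build_instructed_query` is illustrative, and the (commented-out) `requests.post` call assumes a TEI server already running on `localhost:8080` as shown above. Since TEI receives raw strings, the task instruction must be baked into each input using the `Instruct: ...\nQuery: ...` format from the curl example.

```python
import json

def build_instructed_query(task: str, query: str) -> str:
    # Illustrative helper: formats a query the way the TEI curl example does,
    # i.e. "Instruct: <task>\nQuery: <query>".
    return f"Instruct: {task}\nQuery: {query}"

task = "Given a web search query, retrieve relevant passages that answer the query"
payload = json.dumps({
    "inputs": [
        build_instructed_query(task, "What is the capital of China?"),
        build_instructed_query(task, "Explain gravity"),
    ]
})
print(payload)

# With a TEI server running locally (see the docker commands above), the
# request could then be sent as follows -- commented out here because it
# requires a live endpoint:
# import requests
# resp = requests.post(
#     "http://localhost:8080/embed",
#     data=payload,
#     headers={"Content-Type": "application/json"},
# )
# embeddings = resp.json()  # one embedding vector per input string
```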