Jina Embeddings V3: A Frontier Multilingual Embedding Model

jina-embeddings-v3 is a frontier multilingual text embedding model with 570M parameters and 8192 token-length, outperforming the latest proprietary embeddings from OpenAI and Cohere on MTEB.

Dynamic image showing the characters "V3" formed by bright green dots varying in size on a black background.
jinaai/jina-embeddings-v3 · Hugging Face
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
jina-embeddings-v3: Multilingual Embeddings With Task LoRA
We introduce jina-embeddings-v3, a novel text embedding model with 570 million parameters, achieves state-of-the-art performance on multilingual data and long-context retrieval tasks, supporting context lengths of up to 8192 tokens. The model includes a set of task-specific Low-Rank Adaptation (LoRA) adapters to generate high-quality embeddings for query-document retrieval, clustering, classification, and text matching. Additionally, Matryoshka Representation Learning is integrated into the training process, allowing flexible truncation of embedding dimensions without compromising performance. Evaluation on the MTEB benchmark shows that jina-embeddings-v3 outperforms the latest proprietary embeddings from OpenAI and Cohere on English tasks, while achieving superior performance compared to multilingual-e5-large-instruct across all multilingual tasks.

Today, we are excited to announce jina-embeddings-v3, a frontier text embedding model with 570 million parameters. It achieves state-of-the-art performance on multilingual data and long-context retrieval tasks, supporting input length of up to 8192 tokens. The model features task-specific Low-Rank Adaptation (LoRA) adapters, enabling it to generate high-quality embeddings for various tasks including query-document retrieval, clustering, classification, and text matching.

In evaluations on the MTEB English, Multilingual and LongEmbed, jina-embeddings-v3 outperforms the latest proprietary embeddings from OpenAI and Cohere on English tasks, while also surpassing multilingual-e5-large-instruct across all multilingual tasks. With a default output dimension of 1024, users can arbitrarily truncate embedding dimensions down to 32 without sacrificing performance, thanks to the Matryoshka Representation Learning (MRL) integration.

Chart comparing the performance of various NLP tools on MTEB English Tasks, with scores ranging from 60 to 65.5, displayed on
The performance of jina-embeddings-v3 vs other embedding models across all MTEB English tasks. Full evaluation results per task can be found in our arXiv paper.
Graph depicting MTEB Multilingual Tasks Performance, comparing multilingual embeddings and 'jina embeddings' versions with sc
The performance of jina-embeddings-v3 has been evaluated across a broad selection of multilingual and cross-lingual MTEB tasks. Please note that jina-embeddings-v2-(zh/es/de) refers to our bilingual model suite, which was only tested on Chinese, Spanish, and German monolingual and cross-lingual tasks, excluding all other languages. Additionally, we do not report scores for openai-text-embedding-3-large and cohere-embed-multilingual-v3.0, as these models were not evaluated on the full range of multilingual and cross-lingual MTEB tasks.
Bar graph showing performance of different embeddings on long document retrieval tasks with scores for various libraries.
The performance of jina-embeddings-v3 on six long-document retrieval tasks from the LongEmbed benchmark shows a significant improvement over other models. Scores are nDCG@10; higher is better. This suggests the effectiveness of our RoPE-based positional embeddings, which outperform both the fixed positional embeddings used by baai-bge-m3 and the ALiBi-based approach used in jina-embeddings-v2.

As of its release on September 18, 2024, jina-embeddings-v3 is the best multilingual model and ranks 2nd on the MTEB English leaderboard for models with fewer than 1 billion parameters. v3 supports 89 languages in total, including 30 languages with the best performance: Arabic, Bengali, Chinese, Danish, Dutch, English, Finnish, French, Georgian, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Latvian, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Thai, Turkish, Ukrainian, Urdu, and Vietnamese.

Leaderboard table comparing language models across various performance metrics with highlighted rankings, set on a dark, prof
As of its release on September 18, 2024, jina-embeddings-v3, featuring 570 million parameters and 1024 output dimensions, stands as the most efficient, powerful, and reliable multilingual embedding model with fewer than 1 billion parameters.
Graph showing Scaling Law of Embedding Models with 'Parameter Size' on the x-axis and 'MTEB Performance' on the y-axis, featu
Scaling law of embedding models. The average MTEB performance on English tasks is plotted against the number of model parameters. Each dot represents an embedding model. The trendline, representing all models, is highlighted, with multilingual models emphasized in cyan. One can see that jina-embeddings-v3 demonstrates superior performance compared to models of similar size, also showing a superlinear improvement over its predecessor, jina-embeddings-v2. This graph was created by selecting top-100 embedding models from the MTEB leaderboard , excluding those without size information, typically closed-source or proprietary models. Submissions identified as obvious trolling were also filtered out.

Additionally, compared to LLM-based embeddings that have recently gained attention, such as e5-mistral-7b-instruct, which has a parameter size of 7.1 billion (12x larger) and an output dimension of 4096 (4x larger) but offers only a 1% improvement on MTEB English tasks, jina-embeddings-v3 is a far more cost-efficient solution, making it more suitable for production and on-edge computing.

Model Architecture

Feature Description
Base jina-XLM-RoBERTa
Parameters Base 559M
Parameters w/ LoRA 572M
Max input tokens 8192
Max output dimensions 1024
Layers 24
Vocabulary 250K
Supported languages 89
Attention FlashAttention2, also works w/o
Pooling Mean pooling

The architecture of jina-embeddings-v3 is shown in the figure below. To implement the backbone architecture, we adapted the XLM-RoBERTa model with several key modifications: (1) enabling effective encoding of long text sequences, (2) allowing task-specific encoding of embeddings, and (3) improving overall model efficiency with latest techniques. We continue to use the original XLM-RoBERTa tokenizer. While jina-embeddings-v3, with its 570 million parameters, is larger than jina-embeddings-v2 at 137 million, it is still much smaller than embedding models fine-tuned from LLMs.

Flowchart mapping sentiment classification. Begins with “Downstream Task: sentiment = classify” and includes stages like “Mea
The architecture of jina-embeddings-v3 is based on the jina-XLM-RoBERTa model, with five LoRA adapters for four different tasks.

The key innovation in jina-embeddings-v3 is the use of LoRA adapters. Five task-specific LoRA adapters are introduced to optimize embeddings for four tasks. The model’s input consists of two parts: the text (the long document to be embedded) and the task. jina-embeddings-v3 supports four tasks and implements five adapters to choose from: retrieval.query and retrieval.passage for query and passage embeddings in asymmetric retrieval tasks, separation for clustering tasks, classification for classification tasks, and text-matching for tasks involving semantic similarity, such as STS or symmetric retrieval. The LoRA adapters account for less than 3% of the total parameters, adding very minimal overhead to the computation.

To further enhance performance and reduce memory consumption, we integrate FlashAttention 2, support activation checkpointing, and use the DeepSpeed framework for efficient distributed training.

Get Started

Via Jina AI Search Foundation API

The easiest way to use jina-embeddings-v3 is to visit Jina AI homepage and navigate to the Search Foundation API section. Starting today, this model is set as the default for all new users. You can explore different parameters and features directly from there.

Screenshot of a dark-themed interface with options like 'Join us', 'Explore', showing 'Start instantly - no credit card or re
curl https://api.jina.ai/v1/embeddings \
	 -H "Content-Type: application/json" \
	 -H "Authorization: Bearer jina_387ced4ff3f04305ac001d5d6577e184hKPgRPGo4yMp_3NIxVsW6XTZZWNL" \
	 -d '{
	"model": "jina-embeddings-v3",
	"task": "text-matching",
	"dimensions": 1024,
	"late_chunking": true,
	"input": [
		"Organic skincare for sensitive skin with aloe vera and chamomile: ...", 
		"Bio-Hautpflege für empfindliche Haut mit Aloe Vera und Kamille: Erleben Sie die wohltuende Wirkung...", 
		"Cuidado de la piel orgánico para piel sensible con aloe vera y manzanilla: Descubre el poder ...", 
		"针对敏感肌专门设计的天然有机护肤产品:体验由芦荟和洋甘菊提取物带来的自然呵护。我们的护肤产品特别为敏感肌设计,...", 
		"新しいメイクのトレンドは鮮やかな色と革新的な技術に焦点を当てています: 今シーズンのメイクアップトレンドは、大胆な色彩と革新的な技術に注目しています。..."
    ]}'

Compared to v2, v3 introduces three new parameters in the API: task, dimensions, and late_chunking.

Parameter task

The task parameter is crucial and must be set according to the downstream task. The resulting embeddings will be optimized for that specific task. For more details, refer to the list below.

task value Task Description
retrieval.passage Embedding documents in a query-document retrieval task
retrieval.query Embedding queries in a query-document retrieval task
separation Clustering documents, visualizing a corpus
classification Text classification
text-matching (Default) Semantic text similarity, general symmetric retrieval, recommendation, finding similar items, deduplication

Note that the API does not first generate a generic meta embedding and then adapt it with an additional fine-tuned MLP. Instead, it inserts the task-specific LoRA adapter into every transformer layer (a total of 24 layers) and performs the encoding in one shot. Further details can be found in our arXiv paper.

Parameter dimensions

The dimensions parameter allows users to choose a trade-off between space efficiency and performance at the lowest cost. Thanks to the MRL technique used in jina-embeddings-v3, you can reduce the dimensions of embeddings as much as you want (even down to a single dimension!). Smaller embeddings are more storage-friendly for vector databases, and their performance cost can be estimated from the figure below.

Scatter plot titled "Performance of Different Output Dimensions" showing performance metrics across increasing MRL dimensions

Parameter late_chunking

Late Chunking in Long-Context Embedding Models
Chunking long documents while preserving contextual information is challenging. We introduce the “Late Chunking” that leverages long-context embedding models to generate contextual chunk embeddings for better retrieval applications.

Finally, the late_chunking parameter controls whether to use the new chunking method we introduced last month for encoding a batch of sentences. When set to true, our API will concatenate all sentences in the input field and feed them as a single string to the model. In other words, we treat the sentences in the input as if they originally come from the same section, paragraph, or document. Internally, the model embeds this long concatenated string and then performs late chunking, returning a list of embeddings that matches the size of the input list. Each embedding in the list is therefore conditioned on the previous embeddings.

From a user perspective, setting late_chunking does not change the input or output format. You will only notice a change in the embedding values, as they are now computed based on the entire previous context rather than independently. What's important to know when using late_chunking=True is that the total number of tokens (by summing up all tokens in input) per request is restricted to 8192, which is the maximum context length allowed for jina-embeddings-v3. When late_chunking=False, there is no such restriction; the total number of tokens is only subject to the rate limit of the Embedding API.

Via Azure & AWS

jina-embeddings-v3 is now available on AWS SageMaker and Azure Marketplace.

AWS Marketplace: Jina Embeddings v3
Microsoft Azure Marketplace

If you need to use it beyond those platforms or on-premises within your company, note that the model is licensed under CC BY-NC 4.0. For commercial usage inquiries, feel free to contact us.

Via Vector Databases & Partners

We closely collaborate with vector database providers such as Pinecone, Qdrant, and Milvus, as well as LLM orchestration frameworks like LlamaIndex, Haystack, and Dify. At the time of release, we are pleased to announce that Pinecone, Qdrant, Milvus and Haystack have already integrated support for jina-embeddings-v3, including the three new parameters: task, dimensions, and late_chunking. Other partners that have already integrated with the v2 API should also support v3 by simply changing the model name to jina-embeddings-v3. However, they may not yet support the new parameters introduced in v3.

Via Pinecone

The vector database to build knowledgeable AI | Pinecone
Search through billions of items for similar matches to any object, in milliseconds. It’s the next generation of search, an API call away.

Via Qdrant

Jina Embeddings - Qdrant
Qdrant is an Open-Source Vector Database and Vector Search Engine written in Rust. It provides fast and scalable vector similarity search service with convenient API.

Via Milvus

Integrate Milvus with Jina | Milvus Documentation
This guide demonstrates how to use Jina embeddings and Milvus to conduct similarity search and retrieval tasks. | v2.4.x

Via Haystack

Jina AI | Haystack
Use the latest Jina AI embedding models

Conclusion

In October 2022, we released jina-embeddings-v2-base-en, the world’s first open-source embedding model with an 8K context length. It was the only text embedding model that supported long context and matched OpenAI's text-embedding-ada-002. Today, after a year of learning, experimentation, and valuable lessons, we are proud to release jina-embeddings-v3—a new frontier in text embedding models and a big milestone of our company.

With this release, we continue to excel in what we are known for: long-context embeddings, while also addressing the most requested feature from both the industry and the community—multilingual embeddings. At the same time, we push performance to a new high. With new features such as Task-specific LoRA, MRL, and late chunking, we believe jina-embeddings-v3 will truly serve as the foundational embedding model for various applications, including RAG, agents, and more. Compared to recent LLM-based embeddings like NV-embed-v1/v2, our model is highly parameter-efficient, making it much more suitable for production and edge devices.

Moving forward, we plan to focus on evaluating and improving jina-embeddings-v3 performance on low-resource languages and further analyzing systematic failures caused by limited data availability. Moreover, the model weights of jina-embeddings-v3, along with its innovative features and hot takes, will serve as the foundation for our upcoming models, including jina-clip-v2, jina-reranker-v3, and reader-lm-v2.