Tech Blog - Jina AI

A collection of 112 posts

What Should We Learn From ModernBERT?

With bigger training data, efficient parameter sizing, and a deep-but-thin architecture, ModernBERT sets a direction for future BERT-like models.

Text-Image Global Contrastive Alignment and Token-Patch Local Alignment

CLIP can visualize token-patch similarities; however, this is more of a post-hoc interpretability trick than a robust or official "attention" from the model. Here's why.

Text Embeddings Fail to Capture Word Order and How to Fix It

Text embedding models struggle with capturing subtle linguistic nuances like word order, directional relationships, temporal sequences, causal connections, comparisons, and negation. Understanding these challenges is key to improving model performance.

Scaling Test-Time Compute For Embedding Models

Better results scale with compute: more on learning, more on search. A good pretrained model takes you far, but test-time compute takes you further. It's time to recognize this paradigm of test-time compute, even for embedding models.

Still Need Chunking When Long-Context Models Can Do It All?

Comparing how long-context embedding models perform with different chunking strategies to find the optimal approach for your needs.

Watermarking Text with Embedding Models to Protect Against Content Theft

You use our embedding models to do what? This might be the most "out-of-domain" application of embeddings we learned about at EMNLP 2024.

Meta-Prompt for Better Jina API Integration and CodeGen

Is Meta-Prompt the new norm for API specs? Feed it to LLMs to generate code that reliably integrates Jina's APIs, sparing you the usual trial-and-error process.

Beyond CLIP: How Jina-CLIP Advances Multimodal Search

Learn how Jina-CLIP enhances OpenAI's CLIP with better retrieval accuracy and more diverse results through unified text-image embeddings.

Finding Optimal Breakpoints in Long Documents Using Small Language Models

We trained three small language models to better segment long documents into chunks, and here are the key lessons we learned.

Bridging Language Gaps in Multilingual Embeddings via Contrastive Learning

Multilingual models often face a "language gap," where similar phrases in different languages don't align. We show how contrastive learning can bridge this gap, enhancing cross-language performance.

What Late Chunking Really Is & What It’s Not: Part II

Part 2 of our exploration of Late Chunking: a deep dive into why it is the best method for generating chunk embeddings and improving search/RAG performance.

Migration From Jina Embeddings v2 to v3

We collected some tips to help you migrate from Jina Embeddings v2 to v3.

The What and Why of Text-Image Modality Gap in CLIP Models

You can't just use a CLIP model to retrieve text and images and sort the results by score. Why? Because of the modality gap. What is it, and where does it come from?

Late Chunking in Long-Context Embedding Models

Chunking long documents while preserving contextual information is challenging. We introduce "Late Chunking", a method that leverages long-context embedding models to generate contextual chunk embeddings for better retrieval applications.

Rephrased Labels Improve Zero-Shot Text Classification by 30%

When using embedding models for zero-shot classification, rephrasing the class label as "This is seriously about 'LABEL'" gives higher accuracy than using LABEL alone. But how, and why?

Can Embedding/Reranker Models Compare Numbers?

A lot of LLMs can't figure out that 9.11 is actually smaller than 9.9. Can our embedding and reranker models do any better?

No. You Can't Use Reranker to Improve SEO

But if you work in SEO, it could be interesting to see things from the other side of the table and understand how embeddings and rerankers play their roles in modern search systems.

Handcrafting Image Prompts Is Dead: Reverse Engineer Midjourney-style Images with PromptPerfect

From Punk Einstein to Turbo Pigeons: Use PromptPerfect Interactive to reverse engineer prompts from pictures and generate Midjourney-style images with real-time feedback.

AI Explainability Made Easy: How Late Interaction Makes Jina-ColBERT Transparent

AI explainability and transparency are hot topics. How can we trust AI if we can't see how it works? Jina-ColBERT shows you how, with the right model architecture, you can easily make your AI spill its secrets.

Implementing a Chat History RAG with Jina AI and Milvus Lite

Enhance your search applications in Python with Jina Embeddings and Jina Reranker, plus the lightweight, easy-to-deploy Milvus Lite.

Bypass Limitations with PromptPerfect: Generate the Images the Models Don’t Want You to See

See how PromptPerfect overcomes restrictions and limitations of image generation models like Stable Diffusion XL and DALL-E 3.

AIR-Bench: Better Metrics for Better Search Foundation

AIR-Bench is a new approach to AI metrics that uses generative AI to make more realistic and flexible benchmarks. With AIR-Bench, you can create your own benchmarks for your own domain, and know that the benchmark data hasn't leaked into model training data.

Binary Embeddings: All the AI, 3.125% of the Fat

32 bits is a lot of precision for something as robust and inexact as an AI model. So we got rid of 31 of them! Binary embeddings are smaller, faster, and highly performant.

Albus by Springworks: Empowering Employees with Enterprise Search

Learn how a leading HR-tech startup uses Jina AI’s models to talk with structured and unstructured data.

Create Your Personalized Podcast With Jina Reader and PromptPerfect

Use Jina Reader and PromptPerfect to generate your custom news podcast with RSS feeds, article extraction, LLMs, and Text-to-Speech.