Hybrid Retrieval with BM42


In this notebook, we will see how to create Hybrid Retrieval pipelines, combining BM42 (a new Sparse embedding Retrieval approach) and Dense embedding Retrieval.

We will use the Qdrant Document Store and Fastembed Embedders.

⚠️ Recent evaluations have raised questions about the validity of BM42. Future developments may address these concerns. Please keep this in mind while reviewing the content.

Why BM42?

Qdrant introduced BM42, an algorithm designed to replace BM25 in hybrid RAG pipelines (dense + sparse retrieval).

They found that BM25, while relevant for a long time, has some limitations in common RAG scenarios.

Let’s first take a look at BM25 and SPLADE to understand the motivation and the inspiration for BM42.

BM25

\begin{equation} \text{score}(D,Q) = \sum_{i=1}^{N} \text{IDF}(q_i) \times \frac{f(q_i, D) \cdot (k_1 + 1)}{f(q_i, D) + k_1 \cdot \left(1 - b + b \cdot \frac{|D|}{\text{avgdl}}\right)} \end{equation}

BM25 is an evolution of TF-IDF and has two components:

  • Inverse Document Frequency = term importance within a collection
  • a component incorporating Term Frequency = term importance within a document

The Qdrant folks observed that the TF component relies on document statistics, which only make sense for longer texts. That assumption breaks down in common RAG pipelines, where documents are short.
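To make the formula concrete, here is a minimal, self-contained BM25 scorer over a toy tokenized corpus. It is an illustrative sketch (using a common smoothed IDF variant), not the exact formulation of any particular library:

import math
from collections import Counter

def bm25_score(query_terms, doc, corpus, k1=1.5, b=0.75):
    """Score one tokenized document against query terms with BM25."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    tf = Counter(doc)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)         # document frequency
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))  # smoothed IDF
        f = tf[term]                                     # term frequency in doc
        score += idf * (f * (k1 + 1)) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [
    ["the", "capybara", "is", "a", "social", "rodent"],
    ["the", "orca", "eats", "fish", "and", "seals"],
]
print(bm25_score(["eats", "fish"], corpus[1], corpus))

Note how the TF part saturates quickly for short documents: with only a handful of tokens per document, f is almost always 0 or 1, so the length normalization carries little signal.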

SPLADE

Another interesting approach is SPLADE, which uses a BERT-based model to create a bag-of-words representation of the text. While it generally performs better than BM25, it has some drawbacks:

  • tokenization issues with out-of-vocabulary words
  • adaptation to new domains requires fine-tuning
  • computationally heavy

To use SPLADE with Haystack, see this notebook.

BM42

\begin{equation} \text{score}(D,Q) = \sum_{i=1}^{N} \text{IDF}(q_i) \times \text{Attention}(\text{CLS}, q_i) \end{equation}

Taking inspiration from SPLADE, the Qdrant team developed BM42 to improve BM25.

IDF works well, so they kept it.

But how to quantify term importance within a document?

The attention matrix of Transformer models comes to our aid: we can use the attention row for the [CLS] token!

To fix tokenization issues, BM42 merges subwords and sums their attention weights.

In their implementation, the Qdrant team used the all-MiniLM-L6-v2 model, but this technique can work with any Transformer model, no fine-tuning needed.
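Here is a minimal sketch of this idea using the transformers library (installed below): take the [CLS] attention row from the last layer, average it over heads, and merge WordPiece subwords ("##...") by summing their weights. This only illustrates the technique, not Qdrant's exact implementation; layer choice, normalization, and other details may differ.

import torch
from transformers import AutoModel, AutoTokenizer

model_name = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)

text = "The capybara is a highly social rodent."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# last-layer attentions: (batch, heads, seq_len, seq_len) -> take the [CLS] row
cls_attention = outputs.attentions[-1][0].mean(dim=0)[0]

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
merged = []  # [word, summed attention weight] pairs
for token, weight in zip(tokens, cls_attention.tolist()):
    if token in tokenizer.all_special_tokens:
        continue
    if token.startswith("##") and merged:
        merged[-1][0] += token[2:]  # glue the subword back onto the previous word
        merged[-1][1] += weight     # and sum its attention weight
    else:
        merged.append([token, weight])

for word, weight in sorted(merged, key=lambda pair: -pair[1]):
    print(f"{word}: {weight:.3f}")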


Install dependencies

!pip install -U fastembed-haystack qdrant-haystack wikipedia transformers

Hybrid Retrieval

Indexing

Create a Qdrant Document Store

from haystack_integrations.document_stores.qdrant import QdrantDocumentStore

document_store = QdrantDocumentStore(
    ":memory:",
    recreate_index=True,
    embedding_dim=384,
    return_embedding=True,
    use_sparse_embeddings=True,  # required: otherwise the collection schema won't allow storing sparse vectors
    sparse_idf=True,  # required for BM42: enables streaming updates of sparse embeddings while keeping the IDF calculation up to date
)

Download Wikipedia pages and create raw documents

We download a few Wikipedia pages about animals and create Haystack documents from them.

import wikipedia
from haystack.dataclasses import Document

nice_animals = ["Capybara", "Dolphin", "Orca", "Walrus"]

raw_docs = []
for title in nice_animals:
    # auto_suggest=False avoids being silently redirected to a "suggested" page
    page = wikipedia.page(title=title, auto_suggest=False)
    doc = Document(content=page.content, meta={"title": page.title, "url": page.url})
    raw_docs.append(doc)

Indexing pipeline

Our indexing pipeline includes both a Sparse Document Embedder (based on BM42) and a Dense Document Embedder.

from haystack.components.preprocessors import DocumentCleaner, DocumentSplitter
from haystack.components.writers import DocumentWriter
from haystack.document_stores.types import DuplicatePolicy
from haystack import Pipeline
from haystack_integrations.components.embedders.fastembed import FastembedSparseDocumentEmbedder, FastembedDocumentEmbedder

hybrid_indexing = Pipeline()
hybrid_indexing.add_component("cleaner", DocumentCleaner())
hybrid_indexing.add_component("splitter", DocumentSplitter(split_by='sentence', split_length=4))
hybrid_indexing.add_component("sparse_doc_embedder", FastembedSparseDocumentEmbedder(model="Qdrant/bm42-all-minilm-l6-v2-attentions", meta_fields_to_embed=["title"]))
hybrid_indexing.add_component("dense_doc_embedder", FastembedDocumentEmbedder(model="BAAI/bge-small-en-v1.5", meta_fields_to_embed=["title"]))
hybrid_indexing.add_component("writer", DocumentWriter(document_store=document_store, policy=DuplicatePolicy.OVERWRITE))

hybrid_indexing.connect("cleaner", "splitter")
hybrid_indexing.connect("splitter", "sparse_doc_embedder")
hybrid_indexing.connect("sparse_doc_embedder", "dense_doc_embedder")
hybrid_indexing.connect("dense_doc_embedder", "writer")
<haystack.core.pipeline.pipeline.Pipeline object at 0x7fb6bc33a2f0>
🚅 Components
  - cleaner: DocumentCleaner
  - splitter: DocumentSplitter
  - sparse_doc_embedder: FastembedSparseDocumentEmbedder
  - dense_doc_embedder: FastembedDocumentEmbedder
  - writer: DocumentWriter
πŸ›€οΈ Connections
  - cleaner.documents -> splitter.documents (List[Document])
  - splitter.documents -> sparse_doc_embedder.documents (List[Document])
  - sparse_doc_embedder.documents -> dense_doc_embedder.documents (List[Document])
  - dense_doc_embedder.documents -> writer.documents (List[Document])

Let’s index our documents!

⚠️ If you are running this notebook on Google Colab, note that Colab provides only 2 CPU cores, so embedding generation with Fastembed may be slower than on a typical local machine.

hybrid_indexing.run({"documents":raw_docs})
Calculating sparse embeddings: 100%|██████████| 340/340 [00:27<00:00, 12.52it/s]
Calculating embeddings: 100%|██████████| 340/340 [01:23<00:00,  4.07it/s]
400it [00:00, 1179.66it/s]

{'writer': {'documents_written': 340}}

document_store.count_documents()
340

Retrieval

Retrieval pipeline

As already mentioned, BM42 is designed to perform best in Hybrid Retrieval (and Hybrid RAG) pipelines. Our retrieval pipeline consists of three components:

  • FastembedSparseTextEmbedder: transforms the query into a sparse embedding
  • FastembedTextEmbedder: transforms the query into a dense embedding
  • QdrantHybridRetriever: looks for relevant documents, based on the similarity of both the embeddings

The Qdrant Hybrid Retriever compares the dense and sparse query embeddings with the corresponding document embeddings, retrieves the most relevant documents for each, and merges the two ranked lists with Reciprocal Rank Fusion.

If you want to customize the fusion behavior further, see the Hybrid Retrieval Pipelines tutorial.
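As a rough illustration of how Reciprocal Rank Fusion merges two ranked lists, here is a minimal sketch. The document ids are hypothetical and k=60 is just the commonly used constant; Qdrant's internal fusion parameters may differ:

def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of document ids: each doc scores sum(1 / (k + rank))."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_ranking = ["doc_a", "doc_b", "doc_c"]   # hypothetical dense results
sparse_ranking = ["doc_a", "doc_c", "doc_d"]  # hypothetical sparse results
print(reciprocal_rank_fusion([dense_ranking, sparse_ranking]))
# ['doc_a', 'doc_c', 'doc_b', 'doc_d']

Documents that rank well in both lists (like doc_a here) float to the top, even if neither retriever alone placed them first by a wide margin.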

from haystack_integrations.components.retrievers.qdrant import QdrantHybridRetriever
from haystack_integrations.components.embedders.fastembed import FastembedTextEmbedder, FastembedSparseTextEmbedder


hybrid_query = Pipeline()
hybrid_query.add_component("sparse_text_embedder", FastembedSparseTextEmbedder(model="Qdrant/bm42-all-minilm-l6-v2-attentions"))
hybrid_query.add_component("dense_text_embedder", FastembedTextEmbedder(model="BAAI/bge-small-en-v1.5", prefix="Represent this sentence for searching relevant passages: "))
hybrid_query.add_component("retriever", QdrantHybridRetriever(document_store=document_store, top_k=5))

hybrid_query.connect("sparse_text_embedder.sparse_embedding", "retriever.query_sparse_embedding")
hybrid_query.connect("dense_text_embedder.embedding", "retriever.query_embedding")
<haystack.core.pipeline.pipeline.Pipeline object at 0x7fb6bc33ae30>
🚅 Components
  - sparse_text_embedder: FastembedSparseTextEmbedder
  - dense_text_embedder: FastembedTextEmbedder
  - retriever: QdrantHybridRetriever
πŸ›€οΈ Connections
  - sparse_text_embedder.sparse_embedding -> retriever.query_sparse_embedding (SparseEmbedding)
  - dense_text_embedder.embedding -> retriever.query_embedding (List[float])

Try the retrieval pipeline

question = "Who eats fish?"

results = hybrid_query.run(
    {"dense_text_embedder": {"text": question},
     "sparse_text_embedder": {"text": question}}
)
Calculating sparse embeddings: 100%|██████████| 1/1 [00:00<00:00, 82.10it/s]
Calculating embeddings: 100%|██████████| 1/1 [00:00<00:00,  7.75it/s]

import rich

for d in results['retriever']['documents']:
  rich.print(f"\nid: {d.id}\n{d.meta['title']}\n{d.content}\nscore: {d.score}\n---")
id: 370071638e221257cf77702716695626d9b1b4dfe4212b4a10e255434bfeb08b
Orca
 Some populations in the Norwegian and Greenland sea specialize in herring and follow that fish's autumnal 
migration to the Norwegian coast. Salmon account for 96% of northeast Pacific residents' diet, including 65% of 
large, fatty Chinook. Chum salmon are also eaten, but smaller sockeye and pink salmon are not a significant food 
item. Depletion of specific prey species in an area is, therefore, cause for concern for local populations, despite
the high diversity of prey.
score: 0.5
---
id: 1ed8f49561630f10202b55c8c7619a32cd9f6a11675cbb56c64a578826e488ef
Orca
; Ellis, Graeme M. (2006). "Selective foraging by fish-eating killer whales Orcinus orca in British Columbia". 
Marine Ecology Progress Series.
score: 0.5
---
id: a9bb77dac4747c4fba48a7464038c9da206d7e3663d837f2c95f6d882de8111e
Orca
 On average, an orca eats 227 kilograms (500 lb) each day. While salmon are usually hunted by an individual whale 
or a small group, herring are often caught using carousel feeding: the orcas force the herring into a tight ball by
releasing bursts of bubbles or flashing their white undersides. They then slap the ball with their tail flukes, 
stunning or killing up to 15 fish at a time, then eating them one by one. Carousel feeding has been documented only
in the Norwegian orca population, as well as some oceanic dolphin species.
score: 0.41666666666666663
---
id: 33fdef8b4f33f4c5ce00cbbc9e3cb3605b778131854436d4bb7e54f5adaf79ae
Dolphin
 === Consumption === ==== Cuisine ==== In some parts of the world, such as Taiji, Japan and the Faroe Islands, 
dolphins are traditionally considered as food, and are killed in harpoon or drive hunts.
Dolphin meat is consumed in a small number of countries worldwide, which include Japan and Peru (where it is 
referred to as chancho marino, or "sea pork"). While Japan may be the best-known and most controversial example, 
only a very small minority of the population has ever sampled it.
Dolphin meat is dense and such a dark shade of red as to appear black.
score: 0.3333333333333333
---
id: 6b643c8aa3d47fc198063f8bbc98828bd1d2368d22c95b6b97c36beb60b7fbd0
Orca
" Although large variation in the ecological distinctiveness of different orca groups complicate simple 
differentiation into types, research off the west coast of North America has identified fish-eating "residents", 
mammal-eating "transients" and "offshores". Other populations have not been as well studied, although specialized 
fish and mammal eating orcas have been distinguished elsewhere. Mammal-eating orcas in different regions were long 
thought likely to be closely related, but genetic testing has refuted this hypothesis. A 2024 study supported the 
elevation of Eastern North American resident and transient orcas as distinct species, O.
score: 0.3333333333333333
---
question = "capybara social behavior"

results = hybrid_query.run(
    {"dense_text_embedder": {"text": question},
     "sparse_text_embedder": {"text": question}}
)
Calculating sparse embeddings: 100%|██████████| 1/1 [00:00<00:00, 71.98it/s]
Calculating embeddings: 100%|██████████| 1/1 [00:00<00:00,  8.90it/s]

import rich

for d in results['retriever']['documents']:
  rich.print(f"\nid: {d.id}\n{d.meta['title']}\n{d.content}\nscore: {d.score}\n---")
id: d35c090ebfdad52eb882915b0ee2a9578c751a243ecef3a2c941ef0713a7c9aa
Capybara
 The capybara inhabits savannas and dense forests, and lives near bodies of water. It is a highly social species 
and can be found in groups as large as 100 individuals, but usually live in groups of 10–20 individuals. The 
capybara is hunted for its meat and hide and also for grease from its thick fatty skin. == Etymology ==
Its common name is derived from Tupi ka'apiûara, a complex agglutination of kaá (leaf) + píi (slender) + ú (eat) + 
ara (a suffix for agent nouns), meaning "one who eats slender leaves", or "grass-eater".
score: 0.7
---
id: e1b0dcc9a1d01481052af5964616f438073f201ffaa0605282a7ddaf90fcafaf
Capybara
 Males establish social bonds, dominance, or general group consensus. They can make dog-like barks when threatened 
or when females are herding young.
Capybaras have two types of scent glands: a morrillo, located on the snout, and anal glands. Both sexes have these 
glands, but males have much larger morrillos and use their anal glands more frequently.
score: 0.6666666666666666
---
id: fd11addea30e8ae2f1d60274beae4d42646b075eb0579bef1c2899cde1e1bb2b
Capybara
1.31.0.1.
score: 0.5
---
id: 1600c15a21aa722965ef2cc4fab4e622474fc8d1ff9e0c555c955e78b038ee2d
Capybara
 In addition, a female alerts males she is in estrus by whistling through her nose. During mating, the female has 
the advantage and mating choice. Capybaras mate only in water, and if a female does not want to mate with a certain
male, she either submerges or leaves the water. Dominant males are highly protective of the females, but they 
usually cannot prevent some of the subordinates from copulating.
score: 0.25
---
id: 994f31c23e46c16744558b3a499cff0c446da33661a74bb2ddaede9e26e64e11
Capybara
40 ft) in length, stand 50 to 62 cm (20 to 24 in) tall at the withers, and typically weigh 35 to 66 kg (77 to 146 
lb), with an average in the Venezuelan llanos of 48.9 kg (108 lb). Females are slightly heavier than males. The top
recorded weights are 91 kg (201 lb) for a wild female from Brazil and 73.
score: 0.25
---

📚 Resources

(Notebook by Stefano Fiorucci)