MMR Search Algorithm - Implementation and use cases

Maximal Marginal Relevance Algorithm

Introduction

The Maximal Marginal Relevance (MMR) algorithm is a foundational algorithm in information retrieval. It was introduced in 1998 by Jaime Carbonell and Jade Goldstein in their paper "The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries." It has since become a standard building block in search and recommendation systems, because it gives a simple way to balance relevance and diversity in ranked output.

Why was it developed?

Think about it like this: a big tech company publishes an article about the latest advancements in LLMs and machine learning, and it is so popular that 20 other articles cite it. You read the article, run into a few terms you do not understand, and search for them to learn more. If the search matches purely on keywords, the simplest information retrieval algorithms may return all 20 of those citing articles as the top results. They are all relevant, but they largely repeat one another, so you do not get the variety of sources you were actually looking for. MMR was developed to tackle this problem.

How does it work?

The MMR algorithm works by iteratively selecting documents that maximize the trade-off between relevance to the query and novelty with respect to previously selected documents. Mathematically, it can be expressed as:
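
In the notation of the Carbonell and Goldstein paper, the selection criterion reads:

$$
\mathrm{MMR} = \arg\max_{D_i \in R \setminus S} \left[ \lambda \, \mathrm{Sim}_1(D_i, Q) - (1 - \lambda) \max_{D_j \in S} \mathrm{Sim}_2(D_i, D_j) \right]
$$

Here $Q$ is the query, $R$ is the set of candidate documents returned by the retrieval step, $S$ is the subset of $R$ already selected, and $\lambda \in [0, 1]$ is the trade-off weight. $\mathrm{Sim}_1$ and $\mathrm{Sim}_2$ are similarity measures; they may be the same metric (cosine similarity, for instance) or two different ones.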

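So basically, whatever similarity metric you are using, you first calculate the similarity between a candidate document and the query and multiply it by the factor lambda (λ), a weight between 0 and 1 that decides how much relevance to the query matters. The remaining weight, 1 - λ, goes to the candidate's highest similarity to any previously selected document, and that term is subtracted as a penalty for redundancy. The candidate with the best combined score is added to the selection, and the process repeats. This is how λ manages the trade-off between relevance and diversity: λ = 1 gives a plain relevance ranking, while smaller values push the results toward diversity.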
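
To make the selection loop concrete, here is a minimal sketch in plain Python. It is not the code from the repository linked in this post; the function names `cosine` and `mmr`, the toy vectors, and the choice of cosine similarity for both similarity terms are assumptions made for illustration. It also assumes the redundancy term is 0 for the first pick, when nothing has been selected yet.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr(query: Sequence[float], docs: List[Sequence[float]],
        lam: float = 0.5, k: int = 3) -> List[int]:
    """Return indices of up to k documents, chosen greedily by the MMR score."""
    selected: List[int] = []
    candidates = list(range(len(docs)))
    # Relevance to the query never changes between iterations, so compute it once.
    relevance = [cosine(query, d) for d in docs]

    while candidates and len(selected) < k:
        best_idx, best_score = candidates[0], -math.inf
        for i in candidates:
            # Redundancy: highest similarity to anything already selected.
            # Assumption: treated as 0.0 while the selection is still empty.
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            score = lam * relevance[i] - (1.0 - lam) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


# Toy data (hypothetical): docs 0 and 1 are near-duplicates, doc 2 covers another aspect.
query = [1.0, 1.0, 0.0]
docs = [
    [1.0, 0.9, 0.0],
    [1.0, 0.8, 0.0],
    [0.0, 1.0, 1.0],
]

print(mmr(query, docs, lam=1.0, k=2))  # pure relevance → [0, 1]
print(mmr(query, docs, lam=0.5, k=2))  # balanced       → [0, 2]
```

With λ = 1 the two near-duplicates come back together, which is the situation described in the motivation above. With λ = 0.5 the second slot goes to the less relevant but more novel document instead.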

I implemented the MMR search algorithm, based on the original paper (https://www.cs.cmu.edu/~jgc/publication/The_Use_MMR_Diversity_Based_LTMIR_1998.pdf), on a small dataset of different steel companies. You can find the code [here](https://github.com/Yash-Tobre/MMR-search).

The principles of MMR continue to resonate in contemporary systems:
**Diversity in Search Results**: Modern search systems rely on MMR-inspired techniques to surface diverse viewpoints, especially for subjective or controversial topics such as news or opinion pieces.
**AI-Powered Summarization**: Tools like SummarizeBot or features in Google Docs may use extractive summarization that incorporates MMR-style selection to produce coherent, non-redundant summaries.
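
As a rough illustration of the summarization use case, the sketch below applies the same MMR score to sentences instead of documents. It is not how any particular product works; the helper names (`bag`, `cos`, `summarize`), the bag-of-words representation, and the example sentences are all assumptions chosen to keep the example small.

```python
import math
import re
from collections import Counter
from typing import List


def bag(text: str) -> Counter:
    """Lowercased bag-of-words counts for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cos(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags of words."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def summarize(sentences: List[str], query: str, lam: float = 0.5, k: int = 2) -> List[str]:
    """Pick up to k sentences by MMR and return them in their original order."""
    vecs = [bag(s) for s in sentences]
    q = bag(query)
    chosen: List[int] = []
    remaining = list(range(len(sentences)))

    def score(i: int) -> float:
        # Penalize overlap with sentences that are already in the summary.
        redundancy = max((cos(vecs[i], vecs[j]) for j in chosen), default=0.0)
        return lam * cos(vecs[i], q) - (1.0 - lam) * redundancy

    while remaining and len(chosen) < k:
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
    return [sentences[i] for i in sorted(chosen)]


# Hypothetical input: the first two sentences say nearly the same thing.
sentences = [
    "Steel prices rose sharply this quarter.",
    "This quarter steel prices rose sharply again.",
    "Steel demand from carmakers stayed weak.",
]
print(summarize(sentences, "steel prices", lam=0.5, k=2))
```

With λ = 0.5 the summary keeps the first and third sentences and skips the second, which mostly repeats the first. With λ = 1.0 it would keep the two near-identical sentences instead.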

Conclusion:

Overall, MMR built a strong foundation for many of the systems that came after it: search systems, recommendation systems, summarization systems, and plenty of others that we use today. As we move into a rapidly developing age of Artificial Intelligence and prompt-generated videos, Carbonell and Goldstein deserve a share of the credit. Their work is part of the foundation that helped push AI forward.

I hope this post was informative. Reach out to me with any further questions.
