Search
Evaluation
Datasets
Offline
- An IR-based Evaluation Framework for Web Search Query Segmentation
- Recall, Robustness, and Lexicographic Evaluation
- Are We There Yet? A Decision Framework for Replacing Term Based Retrieval with Dense Retrieval Systems
- Grid-based Evaluation Metrics for Web Image Search
- Less is More: Introduces the K-call@N metric - PDF
- Effective Query Triage
- Quality rater and algorithmic evaluation systems: Are major changes coming?
- The crowd is made of people: Observations from large-scale crowd labelling
Bias
- What is Presentation Bias in search? - PDF
- An Experimental Comparison of Click Position-Bias Models - Slides
- Beyond Position Bias: Examining Result Attractiveness as a Source of Presentation Bias in Clickthrough Data
- Left Digit Bias - More than a Penny's Worth: Left-Digit Bias and Firm Pricing - Left-Digit Bias at Lyft - The Left-Digit Bias: When and Why Are Consumers Penny Wise and Pound Foolish?
- Improving Retrievability in Search with Query Generation
- Recent Advances in the Foundations and Applications of Unbiased Learning to Rank
Online
- A/B Testing for Search is Different
- Building a Data Driven Search Program with James Rubinstein of LexisNexis
- NDCG is overrated
- Meaningful metrics: How data sharpened the focus of product teams
Sparse Search
Papers/Slides/Docs
- Improvements to BM25 and Language Models Examined
- Learning to rank search results - Rank Fusion - Course
- The Probabilistic Relevance Framework: BM25 and Beyond
Tools
- Rank-BM25: A two line search engine
- ClickModels: A Click Model is a probabilistic graphical model used to predict search engine click data from past observations - ClickModelsWC: Extra Models - Tutorial and CheatSheet - Book with many probabilistic click models described
- Unbiased Learning to Rank Algorithms (ULTRA)
- Explore IR Measures
Query Understanding
Query Embeddings
Query Intent
- Semantic query parsing blueprint
- Understanding Natural Language Understanding - PDF
- Awesome Intent Analysis
- Using AI to Understand Search Intent
- Deconstructing E-Commerce Search UX: The 8 Most Common Search Query Types (42% of Sites Have Issues)
- Personalizing natural-language understanding using multi-armed bandits and implicit feedback
Query Refinement
- Entity-Centric Query Refinement
- Intent term selection and refinement in e-commerce queries
- Distant-Supervised Slot-Filling for E-Commerce Queries
- Advancing query rewriting in e-commerce via shopping intent learning
Query Tagging
- A User-Centered Concept Mining System for Query and Document Understanding at Tencent
- Knowledge-Rich Self-Supervision for Biomedical Entity Linking
- Unsupervised query segmentation using generative language models and wikipedia
- Concepts Identification from Queries and Its Application for Search
- Towards a simplified ontology for better e-commerce search - PDF - Automated Entity Tagging in Search Queries
- Semantic query parsing blueprint
Query Entity Linking
Session Based Signals
- Deep Learning Powered In-Session Contextual Ranking using Clickthrough Data
- SIGIR 2022 Tutorial: Sequential/Session-based Recommendations: Challenges, Approaches, Applications and Opportunities - PDF
- Session-based Recommender Systems
- User Persona Identification and New Service Adaptation Recommendation - Introduces SessionBERT
Retrieval
- In Search of Recall
- Balance Your Search Budget!
- ProcSim: Proxy-based confidence for robust similarity learning
- Divide and Conquer: Towards Better Embedding-based Retrieval for Recommender Systems From a Multi-task Perspective
Ranking
- Haystack EU 2023 - Philipp Krenn: Reciprocal Rank Fusion (RRF) - How to Stop Worrying about Boosting
- From structured search to learning-to-rank-and-retrieve
- Learning to Rank for Information Retrieval
- Ranking Relevance in Yahoo Search
- Improving Web Search Ranking by Incorporating User Behavior - Microsoft
- How LambdaMART works - optimizing product ranking goals
- Istella Learning to Rank (LETOR) dataset
- Tiangong-ULTR (Unbiased Learning To Rank) dataset using search logs of Sogou.com
- A Large Scale Search Dataset from Baidu Search Engine
- Introduction to Learning to Rank - Very good summary of Learning to Rank methods
Rank Fusion
Ranking Architectures
Ranking for value
Query Suggestion
- Graph Learning for Exploratory Query Suggestions in an Instant Search System
- 9 UX Best Practice Design Patterns for Autocomplete Suggestions (Only 19% Get Everything Right)
- A Dataset for Evaluating Query Suggestion Algorithms in Information Retrieval
Sponsored Search
Result Presentation
Whole Page Ranking
- Beyond Ranking: Optimizing Whole-Page Presentation
- Whole Page Unbiased Learning to Rank from Baidu
- Whole page optimization with local and global constraints - Amazon
- The Whole-Page Optimization via Dynamic Ad Allocation - Alibaba
- GenSERP: Large Language Models for Whole Page Presentation - Microsoft
- Automate page layout optimization: An offline deep Q-learning approach - Amazon
- A Real-Time Whole Page Personalization Framework for E-Commerce - Walmart
- Page-level Optimization of e-Commerce Item Recommendations - eBay
Diversification
- Practical Diversified Recommendations on YouTube with Determinantal Point Processes
- Diversity in Search
- Result Diversification in Search and Recommendation: A Survey - Very good paper also helpful for query intent understanding
- Similarity-Sensitive Diversity
- Mixing and Matching - Part 4 - Andreas Wagner - Learning to Diversify Search Results - Shows diversification at the level of list results as opposed to search results.
- Measuring and Optimizing Findability in e-commerce Search (MICES 2019)
Faceted Search
- Relevant facets: how to select and promote facets with deep learning
- Faceted Search: An Overview
- Implementing faceted search with dynamic faceting
- Faceted search, where every JSON attribute counts
- Consider Promoting Important Filters (61% Don’t)
Recsys Inspiration
Explainability
Behaviour Driven Presentation
Personalization
- Personalized retrieval over millions of items
- Encoding History with Context-aware Representation Learning for Personalized Search
Online Learning
Recommendations
LLM
- LLM 4 IR Survey
- Recent Advances in Retrieval-Augmented Text Generation
- Generative Information Retrieval (slides) - Marc Najork (2023) - archive.org
Visual Search
- Seven Tips for Visual Search at Scale - Paper
- Searching for products in virtual reality: Understanding the impact of context and result presentation on user experience
Query Advertising
- Insights from Amazon shopping queries
- Consumption vs. de-consumption: 2 different motivations shaping shopping behavior
Conversational Search
Engineering
Domains
E-commerce
- Faster E-commerce Search
- The 2018 SIGIR Workshop On eCommerce
- Analyzing and Predicting Purchase Intent in E-commerce: Anonymous vs. Identified Customers
- Improving Deep Learning For Airbnb Search
- The Architecture of eBay Search - SIGIR 2017
- How to sell on Amazon: a guide for beginners - Shows all variables used for listing items on Amazon which help with search
- Shopify's Standard Product Taxonomy - page
- Image Classification with Taxonomy Mapping
- Amazon Search Mission and Query Understanding
- Semantic Web Challenge: Mining the Web of HTML-embedded Product Data
- MAVE: A Product Dataset for Multi-source Attribute Value Extraction
User Interfaces
- Search Interfaces for Biomedical Searching: How do Gaze, User Perception, Search Behaviour and Search Performance Relate?
- Modeling User Behavior for Vertical Search: Images, Apps and Products
- User Behavior Insights
Publication Venues
General
- Challenges and Research Opportunities in eCommerce Search and Recommendations
- Thoughts about Managing Search Teams
- Interview Questions for Search Relevance Engineers, Data Scientists, and Product Managers
- Awesome Search
- Building a Better Search Engine for the Allen Institute for Artificial Intelligence: A “tell-all” account of improving Semantic Scholar's academic search engine
- Mix-Camp E-commerce Search
- The Anatomy of a Large-Scale Hypertextual Web Search Engine - Original Google Search Engine Paper
- KDD 2020: Hands-on Tutorials: Deep Learning for Search and Recommender Systems in Practice
- Haystack 2024 - My Favorite Talks
- The SIGIR 2024 Tutorial: Robust Information Retrieval