Abstract: A number of deep neural networks have been proposed to improve the performance of document ranking in information retrieval studies. However, training these models usually requires a large amount of labeled data, so data shortage has become a major hindrance to improving the performance of neural ranking models. Recently, several weakly supervised methods have been proposed to address this challenge, using heuristics or users’ interactions with Search Engine Result Pages (SERPs) to generate weak relevance labels. In this work, we adopt two kinds of weakly supervised relevance, BM25-based relevance and click model-based relevance, and investigate in depth how they differ in the training of neural ranking models. Experimental results show that BM25-based relevance helps models capture more exact matching signals, while click model-based relevance enhances the rankings of documents that may be preferred by users. We further propose a cascade ranking framework to combine the two kinds of weakly supervised relevance, which significantly improves the ranking performance of neural ranking models and outperforms the best result in the last NTCIR-13 We Want Web (WWW) task. This work reveals the potential of constructing better document retrieval systems based on multiple kinds of weak relevance signals.
Keywords: document ranking, ad hoc retrieval, neural ranking model, weak supervision.