[083]Evaluation of TR system

Text Retrieval and Search Engines(3)-Evaluation

  • 3.1 Evaluation of TR system

  • 3.2 Basic measures

  • 3.3 Evaluating a Ranked List

  • 3.4 Multi-Level Judgments

  • 3.5 Practical issues

3.1 Evaluation of TR system

  • The evaluation approach used here is the Cranfield Evaluation Methodology.

The idea is to build reusable test collections and define measures on them.

3.2 Basic measures

  • Under this methodology, there are two basic measures:

    • Precision: of the retrieved documents, what fraction is relevant to the query

    • Recall: of all relevant documents, what fraction was retrieved

  • We can also combine Precision and Recall: the F-Measure // the goal is to merge the two basic measures into a single score
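The three measures above can be sketched as follows. This is a minimal illustration, assuming `retrieved` and `relevant` are sets of document IDs (the IDs below are hypothetical data):

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved)

def recall(retrieved, relevant):
    """Fraction of all relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant)

def f_measure(p, r, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta=1)."""
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)

# Hypothetical example: 4 docs retrieved, 3 docs truly relevant
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d3", "d5"}
p = precision(retrieved, relevant)   # 2/4 = 0.5
r = recall(retrieved, relevant)      # 2/3
print(p, r, f_measure(p, r))
```

The harmonic mean (rather than arithmetic) means a system cannot score well by maximizing one measure while ignoring the other.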

3.3 Evaluating a Ranked List

  • With the two basic measures just introduced, Precision and Recall, we can plot a precision-recall curve: recall on the x-axis, precision on the y-axis.

  • Reading the curve: each recall level has a corresponding precision; in general, the higher the precision at a given recall, the better the performance.

  • Next, Mean Average Precision (MAP) = arithmetic mean of average precision over a set of queries

  • gMAP = geometric mean of average precision over a set of queries
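A minimal sketch of average precision and its two aggregations, assuming `ranked` is a ranked list of document IDs and `relevant` is the set of relevant IDs (hypothetical data); the small `eps` in `gmap` is an assumption to guard against zero AP values:

```python
import math

def average_precision(ranked, relevant):
    """Mean of the precisions at each rank where a relevant doc appears,
    divided by the total number of relevant docs."""
    hits, precisions = 0, []
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_ap(ap_values):
    """MAP: arithmetic mean of per-query average precision."""
    return sum(ap_values) / len(ap_values)

def gmap(ap_values, eps=1e-6):
    """gMAP: geometric mean, which rewards improving the hardest queries."""
    return math.exp(sum(math.log(a + eps) for a in ap_values) / len(ap_values))
```

The geometric mean is dominated by the lowest-scoring queries, so gMAP emphasizes improvements on difficult queries where MAP would barely move.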

3.4 Multi-Level Judgments

  • Question: what if we have multi-level relevance judgments?

  • That is, when documents at different ranks carry graded relevance scores, we use "Discounted Cumulative Gain" (DCG) rather than plain "cumulative gain", so that gains at lower ranks count for less.

  • From DCG we can derive Normalized Discounted Cumulative Gain (nDCG).
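The two measures can be sketched as below, assuming `gains` is the list of graded relevance scores in ranked order (one common DCG variant, with a log2 discount; other discount formulas exist):

```python
import math

def dcg(gains):
    """Discounted cumulative gain: gain at rank i divided by log2(i + 1),
    so lower-ranked documents contribute less."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

def ndcg(gains):
    """Normalize by the ideal DCG (same gains sorted in descending order),
    giving a score in [0, 1] that is comparable across queries."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

# A perfect ranking scores 1.0; pushing the relevant doc down lowers the score.
print(ndcg([3, 0, 0]), ndcg([0, 0, 3]))
```

Normalization is what makes scores comparable across queries: raw DCG depends on how many relevant documents a query has, while nDCG always peaks at 1.0 for the ideal ranking.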

3.5 Practical issues

  • Statistical Significance Tests:

    • How sure can you be that an observed difference doesn’t simply result from the particular queries you chose?

  • A higher average does not necessarily mean a system is better: the per-query scores behind the average may vary widely, with large extremes in both directions.
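One common way to answer that question is a paired randomization (permutation) test over per-query scores. A minimal sketch, assuming `scores_a` and `scores_b` hold each system's per-query scores for the same queries (hypothetical data; the function name is mine):

```python
import random

def paired_randomization_test(scores_a, scores_b, trials=10000, seed=0):
    """Estimate how often a difference as large as the observed one would
    arise by chance, by randomly flipping the sign of each query's
    score difference (i.e. randomly swapping the two systems per query)."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    count = 0
    for _ in range(trials):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) >= observed:
            count += 1
    return count / trials  # approximate two-sided p-value
```

A small p-value suggests the observed difference is unlikely to stem from the particular queries chosen; a large one means the difference could easily be noise from high per-query variance.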
