[083]Evaluation of TR system
Text Retrieval and Search Engines (3) - Evaluation
3.1 Evaluation of TR system
3.2 Basic measures
3.3 Evaluating a Ranked List
3.4 Multi-Level Judgments
3.5 Practical issues
3.1 Evaluation of TR system
The evaluation method used here is the Cranfield Evaluation Methodology.
The idea is to build reusable test collections & define measures.
3.2 Basic measures
Under this methodology, there are two basic measures:
Precision: of the documents retrieved, how many actually match the query
Recall: of all the relevant documents in the collection, how many were retrieved
Then, combine Precision and Recall: F-Measure // the goal is to fold the two basic measures into a single number
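As a minimal sketch (the doc IDs and helper name are illustrative, not from the lecture), the two basic measures and the F-measure (here F1, the harmonic mean) can be computed like this:

```python
def precision_recall_f1(retrieved, relevant):
    """Compute precision, recall, and F1 for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)               # relevant docs we actually returned
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0   # harmonic mean of p and r
    return p, r, f1

# Hypothetical doc IDs: the system returned 4 docs, 3 of them relevant;
# the collection holds 6 relevant docs in total.
p, r, f1 = precision_recall_f1(["d1", "d2", "d3", "d4"],
                               ["d1", "d2", "d3", "d5", "d6", "d7"])
print(p, r, f1)  # 0.75, 0.5, 0.6
```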
3.3 Evaluating a Ranked List
With the two basic measures just introduced, Precision & Recall, we can plot a curve: the X-axis is recall and the Y-axis is precision.
Reading the plot: each recall level has a corresponding precision; in general, the higher the precision at a given recall, the better the system.
Next, Mean Average Precision (MAP):
MAP = arithmetic mean of average precision over a set of queries
gMAP = geometric mean of average precision over a set of queries
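A short sketch of these definitions (the ranked lists and doc IDs below are made-up examples): average precision (AP) averages the precision at each rank where a relevant doc appears, then MAP and gMAP aggregate AP over queries.

```python
import math

def average_precision(ranked, relevant):
    """AP: sum of precision@k at every rank k holding a relevant doc,
    divided by the total number of relevant docs."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k          # precision at this cut-off
    return total / len(relevant) if relevant else 0.0

def map_and_gmap(aps):
    """Arithmetic (MAP) and geometric (gMAP) means of per-query APs.
    gMAP assumes all APs > 0; it rewards improving the hardest queries."""
    n = len(aps)
    return sum(aps) / n, math.exp(sum(math.log(a) for a in aps) / n)

# Two hypothetical queries:
ap1 = average_precision(["d1", "d9", "d2"], ["d1", "d2"])  # (1/1 + 2/3) / 2
ap2 = average_precision(["d8", "d3"], ["d3"])              # (1/2) / 1
m, g = map_and_gmap([ap1, ap2])
print(m, g)
```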
3.4 Multi-Level Judgments
問題:What If We Have Multi-level Relevance Judgments?
Meaning: when ranked documents carry graded (multi-level) relevance scores rather than binary ones, instead of plain "cumulative gain" we use "Discounted Cumulative Gain" (DCG).
From DCG we can derive Normalized Discounted Cumulative Gain (nDCG).
3.5 Practical issues
Statistical Significance Tests:
How sure can you be that an observed difference doesn’t simply result from the particular queries you chose?
A higher average does not by itself mean the system is better: the per-query scores behind the average may vary wildly (extreme highs and lows), so a significance test is needed.
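To make this concrete, one common choice is a paired sign test (the per-query scores below are fabricated for illustration): it asks whether system A's per-query wins over system B could plausibly be coin flips.

```python
import math
from statistics import mean, stdev

# Fabricated per-query average-precision scores for two systems on 8 queries.
sys_a = [0.60, 0.55, 0.70, 0.05, 0.95, 0.50, 0.65, 0.40]
sys_b = [0.55, 0.50, 0.60, 0.45, 0.50, 0.55, 0.60, 0.45]

print(mean(sys_a), mean(sys_b))    # A's mean is higher (0.55 vs 0.525)...
print(stdev(sys_a), stdev(sys_b))  # ...but A's scores are far more spread out

# Paired sign test: under H0 (no real difference), each query where the
# systems differ is a fair coin flip, so the count of "A wins" follows
# Binomial(n, 0.5). A large two-sided p-value means the observed
# difference could easily be luck of the particular queries chosen.
diffs = [a - b for a, b in zip(sys_a, sys_b) if a != b]
wins = sum(d > 0 for d in diffs)
n, k = len(diffs), max(wins, len(diffs) - wins)
p_value = min(1.0, sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** (n - 1))
print(wins, n, p_value)  # 5 wins out of 8: p ≈ 0.73, not significant
```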