[084]Probabilistic Model
2018-06-28 (Thu)
Text Retrieval and Search Engines (4)
4.1 Probabilistic Retrieval Model
4.2 Statistical Language Model
4.3 Probabilistic Retrieval Model: Query Likelihood
4.4 Probabilistic Retrieval Model: Smoothing
4.1 Probabilistic Retrieval Model
Three common families of probabilistic models:
Classic probabilistic model -> BM25
Language model -> Query Likelihood
Divergence-from-randomness model -> PL2
Query Likelihood Retrieval Model:
Assumption: a user formulates a query based on an “imaginary relevant document”
4.2 Statistical Language Model
Definition: a probability distribution over word sequences
Simplest case: the unigram LM, which generates each word independently
Uses: representing topics, or analyzing word associations
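To make the unigram LM concrete, here is a minimal sketch in Python. It assumes a maximum-likelihood estimate (relative frequencies) and whitespace tokenization; the function names `unigram_lm` and `sequence_prob` and the sample sentence are made up for illustration.

```python
from collections import Counter
import math

def unigram_lm(tokens):
    """Maximum-likelihood unigram model: p(w) = c(w) / total token count."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def sequence_prob(model, words):
    """p(w1 ... wn) = product of p(wi), since words are generated independently."""
    return math.prod(model.get(w, 0.0) for w in words)

# Hypothetical sample text (7 tokens).
text = "text mining is about mining text data".split()
lm = unigram_lm(text)

p_forward = sequence_prob(lm, ["text", "mining"])
p_reverse = sequence_prob(lm, ["mining", "text"])
```

Because each word is generated independently, word order does not matter: `p_forward` and `p_reverse` are equal. This is why a unigram LM works as a topic representation but cannot capture phrase structure.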
4.3 Probabilistic Retrieval Model: Query Likelihood
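Problem with unigram query likelihood: if any query word does not occur in the document, the whole query likelihood becomes 0.
Fix (improved model): sample the query words from a document language model instead of from the document itself.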
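The zero-probability problem is easy to reproduce. The sketch below scores a query with unsmoothed relative frequencies; the function name `query_likelihood_mle` and the toy document are assumptions made for illustration.

```python
from collections import Counter

def query_likelihood_mle(query, doc):
    """p(q|d) = product over query words of c(w, d) / |d|, with no smoothing."""
    counts = Counter(doc)
    doc_len = len(doc)
    p = 1.0
    for w in query:
        p *= counts[w] / doc_len  # Counter returns 0 for unseen words
    return p

# Hypothetical toy document (6 tokens).
doc = "the president gave a campaign speech".split()

p_match = query_likelihood_mle(["campaign", "speech"], doc)
p_miss = query_likelihood_mle(["presidential", "campaign"], doc)
```

Here `p_match` is positive, but `p_miss` collapses to 0 because "presidential" never occurs in the document, even though the document is clearly on topic. A single unseen query word wipes out the evidence from all the others.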
4.4 Probabilistic Retrieval Model: Smoothing
p(w|d) > 0 even if c(w, d)=0
Goal: smooth the curve of f(q,d) so that it has no step-like discontinuities
Two smoothing methods
Jelinek-Mercer: Fixed coefficient linear interpolation
Dirichlet Prior: Adding pseudo counts; adaptive interpolation
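The two methods can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names, the toy collection, and the values `lam=0.1` and `mu=10` are assumptions chosen for readability. Both functions assume the word occurs somewhere in the collection.

```python
from collections import Counter

def jelinek_mercer(w, doc_counts, doc_len, p_coll, lam=0.1):
    """Fixed-coefficient interpolation: (1 - lam) * c(w,d)/|d| + lam * p(w|C)."""
    return (1 - lam) * doc_counts[w] / doc_len + lam * p_coll[w]

def dirichlet_prior(w, doc_counts, doc_len, p_coll, mu=10):
    """Pseudo counts: (c(w,d) + mu * p(w|C)) / (|d| + mu)."""
    return (doc_counts[w] + mu * p_coll[w]) / (doc_len + mu)

# Hypothetical two-document collection.
docs = ["text mining and text retrieval".split(),
        "search engines rank documents".split()]

# Collection language model p(w|C), estimated by relative frequency.
coll_counts = Counter(w for d in docs for w in d)
coll_len = sum(coll_counts.values())
p_coll = {w: c / coll_len for w, c in coll_counts.items()}

d_counts = Counter(docs[0])
d_len = len(docs[0])

# "search" never occurs in docs[0], yet both estimates are positive.
p_jm = jelinek_mercer("search", d_counts, d_len, p_coll)
p_dir = dirichlet_prior("search", d_counts, d_len, p_coll)
```

Dirichlet prior smoothing is "adaptive" in the sense that it equals Jelinek-Mercer with a document-dependent coefficient lam = mu / (|d| + mu): longer documents rely less on the collection model, shorter ones more.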
Four assumptions
Assumption 1: Relevance(q,d) = p(R=1|q,d) ≈ p(q|d,R=1) ≈ p(q|d)
Assumption 2: Query words are generated independently
Assumption 3: Smoothing with p(w|C)
Assumption 4: JM or Dirichlet prior smoothing
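Taken together, the four assumptions give a concrete scoring function. The block below writes it out in the usual textbook form; the symbols V (vocabulary), c(w,q) (count of w in the query), |d| (document length), λ, and μ are notation assumed here and do not appear in the notes above.

```latex
\begin{align*}
f(q,d) &= \log p(q \mid d) = \sum_{w \in V} c(w,q)\,\log p(w \mid d)
  && \text{(Assumptions 1 and 2)} \\
p_{\mathrm{JM}}(w \mid d) &= (1-\lambda)\,\frac{c(w,d)}{|d|} + \lambda\,p(w \mid C),
  \quad \lambda \in [0,1]
  && \text{(Assumptions 3 and 4)} \\
p_{\mathrm{Dir}}(w \mid d) &= \frac{c(w,d) + \mu\,p(w \mid C)}{|d| + \mu},
  \quad \mu > 0
  && \text{(Assumptions 3 and 4)}
\end{align*}
```

For a word that occurs in the collection, either smoothed estimate is strictly positive when λ > 0 (respectively μ > 0), so the logarithm in f(q,d) is always defined.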