[084]Probabilistic Model

2018-06-28 (Thu)

Text Retrieval and Search Engines (4)

  • 4.1 Probabilistic Retrieval Model

  • 4.2 Statistical Language Model

  • 4.3 Probabilistic Retrieval Model: Query Likelihood

  • 4.4 Probabilistic Retrieval Model: Smoothing

4.1 Probabilistic Retrieval Model

  • There are three common kinds of probabilistic models

    • Classic probabilistic model -> BM25

    • Language model -> Query Likelihood

    • Divergence-from-randomness model -> PL2

  • Query Likelihood Retrieval Model:

    • Assumption: a user formulates a query based on an “imaginary relevant document”

4.2 Statistical Language Model

  • Definition: a probability distribution over word sequences

  • Simplified version: the simplest language model is the unigram LM, in which each word is generated independently of the others

  • Purpose: represent a topic, or analyze word associations
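
As a minimal sketch of the unigram LM described above: the model is estimated by maximum likelihood from token counts, and the probability of a word sequence is the product of the individual word probabilities. The names `unigram_lm`, `sequence_probability`, and the toy text are hypothetical and chosen only for illustration.

```python
from collections import Counter


def unigram_lm(tokens):
    """Maximum-likelihood unigram model: p(w) = count(w) / total token count."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def sequence_probability(model, words):
    """Probability of a word sequence under the independence assumption."""
    prob = 1.0
    for w in words:
        # Words the model has never seen get probability 0 in this sketch.
        prob *= model.get(w, 0.0)
    return prob


# Hypothetical toy text, used only for illustration.
toy_text = "text mining text retrieval search engine text".split()
lm = unigram_lm(toy_text)

# Word order does not matter for a unigram model.
p_forward = sequence_probability(lm, ["text", "mining"])
p_reversed = sequence_probability(lm, ["mining", "text"])
```

Because the model ignores word order, `p_forward` and `p_reversed` are the same value. The high-probability words of such a model (here `text`) are what characterize its topic.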

4.3 Probabilistic Retrieval Model: Query Likelihood

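  • Problem with the unigram query likelihood: if any query word does not occur in the doc, the whole query likelihood becomes 0

  • Remedy: Improved Model: Sampling Words from a Doc Model (query words are drawn from a document language model rather than from the document text itself)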
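
The zero-probability problem can be reproduced in a few lines. This is a sketch assuming a non-empty document and an unsmoothed maximum-likelihood estimate p(w|d) = c(w,d) / |d|; the function name `query_likelihood_mle` and the toy document are hypothetical.

```python
from collections import Counter


def query_likelihood_mle(query_tokens, doc_tokens):
    """Unsmoothed unigram query likelihood: product of c(w, d) / |d| over query words."""
    counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)  # assumed to be > 0 in this sketch
    likelihood = 1.0
    for w in query_tokens:
        # Counter returns 0 for words that are absent from the document.
        likelihood *= counts[w] / doc_len
    return likelihood


# Hypothetical toy document, used only for illustration.
doc = "news about presidential campaign news".split()

full_match = query_likelihood_mle(["presidential", "campaign"], doc)
one_missing = query_likelihood_mle(["presidential", "campaign", "update"], doc)
```

Here `full_match` is positive, while `one_missing` collapses to exactly 0 because `update` never occurs in the document, even though the other two query words match.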

4.4 Probabilistic Retrieval Model: Smoothing

  • p(w|d) > 0 even if c(w, d)=0

  • Purpose: make the curve of f(q,d) smooth, without step-like discontinuities

Two smoothing methods

  • Jelinek-Mercer: Fixed coefficient linear interpolation

  • Dirichlet Prior: Adding pseudo counts; adaptive interpolation
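
A minimal sketch of the two smoothing methods, assuming the usual textbook forms: Jelinek-Mercer mixes the document estimate with the collection model p(w|C) using a fixed weight, and Dirichlet prior adds mu * p(w|C) pseudo counts to each word. The function names, the toy collection, and the default values `lam=0.1` and `mu=2000.0` are illustrative assumptions, not values taken from these notes.

```python
from collections import Counter


def p_jm(word, doc_counts, doc_len, p_coll, lam=0.1):
    """Jelinek-Mercer: (1 - lam) * c(w, d) / |d| + lam * p(w | C)."""
    p_ml = doc_counts[word] / doc_len
    return (1 - lam) * p_ml + lam * p_coll.get(word, 0.0)


def p_dirichlet(word, doc_counts, doc_len, p_coll, mu=2000.0):
    """Dirichlet prior: (c(w, d) + mu * p(w | C)) / (|d| + mu)."""
    return (doc_counts[word] + mu * p_coll.get(word, 0.0)) / (doc_len + mu)


# Hypothetical toy collection, used only for illustration.
docs = ["text mining text".split(),
        "search engine retrieval text ranking".split()]
coll_counts = Counter(w for d in docs for w in d)
coll_len = sum(coll_counts.values())
p_coll = {w: c / coll_len for w, c in coll_counts.items()}

doc_counts = Counter(docs[0])
doc_len = len(docs[0])

# "search" does not occur in docs[0], yet both estimates stay above zero.
jm_unseen = p_jm("search", doc_counts, doc_len, p_coll)
dir_unseen = p_dirichlet("search", doc_counts, doc_len, p_coll)

# Dirichlet equals JM with a document-dependent weight mu / (|d| + mu).
adaptive_lam = 2000.0 / (doc_len + 2000.0)
jm_adaptive = p_jm("text", doc_counts, doc_len, p_coll, lam=adaptive_lam)
dir_text = p_dirichlet("text", doc_counts, doc_len, p_coll)
```

The last three lines illustrate why Dirichlet prior is called adaptive interpolation: the effective interpolation weight shrinks as the document gets longer, so long documents are smoothed less than short ones.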

Four assumptions

  • Assumption 1: Relevance(q,d) = p(R=1|q,d) ≈ p(q|d,R=1) ≈ p(q|d)

  • Assumption 2: Query words are generated independently

  • Assumption 3: Smoothing with p(w|C)

  • Assumption 4: JM or Dirichlet prior smoothing
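
Put together, the four assumptions lead to a concrete scoring function. The block below is a sketch in standard notation; the symbols c(w,q) (count of w in the query), |d| (document length), λ, and μ are assumptions introduced here and do not appear in the notes above.

```latex
% Assumptions 1 and 2: rank by the log query likelihood, with independent query words
f(q,d) = \log p(q \mid d) = \sum_{w \in q} c(w,q) \, \log p(w \mid d)

% Assumptions 3 and 4: p(w|d) is smoothed with the collection model p(w|C)
% Jelinek-Mercer, fixed coefficient \lambda \in (0,1):
p(w \mid d) = (1 - \lambda) \, \frac{c(w,d)}{|d|} + \lambda \, p(w \mid C)

% Dirichlet prior, pseudo-count parameter \mu > 0:
p(w \mid d) = \frac{c(w,d) + \mu \, p(w \mid C)}{|d| + \mu}
```

In both cases p(w|d) > 0 whenever p(w|C) > 0, so a query word missing from the document no longer forces the score to negative infinity.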
