
Certainty

agshruti12 edited this page Dec 6, 2023 · 2 revisions

1. Feature Name

Certainty

2. Literature Source (Serial Number, link)

Certainty Lexicon Paper

3. Description of how the feature is computed (In Layman’s terms)

The input is a certainty lexicon: a list of words and phrases, each with an associated certainty score.

  1. Sort the lexicon entries from longest to shortest: first by number of words, then by number of characters. This way a multi-word phrase is matched before any shorter entry it contains.
  2. For each lexicon word or phrase found in the message, replace every occurrence with the weight from its "Certainty" column. A repeated word contributes one score per occurrence.
  3. Calculate the average: sum all certainty scores and divide by the number of scores. This is the final "Certainty" value.

Note: If the message has no matches in the certainty lexicon, assign a default certainty score of 4.5. Certainty scores range from 0 (uncertain) to 9 (very certain), so we take 4.5, the midpoint, as a rough representation of a general statement lacking a certainty component.
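
The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the TPM implementation: the function name `certainty_score`, the dictionary-shaped lexicon, the lowercasing, and the whole-word regex matching are all assumptions made for the example, and the lexicon scores shown are invented placeholders rather than values from the paper.

```python
import re

DEFAULT_CERTAINTY = 4.5  # default from the description, used when nothing matches


def certainty_score(message, lexicon):
    """Average the certainty scores of all lexicon matches in `message`.

    `lexicon` is assumed to map a word or phrase to its certainty score (0 to 9).
    """
    # Step 1: longest entries first, by word count and then by character count.
    entries = sorted(lexicon, key=lambda p: (len(p.split()), len(p)), reverse=True)

    text = message.lower()  # assumption: matching is case-insensitive
    scores = []
    for phrase in entries:
        # Assumption: match on word boundaries so "sure" does not hit "measure".
        pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
        # Step 2: count every occurrence, then blank the matches out so that
        # shorter entries cannot match the same text a second time.
        text, count = re.subn(pattern, " ", text)
        scores.extend([lexicon[phrase]] * count)

    # Step 3: average the scores, or fall back to the default if there are none.
    if not scores:
        return DEFAULT_CERTAINTY
    return sum(scores) / len(scores)


# Hypothetical lexicon; these scores are placeholders for illustration only.
example_lexicon = {"definitely": 8.0, "not sure": 1.0, "sure": 7.0, "maybe": 2.0}

print(certainty_score("I am not sure, maybe later", example_lexicon))
print(certainty_score("Hello there", example_lexicon))
```

In the first call, "not sure" is matched as a phrase before the single word "sure" is tried, so only the scores for "not sure" and "maybe" enter the average. The second call has no matches and falls back to the default.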

4. Algorithms used (KNN, Logistic Regression etc.)

N/A

5. ML Inputs/Features

N/A

6. Statistical concepts used

N/A

7. Pages of the literature to be referred to for details

Certainty Lexicon Paper

8. Any tweaks/changes/adaptions made from the original source

Validation technique - confirmed that the TPM certainty outputs match the Lexical Suite certainty outputs. Also added a default score of 4.5 for "neutral" statements, i.e. those with zero matches in the certainty lexicon.
