E.24 Positivity (and Positivity z‐score)


1. Feature Name

E.24 Positivity

2. Literature Source (Serial Number, link)

E.24, https://www.cs.cmu.edu/~ylataus/files/TausczikPennebaker2013.pdf

3. Description of how the feature is computed (In Layman’s terms)

  1. Use BERT to compute a positivity score for each message.
  2. Compute the z-score of each BERT positivity score.
  3. Determine a threshold z-score (e.g., the mean, i.e., the center of the distribution).
  4. Any sentence with a z-score above the threshold is considered positive (see the sketch after this list).
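A minimal sketch of these steps, assuming the Hugging Face `transformers` sentiment-analysis pipeline as the BERT-based scorer; the exact model used in the toolkit may differ.

```python
# Sketch only: assumes the default Hugging Face sentiment pipeline
# (a BERT-family model); the toolkit's actual model may differ.
from transformers import pipeline
import numpy as np

sentiment = pipeline("sentiment-analysis")

def positivity_scores(messages):
    """Step 1: probability of the POSITIVE label for each message."""
    results = sentiment(messages)
    return np.array([
        r["score"] if r["label"] == "POSITIVE" else 1.0 - r["score"]
        for r in results
    ])

def positivity_z_scores(scores):
    """Step 2: standardize the positivity scores."""
    return (scores - scores.mean()) / scores.std()

messages = ["Great job on the report!", "This is fine.", "I hate this plan."]
z = positivity_z_scores(positivity_scores(messages))

# Steps 3-4: with the mean as the threshold, z > 0 marks a message as positive.
is_positive = z > 0
```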

4. Algorithms used (KNN, Logistic Regression etc.)

None

5. ML Inputs/Features

None

6. Statistical concepts used

Z-score

7. Pages of the literature to be referred to for details

Refer to point 6.

8. Any tweaks/changes/adaptions made from the original source

Paper E.24 and its relevant citations do not contain exact details about how, or from where, to obtain positive words. We would ideally have to label words as positive ourselves and then implement the code.

Implementation Notes

A previous version of this feature used the LIWC lexicons for positivity, also by Pennebaker.
List of stop words: the English stopwords in NLTK (original source).
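For reference, a rough sketch of that earlier lexicon-count approach. The LIWC positivity lexicon is proprietary, so `POSITIVE_WORDS` below is a hypothetical stand-in; only the NLTK English stopword list comes from the note above.

```python
# Sketch of the earlier lexicon-based approach. POSITIVE_WORDS is a
# hypothetical placeholder for the (proprietary) LIWC positivity lexicon.
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords", quiet=True)
STOP_WORDS = set(stopwords.words("english"))

POSITIVE_WORDS = {"good", "great", "happy", "love"}  # placeholder lexicon

def lexical_positivity(message):
    """Fraction of non-stopword tokens that appear in the positivity lexicon."""
    tokens = [t for t in message.lower().split() if t not in STOP_WORDS]
    if not tokens:
        return 0.0
    return sum(t in POSITIVE_WORDS for t in tokens) / len(tokens)
```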

Due to the greater reliability of measuring positive sentiment with BERT, we shifted from a lexicon-based approach to a BERT-based approach, and we use the BERT positivity scores to compute the z-scores. However, we retain the positivity lexicon measurements in our LIWC-based features (included as separate columns).
