PGSA: Pretty good sentiment analysis

For dreams and other first person narrative texts.

Select a text document or enter text

Reload the file's text

Do this after you run a test to get back the original text.

Save text as file

Click to score the text by sentence

This is the core function of this software.

Click to score the text by paragraph

Most of the following functions are for testing and debugging the valence model.

Test pre-scored lines

Use /1 for positive valence, /-1 for negative and /0 for neutral.
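
For example (placing the marker at the end of each line is my assumption):

  i love this place /1
  i hate this place /-1
  i walked to the store /0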

Scores a random sample of 20 lines

Histogram shows valence statistics

Regular expression search

Removes scores and tags

Clear the text

Loads some test cases

Show the tag table

Tags

There are also a number of "hidden" tags used by the context and tag rules to identify pronouns, auxiliary verbs, determiners, etc.

[valence score]

Debug: shows scored sentences

Generate scored lines

Temporarily tag a word: red(--)

Settings

The hamburger menu opens the settings.

Negated strong negative = neutral/weak positive

If set to neutral, "They decided not to kill me" scores as 0. If set to weak positive, it scores as slightly positive. "Not bad" scores as slightly positive with either setting.

Binary score

A score less than 0 becomes -1; a score greater than 0 becomes +1.
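
As code (my reading; a score of exactly 0 presumably stays 0):

  const binaryScore = (score) => score < 0 ? -1 : score > 0 ? 1 : 0;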

Score without modifiers

With this setting on, "might be a little angry" scores -2; with modifiers applied, it scores -0.67.
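
The modifier weights aren't documented; this sketch uses invented weights chosen only to reproduce the example above:

  // Hypothetical weights: -2 * 0.5 * (2/3) = -0.67, matching the example.
  const MOD_WEIGHTS = { might: 0.5, "a little": 2 / 3 };

  function applyModifiers(base, mods) {
    return mods.reduce((s, m) => s * (MOD_WEIGHTS[m] ?? 1), base);
  }

  console.log(applyModifiers(-2, ["might", "a little"])); // -0.667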

Exceptions list

If off:
"It was like a car" scores positive.
"I took a right turn" scores positive.
"That smells good" scores negative.
"Oh no, not again" scores neutral.
(And many more)

How it works

  1. The text is split into sentences.
  2. Each sentence is split into words.
  3. The words are tagged according to a large dictionary of word:tag pairs.
  4. Context rules are applied to a list of ambiguous words: "like", "kind", "well", "mind", "pretty", "great", "fun", etc.
  5. Tagging rules are applied to the resulting tag strings, producing one or more "tag tuples" for each sentence.
  6. These rules decide the scope of tags which modify the valence of other tags. Most importantly: negation.
  7. The tag tuples are scored and summed to give a score for each sentence (sketched below).
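
A minimal sketch of the whole pipeline, assuming toy tag names and scores; the real lexicon and rule sets are far larger, and every name here is illustrative:

  // Toy pipeline covering steps 1-3 and 7; the context and tag rules of
  // steps 4-6 are elided here (see the sketches further below).
  const LEXICON = { love: "++", nice: "+", kill: "--", spider: "-" };
  const SCORES = { "--": -2, "-": -1, "+": 1, "++": 2 };

  function scoreText(text) {
    return text.split(/(?<=[.?!:])\s+/).map(sentence => {   // 1. sentences
      const words = sentence.toLowerCase().match(/[a-z0-9_\-']+/g) || []; // 2. words
      const tags = words.map(w => LEXICON[w] || "_");        // 3. tag from lexicon
      return tags.reduce((sum, t) => sum + (SCORES[t] || 0), 0); // 7. score and sum
    });
  }

  console.log(scoreText("I love spiders. That was nice!"));
  // [2, 1] -- "spiders" misses the entry "spider", since no stemming is used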

Preparing the text

  1. Try to identify titles and names. Glom titles together with underscores ("Red_Badge_of_Courage") so "courage" won't match positive valence.
  2. Names get an underscore appended; they are later tagged as "PRN".
  3. sentences = "ends with [.?!:]"
  4. But first pre-process the text to remove periods after abbreviations: Mr., Ave., Dr., etc.
  5. And in numbers 12.7, etc.
  6. Make everything lower case. The lexicon is lowercase.
  7. Remove curly quotes and other non-ASCII characters.
  8. Also do some manipulations so that smiley faces are not lost. :)
  9. Change forms like "isnt" to "is n't".
  10. Change "can't" to "can n't".
  11. words = [a-z0-9_-']+
  12. No stemming is used. (The pass is sketched below.)
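
A rough sketch of this preparation pass, with simplified regexes and a truncated abbreviation list; none of the names below come from the actual source:

  // Toy preparation pass. Order matters: abbreviations and smileys are
  // handled before lowercasing and period removal strip the distinctions.
  function prepareText(text) {
    return text
      .replace(/[\u201C\u201D]/g, '"')           // curly double quotes
      .replace(/[\u2018\u2019]/g, "'")           // curly single quotes
      .replace(/:\)/g, " smiley_pos ")           // keep smiley faces as tokens
      .replace(/\b(Mr|Mrs|Dr|St|Ave)\./g, "$1")  // periods after abbreviations
      .replace(/(\d)\.(\d)/g, "$1$2")            // periods inside numbers
      .toLowerCase()                             // the lexicon is lowercase
      .replace(/\bisnt\b/g, "is n't")            // "isnt" -> "is n't"
      .replace(/\bcan't\b/g, "can n't");         // "can't" -> "can n't"
  }

  console.log(prepareText("Mr. Smith can't be 12.7 feet tall! :)"));
  // prints roughly: mr smith can n't be 127 feet tall! smiley_pos

  const sentences = prepareText("He left. She stayed!").split(/(?<=[.?!:])\s+/); // step 3
  const words = sentences[0].match(/[a-z0-9_\-']+/g) || [];                      // step 11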

The lexicon

A list of word:tag pairs. Tag every word with its default tag, or (_) if it has no tag. Supplement the lexicon for hyphenated forms that behave like valenced forms, and collapse elongated forms to their base form: "looooove" === "love". (A lookup sketch follows the examples below.)
(--) sucker-like
(+) nice-looking
(-) spider-like
(+) flawlessly-some
(-) strange-looking
(-) over-protective
(-) sinister-looking
(-) sword-fighting
(-) molester-man
(-) alone-like
(--) witch-like
(-) dream-nightmare
(-) mis-spelled
(--) vampire-like
(-) dead-looking
(-) escape-like
(-) hideous-looking
(+) streamlined-looking
(-) formidable-looking
(+) gaily-colored

(-)  owww
(-)  nooooooooooo
(-)  aaughh
(-)  aaaugh
(+)  yummm
(+)  yummmm
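
A plausible lookup with these fallbacks, assuming elongated forms collapse to a base form and hyphenated forms use the supplemental entries above (all names and the collapse rule are assumptions, not the documented algorithm):

  // Toy lookup. LEXICON and HYPHENATED stand in for the real tables.
  const LEXICON = { love: "++", nice: "+" };
  const HYPHENATED = { "witch-like": "--", "nice-looking": "+" };

  function lookup(word) {
    if (word in LEXICON) return LEXICON[word];
    if (word.includes("-") && word in HYPHENATED) return HYPHENATED[word];
    // Collapse runs of 3+ of the same letter: "looooove" -> "love".
    const collapsed = word.replace(/([a-z])\1{2,}/g, "$1");
    if (collapsed !== word && collapsed in LEXICON) return LEXICON[collapsed];
    return "_"; // no tag
  }

  console.log(lookup("looooove"));   // "++"
  console.log(lookup("witch-like")); // "--"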

Apply context rules

Something like the Brill part-of-speech tagger.
  1. Parallel words and tags arrays.
  2. Running window of plus or minus 2 words, though we can look back or ahead through the whole sentence to see if certain tags or words occur.
  3. Checks a list of ~400 exception words to see if their default tag should be changed.
  4. Makes use of lists of words or tags that precede or follow a word when it is or isn't valenced.
Rule for "like": "like" is initially tagged "POS" (positive)
  case "like":
    if (prevword in aux || pprevword in aux || prevword in not_before_like) {
      et = "___";
    } else
    if (pprevword in comes) {
      et = "___";
    } else
    if ((prevword === "rather" || prevword === "quite") && !(prevwords.slice(-4).contains(Object.keys(pron)))) {
      et = "___";
    } else
    if (!(prevword === "to" || prevword in person_mods || prevword in comparitors || prevtag in NOT_INC_INW || prevword in does) || pprevword in aux) {
      et = "___"; // I did like the cake. fails
    }
    if (prevtag === "MOD" || prevword === "really" || (prevword === "you" && !(pprevword === "of")) || prevword === "who")
      et = "POS";
    if ((prevtag === "CNJ" || prevtag === "CNA") && (nextword in and_like) && !(nnextword in aux || nnexttag === "MOD"))
      et = "POS";
    // ... and I like grabbed on to a ..., ... and I was like going to turn the ...
    if ((pprevtag === "CNJ"  || pprevtag === "CNA") && prevword in def_pron)
      et = "___";
  break;
Outcome is an array of tags.

Process the tags

Tag rules
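
The tag rules themselves aren't listed here, but based on steps 5-6 above and the negation setting, a rule for negation scope might look roughly like this (tag names, tuple shape, and scores are all assumptions):

  // Illustrative only: give each negation ("NOT") scope over the next
  // valenced tag, then score the resulting tuples.
  const SCORES = { "--": -2, "-": -1, "+": 1, "++": 2 };

  function toTuples(tags) {
    const tuples = [];
    for (let i = 0; i < tags.length; i++) {
      if (tags[i] === "NOT") {
        const j = tags.slice(i + 1).findIndex(t => t in SCORES);
        if (j >= 0) { tuples.push(["NOT", tags[i + 1 + j]]); i += j + 1; continue; }
      }
      if (tags[i] in SCORES) tuples.push([tags[i]]);
    }
    return tuples;
  }

  function scoreTuple(tuple) {
    if (tuple[0] !== "NOT") return SCORES[tuple[0]];
    const s = SCORES[tuple[1]];
    if (s <= -2) return 0;   // negated strong negative: neutral setting
    return s < 0 ? 0.5 : -s; // "not bad" -> slightly positive
  }

  // "did not kill" -> [["NOT","--"]] -> 0; "not bad" -> [["NOT","-"]] -> 0.5
  console.log(toTuples(["_", "NOT", "--"]).map(scoreTuple)); // [0]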

Some results

Ran the histogram search on a set of dreams from a Vietnam veteran with PTSD (source = DreamBank). The first 98 dreams are dated from 1970 to 2005; the final 32 dreams are from 2015. Results (below) show increasingly positive scores from top to bottom.
Running score per 4 paragraphs:
                               0
          |                    | (-20.58)
             |                 |
        |                      |
                      |        |
                           |   |
            |                  | (-18.58)
                        |      |
           |                   |
                   |           |
            |                  |
                    |          | (-10.63)
                      |        |
                       |       |
                |              |
                       |       |
                      |        | (-8.88)
                      |        |
                     |         |
                          |    |
                       |       |
               |               | (-15.89)
                      |        |
                  |            |
                     |         |
                          |    |
                         |     | (-5.6)
                         |     |
                             | |
                               |
                               |
                              || (-0.33)
Another test: the Dreamboard nightmares (source = KB) show 53% of sentences classified as negative.
Sentence scores
positive: 161  (7%)
negative: 1196 (53%)
neutral: 885
Paragraph scores
positive: 13  (3%)
negative: 450 (95%)
neutral: 11
Compare to a baseline set of dreams showing 36% negative.
Sentence scores
positive: 5985  (14%)
negative: 15350 (36%)
neutral: 21859
Paragraph scores
positive: 961  (9%)
negative: 3322 (31%)
neutral: 6584