2. building a database of music reviews ; index ; 4. discussing the results
After a considerable amount of deliberation and experimentation, I settled on the following ten statistics.
| name of statistic | description of this statistic |
|---|---|
| tot | Total number of occurrences of a particular word in the database |
| pos | Number of occurrences of the word in positive articles |
| neu | Number of occurrences of the word in neutral articles |
| neg | Number of occurrences of the word in negative articles |
| pwr | Positive Word Ratio (pos / tot) |
| nwr | Negative Word Ratio (neg / tot) |
| avg | Average rating of articles containing the word |
| art | Number of articles containing the word |
| aut | Number of authors who have used the word |
| wordscore | A metric that combines tot and avg into one number; very useful when comparing words. |
For the pos, neu, and neg counts, I sort the articles into three groups by rating:
score 0 to 49 - a negative review (873 articles fall in this category)
score 50 to 74 - a neutral review (2362 articles)
score 75 to 100 - a positive review (2339 articles)
(To make things simpler in the database, I multiply all ratings by 10 to get a simple 100-point scale with no decimal points.)
The script that generates these statistics is make_statistics.pl
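As a rough illustration of how most of these statistics can be counted, here is a minimal sketch in Python (the original scripts are Perl, and this is not their code). The input layout, a list of dicts with `author`, `rating`, and `words` keys, and the function names are assumptions made for the example. The wordscore is left out because its formula is not given here.

```python
from collections import defaultdict

def rating_class(score):
    """Map a 0-100 rating to the three categories described above."""
    if score >= 75:
        return "pos"
    if score >= 50:
        return "neu"
    return "neg"

def make_statistics(articles):
    """Compute per-word statistics.

    articles: iterable of dicts with the hypothetical keys
    'author', 'rating' (0-100) and 'words' (list of tokens).
    """
    counts = defaultdict(lambda: {"tot": 0, "pos": 0, "neu": 0, "neg": 0})
    ratings = defaultdict(list)  # word -> ratings of the articles containing it
    authors = defaultdict(set)   # word -> authors who have used it

    for article in articles:
        cls = rating_class(article["rating"])
        # Occurrence counts: every appearance of the word is counted.
        for word in article["words"]:
            counts[word]["tot"] += 1
            counts[word][cls] += 1
        # Article-level figures: each article is counted once per word.
        for word in set(article["words"]):
            ratings[word].append(article["rating"])
            authors[word].add(article["author"])

    stats = {}
    for word, c in counts.items():
        stats[word] = dict(
            c,
            pwr=c["pos"] / c["tot"],
            nwr=c["neg"] / c["tot"],
            avg=sum(ratings[word]) / len(ratings[word]),
            art=len(ratings[word]),
            aut=len(authors[word]),
        )
    return stats
```

Note that tot, pos, neu, and neg count occurrences, while avg, art, and aut count each article only once, which matches the descriptions in the table.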
To view the words along with their statistics, I made this cgi web
frontend: words.pl
For each word, it prints a line that includes some of the statistics
associated with that word, and also provides links to view details for that word, view a concordance (KWIC), find other words that appear in the same sentence (coex), and find other words that appear in the same phrase (close). A line of output from this script looks like this:
Go to the words cgi to see the results of this script on the live database. The top line of the page is a form where you can select criteria for viewing and sorting the words; you can sort words by positivity (pwr), negativity (nwr), or total number of occurrences (tot).
Also, I made another cgi to view detailed statistics for a given word:
word.pl
You can see this cgi by clicking one of the word links on the right side of the words cgi above.
Here are some examples of the output of this cgi:
details for primal ;
violin ;
clouds
Understanding word usage
To be able to tell how words are being used by the reviewers, I needed
to see the word in context. Many words can have multiple meanings
(“clouds”, “saw”), or could refer to a number
of different instruments (“tenor”, “strings”).
A list of the sentences that a particular word appears in is called a
concordance for that word, and concordances are sometimes
referred to as KWICs (Keyword In Context).
Here's an example of a few concordance lines for the word “tenor”.
The format is:
Artist: Title of album [Record label; rating nn/100] Author, Date of publication
Here is the sentence where word appears. (paragraph number)
The surprises are packed tightly into every song: guitars crash in only to anticipate silence, pianos weave through minefields of modular percussion, and Knopf's pleasing tenor runs a gauntlet of fading and processing to deliver what are fundamentally very basic melodies. (p.4)
Despite it all, two discs' worth of jangly pop and McCaughan's exuberant tenor can defeat even the most steadfast listener (it really is a lot to take in at once), and as mentioned, not all of these tracks bear out the same level of quality. (p.6)
Atop this mesmeric mist hovers a soft male tenor with a female partner wading in and out of accompaniment. (p.2)
The cgi that I wrote to generate concordances on the fly is:
concordance.pl
You can view some concordances using these links:
concordances for clouds ;
tenor ;
accordion
I also added the very useful ability to generate a double concordance;
passing it two different words will return only the sentences where
both of those words appear. (This feature isn't exactly perfect, since
it tends to misreport the total occurrences at the top, but the actual
concordance is always correct.)
Try doing a double concordance: guitar and clouds
Classifying the words
I spent a long time using my words interface to try and get a sense of what the Pitchfork critics were saying. Since my goal was to create lists of words that I could use to write some music, I knew I'd need to start classifying words into various categories at some point.
I ended up storing the names of the word categories in a table called
“class”, and words get tied to those categories via a
joining table called “word_class”. To help classify words,
I made a cgi interface called
edit_class.pl so I
didn't have to go through and edit the entries in the word_class table
by hand.
I couldn't go through and classify all 100,000 words, so I came up with
a method for deciding which words to classify:
To classify a word, I usually read some of the concordance for that
word to figure out its meaning and context in the reviews before adding
it to a particular class. I ended up classifying 369 positive words,
139 negative words, and 254 frequently-appearing words. I created
classes as I needed them, ending up with the following categories in
the database:
| name of class | description of this class | # of words in this class |
|---|---|---|
| none | this word corresponds to no class in particular (too vague or common) | 347 |
| good | positive aesthetic value judgment words | 41 |
| bad | negative aesthetic value judgment words | 51 |
| genre | references to musical genres | 30 |
| mood | the music's mood or the music's effect on the listener's mood | 47 |
| transition | descriptions of the transitions in the music | 12 |
| dynamics | descriptions of musical dynamics | 9 |
| instrument | instruments and words describing the way that certain instruments are being played | 56 |
| vocal | words referencing the vocals or vocal style | 17 |
| structure | words describing compositional structure | 34 |
| sound | words directly describing the sound of the recording or of the instruments | 15 |
| metaphor | metaphors that don't fit into other categories | 13 |
| complex | words which fall into multiple categories depending on context or are noteworthy but hard to categorize | 67 |
| consumerism | words which reference buying, advertising, and business | 9 |
| intelligence | words describing the intelligence of the artists or listeners | 12 |
To view the classes and perform operations on the words in a given
class, I made a cgi called
class.pl.
You can view the classes yourself if
you like. By default, this cgi shows you all of the classified words at
once, but you can use the dropdown menu at the top of the page to view
only positive words, only negative words, or only frequently-used
words.
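The “class” and “word_class” tables mentioned earlier form a standard many-to-many layout. Here is a minimal sketch of that layout using Python's built-in sqlite3 module as a stand-in for the real database. Only the two table names come from this page; the `word` table, all column names, and the helper functions are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE word  (id INTEGER PRIMARY KEY, word TEXT UNIQUE);
CREATE TABLE class (id INTEGER PRIMARY KEY, name TEXT UNIQUE, description TEXT);
-- Joining table: one row per (word, class) pair, so a word can sit in several classes.
CREATE TABLE word_class (
    word_id  INTEGER REFERENCES word(id),
    class_id INTEGER REFERENCES class(id),
    PRIMARY KEY (word_id, class_id)
);
""")

def classify(conn, word, class_name):
    """Tie a word to a class, creating either row if it does not exist yet."""
    conn.execute("INSERT OR IGNORE INTO word (word) VALUES (?)", (word,))
    conn.execute("INSERT OR IGNORE INTO class (name) VALUES (?)", (class_name,))
    conn.execute(
        """INSERT OR IGNORE INTO word_class (word_id, class_id)
           SELECT w.id, c.id FROM word w, class c
           WHERE w.word = ? AND c.name = ?""",
        (word, class_name),
    )

def words_in_class(conn, class_name):
    """Return the words tied to one class, in alphabetical order."""
    rows = conn.execute(
        """SELECT w.word FROM word w
           JOIN word_class wc ON wc.word_id = w.id
           JOIN class c ON c.id = wc.class_id
           WHERE c.name = ? ORDER BY w.word""",
        (class_name,),
    )
    return [r[0] for r in rows]

def class_sizes(conn):
    """Return {class name: number of words}, like the last column of the table above."""
    return dict(conn.execute(
        """SELECT c.name, COUNT(wc.word_id)
           FROM class c LEFT JOIN word_class wc ON wc.class_id = c.id
           GROUP BY c.name"""
    ))

classify(conn, "plaintive", "mood")
classify(conn, "tenor", "vocal")
classify(conn, "tenor", "instrument")
classify(conn, "breathy", "vocal")
```

With a joining table, editing a word's classes only ever touches rows in `word_class`, which is the kind of change an interface like edit_class.pl saves you from making by hand.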
Finding important words through word relationships
To complete my search for important words, I wanted to move past single
words and start looking at phrases and whole sentences. Thanks to the
last step, I now have a list of positive and frequently-used instrument
words, but what adjectives are likely to describe those instruments?
The word “guitar” is
found in the database 4885 times, but what other words is it likely
to appear with? And how do the results change if I also look at words
that appear alongside similar words like
“guitars”?
What I really needed was a cgi script that finds all occurrences of a word or group of words and then counts the words that appear in the same sentence or phrase. I called this concept coexistence, and I wrote two different scripts for this purpose:
coexistence.pl takes
a word or group of words, finds all of the sentences in the database
that contain those words, and counts up the other words in those
sentences.
View sample output from coexistence.pl.
close_coex.pl does the
same thing, except it only counts words within a variable-length
"phrase" around the given word. (The form has two fields for this
purpose: one sets the number of words before the given word that the
script should look at, and the other sets the number of words after
it.)
View sample output from close_coex.pl.
I used these coexistence scripts a lot when I was putting together the final word lists. The scripts take a long time to run, so there was a lot of waiting involved.
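The two kinds of counting can be sketched in a few lines of Python. This is an illustration of the idea, not the code of coexistence.pl or close_coex.pl. The tokenizer, the function signatures, and the choice to match a sentence when it contains any one of the given words are all assumptions.

```python
import re
from collections import Counter

def tokenize(sentence):
    """Lowercase a sentence and split it into simple word tokens."""
    return re.findall(r"[a-z0-9'-]+", sentence.lower())

def coexistence(sentences, targets):
    """Count the words that share a sentence with any of the target words."""
    targets = set(targets)
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        if targets & set(tokens):
            counts.update(t for t in tokens if t not in targets)
    return counts

def close_coex(sentences, targets, before, after):
    """Count only the words within `before` words to the left and
    `after` words to the right of each target occurrence."""
    targets = set(targets)
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, token in enumerate(tokens):
            if token in targets:
                window = tokens[max(0, i - before):i] + tokens[i + 1:i + 1 + after]
                counts.update(t for t in window if t not in targets)
    return counts

sentences = [
    "A distorted guitar crashes in.",
    "Acoustic guitars and a soft tenor.",
    "The drums are tinny.",
]
sentence_counts = coexistence(sentences, ["guitar", "guitars"])
phrase_counts = close_coex(sentences, ["guitar", "guitars"], before=1, after=0)
```

Passing both “guitar” and “guitars” pools their neighbours into one count. The sketch does no filtering of common words like “a” or “and”, so in practice the raw counts would still need to be read with that in mind.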
Here is a table of many of the words that I found by looking at word relationships using the above scripts. These words are used both in positive and negative contexts, so I consulted their wordscores to see how each word fared in the database.
| instrument | words likely to appear in phrases with this instrument word |
|---|---|
| guitar | acoustic bass electric solo chords distorted steel lead strumming feedback noise backwards effects repetitive melodies chiming plaintive |
| bass | deep upright groove distorted throbbing thumping melodic synth rumbling heavy fuzz drone bassline |
| drums | pounding brushed steel hand programmed crashing distorted crisp rolling booming tinny |
| vocals | backing his female lead breathy harmony distorted ethereal layered fragile hushed nasal distant spoken plaintive |
| lyrics | "good" words: clever poetic melodies simple witty pretty "bad" words: vague nonsensical meaningless |
| piano | electric chords ballad melody solo simple flourishes plaintive |
| electronics | analog experimental piercing gurgling primitive glitchy abstract burbling lo-fi warm |
| strings | plucked swelling sweeping sampled bowed eerie swirling harmonies |