Week 11

The algorithm I wrote last week did not work. After writing out the code, I tested it using the sentence "she killed the fat man". It returned the ENTIRE sentence for both the verb phrase and the noun phrase. I realised that the first word, "she", was tagged as "/NN" and so were "fat" and "man". As such, the entire sentence was determined to be the noun phrase. 

I thought through the algorithm again and decided on a different approach:

1) Iterate through the entire sentence and save the location of the first instance of a noun ("/NN", "/PRP", etc). This is the start of the (first) noun phrase.
2) If there was a noun identified in the previous step:
    2(a) Check whether the word before this first noun is an adjective. If it is, treat the adjective as the start of the noun phrase. Only perform this check if the noun is NOT the first word in the sentence (otherwise the code would read posTags[-1], which is an out-of-bounds memory access).
    2(b) Iterate through the rest of the sentence, starting from the word following that noun, and check whether each word is also a noun. At the first word that is not, the position of the previous word (the last verified noun) is the end of the noun phrase.
3) Continue iterating through the sentence to identify the next noun phrase (if any), until the end of the sentence is reached.

NOTE: Between the start and end of a noun phrase, I have included determiners ("/DT" tag, e.g. "the") as well as adjectives (only "/JJ"), because they do form part of the noun phrase.

This approach will cater for multiple noun phrases in a sentence as well.
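
The steps above can be sketched roughly as follows. This is a simplified, standalone illustration and not the actual code in PredicateAnalysis(): the TaggedWord struct and the names isNounTag, isModifierTag and extractNounPhrases are made up for this sketch, tags are assumed to be stored without the leading "/", and the backward scan over both "/JJ" and "/DT" is my reading of how the note about determiners combines with step 2(a).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container for one tagged word, e.g. {"man", "NN"}.
struct TaggedWord {
    std::string word;
    std::string tag;  // tag without the leading '/', e.g. "NN", "PRP", "JJ"
};

// Assumption: anything starting with "NN" (NN, NNS, NNP, ...) or exactly "PRP"
// counts as a noun for the purpose of starting a noun phrase.
static bool isNounTag(const std::string& tag) {
    return tag.compare(0, 2, "NN") == 0 || tag == "PRP";
}

// Determiners and plain adjectives that may precede the noun.
static bool isModifierTag(const std::string& tag) {
    return tag == "JJ" || tag == "DT";
}

std::vector<std::string> extractNounPhrases(const std::vector<TaggedWord>& s) {
    std::vector<std::string> phrases;
    std::size_t i = 0;
    std::size_t firstFree = 0;  // first index not used by an earlier phrase

    while (i < s.size()) {
        // Step 1: skip ahead to the next noun.
        if (!isNounTag(s[i].tag)) {
            ++i;
            continue;
        }

        // Step 2(a): extend the start backwards over adjectives/determiners.
        // The "start > firstFree" guard prevents reading index -1.
        std::size_t start = i;
        while (start > firstFree && isModifierTag(s[start - 1].tag)) {
            --start;
        }

        // Step 2(b): extend the end forwards while the next word is a noun.
        std::size_t end = i;
        while (end + 1 < s.size() && isNounTag(s[end + 1].tag)) {
            ++end;
        }

        // Join the words from start to end into one phrase.
        std::string phrase;
        for (std::size_t k = start; k <= end; ++k) {
            if (!phrase.empty()) phrase += ' ';
            phrase += s[k].word;
        }
        phrases.push_back(phrase);

        // Step 3: carry on after this phrase to look for the next one.
        i = end + 1;
        firstFree = i;
    }
    return phrases;
}
```

Calling extractNounPhrases on a sentence tagged as she/PRP killed/VBD the/DT fat/JJ man/NN would give two phrases, "she" and "the fat man". The tags in that example are written by hand for illustration, so the real tagger may produce different ones.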

I've written out the code and tested it with the following:

Input: she killed the man
Noun Phrase: she
Noun Phrase: the man

Input: she killed the fat man
Noun Phrase: she
Noun Phrase: the fat man

Input: jack and john fought
Noun Phrase: jack
Noun Phrase: john

Input: running is enjoyable
Noun Phrase: running

Input: running is fun
Noun Phrase: running
Noun Phrase: fun

In the last sentence tested, I noticed a mistake: there should only be one noun phrase, "running". Upon inspection, I found that the sentence was tagged as "running/NN is/VBZ fun/NN". The last word, "fun", was tagged as "/NN" instead of "/JJ" (adjective), since it can function as more than one part of speech. This is an inherent problem within the PosTagger itself. As can be seen from the second-to-last sentence ("running is enjoyable") though, if an unambiguous adjective is used instead, the PosTagger tags it correctly and so the noun phrase is identified correctly as well.
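
When a result looks wrong like this, it helps to view the word and tag pairs side by side. Below is a small sketch of a helper for that, assuming the tagger output is available as a space-separated string of word/TAG tokens in the form quoted above. The function name splitTagged is hypothetical and is not part of the PosTagger or of my project code.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Splits a string such as "running/NN is/VBZ fun/NN" into (word, tag) pairs.
// A token without any '/' is kept as a word with an empty tag.
std::vector<std::pair<std::string, std::string>> splitTagged(const std::string& tagged) {
    std::vector<std::pair<std::string, std::string>> out;
    std::istringstream in(tagged);
    std::string token;
    while (in >> token) {
        // Use the LAST slash so that words containing '/' still split correctly.
        const std::string::size_type slash = token.rfind('/');
        if (slash == std::string::npos) {
            out.emplace_back(token, "");
        } else {
            out.emplace_back(token.substr(0, slash), token.substr(slash + 1));
        }
    }
    return out;
}
```

Printing the pairs for each test sentence makes it quick to tell whether an unexpected noun phrase comes from the extraction logic or from the tag itself, as it did with "fun".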

Other test sentences used:

Input: jack and strong john fought
Noun Phrase: jack
Noun Phrase: strong john

Input: big jack and strong john fought
Noun Phrase: big jack
Noun Phrase: strong john

Input: she is prettier than the flower
Noun Phrase: she
Noun Phrase: the flower

Input: mary had a little lamb
Noun Phrase: mary 
Noun Phrase: a little lamb

In summary, this approach to identifying noun phrases seems to work. I will think of more sentences to test the algorithm with. The next step will be to use a similar approach to identify the verb phrase, then obtain the predicate.

Note: At this stage, the bulk of the code I've been writing and testing is in DataProcessor.cpp, under the PredicateAnalysis() function (plus some minor additions in MagicClasses.h and other files). Inside this function, I've also commented out plenty of code that I've written, such as last week's attempt. I am keeping this code there in case it proves useful again in subsequent stages. It also serves as a record of the code I've written thus far.