Week 12

Received feedback from Ken about my algorithm, which included some suggestions on how to improve it.

There was a Skype meeting with Prof Cheok and Makino on Thursday (5-6pm), which I couldn't attend because I had a tutorial at that time. I met Ken afterwards (about 6.30pm) and found out that LTA had requested some changes to the visualisations. Prof Cheok also indicated that there is no need to incorporate the changes to the system made by my FYP just yet.

Incorporated some of Ken's suggestions into my algorithm. The code is now able to include multiple adjectives preceding a noun as part of the same noun phrase. This is based on the assumption that multiple adjectives preceding a noun will be in the form of "(adjective), (adjective), (adjective), (noun)" (i.e., separated by commas), or "(adjective) and (adjective) (noun)" (i.e. separated by connectives such as "and"). The modified algorithm is as follows:

1) Iterate through the entire sentence and save the location of the first instance of a noun ("/NN", "/PRP", etc). This is the start of the (first) noun phrase.
2) If there was a noun identified in the previous step:
    2(a) While the iterator is NOT at the first word (to prevent reading posTags[-1], which is illegal) AND the preceding word is either an adjective ("/JJ"), a connective ("/CC") or a comma,
        2(a)(1) if the preceding word is a connective or comma AND the word preceding that is an adjective, set the start of the noun phrase to two words before the iterator (i.e., include the comma (or connective) and the adjective before it as the start of the noun phrase).
        2(a)(2) if the preceding word is an adjective, set it as the start of the noun phrase
        2(a)(3) ELSE, this means the preceding word is either a connective or a comma, but the word preceding that is NOT an adjective, so BREAK.
    2(b) Iterate through the rest of the sentence, starting from the word following the noun identified in step 1, for as long as each word is also a noun. The scan stops at the first word that is not a noun; the position of the previous word (which is the last verified noun) is the end of the noun phrase.
3) Continue iterating through the sentence to identify the next noun phrase (if any), until the end of the sentence is reached.
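
The steps above can be sketched roughly as follows. This is only an illustration under some assumptions, not my actual code: the sentence is represented as a vector of POS tags without the leading "/", and the names isNoun, isAdjective, isJoiner, Span and findNounPhrases are made up for this sketch. It follows the listed steps literally, so determiners such as "the" or "a" are not pulled into the phrase here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Assumption: Penn Treebank style tags, stored without the leading "/".
bool isNoun(const std::string& tag) {
    return tag.compare(0, 2, "NN") == 0 || tag == "PRP";
}
bool isAdjective(const std::string& tag) { return tag.compare(0, 2, "JJ") == 0; }
bool isJoiner(const std::string& tag) { return tag == "CC" || tag == ","; }

// A noun phrase as inclusive (start, end) word positions.
using Span = std::pair<std::size_t, std::size_t>;

std::vector<Span> findNounPhrases(const std::vector<std::string>& tags) {
    std::vector<Span> phrases;
    std::size_t i = 0;
    while (i < tags.size()) {
        // Step 1: find the next noun.
        if (!isNoun(tags[i])) { ++i; continue; }

        // Step 2(a): extend the start backwards over adjectives,
        // and over "joiner + adjective" pairs.
        std::size_t start = i;
        while (start > 0) {  // never read tags[-1]
            const std::string& prev = tags[start - 1];
            if (isJoiner(prev) && start >= 2 && isAdjective(tags[start - 2])) {
                start -= 2;  // 2(a)(1): take the joiner and the adjective before it
            } else if (isAdjective(prev)) {
                start -= 1;  // 2(a)(2): take the adjective
            } else {
                break;       // 2(a)(3): nothing more to include
            }
        }

        // Step 2(b): extend the end forwards while the next word is a noun.
        std::size_t end = i;
        while (end + 1 < tags.size() && isNoun(tags[end + 1])) ++end;

        assert(start <= i && i <= end);
        phrases.emplace_back(start, end);

        // Step 3: carry on after this phrase.
        i = end + 1;
    }
    return phrases;
}
```

Feeding it the tags NNP , JJ CC PRP VBD DT JJ NN . (i.e. "Jack, Mary and I watched a nice movie." with "Mary" tagged as an adjective) gives three spans rather than four, because "Mary and" gets absorbed into the phrase that ends at "I".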

This has been tested with the following sentences and works fine:

'jack and john fought'
'jack and strong john fought'
'big jack and strong john fought'
'she is prettier than the flower'
'mary had a little lamb'
'she killed the man'
'she killed the fat man'
'running is enjoyable'
'running is fun'
'big jack and pretty, strong john fought'
'big jack and pretty and handsome and beautiful and strong john fought'
'She killed a fat man and a tall woman.'

The following sentence, however, gave a problem:

'Jack, Mary and I watched a nice movie.'

Instead of returning four separate noun phrases "Jack", "Mary", "I" and "a nice movie", it returned only three - "Jack", "Mary and I" and "a nice movie".
Upon inspection, I realised that the algorithm had not failed. It was yet another POS-tagging issue: "Mary" was tagged as an adjective ("/JJ"). When I rephrased the sentence as "Jack and Mary and I watched a nice movie" (even though, grammatically, the repeated "and"s leave something to be desired), the code returned the four separate noun phrases correctly.

Now, I am looking at the verb phrase. Looking at page 625 of "Artificial Intelligence - Structures and Strategies for Complex Problem Solving", I recalled that a verb phrase can consist either of the verb alone, or of a verb AND a noun phrase. In the latter case (i.e., a verb followed by a noun phrase), my algorithm has already identified the noun phrase and separated it from the verb preceding it. As such, I postulate that I may be able to extract the verb directly from the sentence and form the predicate from it and the noun phrases generated.
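
A minimal sketch of what "directly extracting the verb" could look like, assuming the same tag representation as the POS tagger output (tags without the leading "/") and that every verb tag starts with "VB". The function name findVerbs is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the word positions of all tags that start with "VB"
// (assumed to cover VB, VBD, VBZ and so on).
std::vector<std::size_t> findVerbs(const std::vector<std::string>& tags) {
    std::vector<std::size_t> verbs;
    for (std::size_t i = 0; i < tags.size(); ++i) {
        if (tags[i].compare(0, 2, "VB") == 0) {
            verbs.push_back(i);
        }
    }
    return verbs;
}
```

For "she killed the fat man", tagged PRP VBD DT JJ NN, this would return position 1 only.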

Previously, I had been outputting the noun phrases to the screen using the Noun_Phrase.pop_front() function. Since pop_front() removes each element as it is read, the previously saved positions of the noun phrases are lost. As such, I am now trying to use iterators to access these values instead, so that I can match them with the verb phrase to construct the predicate. I have run into some problems with iterators, in particular with comparing an iterator against, say, Noun_Phrase.end(). I believe this is a minor problem. Once I overcome it, I will be able to match entire noun phrases with the verbs.
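
For reference, a small sketch of reading the saved positions with an iterator instead of pop_front(). It assumes the container is a std::list of (start, end) pairs, which may not match the real type of Noun_Phrase; Span and copySpans are names invented for this example. Two things worth keeping in mind: the iterator itself (not the value it points to) is what gets compared with end(), and std::list iterators only support == and !=, not <.

```cpp
#include <cstddef>
#include <list>
#include <utility>
#include <vector>

// Assumed shape of one saved noun phrase: inclusive (start, end) positions.
using Span = std::pair<std::size_t, std::size_t>;

// Reads every span without removing anything from the list.
std::vector<Span> copySpans(const std::list<Span>& nounPhrases) {
    std::vector<Span> out;
    // Compare the iterator with end() using !=, then dereference it with *it.
    for (std::list<Span>::const_iterator it = nounPhrases.begin();
         it != nounPhrases.end(); ++it) {
        out.push_back(*it);
    }
    return out;
}
```

After the loop the list still holds all of its elements, so the positions stay available for matching against the verb.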

I plan to start with simple cases, where the noun phrases before the verb's position are the subjects and those after it are the objects. Then I can consider cases where both the subject and the object come before the verb, such as "big jack and small john fought".
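
The simple case could be sketched as below. This is a rough plan rather than working project code: it assumes the noun phrases are stored as (start, end) positions in a std::list and that the verb's position is already known. Span, VerbArguments and splitAroundVerb are hypothetical names.

```cpp
#include <cstddef>
#include <list>
#include <utility>
#include <vector>

using Span = std::pair<std::size_t, std::size_t>;

// Noun phrases grouped by which side of the verb they fall on.
struct VerbArguments {
    std::vector<Span> subjects;  // phrases that end before the verb
    std::vector<Span> objects;   // phrases that start after the verb
};

VerbArguments splitAroundVerb(const std::list<Span>& nounPhrases,
                              std::size_t verbPos) {
    VerbArguments args;
    for (std::list<Span>::const_iterator it = nounPhrases.begin();
         it != nounPhrases.end(); ++it) {
        if (it->second < verbPos) {
            args.subjects.push_back(*it);
        } else if (it->first > verbPos) {
            args.objects.push_back(*it);
        }
    }
    return args;
}
```

With this rule, "big jack and small john fought" ends up with two subjects and no object, which is exactly the kind of sentence the second stage would have to treat differently.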