Week 10

Monday (20th Oct)
From my limited understanding of the PosTagger code, I cannot find (or even recognise) any parse tree of the SMS input sentence being built during the PosTagging process. I have therefore tried to derive the predicate form of the sentence myself, based on what I have read so far about first-order logic and sentence structure (see the previous post on how a sentence is broken down into noun_phrase, verb_phrase, etc.).

My approach is this:
1) Begin by locating/identifying the word that is a verb.
2) Look for the object relating to it (if any).
3) Look for the subject relating to it.
4) Form the predicate.
5) Adopt a similar approach for adjectives (I've worked under the assumption that an adjective predicate always has an arity of 1, except for a comparative adjective, which has an arity of 2).

The rough algorithm of my code (for now):

(tg is the Tagger^ structure passed into the function)

I identified and extracted a verb from the SMS input sentence by comparing each tag in tg->posTags against the verb PosTags (VB, VBD, VBG, VBN, VBP and VBZ), then using the index of the matching tag to look up the word in tg->words. From that position I traversed the sentence forwards to find and extract the first noun, pronoun, etc. For now, I've assumed that the first noun/pronoun that appears AFTER the verb (if any) is the object. I then traversed the sentence backwards from the verb's position to find and extract the first noun, pronoun, etc. For now, I've assumed that the first noun/pronoun that appears BEFORE the verb is the subject. This tells me whether the verb predicate has an arity of 1 or 2, and the predicate is then printed to the console accordingly.
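To make the traversal concrete, here is a minimal sketch of the verb step in Python rather than in the project's own code. It assumes the words and tags arrive as two parallel lists (standing in for tg->words and tg->posTags); the function name verb_predicate, the NOMINAL_TAGS set and the hand-assigned tags in the example are hypothetical and do not come from the PosTagger.

```python
# Verb tags as listed in the entry above.
VERB_TAGS = {"VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}
# Assumption: "noun/pronoun, etc" covers at least these Penn Treebank tags.
NOMINAL_TAGS = {"NN", "NNS", "NNP", "NNPS", "PRP"}


def verb_predicate(words, pos_tags):
    """Return a predicate string for the first verb, or None if there is no verb."""
    verb_idx = next((i for i, t in enumerate(pos_tags) if t in VERB_TAGS), None)
    if verb_idx is None:
        return None

    # Object: first noun/pronoun found walking FORWARDS from the verb.
    obj = next((words[i] for i in range(verb_idx + 1, len(words))
                if pos_tags[i] in NOMINAL_TAGS), None)
    # Subject: first noun/pronoun found walking BACKWARDS from the verb.
    subj = next((words[i] for i in range(verb_idx - 1, -1, -1)
                 if pos_tags[i] in NOMINAL_TAGS), None)

    # The arity is simply the number of arguments that were found.
    args = [a for a in (subj, obj) if a is not None]
    return f"{words[verb_idx]}({', '.join(args)})"


# Tags below are assigned by hand for illustration, not produced by the PosTagger.
print(verb_predicate(["she", "killed", "the", "man"], ["PRP", "VBD", "DT", "NN"]))
```

Because both searches stop at the first match, a sentence with several nouns on one side of the verb will pick only the nearest one, which is the same simplification described above.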

I adopted the same approach to form the adjective predicate, taking into account the special case of comparative adjectives, as mentioned above.
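The adjective case could be sketched along the same lines. This is only an illustration under stated assumptions: the adjective tags JJ, JJR and JJS are the usual Penn Treebank ones, the names adjective_predicate and first_nominal are hypothetical, and the rule for choosing the single argument of a non-comparative adjective (prefer the noun after it, otherwise the one before it) is my own guess, since the entry does not spell it out.

```python
# Assumption: standard Penn Treebank adjective tags; JJR is the comparative form.
ADJ_TAGS = {"JJ", "JJR", "JJS"}
NOMINAL_TAGS = {"NN", "NNS", "NNP", "NNPS", "PRP"}


def first_nominal(words, pos_tags, indices):
    """Return the first noun/pronoun met while visiting the given indices, or None."""
    return next((words[i] for i in indices if pos_tags[i] in NOMINAL_TAGS), None)


def adjective_predicate(words, pos_tags):
    """Return a predicate string for the first adjective, or None if there is none."""
    adj_idx = next((i for i, t in enumerate(pos_tags) if t in ADJ_TAGS), None)
    if adj_idx is None:
        return None

    before = first_nominal(words, pos_tags, range(adj_idx - 1, -1, -1))
    after = first_nominal(words, pos_tags, range(adj_idx + 1, len(words)))

    if pos_tags[adj_idx] == "JJR":
        # Comparative adjective: arity 2, the two things being compared.
        args = [a for a in (before, after) if a is not None]
    else:
        # Assumption: an ordinary adjective describes the noun that follows it
        # ("little lamb"), falling back to the one before it ("she is pretty").
        chosen = after if after is not None else before
        args = [chosen] if chosen is not None else []
    return f"{words[adj_idx]}({', '.join(args)})"


# Tags below are assigned by hand for illustration.
print(adjective_predicate(
    ["she", "is", "prettier", "than", "the", "flower"],
    ["PRP", "VBZ", "JJR", "IN", "DT", "NN"],
))
```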

I've tested this algorithm on simple sentences and it seems to work alright so far.

Input Sentence: she killed the man
Predicate: killed(she, man)

Input Sentence: she is prettier than the flower
Predicate: prettier(she, flower)

Input Sentence: mary had a little lamb
Predicate: had(mary, lamb)

There is also a "parameter index out of range" error. It doesn't seem to be a big problem; I will look at it again.

My next steps are to get the code to work with sentences that contain both a verb and an adjective (as in "mary had a little lamb"), as well as sentences containing multiple verbs and/or multiple adjectives.

Friday (24th Oct)
MP Penny Low visited today, and because of the visit I was unable to compile and test the code that I've written. I did realise, though, that there is an inherent problem with the PosTagging. When I ran my code on the sentence "she killed the fat man", the PosTagger tagged "fat" as a noun ("NN"). I also tried other variations of the sentence, some of which do not even make sense, such as "she killed the flower man", "she killed the runs man" and "she killed the it man". Regardless of the word I put in place of "fat", it was still tagged as a noun. It also occurred to me that the final output predicate representing the sentence should be:

killed (she, fat man),

which is actually

killed (she, fat(man)).

In light of this, I commented out the code I wrote on Monday and went back to first identifying the verb and noun phrases. To identify the noun phrase, I locate the first noun in the sentence, then continue through the rest of the sentence to locate the last noun (if any). The words from the first noun to the last noun form the noun phrase. A similar approach is used to identify the verb phrase.
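The first-noun-to-last-noun rule can be sketched as follows. This is an illustrative Python version and not the project code: the function name noun_phrase_span and the NOUN_TAGS set are hypothetical, and apart from "fat" being tagged NN (as observed above), the tags in the example are assigned by hand.

```python
# Assumption: only these Penn Treebank tags count as nouns here (pronouns excluded).
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def noun_phrase_span(pos_tags):
    """Return (first, last) indices of the nouns in the sentence, or None if there are none."""
    noun_positions = [i for i, t in enumerate(pos_tags) if t in NOUN_TAGS]
    if not noun_positions:
        return None
    # With a single noun, first and last are the same index.
    return noun_positions[0], noun_positions[-1]


words = ["she", "killed", "the", "fat", "man"]
tags = ["PRP", "VBD", "DT", "NN", "NN"]  # "fat" as NN, matching the tagger's behaviour

span = noun_phrase_span(tags)
if span is not None:
    first, last = span
    print(" ".join(words[first:last + 1]))
```

One caveat with this reading of the rule: when nouns sit on both sides of the verb, as in "mary had a little lamb", the span runs from "mary" to "lamb" and swallows the verb as well, so the rule probably needs a boundary condition before it can be relied on.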

I will test this new code next week.