CA2

CA2 Write-up                                                            

To select poetry lines that are more relevant to the SMS input received, and to achieve greater coherence in the final poem, the system needs a better understanding of that input. To provide this, an approach based on predicate argument structure analysis, which represents each sentence as a predicate in the style of first-order logic, was adopted.

 

In such an approach, the verb or adjective in a sentence is its central element. It describes the action or state that the sentence is about, and so it forms the predicate symbol. The people or things in the sentence that are associated with this action or state form the arguments to the predicate. Consider the sentence “John buys a book.” This sentence can be expressed as the predicate buys(John, book). As this predicate has two arguments (“John” and “book”), it is said to have arity 2. The sentence “John runs” is represented by the predicate runs(John), which has arity 1.
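The representation described above can be sketched in a few lines of Python. This is only an illustration: the `Predicate` class and its field names are assumptions made for this example and are not taken from the BlogWall code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Predicate:
    """A predicate symbol together with its ordered arguments (hypothetical type)."""
    symbol: str
    args: Tuple[str, ...]

    @property
    def arity(self) -> int:
        # The arity is the number of arguments the predicate takes.
        return len(self.args)

    def __str__(self) -> str:
        return f"{self.symbol}({', '.join(self.args)})"


# "John buys a book." and "John runs" from the text above.
buys = Predicate("buys", ("John", "book"))
runs = Predicate("runs", ("John",))

print(buys, "has arity", buys.arity)
print(runs, "has arity", runs.arity)
```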

 

Predicate argument structure analysis is appropriate for the BlogWall project because the system needs to interpret SMS input, which generally comprises one or more short sentences. By performing this analysis on every poetry line in the system, a table containing the predicate representation of each poetry line can be obtained. After the same analysis has been performed on an incoming SMS, the database can be searched for poetry lines that have the same predicate symbol (i.e., the same verb or adjective) and arity, as well as the same subject(s) and/or object(s) if possible. If no such line is found, the search criteria can be relaxed incrementally: first to lines that have the same predicate symbol and arity but different subject(s) and/or object(s), then to lines that only have the same predicate symbol. When identifying lines that have the same predicate symbol, the synonyms retrieved earlier can also be applied.
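The incremental relaxation of the search criteria could look roughly like the sketch below. It is not the project's matching algorithm, which is still to be developed. The `LinePredicate` record, the `find_lines` function and the level names are hypothetical, and "same subject(s) and/or object(s)" is interpreted here as sharing at least one argument, which is an assumption.

```python
from typing import Dict, List, NamedTuple, Optional, Sequence, Tuple


class LinePredicate(NamedTuple):
    """One row of a hypothetical predicate table."""
    line_id: int
    symbol: str
    args: Tuple[str, ...]


def find_lines(
    query_symbol: str,
    query_args: Sequence[str],
    table: Sequence[LinePredicate],
    synonyms: Optional[Dict[str, List[str]]] = None,
) -> Tuple[str, List[int]]:
    """Return (match level, line ids), trying the strictest criteria first."""
    # Synonyms of the query verb/adjective count as the same predicate symbol.
    symbols = {query_symbol} | set((synonyms or {}).get(query_symbol, ()))
    wanted = set(query_args)

    same_symbol = [p for p in table if p.symbol in symbols]
    same_arity = [p for p in same_symbol if len(p.args) == len(query_args)]
    # Assumption: one shared subject or object is enough at the strictest level.
    shared_args = [p for p in same_arity if wanted & set(p.args)]

    for level, hits in (
        ("symbol+arity+arguments", shared_args),
        ("symbol+arity", same_arity),
        ("symbol", same_symbol),
    ):
        if hits:
            return level, [p.line_id for p in hits]
    return "none", []


# Made-up table contents, for illustration only.
table = [
    LinePredicate(1, "loves", ("mary", "john")),
    LinePredicate(2, "loves", ("sea", "shore")),
    LinePredicate(3, "adores", ("sun",)),
    LinePredicate(4, "runs", ("river",)),
]
synonyms = {"loves": ["adores"]}

print(find_lines("loves", ("mary", "sam"), table, synonyms))
print(find_lines("loves", ("tom", "sam"), table, synonyms))
print(find_lines("loves", ("a", "b", "c"), table, synonyms))
```

Computing the three candidate lists up front keeps the relaxation order explicit: each level is a superset of the one before it, so the first non-empty list is the best match available.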

 

By applying predicate argument structure analysis both to the existing database of poetry lines and to the SMS input, and then retrieving the best predicate-argument match, the system should be better able to generate poetry lines that are relevant in meaning to the SMS input received.

 

Algorithm

The algorithm used to perform this analysis is based on the syntax described in the book “Artificial Intelligence: Structures and Strategies for Complex Problem Solving” by George F. Luger. The following syntax rules and parse tree are taken from pp. 625-626 of the book.

 

           Figure 1: Syntax Rules          

   

Figure 2: Parse tree for sentence "The man bites the dog"


The parse tree in figure 2 is obtained from the syntax rules shown in figure 1. The BlogWall system adopts a similar approach: after part-of-speech (POS) tagging has been performed on a given input sentence, the sentence is broken down into its constituent noun and verb phrases. The algorithm includes any adjectives and determiners that precede a noun in that noun phrase. The corresponding flowchart is shown in figure 3 below.
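As a rough illustration of the noun-phrase step, the sketch below groups determiners and adjectives with the noun that follows them. It is not a transcription of the flowchart in figure 3. It assumes the input has already been tagged with Penn Treebank style tags (DT, JJ, NN, ...), and the function name `noun_phrases` is hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumption: Penn Treebank style tags, as produced by many POS taggers.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
MODIFIER_TAGS = {"DT", "JJ", "JJR", "JJS"}


def noun_phrases(tagged: Sequence[Tuple[str, str]]) -> List[List[str]]:
    """Collect runs of determiners/adjectives followed by one or more nouns."""
    phrases: List[List[str]] = []
    current: List[str] = []  # words of the phrase being built
    has_noun = False         # whether that phrase contains a noun yet

    for word, tag in tagged:
        if tag in NOUN_TAGS:
            current.append(word)
            has_noun = True
        elif tag in MODIFIER_TAGS and not has_noun:
            current.append(word)
        else:
            # Any other token closes the phrase in progress.
            if has_noun:
                phrases.append(current)
            # A modifier seen after a noun opens the next phrase.
            current = [word] if tag in MODIFIER_TAGS else []
            has_noun = False

    if has_noun:
        phrases.append(current)
    return phrases


bites = [("the", "DT"), ("man", "NN"), ("bites", "VBZ"),
         ("the", "DT"), ("dog", "NN")]
print(noun_phrases(bites))
```

Modifiers that are never followed by a noun are discarded, since they do not form a noun phrase on their own.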

 
Figure 3: Flow chart of algorithm used to identify noun phrases  

The system can accommodate sentences with multiple subjects and objects. However, at this stage, it can only handle sentences that take the form “[subject(s)]… [verb/adjective]… [object(s)]”. The following are images from test runs.

 

Figure 4: Test run with sentence "skinny mary pushed the fat man" (one subject, one object)

 

Figure 5: Test run with sentence "skinny john and david fought big fat goliath" (two subjects, one object)

 
Figure 6: Test run with sentence "skinny john and david fought big fat goliath and strong sam" (two subjects, two objects)
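The subject/verb/object split exercised in these test runs can be approximated as follows. This is a simplified sketch rather than the system's code: it assumes pre-tagged input with Penn Treebank style tags, takes every noun before the first verb as a subject and every noun after it as an object, and counts subjects plus objects as the arity. The function name `to_predicate` and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Assumption: Penn Treebank style tags.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def to_predicate(tagged: Sequence[Tuple[str, str]]) -> Optional[Dict[str, object]]:
    """Split a '[subject(s)] ... [verb] ... [object(s)]' sentence around its first verb."""
    verb_index = next(
        (i for i, (_, tag) in enumerate(tagged) if tag.startswith("VB")), None
    )
    if verb_index is None:
        return None  # no verb found, so no predicate can be built

    def nouns(span: Sequence[Tuple[str, str]]) -> List[str]:
        return [word for word, tag in span if tag in NOUN_TAGS]

    subjects = nouns(tagged[:verb_index])
    objects = nouns(tagged[verb_index + 1:])
    return {
        "symbol": tagged[verb_index][0],
        "subjects": subjects,
        "objects": objects,
        # Assumption: every subject and object counts as one argument.
        "arity": len(subjects) + len(objects),
    }


sentence = [
    ("skinny", "JJ"), ("john", "NNP"), ("and", "CC"), ("david", "NNP"),
    ("fought", "VBD"),
    ("big", "JJ"), ("fat", "JJ"), ("goliath", "NNP"), ("and", "CC"),
    ("strong", "JJ"), ("sam", "NNP"),
]
print(to_predicate(sentence))
```

Because each noun is treated as a separate argument, a compound noun such as "ice cream" would be counted twice; grouping nouns into phrases first would avoid that.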
 

Predicates in the Poetry Database

A function has been written to perform a similar predicate analysis on every poem line in the database and to store the resulting predicate symbol, arity and poem line ID in a newly created table. This is to facilitate future comparisons when selecting poetry lines. The GUI and a screenshot of the table contents are shown in figures 7 and 8.

 
Figure 7: GUI for performing predicate analysis on poems in database
 
Figure 8: Screenshot of Predicates table
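A minimal sketch of how such a table could be populated is given below, using SQLite from the Python standard library. The table name `predicates`, its column names, and the `store_predicates` and `toy_analyse` functions are assumptions made for this example; the actual schema is the one shown in figure 8, and `toy_analyse` is only a stand-in for the real predicate analysis.

```python
import sqlite3
from typing import Callable, Iterable, Optional, Tuple

Analysis = Optional[Tuple[str, Tuple[str, ...]]]


def store_predicates(
    conn: sqlite3.Connection,
    poem_lines: Iterable[Tuple[int, str]],
    analyse: Callable[[str], Analysis],
) -> int:
    """Analyse each poem line and store (line id, symbol, arity); return rows stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS predicates ("
        "line_id INTEGER NOT NULL, symbol TEXT NOT NULL, arity INTEGER NOT NULL)"
    )
    rows = []
    for line_id, text in poem_lines:
        result = analyse(text)
        if result is None:
            continue  # no predicate could be extracted from this line
        symbol, args = result
        rows.append((line_id, symbol, len(args)))
    conn.executemany(
        "INSERT INTO predicates (line_id, symbol, arity) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


def toy_analyse(text: str) -> Analysis:
    """Stand-in analysis: treats the second word as the verb. Illustration only."""
    words = text.lower().split()
    if len(words) < 2:
        return None
    return words[1], tuple(words[:1] + words[2:])


conn = sqlite3.connect(":memory:")
lines = [(1, "Mary loves John"), (2, "Rivers run"), (3, "Silence")]
stored = store_predicates(conn, lines, toy_analyse)
print(stored, "rows stored")
print(conn.execute("SELECT line_id, symbol, arity FROM predicates").fetchall())
```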
 

Future Improvements

The system is still being developed to accommodate multiple verbs/adjectives. At present, it can only extract one verb (if any) from any input sentence. It is still unable to do the same for an adjective in the sentence. This will be improved upon in the immediate future.

 

Furthermore, given an input sentence that contains multiple verbs/adjectives, the system should be able to represent the sentence using multiple predicates. Similarly, every poetry line in the database should be able to have multiple entries in the predicate table, corresponding to the verbs and adjectives in the line. The system should also be able to handle sentences that take other forms, such as “[subject(s)]… [object(s)]… [verb(s)/adjective(s)]”. An example of such a sentence is “Jack and John fought”.

 

A comparison/matching algorithm needs to be developed that searches for the best predicate match between the input and the lines in the database. It should be possible to relax the matching conditions incrementally.

 

Finally, as a failsafe, a procedure needs to be developed so that, if the algorithm cannot find any poetry line whose predicates match those of the SMS input, the system can still revert to the original method of poetry creation through keyword matching.
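One way to structure this failsafe is to treat the two selection methods as interchangeable search functions and try them in order. The sketch below is only an assumption about how that could be arranged: `select_lines`, the two toy search functions and the sample poem lines are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Search = Callable[[str], List[int]]


def select_lines(
    sms: str, predicate_search: Search, keyword_search: Search
) -> Tuple[str, List[int]]:
    """Try predicate matching first and fall back to keyword matching."""
    hits = predicate_search(sms)
    if hits:
        return "predicate", hits
    return "keyword", keyword_search(sms)


# Toy stand-ins for the two search methods, for illustration only.
POEM_LINES: Dict[int, str] = {
    1: "the sea loves the shore",
    2: "rain falls on quiet streets",
}


def toy_predicate_search(sms: str) -> List[int]:
    # Pretend the predicate table only knows the verb "loves".
    return [1] if "loves" in sms.lower().split() else []


def toy_keyword_search(sms: str) -> List[int]:
    words = set(sms.lower().split())
    return [i for i, line in POEM_LINES.items() if words & set(line.split())]


print(select_lines("mary loves john", toy_predicate_search, toy_keyword_search))
print(select_lines("rain again", toy_predicate_search, toy_keyword_search))
```

Returning the name of the method alongside the line IDs makes it easy to log how often the fallback is actually needed.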
