Recess Week

Monday (22nd Sept)
Discussed with Ken our study of NLP and how we should approach input analysis. To better understand users' SMS input, we need to understand the context of each SMS.

We discussed some of the problems and decided to experiment with the system to see how it handles words that can be either a noun or a verb. We sent the SMS "Let me fly a fly" to the system and realised that the Data Processor recognised the word "fly" only once, and the synonyms listed included BOTH nouns and verbs. In view of this, Ken suggested that we might better understand the context of words in the SMS input by using a Treebank Parser.

Will read up on Treebank parsing, and will also study the code (rather than the accompanying paper) of the POSTagger we are using.

(Edit: The POSTagger actually tagged both instances of "fly" correctly, one as a verb and the other as a noun. However, the Data Processor treated them as a single word: only one "fly" was selected, its weightResult was calculated, and the synonyms listed included both verbs and nouns. As a first step, I will try to make the Data Processor recognise which "fly" is selected and shortlist only the corresponding synonyms (i.e. either verbs or nouns, but not both). This may help produce more coherent output.)
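A rough sketch of the fix described in the edit above, assuming the tagger returns (word, Penn Treebank tag) pairs. The idea is to key selections by token position and coarse part of speech instead of by the bare word, so the two "fly"s stay separate and each gets only its matching synonyms. The synonym table, the tags shown for "Let me fly a fly", and the function names are all hypothetical stand-ins, not the Data Processor's actual code.

```python
# Hypothetical synonym table keyed by (word, coarse POS). In the real system the
# synonyms come from the Data Processor's dictionary lookup; this is a stand-in.
SYNONYMS = {
    ("fly", "noun"): ["housefly", "insect", "gnat"],
    ("fly", "verb"): ["soar", "glide", "wing"],
}


def coarse_pos(penn_tag):
    """Map a Penn Treebank tag (e.g. 'NN', 'VBZ') to 'noun', 'verb' or None."""
    if penn_tag.startswith("NN"):
        return "noun"
    if penn_tag.startswith("VB"):
        return "verb"
    return None


def synonyms_for(tagged):
    """Return synonyms per (position, word, POS), so repeated words stay distinct."""
    result = {}
    for i, (word, tag) in enumerate(tagged):
        pos = coarse_pos(tag)
        if pos is None:
            continue  # skip determiners, pronouns, etc.
        key = (word.lower(), pos)
        if key in SYNONYMS:
            # Only synonyms matching this token's POS are shortlisted.
            result[(i, word, pos)] = SYNONYMS[key]
    return result


# Assumed tagger output for "Let me fly a fly" (illustrative tags only).
tagged = [("Let", "VB"), ("me", "PRP"), ("fly", "VB"), ("a", "DT"), ("fly", "NN")]

for key, syns in synonyms_for(tagged).items():
    print(key, syns)
```

Keying by position rather than by surface form is the main design choice here: a dictionary keyed only on "fly" would collapse both tokens into one entry, which matches the behaviour observed in the Data Processor.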

Tuesday (23rd Sept)
Skype meeting with Makino took place as planned at 3 p.m. Received the latest version of the TextDisplay code from him and updated him on LTA's request for a video overlay. Highlighted two problems to him:
1) The font size of the selected poetry (not the SMS input) seems too small.
2) When multiple SMSes are received, the displayed poems sometimes overlap.

Started running the new visualisation on the system to test for memory leaks. The test began at 1719H on 23rd Sept (Tues).

Wednesday (24th Sept)
The visualisation had apparently shut down by the time I came into the lab on 24th Sept (Wed). Nimesha restarted the program at about 1015H. When I left the lab at about 1845H, Task Manager showed that TextDisplay.exe was using about 1,040,000K of memory, but no error message had been displayed yet. Will check on it again.

Dropped the idea of using a Treebank Parser for now. After further discussion with Ken and Vidyarth, decided that I need to extract more information from the limited input (the SMS). The first step is to take each selected keyword from the SMS and extract its full set of definitions from the dictionary, then POS-tag those definitions. This should make further processing possible that better relates the SMS input to the definitions (e.g. correlating the object in the SMS input with the object in a definition).
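A toy sketch of the pipeline just described (look up each keyword's definitions, POS-tag them, then relate them to the SMS). Everything here is an assumption for illustration: the mini-dictionary, the lexicon-based tagger (which stands in for the real POSTagger and tags unknown words as nouns), and the function names are hypothetical. The "correlating the object" step is crudely approximated by counting nouns shared between the SMS and each definition.

```python
# Hypothetical mini-dictionary; the real system reads definitions from its own
# dictionary source.
DICTIONARY = {
    "kite": ["a toy that flies in the wind on a long string",
             "a bird of prey with long wings"],
    "wind": ["air that moves across the earth"],
}

# Hypothetical lexicon tagger standing in for the real POSTagger.
# Any word not in the lexicon is tagged as a noun ("NN") by default.
LEXICON = {
    "a": "DT", "the": "DT", "that": "WDT", "in": "IN", "on": "IN",
    "of": "IN", "with": "IN", "across": "IN", "flies": "VBZ",
    "moves": "VBZ", "long": "JJ", "fly": "VB", "i": "PRP",
    "want": "VBP", "to": "TO",
}


def pos_tag(text):
    """Tag whitespace-separated words using the toy lexicon."""
    return [(w, LEXICON.get(w, "NN")) for w in text.lower().split()]


def nouns(tagged):
    """Collect the words tagged as nouns."""
    return {w for w, t in tagged if t.startswith("NN")}


def relate(sms, keywords):
    """For each keyword, rank its definitions by nouns shared with the SMS."""
    sms_nouns = nouns(pos_tag(sms))
    related = {}
    for kw in keywords:
        scored = []
        for definition in DICTIONARY.get(kw, []):
            # Exclude the keyword itself so it does not count as a match.
            shared = nouns(pos_tag(definition)) & (sms_nouns - {kw})
            scored.append((len(shared), sorted(shared), definition))
        scored.sort(key=lambda s: s[0], reverse=True)  # best match first
        related[kw] = scored
    return related


for kw, ranked in relate("I want to fly a kite in the wind", ["kite", "wind"]).items():
    print(kw, ranked[0])
```

For "kite", the "toy ... in the wind" sense ranks above the "bird of prey" sense because it shares the noun "wind" with the SMS. A real version would need the actual tagger and a better notion of "object" than shared nouns.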

Completed both steps (extracting the definitions and POS-tagging them) today. In the process, I came to understand existing portions of the code better, in particular the data structures in use (such as DataTable) and their associated functions.

Thursday (25th Sept)
Checked with Ken, who told me that TextDisplay.exe eventually ran into a kernel32.dll error. Observed that polling results were being printed repeatedly to stdout, and suspect that this is the cause of the memory leak. Will look at the code again to identify where the problem stems from. The plan for now is to comment out the polling portion of the code and see whether the visualisation code by itself leaks memory. If it doesn't, at least we will have isolated the problem.