Monday, March 27, 2006

Getting down to work

I've been slightly irritated at having no decent RDDL-based XML versioning software to help me maintain my myriad XML vocabularies, and it seems the best bet may just be to create it myself. So tomorrow I'm going to post up a spec sheet for people to mull over. I'm thinking of basing it off of Simon Yuill's Social Versioning System.

And then, I promise, I will switch blogging software and begin blogging more!

Sorry about not posting more, but I'm switching blogging software...

Sorry about not posting more, but to be honest I am becoming increasingly uncomfortable storing my own private thoughts and public musings on the hard drive of someone I don't know or, more importantly, trust. Trust, some sort of mutual respect, is necessary in order to share one's data. And yet, how can you trust someone like Google (or Yahoo!) who is clearly going to be data-mining your data for every cent it's worth? The only reason they give you space for free is so that they own your data and can do whatever they want with it. So, can we establish a decentralized Web 2.0 where data is more open and free, and where it can be trusted? I think we can. It's called the "Semantic Web" - and the Semantic Web, in combination with decentralized "microformats", really is the way to go once all these Web 2.0 companies finally go the way of the dinosaurs.

It's not just applications that are valuable, it's data. I just wish the Free Software Foundation were thinking further ahead about this. I remember asking Richard Stallman himself about it when I set up his talk in Edinburgh, but he seemed genuinely uninterested in the Web. Which is sad, because the times, they are a-changing. And that includes for free software.

Complex Systems Summer School and Plans

Also, today Sandro Hawke from the Rules Interchange Working Group, who is working on using a sort of first-order logic for the Web, asked me why I wasn't in the group after I made some fairly sensible suggestions about the future role of named graph syntax as a replacement for reification. After all, most of us find it easier to get our heads around the one-level RDF reflection of named graphs than the infinite tower of interpreters that can be built with RDF reification, and most libraries already implement named graphs, which is a good sign. So, it looks like I'm going to press for RIF to become my first (official!) W3C working group...
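
Roughly, the contrast looks like this (Turtle for the reified version, a TriG-style syntax for the named graph; all the prefixes and URIs here are invented for illustration):

    # Reification: four triples of bookkeeping just to mention one statement.
    _:s rdf:type rdf:Statement ;
        rdf:subject :alice ;
        rdf:predicate :knows ;
        rdf:object :bob .

    # Named graphs: the triple stays a single triple, and the graph URI
    # gives us exactly the one level of reflection we need.
    :g1 { :alice :knows :bob . }
    :g1 :assertedBy :alice .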

Sunday, September 11, 2005

Notes from Last Week

OK, fell a bit behind on my research blogging last week, but overall still maintaining. In summary, I finished the "Little Schemer" and now feel like I have my head around Scheme (although the Y combinator is still a bit confusing, but I did manage to go through the last chapter and implement a giant table-based interpreter for Scheme in Scheme!). Can't wait to get my hands on the "Seasoned Schemer", and I can already see how lambdaXML will allow a level of flexibility in pipelines never before imagined while relying on the good old friendly idioms of LISP and the lambda calculus.
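
For my own notes, here's the applicative-order Y combinator more or less as the book derives it, with length written so that it never names itself (a minimal sketch, nothing more):

    ; The applicative-order Y combinator: recursion from nothing
    ; but lambda.
    (define Y
      (lambda (f)
        ((lambda (x) (f (lambda (a) ((x x) a))))
         (lambda (x) (f (lambda (a) ((x x) a)))))))

    ; length without any define-time self-reference:
    (define len
      (Y (lambda (length)
           (lambda (l)
             (if (null? l)
                 0
                 (+ 1 (length (cdr l))))))))

    (len '(a b c))  ; => 3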

Had an interesting meeting with Ewan in which he told me about the IR stuff he was doing and how his work and Bundy's (!) could possibly fit into mine... Bundy's work apparently can clean up messy propositions, of which I have lots!

I managed to write a mapping from XPMDL to lambdaXML for Henry:

http://www.ibiblio.org/hhalpin/homepage/notes/lambda.html


Henry at the meeting revealed to me that the MTP actually decompiles to a significantly more powerful language that he believes can implement all the basics needed by lambdaXML version 1.0. I'm not sure I believe that, but if he can explain it to me, then I'll implement it in short order.

I also wrote lengthy notes on Dretske and B.C. Smith, and am going through a giant stack of papers related to the philosophy of computation (starting with Chomsky's classic "Rules and Representations"...), but these aren't quite ready for HTML yet. However, I did manage to write down my draft thinking, available here:

http://www.ibiblio.org/hhalpin/homepage/notes/philosophyabstract.html

The stories from Senga have mysteriously not arrived, which has me worried. Will investigate tomorrow, and will dedicate the rest of next week to finishing the web interface for the pipeline for Johanna and wrapping up draft one of the philosophy paper for Andy and Henry.

Friday, September 02, 2005

Meeting with Johanna

Quick notes. With the stories coming in next week, we should be well on the way to writing an EACL paper. Also, I need to finish getting the Web interface for the pipeline up, which should occupy me for most of next week. Lastly, it appears we might be able to automate the "pyramid" scheme using the DUC summarization system, since their system essentially grabs "semantic content units" in the same way that we do, just via humans rather than automatically, and ranks them in a similar manner.

Thursday, September 01, 2005

Pipelines to Functional Programming, KRL to the SemWeb

Had another meeting with Henry today, very productive. As for the Semantic Web Science workshop in Cambridge, Henry wrote a very nice little position paper that mentioned me :)

In essence, we went over a number of topics. First, the f(X) Henry will be presenting as a W3C Member Submission is going to be rather simple (it will only wrap the outside of an Infoset) and will only encode W3C standards, but he thinks this is the easiest way to get the ball rolling, so to speak. He's quite interested in how f(X) could solve the PHP code-embedding problem (as when random non-Infoset code starts appearing in Infosets!) and so lead to better code modularity, and how it could solve the AJAX problem of scripts just ad-hoc modifying DOM trees (by letting an XML tree expose itself, per se). Since I'm more of a Java hacker than a C hacker, it makes sense for me to modify the Markup Technology pipeline to bootstrap f(X) rather than use the LX Toolset or code it all myself, and I agree. I need to develop a side-by-side comparison of the MT pipeline vs. f(X).

The second is that there are a number of lessons for the Semantic Web from the old KRL projects. First, Maturana is right on many counts, but his hardline stance against representation is wrong. Maturana has a hard time explaining how "internal structural changes" that have no connection to the signal can account for the following Brian Smith parable: "To get through a rapids on a canoe, it actually makes sense to paddle upstream, where a balance in the rapids can be obtained so one can better reach a sort of stasis and get through the rapids safely." - and if you tell this to someone who doesn't know it, they will try this course of action. It seems that there are quite a few stories that only representation can explain easily! However, if analytic philosophy seems to be coming up against all of these problems, and the obvious other choice to build AI upon is hermeneutics, why isn't anyone following it up? Barwise never formalized situation semantics, Robin Cooper did note that "misunderstanding doesn't just happen, it's constitutive of natural language", and Brian only got so far. So what happened? Is hermeneutics not enough, due to its *lack* of a representational story?

Lastly, the KRL project got really complicated and more or less collapsed (with Rumelhart running off into neural networks!) when, in order to properly update semantic networks, they finally had to incorporate both action (such as backward and forward chaining triggers and traps!) and reflection for control. So - what's the main problem for the Semantic Web? When the semantic representations change.

Wednesday, August 31, 2005

Maturana Notes finished

Got the Maturana notes finished. Also got feedback on the Goodman notes by John Lee, which I am busy incorporating into the latest set of notes.

Tuesday, August 30, 2005

Scheming

Basically, finished the corpus work for Johanna and proceeded to teach myself Scheme using the excellent "Little Schemer" book, while brushing up on my lambda calculus by reviewing the excellent Barendregt notes. Being familiar with Haskell and a bit of LISP, Scheme strikes me as a sort of simplified Haskell or LISP. Still, good to be back in a functional frame of mind!

Sunday, August 28, 2005

Corpus work...

Almost done spell-checking and recorrecting the corpus for Senga's regrading. Over two hundred children's stories categorized... and it will finally be done by the end of the month!

Saturday, August 27, 2005

Reading Notes for Johanna

Just to keep track of the recommended reading Johanna gave me:

1) Semantic Role Labelling: With the release of PropBank, semantic role labelling is all the rage right now. The real question is: should our system do this, and to what extent can it already? I'll have to take a look at what ccg2sem does, but I would guess that unless Johan added WordNet features, it doesn't. The paper "The Necessity of Syntactic Parsing for Semantic Role Labeling" shows that semantic role labelling should be divided into two distinct tasks: pruning, which identifies possible arguments, and then matching argument candidates to roles (see the first sketch after this list). And as Punyakanok et al. discover, using a full parse helps mainly by identifying the correct constituents as argument candidates. The other paper, "Semantic Argument Classification Exploiting Argument Interdependence", basically goes even further by saying that any semantic roles already identified should be used as well, but this produces only a one percent increase in recall.

2) The rest of the papers are about "story comprehension" systems, which basically (using a sample corpus from Remedia, which I imagine we could get hold of: sixty children's stories with question and answer sets) just try to identify the relevant sentence that has the "answer". Basically the systems evolved from "Deep Read", which used pure "bag-of-words" approaches, to a rule-based approach that assigned different scores to different levels of "clues" (Riloff and Thelen), to an interesting system (Grois and Wilkins) that uses word-level transformations (directed by Q-learning) to transform the question ("Who does Nils jump on the back of?") into an answer template ("Nils jumps on the back of ____"). This evolution goes from roughly 30 to 40 to 50 percent F-measure. It seems like the last method is smart but hampered by working at the word level (the second sketch after this list caricatures the idea) - after all, could we not do the same matching using a dependency tree or some other semantic representation level? One could almost think of a question as an empty semantic representation, and one could do a search over available semantic representations to complete the model.

3) Lastly, the "pyramid model" (Nenkova and Passonneau) is interesting: it basically uses humans to identify "semantic content units", bits of frequently occurring text that are given a weight by how often human annotators use them. The weights appear to be fairly stable as the corpus grows, which is good news for any standard. It seems like this is something else one might want to do at the semantic level, as van Halteren and Teufel have apparently been up to. However, they do not weight theirs (like we would), nor is it clear why one would want a human involved anyway if one could just straightforwardly count overlap automatically (the third sketch below shows how simple the counting is).
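
To fix the two-stage idea from 1) in my head, a hypothetical Scheme sketch; plausible-argument? and classify-role are invented stand-ins for the real parse-based pruning heuristics and trained classifier:

    ; Stage 1: prune the constituents down to plausible argument
    ; candidates. Stage 2: match each surviving candidate to a role.
    (define (label-roles predicate constituents)
      (define (prune cs)
        (cond ((null? cs) '())
              ((plausible-argument? predicate (car cs))
               (cons (car cs) (prune (cdr cs))))
              (else (prune (cdr cs)))))
      (map (lambda (c) (cons c (classify-role predicate c)))
           (prune constituents)))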
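
The word-level transformation from 2) is simple enough to caricature in a few lines; the rewrite rule below is hard-coded for the one example, whereas Grois and Wilkins learn such rules via Q-learning:

    (define (third-person verb)  ; crude stub: jump -> jumps
      (string->symbol (string-append (symbol->string verb) "s")))

    ; '(who does nils jump on the back of) =>
    ; '(nils jumps on the back of ____)
    (define (question->template q)
      (if (and (pair? q) (eq? (car q) 'who) (eq? (cadr q) 'does))
          (append (list (caddr q) (third-person (cadddr q)))
                  (cddddr q)
                  '(____))
          q))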
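
And for 3), counting overlap automatically really does look straightforward; a minimal sketch, treating each summary as a list of SCUs and leaving aside the normalization the real pyramid score applies:

    ; An SCU's weight is the number of human summaries containing it.
    (define (scu-weight scu summaries)
      (cond ((null? summaries) 0)
            ((member scu (car summaries))
             (+ 1 (scu-weight scu (cdr summaries))))
            (else (scu-weight scu (cdr summaries)))))

    ; A candidate scores the total weight of the SCUs it covers.
    (define (pyramid-score candidate-scus summaries)
      (apply + (map (lambda (scu) (scu-weight scu summaries))
                    candidate-scus)))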

Friday, August 26, 2005

Post-meeting with Johanna Notes

Overall, Johanna really wants to see more work done on the NLP pipeline to produce semantic representations. I hope I made it clear that this will be a good example of an application of the framework (both philosophical and technical) that I want to work on with Henry and Andy. However, the conceptual leap is to make the various bits of the thing work as a website. Now off to get the linode server working....

As for the actual capabilities of the server, it should be able to do the following on the text (a sketch of how the stages might compose follows the list):

1) Optional Morphological Preprocessing
2) Word and Sentence Detection
3) Named Entity and Date Detection
4a) Chunking
4b) CCG-parsing
4c) Dependency Grammar Parsing (based on Optimality Theory)
5a) Coreference Resolution via Syntax
5b) Coreference Resolution via Semantics
6) Temporal Annotation of Semantic Representation
7a) Propositional Semantic Representation
7b) Propositional with Thematic Roles Semantic Representation
7c) Full First-Order Logic form
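
Since each stage just maps an annotated document to a further-annotated document, the whole pipeline should compose like ordinary functions - which is the lambdaXML point in miniature. A minimal sketch, with all the stage names invented:

    ; A pipeline is the composition of its stages, applied in order.
    (define (compose-pipeline stages)
      (lambda (doc)
        (if (null? stages)
            doc
            ((compose-pipeline (cdr stages)) ((car stages) doc)))))

    ; Hypothetical stage names, one per step above:
    (define run-pipeline
      (compose-pipeline
        (list preprocess-morphology tokenize detect-entities
              chunk parse resolve-coreference annotate-time
              build-semantics)))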

It will be interesting to see what components I can get up and working by next week. Gotta get the linode server up to host all of this ASAP, as well as the stuff from Ewan that we had on axon working again.

Thursday, August 25, 2005

Post-meeting with Henry Notes

In summary, Henry basically approves of my chapter outline and the addition of types to functionalXML, although he admits it's ambitious, and he thinks that come October, if I make it to the functionalXML chapter, I'll have something thesis-worthy to submit. Now once I get approval from Johanna over the general outline and narrative part, and get Andy to inspect the philosophy, I'll be ready to write the thesis plan next week.

Second, I think I've noticed an interesting aspect of functionalXML that gives it a strong case for use *in conjunction* with other programming paradigms. Wadler has just posted code snippets of Links, and they've received a more or less negative response from the web programming community at large. The main point of critique is that Links as presented just embeds functional code in HTML, which proves to be a horrible way of doing web programming.

What a good selling point of functionalXML could be is that it is XML-compliant, and can thus be a universal format for "embedding" processes (of whatever kind, be it Javascript/AJAX, Web Services, or even Links 0.2) into XML while keeping the document itself XML-compliant. Unfortunately, with PHP/Links the code that does the work is actually non-XML stuff embedded in XML, while the AJAX/Ruby-on-Rails methodology basically generates parts of an Infoset selectively, *but* you can't tell what nodes it's manipulating without first viewing the Javascript code. An approach that abstracted away from the actual programming language details and just said "the content of this node will be changed by a program", specifying the type and arguments of the program (and optionally its location, such as an http URI for a Web Service, or a reference to a piece of client- or server-side code), would actually make web design and programming much easier. I'll write this point up over the weekend with some example code inline.
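
Something like this is what I have in mind - every element and attribute name below is invented for illustration, not from any actual spec:

    <!-- The page only says *that* this node is computed, its type,
         and where the program lives; the program itself stays out
         of the Infoset. -->
    <p>Current price:
      <fx:process xmlns:fx="http://example.org/ns/fx"
                  type="xs:decimal"
                  href="http://example.org/services/stock-quote">
        <fx:arg name="ticker">XYZ</fx:arg>
      </fx:process>
    </p>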

In other news, IMC Scotland just got its own office near the Uni. at Forest Cafe, and I'm in charge of installing the networks. Also just installed Ubuntu on my laptop after doing thorough housekeeping on both ibiblio and the laptop.

As far as narrative stuff goes, I went through the corpus picking out the stories that needed to be regraded, and had a great meeting with Johan Bos to help guide him in refactoring the XML representation of ccg2sem.