I started my basement build of a cognitive system in March. The idea was prompted by Tony Pearson’s article, originally entitled “IBM Watson – How to build your own ‘Watson Jr’ in your basement.” The title was recently changed to “IBM Watson – How to replicate Watson hardware and systems design for your own use in your basement.” As the article explains, “IBM is concerned that some businesses might be led to believe they could simply stand up their own systems based entirely on open source and internally developed code for business use.” It’s a valid concern. The article is a bit of a tease: most of it covers the hardware and networking requirements. It briefly mentions OpenNLP, OpenCyc and UIMA, but the analytical capabilities required for a cognitive system are substantial. Open source technologies provide the software infrastructure for building an analytical system, but designing and developing one is a non-trivial process.
I called my basement build “What, son?”, a humorous reference to my son’s frequent questions, which a cognitive system might help answer. Shortened to “Whatson”, it is obviously a play on IBM’s “Watson”.
I wanted to walk through the development of a cognitive system at a granular code level, for the benefit of my own learning and anyone else who might want to tag along. For the first iteration (v0.1) I simply recreated the Question-Answer system outlined in Taming Text by Ingersoll, Morton and Farris. I replicated their Solr-OpenNLP code on public domain literature, building a system that used Natural Language Processing to answer questions about literature. For the second iteration (v0.2) I took a deep dive into the OpenNLP code, producing code samples in Java that moved closer to my own ends. A recent post, for example, demonstrated how to build a custom model for Book Title named entity recognition. I am close to concluding v0.2, but much more work is required after that, even for a basement build.
Every time I use the project name, Whatson, I feel the need to distinguish my basement build from Watson proper, the IBM product. I never imagined that my basement build would ever have the same capabilities as IBM’s Watson. What I am beginning to see is that a basement build may have virtues of its own. In a future post I will explain an architecture decision I have made to access and analyze external sources for answers, rather than crawling and indexing everything internally. I am quite excited about this idea. For now, I just want to point out the increasing divergence of the two products.
It is time for a better naming convention. Henceforth the general name for the basement build will be Physika, the name of this blog. Iterations will become more comprehensive and powerful, with more time in between. Each iteration will have a codename. Iterations 0.1 and 0.2 will keep the codename Whatson. Iteration 0.3 will be codenamed Wilson. I’m still having fun with names. IBM’s Watson was named after the company’s founder but is also an allusion to the steady sidekick of Sherlock Holmes. In the TV series House, Dr. Gregory House and Dr. James Wilson are a modern Holmes and Watson. The actor who played Wilson, Robert Sean Leonard, also appeared in Dead Poets Society as a young man with a passion for literature. It fits. The idea for the name came from my very literary son, the one who inspired “What, son?”