Book Search, Heatmaps and Information Access

February 22, 2007 at 9:54 PM

The quarter is winding down. At least, I hope with as much sincerity as I can muster that it'll end soon; I'm ready to be focusing on work rather than school. More on that in a different post; this one is about my Capstone Project.

I've always loved literature, so it was a natural choice to focus my efforts in that direction. The working name of my project is TomeTracker. My aim was fairly simple: I wanted to provide people with better access inside the books they already own.

Google Books and Amazon's A9 search engine already provide services aimed at helping people buy books. But I've already bought a bunch of books—far too many, if you ask my wife—and I want to make better use of them.

So, personalized library search is a key goal of my project. And (because the concept is cool), I also want to use heatmaps to visualize search results intuitively. Books may be long or short, but a heatmap shows at a glance where in a book the matches cluster, regardless of that book's length. Here are the results thus far, with a few (contrived?) examples:

First, a general library view

This sample account has a scant 52 books, mostly because I'm lazy.

[Image: lib-view.jpg]

Full-text search; heatmap results

I wondered how many of my 52 books talk about King Lear. Let's find out!

[Image: lear.jpg]

Detailed results by book

It's obvious that King Lear (the play) will contain the words "king" and "lear", but some of these other books are less obvious. Take Ulysses, for instance: granted, Stephen Dedalus is always giving some-theory-or-another about Shakespeare, but he usually confines the ranting to Hamlet. So what does the text actually say?

[Image: lear2.jpg]

Ah. Thank you, James Joyce... that's so much clearer. Unfortunately, being able to search inside of Ulysses won't help you understand it. Moving right along...

Wonder when Friday appears in Robinson Crusoe?

Yeah, it's pretty easy to locate distinctive passages in a given book, although you do need to know something about the book in question. For example, the next two searches only work if you know that "Friday" is a character in Robinson Crusoe and that Raskolnikov commits murder with an axe.

[Image: friday.jpg]

When does Raskolnikov commit murder in Crime and Punishment?

[Image: axe.jpg]

Heatmaps are great for identifying topical regions within a text, and being able to retrieve snippets from a specific book allows the searcher to find specific quotes. I've only been dabbling and testing thus far, but I think this would be really useful for writing papers and doing research.
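
For the curious, here's a rough sketch of how one row of a heatmap could be computed. It's in Python purely for illustration and is not the prototype's actual code. It assumes the search backend can hand back the word offset of each match; `heatmap_row`, `intensities`, and their parameters are hypothetical names. The idea is to cut every book into the same number of buckets, so a short play and a long novel produce rows of the same width.

```python
def heatmap_row(match_offsets, book_length, buckets=20):
    """Count matches per bucket; each bucket covers an equal share of the book."""
    if book_length <= 0:
        raise ValueError("book_length must be positive")
    counts = [0] * buckets
    for offset in match_offsets:
        if not 0 <= offset < book_length:
            continue  # ignore offsets that fall outside the text
        # Map the offset to a bucket relative to the book's own length.
        index = offset * buckets // book_length
        counts[index] += 1
    return counts


def intensities(counts):
    """Scale counts to the range 0.0-1.0 so the hottest bucket is 1.0."""
    peak = max(counts, default=0)
    if peak == 0:
        return [0.0] * len(counts)
    return [count / peak for count in counts]
```

For instance, `heatmap_row([0, 5, 99], 100, buckets=4)` returns `[2, 0, 0, 1]`: two hits in the opening quarter and one in the last. Scaling the same offsets up to a book ten times longer gives the same row, which is the point of bucketing by relative position.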

What's next? I think there's a lot here already, but there's more in the works. I'm adding a social networking aspect to the prototype: find people with similar libraries, recommend books to other users, argue your pathetic heart out with a particular book as the focus... you get the gist.
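
I haven't settled on how "similar libraries" will be measured, so treat the following as one possible approach rather than the plan: a small Python sketch that scores two libraries by Jaccard similarity (shared titles divided by all titles across both). The function names and the sample data are made up for illustration.

```python
def jaccard(a, b):
    """Return shared titles divided by total distinct titles (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_libraries(mine, others, top_n=3):
    """Rank other users by library overlap, dropping users with no overlap."""
    scored = [(jaccard(mine, books), user) for user, books in others.items()]
    # Highest score first; ties broken alphabetically by user name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(user, score) for score, user in scored[:top_n] if score > 0]


# Hypothetical sample data.
mine = {"Ulysses", "King Lear", "Robinson Crusoe"}
others = {
    "ann": {"Ulysses", "King Lear"},
    "bob": {"Crime and Punishment"},
    "cy": {"Ulysses", "Hamlet", "Dubliners"},
}
matches = similar_libraries(mine, others)
```

Here `ann` ranks first (two of three titles shared), `cy` second, and `bob` is dropped for having nothing in common. Jaccard is a blunt instrument, since it ignores how much anyone actually likes a book, but it's cheap to compute and easy to explain.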

Most of my "pending" work is in this arena. There are always touch-ups and optimizations that I'd love to implement, but the quarter is only so long. I'm going to have to make a poster and a presentation for this soon enough, so it's getting close to "code complete".

I've written other posts about the technology that's powering this system, so I won't touch on specific implementation here. Suffice it to say, this is all being powered by Open Source software: GNU/Linux, Solr, and Ruby on Rails. Even the texts are from Project Gutenberg. And I think that's pretty cool.

Until next time.