Well, that time of the year has come and gone, and we're all better people for it.
The place: New York City
The event: The Lexalytics User Group, 2013
We had a nice crowd, both in terms of size and the general niceness of people :). I had my mad MC skillz on the mic all day, introducing a number of great speakers and wrapping things up with a view into our new software release: Salience 5.1.1 (details forthcoming).
Videos will be posted as soon as I get 'em, but for now, use your imagination.
Anna Smith from Bitly led off, with a rousing presentation on what you can do with a view of every page that people are referencing on Twitter. Rather than relying only on a tweet's very short 140 characters, they have a great view of the content behind the shortened links. They have hundreds of terabytes of content stored, and are adding several terabytes of new content each month. As she demonstrated, there's incredible power in having this view - you can see what people find important to share, and give advice back to content providers and enterprises as to where they're being successful. Also, cats.
Oleg Rogynskyy from Semantria (our beloved SaaS text analysis partner) showed how they've built a very flexible, scalable system on Salience - bringing our heavy-duty text analysis platform within reach of those pesky surveys and small jobs that just aren't worth building a whole system for. Semantria is a great option for those companies with low-volume projects that need access to high-capability text analysis. Sentiment analysis, categorization, facets, themes - the whole nine yards wrapped up in a nice SaaS platform, and offered as a plugin for Excel, the software that everybody uses.
Craig Golightly of Software Technology Group enraptured the group with his low-tech flipchart presentation. I would argue that he was the most engaging speaker (other than myself, of course) of the day. He used a fantastic analogy with a "dirty brownie" (no, really, he had brownies up there in front of the group) to show how you can carve out useful data from the messiest dataset. Craig has lots of experience with voice-to-text, an important feeder technology for text mining. He's a huge proponent of Voci's system, and after his presentation, we can understand why.
Elizabeth Baran of Lexalytics went over all the work that has gone into our Chinese language pack - from tokenizing words (no word boundaries in the printed text, you know) to handling Chinese idiom, she did a marvelous job of showing just why you can't rely on translation to get the most out of your text.
Brandon Kane of Angoss showed their predictive analytics system on a #bigdata (I just thought I'd throw in a random hashtag here) set of well over a million tweets concerning various cellphone brands. He showed interesting correlations between sentiment, Klout scores, gender, and issues like "returns". This is information that's priceless (well, it does have a price, but work with me here) for marketing people looking to make better decisions about where to put their resources.
Russell Couterier of Cybertap had by far the most questions and really grabbed the imagination of the group. They've built a product that's in use by a number of sooper-seekret organizations to help ferret out cybercrime, as well as being well tuned for compliance. They can capture all the packets on very high speed links, completely reassemble the sessions (down to showing you the web pages, emails, instant messages, and the like), and then they pass it through Salience and our text mining capabilities - giving a rich view of just what is being discussed - and popping out all the anomalous and illegal behavior with a nice bright highlight.
Tim Mohler of Lexalytics displayed a few ways that our customers can get more out of the text analysis that Salience provides. From rolling up discovered themes into coherent buckets to giving an "unlimited dive-in", he showed different ways that our base "set of Legos(tm)" can be used to provide very rich information through multiple processing and aggregation steps. He also showed some even more experimental work (well, this was all experimental) that uses NHibernate to take a complete snapshot of Salience output and place it in a database so that our customers can root around and experiment with the output. This is important because it can often be hard to decide what to extract and store as part of a big processing pipeline. By capturing everything from a dataset, he showed that you can surf around in that dataset and make better decisions about what you should show to everybody.
And then there was me, talking about Salience 5.1.1 (released on the day of the LUG), but that's a topic for another blog post.
Thanks again to all the attendees, and thanks very much to the speakers for making our day a success!