How to get your submission accepted at Berlin Buzzwords

2013-05-07 11:21
Disclaimer: Intentionally posting on my private blog - these are my own criteria, not general advice from the review committee.

Berlin Buzzwords is in its fourth year. Probably the most tedious task of all is selecting the talks that make it into the final schedule. With roughly 120 submissions and roughly 30 slots to fill, three quarters of all submissions have to be rejected. Last year I shared some details on how we rank talks once reviewers have provided their input.

Now that the mechanics of ranking are clear, people have asked me what goes into the reviews themselves. Here I can only speak for myself: After doing the reviews ourselves during the first two years, Simon, Jan and I decided to spread the work of reviewing submissions among a larger team of people. As nearly all of them had already attended Berlin Buzzwords in the past (or had at least followed the conference remotely) we could assume they were roughly familiar with what kind of content would be a good fit. As a result, the review guidelines that we send out tend to be rather light:

Berlin Buzzwords is a conference from geeks for geeks: The goal is to get the people actively working in the field together to meet and exchange ideas. Content should have some technical depth - in particular pure marketing talks and obvious product placements without further technical value are not welcome. We usually invite both interesting case studies and talks highlighting the technical details a project is built upon.

In the end judgement is up to the individual reviewer - so I can speak only for myself when listing what you should do to get your talk accepted.

  • Be on topic. There's always a handful of submissions that look and sound like pure marketing, product placement or simply aren't related to software engineering at all. Those tend to be easy to spot and weed out.
  • Tell us what you are talking about. An abstract is there to provide some detail on your presentation - don't just be funny while promising overly generic content. To let us decide whether or not your talk is relevant, please provide some details on which direction you'll be heading.
  • Don't be too detailed in the abstract either - there's no need to list the content of every slide. Make sure the abstract correctly summarizes your talk; making it catchy and nice to read usually helps if the content is solid.
  • We try to find those speakers that not only have an interesting topic to talk about but are also a pleasure to listen to and can successfully get their point across. We cannot know every potential speaker in person though. As a result it helps if you list which conferences you've spoken at in the past; any videos of previous talks are helpful as well. As a general piece of advice: Choosing Berlin Buzzwords as your first conference to speak at ever usually is a recipe for disaster. Get some practice at local meetups like the Berlin Hadoop Get Together, the data science day, the Java User Group Berlin Brandenburg, the RecSys Stammtisch Berlin or the MongoDB User Group Berlin, to name just a few.
  • Make sure your talk is novel - submitting the same topic in 2012 and 2013 is a great way to ensure getting rejected. It is fine to submit a talk you have given at another conference earlier. However, if everyone in the Buzzwords audience is very likely to have already watched the exact same version of your presentation, we are less likely to accept your talk.
  • Finally: When drafting your bio, make sure to include details that explain why you are the perfect expert to talk about the topic at hand. As much as I'd like to, I don't know every project's committers by name. Provide some help by pointing out explicitly what your contributions have been or in what context you have used the technology you are presenting. Don't be shy to list that you are a co-founder of a successful project. Not only does this information help with selecting talks, it also provides some background for the audience to judge the claims you make.

Two words on the role of free software at Buzzwords: There is no explicit requirement to only talk about software that is publicly available under a free software license. However, if some project or framework is presented, it helps if it is open source, as that makes the talk more applicable for the audience. Most projects discussed at Berlin Buzzwords are developed openly. In order to get the maximum out of these projects it pays to know how they work internally, how to get active yourself, how to contribute. As a result, discussions and talks on project governance are generally welcome.

A parting note: With way more than half of all submissions having to be rejected, making a final decision will always be hard. Being rejected doesn't necessarily mean that your proposal was bad. Following the above advice may raise your chances of being accepted - however it is no guarantee. We could raise the number of accepted talks by extending the conference by another track or even another day - at the cost of raising the ticket price substantially. However, we want not only "big corp representatives" but a diverse audience, attendees that get active themselves, that help shape the conference:

There's plenty of space and time to get active in addition to the main conference program. Use the time and space to shape the conference.

On Taming Text

2013-01-01 20:21
This time of the year I would usually post pictures of my bicycle standing in the snow somewhere in Tierpark. This year however I was tricked into using public transport instead: a) After my husband found a new job, we now share some of the route to work - and he isn't crazy going by bike when it's snowing. b) I got myself a Nexus7 earlier this month which obsoleted having to take paper books with me when using public transport. c) Early in December Grant Ingersoll asked me for feedback on the by now nearly finished "Taming Text" (currently available as MEAP at Manning). So I even had a really interesting book to read on my way home.

Up to mid-December "Taming Text" was one of those books that always were very high on my to-read list: At least from the TOC it looked like the book to read if ever you wanted to write a search application. So I was really curious which topics it would cover and how deep explanations would go when I got the offer to read and review the book.


Short version: If you are building search applications - that is anything that makes a search box available on a web site, be it an online store or a news article archive - this is the book to read. It covers all the gory details of how to implement features we have come to take for granted when using search: Type ahead, spelling correction, facetting, automatic tagging and more. The book motivates the value of these features from the user's side, explains how to implement them with proven technologies like Apache Lucene, OpenNLP, and Mahout, and shows how those projects work internally to provide you with the functionality you need.

Longer summary

Search can be as easy as providing one box in some corner of your web site that users can type into to find relevant pages. However, when thinking about the topic just a little more, several handy features that users have come to expect come to mind:

  • Type ahead to avoid superfluous typing - it also comes in handy to avoid spelling errors and to know exactly which query actually will return a decent number of documents.
  • Spelling correction is pretty much standard - and avoids user frustration with hard to spell query terms.
  • Facetting is a great way to discover and explore more content in particular when there are a few structured attributes attached to your items (prices to books, colors to cars etc).
  • Named Entity Recognition is well known among publishers who use automatic tagging services to support their staff.
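
To make the first of these features a bit more tangible, here is a minimal sketch of type ahead as a prefix lookup over a sorted list of known query terms. This is my own illustration, not code from the book and not how Lucene implements suggestions; the function name `suggest` and the sample terms are made up for the example.

```python
from bisect import bisect_left


def suggest(prefix, sorted_terms, limit=5):
    """Return up to `limit` terms from `sorted_terms` that start with `prefix`.

    `sorted_terms` must be sorted; binary search finds the first candidate,
    and all terms sharing the prefix sit directly behind it.
    """
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:start + limit]:
        if not term.startswith(prefix):
            break  # left the block of terms sharing the prefix
        matches.append(term)
    return matches


# Hypothetical vocabulary, e.g. collected from past queries.
terms = sorted(["lucene", "lucid", "mahout", "machine learning", "opennlp"])

print(suggest("lu", terms))
print(suggest("ma", terms))
```

A real system would additionally rank the candidates, for example by how often each query was issued or by how many documents it returns.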

The authors of Taming Text decided to structure the book around the task of building an automatic Question Answering system. Throughout the book they present technologies that need to be orchestrated to build such an application but are each valuable in their own right.

In contrast to Search Patterns (which is focused mainly on the product manager perspective and contains much less technical detail) Taming Text is the book to read for any engineer working on search applications. In contrast to books like Programming Collective Intelligence, Taming Text takes you one level further by not only showing the tools to use but also explaining their inner workings so that you can adapt them exactly to your use case. To me, Taming Text is the ideal complementary book to Mahout in Action (for the machine learning part) and Lucene in Action (for the search part).

Back in 1998 it was estimated that 80% of all information is unstructured data. In order to make sense of that wealth of data we need technologies that can deal with unstructured data. Search is one of the most basic but also most powerful ways to analyse texts. With a good mixture of theoretical background and hands-on examples, Taming Text guides you through the process of building a successful search application, no matter if you are dealing with a vast product database that you want to make more accessible to your users, with an ever growing news archive or with several blog posts and twitter messages that you want to extract data from.

Book: Search Patterns

2012-07-28 20:41
I got the book months ago during FOSDEM - the O'Reilly book table always is a pretty dangerous place as a meeting point for me: Search Patterns - Design for Discovery is one of those small, deceptively beautiful books that manage to explain effective search engine design by focusing on the end user's needs while also going into some detail concerning the basics of search engine backends.

We use search engines on a daily basis not only for finding content on the web but also for navigating shopping sites, discovering news content and even finding articles on blogs and open source project pages. Many discovery tasks can be easily expressed as a search problem and as a result tackled with by now standard off-the-shelf software like Apache Lucene - or even the commercial counterparts from the enterprise search market. Still, search is oftentimes perceived as being simply a small box that users type (typically one- or two-term) queries into and that as a result shows a list of some ten links.

After setting the stage for search in the first chapter, the book goes into some more detail in "The anatomy of search". In a very approachable way it explains all the components, from user constraints and the graphical interface to the basics of retrieval and evaluating search performance in terms of precision and recall. The third chapter shows some behavioural patterns that make discovery easier for users - from incrementally constructing the answer and progressively disclosing more and more detail up to being predictable.
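
For those who have not come across precision and recall before: precision is the share of returned results that are relevant, recall is the share of all relevant documents that were actually returned. The following is a minimal sketch of my own (not taken from the book); the function name and the document ids are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for one query.

    retrieved: ids of the documents the search engine returned
    relevant:  ids of the documents a human judged relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: four results returned, three documents judged relevant.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the four returned documents are relevant (precision 0.5) and two of the three relevant documents were found (recall of about 0.67).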

Finally the design patterns as identified by the authors are introduced. Pretty obvious to those working in the field but well explained to those not intimately familiar with the topic:

  • Though perceived as a mere convenience to type less by users, autocomplete can actually help guide the user's search in case of ambiguities and can help avoid imprecise results.
  • Expected as it might be by users, presenting the best result first actually goes a long way when building credibility for a search engine. Having more precise queries, e.g. as a result of autocomplete, helps here. So does having strong ranking criteria to build up a compelling ranking function that is used by default (even though others might be offered as an alternative for users to explore more and different results).
  • Federated search has both advantages (integrating otherwise isolated silos of knowledge) and disadvantages (its speed being dominated by the slowest connected search engine).
  • Facetted navigation is pretty much standard for any major search engine - giving the user the option to start with a broad query that returns an overwhelming amount of results but guiding the user when refining the query is one major way of driving searches.
  • Offering personalisation tends to be one beloved feature though it is particularly hard to implement and needs a good deal of user data to work well. Usually there are features that require less work to get done that are more promising to start with.
  • Pagination is just as much a standard expected by users - though its implementation can differ: Though we are used to clicking the next button, this actually may not make much sense and just lead to interrupting the user's flow. Much more appealing - but sometimes also confusing - can be interfaces that allow for simply extending the result page when scrolling to its end.
  • Structured results provide a way to give the user more than just an outlink - triggered by specific searches it may be possible to directly answer the user's question instead of linking to content that answers it.
  • Actionable results are a way for the user to get active - either by voting on results, bookmarking them or sharing them with others.
  • Unified discovery is about accepting that search always plays a role in a bigger context and has to play well with the discovery mode the user is in: When searching for "apple" while browsing the category "electronics" it's rather unlikely that I am looking for the fruit. Similarly search should take context into account and support me seamlessly when switching from discovery to directed search and back to discovery mode.
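
As a small illustration of the facetted navigation pattern from the list above, here is a sketch of my own (not from the book) that counts facet values over a result list and narrows the results once the user picks a value. The helper names and the sample items are made up; engines like Apache Lucene or Solr compute such counts on their index instead of looping over results.

```python
from collections import Counter


def facet_counts(items, field):
    """Count how often each value of `field` occurs among the results."""
    return Counter(item[field] for item in items if field in item)


def refine(items, field, value):
    """Keep only those results whose `field` equals the chosen facet value."""
    return [item for item in items if item.get(field) == value]


# Hypothetical results for a broad query like "car".
results = [
    {"title": "Compact car", "color": "red"},
    {"title": "Family van", "color": "blue"},
    {"title": "Sports car", "color": "red"},
    {"title": "Pickup truck"},  # no color attribute attached
]

print(facet_counts(results, "color"))
print([item["title"] for item in refine(results, "color", "red")])
```

The counts are what gets shown next to each facet value; clicking a value corresponds to calling `refine` and recomputing the counts on the narrowed result list.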

The book concludes by going into some detail on example search engines and presenting some features that are not yet commonplace but might change the world by employing search in new and creative ways.

Easy to read, well written, several nice examples to make the technical points simpler to understand. Definitely a good read for domain experts planning to build a search engine, designers trying to understand the basics of building effective search engines and engineers struggling for words to explain why a seemingly little box can cause a whole lot of pain when done wrong but a whole lot of joy when done right.