Seminar on scaling learning at DIMA TU Berlin #
Last Thursday the seminar on scaling learning problems took place at DIMA at TU Berlin. We had five students give
talks.
The talks started with an introduction to MapReduce. Oleg Mayevskiy first explained the basic concept, then gave an overview of the parallelization architecture, and finally showed how common tasks can be formulated as MapReduce jobs.
His paper as well as his slides are available
online.
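The canonical illustration of the programming model (an assumption for illustration here, not necessarily the example from the talk) is word counting: the map phase emits a `(word, 1)` pair per occurrence, the framework groups pairs by key, and the reduce phase sums the counts. A minimal single-process sketch:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(document):
    # Emit an intermediate (word, 1) pair for every word occurrence.
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(word, counts):
    # Sum all counts emitted for one word.
    return (word, sum(counts))

def run_job(documents):
    # Shuffle step: sort and group intermediate pairs by key,
    # as the framework would do between map and reduce.
    pairs = sorted(kv for doc in documents for kv in map_phase(doc))
    return dict(reduce_phase(word, (count for _, count in group))
                for word, group in groupby(pairs, key=itemgetter(0)))

print(run_job(["the quick brown fox", "the lazy dog"]))
```

In a real framework such as Hadoop, the map and reduce functions run in parallel on many machines, and the shuffle happens over the network; only the two user-supplied functions stay this small.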
Second was Daniel Georg, who worked on the rather broad topic of NoSQL databases. Since the topic is far too broad to be covered in a single 20-minute talk, Daniel focused on distributed solutions - namely Bigtable/HBase and Yahoo!
PNUTS.
Daniel’s paper as well as
the slides are available online
as well.
Third was Dirk Dieter Flamming on duplicate detection. He concentrated on algorithms for near-duplicate detection, which are needed when building information retrieval systems that work with real-world documents: the web is full of copies, mirrors, near duplicates, and documents made of partial copies. Identifying near duplicates is important not only to reduce the amount of data stored but also to make it possible to track original authorship over time.
Again, paper and slides are available
online.
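One common approach to near-duplicate detection (named here as an illustrative assumption; the paper covers the algorithms in detail) is w-shingling: represent each document as its set of overlapping word w-grams and compare sets with Jaccard similarity. A minimal sketch:

```python
def shingles(text, w=3):
    # Represent a document as its set of overlapping word w-grams.
    words = text.lower().split()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}

def jaccard(a, b):
    # Resemblance of two shingle sets: |A intersect B| / |A union B|.
    return len(a & b) / len(a | b)

doc1 = "the web is full of copies mirrors and near duplicates"
doc2 = "the web is full of mirrors copies and near duplicates"
similarity = jaccard(shingles(doc1), shingles(doc2))
# Pairs whose similarity exceeds some threshold are flagged as near duplicates.
```

Exact pairwise comparison does not scale to the web; production systems typically compress the shingle sets with sketches such as MinHash before comparing.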
After a short break, Qiuyan Xu presented ways to learn ranking functions from explicit as well as implicit user feedback. Any interaction with a search engine provides valuable feedback about the quality of the current ranking function. Watching users - and learning from their clicks - can help improve future ranking functions.
A very detailed paper as well as slides are available for download.
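A classic way to turn click logs into training data (one well-known technique, given here as an illustrative assumption rather than the specific method from the talk) is to extract pairwise preferences: a clicked result is taken as preferred over every higher-ranked result the user saw and skipped. A minimal sketch:

```python
def click_preferences(ranking, clicked):
    # For each clicked document, emit (preferred, skipped) pairs saying the
    # user preferred it over every higher-ranked document left unclicked.
    preferences = []
    for position, doc in enumerate(ranking):
        if doc in clicked:
            preferences.extend((doc, other) for other in ranking[:position]
                               if other not in clicked)
    return preferences

# User saw results d1..d4 and clicked only d3:
print(click_preferences(["d1", "d2", "d3", "d4"], {"d3"}))
```

Such pairs are relative judgments, not absolute relevance labels, which makes them robust to the fact that users mostly inspect the top of the ranking; they can then feed a pairwise learning-to-rank method.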
The last talk was by Robert Kubiak, on topic detection and tracking. He presented methods for identifying and tracking emerging topics, e.g. in news streams or blog postings. Given the amount of new information published digitally each day, such systems can help users follow interesting news topics or notify them of new, emerging topics.
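A simple baseline for spotting new topics in a stream (a common first-story-detection heuristic, assumed here for illustration; the paper discusses the actual methods) is to compare each incoming document against everything seen so far and flag it as a new topic when nothing is sufficiently similar:

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def detect_new_topics(stream, threshold=0.5):
    # First-story detection: a document dissimilar to every earlier
    # document is flagged as the start of a new topic.
    seen, new_topics = [], []
    for doc in stream:
        vector = Counter(doc.lower().split())
        if all(cosine(vector, old) < threshold for old in seen):
            new_topics.append(doc)
        seen.append(vector)
    return new_topics
```

Real systems refine this with time decay, named entities, and better term weighting, since raw term overlap alone confuses ongoing coverage of an old topic with a genuinely new one.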
Paper and slides are available online.
If you are a student in Berlin interested in scalable machine learning: the next IMPRO2 course has been set up. As last year, the goal is not only to improve your skills in writing code but also to interact with the community and, where appropriate, to contribute back the work created during the course.