Workshop Description

Overview: Search is making increasing use of information extraction, gradually blending in semantic web technology and peer-to-peer systems, and drawing on grid computing as a resource for information extraction and learning. This workshop explores the theory and application of machine learning in this context for the internet, intranets, the emerging semantic web, and peer-to-peer search. The workshop also aims to advertise and promote software and data infrastructure suitable for supporting this research, as well as community platforms, open source solutions, and grid tools for large-scale experiments.

Proceedings: "Learning in Web Search (LWS 2005)" (PDF, 1.5 MB)

Invited speakers:

Thomas Hofmann:
"Large Margin Methods in Information Extraction and Content Categorization"
Thorsten Joachims:
"Generating Accurate Training Data from Implicit Feedback"
Soumen Chakrabarti:
"Type-enabled Keyword Searches with Uncertain Schema"

Areas: There are many exciting opportunities here, both in applications for machine learning and in new fundamental research problems for the community. Search areas we are interested in include (but are not limited to):

  • identifying named entities and keywords from minimal examples;
  • learning mappings between query words and document words for question answering;
  • identifying per-site boilerplate and templates;
  • distributing indexes and routing queries in a peer-to-peer search engine;
  • optimising query ranking formulae using clickthrough data;
  • the language modelling approach to web retrieval;
  • personalisation of search;
  • text summarization and results summarization or clustering;
  • information extraction from text for semantic indexing and automatic knowledge markup;
  • domain-specific clustering and categorization, useful in domain-specific search engines lacking good ontologies; and
  • semi-automatic support in ontology development and maintenance (ontology learning, evolution).

In addition, we are interested in means of supporting machine learning via software platforms and tools, and via suitable databases, content, or logs (note TREC, MUC, etc.). Position papers on the relevant application and theory of machine learning are also encouraged.

Format: The workshop will take place over the full day of Sunday, 7th August 2005. It will be open format, though attendance will be limited by space. The afternoon panel will be drawn from the program committee and will include speakers' positions as well as audience participation. The workshop proceedings will be available on-line.