Please see our Alvis website for more detail and the Open Source Search website for general support.
The new economy is based on innovation, and innovation is based on up-to-date information. The semi-static Internet alone holds on the order of 1000 million pages of information, and search has become a fundamental service required by individual citizens and businesses alike. Search facilities have an impact on almost any task related to the information society. Reports indicate that European consumers are dismayed with the US saturation of existing services (as reported by the BBC, May 2, 2002). Moreover, only a few corporations have broad access to rich terabyte repositories of web data that can be used to provide unique value-added services in areas such as shopping, human resources, and business intelligence.
The vast quantity of information sets new challenges for even the best commercial search engines. Building next-generation search engines is not just a question of scaling existing techniques. What is needed is a departure from the existing keyword search that has made current search cumbersome even for the skilled. Qualitatively better ways are needed to allow more meaningful, semantically aware queries, to allow search by example from a few select documents, to allow search based on automatically extracted theme, style, topic and word semantics, and to provide responses targeted to the user's context, i.e., personalized to incorporate their prior interests and allow feedback.
Search can also be viewed as a knowledge sharing service on the Web, an interface to the Semantic Web. While some automation in building the Semantic Web has been achieved, it remains in part a labour intensive annotation process with problems in scaling up to the full free-text Web. Semantic-based search (which in the proposed model lacks a real ontology and uses implicit concepts rather than explicit knowledge) along with the services it provides could be viewed as a key infrastructure for more complete Semantic Web development, and arguably, as a safety net for it.
Current search systems, with their centralized, monolithic model, imply a business infrastructure with a high barrier to entry that makes the economics of developing a centralized search engine impractical. Thus a radically new, cost-effective model for delivering search is also needed, one that allows piecemeal growth. Distributing search services is the obvious candidate, and information retrieval research shows this can be done by distributing content according to topic. It is arguably impractical any other way if Web traffic is to be conserved. Thus automatic, hierarchical classification of both documents and queries becomes a central task for the success of the system, both to perform routing and to manage the hierarchies.
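To make the idea of routing by topic concrete, the following is a minimal sketch, not the project's actual design. It assumes a small hand-built topic hierarchy in which each node carries an illustrative vocabulary and a list of peer names, and it uses simple term overlap as a stand-in for a real hierarchical classifier. The names `TopicNode`, `score`, `route`, and the peer identifiers are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TopicNode:
    """One node in a hypothetical topic hierarchy."""
    name: str
    terms: set[str]                                   # illustrative topic vocabulary
    peers: list[str] = field(default_factory=list)    # hypothetical search peers
    children: list[TopicNode] = field(default_factory=list)


def score(node: TopicNode, tokens: set[str]) -> int:
    """Count how many tokens fall in the node's vocabulary (toy classifier)."""
    return len(node.terms & tokens)


def route(root: TopicNode, text: str) -> tuple[list[str], list[str]]:
    """Descend the hierarchy, following the best-matching child at each level.

    Returns the path of topic names and the peers of the deepest node reached.
    The same function can place a document or route a query.
    """
    tokens = set(text.lower().split())
    node, path = root, [root.name]
    while node.children:
        best = max(node.children, key=lambda child: score(child, tokens))
        if score(best, tokens) == 0:
            break  # no child matches: stay at the current, more general level
        node = best
        path.append(node.name)
    return path, node.peers


# Assumed example hierarchy; topics, terms, and peer names are made up.
root = TopicNode("root", set(), peers=["peer-general"], children=[
    TopicNode("science", {"physics", "quantum", "genome", "protein"},
              peers=["peer-science"], children=[
                  TopicNode("biology", {"genome", "protein"}, peers=["peer-bio"]),
                  TopicNode("physics", {"physics", "quantum"}, peers=["peer-physics"]),
              ]),
    TopicNode("shopping", {"price", "buy", "discount"}, peers=["peer-shop"]),
])

path, peers = route(root, "protein folding genome")
print(path, peers)
```

In this sketch a query only reaches the peers responsible for its topic, which is the sense in which topic-based distribution conserves traffic. A text that matches no child stays at the more general node, so routing degrades gracefully when classification is uncertain.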
Modern intelligent Web technology and advances in information retrieval research can make search peer-to-peer, personalized, and semantic-based, with fine-grained topic, style and synonyms automatically produced and maintained. Open source licensing and peer-to-peer functioning can make search an integral part of the Web infrastructure, contributed to equally by businesses and the public sector. The repositories of classified Web data along with their semantic metadata can then become available for value-added knowledge industries.
We envisage a number of configurations and applications of search systems software:
|Configuration or application|Intended users|
|---|---|
|Open source kernel for a search node|Topic-specific engine developers: SMEs, publicly funded organizations, researchers seeking a deployment platform|
|Publicly available European search engine with advanced search capabilities|European citizens, public and commercial sector, scientists, European educational community|
|Web repositories with fast streaming access|Value-added knowledge industries|
|Configurable search systems for intranets|Large corporations, government agencies|
|Next generation information retrieval|Publishers with pay-per-view access, libraries|