We are the Complex Systems Computation Group (CoSCo) at HIIT, the Helsinki Institute for Information Technology; see http://cosco.hiit.fi/. In short, we are a research group. The CoSCo group does many different things, and search is one of them; see http://cosco.hiit.fi/search/.
2: What are you doing?
We research and develop next-generation information retrieval. This includes statistical models for analysing content, language models, and the actual implementations. Our main effort is to develop a full-fledged open source Web search engine, including the crawlers. It will be based on content analysis, in contrast to traditional keyword-indexing engines. A content-analysing IRC search engine is one of our subprojects.
Please note that our main motivation is to research and develop theory and code for our search engine. When we say that we are developing a Web or IRC search engine, we mean that we are developing the code for it, which will be released as open source. Our main motivation is not to provide public search services on the Web, except perhaps for some demonstrations. The data we gather, whether web pages or IRC logs, is used for this research and development. We will never use any data in public demonstrations without first asking for the appropriate permissions.
3: How?
We analyse the content of web pages and IRC discussions using various statistical models. For a detailed explanation, see:
W. Buntine and S. Perttu, Is Multinomial PCA Multi-faceted Clustering or Dimensionality Reduction? Pp. 300-307 in Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, edited by C.M. Bishop and B.J. Frey. Society for Artificial Intelligence and Statistics, 2003.
W. Buntine, Variational Extensions to EM and Multinomial PCA. Pp. 23-34 in Proceedings of the 13th European Conference on Machine Learning, edited by T. Elomaa, H. Mannila and H. Toivonen. Vol. 2430 in Lecture Notes in Artificial Intelligence, Springer-Verlag, 2002.
For further reference, see the list of selected CoSCo group publications.
4: Why IRC search?
There are many useful scenarios, e.g.:
We find all this intriguing. Moreover, highly dynamic environments such as IRC are interesting from a scientific point of view, whereas the Web and Usenet contain mostly static documents. To our knowledge, no one has developed a system like this at this scale before.
5: How will this benefit the community?
We publish research papers. We will publish our code under the GPL. We will eventually develop these systems together with the community. We hope that we can even provide some novel and useful services to the Internet.
Those are useful services, but they don't tell you anything about the actual content of discussions. Thus they cannot realize the scenarios presented in "4: Why IRC search?".