We want to develop a useful open-source tool to aid collaborative work, that is, to make often chaotic IRC discussions more useful. But we can't develop the tool without real-life data, i.e. actual IRC discussions.

2: Do you have any ideas how to collect the test data set while respecting our privacy?
Yes. We have a suggestion:
At present (18th Nov 2003) we are collecting a test data set, which will be handled CONFIDENTIALLY. We will not use this data in public. It will be used only to evaluate our statistical models.

3: What are you going to do with the data?
We will run it through our text processing system, build the statistical models and start tweaking them. Eventually we hope to provide a publicly available prototype system that lets you search this one static data set.

4: Searches like what?
It depends on what works and what does not. Probably you could type a query or give an example document, and the system would return IRC channels with corresponding topics. Or you could see how the topics on selected channels change over time.
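To make the idea of a query returning matching channels more concrete, here is a minimal sketch in Python. It is not our actual system: it assumes a plain bag-of-words model with cosine similarity in place of the statistical models we are building, and the function names (tokenize, cosine, rank_channels) and the channel contents are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_channels(query, channel_texts):
    """Return (channel, score) pairs, best match to the query first."""
    q = Counter(tokenize(query))
    scores = [
        (name, cosine(q, Counter(tokenize(text))))
        for name, text in channel_texts.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical channel contents, invented for this example only.
channels = {
    "#linux": "kernel module driver compile kernel patch",
    "#cooking": "recipe oven bake flour sugar",
}

ranking = rank_channels("how do I compile a kernel module", channels)
for name, score in ranking:
    print(name, round(score, 3))
```

An example document could be passed in place of the query string, since both are reduced to word counts in the same way.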
Search would be based entirely on the collected test data (with nicks removed etc.), and we would not use or collect any other information for this purpose.

5: What then?
Even before publishing the prototype system we will provide public CVS access etc. Then we will see whether this kind of service would be useful. After that everyone could use the code for their own purposes, and maybe we could even try to set up a bigger, even real-time, reference system. We will never publish the data we have collected, only the code.