The Bayesian Feed Filtering (BayesFF) project aimed to identify those articles that are of interest to specific researchers from a set of RSS feeds of Journal Tables of Content by applying the same approach that is used to filter out junk emails.
We developed a tool that aggregates and filters a range of RSS and ATOM feeds selected by a user, and we investigated its performance. The filtering algorithm is similar to the one many email filters use to identify spam, except that in this case it was “trained” to identify items that are interesting and should be highlighted, not those that should be junked.
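To give a flavour of the idea, here is a minimal sketch of a naive Bayes filter in Python. It is an illustration only, not the code used in the project: the class name `InterestFilter`, the labels `interesting` and `not_interesting`, the tokeniser and the add-one smoothing are all assumptions made for the example, and a real filter would be trained on many more items.

```python
import math
import re
from collections import Counter

LABELS = ("interesting", "not_interesting")  # hypothetical label names


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InterestFilter:
    """Illustrative two-class naive Bayes filter with add-one smoothing."""

    def __init__(self):
        self.word_counts = {label: Counter() for label in LABELS}
        self.item_counts = Counter()

    def train(self, text, label):
        """Record one feed item that the user has marked with a label."""
        self.item_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def score(self, text):
        """Estimate the probability that a feed item is interesting."""
        vocabulary = set()
        for counts in self.word_counts.values():
            vocabulary.update(counts)
        total_items = sum(self.item_counts.values())
        tokens = tokenize(text)

        log_scores = {}
        for label in LABELS:
            # Smoothed prior: how often the user marked items with this label.
            log_p = math.log((self.item_counts[label] + 1) / (total_items + len(LABELS)))
            total_words = sum(self.word_counts[label].values())
            denominator = total_words + len(vocabulary) + 1
            for token in tokens:
                # Smoothed likelihood of each token given the label.
                log_p += math.log((self.word_counts[label][token] + 1) / denominator)
            log_scores[label] = log_p

        # Convert log scores to a normalised probability.
        highest = max(log_scores.values())
        weights = {label: math.exp(s - highest) for label, s in log_scores.items()}
        return weights["interesting"] / sum(weights.values())


# Example use with made-up article titles.
feed_filter = InterestFilter()
feed_filter.train("Bayesian filtering of RSS feeds", "interesting")
feed_filter.train("Machine learning for text classification", "interesting")
feed_filter.train("Crystal structure of zinc oxide", "not_interesting")
feed_filter.train("Synthesis of zinc nanoparticles", "not_interesting")
print(feed_filter.score("Bayesian text classification"))
```

Items scoring above some threshold (0.5, say) would be highlighted and the rest left alone. The threshold, like everything else in the sketch, is a choice made for the example.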
An important element of the project was investigating whether the filtering was effective enough to be helpful to users (specifically, in this case, researchers looking at journal tables of content for interesting newly published papers) and disseminating information about the potential of this approach within the JISC community. We appreciate that the potential applicability of the technique is much wider: it applies to any area where a user might want to monitor alerts from a wide range of sources in the knowledge that many of the items in the feeds will be irrelevant. Anyone who has subscribed to dozens of seemingly relevant feeds, only to find that they are presented with more items than they can scan, is familiar with this problem.
The project’s “final post” describes what we discovered and delivered.
You will of course notice that the “final post” is not the last post in this blog. While the initial project has finished, we still use sux0r and Bayesian Feed Filter in other work, and so we update this blog occasionally to reflect this.
We’re all based at ICBL at Heriot-Watt University.
Here’s the full project proposal: