Category Archives: management

BayesFF: Final post

Diagram of prototype: schematically, the prototype supports the aggregation of RSS feeds comprising table of contents information from selected journals and filters them (using pre-existing software called sux0r) into two feeds, one of which has information about those papers that are predicted to be relevant to a user’s research interests. The project has added the ability to interact with sux0r through third-party software.

Our work has shown how effectively this works for a trial group of researchers; in most cases, after sufficient training of the system, the outgoing feeds were successfully filtered so that one contained a significantly higher concentration of interesting items than the raw feeds and the other did not contain a significant number of interesting items.
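For readers who want a feel for the filtering step, here is a minimal sketch of a naive Bayes classifier splitting feed items into two lists. It is not sux0r's actual code: the class name `NaiveBayesFilter`, the category labels, the function `split_feed` and the sample titles are all made up for illustration, and a real installation would train on many more items per user.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayesFilter:
    """Hypothetical two-category naive Bayes filter (illustration only)."""

    def __init__(self, categories=("interesting", "not_interesting")):
        self.word_counts = {c: Counter() for c in categories}
        self.doc_counts = {c: 0 for c in categories}

    def train(self, text, category):
        """Record one item that the user has labelled with a category."""
        self.word_counts[category].update(tokenize(text))
        self.doc_counts[category] += 1

    def log_scores(self, text):
        """Return an unnormalised log-probability for each category."""
        vocab = set().union(*self.word_counts.values())
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        scores = {}
        for cat, counts in self.word_counts.items():
            # Add-one smoothing on the prior and on each word likelihood;
            # the extra +1 in the denominator leaves room for unseen words.
            prior = (self.doc_counts[cat] + 1) / (total_docs + len(self.word_counts))
            denom = sum(counts.values()) + len(vocab) + 1
            score = math.log(prior)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            scores[cat] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


def split_feed(items, nb):
    """Split item titles into (predicted interesting, everything else)."""
    interesting, rest = [], []
    for title in items:
        target = interesting if nb.classify(title) == "interesting" else rest
        target.append(title)
    return interesting, rest


nb = NaiveBayesFilter()
nb.train("Bayesian filtering of RSS feeds", "interesting")
nb.train("Machine learning for text classification", "interesting")
nb.train("Annual report on soil chemistry", "not_interesting")
nb.train("Soil erosion in river valleys", "not_interesting")

wanted, unwanted = split_feed(
    ["Text classification with Bayesian methods", "Soil chemistry of river deltas"],
    nb,
)
```

The "sufficient training" mentioned above corresponds to calling `train` often enough that the word counts for each category reflect what the user actually reads.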

End User of Prototype:
We have an installation of sux0r which people are welcome to register on and which can be used to set up feeds for aggregation (you will not automatically be given sufficient privileges to approve feeds, so it is best to contact the project about this). The base URL for the API for this installation is and the API calls which have been implemented are documented in the following posts on this blog: Return RSS items for a user and ReturnVectors and ReturnCategories. Also available: a summary of other features that have been scoped for the API. The latest update was 08 December 2009.

Here’s a screencast of Lisa using the API

(NB the version at the end of the link is a whole lot clearer than the embedded YouTube version, especially if you click on the view in HD option).

The code for our work on the API is in a branch of the main sux0r repository on SourceForge.

Project Team
Phil Barker, Heriot-Watt University (project manager)
Santiago Chumbe, Heriot-Watt University (developer)
Lisa J Rogers, Heriot-Watt University (researcher)

Project Website:
PIMS entry:

Table of Contents for Project Posts
Development work

User trialling

Community Engagement

Project Management


Filed under management


One of the “weaknesses” I put in the SWOT analysis was that we had a lot to learn. Fully understanding and implementing authentication and authorization for the API was one of the things that we had to learn. As of now, at the end of the funded work on the project, we seem to have failed in this.

Our first point of failure was in being pointlessly overambitious in what we wanted to do via the API. When drawing up the initial feature set for the API I took the starting position that anything that you could do through the native sux0r interface should be doable remotely, so the feature set included registering a new user. This muddied the requirements for accessing the sux0r security procedures in a way that I can now see was quite unnecessary: it’s really not unreasonable to expect people to have an account with a service before they interact with it from another application.

Having clarified this, it became clear that OAuth would be the authorization mechanism of choice, though we had no experience in implementing it. Santy got a client working with Twitter and Flickr based on Andy Smith’s library. He used the Google PHP OAuth library for the server on sux0r, but it didn’t work with either that client or Google’s own client. There is another library he would like to test for the server side, but he had already spent more time than was available.

Struggling with OAuth meant less time to spend on actual features. In retrospect we should have implemented the features without authorization in the hope of adding some form of authorization later (which is indeed what Santy has done towards the end of the project), but it is always tempting to keep trying one more thing in the hope that the next try will succeed.

As a result we have fewer features implemented than we planned, and features that should require authorization don’t have it. We still hope to add some form of restriction on access; even HTTP digest authentication, requiring the sux0r user name and password to be entered into the third-party app, is better than nothing.
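To show what the HTTP digest fallback would involve on the client side, here is a small sketch of how a third-party app could compute the `response` value for a Digest `Authorization` header, following the MD5 scheme with `qop=auth` from RFC 2617. Nothing here comes from sux0r or from our API branch: the function name `digest_response`, the realm, the URI and the credentials are all hypothetical placeholders.

```python
import hashlib


def md5_hex(value):
    """Return the hex MD5 digest of a string, as used by HTTP Digest."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce, nc, cnonce, qop="auth"):
    """Compute the Digest 'response' field (RFC 2617, MD5, qop=auth)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # hashes the user's secret
    ha2 = md5_hex(f"{method}:{uri}")                 # hashes the request itself
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


# Hypothetical values: the server would supply the realm and nonce in its
# 401 challenge, and the client would pick the cnonce and request counter.
response = digest_response(
    username="lisa",
    realm="sux0r-api",
    password="not-a-real-password",
    method="GET",
    uri="/api/items",
    nonce="abc123",
    nc="00000001",
    cnonce="0a4f113b",
)
```

The password itself is never sent; only the hash travels with the request. The third-party app still has to be given the user's sux0r password, though, which is the drawback compared with OAuth.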

Lessons learnt: 1) you don’t have to do everything through an API (god, that seems obvious when I write it); 2) get on with what you can do in parallel to trying to overcome roadblocks; 3) analysing the problem and implementing the client did give us a better understanding of what OAuth should do.


Filed under management, technical

SWOT Analysis

Here are the Strengths, Weaknesses, Opportunities & Threats of the project, as estimated by Lisa and Phil during an informal project meeting over coffee. Following standard SWOT procedures (I used info and templates from and CIPD for guidance), Strengths and Weaknesses are internal and Opportunities and Threats are external. We think the “internals” of the project comprise the project team (our skills and connections to others) and the idea itself and the approach to realising it; the “externals” are the users, the sux0r project, the JISC environment and others (e.g. commercial interests, our host institution and the wider HE system).

(The points are numbered for ease of referencing, not for ordering.)


Strengths

  1. We think we’re starting with a good idea, at least in principle; an innovative solution to a recognized need.
  2. Using sux0r as a starting point has given us access to existing OS code and put us in contact with a knowledgeable developer.
  3. We have a settled team who have worked well together on a number of previous projects over the last 4-10 years.
  4. We have good existing links with experts in JISC, CETIS, the IE, UKOLN, JISC services and projects (and we’re not afraid to use them).
  5. We have previous experience in related projects dealing with Journal ToC and other RSS feeds (e.g. PerX, TicTocs, GoldDust . . .).
  6. We work in close proximity to our intended test user group (which should help with encouraging engagement for the trials).


Weaknesses

  1. We have lots of new stuff to learn: this is the most deliberately RESTful development we have undertaken; we’re using a project management technique that is new to us; this is the first time we’ve worked on a branch of an existing OSS project; we need more robust user trials than we’ve previously managed.
  2. We have all that to learn in a short project time frame (six months, all the team are working part time on this project).
  3. Bayesian filtering is not a complete solution. Other techniques (e.g. popularity from usage data analysis; manual over-rides to specify that everything from some authors is important, no matter what the topic) would help identify important items but are out of scope.
  4. Bayesian filtering might not work for our users with the type of data and sources we have (see threats), though as a good academic I think this is not so much a weakness as a potential research finding.


Opportunities

  1. Working with sux0r provides an opportunity to work with an existing user base and experienced developer.
  2. Other projects in the information environment provide additional/alternative usage scenarios (but see threat 2).
  3. It may be possible to embed the output of this project into other services, e.g. TicTocs, TechXtra, JISC IE or commercial services.
  4. There is good support for RESTful development approaches.
  5. There is a good developer community in the JISCRI projects.


Threats

  1. Lack of user engagement. We don’t know that users will be as enthusiastic about this approach as we are; they might just resent disruptive technologies.
  2. Expectation mismatch (see opportunity 2 & weakness 3), possibly leading to scope creep.
  3. There might be some unexpected conflict with the sux0r project (over approach or priorities).
  4. There might be a lack of table of contents information from the right journals in RSS form, or what there is might be polluted (garbage in, garbage out).
  5. Competing demands on time from other projects/tasks that the team are working on (see weakness 2).

I guess some mitigation of the negative factors is called for. That will come later, but a quick reflection is that engagement with the project externals is going to be important.

The programme guidance documentation suggests that the SWOT analysis is best undertaken in small steps throughout the duration of the project, and the other guidance I read suggested that it should draw on as many viewpoints as possible. So, hopefully this isn’t the last on SWOT, and please comment on anything that has been overlooked.


Filed under management

Project kicks off

The Bayesian Feed Filtering project will be trying to identify those articles that are of interest to specific researchers from a set of RSS feeds of Journal Tables of Content by applying the same approach that is used to filter out junk emails. We had the first project meeting this afternoon, though we’ve each done a little bit of work in the last week or two. We went over our plans for the two main work packages in some detail.



Filed under management, technical, trialling