The single most important feature that we are adding with this project is the ability to publish feeds from sux0r corresponding to specified criteria: for example, a feed that aggregates, from all the feeds a user is subscribed to, the items that the Bayesian algorithm has classified under the same heading. (Here’s the full specification if you’re interested.) We have now completed work on this.
The API call is an HTTP GET on [sux0rURL]/api/items/ (where [sux0rURL] is the URL for your sux0r installation; for this project that is http://icbl.macs.hw.ac.uk/sux0rAPI/icbl/). The parameters you can use are:
user to specify the user name;
vec_id to specify the vector id;
cat_id to specify the category id;
feed_id to specify the id or URL of the feed;
keywords to specify any keywords for filtering the result feed;
threshold to specify the threshold value for the probable relevance against the category;
maxHits to specify a maximum number of hits to return.
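As a rough illustration of how these parameters combine into a single GET URL, here is a minimal Python sketch. The helper name build_items_url and the decision to reject unlisted parameters are assumptions made for this example; they are not part of sux0r.

```python
from urllib.parse import urlencode

# Base URL of this project's sux0r installation, as given above.
SUX0R_URL = "http://icbl.macs.hw.ac.uk/sux0rAPI/icbl/"

# The parameter names listed above, in the order they are listed.
KNOWN_PARAMS = ("user", "vec_id", "cat_id", "feed_id",
                "keywords", "threshold", "maxHits")


def build_items_url(base_url=SUX0R_URL, **params):
    """Hypothetical helper: build the URL for the /api/items/ call."""
    unknown = set(params) - set(KNOWN_PARAMS)
    if unknown:
        raise ValueError(f"unknown parameter(s): {sorted(unknown)}")
    # Emit parameters in the listed order so the URL is predictable.
    query = [(name, params[name]) for name in KNOWN_PARAMS if name in params]
    return base_url.rstrip("/") + "/api/items/?" + urlencode(query)


url = build_items_url(user="philb", vec_id=12, cat_id=24,
                      threshold=0.5, maxHits=30)
print(url)
```

The resulting URL can be pasted straight into a feed reader, since the call is a plain GET with no authentication. Note that urlencode escapes spaces and quotes, so a multi-word keywords value is passed as one parameter.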
Sorting wasn’t implemented; the default sort order is by date. Also, we didn’t get authentication working (but we dithered about whether it was necessary for this feature anyway, and life is easier if you can just get a feed into any feed reader).
Gives the most recent 20 items from all the feeds to which user philb (that’s me!) subscribes. (I should note that not many of the feeds I subscribe to are Journal ToCs, so I’m not really using this for the type of feed for which it was intended. Nevertheless I find it kind of works.)
Gives the most recent 20 items containing the word jisc from all the feeds to which I subscribe. Try changing jisc to jisc cetis or “jisc cetis” or “jisc AND cetis”.
This is more interesting: vector 12 is my vector for classifying relevance to my research interests, and category 24 is the stuff that is relevant. So this is a feed of the stuff that is predicted to be relevant to my research interests (since the probability threshold is set to 0.5).
The results feed for that last call looks like this:
<?xml version="1.0"?>
<rss version="2.0" xmlns:api="http://icbl.macs.hw.ac.uk/sux0rAPI/api/xmlns" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Philb's RSS ItemsVector ID: 12, Category ID: 24, Threshold: 0.5, maxHits: 30</title>
    <link>http://icbl.macs.hw.ac.uk/sux0r206/user/profile/philb</link>
    <description>Use Case: Return the RSS Items for a User. User Nickname: philb. Summary of applied filters: Threshold: 0.5; maxHits: 30 results</description>
    <atom:link href="http://icbl.macs.hw.ac.uk/sux0rAPI/icbl/api/items/?user=philb&amp;vec_id=12&amp;cat_id=24&amp;threshold=0.5&amp;maxHits=30" rel="self" type="application/rss+xml" />
    <item>
      <title>An infrastructure service anti-pattern</title>
      <link>http://blog.paulwalk.net/2009/12/07/an-infrastructure-service-anti-pattern</link>
      <guid>http://blog.paulwalk.net/2009/12/07/an-infrastructure-service-anti-pattern</guid>
      <description>Last week I outlined an idea, that of the service anti-pattern, as part of a presentation I gave last week to the Resource Discovery Taskforce (organised by JISC in partnership with RLUK). The idea seemed to really catch the interest of and resonate with several of those members of the taskforce who were present at [...]</description>
      <pubDate>Mon, 07 Dec 2009 10:37:05 EST</pubDate>
      <source url="http://blog.paulwalk.net/feed">paul walk's weblog</source>
      <api:relevance>1</api:relevance>
    </item>
    <item>
      <title>Statistics of user trial results</title>
      <link>https://bayesianfeedfilter.wordpress.com/2009/12/07/statistics-of-user-trial-results</link>
      <guid>https://bayesianfeedfilter.wordpress.com/2009/12/07/statistics-of-user-trial-results</guid>
      <description>We now have results from our user trials showing how effective sux0r may be in filtering items from journal table of contents RSS feeds that are relevant to a user’s research interests.
Quick reminder of how we ran the trials: 20 users had access to sux0r for 4 weeks to train the analyser in what [...]</description>
      <pubDate>Mon, 07 Dec 2009 07:41:18 EST</pubDate>
      <source url="https://bayesianfeedfilter.wordpress.com/feed">Bayesian Feed Filter</source>
      <api:relevance>1</api:relevance>
    </item>
    <!--lots more items-->
  </channel>
</rss>
Apart from an additional element for the relevance of the item to the specified category, it’s plain RSS 2.0.
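Since the relevance element lives in its own namespace, a client has to ask for it by namespace URI. The following Python sketch shows one way to read it with the standard library; the function name parse_items and the cut-down SAMPLE_FEED are made up for this example, and only the namespace URI and element names are taken from the feed shown above.

```python
import xml.etree.ElementTree as ET

# Namespace URI declared on the <rss> element of the results feed.
API_NS = {"api": "http://icbl.macs.hw.ac.uk/sux0rAPI/api/xmlns"}

# A cut-down, illustrative feed with the same shape as the real output.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:api="http://icbl.macs.hw.ac.uk/sux0rAPI/api/xmlns">
  <channel>
    <title>Example</title>
    <item>
      <title>An infrastructure service anti-pattern</title>
      <link>http://blog.paulwalk.net/2009/12/07/an-infrastructure-service-anti-pattern</link>
      <api:relevance>1</api:relevance>
    </item>
  </channel>
</rss>"""


def parse_items(rss_text):
    """Hypothetical helper: return title, link and relevance for each item."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        # The relevance element is namespaced, so pass the prefix mapping.
        relevance = item.findtext("api:relevance", namespaces=API_NS)
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            # Keep None when an item carries no relevance element.
            "relevance": float(relevance) if relevance is not None else None,
        })
    return items


for entry in parse_items(SAMPLE_FEED):
    print(entry["relevance"], entry["title"])
```

An ordinary feed reader will simply ignore the extra element, which is why the feed still works anywhere plain RSS 2.0 does.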
Unfortunately, we couldn’t implement the error codes properly on our server: you get an HTTP status code of 200 OK whether or not the request succeeded. Also, I think there are some error conditions that we don’t trap satisfactorily, for example specifying a non-existent user or category.
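Because the status code can't be trusted, a client has to inspect the response body instead. The sketch below is one defensive approach in Python, not something the API provides: the function name classify_response and its four labels are invented here, and since we haven't documented what the server actually returns in each error case, treating an item-less feed as suspicious is only an assumption.

```python
import xml.etree.ElementTree as ET


def classify_response(body):
    """Hypothetical check: guess whether a 200 response is a usable feed."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # Not XML at all, e.g. an HTML or plain-text error page.
        return "not-xml"
    channel = root.find("channel") if root.tag == "rss" else None
    if channel is None:
        return "not-rss"
    if channel.find("item") is None:
        # Could be a genuine empty result, or an untrapped bad user/category.
        return "empty"
    return "ok"


print(classify_response("<rss><channel><item/></channel></rss>"))
print(classify_response("Something went wrong"))
```

An "empty" result is ambiguous by design: it may mean nothing matched the filters, or that the user or category doesn't exist, so it is worth checking the parameters before assuming there is nothing to read.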