Category Archives: dissemination

How to Install the Bayesian Feed Filter

The Bayesian Feed Filter (BayesFF) is an optional interface for the popular sux0r software package. To be able to use the BayesFF interface you only need to follow the normal process for installing sux0r and make a few edits in the sux0r configuration file.

The BayesFF interface gives you access to the API and the web interface developed by the BayesFF project. In general, installing sux0r is a simple process that takes less than 30 minutes, depending on how PHP is configured on your web server. If you are not familiar with installing and configuring PHP packages that require access to the web server configuration files, you may want to ask your IT support team to install sux0r for you. However, if you wish to install sux0r yourself, the following detailed installation guide should help.

A. Prerequisites

* Configuring PHP to enable mb, gd, and PDO libraries:
– mb (mbstring) is a non-default extension, so you need to enable it explicitly with a configure option. See http://www.php.net/manual/en/mbstring.installation.php for details
– gd is the GD library, which you will need to install (available at http://www.libgd.org/) and enable when configuring PHP. See http://www.php.net/manual/en/image.installation.php for details
– PDO driver is enabled by default as of PHP 5.1.0, but you may need to enable it to work with MySQL. Please consult the documentation at http://www.php.net/manual/en/pdo.installation.php and http://www.php.net/manual/en/ref.pdo-mysql.php web pages to find out more about PDO installation.

* MySQL 5.0.x, set to support UTF characters
(Further information on http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html)

* Apache 2.x webserver with mod_rewrite module enabled
(a simple but good tutorial on enabling mod_rewrite can be found at http://www.tutorio.com/tutorial/enable-mod-rewrite-on-apache)

B. Installation

To install sux0r code on your web server:
1. Log in to your server and go to the directory where you want to install sux0r
2. Execute the following Unix command:
svn export https://sux0r.svn.sourceforge.net/svnroot/sux0r/branches/icbl/
3. Execute these two commands:
chmod 777 ./data
chmod 777 ./temporary

To create the MySQL database and tables for sux0r:
4. Create a database named “sux0r” on your MySQL server
5. Import ./supplemental/sql/db-mysql.sql into MySQL

C. Configuration

1. From the shell, execute these commands:
mv ./sample-config.php ./config.php
mv ./sample-.htaccess ./.htaccess

2. Edit ./config.php and ./.htaccess appropriately (follow the instructions included inside these files.) The changes you need to make are pretty obvious.

Edit Database Connection: $CONFIG['DSN']
Edit URL for your installation of sux0r: $CONFIG['URL']
Edit Title: $CONFIG['TITLE']
If you want to use the BayesFF interface, you will need to change the default value of the $CONFIG['PARTITION'] configuration parameter found in config.php,
from:
$CONFIG['PARTITION'] = 'sux0r';
to:
$CONFIG['PARTITION'] = 'bayesff';

3. To check your installation, run the ./supplemental/dependencies.php script from your browser. Example:
http://yourwebsite/sux0r210/supplemental/dependencies.php (If there are no errors, OK will be returned with a link to your new installation.)

4. If the previous step didn’t produce any errors, point your web browser to http://yourwebsite/sux0r210/supplemental/root.php and follow the onscreen instructions to make yourself a sux0r root user.

5. Set up a cron job to fetch RSS feeds every x minutes (we recommend starting with a run every 60 minutes). The PHP script that fetches the feeds is already provided by sux0r and is available at http://yourwebsite/sux0r210/modules/feeds/cron.php
For example:
0 * * * * /bin/nice /usr/bin/wget -q -O /dev/null "http://yourwebsite/sux0r210/modules/feeds/cron.php" > /dev/null 2>&1

6. Delete the ./supplemental directory from the webserver.

Sux0r should now be successfully installed on your website.

Filed under dissemination, technical

How To Use Bayesian Feed Filter

I have created five screencasts showing how to use the Bayesian Feed Filter.

  1. How to Register an account on Bayesian Feed Filter http://screenr.com/WkA
    • Go to http://icbl.macs.hw.ac.uk/sux0r206/
    • Click on Register (Top Right of Screen)
    • Enter a Nickname, Email Address and Password
    • Verify Your Password
    • Add Any Additional Information
    • Enter the Anti-Spam Code
    • Click Submit
  2. How to Login and Subscribe to RSS Feeds http://screenr.com/ckA
    • Once you have registered an account click on Login (Top Right of Screen)
    • Enter your Nickname and Password
    • Click on Feeds (You will be presented with a list of all feeds on your first login)
    • Scroll to the bottom of the list and click on Manage Feeds
    • Select the Checkboxes of the feeds you would like to subscribe to
    • Click Submit
    • You can add a new feed by clicking on Suggest Feed (an administrator will need to approve the feed first)
    • You can browse the feeds by clicking on the titles of the feeds
  3. How to train Bayesian Feed Filter to Filter your RSS Feeds http://screenr.com/3kA
    • Once you have logged in to your account and subscribed to some feeds you can start training
    • Click on your nickname (Top right of the screen)
    • Click on Edit Bayesian
    • Enter the name of a vector (list of categories) and click add (in this case the vector is called Interestingness)
    • Enter the name of your first category and click add (in this case Interesting)
    • Enter the name of your second category and click add (in this case Not Interesting)
    • Click on your nickname, then on Feeds
    • You can start training items by clicking on the drop down menu of categories
    • If the item is already displaying the category you wish to train it in, you will first need to select the other category, then reselect the correct category
    • Items that have been trained will display the Vector as green text
  4. How to train Bayesian Feed Filter using other documents http://screenr.com/vSK
    • Once you have logged in to your account and subscribed to some feeds you can start training
    • Click on your nickname (Top right of the screen)
    • Click on Edit Bayesian
    • Copy and paste text from other documents into the Train Document text area
    • Select the category and click train
    • You can also categorise other documents
    • Copy and paste text from other documents into the Categorize Document text area
    • Select the vector and click categorize.
    • The probability of the document belonging to each category in the vector will be displayed.
  5. How to view filtered RSS Items by threshold/keywords http://screenr.com/y1K
    • Click on Feeds
    • At the top of the screen select the category and set a threshold
    • Click on threshold
    • Only the items relevant to the selected category above the set threshold are displayed
    • To filter by keywords, type your keywords into the keywords text box
    • Click on threshold
    • Only the items containing those keywords will be displayed
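
The threshold and keyword filtering in step 5 can be sketched roughly as follows. This is only an illustration, not sux0r's actual code: the `filter_items` function and the item layout (a dict with `title`, `text` and per-category `scores`) are hypothetical, and it assumes that an item passes when its score is at or above the threshold and that every keyword must appear in the item text.

```python
def filter_items(items, category, threshold, keywords=()):
    """Return the items that pass the threshold and keyword checks.

    Assumptions (not taken from sux0r): each item is a dict with a "text"
    string and a "scores" dict mapping category name -> probability (0..1);
    an item passes at or above the threshold; all keywords must match.
    """
    wanted = [k.lower() for k in keywords]
    kept = []
    for item in items:
        # Items with no score for the category are treated as scoring 0.
        if item["scores"].get(category, 0.0) < threshold:
            continue
        text = item["text"].lower()
        # Case-insensitive substring match on every keyword.
        if all(k in text for k in wanted):
            kept.append(item)
    return kept


if __name__ == "__main__":
    sample = [
        {"title": "A", "text": "Bayesian filtering of feeds", "scores": {"Interesting": 0.9}},
        {"title": "B", "text": "Football results", "scores": {"Interesting": 0.2}},
    ]
    for item in filter_items(sample, "Interesting", 0.5):
        print(item["title"])
```

With an empty keyword list only the threshold applies, which mirrors setting a threshold and leaving the keywords box blank.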

Filed under dissemination

BayesFF in 45 seconds

I’m doing a 45-second presentation on the Bayesian Feed Filter project at the JISC Rapid Innovation Development meeting in Manchester today. This is it:

The Bayesian Feed Filter will help researchers keep up to date with current developments in their field. It will automatically filter RSS and ATOM feeds from journals’ tables of contents to (hopefully) select those that are relevant to an individual’s research interests.

It uses Bayesian statistical analysis, the same approach used in many spam filters. First you need to train it with samples of what you are and aren’t interested in; then it compares the frequency with which words occur in the text to predict whether new items are on a similar topic to the samples that you were interested in.

We are testing whether this approach works for researchers and table-of-contents feeds, and we are building an API, so we would like to talk to anyone who can use it to personalize their own data presentation.
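
For readers who want to see the idea behind the presentation in code, here is a minimal sketch of word-frequency Bayesian classification. It is a toy multinomial naive Bayes classifier with add-one smoothing, written for illustration only; it is not sux0r's implementation, and the class name, tokenizer and smoothing choice are all assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Toy naive Bayes text classifier (hypothetical, not sux0r's code)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word -> count
        self.doc_counts = Counter()              # category -> training docs seen

    def train(self, text, category):
        """Record the words of one sample document under a category."""
        self.doc_counts[category] += 1
        self.word_counts[category].update(tokenize(text))

    def probabilities(self, text):
        """Return {category: probability} for a new piece of text.

        Requires at least one trained document.
        """
        vocab = {w for counts in self.word_counts.values() for w in counts}
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for category, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Prior: share of training documents in this category.
            score = math.log(self.doc_counts[category] / total_docs)
            for word in tokenize(text):
                # Add-one (Laplace) smoothing so unseen words do not zero the score.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            log_scores[category] = score
        # Convert log scores to probabilities that sum to 1.
        peak = max(log_scores.values())
        weights = {c: math.exp(s - peak) for c, s in log_scores.items()}
        norm = sum(weights.values())
        return {c: w / norm for c, w in weights.items()}


if __name__ == "__main__":
    f = NaiveBayesFilter()
    f.train("Bayesian classification of RSS feeds", "Interesting")
    f.train("celebrity gossip and football scores", "Not Interesting")
    print(f.probabilities("a new Bayesian feed classifier"))
```

The more samples each category has been trained with, the more the word frequencies reflect what the user actually finds interesting, which is why training comes first.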

Filed under dissemination

New features planned for sux0r

My last post described what sux0r already does; this one describes the features for the API that we plan to add.

The idea is to allow users of a remote application to classify feeds and to see the results, i.e. do what was described in that last post but without using the sux0r interface. The hope is that this will allow the use of the filter to be embedded in their own personal toolset, and more generally make the functionality of sux0r as a feed filter/classifier available to other services and applications.

To do this we think the API needs to provide access to the following sux0r functionality (the priority refers to our priority for implementing the feature):

1. Authorise account access for user application
A user gains access to their account through an application using the API (using OAuth). High priority.

2. Add a New Feed
A user suggests a feed to be made available for adding to sux0r users’ accounts. High priority

3. Approve a Feed for a User
A feed administrator approves a feed added by a user so that it can be added to users’ accounts. High priority

4. Associate feed with a user
A user associates an approved feed with their account. High priority

5. Create a new Vector for a User
A user creates a new classification vector. Medium priority

6. Create a new Category for a User’s Vector
A user creates a new classification category on a specified vector. Medium priority.

7. Train a Document for a User
The user submits a document and the desired classification to train the classifier. High Priority.

Note: The document could be an RSS Item, which already exists in the database and hence will have an RSS ID number, or it could be plain text, which needs to be added to the database and then trained.

8. Return the RSS Items for a User
A user gets all Items from the RSS Feeds to which they are subscribed. Feeds may be sorted or filtered according to specified criteria (e.g. only those in a certain category). Very high priority.

9. Return RSS Feeds for All Users
A user gets a list of all the feeds in the database. Medium priority.

10. Return RSS Feeds for a User
A user gets a list of all the feeds they are subscribed to. High priority

11. Remove feed
A user requests to remove a feed (association) from their account. Medium priority

12. Return vectors
A user gets a list of all the vectors she has created. Medium priority

13. Return categories
A user wants to view all the categories they have created for a vector. Medium priority

14. Export the Bayesian Token Analysis for a User
A user gets the information on frequency of occurrence of words in each vector-category.
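
To make the note under feature 7 concrete, the following sketch shows one way a training call could accept either an existing item ID or plain text. Everything here is hypothetical: `TrainingStore`, its fields and its method names are invented for illustration and say nothing about how the planned API or the sux0r database will actually look.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TrainingStore:
    """In-memory stand-in for the document store (hypothetical)."""

    documents: Dict[int, str] = field(default_factory=dict)      # id -> text
    trained: List[Tuple[int, str]] = field(default_factory=list)  # (id, category)
    _next_id: int = 1

    def add_document(self, text: str) -> int:
        """Store plain text and return the ID assigned to it."""
        doc_id = self._next_id
        self._next_id += 1
        self.documents[doc_id] = text
        return doc_id

    def train(self, category: str, *, item_id: Optional[int] = None,
              text: Optional[str] = None) -> int:
        """Train on an existing item (by ID) or on new plain text.

        Exactly one of item_id or text must be given. Plain text is added
        to the store first, then trained, as the note describes.
        """
        if (item_id is None) == (text is None):
            raise ValueError("pass exactly one of item_id or text")
        if item_id is None:
            item_id = self.add_document(text)
        elif item_id not in self.documents:
            raise KeyError(f"unknown item id {item_id}")
        self.trained.append((item_id, category))
        return item_id


if __name__ == "__main__":
    store = TrainingStore()
    new_id = store.train("Interesting", text="pasted abstract text")
    store.train("Not Interesting", item_id=new_id)
    print(store.trained)
```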

Filed under dissemination, technical

About sux0r

This post describes what is pretty much the starting point for our development work: the Sux0r OS software; my next will describe what we plan to add.

I came across sux0r while investigating the feasibility of the project, before writing the bid. While I found several references to the idea of Bayesian filtering of RSS feeds, and a couple of projects that had made a start on software to implement the idea, sux0r was the only open source project I found that was still active. But sux0r is not just a personal feed filter; in fact, it is something of an all-round content management system with Bayesian classification and support for group collaboration. It comprises a blogging platform, bookmarking, an image repository and an RSS feed aggregator.

While that’s great for a content management system, it’s a lot more than we really want to deal with. We considered stripping out the functionality that we didn’t want to use, leaving just the RSS aggregator and filter, but that seemed like fairly radical surgery to perform, especially at the start of a project before we had become familiar with what did what in the sux0r code. It also didn’t seem to be a good strategy for contributing back to the sux0r project. So we adopted a more superficial approach: we have a complete installation of sux0r, but we have customised the interface so that our users don’t see the image library, blogging platform or social bookmarking facility.

Using sux0r for feed filtering involves the following steps.

Continue reading

Filed under dissemination, technical