Category Archives: trialling

Conducting a User Trial

One of the aims of the Bayesian Feed Filter Project was to test the ability of the recommender service to identify new journal papers of interest to researchers based on knowledge of papers which they have recently read. The recommender service used was sux0r, a blogging package, RSS aggregator, bookmark repository and photo publishing platform with a focus on Naive Bayesian categorization and probabilistic content.

As well as creating an API for sux0r, the project created a Bayesian Feed Filter theme, which simplified the sux0r interface so that users saw only the RSS Aggregator with Bayesian Filtering. The Bayesian Feed Filter uses Bayes’ theorem to attempt to predict whether or not a new item in a feed is relevant to an individual’s research interests, based on the user’s previous categorization of items. This explicit categorization by the user is known as training; the system also allows other text documents to be used as training material.
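
To make the idea of training concrete, here is a minimal sketch of naive Bayesian text classification in Python. It is an illustration only, not sux0r’s implementation (sux0r is a PHP application and its tokenisation and smoothing will differ); the class name and word-splitting are assumptions made for the example.

```python
import math
from collections import defaultdict

# Minimal sketch of naive Bayesian filtering; not sux0r's actual code.
class BayesFilterSketch:
    def __init__(self):
        self.word_counts = {"interesting": defaultdict(int),
                            "not interesting": defaultdict(int)}
        self.item_counts = {"interesting": 0, "not interesting": 0}

    def train(self, category, text):
        # "Training" simply counts which words appear in items the user
        # has placed in each category.
        self.item_counts[category] += 1
        for word in text.lower().split():
            self.word_counts[category][word] += 1

    def probability_interesting(self, text):
        # Combine per-word evidence with Bayes' theorem, working in log
        # space and using add-one smoothing for unseen words. Assumes at
        # least one item has been trained in each category.
        words = text.lower().split()
        log_scores = {}
        total_items = sum(self.item_counts.values())
        for category, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.item_counts[category] / total_items)
            for word in words:
                score += math.log((counts.get(word, 0) + 1) /
                                  (total_words + len(counts) + 1))
            log_scores[category] = score
        # Turn the two log scores into a normalised probability.
        top = max(log_scores.values())
        odds = {c: math.exp(s - top) for c, s in log_scores.items()}
        return odds["interesting"] / sum(odds.values())

f = BayesFilterSketch()
f.train("interesting", "Bayesian inference for wireless sensor networks")
f.train("not interesting", "Editorial board and committee announcements")
print(f.probability_interesting("calibration of wireless sensor networks"))
```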

Twenty researchers from Engineering and Science schools within Heriot-Watt University volunteered to participate in the trial to test the ability of the Bayesian Feed Filter to identify new journal papers of interest to them based on knowledge of papers which they have recently read. The volunteers were asked to provide a list of journals that they follow or would like to follow if they had the time. Each volunteer was set up with an account on the Bayesian Feed Filter, which was preloaded with RSS feeds of the journals they had said they were interested in, and which contained two categories for training: Interesting and Not Interesting.

An API was developed during the project which included the feature Return RSS Items for a User; this was used to create personalised RSS feeds for each user. The feeds could be filtered by category (interesting or not interesting) and by threshold (the likelihood of an item belonging to a particular category).
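
The post doesn’t give the API’s actual endpoints or parameter names, but the idea of filtering by category and threshold can be sketched as follows; the data and function name here are hypothetical.

```python
# Hypothetical sketch of per-user threshold filtering; the real API's
# endpoints and parameter names are not shown in this post.
def filter_feed(scored_items, category="interesting", threshold=0.5):
    """Keep items whose estimated probability of belonging to the
    requested category is at least the threshold."""
    if category == "interesting":
        return [title for title, p in scored_items if p >= threshold]
    return [title for title, p in scored_items if (1 - p) >= threshold]

# Each item carries the filter's estimated probability of being interesting.
items = [("Robust Bayesian sensor fusion", 0.83),
         ("Editorial board announcement", 0.12)]
print(filter_feed(items, "interesting", 0.5))      # high-probability items
print(filter_feed(items, "not interesting", 0.5))  # the rest
```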


Stage One: Initial Questionnaire

The first stage of the trial involved a short questionnaire to gauge the researchers’ methods of current awareness and their expectations of a service filtering journal articles matching their interests. (Results of Initial Questionnaire).

Stage Two: Demonstration of the Bayesian Feed Filter
Volunteers were each given a demonstration of how to mark items as relevant or not relevant to their interests. These items typically include the title and abstract of the journal article. The users were also shown how to use the train document feature, which allowed them to include text not in the RSS feeds, such as the full text of articles they had written, cited or read. (How to use Bayesian Feed Filter)

Stage Three: Training the Bayesian Feed Filter
The volunteers had access to the Bayesian Feed Filter for 6 weeks and were asked to train the system by periodically categorizing items as either “interesting” or “not interesting” and to supplement the interesting items with other documents relevant to their interests. (User Activity).

At the end of the six-week training period, access to the Bayesian Feed Filter was suspended and all articles in the system were removed. The system continued to run for 4 weeks, automatically categorizing new articles as “interesting” or “not interesting” to the researchers based on the training provided. Unfortunately, two of our volunteers were not able to continue, so the trial carried on with 18 volunteers.

Stage Four: Returning the Filtered Feeds
The users were presented with two feeds. One feed comprised articles rated by the feed filter as having at least a 50% chance of being of interest to them; the other comprised articles rated as having at least a 50% chance of not being of interest to them. The feeds were presented to the users in Thunderbird (an email and RSS client). Users were then asked to mark each article from both feeds with a star if they found it to be of interest. The feeds thus represent the Bayesian Feed Filter’s categorization of items into “interesting” and “not interesting”, and the stars show the users’ opinion of whether the items are relevant to their research interests or not.

The number of false positives (items in the interesting feed not starred) and the number of false negatives (items in the not interesting feed that were starred) could then be calculated for each user. A successful outcome would be for the interesting feed to contain a significantly higher proportion of interesting articles than an unfiltered feed, with few items of interest wrongly filtered into the “not interesting” feed. The success of the filtering appears to depend on the amount of training provided, with users who trained over 150 items achieving a reasonable measure of success. (Statistics from the User Trials).
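
For concreteness, the per-user figures could be worked out along these lines; this is a sketch only, with illustrative numbers, and the project’s own spreadsheet may organise the calculation differently.

```python
# Sketch of the per-user evaluation. Inputs are the item counts in the two
# filtered feeds and how many items in each the user starred as relevant.
def evaluate(starred_in_interesting, total_in_interesting,
             starred_in_not_interesting, total_in_not_interesting):
    false_positives = total_in_interesting - starred_in_interesting
    false_negatives = starred_in_not_interesting
    total_items = total_in_interesting + total_in_not_interesting
    total_relevant = starred_in_interesting + starred_in_not_interesting
    proportion_filtered = starred_in_interesting / total_in_interesting
    proportion_unfiltered = total_relevant / total_items  # baseline feed
    return {
        "false positives": false_positives,
        "false negatives": false_negatives,
        "proportion interesting (interesting feed)": proportion_filtered,
        "proportion interesting (unfiltered)": proportion_unfiltered,
        # how much the filtering enriched the feed with relevant items
        "enrichment factor": proportion_filtered / proportion_unfiltered,
    }

# Illustrative numbers only, not a real user's results.
print(evaluate(starred_in_interesting=30, total_in_interesting=60,
               starred_in_not_interesting=5, total_in_not_interesting=240))
```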

Stage Five: Follow-Up Questionnaire
The final stage of the trial was a follow-up questionnaire to gauge the users’ satisfaction with the filtering process, whether they would be interested in using a similar system in the future, and what the advantages of doing so would be. (Results of the Follow Up Satisfaction Survey).


User Trials Follow Up Satisfaction Survey

The user trials consisted of 5 main stages.

  • An initial meeting to demonstrate the system.
  • An initial questionnaire to gather expectations.
  • Training: users spent between four and six weeks training the system.
  • A follow-up meeting to indicate how successfully their interests had been matched.
  • A follow-up questionnaire to gauge the users’ satisfaction.

The results of the follow up survey are discussed in more detail below.
Question 1.

Were enough “Not Interesting” articles filtered out of the “Interesting” feed to make reading this feed manageable?

Though the percentages of interesting items delivered to each user were in general lower than the users had indicated would be acceptable in the initial questionnaire, the users seemed to be happy with the result, and in most cases the percentage of “not interesting” items in the “interesting” feed was greatly reduced.

13 users answered yes, 4 answered no and 1 was not sure.

Question 2.

If the “Not Interesting” feed wrongly contained “interesting” articles, was the percentage small enough to tolerate?

The majority of the users were able to tolerate some “interesting” articles being filtered out into the “not interesting” feed.

15 users answered yes; 3 answered no.

Question 3.

Would you consider using a similar tool in the future?

The majority of users indicated that they would consider using a similar tool in the future. This gives us a certain confidence that the concept of applying Bayesian filtering to journal articles is worth investigating further.

15 users answered yes; 2 answered maybe; 1 answered no.

Question 3 cont…

If yes, which of the following would you consider?

[a] A stand-alone tool?
[b] A tool integrated into an existing tool you use every day, e.g. an email client, feed reader or iGoogle?
[c] Integrated into a library or research tool such as Web of Science?

Users were able to enter more than one choice.

There were 6 votes for [a]; 13 votes for [b]; 12 votes for [c]

Users were then asked which of the above would be their preferred option.

3 voted for [a]; 6 voted for [b]; 7 voted for [c]; 1 user thought daily/weekly email alerts would be a better option.

The strong preference for integration into other tools (options b and c) rather than use as a stand-alone tool is interesting, as it validates our supposition that an API would be useful, i.e. that it would be desirable to be able to integrate sux0r into, and interact with it from, other tools.

Question 4.
If you would consider using a similar tool in the future, what do you think the advantages of doing so would be?

The main advantages offered by the users included time saved by filtering out unwanted articles, the ability to scan more journals, and a single place to scan the latest articles from interesting journals. Only one user considered a similar tool not to have any advantages.

A selection of responses follows:

If trained sufficiently the tool would save time in showing the searches from interesting results, with keywords on saved interests.

To flag up interesting articles without the user having to actively search for them i.e. it would help with horizon scanning.

Make e-journals more helpful when filtering interesting articles and not interesting ones.

1. One advantage would be a single place to find interesting research articles. 2. If the feed is trained well, then less time is spent on uninteresting articles. 3. If it is integrated into broader search tools like iGoogle it would have wider reach.

As it highlights interesting/prospectively interesting journals that you may not be able to find easily using databases search such as science direct.

Quicker sorting of interesting and not interesting articles

Keeping up to date with new articles. But disadvantage is the guilt of seeing all the interesting things you should read but don’t have time to.

Saving time. However I am not sure I would be completely confident in the results I would get.

Screening for new articles would become more organised rather than my random search at the moment which only happens when I need to find information.

Tend to search on the basis of keywords; this appears to work better.

It does appear to throw up interesting articles that I might otherwise miss.

Time saving and effective worktime

Obviously it will save a lot of time

Simultaneous filtering of many journals

Make looking for papers more fun because much of the clutter is removed compared to reading journal indexes. And I find more interesting articles compared to googling or searching by keyword.

a) save time, reduce number of articles. b) We can create research group feed of interest

Even with uninteresting articles in the mix it still allowed me to find dozens of articles that would have passed me by otherwise. I felt it was worth the effort & still a lot less effort than reading all the tables of contents would have been. A key advantage for me was that it effectively allowed me to, in a similar length of time, scan the contents of a far greater number of journals than I would have studied by hand. A worthwhile tool if you can be bothered to train it.

Get an overview of recently published articles with at least some relevance to me, which at the moment I’m not getting.


Statistics of user trial results

We now have results from our user trials showing how effective sux0r may be in filtering items from journal table of contents RSS feeds that are relevant to a user’s research interests.

Quick reminder of how we ran the trials: 20 users had access to sux0r for 6 weeks to train the analyser in what they found interesting and not interesting. We then barred access for 4 weeks but continued to aggregate feeds and filter them based on that training. Then we invited the users to look at the results of the filtering: two feeds from sux0r, one aggregating information about journal articles, published while the users were barred, that sux0r predicted the user would find relevant, and the other containing information about the rest of the articles, the ones that sux0r predicted the user wouldn’t find relevant. We had our users look through both feeds and tell us whether the articles really were relevant to their research interests. We lost two triallists and so have data on 18; you can see this data as a web page (or get the spreadsheet if you prefer).

The initial data needs a little explanation: the first columns (in yellow) relate to the number of items used in the initial six weeks to train the Bayesian analyser in what was relevant to the user’s research interests, what wasn’t, and the total number of items used in training. The “Additional docs” column relates to information added that didn’t come from the RSS feeds: we asked users to provide some documents relevant to their research interests for training, in order to make up for the fact that in a fairly short trial period the number of relevant items published may be low.

The next set of columns (in green) relates to the feed of items aggregated after the training (while the users had no access) that were predicted to match the user’s research interests, showing the number of items of interest in that feed, the total number of items in that feed and the proportion of items in the feed that were interesting. The next three columns (in red) do exactly the same for the feed of items that were predicted not to be relevant.

For a quick overview of the results, here’s a chart of the fraction of interesting items in both feeds:

You need to be careful interpreting this chart. It hides some things; for example, the data point showing that the fraction of interesting items in one of the feeds was 1 (i.e. the feed of interesting items did indeed only have interesting items in it) hides the fact that this feed only had 2 items in it; the user found 9 items overall to be relevant to their research interests, and 7 of them were in the wrong feed. Perhaps that’s not so good.

So, did it work? Well, one way of rephrasing that question is to ask whether the feed that was supposed to be relevant (the “interesting” feed) did indeed contain more items relevant to the user’s research interests than would otherwise have been the case. That is, is the proportion of interesting items in the interesting feed higher than the proportion of interesting items in the two feeds combined? The answer in all but one case is yes, typically by a factor of between two and three. (The exception is a feed which achieved similar success in getting it wrong; we don’t know what happened there.)

Also we can look at the false negatives, i.e. the number of items that really were of relevance to the user’s interests that were in the feed that was predicted not to be interesting. The chart above shows quite nicely that after using about 150 items for training this was very low.

What about some statistics? It’s worth checking whether the increase in the concentration of items related to a user’s research interests as a result of filtering is statistically significant. We used a two-sample Z test to compare the difference in the proportion of interesting items in the two feeds with the magnitude of difference that could be expected to happen as the result of chance.
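
The post doesn’t reproduce the formula it used, so this is a sketch only: the standard pooled two-proportion Z statistic, with p_1 and p_2 the proportions of interesting items in the two feeds, x_1 and x_2 the counts of interesting items, and n_1 and n_2 the feed sizes, is

```latex
z = \frac{p_1 - p_2}
         {\sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}},
\qquad
\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}
```

Under the null hypothesis of no difference between the feeds, values of z much above 2 are already unlikely to arise by chance at the usual 5% level, which is why Z values above 3 are treated as clearly significant here.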

I have some reservations about this because of the small number of “interesting” items found in the feed that should be uninteresting when the filtering works (this means that one of the assumptions of the Z test might not be valid when the filtering is working best), but any value of Z above 3 cannot reasonably be expected to have happened by chance.

Conclusion: for users who used more than about 150 items in training, the filtering produces a statistically significant improvement in the number of items in the feed that were relevant to the user’s research interests, without filtering out a large number of items that would have been of interest. Next post: were the users happy with these results?


User activity

One indirect measure we have of the level of engagement from our trial users is how often they signed into the system, looked at their feeds and did some training. Some analysis of the sux0r logs gives the following chart of activity by date (each colour represents a different user):

There was obviously a lot of variation between users in how much they used the system (more on that very soon), but what I like from this graph is that for several users (about a third of them) it shows continual spontaneous use throughout the trial period, not just at the points when we were pushing them.


Preliminary findings of user trials

We’re now coming to the end of the user trials; here are some preliminary conclusions, which mostly relate to the start of the trials, when we gave our users a questionnaire to check our assumptions about what would help and their expectations of what we might do.

Our users come from the Science and Engineering schools at Heriot-Watt University: they’re computer scientists, engineers, physicists, chemists, bioscientists and mathematicians. Just over half are PhD students; most of the others are post-docs, though there are two lecturers and a professor.

This still seems like a good idea.
That is to say, potential users seem to think it will help them. We wanted 20 volunteer users for the trial and we didn’t find it difficult to get them; in fact we got 21. Nor was it too difficult to get them to use sux0r; only one failed to use it to the extent we required. Of course there was a bit of chivvying involved, and we’re giving them an Amazon voucher as a thank-you when they complete the trial, which has probably helped, but compared to other similar evaluations it hasn’t been difficult to get potential users engaged with what we’re trying to do.

Our assumptions about how researchers keep up to date are valid for a section of potential users.
We assumed that researchers would try to keep up to date with what was happening in their field by monitoring what was in the latest issues of a defined selection of relevant journals. That is true of most of them to some extent. So, for example, 11 said that they received email alerts to stay up to date with journal papers. On the other hand, the number of journals monitored was typically quite small (5 people looked at none; 8 at 1-4; 6 at 5-10; and 2 at 11-25). This matched what we heard from some volunteers: that monitoring current journals wasn’t particularly important to them compared to fairly tightly focused library searches when starting a new project and hearing about papers through social means (by which I mean through colleagues, at conferences and through citations). Our impression is that it was the newer researchers, the PhD students, who made more use of journal tables of contents. This would need checking, but perhaps it is because they work on a fairly specific topic for a number of years and are less well connected to the social research network, whereas a more mature researcher will have accreted a number of research interests and will know and communicate with others in the same field.

Feeds alone won’t do it.
Of our 21 mostly young science and technology researchers, 9 know they use RSS feeds (mostly through a personal homepage such as Netvibes), 5 don’t use them but know what they are, 7 have never heard of them; 2 use RSS feeds to keep up to date with journals (the same number as use print copies of journals and photocopies of journal ToCs), compared with 11 who use email alerts.

If you consider this alongside the use of other means of finding new research papers I think the conclusion is that we need to embed the filtered results into some other information discovery service rather than just provide an RSS feed from sux0r. Just as well we’re producing an API.

We have defined what “works” means for the filtering
We found that currently fewer than 25% of articles in a table of contents are of interest to the individual researchers, and they expect this to rise to 50% or higher in the filtered feed (7 want 50%, 7 want 75% and one wants everything to be of interest). On the other hand, false negatives, that is interesting articles that wrongly get filtered out, need to be kept below 5-10%.

Those are challenging targets. We’ll be checking the results against them in the second part of the user tests (which are happening as I’ve been writing this), but we’ll also check whether what we do achieve is perceived as good enough.

Just for the ultra-curious among you, here’s the aggregate data from the questionnaire for this part of the trials.

Total Started Survey: 21

Total Completed Survey: 21 (100%)

No participant skipped any questions. (The figures below show the percentage of respondents followed by the number of respondents choosing each option.)

1. What methods do you use to stay up to date with journal papers?
Email Alerts 52.4% 11
Print copy of Journals 14.3% 3
Photocopy of Table of Contents 9.5% 2
RSS Feeds 9.5% 2
Use Current Awareness service (i.e. ticTOCs) 4.8% 1
None   0.0% 0
Other (please specify) 61.9% 13
2. How do you find out when an interesting paper has been published?
Find in a table of contents 14.3% 3
Alerted by a colleague 38.1% 8
Read about it in a blog 9.5% 2
Find by searching latest articles 76.2% 16
Other (please specify) 47.6% 10
3. How many journals do you regularly follow?
None 23.8% 5
1-4 38.1% 8
5-10 28.6% 6
11-25 9.5% 2
26+   0.0% 0
4. Do you subscribe to any RSS Feeds?
Yes, using a feed reader (i.e. bloglines, google reader) 9.5% 2
Yes, using a personal homepage (i.e. iGoogle, Netvibes, pageflakes) 23.8% 5
Yes, using a desktop client (thunderbird, outlook) 4.8% 1
Yes, using my mobile phone 4.8% 1
No, but I know what RSS Feeds are 23.8% 5
No, never heard of them 33.3% 7
Other (please specify)   0.0% 0
5. When scanning a table of contents for a journal you follow, on average, what percentage of articles are of interest to you?
100%   0.0% 0
Over 75%   0.0% 0
Over 50% 4.8% 1
Over 25% 19.0% 4
Less than 25% 71.4% 15
I don’t scan tables of contents 4.8% 1
6. The Bayesian Feed Filter project is investigating a tool which will filter out articles from the latest tables of contents for journals that are not of interest to you.
What would be an acceptable percentage of interesting articles for such a tool?
I would expect all articles to be of interest 4.8% 1
I would expect at least 75% of articles to be of interest 33.3% 7
I would expect at least 50% of articles to be of interest 33.3% 7
I would expect at least 25% of articles to be of interest 19.0% 4
I would only occasionally expect an article to be of interest 9.5% 2
7. What percentage of false negatives (i.e. wrongly filtering out interesting articles) would be acceptable for such a tool?
0% (No articles wrongly filtered out) 14.3% 3
<5% 23.8% 5
<10% 38.1% 8
<20% 4.8% 1
<30% 4.8% 1
<50%   0.0% 0
False negatives are not a problem 14.3% 3
8. What sources of research literature do you follow?
Journal Articles 95.2% 20
Conference proceedings 71.4% 15
Pre-prints 14.3% 3
Industry News 33.3% 7
Articles in Institutional or Subject Repositories 19.0% 4
Theses or Dissertations 57.1% 12
Blogs 33.3% 7
Other (please specify) 19.0% 4


User Trialling

We have recruited around 20 volunteers (researchers, academics and PhD students) to test the following use case for the Bayesian Feed Filtering project.

* Research staff who want to monitor research findings and opportunities from a wide range of sources but who are only interested in a specific research field.

The first stage of the trial involves a short questionnaire about the researchers’ methods of staying aware of new journal articles and their expectations of such a filtering service. The users have submitted a list of journals that have RSS feeds (from ticTOCs) to be added to the database. We are using a customised version of sux0r, which will be performing the Bayesian filtering. For the trial we have created accounts for each user, submitted RSS feeds for the journals they follow and set up a vector and categories against which articles can be placed. For this particular use case we created a vector called “Interestingness” with two categories, “Interesting” and “Not Interesting”. The volunteers were shown how to train articles from the RSS feeds into the two categories and also how to “top up” the training by submitting other interesting articles which are not available as RSS.

I hope to have conducted all the initial interviews with the volunteers by the 4th of September, allowing users 3-4 weeks of training. A second interview will be conducted at the end of October, to determine whether the Bayesian Filter is successful in correctly categorising new articles collected that month.


Trialling of Bayesian Feed Filter

Lisa and Phil had a coffee meeting this morning to plan work package 2 of the Bayesian Feed Filter project. The aim of this work package is to

“Test the ability of the recommender service to identify new journal papers of interest to researchers based on a knowledge of papers which they have recently read”.

We shall recruit around 20 volunteers with an interest in research, based locally here at Heriot-Watt. The trialling process will involve the researchers selecting a list of journals they are interested in and marking some articles as either “Interesting” or “Not Interesting” using sux0r (a tool for Bayesian filtering of RSS feeds). The users will be asked to continue marking items over a period of a month. If the users cannot find many articles that represent their interests, they will be asked to top up the interesting articles by submitting the abstracts of other articles they have written, cited, or that are of particular interest to them. After the initial training period of one month, users will not access their accounts for a further month, allowing a collection of articles to build up and the filter to try to determine whether these articles are of interest to that user or not. A follow-up session will then take place, allowing the users to confirm how accurate the filter has been.

The deadline for completion of this project is 30th November 2009, giving a tight schedule for the trialling process.

By Tuesday 11th August we need to formalise the evaluation process and prepare a questionnaire for users (i.e. user’s expectations, current practice in monitoring journal tables of contents etc.)

From the week commencing the 10th of August, volunteers will be recruited.

In the following two weeks, accounts will need to be created and journals added for each user.

By the 28th of August an initial meeting will be held with users (either in small groups or individually) to explain what we want them to do and to make sure they have filled in the questionnaire. Users will be asked to train the system between the first meeting and the cut-off date.

The 25th of September will be the date users should stop training articles.

The final week in October is the proposed date for holding follow-up meetings with the users. They will be asked in these meetings to indicate how accurate the system was at determining the interestingness of articles. The numbers of false positives and false negatives shall be recorded. A follow-up questionnaire asking how the system matched their expectations shall also be given at this stage.

This schedule gives the project the whole of November to analyse and write up the findings of the trials.
