BBC Full Feeds

About


I knocked this together so that I could read full news articles from the BBC News website on my PDA whilst commuting, without having to subscribe to a data plan or use AvantGo. Essentially, the script reads the chosen RSS feed and then scrapes the BBC website to get the rest of each article. For devices which may not support images, there is also the option to remove them from the feeds.
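As a rough illustration of the two steps described above (reading article links out of a feed, and optionally dropping images), here is a minimal Python sketch. It is not the script that runs this service: the function names `item_links` and `strip_images` are made up for the example, the feed is assumed to be plain RSS 2.0, and the actual fetching and scraping of each article page is left out.

```python
import re
import xml.etree.ElementTree as ET


def item_links(rss_xml: str) -> list:
    """Return the <link> of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            links.append(link.strip())
    return links


# Assumption: images appear as ordinary <img ...> tags in the article HTML.
_IMG_RE = re.compile(r"<img\b[^>]*>", re.IGNORECASE)


def strip_images(html: str) -> str:
    """Remove <img> tags for devices that cannot display them."""
    return _IMG_RE.sub("", html)
```

Each link returned by `item_links` would then be fetched and scraped for the article body, with `strip_images` applied only when the image-free version of a feed is requested.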

Quick disclaimer: this site has no affiliation with, nor is it endorsed by, the BBC. This service is provided as is. Content provided from these RSS feeds is copyright BBC, as stated in the enclosing RSS files.

Get feeds…


…via Feedburner (preferred route)

Feedburner provides a slightly more reliable source for the feeds, and I’ve added most of them here:

…direct from source

If for any reason you wish to get feeds directly from my site, you’re welcome to. There are some feeds which, for one reason or another, I haven’t added to Feedburner; you can see these at the link below:

http://dev.barnesdmd.co.uk/ff/ and select whichever feed you like.

History


25/03/08
  • Thanks to Webcron.org, all feeds should now update themselves automatically at least once an hour. In due time I may consider running a cron job from my home server to do this, but in the interim this should improve feed access times.

03/03/08

  • The script has now been totally rewritten for various reasons. It’s now a bit more flexible and can handle feeds from other sources as well, being template and class driven as opposed to the previous large mess!
  • Most feeds are now available through Feedburner. API integration to make this a bit more consistent and wide ranging might happen if I have time and think it’s a worthwhile learning exercise.
  • Feeds from other sites may start appearing soon now that the templating makes it possible.

Update 24/04/07

  • Removed various BBC CMS tags which were present and added unneeded baggage to the feeds. The tags below are now all removed, although some of them do prove useful in identifying page structures as the scraper runs through the page!

    “S IIMA”, “S IBYL”, “E IBYL”, “S ITAB”, “E ITAB”, “S IROW”, “S IANC”, “E IANC”, “E IROW”, “S ICOL”, “E ICOL”, “S IBOX”, “S ILIN”, “E ILIN”, “E IBOX”, “E IIMA”, “S SF”, “E SF”

    This is not really a proper solution at the moment, as it creates paragraphs too often; however, I have added a couple of lines of code to close the paragraph tags which are left open by the BBC CMS. Unfortunately, as these occur after almost every sentence, the text is broken up too much. While this works well on the fixed-width BBC website, it tends to look weird when displayed in an RSS reader.

  • Have also done a code clean-up and corrected a minor caching fault where empty articles were created. This never made a difference to the output; it just slowed down execution.

  • I’m slowly moving more of the feeds over to Feedburner. Please use this; believe it or not, the traffic I get to this is quite high (listen up BBC!).
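For anyone curious, the tag removal mentioned in the first bullet above could look something like the following Python sketch. It is only an illustration, not the code that runs here: it assumes each marker is wrapped in an HTML comment (for example `<!-- S IIMA -->`), and the name `strip_cms_markers` is made up for the example.

```python
import re

# Marker names copied from the list above.
CMS_MARKERS = [
    "S IIMA", "S IBYL", "E IBYL", "S ITAB", "E ITAB", "S IROW",
    "S IANC", "E IANC", "E IROW", "S ICOL", "E ICOL", "S IBOX",
    "S ILIN", "E ILIN", "E IBOX", "E IIMA", "S SF", "E SF",
]

# Assumption: each marker sits inside an HTML comment, e.g. <!-- S IIMA -->.
_MARKER_RE = re.compile(
    r"<!--\s*(?:" + "|".join(re.escape(m) for m in CMS_MARKERS) + r")\s*-->"
)


def strip_cms_markers(html: str) -> str:
    """Remove the listed CMS marker comments, leaving other markup alone."""
    return _MARKER_RE.sub("", html)
```

Matching only the listed names means any other comments in the page are left untouched, which keeps the ones that are handy for spotting page structure available to the scraper.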

Update 16/02/07

I’ve fixed the fault where relative URLs within the pages were not being identified and changed to direct URLs correctly. This appears to have occurred because of a change in the BBC template; however, the change I’ve made uses a different search method, which should prevent this happening again. At the same time I’ve also added basic titles to these links, as very few of the link elements within articles on news.bbc.co.uk have a title attribute.
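To give an idea of what the link fix involves, here is a small Python sketch that rewrites relative links against a base URL and adds a title where one is missing. It is not the search method used in the script itself: the function name `absolutise_links` is made up, it assumes double-quoted `href` attributes, and using the resolved URL as the title text is just a stand-in for a "basic title".

```python
import re
from urllib.parse import urljoin

# Assumption: links look like <a ... href="..." ...> with double-quoted hrefs.
_LINK_RE = re.compile(r'<a\s+([^>]*?)href="([^"]*)"([^>]*)>', re.IGNORECASE)


def absolutise_links(html: str, base_url: str) -> str:
    """Make every link absolute and give it a title if it has none."""

    def fix(match):
        before, href, after = match.groups()
        absolute = urljoin(base_url, href)  # absolute URLs pass through unchanged
        attrs = f'{before}href="{absolute}"{after}'
        if "title=" not in attrs.lower():
            # Placeholder title: the resolved URL itself.
            attrs += f' title="{absolute}"'
        return f"<a {attrs}>"

    return _LINK_RE.sub(fix, html)
```

Resolving with `urljoin` rather than gluing strings together means links that are already absolute, or relative to the current directory, are handled the same way.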

Update 22/01/07

I do have unlimited bandwidth on my hosting account with Streamline; however, as this script could be construed as not being quite within the remit of my hosting agreement (questionable copyright etc.), I’ve taken the preventative measure of starting to run the feeds via Feedburner. This also gives me improved stats over and above what my own stats script gives me for the feeds.

Oh, and on another note, I’d love to know who’s doing what with the feeds. I note via my own stats that, along with the large number of feed readers, there are also various machines running custom libwww-perl scripts. If you’re using the output for anything interesting, I’d love to know, that’s all!

I hope to add support for other partial BBC feeds soon, such as the blogs; I’ll blog about it when I do.


A couple of people have also written about this; links to Ian’s and Simon’s blog entries are below.