convert rss to html

Most blogging engines currently support generating RSS content. It is often useful to be able to include this RSS stream inside an HTML page; for example, this is what I did for the main PROGS site.
One way to do this is to include the original RSS stream and then use CSS to display it with an appropriate style.
To start, you need to include the RSS feed somewhere in your document. This can be done client-side using XMLHttpRequest. A very simple implementation looks like this:


<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<link href="rss.css" rel="stylesheet" type="text/css" />
<script type="text/javascript">
function clientSideInclude(id, url) {
  var req = false;
  // For Safari, Firefox, and other non-MS browsers
  if (window.XMLHttpRequest) {
    try {
      req = new XMLHttpRequest();
    } catch (e) {
      req = false;
    }
  } else if (window.ActiveXObject) {
    // For Internet Explorer on Windows
    try {
      req = new ActiveXObject("Msxml2.XMLHTTP");
    } catch (e) {
      try {
        req = new ActiveXObject("Microsoft.XMLHTTP");
      } catch (e) {
        req = false;
      }
    }
  }
  var element = document.getElementById(id);
  if (!element) {
    alert("Bad id " + id + " passed to clientSideInclude.");
    return;
  }
  if (req) {
    // Synchronous request, wait till we have it all
    req.open('GET', url, false);
    req.send(null);
    var res = req.responseText;
    element.innerHTML = res;
  } else {
    element.innerHTML = "Sorry, your browser does not support XMLHTTPRequest objects.";
  }
}
</script>
</head>

<body onLoad="clientSideInclude('rssfeed', 'http://blog.progs.be/?feed=rss');">
<div id="rssfeed"></div>
</body>
</html>
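The synchronous request in clientSideInclude blocks the page until the whole feed has arrived. Current browsers can do the same include asynchronously with fetch(). The sketch below is not part of the original page: the name includeFeed and the optional fetchImpl/doc parameters are hypothetical, added only so the function can also be exercised outside a browser.

```javascript
// Hypothetical asynchronous variant of clientSideInclude that uses fetch()
// instead of a synchronous XMLHttpRequest. fetchImpl and doc default to the
// browser's fetch and document; passing them explicitly is only meant for tests.
async function includeFeed(id, url, fetchImpl, doc) {
  fetchImpl = fetchImpl || globalThis.fetch;
  doc = doc || globalThis.document;

  const element = doc.getElementById(id);
  if (!element) {
    throw new Error("Bad id " + id + " passed to includeFeed.");
  }

  // Like XMLHttpRequest, fetch() cannot read a feed from another origin
  // unless that server explicitly allows it (CORS).
  const response = await fetchImpl(url);
  if (!response.ok) {
    element.innerHTML = "Sorry, the feed could not be loaded (HTTP " + response.status + ").";
    return;
  }
  element.innerHTML = await response.text();
}
```

In the page above, the body's onLoad handler would then call includeFeed('rssfeed', 'http://blog.progs.be/?feed=rss') instead of clientSideInclude. The page stays responsive while the feed loads, at the cost of the content appearing slightly later.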

The limitation is that you can only access RSS feeds which come from the same domain as your page. Otherwise a script would be able to read pages you don’t own, which would be a security risk. This can be worked around by putting a dummy HTML page, which just renders the RSS as HTML, on the same domain as the feed, and including that page in your other page(s) using an iframe. I have no solution if you cannot even put the RSS-to-HTML page on the same domain.
You now have the RSS feed as part of your HTML page, but unfortunately it does not quite render as you would expect. The RSS has the following format:
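Strictly speaking, the browser's same-origin policy compares the scheme, host and port of the two URLs, not just the domain name. A small check can make this explicit before requesting a feed. The helper name isSameOrigin is hypothetical; it only relies on the standard URL class.

```javascript
// Hypothetical helper: returns true when feedUrl has the same origin
// (scheme, host and port) as the page at pageUrl. A relative feed URL is
// resolved against the page URL, the way a request from that page would be.
function isSameOrigin(pageUrl, feedUrl) {
  const page = new URL(pageUrl);
  const feed = new URL(feedUrl, page);
  return feed.origin === page.origin;
}
```

For example, a page on http://www.example.com/ cannot read http://blog.example.com/?feed=rss (different host), and neither can http://blog.example.com/ read https://blog.example.com/?feed=rss (different scheme), while a relative URL such as /?feed=rss always stays on the page's own origin.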


<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/2.0.5" -->
<rss version="0.92">
<channel>
  <title>trying to solve IT problems</title>
  <link>http://blog.progs.be</link>
  <description>How I tried to fix certain programming problems.</description>
  <lastBuildDate>Wed, 26 Sep 2007 06:36:27 +0000</lastBuildDate>
  <docs>http://backend.userland.com/rss092</docs>

  <language>en</language>
  <item>
    <title>fedora object repository</title>
    <description><![CDATA[I had a meeting with some people [...]]]></description>
    <link>http://blog.progs.be/?p=32</link>
  </item>
  <item>
    <title>install firebird 2.0.2 on Ubuntu dapper amd64</title>
    <description><![CDATA[I have been a big fan of the Firebird database [...]]]></description>
    <link>http://blog.progs.be/?p=31</link>
  </item>
</channel>
</rss>

Some of these tags, such as link and title, are also HTML tags, which causes problems here; an HTML link element, for example, has no closing tag. So this needs to be fixed. Fortunately, the RSS feed is pulled into the page as a string, so we can easily replace the tag names with something which does not clash. Similarly, the CDATA sections in the item descriptions seem to cause problems in some browsers, so these need to be removed too.


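    // Inside clientSideInclude, after req.send(null)
    var res = req.responseText;
    // Rename tags that clash with HTML; lastBuildDate becomes the shorter lbd
    res = res.replace(/link>/g, "rsslink>");
    res = res.replace(/title>/g, "rsstitle>");
    res = res.replace(/lastBuildDate>/g, "lbd>");
    // Strip the CDATA wrappers around the item descriptions
    res = res.replace(/<!\[CDATA\[/g, "");
    res = res.replace(/\]\]>/g, "");
    element.innerHTML = res;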
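The replacements above match "link>" or "]]>" anywhere in the feed, including inside the description text. A slightly stricter sketch, assuming the tags carry no attributes (as in the sample feed), only rewrites complete opening and closing tags and only removes "]]>" where it actually ends a CDATA section. The function name renameRssTags is hypothetical; its result would be assigned to element.innerHTML in the same way.

```javascript
// Hypothetical, stricter version of the string replacements above. It assumes
// the RSS tags carry no attributes, as in the sample feed.
function renameRssTags(xml) {
  return xml
    // <link> and <title> clash with HTML elements of the same name
    .replace(/<(\/?)link>/g, "<$1rsslink>")
    .replace(/<(\/?)title>/g, "<$1rsstitle>")
    // lastBuildDate becomes the shorter lbd, as in the original snippet
    .replace(/<(\/?)lastBuildDate>/g, "<$1lbd>")
    // Unwrap CDATA sections but keep the text inside them
    .replace(/<!\[CDATA\[([\s\S]*?)\]\]>/g, "$1");
}
```

The lazy match inside the CDATA pattern stops at the first "]]>", so a description ending in "[...]" keeps its closing bracket.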

We are now quite close to having an RSS feed which looks good as an HTML page. Unfortunately, the links don’t work yet: as far as the browser is concerned, an rsslink element is not a link. Some more JavaScript fixes this, using a function which iterates over all the items and replaces each title with a proper HTML link.


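// Replace the <rsstitle> of every <item> with a real HTML link to the item's URL.
// Call this after the feed has been inserted into the page.
function fixLinks() {
  var allItems = document.getElementsByTagName("item");
  for (var i = 0; i < allItems.length; i++) {
    var itemElm = allItems[i];
    var titleElm = itemElm.getElementsByTagName("rsstitle").item(0);
    var linkElm = itemElm.getElementsByTagName("rsslink").item(0);
    // Skip items without a (non-empty) title or link
    if (!titleElm || !linkElm || !titleElm.firstChild || !linkElm.firstChild) {
      continue;
    }
    var titleText = titleElm.firstChild.nodeValue;
    var linkURL = linkElm.firstChild.nodeValue;

    // Build <a href="linkURL" target="_top">titleText</a>
    var newLinkElm = document.createElement("a");
    newLinkElm.setAttribute("href", linkURL);
    // _top opens the post in the whole window, also when shown inside an iframe
    newLinkElm.setAttribute("target", "_top");
    newLinkElm.style.display = "block";
    newLinkElm.appendChild(document.createTextNode(titleText));
    itemElm.replaceChild(newLinkElm, titleElm);
  }
}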
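fixLinks copies the text of each rsslink element straight into an href attribute. Because that text comes from the feed, one possible hardening, not part of the original code, is to trim surrounding whitespace and accept only http and https URLs, so that a feed entry cannot turn into a javascript: link. The helper name toSafeHref is hypothetical.

```javascript
// Hypothetical helper for fixLinks: turns the text of an <rsslink> element
// into a value that is safe to use as an href, or returns null.
function toSafeHref(linkText) {
  if (typeof linkText !== "string") {
    return null;
  }
  let url;
  try {
    // Feeds often put whitespace or newlines around the URL
    url = new URL(linkText.trim());
  } catch (e) {
    // Not a valid absolute URL
    return null;
  }
  // Only allow ordinary web links; rejects e.g. javascript: and data: URLs
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return null;
  }
  return url.href;
}
```

Inside the loop of fixLinks, the linkURL line would then become var linkURL = toSafeHref(linkElm.firstChild.nodeValue); followed by if (!linkURL) continue; so that items with unusable links keep their plain title.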

An example of a fully working page can be seen here.

Some more background info (and the inspiration I used to reach this result) can be found here.

81 Comments

  1. Jim says:

    I used your example and created my own HTML page. It was great and easy to follow. The problem I am having is that each post displays the entire article, not just the description. Is there a way to control what is displayed?

    Thanks for the help!

  2. joachim says:

    Jim,

    I assume your RSS feed contains full articles instead of just the start of the article.
    The easiest solution would probably be to hide the full text using CSS.
    Alternatively, you could remove all the “description” tags (with content) from the DOM tree.

    Hope this helps you.

  3. Jim says:

    joachim,

    I think I am confused about the DOM tree. Is this the DOM tree?

    var itemElm = allItems[i];
    var titleElm = itemElm.getElementsByTagName("rsstitle").item(0);
    var titleText = titleElm.firstChild.nodeValue;
    var linkElm = itemElm.getElementsByTagName("rsslink").item(0);
    var linkURL = linkElm.firstChild.nodeValue;

    var newLinkElm = document.createElement("a");
    var txtNode = document.createTextNode(titleText);
    newLinkElm.setAttribute("href",linkURL);
    newLinkElm.setAttribute("target","_top");
    newLinkElm.style.display = "block";
    newLinkElm.appendChild(txtNode);
    itemElm.replaceChild(newLinkElm,titleElm);

    I don’t see where the “description” tag is. This method is not using an external source to convert the RSS to HTML, is it? I am learning how to do this.

    Thanks for your help!

  4. joachim says:

    Jim,

    The code is not doing anything with the description tag.
    By far the easiest solution is to use CSS, adding something like:

    description {
    display: none;
    }

    To remove it, something like this should work (note that I haven’t tested this code):

    var allItems = document.getElementsByTagName("description");
    for (var i=0;i
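    A full version of that removal loop might look like the sketch below (as untested against a real page as the snippet above). It walks the collection backwards: getElementsByTagName returns a live collection, so removing an element shifts the remaining ones down, and a forward loop would skip every other description. The function name removeAll is hypothetical; after the feed has been inserted it would be called as removeAll(document, "description").

```javascript
// Hypothetical helper: removes every element with the given tag name
// (together with its content) below root, e.g. all <description> elements.
function removeAll(root, tagName) {
  const elements = root.getElementsByTagName(tagName);
  // Iterate backwards: the collection is live, so it shrinks as we remove
  for (let i = elements.length - 1; i >= 0; i--) {
    const el = elements[i];
    el.parentNode.removeChild(el);
  }
}
```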

  5. Jim says:

    joachim,

    Thanks. I am new to CSS. I tried it in the file I created and it didn’t work, so I used the file from the sample page, but the entire article is still appearing.

    Can you see what I am doing wrong? Thanks again for your help.

    rss
    {
    display:block;
    margin:10px;
    }

    channel
    {
    display:block;
    overflow:auto;
    background-color:#eee;
    font: 12px verdana;
    }

    item
    {
    display: block;
    padding:10px;
    margin-bottom:10px;
    border-top:1px solid #ccc;
    border-bottom:1px solid #ccc;
    background-color:#fff;
    }

    channel>rsstitle, channel>description
    {
    display: block;
    margin-left:10px;
    margin-top:10px;
    background-color:#eee;
    font-weight:bold;
    }

    channel>rsstitle
    {
    font-size:16px;
    }

    channel>description
    {
    font-size:10px;
    margin-bottom:10px;
    }

    item>a
    {
    font-weight:bold;
    }

    language, docs, rsslink, lbd
    {
    display: none;
    }

    description
    {
    display: none;
    }

  6. Jim says:

    I can’t get the content of the article to stop appearing, and I’m not sure what is wrong.

    Can you help?

  7. joachim says:

    I tested the CSS and it was working. Are you sure your CSS is applied at all? I would recommend using Firefox with the Web Developer extension. It lets you investigate what the browser receives and try out changes.

  8. Jim says:

    The CSS is applied, but the content is still showing. Could I email you the link to the page so you can see it?

  9. Jim says:

    I think I made a mistake. The CSS is working and the description is not showing. My fault: I don’t want the content to show. If readers want to read the content, they can click on the link to read it.

    How can I do that?

    Sorry about that
