2006-05-22

Feed2rss filter in Python

Hello,

While getting used to snownews, I ran into the need to convert Atom feeds from Blogger.com into RSS. Why? Because snownews only supports RSS. I told myself "YEAH, I'm gonna try out Ruby", but after 2 minutes I (re)turned to Python :)

Feedparser is excellent and is available for Python as well as for Ruby. It handles both RSS and Atom feeds, and you are free to output the parsed information however you want.

The script I wrote can be used in two ways:
  • by giving it a URL:
    feed2rss.py http://mmassonnet.blogspot.com/atom.xml
  • or by passing it the content of a feed through a pipe:
    curl -s http://mmassonnet.blogspot.com/atom.xml | feed2rss.py
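
The choice between the two modes boils down to checking whether standard input is a terminal. Here is a minimal sketch of that decision in isolation; the helper name choose_source and the example.org URL are made up for illustration and are not part of feed2rss.py:

```python
import io


def choose_source(argv, stdin):
    """Return what should be handed to the feed parser.

    Hypothetical helper mirroring the two modes described above.
    """
    if not stdin.isatty():
        return stdin      # something is piped in: read the feed from it
    if len(argv) > 1:
        return argv[1]    # interactive terminal: expect a URL argument
    return None           # neither: the caller should print the usage line


# io.StringIO reports isatty() == False, so it stands in for a pipe here.
piped = io.StringIO('<feed/>')
print(choose_source(['feed2rss.py'], piped) is piped)
```

Note that with this ordering piped input takes precedence over a URL given on the command line.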

#!/usr/bin/env python
#-*- coding: utf-8 -*-
# Parses an atom file and generates an RSS1.0
# Author: Mike Massonnet (mmassonnet) <mmassonnet at gmail dot com>
# License: GNU General Public License 2 and above
# (cf. http://www.gnu.org/licenses/gpl.html)
from feedparser import parse
import sys

# No URL argument and nothing piped in: print the usage line and quit.
if not len (sys.argv) > 1 and sys.stdin.isatty ():
    sys.stderr.write ('Usage: '+sys.argv[0]+' <url>\n')
    sys.exit (-1)

# From a terminal, fetch the URL; otherwise read the piped feed.
if sys.stdin.isatty ():
    d = parse (sys.argv[1])
else:
    d = parse (sys.stdin)

# feedparser sets bozo when the feed could not be parsed cleanly.
if d.bozo:
    sys.stderr.write ('Could not parse feed\n')
    sys.exit (-2)

print ('<?xml version="1.0" encoding="utf-8" ?>\n'
       '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"\n'
       ' xmlns:dc="http://purl.org/dc/elements/1.1/"\n'
       ' xmlns:admin="http://webns.net/mvcb/"\n'
       ' xmlns:content="http://purl.org/rss/1.0/modules/content/"\n'
       ' xmlns:cc="http://web.resource.org/cc/"\n'
       ' xmlns="http://purl.org/rss/1.0/">\n'
       ' <channel rdf:about="'+d.feed.title_detail.base+'">\n'
       ' <title>'+d.feed.title+'</title>\n'
       ' <link>'+d.feed.link+'</link>\n'
       ' <description>'+d.feed.subtitle+'</description>\n'
       ' <dc:language>'+d.feed.title_detail.language+'</dc:language>\n'
       ' <dc:creator>'+d.entries[0].author+'</dc:creator>\n'
       ' <dc:date>'+d.feed.modified+'</dc:date>\n'
       ' <admin:generatorAgent rdf:resource="'+d.feed.generator_detail.href+'"/>\n'
       ' <items>\n'
       ' <rdf:Seq>\n')
for entry in d.entries:
    print (' <rdf:li rdf:resource="'+entry.link+'"/>\n')
print (' </rdf:Seq>\n'
       ' </items>\n'
       ' </channel>\n')
for entry in d.entries:
    print (' <item rdf:about="'+entry.link+'">\n'
           ' <title>'+entry.title+'</title>\n'
           ' <link>'+entry.link+'</link>\n'
           ' <description><![CDATA['+entry.content[0].value+']]></description>\n'
           ' <dc:date>'+entry.updated+'</dc:date>\n'
           ' </item>\n')
print ('</rdf:RDF>\n')
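
One caveat: the script pastes titles and links straight into the markup, so a title containing & or < would yield invalid XML. A minimal sketch of one way to guard against that, using escape and quoteattr from the standard library's xml.sax.saxutils; the helper name rss_item and the sample values are made up for illustration and are not part of feed2rss.py:

```python
from xml.sax.saxutils import escape, quoteattr


def rss_item(link, title, date):
    """Build one RSS 1.0 <item> with escaped text and attribute values.

    Hypothetical helper; the description element is left out for brevity.
    """
    return (
        ' <item rdf:about=' + quoteattr(link) + '>\n'   # quoteattr adds the quotes
        ' <title>' + escape(title) + '</title>\n'
        ' <link>' + escape(link) + '</link>\n'
        ' <dc:date>' + escape(date) + '</dc:date>\n'
        ' </item>\n'
    )


print(rss_item('http://example.org/?a=1&b=2', 'Tom & Jerry <3', '2006-05-22'))
```

The same two calls could wrap the channel fields as well.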

Now, when I add an Atom feed to snownews, I apply this filter, and that's all folks ;)
