
February 05, 2006

Changes to structure and minor downtime

Just a quick note to let people know that I'll be working on the guts of ILENN today (Sunday, Feb. 5), updating the database structure and changing over to a new core RSS parser module. The site may be unavailable or screwed up for a few hours while I make the necessary adjustments. There probably won't be a big look-and-feel change today, but you should notice fewer duplicate entries and MUCH better support for non-English character sets (for example, Japanese and Hungarian). If you find things still messed up after Sunday, please leave a comment here (I turned comments back on).

Posted by Kelly McKiernan at February 5, 2006 10:22 AM

Comments

How are you differentiating between character sets in the feeds? I know there's a charset declaration on HTML pages, but I don't know enough about Atom/RSS to know how that works. I've found that most Japanese sites (blogs included) use either UTF-8 or Shift_JIS. Unfortunately, there are still a few sites and blogs out there that don't declare their character set, or that use one of the less common charsets, like EUC-JP and ISO-2022-JP. Not sure how you'd get around that, but it's something to take into account.

Thinking back to your previous post on ILENN internationalization, it's true that there are very few people who would want to read feeds in multiple languages, but there are a few of us who will. That said, many users whose primary language isn't English might still like to read the English posts... How about a Preferences page where you can specify the languages in which you want to view feeds? (The preferences could be stored with a cookie.) Or an area up at the top of the main page where you can select the languages to display (maybe with a Refresh button or something)... Just some thoughts. :-)

Posted by: Dunechaser at February 5, 2006 11:15 AM

Good question. I think I've found a solution for multiple languages on a page. The new RSS parser I'm using detects each feed's character set and automatically converts the content to UTF-8 so that it displays properly. I'll be updating all the pages to be served as UTF-8.

There will probably be instances where a feed won't be auto-detected, and we'll need to override the display character set. You're right, eventually there'll need to be a preference for that.

Posted by: Kelly at February 5, 2006 01:15 PM
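
For anyone curious how the detect-then-convert step in the reply above might look, here is a minimal Python sketch. It is not the parser ILENN actually uses. The function names (`sniff_encoding`, `decode_feed`) and the `override` parameter are made up for illustration, and the detection order (manual override, then byte-order mark, then HTTP charset, then the XML declaration, then a UTF-8 default) is an assumption. It also leaves the feed's original XML declaration untouched in the decoded text.

```python
import codecs
import re

# Byte-order marks to check first; these three do not share a prefix.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# Matches encoding="..." inside a leading <?xml ... ?> declaration.
_XML_DECL = re.compile(rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z0-9._-]+)["']""")


def _known(name):
    """Return True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def sniff_encoding(raw, http_charset=None, override=None, default="utf-8"):
    """Guess the encoding of raw feed bytes (hypothetical helper)."""
    # A per-feed manual override wins, for feeds that declare nothing useful.
    if override and _known(override):
        return override
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    # Charset taken from the HTTP Content-Type header, if the caller has one.
    if http_charset and _known(http_charset):
        return http_charset
    match = _XML_DECL.match(raw)
    if match:
        declared = match.group(1).decode("ascii")
        if _known(declared):
            return declared
    return default


def decode_feed(raw, **kwargs):
    """Decode feed bytes to text; undecodable bytes become U+FFFD."""
    return raw.decode(sniff_encoding(raw, **kwargs), errors="replace")


if __name__ == "__main__":
    feed = '<?xml version="1.0" encoding="Shift_JIS"?><title>日本語</title>'.encode("shift_jis")
    page_bytes = decode_feed(feed).encode("utf-8")  # ready to serve as UTF-8
    print(sniff_encoding(feed), len(page_bytes))
```

A feed with no declaration at all (the EUC-JP case Dunechaser mentions) would fall through to the default and come out garbled, which is where a stored per-feed `override` would come in.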
