pyblosxom to WordPress revisited

I migrated to WordPress a while back, and posted a pretty lame redirection setup for Apache. Since that initial migration, I’ve migrated Pia to WordPress as well, watched my web stats, and tweaked the redirection rules. Of course, everyone’s pyblosxom install is a bit different, but these rules will get you 80% of the way, should you decide to migrate yours. This recipe is for an in-place conversion, using the same base URL. Similar rules will work in other situations.

We’re going to be using mod_rewrite, and it’s often very helpful to watch the rewrite logs to make sure you’re doing the right thing. I always put commented-out rewrite log settings in my configurations, so they are easy to switch on when something misbehaves.

RewriteEngine On
#RewriteLog /srv/
#RewriteLogLevel 5

Jaq pointed me to mod_rewrite’s very cute RewriteMap feature when I was trying to translate pyblosxom’s three-letter month strings to numeric months.

RewriteMap monthmap txt:/srv/wordpress/monthmap
RewriteRule  ^/blog/([0-9]{4})/([A-Z][a-z]{2})(/?.*)$  /blog/$1/${monthmap:$2}$3 [R=permanent]

The monthmap file looks like this:

Jan             01
Feb             02
Mar             03

… and so on.
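If you would rather not type the map out by hand, a few lines of Python can generate it and also show what the month rule does to a URL. This is only a sketch: the function names are my own, the month list is hard-coded in English, and the `re` module stands in for mod_rewrite's regex engine, which is close enough for a pattern this simple.

```python
import re

# English three-letter month names, hard-coded so the result does not
# depend on the system locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Same data as the monthmap file: "Jan" -> "01", ..., "Dec" -> "12".
MONTHMAP = {name: f"{number:02d}" for number, name in enumerate(MONTHS, start=1)}


def monthmap_lines():
    """Return the lines of a txt: RewriteMap file, one key/value pair per line."""
    return [f"{name:<16}{value}" for name, value in MONTHMAP.items()]


def rewrite_month(path):
    """Mimic the month RewriteRule; return the new path, or None if it does not match."""
    match = re.search(r"^/blog/([0-9]{4})/([A-Z][a-z]{2})(/?.*)$", path)
    if match is None:
        return None
    year, month, rest = match.groups()
    # A RewriteMap lookup with no default yields an empty string for unknown keys.
    return f"/blog/{year}/{MONTHMAP.get(month, '')}{rest}"


if __name__ == "__main__":
    print("\n".join(monthmap_lines()))
    print(rewrite_month("/blog/2005/Jan/02/"))
```

Redirect the first twelve lines of output into the file named in the RewriteMap directive and you are done.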

Then we get into the meat of each post’s URL redirection. My pyblosxom posts were given UNIX timestamp names, so this ends up being very simple. I tweaked the WordPress RSS importer to make sure that each post slug (or ‘post_name’ in the database) was set to the UNIX timestamp filename I had used in pyblosxom. That means I can pretty well rely on any 10-digit string being the name of one of my posts, so figuring out the redirects is just a matter of handling all the strange pyblosxom URL forms and passing the 10-digit string to WordPress.

RedirectMatch permanent  /([0-9]{10})$  /blog/index.php?name=$1
RedirectMatch permanent  /([0-9]{10})\.html$  /blog/index.php?name=$1
RedirectMatch permanent  ^/blog/category/.*?/([0-9]{10}).*$  /blog/index.php?name=$1
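Before trusting rules like these on a live site, it can help to run the patterns over a handful of old URLs from your access log. Here is a rough sketch of that check in Python. The sample paths and the function name are made up for illustration, and Python's `re` is only an approximation of Apache's regex engine, though these patterns behave the same in both.

```python
import re

# The three RedirectMatch patterns, in the order they appear in the config.
# mod_alias applies the first directive that matches.
POST_PATTERNS = [
    r"/([0-9]{10})$",
    r"/([0-9]{10})\.html$",
    r"^/blog/category/.*?/([0-9]{10}).*$",
]


def post_redirect(path):
    """Return the WordPress URL an old path would be redirected to, or None."""
    for pattern in POST_PATTERNS:
        match = re.search(pattern, path)
        if match:
            return "/blog/index.php?name=" + match.group(1)
    return None


if __name__ == "__main__":
    # Hypothetical old-style URLs; substitute real ones from your own logs.
    for old in ["/blog/2005/01/02/1104624000",
                "/blog/projects/1104624000.html",
                "/blog/category/projects/1104624000.txt",
                "/blog/about/"]:
        print(old, "->", post_redirect(old))
```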

If you didn’t use easily machine-readable post names, you will probably have to change the URL base of your blog, especially if you used pyblosxom categories. I found it quite difficult to handle my categories, so I had to do some ugly custom redirects for each base category name:

RedirectMatch permanent  ^/blog/((issues|people|projects).*?)/index.html$  /blog/category/$1/
RedirectMatch permanent  ^/blog/((issues|people|projects).*?)/?$  /blog/category/$1/
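The order of those two rules matters: the second pattern is loose enough to swallow a trailing index.html as part of the category name, so the index.html rule has to come first. The Python sketch below mimics the pair to make that visible. The function name and test paths are mine, and the category names are just the three from my own blog.

```python
import re

# Category rules in config order; the index.html form must be tried first.
CATEGORY_PATTERNS = [
    r"^/blog/((issues|people|projects).*?)/index.html$",
    r"^/blog/((issues|people|projects).*?)/?$",
]


def category_redirect(path, patterns=CATEGORY_PATTERNS):
    """Return the WordPress category URL for an old category path, or None."""
    for pattern in patterns:
        match = re.search(pattern, path)
        if match:
            return "/blog/category/" + match.group(1) + "/"
    return None


if __name__ == "__main__":
    print(category_redirect("/blog/projects/index.html"))
    print(category_redirect("/blog/people/family/"))
    # With only the second rule, index.html leaks into the category path.
    print(category_redirect("/blog/projects/index.html", CATEGORY_PATTERNS[1:]))
```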

Hopefully, like Pia, you didn’t use categories in pyblosxom at all, so you don’t have to worry about this crap. 🙂

Finally, we have to handle the RSS feed. pyblosxom (by default) uses a query string to indicate the ‘flavour’, so we have to do some funky mod_rewrite foo:

RewriteCond  %{REQUEST_URI}  ^/blog
RewriteCond  %{QUERY_STRING}  flav=rss
RewriteRule  ^(.*?)/?$  $1/feed/? [R=permanent]

(Very astute readers will understand the semantic difference between what that means in pyblosxom, and what you’d get in WordPress. I wasn’t too concerned about such a tiny edge case, myself.)
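For anyone who finds RewriteCond chains hard to read, the feed rule boils down to the following logic, sketched here in Python. The function name is mine, and it assumes the rule lives in server or virtual host context, where the pattern sees the full URL path including the leading slash.

```python
import re


def feed_redirect(path, query_string):
    """Mimic the feed rule: return the redirect target, or None if it does not apply."""
    # RewriteCond %{REQUEST_URI} ^/blog
    if not re.search(r"^/blog", path):
        return None
    # RewriteCond %{QUERY_STRING} flav=rss
    if not re.search(r"flav=rss", query_string):
        return None
    # RewriteRule ^(.*?)/?$ $1/feed/?  (the trailing "?" drops the old query string)
    match = re.search(r"^(.*?)/?$", path)
    return match.group(1) + "/feed/"


if __name__ == "__main__":
    print(feed_redirect("/blog/", "flav=rss"))
    print(feed_redirect("/blog", "flav=html"))
```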

I had to use some tricks to get everything imported into WordPress in a useful way. Firstly, I modified my pyblosxom setup so that its RSS feed spat out my entire blog history. Then I modified the RSS importer to set ‘post_name’ from the imported <guid> element, allowing me to redirect based on the pyblosxom entry name.
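The importer change itself is a small edit to WordPress's PHP, which I won't reproduce here. The idea behind it, though, is just "pull the 10-digit entry name out of each <guid>", and that is easy to show in Python. Everything in this sketch is an assumption for illustration: the sample feed, the example.org URLs, the function names, and the guess that the guid is a URL ending in the entry name with an optional extension.

```python
import re
import xml.etree.ElementTree as ET

# A made-up two-item feed standing in for a full pyblosxom RSS export.
SAMPLE_RSS = """<rss version="2.0"><channel>
  <item><title>First post</title>
    <guid>http://example.org/blog/projects/1104537600</guid></item>
  <item><title>Second post</title>
    <guid>http://example.org/blog/2005/Jan/02/1104624000.html</guid></item>
</channel></rss>"""


def slug_from_guid(guid):
    """Return the 10-digit entry name at the end of a guid, or None if there is none."""
    match = re.search(r"([0-9]{10})(?:\.[a-z]+)?$", guid.strip())
    return match.group(1) if match else None


def post_names(rss_text):
    """Return the post_name each feed item would get, in feed order."""
    root = ET.fromstring(rss_text)
    return [slug_from_guid(item.findtext("guid", default=""))
            for item in root.iter("item")]


if __name__ == "__main__":
    print(post_names(SAMPLE_RSS))
```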

Why bother? Because cool URIs don’t change. 🙂
