Friday, 11 January 2008

Converting sites to Sitecore: How to match old URLs to new URLs

When you convert an existing web site into a new (Sitecore) web site, much of the content will be similar. However, the older site very often has a different data structure and a different way of parsing incoming requests. For example, a request to a traditional server page could look something like http://mysite/productdetail.jsp?cat=fruits&product=ap..., while in the new structure it would look something like http://mysite/products/fruits/apple.aspx.

Links already stored in people's browsers, or printed in marketing materials, are at risk of breaking, because they still point to the page's old URL. Also, internal links are very often not converted properly during the conversion, which results in broken links as well.

Idea:

To solve this, I recommend compiling a link database when you start the conversion process, storing every old URL with a reference to its new item. Instead of storing the item's physical path, store only its unique ID (GUID), because items may be moved around in the content tree after the conversion.

OldURL | NewID
http://mysite/productdetail.jsp?cat=fruits&product=ap... | {0d150c93-7ea9-4b39-98f3-8a64a554c56d}
http://mysite/productdetail.jsp?cat=fruits&product.ba... | {6c480bd5-2ab3-496f-93f5-121310b9c67a}
... | ...

Whenever a page request reaches the server, Sitecore attempts to look up the item. If the item does not exist, Sitecore returns a 404 (Page Not Found) error. This is where you want to hook in: look up the requested URL in the OldURL column and, if it exists, direct the request to the new URL by looking up the item via its GUID. It's as simple as that.
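
The lookup described above, sketched in Python: the content tree, the link database, the placeholder GUID and the function `handle_request` are hypothetical stand-ins, not Sitecore APIs. The sketch answers a link-database hit with a 301 (permanent) redirect, which is what the first comment below recommends from an SEO perspective.

```python
from urllib.parse import urlsplit

# Hypothetical content tree: current item path -> item ID.
ITEM_IDS_BY_PATH = {
    "/products/vegetables/carrot": "{11111111-1111-1111-1111-111111111111}",
}
# Reverse index so an item can be found by ID wherever it currently lives.
PATHS_BY_ID = {item_id: path for path, item_id in ITEM_IDS_BY_PATH.items()}

# Hypothetical link database: OldURL -> NewID.
LINK_DB = {
    "http://mysite/productdetail.jsp?cat=vegetables&product=carrot":
        "{11111111-1111-1111-1111-111111111111}",
}


def handle_request(url):
    """Return (status, path): 200 for a live item, 301 for a known old URL, else 404."""
    path = urlsplit(url).path
    if path in ITEM_IDS_BY_PATH:  # normal case: the item resolves directly
        return 200, path
    item_id = LINK_DB.get(url)  # fallback: is this one of the old URLs?
    new_path = PATHS_BY_ID.get(item_id)  # look up the item's current path by GUID
    if new_path is not None:
        return 301, new_path
    return 404, None
```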

How?

There are several ways to do this, but the most common is to hook into the 404 error page and create your own page that redirects to the new page. However, this is not always desirable, because it requires either a client-side (browser) redirect or a server-side redirect, and in either case it can make it harder to return the correct status codes. Instead, I recommend hooking into the Sitecore HttpRequest pipeline.

What’s the HttpRequest pipeline?

The Sitecore HttpRequest pipeline is a series of discrete steps that run on every page request. Its purpose is to populate the Sitecore.Context object with information that is used elsewhere in the page assembly process. For example, the pipeline determines the current user, the selected language, the site to use (a Sitecore installation may host multiple sites) and the current item, that is, the item resolved from the requested URL. If no item can be determined, the Sitecore.Context.Item property is null.
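
To make the idea of a pipeline concrete, here is a simplified Python model. It is not Sitecore's actual implementation: `RequestContext`, the processor functions, `CONTENT_TREE` and the placeholder GUID are hypothetical. Each step fills in one part of the context, and `item` stays `None` when the URL matches no item, mirroring `Sitecore.Context.Item` being null.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional
from urllib.parse import urlsplit


@dataclass
class RequestContext:
    """Hypothetical, simplified analogue of Sitecore.Context for one request."""
    url: str
    user: Optional[str] = None
    language: Optional[str] = None
    site: Optional[str] = None
    item: Optional[str] = None  # stays None when no item matches the URL


# Hypothetical content tree: item path -> item ID.
CONTENT_TREE = {"/products/fruits/apple": "{22222222-2222-2222-2222-222222222222}"}


def resolve_user(ctx: RequestContext) -> None:
    ctx.user = "anonymous"  # assumption: no authentication in this sketch


def resolve_language(ctx: RequestContext) -> None:
    ctx.language = "en"  # assumption: a single default language


def resolve_site(ctx: RequestContext) -> None:
    ctx.site = urlsplit(ctx.url).hostname  # pick the site by host name


def resolve_item(ctx: RequestContext) -> None:
    path = urlsplit(ctx.url).path
    if path.endswith(".aspx"):
        path = path[: -len(".aspx")]
    ctx.item = CONTENT_TREE.get(path)  # None if the path is unknown


Processor = Callable[[RequestContext], None]
PIPELINE: List[Processor] = [resolve_user, resolve_language, resolve_site, resolve_item]


def run_pipeline(ctx: RequestContext, processors: List[Processor]) -> RequestContext:
    """Run each step in order; each one fills in part of the context."""
    for processor in processors:
        processor(ctx)
    return ctx
```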

If you want to set the current item to something else, the proper place to do so is after the ResolveItem step in the HttpRequest pipeline. Simply create a .NET class that checks whether Sitecore.Context.Item is null and, if it is, searches the link database for a row whose OldURL matches the requested URL. If there is a match, look up the item that matches NewID (Sitecore.Context.Database.GetItem(ID)) and, if it is found, set it as the context item.
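
A sketch of that extra step, again as a Python model rather than Sitecore's .NET API. `get_item` stands in for `Sitecore.Context.Database.GetItem(ID)`; `RequestContext`, `DATABASE`, `LINK_DB`, `resolve_item`, `link_database_fallback`, `run` and the placeholder GUID are hypothetical. The fallback is inserted directly after the item resolver, does nothing when an item was already found, and leaves the item null (so the request still ends in a 404) when the URL is unknown or the mapped item no longer exists.

```python
from dataclasses import dataclass
from typing import Dict, Optional
from urllib.parse import urlsplit


@dataclass
class RequestContext:
    url: str
    item: Optional[str] = None  # analogue of Sitecore.Context.Item


# Hypothetical database: item ID -> item (represented here by its current path).
DATABASE: Dict[str, str] = {
    "{11111111-1111-1111-1111-111111111111}": "/products/vegetables/carrot",
}
# Hypothetical link database: OldURL -> NewID.
LINK_DB: Dict[str, str] = {
    "http://mysite/productdetail.jsp?cat=vegetables&product=carrot":
        "{11111111-1111-1111-1111-111111111111}",
}


def get_item(item_id: str) -> Optional[str]:
    """Stand-in for Sitecore.Context.Database.GetItem(ID): None if the ID is unknown."""
    return DATABASE.get(item_id)


def resolve_item(ctx: RequestContext) -> None:
    """Simplified ResolveItem step: match the URL path against current item paths."""
    path = urlsplit(ctx.url).path
    ctx.item = path if path in DATABASE.values() else None


def link_database_fallback(ctx: RequestContext) -> None:
    """Custom step placed after ResolveItem."""
    if ctx.item is not None:  # item already resolved; nothing to do
        return
    new_id = LINK_DB.get(ctx.url)
    if new_id is None:  # not an old URL; leave item null so the request ends in a 404
        return
    item = get_item(new_id)
    if item is not None:  # the mapped item may have been deleted since the conversion
        ctx.item = item


PIPELINE = [resolve_item]
PIPELINE.insert(PIPELINE.index(resolve_item) + 1, link_database_fallback)


def run(url: str) -> RequestContext:
    ctx = RequestContext(url)
    for step in PIPELINE:
        step(ctx)
    return ctx
```

Because the context item is set in place, the visitor gets the new page at the old URL without any redirect.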

I recently wrote another article on this topic: a custom 404 handler that searches for the item instead of using a custom link database. It comes with source code as a Sitecore package and makes a good starting point for implementing the “link database lookup” functionality: http://larsnielsen.blogspirit.com/archive/2007/10/17/modi...

13:35 Posted in Sitecore

Comments

Hi Lars

While this suggestion is great for usability, it has some drawbacks one should be aware of, best documented in the following article: http://www.seobythesea.com/?p=212

The problem is that you get duplicate content, which may cause search engines to pick the wrong one of the two URLs to store in their index. From an SEO perspective, the consensus is that a 301 (permanent) redirect is the best way to handle old URLs like this. Still, your main idea of a link database is really important, as you want both users and search bots to be redirected to the best possible page.

Cheers... Jesper

Posted by: JesperJørgensen | Wednesday, 23 January 2008

I don't think the solution should link to the old item itself, but only store the older links for a while.

Posted by: Lars Nielsen | Thursday, 24 January 2008