Friday, 11 January 2008
Converting sites to Sitecore: How to match old URLs to new URLs
When you convert an existing web site into a new (Sitecore) web site, much of the content will be similar. Very often, however, the older site has a different data structure and a different way of parsing incoming requests. For example, a traditional server page request could look something like http://mysite/productdetail.jsp?cat=fruits&product=ap..., while in the new structure it would look something like http://mysite/products/fruits/apple.aspx.
Links already stored in people's browsers, or links in printed materials, are at risk of breaking because they still point to the page by its older URL. Internal links are also often not converted properly during the conversion, which results in broken links as well.
Idea:
To solve this challenge, I recommend that you build a link database when you start the conversion process, storing each older URL with a reference to the new item. Instead of storing the physical path to the item, store only its unique ID, because items may be moved around in the data structure after conversion.
| OldURL | NewID |
| http://mysite/productdetail.jsp?cat=fruits&product=ap... | {0d150c93-7ea9-4b39-98f3-8a64a554c56d} |
| http://mysite/productdetail.jsp?cat=fruits&product=ba... | {6c480bd5-2ab3-496f-93f5-121310b9c67a} |
| : | : |
Whenever a page request reaches the server, Sitecore attempts to look up the item. If it does not exist, Sitecore returns an error code, 404 page not found. This is where you want to hook in: look up the requested URL in the OldURL column and, if it exists, direct the request to the new item by looking up the GUID. It’s as simple as that.
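The lookup itself can be sketched in a few lines of C#. The class name OldUrlMap, its methods and the in-memory dictionary are assumptions made for illustration; in a real conversion the pairs would come from wherever you keep the link database.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the link database (OldURL -> NewID).
public class OldUrlMap
{
    // Old URLs are compared case-insensitively. That is an assumption,
    // not something every old site guarantees.
    private readonly Dictionary<string, Guid> map =
        new Dictionary<string, Guid>(StringComparer.OrdinalIgnoreCase);

    // Registers (or overwrites) the new item ID for an old URL.
    public void Add(string oldUrl, Guid newId)
    {
        map[oldUrl] = newId;
    }

    // Returns true and the item ID when the old URL is known.
    public bool TryGetNewId(string oldUrl, out Guid newId)
    {
        return map.TryGetValue(oldUrl, out newId);
    }
}
```

Because only the ID is stored, the entry stays valid when the item is later moved or renamed in the content tree.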
How?
There are a number of ways to do this. The most common one is to hook into the 404 error page and create your own page that redirects to the new page. However, this is not always desirable, as it requires either a client-side redirect or a server-side redirect, and in either case it can make it harder to return the correct status codes. Instead, I recommend that you hook into the Sitecore HttpRequest pipeline.
What’s the HttpRequest pipeline?
The Sitecore HttpRequest pipeline is a series of discrete steps that are executed on every page request. The purpose of the pipeline is to populate the Sitecore.Context object with information that is used elsewhere in the page assembly process. For example, the pipeline collects information on the current user, the selected language, the site to use (you may have multiple sites in a Sitecore installation) and the current item. The current item is the item that is resolved from the requested URL. If the item cannot be determined, the Sitecore.Context.Item property will be null.
If you wish to set the current item to something else, the proper place to do so is right after the ItemResolver step in the HttpRequest pipeline. Simply create a .NET class that checks whether Sitecore.Context.Item is null and, if it is, searches the link database to see if the requested URL matches a value in the OldURL column. If so, look for the item that matches NewID (Sitecore.Context.Database.GetItem(ID)) and, if it is found, set the context item to that item.
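A minimal sketch of such a class could look like the following. The namespace MySite.Pipelines, the class name OldUrlResolver and the static dictionary are hypothetical, and the sketch assumes that OldURL was stored as the absolute request URL. It needs the Sitecore.Kernel assembly and has to be registered after the ItemResolver processor in the httpRequestBegin section.

```csharp
using System;
using System.Collections.Generic;
using System.Web;
using Sitecore.Data;
using Sitecore.Data.Items;
using Sitecore.Pipelines.HttpRequest;

namespace MySite.Pipelines
{
    // Hypothetical processor that maps old URLs to new items.
    public class OldUrlResolver
    {
        // Stand-in for the link database (OldURL -> NewID as a GUID string).
        // Fill it from your own storage; it is empty in this sketch.
        private static readonly Dictionary<string, string> OldUrlToNewId =
            new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        public void Process(HttpRequestArgs args)
        {
            // Only act when the item was not resolved and a database is available.
            if (Sitecore.Context.Item != null || Sitecore.Context.Database == null)
            {
                return;
            }

            // Assumption: the stored OldURL is the full URL including the query string.
            string requested = HttpContext.Current.Request.Url.ToString();

            string newId;
            if (!OldUrlToNewId.TryGetValue(requested, out newId))
            {
                return; // Unknown URL: leave the context item null.
            }

            // Look the item up by ID, so it is found even if it has been moved.
            Item item = Sitecore.Context.Database.GetItem(new ID(newId));
            if (item != null)
            {
                Sitecore.Context.Item = item;
            }
        }
    }
}
```

When the URL is not in the table, the context item stays null, so whatever normally handles a missing item still runs.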
I have recently written another article on this topic: a custom 404 handler with an example that searches for the item instead of using a custom link database. The article comes with source code as a Sitecore package and could easily serve as a starting point for implementing the “link database lookup” functionality: http://larsnielsen.blogspirit.com/archive/2007/10/17/modi...
13:35 Posted in Sitecore | Tags: Sitecore, HttpRequest, pipelines
Wednesday, 17 October 2007
Modifying the HttpRequest pipeline, custom 404 handler
Today, during level 2 training, I got an idea while describing the Sitecore HttpRequest pipeline. This pipeline is executed every time a request is made, whether it is for a web page, a web service or the AJAX layer.
The purpose of the request pipeline is to populate the Sitecore context with information on the current request, such as the visiting device, security settings and the requested item.
I decided to do a small example of pipeline usage while the participants were working hard to complete a lab. In this example I insert a class right after the pipeline has checked whether the requested item exists. If it does not, I switch to another Sitecore item, which is the “page not found” page. As a small addition, I parse the requested URL, search the content structure for matching alternative items and display a description, something like:
Page not found.
The page you were looking for “xyz” was not found. However, you may want to visit the following topics:
- Topic1
- :
This example is nowhere near perfect: you should rather use a real search engine to find the item, and that is what I recommend if you decide to use this approach.
HttpRequest pipeline
As described above, the HttpRequest pipeline is executed on every single request to Sitecore, and it populates the Sitecore.Context object with information such as the current user, security, device, database and the requested item. The ItemResolver entry in the pipeline below finds the requested item. If the item does not exist, my class, inserted right after it, sets a new context item instead of letting Sitecore return the normal 404 page (404 page not found):
<httpRequestBegin>
  :
  <processor type="Sitecore.Pipelines.HttpRequest.ItemResolver, Sitecore.Kernel" />
  <processor type="Sitecore.Examples.PageNotFound, PageNotFound" method="ProcessCustomPageNotFound" />
  :
</httpRequestBegin>
The Sitecore.Examples.PageNotFound class contains the following code:
using Sitecore.Data.Items;

namespace Sitecore.Examples {
    public class PageNotFound {
        // Called by the pipeline right after ItemResolver.
        public void ProcessCustomPageNotFound(
            Pipelines.HttpRequest.HttpRequestArgs args) {
            // A null context item means ItemResolver found nothing.
            if (Context.Item == null) {
                // Only handle the published site, not internal Sitecore items.
                if (Context.Database != null &&
                    Context.Database.Name == "web") {
                    Item pageNotFoundPage =
                        Context.Database.GetItem(
                            "/sitecore/content/home/global/pagenotfound");
                    if (pageNotFoundPage != null) {
                        Context.Item = pageNotFoundPage;
                    }
                }
            }
        }
    }
}
As you can see, I simply check whether Sitecore has found an item (that is, whether the context item is null). If it has not, I do a quick check that the database is the web database (we don’t want to show this page for internal Sitecore items). I then look up the page-not-found item in the Sitecore content structure; that is the page that actually displays the error. Make sure you have a Sitecore item (with an assigned layout) at that location. In this example the item is located at /sitecore/content/home/global/pagenotfound.
As a small treat, I have also created a sublayout that searches the Sitecore data structure for items that may match the URL the user entered. I realize this way of searching is nowhere near optimal, so I recommend that you refine it should you choose this approach and use this code.
The sublayout contains a single label, looking something like this:
<%@ Control Language="C#" AutoEventWireup="true"
    CodeBehind="PageNotFoundDetails.ascx.cs"
    Inherits="PageNotFound.layouts.WebUserControl1" %>
<%@ Register TagPrefix="sc"
    Namespace="Sitecore.Web.UI.WebControls"
    Assembly="Sitecore.Kernel" %>
<asp:Label ID="lblAbstract" CssClass="normal"
    runat="server"></asp:Label>
With this code-behind:
using System;
using System.Text;
using System.Web;
using Sitecore.Data.Items;
using Sitecore.Web;

namespace PageNotFound.layouts {
    public partial class WebUserControl1 :
        System.Web.UI.UserControl {
        protected void Page_Load(object sender, EventArgs e) {
            // Tell browsers and spiders that the page does not exist.
            Response.StatusCode = 404;
            Response.Status = "404 Page not found";

            // Last part of the requested URL, for example "apple".
            string searchedfor = WebUtil.GetUrlName(0);

            StringBuilder stringBuilder = new StringBuilder();
            // HTML-encode the name, as it comes straight from the request.
            stringBuilder.Append(
                "<h1>The requested page, '" +
                HttpUtility.HtmlEncode(searchedfor) +
                "' could not be found.</h1>");

            // Substring(0, 2) throws on names shorter than two characters,
            // and a quote would break the query, so skip the search then.
            Item[] found = null;
            if (searchedfor != null && searchedfor.Length >= 2 &&
                searchedfor.IndexOf('\'') < 0) {
                string sPath =
                    "/sitecore/content/home//*[startswith(@@key,'" +
                    searchedfor.Substring(0, 2) + "')]";
                found = Sitecore.Context.Database.SelectItems(sPath);
            }

            if (found != null && found.Length > 0) {
                stringBuilder.Append(
                    "<h3>However, you might find these interesting:</h3>");
                stringBuilder.Append("<ul>");
                foreach (Item item in found) {
                    stringBuilder.Append("<li><a href='" +
                        item.Paths.GetFriendlyUrl()
                        + "'>" + item.Name + "</a></li>");
                }
                stringBuilder.Append("</ul>");
            }
            lblAbstract.Text = stringBuilder.ToString();
        }
    }
}
A few comments on this code:
You may notice that I set the Response.StatusCode and Response.Status to 404 and a matching description.
Response.StatusCode = 404;
Response.Status = "404 Page not found";
This is because when a visiting spider follows a link to a page that no longer exists and ends up on this page, the URL should be registered as a page that does not exist and therefore be removed from the index.
The following line extracts the last part of the requested URL:
string searchedfor = WebUtil.GetUrlName(0);
For example, it would return apple from this URL request: http://localhost/products/apple.aspx.
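To make the result concrete, the same value can be reproduced with plain .NET calls. This is only an illustrative stand-in for the example above, not Sitecore's implementation of WebUtil.GetUrlName; the class and method names are made up.

```csharp
using System;
using System.IO;

// Hypothetical helper that mimics the result described above.
public static class UrlNameSketch
{
    // Returns the last path segment of a URL without its extension.
    public static string LastUrlName(string url)
    {
        // For http://localhost/products/apple.aspx this is "/products/apple.aspx".
        string path = new Uri(url).AbsolutePath;
        // Strip the directory part and the ".aspx" extension.
        return Path.GetFileNameWithoutExtension(path);
    }
}
```

Calling UrlNameSketch.LastUrlName("http://localhost/products/apple.aspx") returns "apple".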
I also use a Sitecore query to find matching items in the entire content structure, by taking the first two letters of the name extracted above and matching items whose key starts with them:
string sPath =
"/sitecore/content/home//*[startswith(@@key,'" +
searchedfor.Substring(0, 2) + "')]";
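Building the query this way fails for names shorter than two characters, because Substring(0, 2) throws. A slightly more defensive variant could look like this sketch. The class and method names are hypothetical, and lower-casing the prefix assumes that @@key holds the lower-cased item name.

```csharp
using System;

// Hypothetical helper that builds the suggestion query.
public static class SuggestionQuery
{
    // Returns the Sitecore query string, or null when no sensible
    // query can be built from the requested name.
    public static string Build(string searchedFor)
    {
        // Substring(0, 2) would throw on shorter names.
        if (string.IsNullOrEmpty(searchedFor) || searchedFor.Length < 2)
        {
            return null;
        }

        string prefix = searchedFor.Substring(0, 2).ToLowerInvariant();

        // A single quote would terminate the string literal in the query.
        if (prefix.IndexOf('\'') >= 0)
        {
            return null;
        }

        return "/sitecore/content/home//*[startswith(@@key,'" + prefix + "')]";
    }
}
```

For the name apple this produces /sitecore/content/home//*[startswith(@@key,'ap')], and the caller simply skips the search when the result is null.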
Download file (Sitecore package with project file and installation instructions):
12:52 Posted in Sitecore | Tags: Sitecore, httprequest, httpmodule, 404 handler


