Tuesday, 09 January 2007

Sitecore - Avoiding query string in dynamic URL

This article illustrates how to create an application that dynamically receives parameters from a previous page without using query string parameters.

As most of you know, you should try to avoid dynamically generated URLs that use a query string. Search engine robots may have difficulties with this kind of URL: they may stop at the question mark and never look at the query string. This means that the dynamically generated pages of your web site may not be indexed.

In some cases this makes sense: you do not want a visiting search spider to accidentally index orders or other pages generated for an individual user. However, search engines that do not index URLs with question marks (dynamic pages) were more dominant in the past. Google, MSN Search and Yahoo, for example, no longer have problems with URLs with question marks.

However, even though search engines that skip dynamic pages are less of a problem now than in the past, it still makes good sense to avoid query strings. First of all, they do not look good, and they may break the link when it is sent by email.

Sitecore’s default URL parsing behavior usually eliminates the need for query string parameters. For example, the ‘/home/products/fruits/apple’ item in the data structure can be requested by the URL ‘/products/fruits/apple.aspx’.

Nevertheless, some solutions do require passing variables to pages, because some data may not reside in the Sitecore data structure. Tree structures are suitable for real content, while non-hierarchical data such as purchase orders may reside in a traditional relational database.

Consider a common scenario: the web site of the company SC Printers contains traditional data structures such as company, services and contact information, as well as a product hierarchy. The web site also has a section where visitors can browse the order data, which is selected from a SQL database.

For example, the URL to generate a list of all orders in France on a specific date, 24 March 1998, may look something like:

http://localhost/Company/Orders.aspx?country=France&d...

The data structure would look something like:

medium_TraditionalDataStructure.2.JPG

The Orders node would have a rendering applied which parses the query string, passes the values on to the SQL server, executes the request and outputs the data. The rendering can be an .ascx user control, a web control or an XSLT rendering using a custom XSLT extension. For example, the following code in a .NET class added as an XSLT extension:

   
public System.Xml.XPath.XPathNodeIterator GetOrders(
  string country, string date) {

  string connection =
    "server=localhost;database=Northwind;uid=sa;pwd=''";

  // NOTE: the values are concatenated straight into the SQL text,
  // so they must be trusted or validated before they get here
  string query =
    "SELECT * FROM ORDERS where (ShipCountry = '" +
    country + "') AND (OrderDate = '" +
    Sitecore.DateUtil.FormatIsoDate(date, "MM-dd-yyyy") +
    "') for xml auto";

  // The using block closes the connection when we are done
  using (System.Data.SqlClient.SqlConnection con =
    new System.Data.SqlClient.SqlConnection(connection)) {

    System.Data.SqlClient.SqlCommand cmd =
      new System.Data.SqlClient.SqlCommand(query, con);

    con.Open();

    System.Xml.XmlReader xmlread = cmd.ExecuteXmlReader();

    // XPathDocument reads the whole result into memory
    System.Xml.XPath.XPathDocument xpathdoc =
      new System.Xml.XPath.XPathDocument(xmlread,
      System.Xml.XmlSpace.Preserve);

    return xpathdoc.CreateNavigator().Select(".");
  }
}


The above function executes a traditional SQL statement against the Northwind database, which ships as a sample database with MS SQL Server and Access. As it’s an XSLT extension, it returns an XPathNodeIterator for parsing in an XSLT file:


<xsl:template match="*" mode="main">
  <xsl:variable name="paramDate" select="sc:qs('date')" />
  <xsl:variable name="paramCountry" select="sc:qs('country')" />

  <h1>
    Displaying all orders in
    <xsl:value-of select="$paramCountry"/>
    the following date:
    <xsl:value-of
      select="sc:formatdate($paramDate, 'MMM dd. yyyy')"/>
  </h1>

  <xsl:variable name="queryresult"
    select="webutil:GetOrders($paramCountry, $paramDate)" />

  <xsl:if test="$queryresult">
    <xsl:for-each select="$queryresult/ORDERS">
      <div class="normal">
        CustomerID:<xsl:value-of select="@CustomerID"/>
      </div>
      <div class="normal">
        OrderDate:<xsl:value-of select="@OrderDate"/>
      </div>
      <hr />
    </xsl:for-each>
  </xsl:if>

</xsl:template>
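
One caveat about the GetOrders method shown earlier: it builds the SQL text by concatenating values that come straight from the request. A more defensive variant would pass them as SQL parameters. The following is only a sketch under a few assumptions: the class name OrderQueries and the method BuildOrdersCommand are hypothetical, and the date is assumed to arrive as yyyyMMdd (for example 19980324) and is parsed with the .NET framework instead of Sitecore.DateUtil.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;
using System.Globalization;

public static class OrderQueries
{
    // Hypothetical helper, not part of the original extension class.
    // Builds the same Northwind query, but with parameters instead of
    // string concatenation.
    public static SqlCommand BuildOrdersCommand(string country, string isoDate)
    {
        // Assumption: the date is yyyyMMdd. ParseExact throws a
        // FormatException for anything else, so bad input never
        // reaches the database.
        DateTime orderDate = DateTime.ParseExact(
            isoDate, "yyyyMMdd", CultureInfo.InvariantCulture);

        SqlCommand cmd = new SqlCommand(
            "SELECT * FROM ORDERS " +
            "WHERE ShipCountry = @country AND OrderDate = @orderDate " +
            "FOR XML AUTO");

        // The values travel separately from the SQL text
        cmd.Parameters.Add("@country", SqlDbType.NVarChar).Value = country;
        cmd.Parameters.Add("@orderDate", SqlDbType.DateTime).Value = orderDate;

        return cmd;
    }
}
```

To use it, assign an open SqlConnection to cmd.Connection and call cmd.ExecuteXmlReader() as before.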


What I have shown so far is pretty standard. But what happens if you wish to eliminate the parameters? Sitecore supplies a method for passing parameters as part of the traditional URL without the use of query string. This method is called wildcard nodes.

Wildcard nodes
A wildcard node is a node in the data structure that matches all requests on the given level if no other node on that level matches. A Sitecore wildcard node is always named *.

For example, the wildcard node in the following data structure would match all URLs on this level:

medium_WildcardnodeDatastructure.JPG

E.g. http://localhost/Company/France.aspx and http://localhost/Company/Germany.aspx would both match the * node.

Wildcard nodes can even match a hierarchy:

medium_WildcardnodeDatastructureHiearcy.JPG

You can probably see where I’m going with this. This data structure would match the following URL: http://localhost/Company/France/19980324.aspx
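
The matching rule described above (an exact name wins, otherwise the * node on that level is used) can be illustrated with a toy content tree. This is not Sitecore's implementation or item API. The Node and WildcardResolver classes are made up for this sketch, and the tree layout in the usage note is an assumption based on the example URLs.

```csharp
using System;
using System.Collections.Generic;

// Illustration only: a toy content tree, not Sitecore's item API.
public class Node
{
    public string Name;
    public Dictionary<string, Node> Children =
        new Dictionary<string, Node>(StringComparer.OrdinalIgnoreCase);

    public Node(string name) { Name = name; }

    // Adds a child and returns it, so calls can be chained
    public Node Add(string name)
    {
        Node child = new Node(name);
        Children[name] = child;
        return child;
    }
}

public static class WildcardResolver
{
    // Walks the path one segment at a time. An exact child name wins;
    // otherwise a child named "*" is used; otherwise there is no match.
    public static Node Resolve(Node root, string path)
    {
        Node current = root;
        string[] segments = path.Split(
            new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string segment in segments)
        {
            Node next;
            if (!current.Children.TryGetValue(segment, out next) &&
                !current.Children.TryGetValue("*", out next))
            {
                return null;
            }
            current = next;
        }
        return current;
    }
}
```

With a tree built as root.Add("Company"), then company.Add("Services") and company.Add("*").Add("*"), the path /Company/Services resolves to the Services node, /Company/France/19980324 resolves to the second-level * node, and /Company/France/19980324/extra resolves to nothing.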

All that remains is to parse the URL in the code rather than read the query string. For this, I use the WebUtil class. I need to add it as an XSLT extension:


<xslExtensions>
   :
  <extension mode="on"
    type="Sitecore.Web.WebUtil, Sitecore.Kernel"
    namespace="http://www.sitecore.net/webutil"
    singleInstance="true" />
   :
</xslExtensions>
 
The method GetUrlName(Int32 index) parses the current URL and returns the part at the given index, counting from right to left, so index 0 is the last part of the URL. My XSLT would look something like this:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:sc="http://www.sitecore.net/sc"
  xmlns:dot="http://www.sitecore.net/dot"
  xmlns:webutil="http://www.sitecore.net/webutil"
  exclude-result-prefixes="dot sc webutil">

<!-- output directives -->
<xsl:output method="html" indent="no" encoding="UTF-8" />

<!-- parameters -->
<xsl:param name="lang" select="'en'"/>
<xsl:param name="id" select="''"/>
<xsl:param name="sc_item"/>
<xsl:param name="sc_currentitem"/>

<!-- entry point -->
<xsl:template match="*">
  <xsl:apply-templates select="$sc_item" mode="main"/>
</xsl:template>

<!--=============================-->
<!-- main                                                  -->
<!--=============================-->
<xsl:template match="*" mode="main">

  <xsl:variable name="paramDate"
    select="webutil:GetUrlName(0)" />

  <xsl:variable name="paramCountry"
    select="webutil:GetUrlName(1)" />

  <h1>
    Displaying all orders in
    <xsl:value-of select="$paramCountry"/>
    the following date:
    <xsl:value-of
      select="sc:formatdate($paramDate, 'MMM dd. yyyy')"/>
  </h1>

  <xsl:variable name="queryresult"
    select="webutil:GetOrders($paramCountry, $paramDate)" />

  <xsl:if test="$queryresult">

    <xsl:for-each select="$queryresult/ORDERS">

      <div class="normal">
        CustomerID:<xsl:value-of select="@CustomerID"/>
      </div>

      <div class="normal">
        OrderDate:<xsl:value-of select="@OrderDate"/>
      </div>

      <div class="normal">
        ShipName:<xsl:value-of select="@ShipName"/>
      </div>

      <div class="normal">
        ShipAddress:<xsl:value-of select="@ShipAddress"/>
      </div>

      <hr />

    </xsl:for-each>

  </xsl:if>

</xsl:template>
</xsl:stylesheet>


That gives the following output:

medium_resultscreenshot.JPG

That’s really it. Mission accomplished! Or is it?


Temporary GetUrlName fix

While writing this article I noticed that the GetUrlName method fails when requesting any index other than 0. This is a Sitecore bug and will be fixed in the next version. I worked around it by adding another method to my XSLT extension (same name, but fixed):


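public string GetUrlName(Int32 index) {

  // e.g. "/Company/France/19980324.aspx"
  string url = Sitecore.Web.WebUtil.GetRawUrl();

  // Strip the leading slash and the extension (if there is one)
  int dot = url.LastIndexOf(".");
  string path = dot > 0 ? url.Substring(1, dot - 1) : url.Substring(1);

  string[] parts = path.Split("/".ToCharArray());

  // Index 0 is the rightmost part of the URL
  if (index >= 0 && index < parts.Length)
    return parts[parts.Length - 1 - index];
  else
    return "";
}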
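
To check the right-to-left indexing without a running Sitecore instance, here is a standalone sketch of the same idea. It is not the Sitecore method: the UrlParts class is hypothetical, the raw URL is passed in as an argument instead of being read from WebUtil.GetRawUrl(), and stripping the query string is an addition of mine.

```csharp
using System;

public static class UrlParts
{
    // Hypothetical standalone variant: the raw URL is an argument,
    // so the method can be called from a test or a console program.
    public static string GetUrlName(string rawUrl, int index)
    {
        if (string.IsNullOrEmpty(rawUrl) || index < 0)
            return "";

        // Drop any query string first
        int queryStart = rawUrl.IndexOf('?');
        string path = queryStart >= 0
            ? rawUrl.Substring(0, queryStart)
            : rawUrl;

        // Drop the extension, but only if the dot is in the last segment
        int lastSlash = path.LastIndexOf('/');
        int lastDot = path.LastIndexOf('.');
        if (lastDot > lastSlash)
            path = path.Substring(0, lastDot);

        string[] parts = path.Split(
            new char[] { '/' }, StringSplitOptions.RemoveEmptyEntries);

        // Index 0 is the rightmost part
        return index < parts.Length ? parts[parts.Length - 1 - index] : "";
    }
}
```

For /Company/France/19980324.aspx, index 0 gives 19980324, index 1 gives France, index 2 gives Company, and any higher index gives an empty string.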


Robots.txt

A final note to this article: as I wrote previously, sometimes it does make sense to prevent search engines from indexing the order data. This can easily be achieved by modifying robots.txt (which resides in the root of the web site). Simply add the following line:

Disallow: /customers/

This article applies to the Sitecore 4.x series as well as Sitecore 5.3 and later.

Comments

Excellent post. I really did forget about wildcard nodes, and never had a chance to use them

Posted by: Alexey Rusakov | Tuesday, 09 January 2007
