As you're all no doubt aware, ASP.NET MVC recently went RTM. This brings the MVC style of coding, made very popular by Ruby on Rails, to the ASP.NET world. I've been eager to start using MVC for months, but I've been holding off until I knew the API was locked down so I wouldn't have to change anything.
Unfortunately, like WebForms, MVC has some "issues" with regard to duplicate content, making it not all that SEO-friendly.
What do you mean, Duplicate Content?
Duplicate content is just that - the same content repeated on multiple pages/sites. This might not sound like a big deal, but it's not something search engines like. They don't want the search results to show the same content multiple times across different websites, so they often penalise or hide duplicate content. Additionally, if you have two pages with the same content, your inbound links might become split between the two - reducing the PageRank passed to either.
What's this got to do with ASP.NET MVC?
Unfortunately ASP.NET MVC makes it easy to have the same content indexed multiple times. I've listed the main problems below.
Case-Sensitivity. In ASP.NET (or rather IIS and Windows), URLs are not case sensitive. That means you can write Default.asp, default.asp or even DeFaUlT.aSp and still get the same page. While you'll probably stick to the same case within your website, it wouldn't be hard for someone to create links to your site with different casing (e.g. they might have CAPS LOCK turned on).
Default Documents. Most websites have a default document set up to serve when a filename is not provided in the request. E.g. http://mydomain.com/ might actually serve up http://mydomain.com/default.asp, but it won't tell the browser that's what it did. It will serve it up as if the two are different URLs.
Trailing Slashes. While the above problems are general ASP.NET/IIS issues, trailing slashes are something that only really becomes a problem with MVC or other URL rewriting/routing. In ASP.NET, if you requested http://mydomain.com/files and you had a folder named files, IIS would issue a redirect to mydomain.com/files/. However, in ASP.NET MVC the URL routing will treat requests with a trailing slash the same as requests without one. So http://mydomain.com/controller/action is exactly the same as http://mydomain.com/controller/action/ and therefore results in duplicate content.
Query Strings. Query strings can be a big problem for duplicate content. Imagine you can add ?sort=field to the end of your page URL to have a table re-ordered. To a search engine this looks like another page, but the content is mostly the same. Fortunately, ASP.NET MVC doesn't really use query strings thanks to the excellent URL routing.
So, what can we do?
Lowercase URLs. We can force all requests to our application to be lowercase by catching them in BeginRequest in Global.asax and redirecting to the lowercase version if they contain any uppercase characters.
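As a minimal sketch of that check, the helper below decides whether a request needs redirecting and, if so, where to. The class and method names (LowercaseUrls, GetRedirectTarget) are made up for illustration, and the choice to leave the query string untouched is an assumption, since query values can legitimately be case-sensitive.

```csharp
using System;
using System.Linq;

public static class LowercaseUrls
{
    // Returns the URL to redirect to, or null when the path is already lowercase.
    // Only the path is lowercased; the query string is passed through unchanged
    // (an assumption of this sketch).
    public static string GetRedirectTarget(Uri requestUrl)
    {
        string path = requestUrl.AbsolutePath;

        // Nothing to do if the path contains no uppercase characters.
        if (!path.Any(char.IsUpper))
        {
            return null;
        }

        return requestUrl.GetLeftPart(UriPartial.Authority)
            + path.ToLowerInvariant()
            + requestUrl.Query;
    }
}
```

In Global.asax, Application_BeginRequest could pass Request.Url to this helper and, when the result is not null, respond with a 301 status and a Location header pointing at it (on .NET 4 and later, Response.RedirectPermanent does both). It is probably wise to apply this to GET requests only, since redirecting a POST would lose the form data.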
Now if anyone requests a URL with uppercase characters, they'll be redirected with a 301 redirect. This works great, but we have a problem. All URLs generated internally by MVC will continue to use Action and Controller names in Pascal case (assuming that's how your classes are named). This means every link within our site will cause two requests (the first being a redirect). To fix this, we can override the default behaviour for creating URLs. We'll create a new extension method for the RouteCollection class called MapRouteLowercase which, instead of creating a Route, will create an instance of a new class called LowercaseRoute. This class will override the GetVirtualPath method to lowercase the URL before passing it back. I can't take credit for this code; I pretty much just copied it from Graham O'Neale's blog.
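A minimal sketch of such a pair of classes follows, written against System.Web.Routing and System.Web.Mvc. It illustrates the approach rather than reproducing the code referred to above, and the extension class name (LowercaseRouteExtensions) is made up.

```csharp
using System.Web.Mvc;
using System.Web.Routing;

// A Route that lowercases every URL it generates.
public class LowercaseRoute : Route
{
    public LowercaseRoute(string url, RouteValueDictionary defaults,
                          RouteValueDictionary constraints, IRouteHandler routeHandler)
        : base(url, defaults, constraints, routeHandler)
    {
    }

    public override VirtualPathData GetVirtualPath(RequestContext requestContext,
                                                   RouteValueDictionary values)
    {
        VirtualPathData path = base.GetVirtualPath(requestContext, values);

        // base returns null when this route cannot generate a URL for the values.
        if (path != null)
        {
            // Caveat: this also lowercases any query string that routing
            // appends for values not present in the URL pattern.
            path.VirtualPath = path.VirtualPath.ToLowerInvariant();
        }

        return path;
    }
}

public static class LowercaseRouteExtensions
{
    public static Route MapRouteLowercase(this RouteCollection routes, string name,
                                          string url, object defaults)
    {
        return routes.MapRouteLowercase(name, url, defaults, null);
    }

    public static Route MapRouteLowercase(this RouteCollection routes, string name,
                                          string url, object defaults, object constraints)
    {
        var route = new LowercaseRoute(
            url,
            new RouteValueDictionary(defaults),
            new RouteValueDictionary(constraints),
            new MvcRouteHandler());

        routes.Add(name, route);
        return route;
    }
}
```

A call might then look like routes.MapRouteLowercase("Default", "{controller}/{action}/{id}", new { controller = "Home", action = "Index", id = "" }), mirroring the arguments of the standard MapRoute call.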
You can put these classes anywhere. Because MapRouteLowercase is an extension method, you can just call it on the RouteCollection class in place of the existing MapRoute call in your Global.asax.
Default Documents. While this issue doesn't affect MVC in the same way, there's a very similar problem. In ASP.NET MVC the default routing is {controller}/{action} but it sets a default action of Index. That means on a newly-created project, both /Home/Index and /Home will serve up the same content.
To work around this, and provide some nicer URLs, I changed the routing a little so that my default actions were mapped to the root and a separate route dealt with the homepage (which accepts pages, to allow browsing to older posts).
Trailing Slashes. To avoid trailing slashes and a few other minor issues (such as people adding /1 to a URL to get page 1, which is served up without the /1), I added some additional rules to my Global.asax.
This seems to stop many of the issues I came up with. However, the double slash seems to be passed through (in AbsolutePath) as a single slash here (Vista/IIS7), so that rule doesn't work. I've left it in just in case this behaves differently on other web servers.
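The exact routes depend on the site, but a rough sketch of this kind of arrangement might look like the following. The route names, the {page} parameter and the class name RouteConfigSketch are all assumptions made for illustration; only MapRoute itself comes from System.Web.Mvc.

```csharp
using System.Web.Mvc;
using System.Web.Routing;

public static class RouteConfigSketch
{
    public static void RegisterRoutes(RouteCollection routes)
    {
        // Homepage with an optional numeric page: "/" (page 1), "/2", "/3", ...
        routes.MapRoute(
            "Home",
            "{page}",
            new { controller = "Home", action = "Index", page = 1 },
            new { page = @"\d+" });

        // Other actions on the default controller live at the root, e.g. "/about".
        routes.MapRoute(
            "RootActions",
            "{action}",
            new { controller = "Home" });
    }
}
```

Two gaps remain in a setup like this: /index would still match the second route and serve the homepage again, and /1 would serve the same content as /. Each would need a constraint or a redirect to close.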
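Rules of the kind described above can be sketched as a single function that maps a request path to its canonical form. The names (PathRules, GetCanonicalPath) are hypothetical, and the "/1" rule assumes that a trailing 1 is always a page number, which would be wrong on a site where it could be an ID.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PathRules
{
    // Returns the canonical form of a request path,
    // or null if the path is already canonical.
    public static string GetCanonicalPath(string path)
    {
        // Collapse runs of slashes: "/blog//post" -> "/blog/post".
        string result = Regex.Replace(path, "/{2,}", "/");

        // Drop trailing slashes, except for the root itself.
        if (result.Length > 1)
        {
            result = result.TrimEnd('/');
        }

        // Drop a trailing "/1": page 1 is assumed to be served without it.
        if (result.EndsWith("/1", StringComparison.Ordinal))
        {
            result = result.Substring(0, result.Length - 2);
        }

        // "/1" on its own reduces to the root.
        if (result.Length == 0)
        {
            result = "/";
        }

        return result == path ? null : result;
    }
}
```

Application_BeginRequest could call this with Request.Url.AbsolutePath and issue a 301 redirect whenever the result is not null. As noted above, the double-slash rule may never fire if the server has already collapsed the slashes before the application sees the path.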
Is there anything else I should do?
As of February, Google, Yahoo and Microsoft Live Search support a new canonical tag (a link element with rel="canonical", placed in the page's head). This allows you to specify on a page that this page is duplicate content and that any incoming links should instead be attributed to another page. If your site has query strings or other potential for multiple requests to serve up the same content, I would recommend inserting this tag to make sure the search engines choose your preferred page.
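For illustration, here is a small helper that builds the tag for a given preferred URL. The class and method names are made up, and it uses System.Net.WebUtility for encoding so that the sketch compiles on its own; inside an ASP.NET application, HttpUtility.HtmlAttributeEncode would be the more natural choice.

```csharp
using System;
using System.Net;

public static class CanonicalLink
{
    // Builds the <link> element to place inside the page's <head>.
    // The URL is HTML-encoded so that characters such as '&' in a
    // query string produce valid markup.
    public static string Build(string canonicalUrl)
    {
        return "<link rel=\"canonical\" href=\""
            + WebUtility.HtmlEncode(canonicalUrl)
            + "\" />";
    }
}
```

A page reachable at both /Home/Index and /home would emit the same tag in each case, pointing at whichever URL you have chosen as the preferred one.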