Archive for January 2010

22 January 2010

Open Sourcing Google Wave Notifier

Yesterday I finally got around to doing something I've been planning to do for a number of weeks: I uploaded Google Wave Notifier to Google Code. From today, Google Wave Notifier is Open Source!

In hindsight, I should have done this much sooner. The app is now very stable and contains all the features that I planned to implement (and more!). I'm still getting lots of feature requests, and I really wish I could implement them all. However, in reality, I just don't have time. Now that the code is shared with the world, these features need not go unimplemented!

I still intend to work through bugs and some feature requests myself, though it's unlikely to be at the rate of previous releases. With help from the community, hopefully we'll still see regular releases and new functionality.

For more info, or to download the source, please visit the Google Wave Notifier page on Google Code.

17 January 2010

Generic 301 Redirection Script for Google App Engine

Although this post is about writing a redirect script for App Engine, it doesn't require that any of the sites are hosted on App Engine, so it could be useful to you even if you're hosting .NET websites elsewhere, but need to handle redirecting old domains.

If you believe what Derek Says, I change my domain every 5 minutes. While this might be a slight exaggeration, I have moved many domains recently and needed to deal with the usual problems this brings: search engine rankings and existing inbound links.

301 vs 302 Redirects

The most common type of redirect you'll see on the web is a "302 Temporary Redirect". This is what most frameworks output when you redirect (eg. Response.Redirect() in ASP/.NET or self.redirect() in App Engine). The "Temporary" part means the redirect is a one-off: clients should not assume it will always be served, and should keep requesting the original URL. This is handy, for example, for redirecting a user back to a page after logging in to your site. Since this page may be different each time, the redirect is not fixed.

The other type of redirect, 301, is a "Permanent Redirect". Most frameworks support these redirects without you having to output your own headers. Eg. in App Engine you can use self.redirect(url, permanent=True). This type of redirect means the redirect will always go to the same link, and that clients are free to bypass your script and assume it will always go to the same place. This is the redirect that we want to use when moving domains. It tells search engines that this page has moved permanently, and that they may associate any rankings for the old page with the new page.
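
To make the difference concrete, here is a minimal sketch of what the two redirects look like at the HTTP level. It uses Python 3's standard library rather than App Engine, and build_redirect is a hypothetical helper written for illustration, not part of any framework.

```python
from http import HTTPStatus


def build_redirect(location, permanent=False):
    """Return (status_line, headers) for a redirect to `location`.

    Hypothetical helper for illustration only; a framework normally
    builds this response for you.
    """
    # 301 says the move is permanent; 302 says it is temporary
    status = HTTPStatus.MOVED_PERMANENTLY if permanent else HTTPStatus.FOUND
    status_line = '%d %s' % (status.value, status.phrase)
    return status_line, [('Location', location)]


temporary = build_redirect('http://blog.dantup.com/')
permanent = build_redirect('http://blog.dantup.com/', permanent=True)
print(temporary[0])
print(permanent[0])
```

The only difference between the two responses is the status line; the Location header is the same in both cases.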

Generic App Engine Redirect Script

So, now we know what type of redirect to use, it's time to build a script to handle our redirects. If you have multiple domains like me, it makes sense to write a script that can handle them all in one go. I've decided to set up a new App Engine app with the sole purpose of redirects for all my domains from this point forward.

Since I recently decided to move everything from dantup.me.uk to dantup.com, I had a few different domains to redirect. blog.dantup.me.uk needs to map to blog.dantup.com, wavenotifier.dantup.me.uk needs to map to wavenotifier.dantup.com and all of the unused domains (eg. dantup.com, www.dantup.com, tuppeny.com, etc.) need to map to the root of my blog.

App.yaml Setup

If we're going to be redirecting all incoming requests, we need to route all requests through our script. This is why it's important to use a new App Engine app rather than piggy-back onto an existing one. We'll set up our app.yaml file to route all requests into a script called main.py.

application: myapp-redir
version: 1
runtime: python
api_version: 1

handlers:

- url: /.*
  script: main.py

Next we need to define a way to hold all of the data we'll need to perform our redirects. As well as the old domain and the new domain, we need to know whether to map urls from the request onto the new domain, or just redirect to the root. Eg., when I moved my blog from blog.dantup.me.uk, I wanted blog.dantup.me.uk/mypost to redirect to blog.dantup.com/mypost. However I want tuppeny.com/anything to just redirect to the root, blog.dantup.com.

A dictionary seems to be a good way to store this data because we can perform lookups on the domain quickly, and we can store the new domain and a boolean for the url mapping as a tuple.

# Old Domain: New Domain, Map urls (else redirects to root)
urls = {
    'www.dantup.com': ('blog.dantup.com', False),
    'www.dantup.me.uk': ('blog.dantup.com', False),
    'www.tuppeny.com': ('blog.dantup.com', False),
    'dantup-redir.appspot.com': ('blog.dantup.com', False),
    'blog.dantup.me.uk': ('blog.dantup.com', True),
    'feeds.dantup.me.uk': ('feeds.dantup.com', True),
    'wavenotifier.dantup.me.uk': ('wavenotifier.dantup.com', True),
    'wavenotifier.tuppeny.com': ('wavenotifier.dantup.com', True),
    'go.dantup.me.uk': ('go.dantup.com', True),
    'go.tuppeny.com': ('go.dantup.com', True),
}

In addition to this mapping, we should declare a default domain, so if any requests make it to our script that don't have a mapping, we can redirect there. We'll use a 302 and also log and email this, since it's probably a mistake.

DEFAULT_URL = 'http://blog.dantup.com/'

This is all looking a little complicated, so it makes sense to build in a way to test our mappings without having to set up lots of entries in the hosts file. I've decided to declare a boolean that enables/disables testing. When testing is enabled, if you navigate to /test then it will output a bunch of URLs and the locations they'll redirect to. We'll keep a list of URLs to test in the code:

ALLOW_TEST = True

test_urls = [
    'http://www.dantup.me.uk',
    'http://www.dantup.me.uk/',
    'http://www.dantup.me.uk/blah',
    'http://www.dantup.com',
    'http://www.dantup.com/',
    'http://www.dantup.com/blah',
    'http://www.tuppeny.com',
    'http://www.tuppeny.com/',
    'http://www.tuppeny.com/blah',
    'http://blog.dantup.me.uk',
    'http://blog.dantup.me.uk/',
    'http://blog.dantup.me.uk/2010/mytest',
    'http://feeds.dantup.me.uk',
    'http://feeds.dantup.me.uk/',
    'http://feeds.dantup.me.uk/2010/mytest',
    'http://wavenotifier.dantup.me.uk',
    'http://wavenotifier.dantup.me.uk/',
    'http://wavenotifier.dantup.me.uk/2010/mytest',
    'http://wavenotifier.tuppeny.com',
    'http://wavenotifier.tuppeny.com/',
    'http://wavenotifier.tuppeny.com/2010/mytest',
    'http://go.dantup.me.uk',
    'http://go.dantup.me.uk/',
    'http://go.dantup.me.uk/mytest',
    'http://go.tuppeny.com',
    'http://go.tuppeny.com/',
    'http://go.tuppeny.com/mytest',
]

Now we've set the data up, it's time to write the code to handle the redirects. To allow for easy testing, we'll first create a method that takes a URL and returns where it should map to. This will be called by both the tests and the real redirects.

import urlparse  # Python 2 standard library module


def get_redirect_url(url):
    scheme, netloc, path, query, fragment = urlparse.urlsplit(url)

    # Discard any port number from the hostname
    netloc = netloc.split(':', 1)[0]

    # Fix empty paths to be just '/' for consistency
    if path == '':
        path = '/'

    # Check if we have a mapping for this domain
    if netloc in urls:
        # Grab the redirect info tuple
        redirect_info = urls[netloc]
        # Root redirects
        if not redirect_info[1]:
            return 'http://' + redirect_info[0] + '/'
        # Redirects with paths
        else:
            return urlparse.urlunsplit(['http', redirect_info[0], path, query, fragment])
    # No mapping, so return None
    else:
        return None

This code is fairly straightforward. It uses our mappings dictionary to look up the domain to redirect to, and whether to include the path information. Next we need to write the code that actually handles incoming requests. This will check whether test mode is enabled, and whether the request is for '/test'. If so, it will output a table using our list of test URLs above. Otherwise it will call the same method, but actually perform a redirect. If we couldn't match a domain, we'll use a 302 redirect to the default URL, and send an email/log.

    def get(self):
        # If we're allowed to test (eg. local), and requested /test, then output the test
        if ALLOW_TEST and self.request.path == '/test':
            self.response.out.write('<h1>testing</h1>')
            self.response.out.write('<table>')
            for test_url in test_urls:
                self.response.out.write('<tr><td>' + test_url + '</td><td> </td><td>' + get_redirect_url(test_url) + '</td></tr>')
            self.response.out.write('</table>')

        # Otherwise, just go ahead and redirect
        else:
            # Perform redirect
            url = get_redirect_url(self.request.url)

            if url:
                logging.info('Redirecting ' + self.request.url + ' to ' + url)
                self.redirect(url, permanent=True)

            else:
                # Log that we didn't know what this was, and redirect to a good default
                logging.error('Unable to redirect this url: ' + self.request.url)
                mail.send_mail_to_admins(
                    sender='"DanTup Redirect" <myemail@mydomain.com>',
                    subject='Redirect Script Error',
                    body='Unable to redirect this url: ' + self.request.url
                )

                # Don't do permanent (301), since we don't know what this is.
                # Move it into the dictionary above if needed
                self.redirect(DEFAULT_URL)

There's a lot of code there, but it should be fairly simple to understand. We handle the test mode by just spitting out a table of our test URLs and the redirects. We can then look over this manually to make sure everything looks correct before going live. Otherwise we work out the redirect for the current request and redirect. If no URL was found, we log and email the attempt, and redirect to the default URL. When the email comes through, we can then add the domain we missed to the mappings dictionary and specify how it should be handled.
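
The mapping logic can also be checked outside App Engine. Below is a rough Python 3 port of get_redirect_url (the listing above targets the Python 2 runtime, where the module is called urlparse) with a cut-down copy of the mappings. The sample URLs at the bottom, including the unknown.example.com host, are assumptions chosen for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Cut-down copy of the mappings: old domain -> (new domain, map urls?)
urls = {
    'www.tuppeny.com': ('blog.dantup.com', False),
    'blog.dantup.me.uk': ('blog.dantup.com', True),
}


def get_redirect_url(url):
    scheme, netloc, path, query, fragment = urlsplit(url)

    # Discard any port number from the hostname
    netloc = netloc.split(':', 1)[0]

    # Fix empty paths to be just '/' for consistency
    if path == '':
        path = '/'

    # No mapping for this domain
    if netloc not in urls:
        return None

    new_domain, map_urls = urls[netloc]
    if not map_urls:
        # Root redirect: ignore the requested path
        return 'http://' + new_domain + '/'
    # Path-preserving redirect
    return urlunsplit(('http', new_domain, path, query, fragment))


for sample in ['http://blog.dantup.me.uk/2010/mytest',
               'http://www.tuppeny.com/blah',
               'http://unknown.example.com/']:
    print(sample, '->', get_redirect_url(sample))
```

Running something like this locally gives the same kind of overview as the /test page, without having to deploy anything.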

Naked Domains on App Engine

You'll notice that "naked" versions of my domains are missing from the script. This is because App Engine doesn't support naked domains, so these are all set up as redirects in my registrar's control panel. The registrar supports 301 redirects with the same URL mapping options (eg. redirect all to root, or copy the path).

Conclusion

It didn't take much to write a simple generic redirect script, and now we can handle redirects for all domains in the future. This simply needs setting up on App Engine and any number of domains pointing at it. It's worth noting that you can point multiple domains from different Google Apps accounts at the same App Engine app. There is no requirement to use App Engine for hosting your sites in order for this script to be used. The fact that blog.dantup.com is hosted on App Engine doesn't change anything. You could redirect to an Azure site if you wished! Though you probably wouldn't want to ;-)

Full Listing

For convenience, you can get a copy of the full script.

16 January 2010

Google App Engine Benchmarks - db.put() Performance

Over the past few weeks as I've been using Google App Engine, I've come across people requesting benchmarks so they can compare App Engine performance to other solutions before they try it out. I don't really think comparing Google App Engine and its Datastore to something like Azure and SQL Server is all that useful (because you'd generally structure things very differently on each platform), but either way, it's interesting to see how things perform.

As well as comparing numbers with other platforms, I think it's worthwhile for App Engine developers to know how the different APIs perform (eg. the difference between fetching something from Memcache and the Datastore). Over the next few blog posts, I'm hoping to provide some numbers I gathered on the production App Engine servers. Please bear in mind that App Engine is still considered a "preview" and as things evolve, performance may change (hopefully for the better!).

The first set of data I gathered was on the performance of db.put(), and more specifically, the difference between calling db.put() multiple times with single entities vs calling it with a whole bunch of entities at once. It's very easy to call db.put() multiple times in a single request, but it's usually trivial to change your code to save all the entities in a single call. I thought that illustrating this difference with some pretty graphs might encourage people to use batch operations.

As always, the size and shape of your data will affect your timings. In my sample I used a small entity with only three string properties (and a key_name). You can grab a copy of the data I gathered in CSV format: App Engine db.Put() Benchmarks

db.put() Performance

Each test was run 10 times, and both the mean and median values were plotted on a chart. I did this for varying numbers of entities from 10 to 500 (since db.put() has a limit of 500 entities).
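
As a rough illustration of this kind of measurement (not the harness actually used for these numbers), the sketch below times a callable 10 times and reports the mean and median. The names time_runs, summarise and stand_in_workload are hypothetical, and the workload is a placeholder where a real test would call db.put().

```python
import statistics
import time


def time_runs(func, runs=10):
    """Call func() `runs` times and return wall-clock timings in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return timings


def summarise(timings):
    """Return the mean and median of a list of timings."""
    return {
        'mean': statistics.mean(timings),
        'median': statistics.median(timings),
    }


def stand_in_workload():
    # Placeholder for the operation being measured
    sum(range(10000))


timings = time_runs(stand_in_workload, runs=10)
summary = summarise(timings)
print(summary)
```

Reporting the median alongside the mean helps because a single slow run can pull the mean up noticeably with only 10 samples.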

Datastore db.put() performance comparison between individual and batch calls

As expected, in all cases, the batch method out-performs calling db.put() many times. Both operations scale linearly (note the first 3 points are not increments of 100, which is why the line doesn't appear straight). For very small numbers (eg. 10 or fewer) the results are very similar, but as the number of entities increases, it becomes more important to batch up your requests.
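
The two calling patterns can be sketched as follows. CountingDatastore is a hypothetical stand-in that only counts put() calls so the example runs without App Engine; it is not the real datastore API and says nothing about actual timings. The put_in_batches helper is also an assumption, showing one way to stay within the 500-entity limit.

```python
class CountingDatastore(object):
    """Hypothetical stand-in for the datastore; it only counts put() calls."""

    MAX_BATCH = 500  # the db.put() limit mentioned above

    def __init__(self):
        self.calls = 0
        self.saved = []

    def put(self, entities):
        # Accept either a single entity or a list of entities
        if not isinstance(entities, list):
            entities = [entities]
        if len(entities) > self.MAX_BATCH:
            raise ValueError('too many entities in one call')
        self.calls += 1
        self.saved.extend(entities)


def put_individually(store, entities):
    # One call per entity
    for entity in entities:
        store.put(entity)


def put_in_batches(store, entities, batch_size=500):
    # One call per slice of up to batch_size entities
    for start in range(0, len(entities), batch_size):
        store.put(entities[start:start + batch_size])


entities = ['entity-%d' % i for i in range(1200)]

individual_store = CountingDatastore()
put_individually(individual_store, entities)

batch_store = CountingDatastore()
put_in_batches(batch_store, entities)

print(individual_store.calls, batch_store.calls)
```

Both approaches save the same entities; the batch version simply makes far fewer calls, which is where the difference in the chart comes from.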

It's worth noting that when you call db.put() with multiple entities, they are not combined into a transaction. If one of the writes fails, an error is raised, but any entities that have already been saved are not rolled back. If you want to update multiple entities as part of a transaction, you must do this the usual way by giving them the same parent and using run_in_transaction.

db.put() Performance Consistency

Since each test was run 10 times, I had enough data to draw a chart showing the consistency of the db.put() performance. The closer a line is to being completely horizontal, the more consistent the write performance is.

Individual db.put() Performance

Consistency of individual datastore db.put() calls

Batch db.put() Performance

Consistency of batch datastore db.put() calls

As you can see, the calls are fairly consistent, though the more entities you're saving, the bigger the variance. Although from the graph it looks like the batch calls are less consistent, the graph is drawn at a different scale. The batch calls vary by up to 1.5 seconds, whereas the individual calls vary by up to 5 seconds!

I hope to run some more benchmarks over the coming weeks showing the difference between other APIs such as using Memcache to avoid going to the Datastore on every request.

09 January 2010

New Domain - DanTup.com - Please Update Your Links!

After much discussion and debate, I've decided to move all of my sites from the dantup.me.uk domain to dantup.com. The .me.uk domain came free with some hosting and was never really intended to become my "main" domain, but it did. It looked like Google were ranking me lower in Google.com vs Google.co.uk because they believed my content to be more relevant to the UK, which isn't really the case.

Over the last few hours I've been slowly migrating things and setting up redirects. An unfortunate side effect is that you may get duplicate posts in my feed (again). Apologies for that, but once everything is moved over, hopefully it's now here to stay :-)

Although not necessary (old URLs will 301-redirect to new ones), if you have any links to any of my sites/feed, it'd be nice if you could update them to the .com to avoid additional redirects. Here are the new URLs:

It looks like some things that I've moved from one Google service to another (eg. Feedburner to App Engine) are taking some time to make it through the Google network, so the old domains may behave strangely until that's done. Please let me know if you notice anything strange.

06 January 2010

Entity Groups, Contention and Transactions in Google App Engine

When I started learning Google App Engine, I misunderstood a fairly fundamental part of the datastore - Entity Groups. The documentation is not very clear, and over the weeks I've seen many questions asked in videos and forums that suggest I'm not the only one that misunderstood. I thought it was worth a blog post to explain.

Entity Groups are not Tables!

The misunderstanding is that people think of entity groups like tables in SQL. This is not the case. If you create two entities that are of the same kind, by default, they do not belong to the same entity group. Unless you specifically choose to put them in the same group, each of your entities will be in a separate entity group. This means if you have 1,000 entities of the same kind, you have 1,000 entity groups containing one entity each! This is not a bad thing. You shouldn't put things into the same entity group unless you need to, since updates to an entity lock the whole entity group.

Putting Entities in the Same Entity Group

All entity groups have a root entity. You can't put two entities into the same entity group without one of them being the root entity (parent). To put two entities into the same entity group, you set the parent when constructing the child entity, like this:

# Create the company
my_company = Company(name='Danny\'s Company')
my_company.put()

# Create the employee
me = Employee(
    parent=my_company,
    name='Danny Tuppeny'
)
me.put()

Because updates lock an entire entity group, you should only group entities when you need to. If you're not going to need to update an employee and its company in the same transaction, you don't need to put them in the same entity group.

Entity Keys Include Their Parent Entity Keys

Something worth noting is that the key for an entity includes the key of its parent. That means you can do some clever things for performance purposes, such as "Relation Index Entities" as described by Brett Slatkin in this video. Brett creates child entities for querying and converts their keys to their parents' keys using the parent() method to then batch fetch the entities themselves.
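
To illustrate the idea, here is a simplified model of a key as a path of (kind, name) pairs. The Key class below is hypothetical and written only for this example; it is not the App Engine Key class, and the kind names Message and MessageIndex are assumptions. It shows why a parent key can be derived from a child key without fetching anything.

```python
class Key(object):
    """Simplified, hypothetical model of a datastore key: a path of
    (kind, name) pairs, root first. Not the App Engine Key class."""

    def __init__(self, *path):
        self.path = tuple(path)

    def parent(self):
        # The parent's key is this key's path minus the last element
        if len(self.path) < 2:
            return None
        return Key(*self.path[:-1])

    def root(self):
        # The first path element identifies the entity group
        return Key(self.path[0])

    def __eq__(self, other):
        return isinstance(other, Key) and self.path == other.path

    def __hash__(self):
        return hash(self.path)

    def __repr__(self):
        return 'Key%r' % (self.path,)


company = Key(('Company', 'dannys-company'))
employee = Key(('Company', 'dannys-company'), ('Employee', 'danny'))
loner = Key(('Employee', 'someone-else'))

# Keys of child "index" entities, as a keys-only query might return them
index_keys = [
    Key(('Message', 'm1'), ('MessageIndex', 'i1')),
    Key(('Message', 'm2'), ('MessageIndex', 'i1')),
]

# Convert child keys to parent keys, ready for a single batch fetch
parent_keys = [key.parent() for key in index_keys]
print(parent_keys)
```

In this model, employee and company share a root and so sit in the same entity group, while loner has no parent and forms a group of its own.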

Hopefully this clears things up a little. If it still leaves questions (or I've missed anything), please leave a comment below and I'll try to update the post.
