As mentioned in my previous post, I've been keeping an eye on my logs after putting this blog live on App Engine. One thing that wasn't so easy to do with App Engine was monitor the datastore calls being made (like I would with Microsoft's SQL Server Profiler).
Luckily for us, Google added some API Hooks into App Engine and with some help from Nick Johnson and Jens Scheffler I managed to cobble together some scripts that allowed me to identify and solve my problems.
Today, by accident, I came across a library that completely blew my hacked scripts out of the water...
Guido van Rossum's Appstats
Guido van Rossum (a Software Engineer at Google, and the author of the Python programming language!) has released a library called Appstats that hooks and monitors the API calls your app makes and presents them in tables and graphs to aid profiling. Setting it up in your app is a breeze (especially if you already use util.run_wsgi_app to run your app).
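For reference, here is a rough sketch of the wiring. It assumes the version of Appstats that was later bundled with the App Engine Python SDK (the import path in the early standalone release may differ), so check it against the Appstats documentation. An appengine_config.py file next to app.yaml defines a hook that run_wsgi_app looks up and uses to wrap your app in the recording middleware. The dashboard itself also needs a URL handler mapped in app.yaml.

```python
# appengine_config.py (sketch; assumes the SDK-bundled Appstats package)

def webapp_add_wsgi_middleware(app):
    """Hook looked up by util.run_wsgi_app; wraps the app so API calls are recorded."""
    # Imported inside the hook so the Appstats module only loads when the hook runs.
    from google.appengine.ext.appstats import recording
    return recording.appstats_wsgi_middleware(app)
```

If you don't run your app through run_wsgi_app, the same middleware call can presumably be applied to the WSGI application by hand.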
I got it up and running on my blog via the dev server (though it works equally well on the production servers) to see how well it works. After a few clicks around the app, I navigated to /stats to see what it came up with.
The Appstats Dashboard
The Appstats dashboard is broken into three sections. The first shows stats for each API method you've called. The second shows stats by URL. The third shows recent requests. Each line in these tables can be expanded for a breakdown of the numbers, as shown below.
Request Statistics
Once you click on a request, you'll get a lot of information about the API calls made.
The first thing shown is a timeline showing not only how long each call took, but also when it started and finished (this will help identify large amounts of time being spent outside of the API calls). In the above example you can see that the most expensive part of this request was a RunQuery call. This call fetches comments for a given post when the post's HTML is not in memcache. All other lookups performed are either memcache Gets or db get_by_key_name calls, which are very fast in comparison.
If you expand one of the API calls you'll see a breakdown of what the call involved (both the request and response). You can specify how much text is included in these tables (such as the request/response) by changing the Appstats options.
Finally there's a summary table showing all API calls for the given request, much the same as on the dashboard, though this one includes timings.
Appstats looks to be a very valuable tool for any App Engine developer, and I look forward to seeing what comes out of the team in the future!
Since moving my blog to Google App Engine a few days ago, I've been keeping a close eye on the logs. This is my first App Engine project that's using the datastore, so I wanted to make sure I hadn't done anything silly and I wasn't getting a large number of timeouts. Although it's probably overkill for my blog, to learn the APIs I use memcache to avoid repeatedly hitting the datastore for the same data.
The caching I've implemented is fairly basic. When it generates a page using Django templates, the entire page contents are put into memcache, using a key that includes the page URL. If somebody else requests the same page, I can serve the entire thing from memcache without any processing overhead. This works well because there is no dynamic (different per user) content on any pages (unless you're an admin, but then memcache is bypassed entirely). When a comment is posted or a post is modified, the entire cache is wiped out. This is done to ensure comment counts on list pages are always up-to-date. Since posts and comments happen very infrequently on this blog (a handful per day at most) this shouldn't be an issue.
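The strategy described above can be sketched without App Engine at all. This is a minimal illustration, not the blog's actual code: PageCache is a hypothetical in-memory stand-in for memcache, and render is a made-up placeholder for the Django template step.

```python
class PageCache(object):
    """In-memory stand-in for memcache (hypothetical; not the App Engine API)."""

    def __init__(self):
        self._pages = {}

    def get_or_render(self, url, render, is_admin=False):
        # Admin views differ per user, so they bypass the cache entirely.
        if is_admin:
            return render(url)
        key = "page:" + url  # the cache key includes the page URL
        html = self._pages.get(key)
        if html is None:
            html = render(url)
            self._pages[key] = html
        return html

    def flush(self):
        # Wipe everything, e.g. when a comment is posted or a post is edited.
        self._pages.clear()


render_calls = []

def render(url):
    """Placeholder for the expensive template rendering step."""
    render_calls.append(url)
    return "<html>%s</html>" % url

cache = PageCache()
cache.get_or_render("/tags/python", render)  # rendered and stored
cache.get_or_render("/tags/python", render)  # served from the cache
cache.flush()                                # simulate a new comment
cache.get_or_render("/tags/python", render)  # rendered again
```

Wiping the whole cache is crude, but with only a handful of writes per day it keeps the invalidation logic trivial.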
So, after putting things live, I noticed that some of my page hits were still quite expensive (in terms of CPU) even with the caching. It turned out that when someone visited a page that wasn't in the cache, I had to execute 3 or 4 datastore queries: one to get the Tag or Archive you were viewing (this step wasn't needed for the homepage or direct post pages), one for the post(s), one for the tag list and one for the archive list. Since the tag and archive lists don't change much, these could be cached across the application, rather than just per-page, which would result in much faster loading of uncached pages (since they would now be only 1-2 datastore hits).
So, I implemented this extra caching as follows, and all was good.
# TAGS: Check cache first
tags = memcache.get('tags')
# TAGS: Get and cache if not there
if not tags:
    logging.info('Saving tags to cache')
    tags = Tag.all().order('name_lower')
    memcache.set('tags', tags, CACHE_TIME)
else:
    logging.info('Got tags from cache')

# ARCHIVE: Check cache first
archive = memcache.get('archive')
# ARCHIVE: Get and cache if not there
if not archive:
    logging.info('Saving archive to cache')
    archive = Archive.all().order('-date')
    memcache.set('archive', archive, CACHE_TIME)
else:
    logging.info('Got archive from cache')
Or so I thought. Although I tested that the code worked with no errors, there's no nice "Datastore Profiler" like the SQL Server Profiler I'm used to in Microsoft-land, so I assumed everything was good.
Today, it occurred to me that the CPU values for my uncached page views were still pretty high. I was seeing up to half a second of CPU time for what should be a single datastore query. This seemed pretty high even for my inefficient coding!
It turns out, I made a rookie mistake. The problem is with this code right here:
archive = Archive.all().order('-date')
memcache.set('archive', archive, CACHE_TIME)
This code does not cache the entities returned by the query. Rather, it caches the query itself. That means every time I grabbed it from the cache and passed it to Django, the query was executed again as the template was rendered!
A quick change to call fetch() forces the query to be executed and now the entities are cached instead.
archive = Archive.all().order('-date').fetch(1000)
memcache.set('archive', archive, CACHE_TIME)
I'm not so keen on the hard-coded 1000, but until I can find a better way to force the query to execute, it'll do the job. My next job is to write some better hooks to allow better profiling of the datastore so I can better test for this kind of issue in future!
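The difference is easy to reproduce away from App Engine. The sketch below is only a simulation: LazyQuery is a hypothetical stand-in for a datastore query that runs whenever it is iterated, and a plain dict plays the part of memcache (real memcache serialises values, which this ignores).

```python
EXECUTIONS = []  # one entry per simulated datastore round trip


class LazyQuery(object):
    """Hypothetical stand-in for a datastore query: it only runs when iterated."""

    def __init__(self, kind):
        self.kind = kind

    def __iter__(self):
        EXECUTIONS.append(self.kind)  # simulate hitting the datastore
        return iter(["%s-1" % self.kind, "%s-2" % self.kind])

    def fetch(self, limit):
        # Runs the query once and returns a plain list of results.
        return list(self)[:limit]


cache = {}  # stand-in for memcache

# Caching the query object: nothing has run yet, and every read runs it again.
cache["archive_query"] = LazyQuery("Archive")
for _ in range(3):
    list(cache["archive_query"])  # e.g. a template looping over the results
lazy_runs = len(EXECUTIONS)

# Caching the fetched list: one run up front, none on later reads.
del EXECUTIONS[:]
cache["archive_list"] = LazyQuery("Archive").fetch(1000)
for _ in range(3):
    list(cache["archive_list"])
eager_runs = len(EXECUTIONS)

print(lazy_runs, eager_runs)
```

In this simulation the cached query runs once per page view, while the cached list costs a single run when it is first stored.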
Update
After posting in the App Engine group, I was directed to the article Efficient Model Memcaching on Nick Johnson's blog, which shows a better way, avoiding some of the overhead of caching Model instances directly.
You probably haven't noticed, but this blog serves up different ads depending on where you're visiting from. Or at least, it'll serve Amazon UK ads if you're near the UK, and Amazon US ads otherwise. Serving up US ads to UK visitors (and vice versa) is pretty pointless, and I've always tried to avoid showing any ads unless they're relevant and at least targeted to the right country.
There are a number of ways to determine where your visitors are coming from, so I spent some time yesterday trying to find the most reliable way (and preferably one that didn't involve having a huge IP database sat alongside my site!). After much hacking and testing, I found what I believe to be the best way. Google.
Google has a JavaScript loader API, which allows developers to load JavaScript libraries from Google with various benefits. That's not really what we're interested in though. It has something more exciting:
google.loader.ClientLocation
It appears that you do not need an API key to use the JavaScript loader; you can simply reference it at http://www.google.com/jsapi. If you look at the JavaScript served up (which is incredibly fast), you'll see something like this:
google.loader.ClientLocation = { "latitude":50.123, "longitude":-2.876, "address": { "city":"Liverpool", "region":"Merseyside", "country":"United Kingdom", "country_code":"GB" } };
Not only do you get the country, but you get the county, city and even lat/lon pair. For me, the location given was within 2-3 miles of where I live, so if you wanted, you could really localise your ads!
On this site, the country is just sent to a script that will serve up some ads based on keywords I've tagged against a post. You might wish to be a bit more exciting and show your users places or people nearby. This could be especially useful for mobile applications/sites, though be sure to read any associated terms and conditions before using it!
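Once the country code reaches the server, choosing which ads to show is a small lookup. The sketch below is an illustration rather than this site's script: pick_ad_region and the UK_CODES set are hypothetical names, treating Ireland as "near the UK" is my assumption, and the sample payload simply mirrors the shape shown above. It also assumes the location may be missing when it cannot be determined, so it falls back to US ads.

```python
import json

# Sample payload in the same shape as the ClientLocation object shown above.
SAMPLE = (
    '{"latitude":50.123,"longitude":-2.876,'
    '"address":{"city":"Liverpool","region":"Merseyside",'
    '"country":"United Kingdom","country_code":"GB"}}'
)

# Assumption: which country codes count as "near the UK" for ad purposes.
UK_CODES = frozenset(["GB", "IE"])


def pick_ad_region(location):
    """Return "UK" or "US" for a parsed location dict (or None if unknown)."""
    if not location:
        return "US"  # no location available: fall back to the default store
    address = location.get("address") or {}
    code = address.get("country_code", "")
    return "UK" if code.upper() in UK_CODES else "US"


region = pick_ad_region(json.loads(SAMPLE))
```

Keeping the fallback on the US store means visitors with an unknown location still see ads, just not localised ones.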
When I started writing Google Wave Notifier last month, it was really only for my own use. I wouldn't use Google Wave if I had no way of knowing when I had messages (and keeping a browser open is not an acceptable solution!) so I hacked something together to log in to Wave and parse the JavaScript objects.
I decided others might benefit from the app, so I cleaned it up a little, gave it an icon and published it on the web. I had no idea that it would get close to 10,000 downloads in a little over a month! Nor did I expect so many people to send in translations!
So here we are 6 weeks down the line, and the app has been translated into 15 different languages!
- English
- Czech
- Danish
- Dutch
- French
- German
- Hungarian
- Italian
- Polish
- Portuguese
- Romanian
- Russian
- Spanish
- Swedish
- Turkish
So I'd like to say thanks to everyone who's helped by using, translating and finding bugs in the app. What was a small script written for personal use has turned into something quite useful!
For more information or to download, see the Google Wave Notifier website!
Wednesday, 30 December 2009
If you're running your App Engine project on a custom domain (like this blog), you're probably not so happy that people can still access your app at http://appid.appspot.com.
When I started working on my blog, I realised this might be an issue, and did some investigating into how I could stop it. I found a solution on Stack Overflow that seemed to do what I needed, so I set it up and got it working.
Not long after implementing this code, I found a few problems:
- SSL is not supported on custom domains
- Cron jobs fail when Google invokes them with an appspot.com address and you serve a 301
So with some tweaks, I've managed to get this working as required. The only annoyance is the hard-coded '/admin' check. This is to support cron jobs, task queues etc., which are all protected ("login: admin" in app.yaml). They must work with an appspot.com address, because Google doesn't seem to follow the redirect when invoking them. It's possible you could do an IP address check here, but I'm not sure how consistent cron/task queue IP addresses are.
The code is called like this:
def main():
    dantup.run_app([
        (r"/\d*", RootHandler),
        (r"/feed", FeedHandler),
        (r"/tags/.*", TagHandler),
        (r"/archive/.*", ArchiveHandler),
        (r"/.*", PostHandler)
    ])
And dantup.py looks like this:
from google.appengine.ext import webapp
from google.appengine.ext.webapp.util import run_wsgi_app


def run_app(url_mapping):
    application = webapp.WSGIApplication(url_mapping, debug=True)
    application = redirect_from_appspot(application)
    run_wsgi_app(application)


def redirect_from_appspot(wsgi_app):
    """Handle redirect to my domain if called from appspot (and not SSL)"""
    from_server = "dantup-blog.appspot.com"
    to_server = "blog.dantup.me.uk"

    def redirect_if_needed(env, start_response):
        # If we're calling on the appspot address, and we're not SSL (SSL only works on appspot)
        if env["HTTP_HOST"].endswith(from_server) and env["HTTPS"] == "off":
            # Parse the URL
            import webob, urlparse
            request = webob.Request(env)
            scheme, netloc, path, query, fragment = urlparse.urlsplit(request.url)
            url = urlparse.urlunsplit([scheme, to_server, path, query, fragment])
            # Exclude /admin calls, since they're used by Cron, TaskQueues and will fail if they return a redirect
            if not path.startswith('/admin'):
                # Send redirect
                start_response("301 Moved Permanently", [("Location", url)])
                return ["301 Moved Permanently", "Click Here %s" % url]
        # Else, we return normally
        return wsgi_app(env, start_response)

    return redirect_if_needed
Hopefully you may find this useful. If you encounter any problems with it, please let me know!
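If you want to check the redirect rules without deploying, the same idea can be exercised with nothing but the standard library. This is a sketch, not a drop-in replacement for dantup.py: it is written for Python 3, it checks wsgi.url_scheme instead of the HTTPS variable, and redirect_hosts, hello_app and fake_request are all hypothetical names made up for the example.

```python
from urllib.parse import urlunsplit  # Python 3; the code above targets Python 2
from wsgiref.util import setup_testing_defaults


def redirect_hosts(wsgi_app, from_server, to_server, skip_prefix="/admin"):
    """Wrap wsgi_app so plain-HTTP requests to from_server get a 301 to to_server."""
    def wrapper(env, start_response):
        on_old_host = env.get("HTTP_HOST", "").endswith(from_server)
        is_ssl = env.get("wsgi.url_scheme") == "https"
        path = env.get("PATH_INFO", "")
        # Leave SSL and /admin requests (cron, task queues) alone.
        if on_old_host and not is_ssl and not path.startswith(skip_prefix):
            url = urlunsplit((env["wsgi.url_scheme"], to_server, path,
                              env.get("QUERY_STRING", ""), ""))
            start_response("301 Moved Permanently", [("Location", url)])
            return [b"Moved to " + url.encode("utf-8")]
        return wsgi_app(env, start_response)
    return wrapper


def hello_app(env, start_response):
    """Trivial inner app used only for the checks below."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


def fake_request(app, host, path):
    """Call a WSGI app with a synthetic environ; return (status, Location header)."""
    env = {"HTTP_HOST": host, "PATH_INFO": path}
    setup_testing_defaults(env)  # fills in the remaining required WSGI keys
    result = {}

    def start_response(status, headers):
        result["status"] = status
        result["location"] = dict(headers).get("Location")

    b"".join(app(env, start_response))
    return result["status"], result["location"]


app = redirect_hosts(hello_app, "dantup-blog.appspot.com", "blog.dantup.me.uk")
print(fake_request(app, "dantup-blog.appspot.com", "/tags/python"))
print(fake_request(app, "dantup-blog.appspot.com", "/admin/cron"))
```

The first request should come back as a 301 pointing at the custom domain, while the /admin request passes straight through to the inner app.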
Tuesday, 29 December 2009
There's nothing like testing a system after it's gone live!
After lots of frantic hacking over the last few weeks, the blog engine I've been writing for Google App Engine is up and running. It's not complete, but the important things (frontend, feed, posting, etc.) are done so it's at least usable.
Apologies if you just got a flood of old messages in your feed reader. I did investigate keeping the Blogger IDs to avoid this, but it seemed more hassle than it was worth, so I'll forgive you for clicking "Mark All as Read" this one time ;-)
I'll post more info (and code) on the blog as I get time, but now I'm off to sort some food out after a hard day's hacking!
If you notice anything messed up in the transition, please do let me know!
Over the past few weeks I've been playing with Google App Engine. I find the best way to learn a new language/framework/platform is to just jump in and write something in/on it. So that's what I'm doing. I've decided to write my blog in Python for Google App Engine.
As I may have blogged about in the past, I wrote a blog engine in Microsoft ASP.NET MVC not so long ago with the aim of moving away from Blogger. It was around 90% complete when I abandoned it for a variety of reasons (one being Azure pricing).
It's entirely possible the Google App Engine blog engine will also be abandoned, but since the hosting is free it at least stands a good chance of seeing the light of day! It'll also make an interesting comparison to the ASP.NET MVC version.
I started writing code a few nights ago, and currently the blog stands at 159 non-blank lines. I'm actually quite impressed with how little code I've had to write to get up and running. Currently there's no back-end, but the displaying of posts, comments, tags and archives are all working. Here's a quick screenshot to prove it exists! :)
Over the coming weeks I'll blog about how I've built it (including code), the pitfalls and the experience of moving from .NET and C# to Python and App Engine!
Although I've been playing with App Engine for quite a few weeks now, I only found out yesterday how I can download the logs from App Engine for parsing locally. There's no export option in the dashboard, nor any option in the Windows launcher. However, you can do this yourself with appcfg.py.
If, like me, you've only used the Launcher up until now, and never done anything with Python outside of GAE, you might find the thought of running Python scripts a little scary. Don't worry, it actually turns out to be very easy. I'm hoping when you installed the Launcher you ticked the "Add to PATH" option... :o)
The command you want to run looks something like this:
appcfg.py request_logs appname/ output.txt
You should replace "appname" with the name of your app. This is the folder that contains the app.yaml file for the app you wish to get logs for. That means you need to be in the correct directory (or provide a full path). Output.txt is obviously where the logs should be written.
This command will retrieve everything from the last day. This might not be what you want, so you can increase this with the num_days argument.
appcfg.py --num_days=5 request_logs appname/ output.txt
Note that you can also specify 0 to get all logs:
appcfg.py --num_days=0 request_logs appname/ output.txt
And finally, if you want all logs that you haven't already downloaded, you can use --append, which will scan the last line of the existing file, and download anything since:
appcfg.py --append request_logs appname/ output.txt
However, you'll find this doesn't work on Windows, and you'll end up with a load of duplicate entries. This is in the bugtracker, but I don't know if/when it'll be fixed. For now, I'm having to just download everything (and I'm not sure if this is counting towards my bandwidth quota!).
Another option worth noting is include_vhost, which will include the hostname used, so you can separate requests for different versions of your app (or different custom domains). You can use this like so:
appcfg.py --num_days=0 --include_vhost request_logs appname/ "Logs/Logs.txt"
And that's all there is to it. Now you can create nice pretty charts in Microsoft Excel (since Google Spreadsheets sucks!) showing how well (or badly, in my case) your app is doing!
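If you'd rather summarise the file in Python before reaching for a spreadsheet, something like the following is a starting point. It is a sketch built on an assumption: that each downloaded line follows the Apache combined log format. The sample lines are made up for illustration, and count_statuses is a hypothetical helper, so adjust the pattern to whatever your own output.txt contains.

```python
import re
from collections import Counter

# Assumed layout: ip ident user [time] "METHOD path protocol" status size ...
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def count_statuses(lines):
    """Count requests per HTTP status code, skipping lines that don't match."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[match.group("status")] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    '203.0.113.7 - - [30/Dec/2009:10:15:02 -0800] "GET / HTTP/1.1" 200 5120 "-" "ExampleAgent/1.0"',
    '203.0.113.7 - - [30/Dec/2009:10:15:09 -0800] "GET /missing HTTP/1.1" 404 312 "-" "ExampleAgent/1.0"',
    '203.0.113.8 - - [30/Dec/2009:10:16:41 -0800] "GET /feed HTTP/1.1" 200 20480 "-" "ExampleAgent/1.0"',
]
counts = count_statuses(sample)
```

To run it over a real download, pass an open file instead of the sample list, for example count_statuses(open("output.txt")).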
Saturday, 19 December 2009
While writing my comparison of Azure and App Engine pricing yesterday, I had a thought about how to increase some of your quotas without actually paying anything. I'm not sure whether Google would consider this "gaming the system" and stamp on you (and I certainly have no need to do it with my traffic levels), but I thought I'd post it in the interest of sharing.
If you check the App Engine Quotas page, you'll see the differences between the free quota, and the billing enabled quota. For example, if you don't have billing enabled, you may serve 1,300,000 requests per day. If you have billing enabled, you get 43,000,000 requests per day (other limits, such as CPU/bandwidth still apply).
So, if you've taken a look at how the billing works, you'll know that you get to decide exactly how your budget is split between resources. The minimum budget you can set is $1/day.
Consider what would happen if you enabled billing, but assigned the whole budget to something you know you won't exceed (e.g. if you don't use memcache, assign it to memcache). Your app will be given the higher "billing enabled" quotas, but it won't cost you a penny!
As I said earlier - I've no idea whether Google will frown upon this behaviour, so if you do it, it's at your own risk!
As a .NET developer, I was quite excited to hear about Windows Azure. It sounded like a less painful version of Amazon's EC2, supporting .NET (less painful in terms of server management!). When I saw the pricing, it didn't look too bad either. That was, until I realised that their "compute hour" referred to an hour of your app running, not an hour of actual CPU time. Wow. This changes things. To keep a single web role running, you're looking at $0.12/hour = $2.88/day = $20.16/week = $86.40/month.
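The arithmetic behind those figures, as a quick check. The only assumption beyond the quoted $0.12/hour rate is a 30-day month, which is what the $86.40 figure implies.

```python
from decimal import Decimal

RATE_PER_HOUR = Decimal("0.12")  # Azure compute hour price quoted above

per_day = RATE_PER_HOUR * 24     # the role is billed for every hour it is running
per_week = per_day * 7
per_month = per_day * 30         # assumes a 30-day month

print(per_day, per_week, per_month)  # prints: 2.88 20.16 86.40
```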
Anyone that's bought hosting for a small site/app recently will know that this is not particularly cheap!
So, recently I've been playing around with Google App Engine. It has this massive problem called Python (and an even bigger one called Java ;o)), but it's such a nice framework/engine to work with that I've somehow overlooked this and started coding with it. There's so much to like about it. Everything is so simple to deploy, and it scales "out of the box". Want Cron jobs? No worries, specify them in a file in your app, and when you deploy, App Engine will pick them up and schedule them. Want to queue up work to process later so that your pages return faster? Task queues do just that. What's more, you get a ridiculous free quota every day. It may be Python, but this sounds tempting, no?
So, I thought it'd be interesting to compare the costs of App Engine vs Azure. I understand this isn't really a like-for-like comparison, but both can achieve the same sort of things, and while all programmers will have a preferred language/framework (I'm no exception), many can be swayed by a cool framework or hosting.
First off, let's compare what you get for free. Bear in mind that Azure is free until the end of January, but since this is a CTP and will end soon, I'm going to exclude it. Google's free quota currently has no time restrictions.
| | Windows Azure | Google App Engine |
| CPU Hours | - | 6.5hrs/day |
| Bandwidth (out) | - | 1GB/day |
| Bandwidth (in) | - | 1GB/day |
| Storage (DB) | - | 1GB |
| Storage Transactions | - | 10,368,000/day |
Well, for free, I think we have a clear winner. If you can run your website/app within the limits above, then you can do it for free with Google. It's worth mentioning that Google let you have 10 apps per account (though you may not balance a single site/app across app instances - they are specifically for separate projects).
But what if you're bigger than that? What if you get Slashdotted or Dugg a lot? You might find you quickly break out of the free limits. How do prices compare?
| | Windows Azure | Google App Engine |
| CPU | $0.12/hour | $0.10/hour |
| Bandwidth (out) | $0.15/GB | $0.12/GB |
| Bandwidth (in) | $0.10/GB | $0.10/GB |
| Storage (Files) | $0.15/GB/month | Pricing unavailable (Blobstore) |
| Storage (DB) | $9.99/month (SQL Server up to 1GB) | $0.15/GB/month ($0.005/GB/day) |
| Storage Transactions | $0.01/10,000 | ? |
Note (CPU): MS bill "per hour app is running" whereas Google bill "per CPU hour consumed".
Note (Storage Transactions): I can't find any prices for Google App Engine storage transactions, so it's possible there is no charge (though a limit of 140,000,000/day applies).
Well, that's interesting. I was going to try and calculate at what point Azure would become cheaper, but looking at those prices, it just isn't going to happen. Now it's worth pointing out that not all of the comparisons are fair. Google bill per actual CPU hour (so if nobody visits your site, it's not costing you), whereas Microsoft are billing for each hour your app is live and able to respond. There's also a significant difference between SQL Server and App Engine Datastore (and depending on what you're doing, one will have advantages over the other).
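To put rough numbers on the CPU side alone, here is a small sketch using the prices from the tables above. The assumptions are mine: a single Azure web role running all month, a 30-day month, the 6.5 free CPU hours per day still applying once billing is enabled, and bandwidth and storage ignored. The function names are made up for the example.

```python
from decimal import Decimal

AZURE_PER_HOUR = Decimal("0.12")     # per hour the role is running
GAE_PER_CPU_HOUR = Decimal("0.10")   # per CPU hour actually consumed
GAE_FREE_CPU_HOURS = Decimal("6.5")  # free daily CPU quota from the first table
DAYS = 30                            # assumption: 30-day month


def azure_monthly():
    """One always-on web role, regardless of traffic."""
    return AZURE_PER_HOUR * 24 * DAYS


def gae_monthly(cpu_hours_per_day):
    """Only CPU hours beyond the free daily quota are billed (assumption)."""
    used = Decimal(str(cpu_hours_per_day))
    billable = max(Decimal(0), used - GAE_FREE_CPU_HOURS)
    return billable * GAE_PER_CPU_HOUR * DAYS


for hours in (2, 12, 24):
    print(hours, gae_monthly(hours), azure_monthly())
```

Under these assumptions, even an app burning 24 CPU hours every day comes out below the cost of keeping one Azure role alive, which fits the conclusion above.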
I really hope Microsoft re-evaluate their pricing for small apps. It's too expensive to play around with small prototypes at those prices, whereas Google's offering will let me get started completely free, until my app is churning through a considerable amount of traffic, and even then, it'll work out cheaper for the same processing/transfer.
Sorry Microsoft. I love .NET and Visual Studio, but Google App Engine is just so easy and cheap that it's going to be my "toy of choice" for my hobby coding for the immediate future!