Latest Entries

04Jan

Queued Storage Backend for Django

[UPDATE] I added this code to GitHub at http://github.com/seanbrant/django-queued-storage if anyone is interested.

Say you have a web application that allows users to store large images and then serves them back to the user. Think something like Flickr. Now let’s say you want to use Amazon S3 as your image server. You might start running into slow uploads if you send the image to S3 in the same request the user made to upload the image to your site. What’s happening is that the image first needs to get uploaded to your server’s filesystem, and then it needs to get sent to S3. Depending on the file size, this could make for a poor user experience.

So in trying to solve this I did what any good developer would do: I googled the best way to solve this problem. I came across several approaches, none of which seemed that elegant. One suggested adding two fields to your model and a flag that would tell you which field to use, yuck. I first went this route, and the boilerplate and messiness were not worth it.

What I really wanted was one interface to the file, just like a normal storage backend. I wanted to keep the logic for switching storages in one place as much as possible, and I wanted it to be no harder to use than a normal storage backend.

Something like this.

image = ImageField(storage=QueuedRemoteStorage(local='django.core.files.storage.FileSystemStorage',
                   remote='backends.s3boto.S3BotoStorage'), upload_to='uploads')

I came up with what I am calling QueuedRemoteStorage, for lack of a better name. This is basically a proxy for a local and a remote storage backend that takes care of determining which backend to use depending on what state the file is in. State is maintained using Django’s caching framework, and the queue service is Celery.

The only downside is that this requires you to have an app to hold the Celery tasks.py file. So create a new app or add a tasks.py file to an app you already have. I won’t go into how to use Celery; have a look at their documentation.

In tasks.py you need to define a subclass of Task.

from django.core.cache import cache
from django.core.files.storage import get_storage_class

from celery.registry import tasks
from celery.task import Task

class SaveToRemoteTask(Task):
    def run(self, name, local, remote, cache_key):
        local_storage = get_storage_class(local)()
        remote_storage = get_storage_class(remote)()
        remote_storage.save(name, local_storage.open(name))
        cache.set(cache_key, True)
        return True

tasks.register(SaveToRemoteTask)

What this code is doing is defining a Task that Celery will load on startup automatically because you called the file tasks.py and put it in the root of the app. Next it uses get_storage_class, a function Django provides that takes the path of the storage backend as a string and returns the storage class. We then open the file from the local storage, save it to the remote storage, and set the key in the cache to True (the file is on the remote server).
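To make the transfer step concrete without a running Celery worker, here is a minimal sketch that uses plain dictionaries as stand-ins for the two storages and the cache. The names save_to_remote, local_files, remote_files, and state_cache are hypothetical and not part of Django or Celery; only the order of operations mirrors the task above.

```python
def save_to_remote(name, local_files, remote_files, state_cache, cache_key):
    """Copy one file from the local store to the remote store.

    All three containers are plain dicts standing in for the real
    storages and cache (an assumption made for illustration).
    """
    # Read from local and write to remote, just as the task does.
    remote_files[name] = local_files[name]
    # Only after the copy has succeeded is the file flagged as remote.
    state_cache[cache_key] = True
    return True


local_files = {"uploads/cat.jpg": b"image bytes"}
remote_files = {}
cache_key = "queued_remote_storage_uploads/cat.jpg"
state_cache = {cache_key: False}  # False means the upload is still pending

save_to_remote("uploads/cat.jpg", local_files, remote_files, state_cache, cache_key)
```

Setting the flag last matters: if the copy raises, the cache still says False and the file keeps being served from the local copy.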

This will hopefully make more sense when you see the storage backend.

import urllib

from django.core.cache import cache
from django.core.files.storage import get_storage_class, Storage

from yourapp.tasks import SaveToRemoteTask

QUEUED_REMOTE_STORAGE_CACHE_KEY_PREFIX = 'queued_remote_storage_'

class QueuedRemoteStorage(Storage):
    def __init__(self, local, remote, cache_prefix=QUEUED_REMOTE_STORAGE_CACHE_KEY_PREFIX):
        self.local_class = local
        self.local = get_storage_class(self.local_class)()
        self.remote_class = remote
        self.remote = get_storage_class(self.remote_class)()
        self.cache_prefix = cache_prefix

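    def get_storage(self, name):
        cache_result = cache.get(self.get_cache_key(name))
        if cache_result:
            # True in the cache: the file has been transferred.
            return self.remote
        elif cache_result is None:
            # No cache entry: ask the remote backend directly.
            if self.remote.exists(name):
                cache.set(self.get_cache_key(name), True)
                return self.remote
        # False in the cache, or not on the remote yet: serve locally.
        return self.local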

    def get_cache_key(self, name):
        return '%s%s' % (self.cache_prefix, urllib.quote(name))

    def using_local(self, name):
        return self.get_storage(name) is self.local

    def using_remote(self, name):
        return self.get_storage(name) is self.remote

    def open(self, name, **kwargs):
        return self.local.open(name, **kwargs)

    def save(self, name, content):
        cache.set(self.get_cache_key(name), False)
        name = self.local.save(name, content)
        SaveToRemoteTask.delay(name, self.local_class, self.remote_class, self.get_cache_key(name))
        return name

    def get_valid_name(self, name):
        return self.get_storage(name).get_valid_name(name)

    def get_available_name(self, name):
        return self.get_storage(name).get_available_name(name)

    def path(self, name):
        return self.get_storage(name).path(name)

    def delete(self, name):
        return self.get_storage(name).delete(name)

    def exists(self, name):
        return self.get_storage(name).exists(name)

    def listdir(self, name):
        return self.get_storage(name).listdir(name)

    def size(self, name):
        return self.get_storage(name).size(name)

    def url(self, name):
        return self.get_storage(name).url(name)

Most of this code just proxies to the actual storage methods on whichever backend get_storage returns. The heart and soul of this class is found in get_storage and save. get_storage looks up the file’s key in the cache. If the cached value is True, it can assume that the file is on the remote server, and it returns the remote storage instance. If cache_result is None (there is no entry for the file), we check the remote backend for the existence of the file; if it is found, we update the cache and return the remote backend. If all else fails, we return the local backend. The save method is responsible for queuing up the remote transfer. It first sets the cache to False, then saves the file locally. Next it sends a job to the queue and returns the name of the file.
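The three cache states are easy to mix up, so here is a small standalone sketch of the same decision. It is not the class above: pick_storage is a hypothetical helper, a plain dict plays the role of Django’s cache, and remote_has_file is a callable standing in for remote.exists(name).

```python
def pick_storage(cache, key, remote_has_file):
    """Return "remote" or "local" for the file behind `key`.

    cache[key] is True (transferred), False (transfer pending),
    or missing (no entry, for example after a cache flush).
    """
    state = cache.get(key)
    if state:
        return "remote"
    # Only a missing entry triggers the remote lookup; an explicit
    # False means a transfer is still queued, so stay local.
    if state is None and remote_has_file():
        cache[key] = True  # remember the answer for next time
        return "remote"
    return "local"


cache = {"done.jpg": True, "pending.jpg": False}
transferred = pick_storage(cache, "done.jpg", lambda: False)
pending = pick_storage(cache, "pending.jpg", lambda: True)
rediscovered = pick_storage(cache, "flushed.jpg", lambda: True)
unknown = pick_storage(cache, "missing.jpg", lambda: False)
```

Treating False and a missing entry differently is what lets the backend recover after a cache flush without sending reads to the remote for files that are still in the queue.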

Hopefully this is pretty straightforward. At this point this is nothing more than a proof of concept that has not been tested in a production setting. I hope that this will at least give others some ideas.

12Aug

Google App Engine

Today I launched my first Google App Engine site, Ross Gload Is Everywhere. So far I’m pretty impressed with what Google has to offer. The coding part is plain old Python with some basic framework tools. The ORM is usable (more on that in a minute), and the development environment is what you would expect. The shining star IMHO is deployment and management. With a simple command in Terminal your fresh code is deployed and running live. No messing with SSH, version control, or stopping and restarting services. I dig it. Plus they give you a nice dashboard to view log files, request information, and a lot more. Really, other hosting companies could learn a thing or two from the dashboard.


14May

If you have one foot in the past and one foot in the future, you’re pissing on the present. — via Mirificam Press

Jeff Croft

01May

django-piston

“Piston is a Django mini-framework creating APIs.” This looks really cool and will likely be used on an upcoming project I am currently working on.

11Apr

Setting and deleting cache in Django with tags

The Django cache system is a real help when you need to keep your site as fast as possible. It offers multiple back-ends and is super easy to get started with. I just wish it had the ability to tag content when you are setting it in the cache. If you could do that, then you could bulk delete content based on tags. One use-case that comes up again and again for me is caching paginated results. If you have blog-entry1-page1 and blog-entry1-page2 both stored in the cache and want to delete them when a new entry is added, what do you do? It’s hard, if not impossible, to know which paged sets are currently in the cache, and we need to know the keys in order to delete them.

I searched and searched for a solution to this problem without much help from Google. I did find this post by Eric Florenzano, which is close to the approach I took, but I wanted something more generic. So I decided to add two new methods, cache.set_with_tags and cache.delete_by_tags, to whatever cache backend you are using. Just to note, this is only useful if you intend to utilize the low-level cache framework.
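As a rough illustration of the idea, here is a toy in-memory cache that supports tagging. Only the method names set_with_tags and delete_by_tags come from the post; the TaggedCache class, the method signatures, and the dict-based bookkeeping are assumptions made for this sketch and are not the code from the post.

```python
class TaggedCache:
    """Toy in-memory cache with tag support (illustration only)."""

    def __init__(self):
        self._values = {}  # cache key -> cached value
        self._tags = {}    # tag -> set of cache keys carrying that tag

    def set_with_tags(self, key, value, tags):
        self._values[key] = value
        for tag in tags:
            self._tags.setdefault(tag, set()).add(key)

    def get(self, key, default=None):
        return self._values.get(key, default)

    def delete_by_tags(self, tags):
        for tag in tags:
            # Drop the tag and every key that was stored under it.
            for key in self._tags.pop(tag, set()):
                self._values.pop(key, None)


cache = TaggedCache()
cache.set_with_tags("blog-entry1-page1", "page 1 html", tags=["blog-entry1"])
cache.set_with_tags("blog-entry1-page2", "page 2 html", tags=["blog-entry1"])
cache.set_with_tags("about-page", "about html", tags=["static"])

# A new entry was added: clear every cached page without knowing the keys.
cache.delete_by_tags(["blog-entry1"])
```

With a real backend such as memcached, the tag-to-keys mapping would itself have to live in the cache, which is where most of the design work would go.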


07Apr

Behind the scenes of EveryBlock.com

Adrian Holovaty, bad-boy YouTube guitar star (search for him, if you dare!) and co-author of the Django web framework, takes you under the hood of EveryBlock.com, a Knight Foundation News Challenge startup which rounds up local news and information, and is powered 100% by Python and Django.

06Apr

And so it goes...

Hello, this is my first post in my first blog, weird. I have been a holdout for a long time on this blogging stuff, and to be honest I’m not really sure why. I mean, I’m by no means a good writer, my thoughts are extremely random at best, and for the most part I live a normal and relatively uninteresting life. Seeing how blogging was all the rage a few years ago, it’s my time to have my hand in it. I’m always late to the party; however, I’m usually the last one standing, attempting to drink whatever is left in the fridge and clutching onto whatever remaining youth I still possess.

So please be patient with me as I attempt to share my thoughts on code, life, new music I discover, why I find certain things annoying, and whatever I feel like spewing onto the internets. With all this said, I will issue the following disclaimer. I suck, suCK, SUCK at writing, grammar, spelling and generally anything involving words. Oh well, I don’t give a shit, this is my blog dammit; if you don’t like it, go read one of the other 70+ million blogs on the web.

About Steps and Numbers

Steps and Numbers is the personal blog of web developer/designer Sean Brant. I enjoy coding and designing wonderful user experiences for the web. I spend my spare time hanging out in the beautiful city of Chicago, coding random projects, most of which never see the light of day, and attempting to resurrect my failed career as a rock musician. You can contact me if you want and I’ll try to respond.
