Latest Entries
Queued Storage Backend for Django
[UPDATE] I added this code to GitHub at http://github.com/seanbrant/django-queued-storage if anyone is interested.
Say you have a web application that allows users to store large images and then serves them back to the user. Think something like Flickr. Now let’s say you want to use Amazon S3 as your image server. You might start running into slowness with uploads if you upload the image to S3 in the same request the user made to upload the image to your site. What’s happening is that the image first needs to get uploaded to your server’s filesystem, and then it needs to get sent to S3. Depending on the file size, this can make for a poor user experience.
So in trying to solve this I did what any good developer would do: I googled the best way to solve this problem. I came across several approaches, none of which seemed that elegant. One suggested adding two fields to your model and a flag that would tell you which field to use, yuck. I first went this route, and the boilerplate and messiness were not worth it.
What I really wanted was one interface to the file, just like a normal storage backend. I wanted to keep the logic for switching storages all in one place as much as possible, and I wanted it to be no harder to use than a normal storage backend.
Something like this.
image = ImageField(
    storage=QueuedRemoteStorage(
        local='django.core.files.storage.FileSystemStorage',
        remote='backends.s3boto.S3BotoStorage'),
    upload_to='uploads')
I came up with what I am calling QueuedRemoteStorage, for lack of a better name. This is basically a proxy for a local and a remote storage backend that takes care of determining which backend to use depending on what state the file is in. State is maintained using Django’s caching framework, and the queue service is Celery.
The only downside is that this requires you to have an app to hold the Celery tasks.py file. So create a new app or add a tasks.py file to another app you have. I won’t go into how to use Celery; have a look at their documentation.
In tasks.py you need to define a subclass of Task.
from django.core.cache import cache
from django.core.files.storage import get_storage_class
from celery.registry import tasks
from celery.task import Task


class SaveToRemoteTask(Task):
    def run(self, name, local, remote, cache_key):
        local_storage = get_storage_class(local)()
        remote_storage = get_storage_class(remote)()
        remote_storage.save(name, local_storage.open(name))
        cache.set(cache_key, True)
        return True


tasks.register(SaveToRemoteTask)
What this code is doing is defining a Task that Celery will load on start up automatically because you called the file tasks.py and put it in the root of the app. Next it uses a function that Django provides that takes the path of the storage backend as a string and returns the storage class. We then open the file from the local storage and save it to the remote storage and set the key in the cache to True (the file is on the remote server).
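The string-to-class lookup can be pictured with nothing but the standard library. This is only a rough sketch of the idea, not Django's implementation: load_class is a hypothetical name, and it is written for Python 3 so that it runs on its own, without Django installed.

```python
from importlib import import_module


def load_class(dotted_path):
    """Resolve 'package.module.ClassName' to the class object it names.

    load_class is a hypothetical helper used for illustration only.
    """
    # Split 'collections.OrderedDict' into 'collections' and 'OrderedDict'.
    module_path, class_name = dotted_path.rsplit('.', 1)
    module = import_module(module_path)
    return getattr(module, class_name)


# A standard-library class is used here only so the example runs anywhere.
ordered_dict_class = load_class('collections.OrderedDict')

# Note the trailing (): the lookup returns a class, which is then instantiated,
# just as the task does with get_storage_class(local)().
instance = ordered_dict_class()
```

Passing the backends around as strings rather than instances is what lets the task receive them as plain arguments through the queue.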
This will hopefully make more sense when you see the storage backend.
import urllib

from django.core.cache import cache
from django.core.files.storage import get_storage_class, Storage

from yourapp.tasks import SaveToRemoteTask

QUEUED_REMOTE_STORAGE_CACHE_KEY_PREFIX = 'queued_remote_storage_'


class QueuedRemoteStorage(Storage):
    def __init__(self, local, remote,
                 cache_prefix=QUEUED_REMOTE_STORAGE_CACHE_KEY_PREFIX):
        # Keep the dotted paths as well as the instances: the paths are
        # what gets handed to the Celery task.
        self.local_class = local
        self.local = get_storage_class(self.local_class)()
        self.remote_class = remote
        self.remote = get_storage_class(self.remote_class)()
        self.cache_prefix = cache_prefix

    def get_storage(self, name):
        cache_result = cache.get(self.get_cache_key(name))
        if cache_result:
            # True: the file has been transferred to the remote backend.
            return self.remote
        elif cache_result is None:
            # No cache entry: ask the remote backend directly.
            if self.remote.exists(name):
                cache.set(self.get_cache_key(name), True)
                return self.remote
        # False, or not found remotely: the file is still local.
        return self.local

    def get_cache_key(self, name):
        return '%s%s' % (self.cache_prefix, urllib.quote(name))

    def using_local(self, name):
        return self.get_storage(name) is self.local

    def using_remote(self, name):
        return self.get_storage(name) is self.remote

    def open(self, name, **kwargs):
        return self.local.open(name, **kwargs)

    def save(self, name, content):
        # Mark the file as "not on the remote yet", save it locally,
        # then queue the transfer.
        cache.set(self.get_cache_key(name), False)
        name = self.local.save(name, content)
        SaveToRemoteTask.delay(name, self.local_class, self.remote_class,
                               self.get_cache_key(name))
        return name

    # Everything below simply proxies to whichever backend holds the file.

    def get_valid_name(self, name):
        return self.get_storage(name).get_valid_name(name)

    def get_available_name(self, name):
        return self.get_storage(name).get_available_name(name)

    def path(self, name):
        return self.get_storage(name).path(name)

    def delete(self, name):
        return self.get_storage(name).delete(name)

    def exists(self, name):
        return self.get_storage(name).exists(name)

    def listdir(self, name):
        return self.get_storage(name).listdir(name)

    def size(self, name):
        return self.get_storage(name).size(name)

    def url(self, name):
        return self.get_storage(name).url(name)
Most of this code just proxies the actual storage methods to whichever backend get_storage picks. The heart and soul of this class is found in get_storage and save. get_storage looks up the file’s key in the cache. If the cached value is True, it can assume that the file is on the remote server, and it returns the remote storage instance. If there is no entry (cache_result is None), we check the remote backend for the existence of the file; if it is found, we update the cache and return the remote backend. If all else fails, we return the local backend. The save method is responsible for queuing up the remote transfer. It first sets the cache key to False, then saves the file locally. Next it sends a job to the queue and returns the name of the file.
Hopefully this is pretty straightforward. At this point this is nothing more than a proof of concept that has not been tested in a production setting. I hope that this will at least give others some ideas.
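To watch the three cache states (False, True, and missing) play out without Django or Celery, here is a minimal in-memory sketch. Everything in it is a stand-in invented for illustration: MemoryStorage replaces a real storage backend, a plain dict replaces the cache, and a list plus run_pending replaces the queue and its worker. It is written for Python 3 and only mirrors the decision logic; it is not the storage class itself.

```python
class MemoryStorage(object):
    """Hypothetical stand-in for a storage backend: maps names to bytes."""

    def __init__(self):
        self.files = {}

    def exists(self, name):
        return name in self.files

    def save(self, name, content):
        self.files[name] = content
        return name


class QueuedStorageSketch(object):
    def __init__(self, local, remote, cache):
        self.local = local
        self.remote = remote
        self.cache = cache   # plain dict standing in for Django's cache
        self.pending = []    # list standing in for the Celery queue

    def get_storage(self, name):
        state = self.cache.get(name)
        if state:
            return self.remote            # True: transfer finished
        if state is None and self.remote.exists(name):
            self.cache[name] = True       # missing entry: rebuild it
            return self.remote
        return self.local                 # False, or not on the remote

    def save(self, name, content):
        self.cache[name] = False          # not on the remote yet
        name = self.local.save(name, content)
        self.pending.append(name)         # "queue" the transfer
        return name

    def run_pending(self):
        """Do what the worker would do: copy each file, then flip the flag."""
        while self.pending:
            name = self.pending.pop(0)
            self.remote.save(name, self.local.files[name])
            self.cache[name] = True


local, remote = MemoryStorage(), MemoryStorage()
storage = QueuedStorageSketch(local, remote, cache={})

storage.save('cat.jpg', b'image bytes')
before = storage.get_storage('cat.jpg') is local      # upload still queued
storage.run_pending()
after = storage.get_storage('cat.jpg') is remote      # worker has finished

# An emptied cache (state None) falls back to asking the remote backend.
storage.cache.clear()
recovered = storage.get_storage('cat.jpg') is remote
```

The explicit False matters: it lets get_storage skip the remote existence check, which would otherwise be a network round trip, for files that were only just saved.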
Google App Engine
Today I launched my first Google App Engine site, Ross Gload Is Everywhere. So far I’m pretty impressed with what Google has to offer. The coding part is plain old Python with some basic framework tools. The ORM is usable (more on that in a minute), and the development environment is what you would expect. The shiny star IMHO is deployment and management. With a simple command in Terminal your fresh code is deployed and running live. No messing with SSH, version control, stopping or restarting services. I dig it. Plus they give you a nice dashboard to view log files, request information and a lot more. Really, other hosting companies could learn a thing or two from the dashboard.
If you have one foot in the past and one foot in the future, you’re pissing on the present. — via Mirificam Press
— Jeff Croft
django-piston
“Piston is a Django mini-framework creating APIs.” This looks really cool and will likely be used on an upcoming project I am currently working on.
Setting and deleting cache in Django with tags
The Django cache system is a real help when you need to keep your site as fast as possible. It offers multiple back-ends and is super easy to get started with. I just wish it had the ability to tag content when you are setting it in the cache. If you could do that, then you could do bulk deleting of content based on tags. One use-case that comes up again and again for me is caching paginated results. If you have blog-entry1-page1 and blog-entry1-page2 both stored in the cache and want to delete them when a new entry is added, what do you do? It’s hard, if not impossible, to know which paged sets are currently in the cache, and we need to know the keys in order to delete them.
I searched and searched for a solution to this problem with not much help from Google. I did find this post by Eric Florenzano, which is close to the approach I took, but I wanted something more generic. So I decided to add two new methods, cache.set_with_tags and cache.delete_by_tags, to whatever cache backend you are using. Just to note, this is only useful if you intend to utilize the low-level cache framework.
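The idea behind tagging can be sketched with a plain dict standing in for the cache backend. This is only an illustration under stated assumptions, not the implementation from the post: the two method names come from the text above, but their signatures, the TaggedCacheSketch class, and the 'tag:' index prefix are all made up here, and a real backend would also have to cope with expiry and concurrent writers. Written for Python 3.

```python
class TaggedCacheSketch(object):
    """In-memory illustration; a dict stands in for the real cache backend."""

    TAG_PREFIX = 'tag:'  # hypothetical prefix for the per-tag key index

    def __init__(self):
        self.store = {}

    def set_with_tags(self, key, value, tags=()):
        self.store[key] = value
        for tag in tags:
            # Record the key under every tag so it can be found later.
            self.store.setdefault(self.TAG_PREFIX + tag, set()).add(key)

    def get(self, key, default=None):
        return self.store.get(key, default)

    def delete_by_tags(self, tags):
        for tag in tags:
            # Drop the tag index itself, then every key it listed.
            for key in self.store.pop(self.TAG_PREFIX + tag, set()):
                self.store.pop(key, None)


cache = TaggedCacheSketch()
cache.set_with_tags('blog-entry1-page1', '<html>page 1</html>', tags=['blog-entry1'])
cache.set_with_tags('blog-entry1-page2', '<html>page 2</html>', tags=['blog-entry1'])
cache.set_with_tags('about-page', '<html>about</html>', tags=['static'])

# A new entry was added: invalidate every cached page of that listing at once,
# without having to know how many pages were cached.
cache.delete_by_tags(['blog-entry1'])
```

Storing the tag index in the cache itself keeps the approach backend-agnostic, at the cost of losing track of tagged keys if the index entry is evicted first.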
Behind the scenes of EveryBlock.com
Adrian Holovaty, bad-boy YouTube guitar star (search for him, if you dare!) and co-author of the Django web framework, takes you under the hood of EveryBlock.com, a Knight Foundation News Challenge startup which rounds up local news and information, and is powered 100% by Python and Django.
And so it goes...
Hello, this is my first post in my first blog, weird. I have been a holdout for a long time on this blogging stuff, and to be honest I’m not really sure why. I mean, I’m by no means a good writer, my thoughts are extremely random at best, and for the most part I live a normal and relatively uninteresting life. Seeing how blogging was all the rage a few years ago, it’s my time to have my hand in it. I’m always late to the party; however, I’m usually the last one standing, attempting to drink whatever is left in the fridge and clutching onto whatever remaining youth I still possess.
So please be patient with me as I attempt to share my thoughts on code, life, new music I discover, why I find certain things annoying, and whatever I feel like spewing onto the internets. With all this said, I will issue the following disclaimer. I suck, suCK, SUCK at writing, grammar, spelling and generally anything involving words. Oh well, I don’t give a shit, this is my blog dammit. If you don’t like it, go read one of the other 70+ million blogs on the web.






