I dedicated a couple of hours during the weekend to (gently) spidering a well-known online jobs site in Argentina for Python-related positions, and then running hierarchical cluster analysis on hand-selected keywords using Pycluster.

According to this analysis (with all the caveats about rushed work, low n, etc.), the people who write job-board posts see roughly three distinct “domains of competence”:

  • A “narrow web domain”: ajax, dhtml, hibernate, apache, tomcat, spring, corba, rails, java, ruby, perl, php
  • A “wide web domain”: html, javascript, css, xml, mysql, cms, xhtml, c, cctv, ethernet, django, turbogears, flex, flash, coldfusion, xslt, lamp, mssql, soap, clusters, hpc, jboss, jetty, subversion, snmp, samba, excel, sybase, smarty, postgresql, rpc, plone, openerp, zope
  • A “server domain” (more sharply distinct from the rest): c++, boost, dns, firewalls, jython, unix, oracle, sql, solaris, ip, api, tcp, openssl, linux, svn

The labels I chose are of course largely arbitrary, but the grouping itself is less so, and it doesn't obviously follow from technological considerations. The “server domain” is more or less self-explanatory (although not devoid of weirdness), but why are the first two categories grouped the way they are? Off the top of my head, I think this reflects the existence, among Python-mentioning jobs, of a more programming-oriented web domain, where dynamic languages are prominent and relatively large deployments are expected (hence the “enterprise” Java technologies), and of a large, heterogeneous “bag domain” of web-related technologies where anything goes. Needless to say, this description falls apart quite quickly (“clusters” and “hpc” land in the bag domain, when logically they should go somewhere else), but it still seems to be a workable first approximation.

Looking at a finer granularity, things become much clearer. You have, for example, a “dynamic languages” cluster, and a very well defined “classic websites” cluster (html, javascript, css, xml, mysql).
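The clustering code itself isn't in the post, and the original analysis used Pycluster. As a dependency-free stand-in, here is a minimal sketch of agglomerative (average-linkage) clustering on keyword co-occurrence, where the distance between two keywords is the Jaccard distance between the sets of postings that mention them. The distance measure, the linkage method, the function names, and the toy postings are all assumptions for illustration; a “finer granularity” simply means stopping the merging at a larger k.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| for two sets of posting ids."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - float(len(a & b)) / len(union)


def average_linkage(c1, c2, occurrences):
    """Mean pairwise distance between the keywords of two clusters."""
    dists = [jaccard_distance(occurrences[x], occurrences[y]) for x in c1 for y in c2]
    return sum(dists) / len(dists)


def cluster_keywords(postings, keywords, k):
    """Repeatedly merge the two closest clusters until only k remain."""
    # For each keyword, the set of postings (by index) that mention it.
    occurrences = dict((kw, set(i for i, words in enumerate(postings) if kw in words))
                       for kw in keywords)
    clusters = [[kw] for kw in keywords]
    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]], occurrences))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: combinations() always yields i < j
    return clusters


# Hypothetical postings, each reduced to the set of keywords it mentions.
postings = [
    {"html", "css", "javascript", "django"},
    {"html", "css", "javascript"},
    {"linux", "c++", "tcp"},
    {"linux", "tcp", "unix"},
    {"django", "html"},
]
keywords = ["html", "css", "javascript", "linux", "tcp", "c++"]
print(cluster_keywords(postings, keywords, 2))
# [['html', 'css', 'javascript'], ['linux', 'tcp', 'c++']]
```

Recording the order of merges instead of stopping at a fixed k would give the full dendrogram, which is what lets you read off clusters at several granularities at once.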

It would be interesting to see how well these clusters (especially at the finer granularity levels) correlate with what these jobs actually demand day to day, but I’m not sure where to get that data from.

A little page straddling the line between toy and tool: mage. Coded quite quickly with Django’s help, it allows you to explore a bit of the combinatorial space of designs.

Functional programming is not about lack of side effects. Functional programming is about manipulating code (really, computations/functions/algorithms) instead of, or at least as much as, data. Lack of side effects in functions makes it easier to think about them, but that’s all. Here’s a good example of what thinking functionally looks like.

For a much worse example, here is a quick and brain-dead version of “apply” I just wrote that allows you to do things like


dataparser = [ stripped, split(','), (str, int, capitalized) ]
for line in datafile:
    print(apply(dataparser, line))

The important lines are these three: once you have your basic functions (and leaving aside questions of arity, debugging, types, and such), you shouldn’t need to write scaffolding code in order to compose them in simple ways. In this example, the dataparser = [ stripped, split(','), (str, int, capitalized) ] line simply says “this is the function that applies stripped, then split(‘,’), and then str, int, and capitalized to each of the three fields of the result.” Many simple scripts do this sort of thing constantly, and while explicitly defining the composed function is not much of an issue in any single case, the cost compounds over a large number of tasks: it subtly discourages reuse of components and strategies and, most importantly, makes it harder to rework the process itself.
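The post doesn't include the implementation of “apply”. Here is a minimal sketch of one way it could work, assuming the semantics described above: a list means “apply each step in order”, a tuple means “apply the i-th function to the i-th field”, and anything else is a plain callable. The helpers stripped, split, and capitalized are hypothetical definitions matching the names in the example; note that in Python 2 this apply shadows the built-in of the same name.

```python
def apply(pipeline, value):
    """Run value through a pipeline spec (hypothetical semantics):
    - list: apply each step in order, feeding each result to the next;
    - tuple: apply the i-th entry to the i-th element of value;
    - anything else: treat it as a plain callable.
    """
    if isinstance(pipeline, list):
        for step in pipeline:
            value = apply(step, value)
        return value
    if isinstance(pipeline, tuple):
        items = list(value)
        if len(items) != len(pipeline):
            raise ValueError("expected %d fields, got %d" % (len(pipeline), len(items)))
        return tuple(apply(step, item) for step, item in zip(pipeline, items))
    return pipeline(value)


# Hypothetical basic functions matching the names used in the example.
def stripped(s):
    return s.strip()

def split(sep):
    # A function factory: split(',') returns a one-argument splitter.
    return lambda s: s.split(sep)

def capitalized(s):
    return s.capitalize()


dataparser = [stripped, split(','), (str, int, capitalized)]
print(apply(dataparser, "widget,42,blue\n"))  # ('widget', 42, 'Blue')
```

Because the spec is plain data (lists, tuples, and functions), pipelines can themselves be built, stored, and combined like any other value, which is the point of the exercise.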

Hacking the Django admin is both bad form and, at times, a great time saver. As this post shows, subclassing admin.ModelAdmin can get you very far, especially if (unlike in that post) you go beyond save_model and start wrapping and replacing other methods of the base class (e.g., you can wrap form generation to preload useful dynamic values; and why the hell don’t default callables take arguments?). The obvious use case is creatively restricting what admin users can do (yes, I know, that’s not what the admin is there for… or so we are told), but you can do all sorts of arbitrarily funny things.

Let’s implement the classic “Maybe” monad, for simple functions.


#!/usr/bin/env python

class Undefined(object):
    """Sentinel marking a Maybe with no value (None is a legitimate value)."""

class Maybe(object):
    def __init__(self, x=Undefined):
        self.value = x

    def apply(self, f):
        # Undefined stays undefined; otherwise wrap f's result in a new Maybe.
        if not self.is_defined:
            return Maybe()
        else:
            return Maybe(f(self.value))

    def __repr__(self):
        if not self.is_defined:
            return '<Undefined>'
        else:
            return repr(self.value)

    @property
    def is_defined(self):
        return self.value is not Undefined

def monad_aware(f):
    # Lift a plain function so that it takes and returns Maybe values.
    def fprime(x):
        return x.apply(f)
    return fprime

if __name__ == '__main__':

    @monad_aware
    def double(x):
        return 2*x

    x = Maybe()
    print(double(x))
    assert not double(x).is_defined

    x = Maybe(2)
    print(double(x))
    assert double(x).is_defined
    assert double(x).value == 4

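Strictly speaking, apply above is the functor “map”: it works for plain functions, but a function that can itself fail (and so returns a Maybe) would end up nested inside another Maybe. The monadic operation proper is “bind”, which doesn't re-wrap the result. A minimal self-contained sketch, where bind and the safe_reciprocal helper are hypothetical additions:

```python
class Undefined(object):
    """Sentinel marking a Maybe with no value."""

class Maybe(object):
    def __init__(self, x=Undefined):
        self.value = x

    @property
    def is_defined(self):
        return self.value is not Undefined

    def apply(self, f):
        # Functor map: f is a plain function, so wrap its result.
        return Maybe(f(self.value)) if self.is_defined else Maybe()

    def bind(self, f):
        # Monadic bind: f already returns a Maybe, so don't wrap it again.
        return f(self.value) if self.is_defined else Maybe()

    def __repr__(self):
        return repr(self.value) if self.is_defined else '<Undefined>'


def safe_reciprocal(x):
    # Hypothetical helper: undefined instead of ZeroDivisionError.
    return Maybe() if x == 0 else Maybe(1.0 / x)


print(Maybe(4).bind(safe_reciprocal))                          # 0.25
print(Maybe(0).bind(safe_reciprocal))                          # <Undefined>
print(Maybe(4).apply(lambda v: v - 4).bind(safe_reciprocal))   # <Undefined>
```

Using apply with safe_reciprocal instead would give a Maybe whose value is another Maybe, which is exactly the nesting bind avoids.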

One of the craziest things I’m working on right now is a Python interpreter for a Forth-inspired mini-language I call pyf. It’s not a practical programming tool in almost any sense, being much less useful than the language the interpreter is written in, but I’ve found that combining Forth’s very simple syntax (in its way, as elegant as Lisp’s) with Python’s rich semantics makes it a potentially useful tool for creative programming, doodling with code, or whatever you want to call it. pyf’s stack holds Python objects, not bytes, which makes it horrendously slow by Forth standards but also saves me from writing a bunch of utility words. When done right (and I’m not there yet), Forth-style stack-oriented programming is a surprisingly interesting cognitive tool, because it seems to mirror and take advantage of cognitive heuristics that other languages model more awkwardly (and, of course, vice versa).
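pyf’s actual syntax and word set aren’t shown here, so the following is only a minimal sketch of the core idea under assumed, hypothetical names: a Forth-style evaluator whose stack holds arbitrary Python objects, so arithmetic words just delegate to Python’s operators and get Python semantics (such as string repetition) for free.

```python
import operator

# Binary words: pop two Python objects, push the result of a Python operator.
BINARY = {'+': operator.add, '-': operator.sub,
          '*': operator.mul, '/': operator.truediv}


def dup(stack):
    stack.append(stack[-1])

def swap(stack):
    stack[-1], stack[-2] = stack[-2], stack[-1]

def drop(stack):
    stack.pop()

# Stack-shuffling words operate on the stack in place.
WORDS = {'dup': dup, 'swap': swap, 'drop': drop}


def literal(token):
    """Parse a token as int, then float; otherwise keep it as a string."""
    for convert in (int, float):
        try:
            return convert(token)
        except ValueError:
            pass
    return token


def run(source, stack=None):
    """Evaluate whitespace-separated words left to right; return the stack."""
    stack = [] if stack is None else stack
    for token in source.split():
        if token in BINARY:
            b = stack.pop()  # top of stack is the right-hand operand
            a = stack.pop()
            stack.append(BINARY[token](a, b))
        elif token in WORDS:
            WORDS[token](stack)
        else:
            stack.append(literal(token))
    return stack


print(run("2 3 + dup *"))  # [25]
print(run("ab 3 *"))       # ['ababab']: Python's str * int, for free
```

Keeping words in a plain dict means user-defined words could later be added as Python callables (or as stored token lists) without touching the evaluator loop.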

I might even end up using it as a macro language in other projects, I guess.

Inspired by a Frontier Economy article on email clients, and doing a Frankenstein on mail-trends, I created a Python script that analyzes your Gmail habits and makes heuristic recommendations about what you should move to RSS feeds and, perhaps more interestingly, which people you might want to get back in touch with (à la Facebook, but for those of us who are still email-centered).
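The post doesn’t spell out the heuristics, so this is only a sketch of the kind of rule that could drive both recommendations, not the script’s actual logic. It assumes messages have already been fetched (e.g., by mail-trends) and reduced to hypothetical (sender, received_at, i_replied) tuples; the function name and both thresholds are made up for illustration.

```python
from datetime import datetime, timedelta


def recommend(messages, now, min_messages=3, stale_after=timedelta(days=180)):
    """Hypothetical heuristics over (sender, received_at, i_replied) tuples.

    Returns (rss_candidates, reconnect_candidates), both sorted.
    """
    received = {}     # sender -> number of messages received
    replied = {}      # sender -> number of those messages I answered
    last_reply = {}   # sender -> date of the latest message I answered
    for sender, received_at, i_replied in messages:
        received[sender] = received.get(sender, 0) + 1
        if i_replied:
            replied[sender] = replied.get(sender, 0) + 1
            last_reply[sender] = max(received_at, last_reply.get(sender, received_at))
    # Frequent senders I never answer look like newsletters: move them to RSS.
    rss = sorted(s for s, n in received.items()
                 if n >= min_messages and replied.get(s, 0) == 0)
    # People I used to answer but haven't in a long while: get back in touch.
    reconnect = sorted(s for s, t in last_reply.items() if now - t > stale_after)
    return rss, reconnect


now = datetime(2009, 1, 1)
inbox = [("news@example.com", datetime(2008, 12, d), False) for d in (1, 8, 15)]
inbox += [("old.friend@example.com", datetime(2008, 1, 5), True),
          ("colleague@example.com", datetime(2008, 12, 20), True)]
print(recommend(inbox, now))
# (['news@example.com'], ['old.friend@example.com'])
```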

The question is, what would be the best way to deploy this? I’d like the feature to be available to the least technical audience possible, and I feel that neither contributing it back to mail-trends nor distributing it as a separate Open Source program (still a Python command-line script) would be very effective in that sense. I thought about setting it up as a web service, but it needs the user’s Gmail password, and that’s probably a privacy no-no, especially for an unknown site. So I guess I should do it as a well-packaged, user-friendly small desktop utility… pretty much the kind of project I really, really don’t enjoy doing.

Ideas? Comments? Suggestions?