I dedicated a couple of hours during the weekend to (gently) spidering a well-known online jobs site in Argentina for Python-related positions, and then running hierarchical cluster analysis on hand-selected keywords using Pycluster.
According to this analysis (with all the caveats about rushed work, low n, etc.), the people who write posts on job boards see roughly three differentiated “domains of competence”:
- A “narrow web domain”: ajax, dhtml, hibernate, apache, tomcat, spring, corba, rails, java, ruby, perl, php
- A “wide web domain”: html, javascript, css, xml, mysql, cms, xhtml, c, cctv, ethernet, django, turbogears, flex, flash, coldfusion, xslt, lamp, mssql, soap, clusters, hpc, jboss, jetty, subversion, snmp, samba, excel, sybase, smarty, postgresql, rpc, plone, openerp, zope
- A “server domain” (more sharply distinct from the rest): c++, boost, dns, firewalls, jython, unix, oracle, sql, solaris, ip, api, tcp, openssl, linux, svn
The labels I chose are of course largely arbitrary, but the grouping itself is less so, and not obviously derived from technological reasons. The “server domain” is more or less self-explanatory (although not devoid of weirdness), but why are the first two categories grouped as they are? Off the top of my head, I think that this reflects the existence of a more programming-oriented web domain among Python-mentioning jobs, where dynamic languages are prominent and relatively large deployments are expected (hence the “enterprise” java technologies), and a large and heterogeneous “bag domain” of web-related technologies where anything goes. Needless to say, this description falls apart quite quickly (“clusters” and “hpc” belong to this bag domain, when they should logically go somewhere else), but, still, it seems to be a workable first approximation.
Looking at a finer granularity, things become much clearer. You have, for example, a “dynamic languages” cluster, and a very well defined “classic websites” cluster (html, javascript, css, xml, mysql).
It would be interesting to see how well these clusters (especially at the finer granularity levels) correlate with actual demands during work, but I’m not sure where to get that data from.
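For readers who want to try this kind of analysis, here is a minimal sketch of hierarchical clustering over keywords. It is not the Pycluster code used for the results above: it is a plain-Python, average-linkage version, and the postings, the Jaccard distance, and every function name in it are assumptions made up for illustration.

```python
from itertools import combinations

# Made-up postings (NOT the scraped data): each one is the set of
# hand-picked keywords found in a single job ad.
postings = [
    {"html", "css", "javascript", "mysql"},
    {"html", "css", "javascript"},
    {"html", "javascript", "mysql"},
    {"linux", "unix", "tcp", "c++"},
    {"linux", "tcp", "c++"},
    {"linux", "unix", "c++"},
]

def keyword_profiles(postings):
    """Map each keyword to the set of postings (by index) that mention it."""
    profiles = {}
    for i, keywords in enumerate(postings):
        for kw in keywords:
            profiles.setdefault(kw, set()).add(i)
    return profiles

def jaccard_distance(a, b):
    """0.0 for keywords that always co-occur, 1.0 for keywords that never do."""
    return 1.0 - float(len(a & b)) / len(a | b)

def average_linkage(c1, c2, profiles):
    """Mean pairwise distance between the keywords of two clusters."""
    pairs = [(a, b) for a in c1 for b in c2]
    total = sum(jaccard_distance(profiles[a], profiles[b]) for a, b in pairs)
    return total / len(pairs)

def cluster(postings, n_clusters):
    """Merge the two closest clusters until only n_clusters remain."""
    profiles = keyword_profiles(postings)
    clusters = [frozenset([kw]) for kw in sorted(profiles)]
    while len(clusters) > n_clusters:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], profiles),
        )
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters

result = cluster(postings, 2)
for group in result:
    print(sorted(group))
```

On this toy data the two surviving clusters are the web keywords and the server keywords. Stopping the merging earlier (a larger n_clusters) gives the finer-grained view discussed above.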
Functional programming is not about lack of side effects. Functional programming is about manipulating code (really, computations/functions/algorithms) instead of, or at least as much as, data. Lack of side effects in functions makes it easier to think about them, but that’s all. Here’s a good example of what thinking functionally looks like.
For a much worse example, here is a quick and brain-dead version of “apply” I just wrote that allows you to do things like:
dataparser = [ stripped, split(','), (str, int, capitalized) ]
for line in datafile:
    print(apply(dataparser, line))
The important lines are the last three; once you have your basic functions (and leaving aside questions of arity, debugging, types, and such), you shouldn’t need to write scaffolding code in order to compose them in simple ways. In this example, the dataparser = [ stripped, split(','), (str, int, capitalized) ] line simply says “this is the function that applies stripped, then split(','), and then str, int, and capitalized, respectively, to the three elements of the resulting tuple.” A large number of simple scripts use this sort of process a lot, and even if explicitly defining the composed function is not much of an issue each time, the cost compounds over a large number of tasks, subtly discourages reuse of components and strategies and, most importantly, makes it subtly harder to reason about and rework the process itself.
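The original “apply” is not reproduced here, so the following is only a guess at what such a function could look like, written for Python 3. The rule it assumes (a list means “apply these in sequence”, a tuple means “apply these element by element”) is inferred from the description above, and the helper bodies are hypothetical.

```python
def stripped(s):
    return s.strip()

def split(sep):
    # Returns a function, so that split(',') can sit inside a pipeline.
    return lambda s: s.split(sep)

def capitalized(s):
    return s.capitalize()

def apply(step, value):
    """Interpret a nested list/tuple of functions as one composed function."""
    if isinstance(step, list):
        # A list is sequential composition: feed each result to the next step.
        for f in step:
            value = apply(f, value)
        return value
    if isinstance(step, tuple):
        # A tuple is element-wise application. Note that zip silently
        # truncates if the two lengths differ (no arity checking here).
        return tuple(apply(f, v) for f, v in zip(step, value))
    # Anything else is assumed to be a plain callable.
    return step(value)

dataparser = [stripped, split(','), (str, int, capitalized)]
result = apply(dataparser, "  bob,42,argentina \n")
print(result)  # ('bob', 42, 'Argentina')
```

Because the pipeline is just data, it can be inspected, sliced, or extended (for example dataparser + [some_other_step]) without touching any scaffolding code.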
I’ve been playing with newLISP. So far so good; it’s a relatively simple LISP with more batteries included than usual. Python is still my main development language, but I’m willing to sink some time into newLISP, to see if I can get properly enlightened. For all of the language’s elegance in most contexts, Python metaprogramming is indeed kludgy.
By the way, I think Ruby is more powerful in this regard, although still not elegant at all. And, as EWD would say, elegance is nothing else but another word for conceptual usability; inelegant tools lead to solutions that cannot be understood by either designers or implementers; lack of understanding leads to undying bugs, unsound algorithms, and multi-decade technological dead-ends.
Hacking the Django admin is both bad form and, at times, a great time saver. As this post shows, subclassing admin.ModelAdmin can get you very far, especially if (unlike in that post) you go beyond save_model and start wrapping and replacing other methods of the base class (e.g., you can wrap form generation to preload useful dynamic values; and why the hell don’t default callables take arguments?). The obvious use case is creatively restricting what admin users can do (yes, I know, not what the admin is there for… or so we are told), but you can do arbitrarily funny things.
Let’s implement the classic “Maybe” monad, for simple functions.
#!/usr/local/bin/python

class Undefined:
    """Sentinel meaning "no value"; None stays usable as a real value."""
    pass


class Maybe:
    def __init__(self, x=Undefined):
        self.value = x

    def apply(self, f):
        # An undefined value short-circuits: f is never called.
        if not self.is_defined:
            return Maybe()
        else:
            return Maybe(f(self.value))

    def __repr__(self):
        if not self.is_defined:
            return '<Undefined>'
        else:
            return repr(self.value)

    @property
    def is_defined(self):
        return self.value is not Undefined


def monad_aware(f):
    # Lift a plain function so that it takes and returns Maybe values.
    def fprime(x):
        return x.apply(f)
    return fprime


if __name__ == '__main__':
    @monad_aware
    def double(x):
        return 2*x

    x = Maybe()
    print(double(x))
    assert not double(x).is_defined

    x = Maybe(2)
    print(double(x))
    assert double(x).is_defined
    assert double(x).value == 4
Long story short: When you apply a function to a value, the function controls the process. When you apply a function to a monad, the monad controls the process. That is what allows monads to extend the semantics of functions (e.g., adding side effects, or whatever).
And why not? Why should values be passive? (Note that accessors are a bit like half-monads; but full monads can do whatever the hell they want with functions, way beyond playing with what they see.)
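To make “the monad controls the process” a bit more concrete, here is a sketch of a second wrapper that exposes an apply method and can be used with the same kind of monad_aware decorator. The Logged class is a hypothetical example, not part of the code above; it simply shows that the same lifted functions behave differently depending on what they are applied to.

```python
class Logged:
    """Wraps a value and remembers the name of every function applied to it."""

    def __init__(self, value, log=()):
        self.value = value
        self.log = tuple(log)

    def apply(self, f):
        # The wrapper, not f, decides what "applying f" means:
        # here it runs f and also records that it did so.
        return Logged(f(self.value), self.log + (f.__name__,))


def monad_aware(f):
    # Same lifting decorator as in the Maybe example.
    def fprime(x):
        return x.apply(f)
    return fprime


@monad_aware
def double(x):
    return 2 * x


@monad_aware
def increment(x):
    return x + 1


result = increment(double(Logged(2)))
print(result.value)  # 5
print(result.log)    # ('double', 'increment')
```

Neither double nor increment knows anything about logging; the extra behavior lives entirely in the object they are applied to.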
Quoth the master:
As I have now said many times and written in many places: program testing can be quite effective for showing the presence of bugs, but is hopelessly inadequate for showing their absence.
— Dijkstra, A Discipline of Programming
This is much less powerful than it should be when taken out of context. Clearly, test suites no more “prove” that a program is valid than a predicate is “proved” by manually checking a few cases. Test suites are actually an improvement over customary practices, but only because customary practices are abysmally primitive. I’m not talking about pre-Internet primitive. I’m talking about pre-Russell, not to say pre-Descartes primitive.
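A tiny illustration of the point, with a made-up function and a made-up test suite: every check passes, and the function is still wrong.

```python
def is_leap(year):
    # Deliberately incomplete: ignores the century rules of the
    # Gregorian calendar.
    return year % 4 == 0

# A plausible-looking "test suite": four hand-picked cases.
checks = {2004: True, 2008: True, 2001: False, 2000: True}
suite_passes = all(is_leap(y) == expected for y, expected in checks.items())
print(suite_passes)   # True

# ...and yet 1900 was not a leap year.
print(is_leap(1900))  # True, which is the wrong answer
```

The suite showed no bugs; it did not show their absence.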
Dijkstra’s writings are still among the clearest and most solid examples of reasoning I’ve read, let alone reasoning about computer programming, and the fact that one can make a living as a “programmer” without ever thinking formally about anything is as damning an indictment of the industry as I can imagine.
One of the craziest things I’m working on right now is a Python interpreter for a Forth-inspired mini-language I call pyf. It’s not a practical programming tool in almost any sense, being much less useful than the language I wrote the interpreter in. Still, I’ve found that the combination of Forth’s very simple syntax (in its way, it’s as elegant as Lisp’s) with Python’s rich semantics (pyf’s stack holds Python objects, not bytes, making it horrendously slow by Forth standards, but also saving me the need to write a bunch of utility words) makes it a potentially useful tool for creative programming/doodling with code/whatever you might want to call it. When done right (and I’m not there yet), Forth-style stack-oriented programming is a surprisingly interesting cognitive tool, because it seems to mirror and take advantage of cognitive heuristics that are more awkwardly modeled by other languages (and, of course, vice versa).
I might even end up using it as a macro language in other projects, I guess.
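To give a flavor of the idea, here is a toy stack interpreter in the same spirit. It is not pyf: the word set, the literal parsing, and the run function are all assumptions invented for this sketch, which only aims to show what “a Forth-like stack that holds Python objects” buys you.

```python
import operator

def binary(op):
    """Build a word that pops two items and pushes op(a, b)."""
    def word(stack):
        b = stack.pop()
        a = stack.pop()
        stack.append(op(a, b))
    return word

def dup(stack):
    stack.append(stack[-1])

def swap(stack):
    stack[-1], stack[-2] = stack[-2], stack[-1]

def drop(stack):
    stack.pop()

def length(stack):
    stack.append(len(stack.pop()))

# The dictionary of known words (hypothetical, chosen for the example).
WORDS = {
    'dup': dup, 'swap': swap, 'drop': drop, 'len': length,
    '+': binary(operator.add),
    '-': binary(operator.sub),
    '*': binary(operator.mul),
}

def literal(token):
    """Turn a token into an int or a float if possible, else keep the string."""
    for convert in (int, float):
        try:
            return convert(token)
        except ValueError:
            pass
    return token

def run(source, stack=None):
    """Execute whitespace-separated tokens against a stack of Python objects."""
    stack = [] if stack is None else stack
    for token in source.split():
        word = WORDS.get(token)
        if word is not None:
            word(stack)
        else:
            stack.append(literal(token))
    return stack

print(run("2 3 + dup *"))      # [25]
print(run("ab 3 *"))           # ['ababab']
print(run("hello dup + len"))  # [10]
```

Since the stack holds Python objects, "+" and "*" work on strings as well as numbers for free, which is the kind of utility word you would otherwise have to write by hand.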
Inspired by a Frontier Economy article on email clients, and doing a Frankenstein on mail-trends, I created a Python script that analyzes your Gmail habits and makes heuristic recommendations on things that you should move to RSS feeds and, perhaps more interestingly, people you might want to get back in touch with (à la Facebook, but for those of us who are still email-centered).
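The actual heuristics are not described here, so the following is only a sketch of the kind of rules such a script could use. Everything in it is an assumption: the (sender, date, replied) record format, the thresholds, the addresses, and the recommend function itself. It also works on already-extracted records rather than talking to Gmail.

```python
from collections import defaultdict
from datetime import date, timedelta

def recommend(messages, today, min_bulk=5, min_history=3, quiet_days=180):
    """messages: iterable of (sender, sent_on, i_replied) tuples.

    Returns (to_rss, reconnect): senders that write a lot and never get
    a reply, and senders we used to answer but have not heard from lately.
    """
    count = defaultdict(int)
    replies = defaultdict(int)
    last_seen = {}
    for sender, sent_on, i_replied in messages:
        count[sender] += 1
        replies[sender] += int(i_replied)
        if sender not in last_seen or sent_on > last_seen[sender]:
            last_seen[sender] = sent_on

    to_rss, reconnect = [], []
    for sender in sorted(count):
        quiet_for = today - last_seen[sender]
        if count[sender] >= min_bulk and replies[sender] == 0:
            to_rss.append(sender)
        elif replies[sender] >= min_history and quiet_for > timedelta(days=quiet_days):
            reconnect.append(sender)
    return to_rss, reconnect

# Made-up sample data.
today = date(2009, 6, 1)
messages = (
    [("newsletter@example.com", date(2009, 5, d), False) for d in range(1, 7)]
    + [("old.friend@example.com", date(2008, m, 1), True) for m in (1, 2, 3)]
    + [("colleague@example.com", date(2009, 5, d), True) for d in (10, 20, 30)]
)
to_rss, reconnect = recommend(messages, today)
print(to_rss)     # ['newsletter@example.com']
print(reconnect)  # ['old.friend@example.com']
```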
Question is, what would be the best way to deploy this? I’d like the feature to be available to the least-technical audience possible, and either contributing back to mail-trends or distributing it as a separate Open Source program (but still a Python command line script) wouldn’t be very effective in this sense, I feel. I thought about setting it up as a web service, but it does need your Gmail password, and that’s probably a privacy no-no, especially for an unknown site. So I guess I should do it as a well-packaged and user-friendly small desktop utility… pretty much the kind of project I really, really don’t enjoy doing.
Ideas? Comments? Suggestions?