I dedicated a couple of hours during the weekend to (gently) spidering a well-known online jobs site in Argentina for Python-related positions, and then running hierarchical cluster analysis on hand-selected keywords using Pycluster.
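For readers who want to try something similar, here is a minimal sketch of the preparation step: turning posting texts into pairwise keyword distances that a hierarchical clustering routine can consume. It is not the original script. The postings, the keyword list, the tokenizer, and the choice of Jaccard distance are all assumptions made for illustration.

```python
import re
from itertools import combinations

# Hypothetical keywords and made-up postings, for illustration only.
KEYWORDS = ["html", "css", "javascript", "linux", "unix", "c++"]
POSTINGS = [
    "Python developer with HTML, CSS and JavaScript experience",
    "Web designer: html / css",
    "Sysadmin, Linux and Unix, some C++ and Python",
    "Backend engineer (C++, Linux)",
]


def keywords_in(text, keywords):
    """Return the keywords that appear as whole tokens in the text."""
    tokens = set(re.findall(r"[a-z0-9+#]+", text.lower()))
    return {k for k in keywords if k in tokens}


def occurrence_sets(postings, keywords):
    """Map each keyword to the set of posting indices that mention it."""
    seen = {k: set() for k in keywords}
    for i, text in enumerate(postings):
        for k in keywords_in(text, keywords):
            seen[k].add(i)
    return seen


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|, taken to be 1.0 when both sets are empty."""
    union = a | b
    if not union:
        return 1.0
    return 1.0 - len(a & b) / len(union)


def distance_table(postings, keywords):
    """Pairwise Jaccard distances between keywords, keyed by keyword pair."""
    seen = occurrence_sets(postings, keywords)
    return {
        (x, y): jaccard_distance(seen[x], seen[y])
        for x, y in combinations(keywords, 2)
    }
```

Keywords that always show up in the same postings end up at distance 0, and keywords that never co-occur end up at distance 1, which is the kind of input a hierarchical clustering step expects.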
According to this analysis (with all the caveats about rushed work, low n, etc.), the people who write posts on job boards seem to see roughly three differentiated “domains of competence”:
- A “narrow web domain”: ajax, dhtml, hibernate, apache, tomcat, spring, corba, rails, java, ruby, perl, php
- A “wide web domain”: html, javascript, css, xml, mysql, cms, xhtml, c, cctv, ethernet, django, turbogears, flex, flash, coldfusion, xslt, lamp, mssql, soap, clusters, hpc, jboss, jetty, subversion, snmp, samba, excel, sybase, smarty, postgresql, rpc, plone, openerp, zope
- A “server domain” (more sharply distinct from the rest): c++, boost, dns, firewalls, jython, unix, oracle, sql, solaris, ip, api, tcp, openssl, linux, svn
The labels I chose are of course largely arbitrary, but the grouping itself is less so, and it is not obviously driven by technological reasons. The “server domain” is more or less self-explanatory (although not devoid of weirdness), but why are the first two categories grouped as they are? Off the top of my head, I think this reflects the existence of a more programming-oriented web domain among Python-mentioning jobs, where dynamic languages are prominent and relatively large deployments are expected (hence the “enterprise” java technologies), and a large and heterogeneous “bag domain” of web-related technologies where anything goes. Needless to say, this description falls apart quite quickly (“clusters” and “hpc” belong to this bag domain, when they should logically go somewhere else), but, still, it seems to be a workable first approximation.
Looking at a finer granularity, things become much clearer. You have, for example, a “dynamic languages” cluster, and a very well defined “classic websites” cluster (html, javascript, css, xml, mysql).
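To make the idea of “granularity” concrete, here is a small sketch of average-linkage agglomerative clustering that stops at a chosen number of clusters. It is a plain-Python stand-in written for illustration, not Pycluster's routine, and the distances are invented so that the web and dynamic-language keywords sit closer to each other than either does to the server keywords.

```python
from itertools import combinations

# Hypothetical keywords with invented group labels, used only to build toy distances.
ITEMS = ["html", "css", "ruby", "perl", "linux", "unix"]
GROUP = {"html": "web", "css": "web", "ruby": "dyn",
         "perl": "dyn", "linux": "srv", "unix": "srv"}


def toy_distance(a, b):
    """Invented distances: small within a group, largest against 'srv'."""
    ga, gb = GROUP[a], GROUP[b]
    if ga == gb:
        return 0.1
    if "srv" in (ga, gb):
        return 0.9
    return 0.5


DIST = {frozenset((a, b)): toy_distance(a, b)
        for a, b in combinations(ITEMS, 2)}


def average_linkage(c1, c2, dist):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def agglomerate(items, dist, k):
    """Merge the closest pair of clusters until only k clusters remain."""
    clusters = [[item] for item in items]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], dist)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


coarse = agglomerate(ITEMS, DIST, 2)  # web and dynamic languages lumped together
fine = agglomerate(ITEMS, DIST, 3)    # dynamic languages split off on their own
```

With these toy numbers, the coarse cut keeps the server keywords apart from everything else, while the finer cut separates the dynamic languages from the classic website keywords, mirroring the pattern described above.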
It would be interesting to see how well these clusters (especially at the finer granularity levels) correlate with actual demands on the job, but I’m not sure where to get that data from.