Python Provocation

Posting slowed down a bit recently. A few life difficulties took precedence, followed by almost two solid weeks of chairing the Astronomy Grants Panel. Doesn’t feel right to write about that in any detail for obvious reasons, but I will just say that this article in THES suggesting that we would be better off with a lottery is a complete pile of dingo’s kidneys. Maybe I will work up a post on that, as it made me cross.

But maybe I am just in a tetchy mood. I am even falling out of love with Python.

Being a chap of a certain age, I spent a long time stubbornly persisting with Fortran until it got too embarrassing to admit. So I mugged up C. Pretty good, but didn’t feel right. Next up Java. Hello World etc. Java was just too strict and boring and pernickety. Engineer’s language really, not a scientist’s language or a hacker’s language. All that object oriented stuff. Mystical mumbo jumbo. I like algorithms ! Give me procedures !!!

I was getting fed up. Then someone said “try Python” and reluctantly I did. Then, lo, all was warmth and happiness, and the light shone upon the face of the deep. It was easy and flexible, and object oriented but not so you really noticed. You could use it interactively, or write simple scripts, or build massive symphonies if you wished. It came with all sorts of internet goodies built in. Most important, it was extensible and had community backing. Numpy/Scipy seemed the right thing to back, and astro stuff was appearing.

But it still seems a bit ugly, and I have to keep a big notes file full of tricks and reminders of how to do things, cos somehow it doesn’t stick from one month to the next unless you keep using it. Installing and updating stuff is getting gradually easier, but still clunky. There are weird incompatibilities between one version and the next within 2.x, whereas you’d have thought the 2.x world should have been completely backwards compatible, with 3.x becoming a new world. I hear from some developer contacts that this is even more of a problem for key packages like Scipy, with quite frequent changes to the API which mean that your scripts keep breaking.

Then of course there is the speed thing. One of my favourite packages is Pyxplot, a kind of re-imagining of Gnuplot. (I will be writing a post about this and other plotters sometime soon.) Pyxplot used to be written in Python, but the latest version has been completely re-implemented in C. I asked Dominic Ford why, and he explained that it was now ten times faster and took a tenth of the memory. Hard to argue with that.

Python occupies a strange territory between the easy peasy world of “download this app and start clicking” and the stern world of “if you don’t know what a makefile is, you’d better look somewhere else mate”. At first I thought this was precisely its strength : grown up stuff for busy people. But now I ain’t so sure. Neither use nor ornament, as EG used to say.

Over to you Ross.

32 Responses to Python Provocation

  1. Neil says:

    I suspect there will never be a single programming language that is suitable for all programming tasks. Python certainly isn’t suitable for everything, and it has quite a few annoyances. But it’s applicable over a much wider domain and has many fewer annoyances than any other language I’ve tried. If there’s a better language for science, please tell me about it 🙂

    I’ve never heard of Pyxplot, I’ll have a look.

  2. Ross Collins says:

    It’s all true, but I know of no alternative that can compete with Python for both speed of implementation and clarity. An interpreted language isn’t going to be as fast as Fortran or C, but it’s very easy to write a C library that can be accessed as a Python module, as Pyxplot has done. So why not write all the code quickly in Python first, then farm out the time-consuming parts to C or Fortran? Numpy/Scipy provide faster means within the Python language to do heavy computation, but a purpose-built C-module is often still faster in these cases. You can even call Fortran functions and subroutines with NumPy’s f2py.
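
The "write it in Python first, then farm out the slow parts" workflow starts with finding out which parts are actually slow. What follows is only a minimal sketch of that step using the standard library's cProfile; the functions `cheap_setup`, `slow_pairwise_sum` and `analysis` are hypothetical stand-ins, not anything from Pyxplot or from Ross's own code.

```python
import cProfile
import io
import pstats


def cheap_setup(n):
    """Fast bookkeeping that is not worth optimising."""
    return [float(i) for i in range(n)]


def slow_pairwise_sum(values):
    """Deliberately naive O(n^2) loop: the kind of hot spot one might move to C."""
    total = 0.0
    for a in values:
        for b in values:
            total += a * b
    return total


def analysis(n=200):
    """Hypothetical script body: a cheap step followed by an expensive one."""
    values = cheap_setup(n)
    return slow_pairwise_sum(values)


def profile_report(func, *args):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


if __name__ == "__main__":
    result, report = profile_report(analysis, 200)
    print(result)
    print(report)
```

In a report like this the nested loop should dominate the cumulative time, which would make it the obvious candidate for a C or Fortran rewrite while the rest stays in Python.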

    As for changes to the language… well, no language is perfect from day one. Unhindered by the demands of backwards compatibility each new release improves the language, forcing existing developers to use clearer methods with possibly faster implementations whilst the older deprecated methods are removed, which prevents confusion for new developers. It may be annoying, but it’s the only way to make the language less ugly and more intuitive.

  3. Emil Thorsen says:

    As much as I see the weaknesses of Python, I am very happy I made the jump. The tool of choice where I work is normally IDL, and compared to that, I see Python/Numpy/SciPy as a big step forward for all those tasks that are not too computationally heavy – i.e. all the cases where implementation, not clock cycles, is the time consumer.

    I’ve been using Matplotlib for visualization so far, but PyXplot definitely looks interesting.

  4. Aaron Robotham says:

    I’ll have to admit to a none-too-subtle campaign of converting everyone I work with to R (Simon is a work in progress). I gave Perl a whirl, but never really got on with Python at all. I guess I was sold on R because at heart I know I’m a huge stats geek! Check out the list of contributors, it’s no pool of teenage programmers, that’s for sure.

    • andyxl says:

      Thought it was just a stats package. Is it more general purpose ?

      • Aaron Robotham says:

        It is actually a fully featured language, but predominantly made by computer scientists and statisticians (hence the geekiness!) I started using it because of the plots too, and just kept going from there. There’s also an R package to call Python, so I guess you can make an infinite loop of language calls if you can’t make your mind up 🙂

        I guess I saw it more as a full replacement for IDL, but it fulfils my Perl (and by extension Python) needs too. The complaint tends to be the learning curve is brutal, but the students in St Andrews seemed to get along fine. I have a plan to do an introductory course in January, with an idea in mind to make it into a SUPA course in the future. Edinburgh isn’t that far away…

      • Ross Collins says:

        R is designed by mathematicians for mathematicians and has the best plotting capabilities I’ve seen amongst all the free languages I know. I wouldn’t choose to use it over Python for any other task though as in my humble opinion its syntax is uglier. Also it’s still an interpreted language so if you want FORTRAN speed for your simulations then you are back to writing C modules – and I don’t know how easy this is in R. Is there an equivalent of Python’s ctypes library where you can just call the compiled C module directly? BTW, I looked into SciPy’s weave once – it just seems like an ugly hack to me – I’d rather compile a C-module myself.
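
For readers who have not met ctypes, here is a rough sketch of what "just call the compiled C module directly" looks like. It assumes a Linux or macOS system where the C maths library can be located; the helper names `load_libm` and `c_cos` are made up for this illustration, and the example calls the system `cos` rather than a purpose-built module.

```python
import ctypes
import ctypes.util
import math


def load_libm():
    """Load the C maths library (assumes a POSIX system such as Linux or macOS)."""
    path = ctypes.util.find_library("m")
    # If the library cannot be located by name, CDLL(None) exposes the symbols
    # already loaded into the running interpreter, which normally include libm.
    return ctypes.CDLL(path) if path else ctypes.CDLL(None)


libm = load_libm()

# ctypes assumes int arguments and return values unless told otherwise,
# so the C signature double cos(double) has to be declared explicitly.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double


def c_cos(x):
    """Call the compiled C cos() directly from Python."""
    return libm.cos(x)


if __name__ == "__main__":
    print(c_cos(1.0), math.cos(1.0))
```

The same pattern should apply to a library you compile yourself: load the shared object with `ctypes.CDLL`, declare the argument and return types, then call the function like any other Python callable.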

      • Ross Collins says:

        My wife, an avid R fan, disagrees with me even mentioning the word mathematicians in the same sentence as R: “Statisticians, computer scientists, yes. But most mathematicians barely know how to use a computer to check email, let alone develop a programming language.” So I apologise for grouping statisticians in with mathematicians. She also offers this New York Times article on the power of R:

        http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html

      • Aaron Robotham says:

        I’m with your wife on this one. The way you call C/C++/Fortran is quite simple: you write your subroutine and call it within R. I do this for the linking stage of my FoF code because that’s very loopy (and of course you’re right, very slow in R). We’ve even got all the CFITSIO stuff working, but haven’t got round to releasing it as an official package.

      • Ross Collins says:

        I guess what I mean is that like Fortran and Matlab, R is a great programming language for academics & scientists, but doesn’t offer much in the way of transferable skills for the wider world. You won’t find people using R outside of academia to manage databases or create applications for desktop GUIs, mobile phones or social networks. Whereas Python is truly multi-purpose – you can do maths, statistics and science, but once you’ve learned the language it’s useful for other applications too.

      • Aaron Robotham says:

        That’s definitely true, but programming is really a lesson in problem solving and algorithmic thinking. I think with core skills in how you approach a task, a grounding in any language is enough to get going. It’s fairly common now for IT companies to offer graduate jobs to people without the most relevant programming skills if they can demonstrate the most important abilities. Logic puzzles and lateral thinking tests are quite common now.

      • Ross said “You won’t find people using R outside of academia to manage databases or create applications for desktop GUIs, mobile phones or social networks.”

        Maybe, but you *will* find it heavily used in finance, bioinformatics, and other stat-heavy fields. I call that transferable, at least in terms of non-astro employment for PhDs.

      • Ross Collins says:

        Sure, R is gradually replacing SAS for professional statistics owing to its fantastic stats modules developed by experts. That doesn’t make it a multi-purpose language – it’s still just as specialised as Matlab and will remain in that niche market along with Fortran and IDL. It’s by far the language of choice for students of statistics, but that doesn’t make it a better language per se. To me, R is to Fortran what Python is to C. R & Fortran were both designed specifically for numerical computing and are very good at it, whereas Python & C try to be useful for everything. As such R has caught the attention of the numerical library developers much more than Python has, which is a shame – as the divide between numerical computing and everything else continues. Is there any intrinsic reason why the fantastic numerical resources for R can’t exist for Python? Is there a good reason why Python is #7 in the Tiobe Index, when R doesn’t feature? Sure, learning any programming language will give you transferable skills, but it takes years to master a language and as a student of physics I would have much preferred to have entered the graduate job market with a couple of years of C or Python on my CV than Fortran, Matlab or R.

    • Jim Geach says:

      There’s a Python wrapper for R, called Rpy (http://rpy.sourceforge.net/). I’m not familiar with R as a language, but I’m impressed with the standard of plots it produces, and so I’m going to give the pythonic interface a try. Up until now I’ve been using the Python wrapper of pgplot for plotting, but I want to try something new.

      For speed, there is the ‘weave’ option of scipy that allows you to put inline C code right in the python code… this is invaluable for those bottlenecks that cannot be widened by any numpy/scipy alternative. That way you don’t have to revert completely back to C/Fortran – you really can just pick the best of both worlds.

      There’s also psyco (http://psyco.sourceforge.net/) which is meant to speed up python code, but I’ve never found that it works all that well for me.

  5. Paul says:

    Yeah, I’m a python convert from perl, but I do miss the rigorousness of the CPAN crowd; the python module crowd seem a lot more sloppy than the perl module crowd in terms of documentation being up to date and stuff just working and not breaking backwards compatibility and stuff.

    On the other hand, the way perl did argument passing just seemed *naff* to me. 🙂

    Either way, I end up doing all my heavy lifting in compiled C or C++ – mostly that’s numpy, but if I’ve got something really heavy and I feel like reminding myself to put semicolons on the end of every line, I’ll brew my own in C or even C++ if I’m feeling really keen.

    I look forward to the article on plotters (and I don’t mean ye olde X-Y table pen plotters). I often hear people asking each other what they use to generate complex publication quality plots, and no one ever seems to have an answer they’re not embarrassed about, let alone something they’d actually recommend to their friends.

  6. I’ll write here to plug my Python-based GUI / scripted plotting package Veusz. I think it gives a nice crossover of a friendly interface combined with scripting if you need it:
    http://home.gna.org/veusz/
    Works on Unix/Linux, Windows and Mac OS X.

  7. John Peacock says:

    I remember reading a possible urban myth that for years the Japanese ran a contest: every time their electronics community came up with a swanky new calculator, they gave it some numerical challenge to compute as a race against a guy with an abacus. The abacus always won.

    No doubt it’s self delusion, but I’ve always fancied that one could knock up a working piece of fortran to accomplish some realistic challenge in numerical computing faster than with any of the newer alternatives. As the author of “real programmers don’t use pascal” said, all those years ago: “If you can’t do it in FORTRAN, do it in assembly language. If you can’t do it in assembly language, it isn’t worth doing”. The Pascal fashion came and went (in physics, at any rate), but fortran persists. Probably it will be the only one of the current languages still in use in another 20 years (and I don’t mean just by me…).

  8. I agree with John. Andy, what was the last version of Fortran you used? IV? 77? 90? 95? 2003? 2008?

    I liked the chap who defined “legacy applications” as “stuff that works”. 🙂

  9. “this article in THES suggesting that we would be better off with a lottery is a complete pile of dingo’s kidneys”

    I agree with you, except that the author might be right when he says applicants would prefer a lottery to decisions based on impact.

    The author seems to misunderstand statistics:

    “The latest available research council figures, for 2008-09, show that grant applications are at record levels but success rates are at an all-time low of 23 per cent.”

    To first order, funds are constant, so the first statement implies the second. Duh! Reminds me of Eisenhower being shocked when an aide informed him that fully half the US population were below average in intelligence. (On a par with the spam “promote your web pages” emails: “We’ve moved thousands of sites into the top 10”.)

  10. Simon says:

    Very glad to see others cheering on R – for a long while now I’ve been trying to convert co-workers and colleagues. I agree that code can look ugly, and it can be harder to learn than other languages of a similar level (perhaps IDL is a reasonable comparison).

    But what finally won me over was the optimisers. I needed to choose a language for my final year UG class on data analysis. The biggest (significant) difference I found was in the quality of the optimisers – how easy they are to set up, how often they crash, get stuck in local minima, etc. Of all the languages I tried or asked about R won hands down. No optimiser is perfect, but nlm() in R is the best all-purpose optimiser I have found so far, the most student-proof. And it’s fast.

    Now my classes can set up a fit statistic, define a model, load some data and optimise the fit in R very quickly – each step can take a single line if you do it properly.

    Is it transferable? Well it’s based on S-PLUS (its commercial predecessor), but as far as I can tell that’s not so popular now. But I’m sure you could convert to SPSS, SAS (possibly IDL) without too much effort.

    • I’m also a huge fan of R, and have used it for a long time. It has replaced many of the languages/tools I previously used (perl, python, much iraf, and certainly the despicable supermongo). I don’t even find the code looks ugly, but I guess that’s subjective. For astronomy it’s worth noting that the FITSio library is now pretty good.

      • Aaron Robotham says:

        On that note, we’ve written a better implementation in St Andrews- it uses the CFITSIO libraries so it is *much* faster than the CRAN package, and it handles generic FITS files in a much more flexible way.

  11. jz says:

    I love Python for several of the reasons you hint at Andy – one can use (and learn) it interactively, and all the OO stuff is there if one wants it. What I really like is the development cycle – learn/test new functions/syntax interactively, add them to your script and hey presto.

    I’m sure python is often slower than C, but to be honest, most of the time, does it really matter much? But a further refinement to the cycle is to factor out slow parts of the script into C or F. Plus – and this is annoying – there are several ways to speed up Python by being a bit cleverer (i.e. reading relevant blog posts) about one’s syntax.
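
As a small illustration of the "being a bit cleverer about one's syntax" point, the sketch below times two equivalent ways of summing squares with the standard library's timeit. The function names are invented for this example, and which variant wins (and by how much) will depend on the Python version and machine, so no timings are quoted.

```python
import timeit


def total_loop(values):
    """Sum of squares with an explicit loop and a running total."""
    total = 0
    for v in values:
        total += v * v
    return total


def total_builtin(values):
    """Same sum of squares, pushed into the built-in sum()."""
    return sum(v * v for v in values)


def compare(n=10_000, repeats=20):
    """Return the seconds taken by each variant for `repeats` calls on n integers."""
    values = list(range(n))
    return {
        "loop": timeit.timeit(lambda: total_loop(values), number=repeats),
        "builtin": timeit.timeit(lambda: total_builtin(values), number=repeats),
    }


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name}: {seconds:.4f} s")
```

Measuring like this before and after a rewrite is a cheap way to check whether a blog-post trick actually helps for the script in hand.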

    PyFITS is a particular highlight, especially now it’s back under active development. Imagine the equivalent Fortran…

    Inevitably we find we need to use numpy/scipy, and I must admit I find its syntax clunky and unpythonic.

    I am disappointed by the plotting options available to Python though, having recently learned matplotlib or some such (again a bit unpythonic). I have resisted PyXPlot, not because of Dominic’s ability to wield a laser pointer, but because I see it as yet another plotting program and am not sure it (yet) has critical mass (would that it did!).

    Python is at its weakest for very short scripts – something Perl would do in one line – where the overhead is rather large for Python.

    Could I persuade you to post your list of syntactic gotchas, not least to compare to the one I keep in my head?

    Now if one could drive Topcat from Python….

  12. andyxl says:

    Jz – what you want is STILTS, which is the command line version of TOPCAT. The commands can be run from the unix command line, or from inside Jython, which is maybe just what you want.

    • jz says:

      Yes.. I have used STILTS somewhat, but I agree Jython is worth looking into (but see http://www.astrobetter.com/atpy-and-asciitable/#comment-4295). Have you used Jython?

      • Mark Taylor says:

        Jython interprets python source code exactly like normal python (CPython). The only practical difference is that you can’t use C-based modules (and you can use Java-based ones, which is why there’s a JyStilts not a PyStilts). Unfortunately, this means that SciPy and NumPy, amongst others, are out, which means it’s no good for most astro/scientific python users :-(.

      • Ross Collins says:

        To add to Mark’s comments: I’m not entirely sure Jython ever got around to fully implementing the CPython standard library (let alone having the third-party libraries such as NumPy/SciPy), and it has subsequently fallen a long way behind CPython releases. The current Jython implements version 2.5 of the language whilst the current CPython is up to 3.1 (and 2.7). However, I heard rumours that several big players were interested in contributing to Jython development to bring it up to the CPython standard, so it may catch up quickly in the near future.

    • The Herschel software suite, “HIPE”, is written in Java (under the bonnet) and provides Jython as the scripting language for processing Herschel data. So it is getting used. A bit.

      http://herschel.esac.esa.int/Data_Processing.shtml
