sky blue trades

Haskell data analysis: Reading NetCDF files

I never really intended the FFT stuff to go on for as long as it did, since that sort of thing wasn’t really what I was planning as the focus for this Data Analysis in Haskell series. The FFT was intended primarily as a “warm-up” exercise. After fourteen blog articles and about 10,000 words, everyone ought to be sufficiently warmed up now…

Instead of trying to lay out any kind of fundamental principles for data analysis before we get going, I’m just going to dive into a real example. I’ll talk about generalities as we go along when we have some context in which to place them.

All of the analysis described in this next series of articles closely follows that in the paper: D. T. Crommelin (2004). Observed nondiffusive dynamics in large-scale atmospheric flow. J. Atmos. Sci. 61(19), 2384–2396. We’re going to replicate most of the data analysis and visualisation from this paper, maybe adding a few interesting extras towards the end.

It’s going to take a couple of articles to lay out some of the background to this problem, but I want to start here with something very practical and not specific to this particular problem. We’re going to look at how to gain access to meteorological and climate data stored in the NetCDF file format from Haskell. This will be useful not only for the low-frequency atmospheric variability problem we’re going to look at, but for other things in the future too.

Link Round-up

Here’s a mixed bag of interesting links, some sciencey, some mathsy, some miscellany:

  1. Network Rail Virtual Archives: OK, this might not, at first sight, sound like something interesting, but it really is. This site has original Victorian-era engineering drawings for a whole range of British railway infrastructure. Bridges, viaducts, stations, tunnels. All rendered in lovely 19th Century penmanship. The Forth Bridge is particularly nice.

  2. open.NASA: A couple of years ago, NASA started a project to open-source code and data from their Earth observing and planetary missions. Open.NASA is a gateway to these resources. I’ve not had a chance to look at it in huge detail yet, but there is a lot of stuff there. The list of projects on the code.NASA part looks particularly entertaining.

  3. Game of Primes: Giganotosaurus is a science fiction site that publishes one (longish) short story each month. They’re often very good, and this one was particularly striking – it’s quite beautifully done, full of mystery, and feels like it could be a part of something much larger and deeper.

  4. Surprising connections in mathematics: This one is a bit more technical, from the Math Overflow Q&A website. A lot of the connections people mention are very technical, but some are more accessible, for instance the link between algebra and geometry developed by Descartes and others in the 17th Century. This is something we learn about in school, and something that we don’t think about too much because it seems “obvious”. Only obvious in retrospect, of course, since it took hundreds of years for the connection to be discovered!

  5. De Bruijn grids and tilings: Another technical one, but very interesting. Aperiodic tilings of the plane, like Penrose tilings, are slightly mysterious. This article gives a really clear description of one systematic method for generating such tilings. It’s a very odd and intriguing little bit of mathematics.

  6. Atul Gawande on end-of-life care: Atul Gawande is one of my favourite writers on medical and ethical issues. This article is quite long, but well worth a read.

Command and Control

by Eric Schlosser

My reading list recently has been chock-full of light-hearted and mood-lifting material: some Irvine Welsh novels (always guaranteed to shed a gentle light on all that’s best about the human condition), a long book about clinical depression, M. R. Carey’s interesting sort-of-zombie apocalypse/extreme mycology novel, The Girl With All The Gifts, de Becker’s The Gift Of Fear, a book all about fear and violence, and Piper Kerman’s prison memoir, Orange Is The New Black (which did spoil the mood a little having a few sparks of hope in among the gloom).

Among all this bleakness and blackness, Command and Control somehow manages to stand out as a particularly grim monument to human folly and our collective crimes against all sense and reason. It’s a book about nuclear weapons, so it never really had much chance of being too jolly, but even so, Schlosser’s decision to focus in parallel on US nuclear doctrine and nuclear weapons safety makes for some horrifying reading. It’s something of a mystery how we made it through the Cold War without either a “hot” war or at least some sort of unintended detonation of a nuclear weapon.

Many Books & Their Reviews #2

Second round of “many books”…

Many Books & Their Reviews #1

I’ve been doing quite a bit of reading lately, so I have 28 novels to review! All but one are from series of novels, so that’s not quite as daunting as it sounds. Still, I’ll split this into two posts to make it manageable.

Getting From Here To There

In particular, getting from where you are now to where you want to be, in terms of your career.

As a result of an email I sent to the Haskell-Cafe mailing list a couple of weeks ago looking for someone to take over a contract I had been working on, someone contacted me asking for career advice. Clearly not someone who knew me at all, otherwise they would have known what a crazy idea that was. Anyway, this person was asking about one of the fundamental problems when you’re starting out in more or less any profession: how do you acquire the experience you need to apply for jobs that say “experience required”, which is more or less all of them?

They asked: “What is the path to getting involved in this stuff? How do I bridge the gap from just playing around with these technologies to having real world experience? It seems that most opportunities are for people with experience.” And this is exactly right. Particularly for contracting, no-one wants to hire someone they think will have to learn on the job. You need to know what you’re doing, which means getting experience somehow. And it would of course be nice to be able to eat and have a life while getting that experience.

I wrote an epic email in reply, and was told that it would have worked better as a blog post (or perhaps a short novel). So here I am, turning it into a blog post!

Involuntary Hiatus Hiccup

It’s more than two months since I last wrote a blog article. I’ve been ridiculously busy since then and things are only just now calming down. It now looks as though I’m going to try something new, at least for three months or so, and that should provide more time for blogging. I had to drop more or less all of my personal projects for the last couple of months, which has been frustrating (no work on my data analysis book, no work on arb-fft, very little work on C2HS, a huge backlog of technical reading piling up and up and up like some Tower of Techno-Babel). Things should get back to something more like normal from now on though.

One benefit of working like a donkey for the last couple of months is that I now have a bit of money in the bank, and I’m planning to use that financial window to push some personal projects forwards. I have a few ideas, starting with “finishing” arb-fft and getting back to some work on my book. I’ll do a couple of days of paid work a week, do a bit of open-source stuff (C2HS and Hackage mostly) and work on those personal projects. And blogging. There will be blogging.

Starting tomorrow. Now though, I’m going to go outside and lie myself down in the sunshine.

Fatherland and HHhH

by Robert Harris; Laurent Binet & Sam Taylor

These two books are tied together by the name of Reinhard Heydrich. I can’t think of a polite way of describing Heydrich. He was one of the architects of the Holocaust, a fervid Nazi, and an all-round total bastard. Hitler called him “the man with the iron heart”, which gives you some kind of idea of what kind of a git he was.

In the real world, Heydrich died in 1942 from injuries sustained during an assassination attempt by Czech and Slovak commandos while he was “Acting Reich Protector of Bohemia and Moravia”. In the world of Fatherland, he survives, the Germans win the Second World War and Europe languishes under Nazi rule, with a sympathetic administration in the USA led by Kennedy (père rather than fils) providing no effective check on their activities. Heydrich continues in his role as second-in-command of the SS and goes on being as much of a bastard as ever.

Fatherland was published in 1992, so I’m a little late to the party (for a change). Alternative history novels set around WW2 are a popular genre, but Harris does something interesting and different. He manages to avoid any of the obvious missteps in representing a Nazified Europe by writing what starts out as a straight police procedural. The Thousand Year Reich still needs plods, apparently. Well, it’s a police state, so you do need some police. The slight twist is that the Kripo (Kriminalpolizei) was subsumed into the SS in 1939, thus becoming a sister agency to the Gestapo and so under the overall control of Himmler and his sidekick Heydrich. Our hero, Xavier March, wears the feared uniform of the SS, rationalising that he’d rather do some good wearing the uniform than no good at all.

The plot isn’t terribly unpredictable, although it’s uncovered in stages, so it’s not immediately obvious what’s going on. An upcoming summit between Hitler and Kennedy precipitates mild discomfort within the Nazi hierarchy over the Final Solution: everyone in Germany talks about the Jews “going to the East”, if they talk about the Jews at all, but that’s not quite euphemistic enough a cover-up for the kind of high-level negotiations that are coming up. Heydrich conceives a spectacularly brilliant and quintessentially Nazi solution: let’s destroy all the documents (well, that bit isn’t quite so typically Nazi, but this is a serious enough problem that we can tolerate a little disruption in the paper trail), and murder all the high-level bureaucrats involved (basically everyone who was at the Wannsee Conference, apart from Heydrich and the higher higher-ups, of course). Cue slapstick confusion between Gestapo on the one hand (killing inconvenient bureaucrats) and Kripo on the other (trying to figure out why all these fat old Nazis are getting bumped off).

As one might imagine, things don’t end so well for March. At the end of the book, after a good going-over from Heydrich and his Gestapo buddies, it looks like he won’t be drawing a whole lot of that SS pension. However, he has succeeded in getting information about the Holocaust out (not called that in Alternative Earth Germany, of course), which will soon lead to widespread condemnation of the Nazi regime, a breakdown in negotiations between the US and Germany and a new world order. You think? Well, perhaps not. If experience on Real Earth is any guide, clear and detailed documentation of atrocities usually leads to, well, not much. A stiff editorial in the Guardian. Questions in the Lords. That sort of thing. Real change, not so much.

Fortunately, of course, Heydrich didn’t survive. The evil shit died in 1942. HHhH (Himmlers Hirn heißt Heydrich) is real history, rather than alternative history, and tells the story of how that happened.

Now, a straight “history of the killing of Reinhard Heydrich” might be of interest to historians, but HHhH goes well beyond that.

Historical fiction, alternative history and “real” history have something of a funny relationship. Writing good historical fiction is exceedingly difficult. If you write a conventional novel, you get to choose the plot, the characters, the events you portray, how you render the dialogue, basically everything that goes into your book. To a great extent, you can do the same in alternative history. You choose a jumping off point where your alternative world diverges from ours and off you go. You can use historical characters without worrying too much about historical authenticity – who can say exactly how Heydrich would have responded in any given situation, if it’s a situation that never existed in the real world and thus to which there can be no witnesses? You can just make it up.

For a historical novel, you certainly don’t get to choose the plot, and you’re constrained as far as characters and events go too. You can do what C. J. Sansom does in his Matthew Shardlake novels or Patrick O’Brian in the Aubrey-Maturin novels and invent incidental characters to focus on, using the history more as a backdrop than an integral part of the novel. That gives you a great deal of freedom to write what you will while still exploiting the atmosphere and mores of the period you’re setting the novel in to add a bit of glamour. Or, you can do what Hilary Mantel did in her Wolf Hall novels. This involves thousands of index cards recording the most trivial of recorded events in the lives of the most minor of characters of the period and an effort to weave all of those strands into a sort of “maximally historical” narrative. The fact that the Wolf Hall novels are successful as novels within these constraints is a testament to Mantel’s genius.

And then there’s real history, where you may have to forgo some of the requirements of good writing in order to present the known facts in sufficient detail to support whatever thesis you want to present. At this level, you care about details. Counting the buttons on uniforms in archive photographs may tell you which factory produced those uniforms and when, giving insight into logistics and supply. Trawling through thousands of pages of agricultural production records may help you to identify a hidden famine. Of course, you then take away the scaffolding, hiding the details. The process of tracking down all of those facts, the worries that you may have missed something, the controversies, the lacunae, the not-quite-justified leaps of logic, all of those are swept under the carpet.

HHhH doesn’t do this sweeping away. The scaffolding is there in plain sight in the form of short “writing of” chapters interspersed between the conventional narrative chapters describing the assassination attempt. It works really well, mostly because of the worried, slightly paranoid tone that Binet has. He worries about the number of buttons. He worries about what person X said to person Y on occasion Z. He doesn’t want to say that a certain German officer on the Eastern Front was driving an Opel if he doesn’t have incontrovertible evidence that said officer really was driving an Opel.

Now, handled badly, this kind of thing could descend into a sort of annoying historical pettifoggery that would serve only to distract from the real story. Here though, it’s handled well, and serves more to emphasise the malleability of history and the importance of rigour. On the face of it, whether it was an Opel or a Volkswagen doesn’t matter. But where do you stop? If you write, definitively, that it was an Opel, then some future historian researching German industrial production during WW2 may take that as evidence that Opel were producing a given vehicle at a given time from a given factory. Which may be untrue. And which may lead this hypothetical historian to draw incorrect conclusions. So that would be bad history.

What makes all this the exact opposite of annoying, indeed incredibly engaging, is that Binet truly appears to care about the people whose stories he has taken it upon himself to tell. Getting things wrong due to inattention or lack of diligence would be an insult to the people of Lidice, and to the memories of the Czech and Slovak soldiers (and many others) who gave their lives in the effort to remove Heydrich’s boot from their peoples’ shoulders.

Binet was writing HHhH around the time when Jonathan Littell was being lauded to the skies for Les Bienveillantes (in English, The Kindly Ones). Dealing as it does with the Holocaust, the war in Russia and other matters WW2-ish, this was of great interest to Binet. There are some very funny, although eventually unpublished (you can find them on the web with a little Googling) sections of HHhH where Binet frets about the likely reception of his book after the success of Littell’s book, and he has a good little bitching session about Littell’s slapdash approach to historical accuracy. Slapdash in comparison to Binet, that is, so probably well within the bounds of factual accuracy any sane person would expect from a historical novel. The Opel example comes from this stuff – Littell talks about some officer’s car and Binet starts to worry that he’s missed a source that describes the car that this (real) officer was driving at the time. And if he’s missed that, whatever else might he have missed?

Now, I thought that The Kindly Ones was a brilliant novel, and the voice of the narrator was really strong and interesting (the phrase “unreliable narrator” doesn’t really do justice to secretly homosexual ex-SS officers with direct responsibility for the killing of Jews during the Holocaust living in hiding in France…), but HHhH really is something sui generis. Binet pulls off a great trick, in making us care about counting buttons (metaphorically) by tying that quest for historical fidelity to a respect for and love of the people whose stories he tells.

Using Images in Haddock Documentation on Hackage

A useful but little used feature of Haddock is the ability to include inline images in Haddock pages. Here are a few examples. You can use images for diagrams or for inserting mathematics into your documentation in a readable way. In order to use an image, you just put <<path-to-image>> in a Haddock comment. The image can be of any format that browsers support: PNG, JPEG, SVG, whatever.

Until the most recent (1.18) version of Cabal, using this feature was kind of a pain because there was no way to ensure that your image files ended up in a reasonable place in the documentation tree. That meant either hosting the images somewhere other than Hackage, or inserting the image data directly into your Haddock comments, which is painful in the extreme.

Now though, Cabal has a new extra-doc-files option you can specify in the Cabal file for your package, which lists files that should be copied from your source tree into the documentation tarball. That means you can get your image files into just the right place with no pain at all.

As an example, in the arb-fft package I use a couple of SVG images to render some equations in the documentation. In the source tree there’s a directory called doc-formulae that contains a couple of bits of LaTeX and a Makefile that processes them and uses dvisvgm to turn the resulting DVI output into SVG files. The Cabal file contains a line saying extra-doc-files: doc-formulae/*.svg which ensures that the SVG images end up in the Haddock documentation tarball. I can then refer to them in Haddock comments as something like <<doc-formulae/fft-formula.svg>>, which is really convenient.
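To make that concrete, here is a minimal sketch of how the two pieces fit together. The extra-doc-files line and the image path are the ones described above; everything else (the dft function, its naive implementation, the cabal-version bound shown in the comment) is a made-up illustration and an assumption on my part, not the actual arb-fft source.

```haskell
{- Assumed fragment of the package's .cabal file. My understanding is that
   extra-doc-files needs a cabal-version of at least 1.18:

     cabal-version:   >= 1.18
     extra-doc-files: doc-formulae/*.svg
-}

import Data.Complex (Complex (..), cis)

-- | Naive O(n^2) discrete Fourier transform (hypothetical example
-- function, used here only to have something to document).
--
-- The defining formula is pulled in from an SVG file that
-- @extra-doc-files@ copies into the documentation tarball:
--
-- <<doc-formulae/fft-formula.svg>>
dft :: [Complex Double] -> [Complex Double]
dft xs =
  [ sum [ x * cis (-2 * pi * fromIntegral (j * k) / fromIntegral n)
        | (j, x) <- zip [0 ..] xs ]
  | k <- [0 .. n - 1] ]
  where
    -- Number of input samples
    n = length xs
```

The path inside the double angle brackets is written relative to the package root, the same way as it appears in the extra-doc-files line, so the Cabal file and the Haddock comment stay in step with each other.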

This should now work for any new uploads to Hackage after a fix last night by Duncan Coutts.

Functional Differential Geometry

by Gerald Jay Sussman & Jack Wisdom (with Will Farr)

This book follows very much in the mould of Sussman & Wisdom’s Structure and Interpretation of Classical Mechanics in that it’s an attempt to take a body of mathematics (differential geometry here, classical mechanics in SICM) and to “computationalise” it, i.e. to use computer programs to make the mathematical structures involved manifest in a way that a more traditional approach to these subjects doesn’t.

This computational viewpoint is a very powerful one to adopt, for a number of reasons.

Site content copyright © 2011-2013 Ian Ross       Powered by Hakyll