sky blue trades

A Haskeller in Pythonland

I recently did some work for Andy Ridgwell, an old colleague from Bristol, writing a build and configuration system and GUI for a medium-sized climate model called GENIE. GENIE is an EMIC, an Earth system Model of Intermediate Complexity. It’s about 55,000 lines of Fortran and includes models of the atmosphere and ocean plus models of atmospheric chemistry and biogeochemistry in the ocean and ocean sediments.

This model had been in use for some years by different groups, and the infrastructure around it had become quite baroque. Andy wanted this tidied up and made nice (i.e. rewritten…) to make the model easier to set up and use. He also wanted a cross-platform GUI for configuring and running the model, allowing you to keep track of the model state in real-time, to pause and restart model runs, changing the model configuration in between, and so on.

A major consideration for this work was that as well as being easy to use the new system had to be easy to install (on both Linux and Windows) and easy for scientists to hack on. That ruled out Haskell, my usual tool of choice. I decided to use Python instead, for a couple of reasons.

C2HS Tutorial Ideas

One of the things that C2HS is lacking is a good tutorial. So I’m going to write one (or try to, anyway).

To make this as useful as possible, I’d like to base a large part of the tutorial on a realistic case study of producing Haskell bindings to a C library. My current plan is to break the tutorial into three parts: the basics, the case study and “everything else”, for C2HS features that don’t get covered in the first two parts. To make this even more useful, I’d like to base the case study on a C library that someone actually cares about and wants Haskell bindings for.

The requirements for the case study C library are:

  1. There shouldn’t already be Haskell bindings for it – I don’t want to duplicate work.

  2. The C library should be “medium-sized”: big enough to be realistic, not so big that it takes forever to write bindings.

  3. The C library should be of medium complexity. By this, I mean that it should have a range of different kinds of C functions, structures and things that need to be made accessible from Haskell. It shouldn’t be completely trivial, and it should require a little thought to come up with good bindings. On the other hand, it shouldn’t be so unusual that the normal ways of using C2HS don’t work.

  4. Ideally it should be something that more than one person might want to use.

  5. It needs to be a library that’s available for Linux. I don’t have a Mac and I’m not that keen on doing something that’s Windows-only.

Requirements #2 and #3 are kind of squishy, but it should be fairly clear what’s appropriate and what’s not: any C library for which you think development of Haskell bindings would make a good C2HS tutorial case study is fair game.

If you have a library you think would be a good fit for this, drop me an email, leave a comment here or give me a shout on IRC (I’m usually on #haskell as iross or iross_ or something like that).

Q1 2015 Review

I’ve started doing a new thing this year to try to help with “getting things done”. I normally have a daily to-do list and a list of weekly goals from which I derive my daily tasks, but I’ve also now started having a list of quarterly goals to add another layer of structure. Three months is a good timespan for medium-term planning, and it’s very handy to have that list of quarterly goals in front of you (I printed it out and stuck it to the front of my computer so it’s there whenever I’m working). Whatever you’re doing, you can think “Is this contributing to fulfilling one of my goals?” and if the answer is “No, watching funny cat videos is not among my goals for this quarter”, it can be a bit of a boost to get you back to work.

So, how did I do? Not all that badly, although there were a couple of things that fell by the wayside.

Non-diffusive atmospheric flow #15: Wrap-up

OK, so we’re done with this epic of climate data analysis. I’ve prepared an index of the articles in this series, on the off chance that it might be useful for someone.

The goal of this exercise was mostly to try doing some “basic” climate data analysis tasks in Haskell, things that I might normally do using R or NCL or some cobbled-together C++ programs. Once you can read NetCDF files, a lot of the data manipulation is pretty easy, mostly making use of standard things from the hmatrix package. It’s really not any harder than doing these things using “conventional” tools. One downside is that most of the code that you need to write to do this stuff in Haskell already exists in those “conventional” tools. A bigger disadvantage is that data visualisation tools for Haskell are pretty thin on the ground – diagrams and Chart are good for simpler two-dimensional plots, but maps and geophysical data plotting aren’t really supported at all. I did all of the map and contour plots here using UCAR’s NCL language, which, although it’s not a very nice language from a theoretical point of view, has built-in capabilities for generating more or less all the plot types you’d ever need for climate data.

I think that this has been a reasonably useful exercise. It helped me to fix a couple of problems with my hnetcdf package and it turned up a bug in hmatrix. But it went on a little long – my notes are up to 90 pages. (Again: the same thing happened on the FFT stuff.) That’s too long to maintain interest in a problem you’re just using as a finger exercise. The next thing I have lined up should be quite a bit shorter. It’s a problem using satellite remote sensing data, which is always fun.

Non-diffusive atmospheric flow #14: Markov matrix calculations

This is going to be the last substantive post of this series (which is probably as much of a relief to you as it is to me…). In this article, we’re going to look at phase space partitioning for our dimension-reduced $Z_{500}$ PCA data and we’re going to calculate Markov transition matrices for our partitions to try to pick out consistent non-diffusive transitions in atmospheric flow regimes.
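To give a rough idea of what calculating a transition matrix for a partition involves, here is a minimal sketch. It assumes that each time step of the data has already been assigned an integer partition index in the range 0 to n - 1, and it simply counts observed transitions and normalises each column. The name `transitionMatrix` and the list-of-rows representation are illustrative choices for this sketch, and the estimator is not necessarily the one used in the full article.

```haskell
-- | Estimate a Markov transition matrix from a time series of partition
-- indices (assumed to lie in 0 .. n - 1).  Entry (i, j) of the result is
-- the fraction of observed transitions out of partition j that end up in
-- partition i, so each column with at least one observation sums to one.
-- The matrix is represented as a list of rows (an assumption of this sketch).
transitionMatrix :: Int -> [Int] -> [[Double]]
transitionMatrix n states =
  [ [ prob i j | j <- [0 .. n - 1] ] | i <- [0 .. n - 1] ]
  where
    -- Consecutive (from, to) pairs in the time series.
    pairs = zip states (drop 1 states)
    -- Number of observed transitions from partition j to partition i.
    count i j = length [ () | (from, to) <- pairs, from == j, to == i ]
    -- Total number of observed transitions out of partition j.
    colTotal j = length [ () | (from, _) <- pairs, from == j ]
    -- Column-normalised transition probability; zero if partition j
    -- is never left (or never visited) in the data.
    prob i j
      | colTotal j == 0 = 0
      | otherwise = fromIntegral (count i j) / fromIntegral (colTotal j)
```

Loaded into GHCi, `transitionMatrix 2 [0,0,1,0,1,1,0]` gives a 2 × 2 matrix whose columns each sum to one. The repeated list traversals are fine for a sketch but would want replacing with a single counting pass for a long time series.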

C2HS 0.25.1 "Snowmelt"

I took over the day-to-day support for C2HS about 18 months ago and have now finally cleaned up all the issues on the GitHub issue tracker. It took a lot longer than I was expecting, mostly due to pesky “real work” getting in the way. Now seems like a good time to announce the 0.25.1 “Snowmelt” release of C2HS and to summarise some of the more interesting new C2HS features.

Constraint kinds and associated types

This is going to be the oldest of old hat for the cool Haskell kids who invent existential higher-kinded polymorphic whatsits before breakfast, but it amused me, and it’s the first time I’ve used some of these more interesting language extensions for something “real”.

Non-diffusive atmospheric flow #13: Markov matrix examples

(There’s no code in this post, just some examples to explain what we’re going to do next.)

Suppose we define the state of the system whose evolution we want to study by a probability vector $\mathbf{p}(t)$ – at any moment in time, we have a probability distribution over a finite partition of the state space of the system (so that if we partition the state space into $N$ components, then $\mathbf{p}(t) \in \mathbb{R}^N$). Evolution of the system as a Markov chain is then defined by the evolution rule

$$\mathbf{p}(t + \Delta t) = \mathbf{M} \mathbf{p}(t), \qquad (1)$$

where $\mathbf{M} \in \mathbb{R}^{N \times N}$ is a Markov matrix. This approach to modelling the evolution of probability densities has the benefit both of being simple to understand and to implement (in terms of estimating the matrix $\mathbf{M}$ from data) and, as we’ll see, of allowing us to distinguish between random “diffusive” evolution and conservative “non-diffusive” dynamics.
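As a small illustration of the evolution rule (1), here is a sketch using plain Haskell lists rather than hmatrix. With the convention that the matrix multiplies the probability vector from the left, each column of the matrix has to sum to one for the result to remain a probability vector. The names `step`, `evolve`, `isMarkov` and `exampleM` and the two-state matrix are made up for this sketch and are not taken from the full article.

```haskell
import Data.List (transpose)

type Vector = [Double]
type Matrix = [[Double]]  -- list of rows (an assumption of this sketch)

-- | One application of the evolution rule p(t + dt) = M p(t).
step :: Matrix -> Vector -> Vector
step m p = [ sum (zipWith (*) row p) | row <- m ]

-- | Apply the evolution rule n times.
evolve :: Int -> Matrix -> Vector -> Vector
evolve n m p = iterate (step m) p !! n

-- | Check that all entries are non-negative and every column sums to
-- one (to within rounding), so that M maps probability vectors to
-- probability vectors.
isMarkov :: Matrix -> Bool
isMarkov m =
  all (>= 0) (concat m) &&
  all (\col -> abs (sum col - 1) < 1e-9) (transpose m)

-- | A hypothetical two-state Markov matrix: from state 0 the system
-- stays put with probability 0.9, from state 1 with probability 0.8.
exampleM :: Matrix
exampleM =
  [ [0.9, 0.2]
  , [0.1, 0.8]
  ]
```

In GHCi, `step exampleM [1, 0]` moves a tenth of the probability from state 0 to state 1, and `evolve 200 exampleM [1, 0]` ends up very close to the stationary distribution of this particular matrix, which is (2/3, 1/3).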

We’ll see how this works by examining a very simple example.

Non-diffusive atmospheric flow #12: dynamics warm-up

The analysis of preferred flow regimes in the previous article is all very well, and in its way quite illuminating, but it was an entirely static analysis – we didn’t make any use of the fact that the original $Z_{500}$ data we used was a time series, so we couldn’t gain any information about transitions between different states of atmospheric flow. We’ll attempt to remedy that situation now.

What sort of approach can we use to look at the dynamics of changes in patterns of $Z_{500}$? Our $(\theta, \phi)$ parameterisation of flow patterns seems like a good start, but we need some way to model transitions between different flow states, i.e. between different points on the $(\theta, \phi)$ sphere. Each of our original $Z_{500}$ maps corresponds to a point on this sphere, so we might hope that we can come up with a way of looking at trajectories of points in $(\theta, \phi)$ space that will give us some insight into the dynamics of atmospheric flow.

Non-diffusive atmospheric flow #11: flow pattern visualisations

A quick post today to round off the “static” part of our atmospheric flow analysis.

Now that we’ve satisfied ourselves that the bumps in the spherical PDF in article 8 of this series are significant (in the narrowly defined sense of the word “significant” that we’ve discussed), we might ask what sort of atmospheric flow regimes these bumps correspond to. Since each point on our unit sphere is really a point in the three-dimensional space spanned by the first three $Z_{500}$ PCA eigenpatterns that we calculated earlier, we can construct composite maps to look at the spatial patterns of flow for each bump just by combining the first three PCA eigenpatterns in proportions given by the “$(x, y, z)$” coordinates of points on the unit sphere.
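The combination step can be sketched very simply. The code below assumes that each eigenpattern is stored as a flat list of grid-point values, all of the same length and in the same grid order. The `Field` type and the `composite` function are hypothetical names for this sketch, not the code used in the full article.

```haskell
-- | A spatial pattern flattened to a list of grid-point values
-- (an assumption of this sketch; the real data lives on a 2-D grid).
type Field = [Double]

-- | Build a composite map as the linear combination
-- x * e1 + y * e2 + z * e3 of the first three PCA eigenpatterns,
-- where (x, y, z) is a point on the unit sphere.
composite :: (Double, Double, Double) -> Field -> Field -> Field -> Field
composite (x, y, z) e1 e2 e3 =
  zipWith3 (\a b c -> x * a + y * b + z * c) e1 e2 e3
```

For instance, `composite (1, 0, 0) e1 e2 e3` just returns the first eigenpattern, while a point midway between two axes gives an equal mixture of the corresponding patterns.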

Site content copyright © 2011-2013 Ian Ross       Powered by Hakyll