This lesson is in the early stages of development (Alpha version)

Python for Atmosphere and Ocean Scientists

Python is rapidly emerging as the programming language of choice for data analysis in the atmosphere and ocean sciences. By consulting online tutorials and help pages, most researchers in this community are able to pick up the basic syntax and programming constructs (e.g. loops, lists and conditionals). This self-taught knowledge is sufficient to get work done, but it often means spending hours on tasks that should take minutes, reinventing a lot of wheels, and being left with a nagging uncertainty about the reliability and reproducibility of the results. To help address these issues, these Data Carpentry lessons cover a suite of programming and data management best practices that aren’t so easy to glean from a quick Google search.

The skills covered in the lessons are taught in the context of a typical data analysis task: creating a command line program that plots the precipitation climatology for any given month, so that two different CMIP5 models (ACCESS1-3 and CSIRO-Mk3-6-0) can be compared visually.
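As a rough sketch of what the interface of such a command line program might look like, the snippet below uses Python's standard argparse module. The function name, the argument names (pr_file, month, output_file) and the example file names are illustrative assumptions rather than anything defined in the lessons, and the data loading and plotting steps are left out.

```python
import argparse

# Three-letter month abbreviations accepted on the command line (an assumed convention).
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def build_parser():
    """Build the argument parser for a hypothetical climatology plotting script."""
    parser = argparse.ArgumentParser(
        description="Plot the precipitation climatology for a given month."
    )
    # All argument names here are assumptions made for illustration.
    parser.add_argument("pr_file", type=str, help="precipitation data file (netCDF)")
    parser.add_argument("month", type=str, choices=MONTHS, help="month to plot")
    parser.add_argument("output_file", type=str, help="name of the output image file")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["pr_model_data.nc", "Jun", "pr_jun.png"])

# The data loading and plotting steps would go here.
print(f"Would plot the {args.month} climatology from {args.pr_file} to {args.output_file}")
```

In a real script the call would be `build_parser().parse_args()` with no list, so that the values come from the command line; restricting `month` with `choices` means an invalid month is rejected before any data are read.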

Raster vs vector data

These lessons work with raster or “gridded” data that are stored as a uniform grid of values using the netCDF file format. This is the most common data format and file type in the atmosphere and ocean sciences; essentially all output from weather, climate and ocean models is gridded data stored as a series of netCDF files.

The other data type that atmosphere and ocean scientists tend to work with is geospatial vector data. In contrast to gridded raster data, these vector data are composed of discrete geometric locations (i.e. x, y values) that define the shape of a spatial point, line or polygon. They are not stored using the netCDF file format and are not covered in these lessons. Data Carpentry have separate lessons on working with geospatial vector data.
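The difference between the two data types can be illustrated with a minimal sketch in plain Python. The coordinates, precipitation values, polygon and helper function below are all made up for illustration; nested lists stand in for the array objects that would normally be read from a netCDF file.

```python
# Raster ("gridded") data: one value for every cell of a regular grid.
# Coordinates and values are invented for illustration.
lats = [-30.0, -20.0, -10.0]
lons = [120.0, 130.0, 140.0, 150.0]
precip_grid = [
    [1.2, 0.8, 0.5, 0.9],  # values along lat = -30.0
    [2.1, 1.7, 1.1, 1.4],  # values along lat = -20.0
    [4.3, 3.9, 3.2, 3.6],  # values along lat = -10.0
]


def grid_value(grid, lats, lons, lat, lon):
    """Return the value of the grid cell located at the given coordinates."""
    return grid[lats.index(lat)][lons.index(lon)]


# Vector data: discrete (x, y) locations that define a point, line or polygon.
# A hypothetical polygon stored as a closed ring of (lon, lat) vertices.
region_polygon = [
    (125.0, -25.0),
    (145.0, -25.0),
    (145.0, -12.0),
    (125.0, -12.0),
    (125.0, -25.0),  # first vertex repeated to close the ring
]

print(grid_value(precip_grid, lats, lons, -20.0, 140.0))
print(len(region_polygon))
```

The grid needs a value for every latitude and longitude combination, whereas the polygon only stores the handful of vertices that outline its shape.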

Prerequisites

Participants must already be using Python for their data analysis. They don’t need to be highly proficient, but a strong familiarity with Python syntax and basic constructs such as loops, lists and conditionals (i.e. if statements) is required.

Participants should also read this post prior to the workshop, to familiarise themselves with the most commonly used Python libraries in the atmosphere and ocean sciences and how they relate to one another.

Schedule

       Setup                                 Download files required for the lesson
00:00  1. Software installation using conda  What are the main Python libraries used in atmosphere and ocean science?
                                             How do I install and manage all the Python libraries that I want to use?
00:25  2. Visualising CMIP data              How can I create a quick plot of my CMIP data?
01:10  3. Functions                          How can I define my own functions?
01:45  4. Command line programs              How can I write my own command line programs?
02:35  5. Version control                    How can I record the revision history of my code?
03:10  6. GitHub                             How can I make my code available on GitHub?
03:30  7. Vectorisation                      How can I avoid looping over each element of large data arrays?
04:00  8. Defensive programming              How can I make my programs more reliable?
04:35  9. Data provenance                    How can I keep track of my data processing steps?
05:15  Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.