Plotting and Programming in Python: Lesson Design

Help Wanted

We are filling in the exercises below in order to make the lesson plan more concrete. Contributions (both in the form of pull requests with filled-in exercises, and comments on specific exercises, ordering, and timings) are greatly appreciated.

Process Used

Michael Pollan’s advice if he taught R or Python programming:

Write code.

Not too much.

Mostly plots.

— Michael Koontz

This lesson was developed using a slimmed-down variant of the “Understanding by Design” process. The main sections are:

Assumptions about audience, time, etc. (The current draft also includes some conclusions and decisions in this section - that should be refactored.)
Desired results: overall goals, summative assessments at half-day granularity, what learners will be able to do, what learners will know.
Learning plan: each episode has a heading that summarizes what will be covered, then estimates time that will be spent on teaching and on exercises, while the exercises are given as bullet points.

Stage 1: Assumptions

Audience
- Graduate students in numerate disciplines from cosmology to archaeology
- Who have manipulated data in spreadsheets and with interactive tools like SAS
- But have not programmed beyond CPD (copy-paste-despair)
Constraints
- One full day 09:00-16:30
  - 6:15 class time
  - 0:45 lunch
  - 0:30 total for two coffee breaks
- Learners use native installs on their own machines
  - May use VMs or cloud resources at instructor’s discretion
  - But must keep native local install as an option
- No dependence on other Carpentry modules
  - In particular, does not require knowledge of shell or version control
- Use the Jupyter Notebook
  - Authentic tool used by many instructors
  - There isn’t really an alternative
  - And means that even people who have seen a bit of Python before will probably learn something
Motivating Example
- Creating 2D plots suitable for inclusion in papers
- Appeals to almost everyone
- Makes lesson usable by both Carpentries
  - And means that even people who have seen a bit of Python before will probably learn something
Data
- Use the gapminder data throughout
- But break into multiple files by continent
  - To make display of output from examples tidier (e.g., use Australia/New Zealand, which is only two lines)
  - And allow examples showing use of multiple data sets
Focus on Pandas instead of NumPy
- Makes lesson usable by both Data Carpentry and Software Carpentry
- Genuine novices are likely to want data analysis
- And people with some prior experience:
  - will accept data analysis as an authentic task,
  - and are unlikely to have encountered Pandas, so they’ll still get something useful out of the lesson
Challenges will mostly not be “write this code from scratch”
- Want lots of short exercises that can reliably be finished in allotted time
- So use MCQs, fill-in-the-blanks, Parsons Problems, “tweak this code”, etc.

Stage 2: Desired Results

Questions

How do I…

…read tabular data?
…plot a single vector of values?
…create a time series plot?
…create one plot for each of several data sets?
…extract data from a single data set for plotting?
…write programs I can read and re-use in future?

Skills

I can…

…write short scripts using loops and conditionals.
…write functions with a fixed number of parameters that return a single result.
…import libraries using aliases and refer to those libraries’ contents.
…do simple data extraction and formatting using Pandas.

Concepts

I know…

…that a program is a piece of lab equipment that implements an analysis
- Needs to be validated/calibrated before/during use
- Makes analysis reproducible, reviewable, shareable
…that programs are written for people, not for computers
- Meaningful variable names
- Modularity for readability as well as re-use
- No duplication
- Document purpose and use
…that there is no magic: the programs I use are no different in principle from those I build
…how to assign values to variables
…what integers, floats, strings, NumPy arrays, and Pandas dataframes are
…how to trace the execution of a for loop
…how to trace the execution of if/else statements
…how to create and index lists
…how to create and index NumPy arrays
…how to create and index Pandas dataframes
…how to create time series plots
…the difference between defining and calling a function
…where to find documentation on standard libraries
…how to find out what else scientific Python offers

Stage 3: Learning Plan

Summative Assessment

Midpoint: create time-series plot for each file in a directory.
Final: extract data from Pandas dataframe and create comparative multi-line time series plot.

Running and Quitting Interactively (09:00)

Teaching: 15 min (because setup issues)
- Launch the Jupyter Notebook, create new notebooks, and exit the Notebook.
- Create Markdown cells in a notebook.
- Create and run Python cells in a notebook.
Challenges: 0 min (accounted for in teaching time - no separate exercise)
- Creating lists in Markdown
- What is displayed when several expressions are put in a single cell?
- Change an existing cell from code to Markdown
- Rendering LaTeX-style equations

Variables and Assignment (09:15)

Teaching: 10 min
- Write programs that assign scalar values to variables and perform calculations with those values.
- Correctly trace value changes in programs that use scalar assignment.
Challenges: 10 min
- Trace execution of code swapping two values using an intermediate variable.
- Predict final values of variables after several assignments.
- What happens if you try to index a number?
- Which is a better variable name, m, min, or minutes?
- What do the following slice expressions produce?
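
As one possible way to fill in the first and third challenges above, here is a minimal sketch. The variable names and values (`left`, `right`, `temp`, `number`) are hypothetical and chosen only for illustration; they are not taken from the lesson.

```python
# Hypothetical values, chosen only for illustration.
left = "L"
right = "R"

# Swap the two values using an intermediate variable.
temp = left    # temp refers to "L"
left = right   # left now refers to "R"
right = temp   # right now refers to "L"

print(left, right)

# Indexing a number fails because integers are not sequences.
number = 123
try:
    number[0]
except TypeError as error:
    print("TypeError:", error)
```

Learners could be asked to write down the value of each variable after every line before running the cell.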

Data Types and Type Conversion (09:35)

Teaching: 10 min
- Explain key differences between integers and floating point numbers.
- Explain key differences between numbers and character strings.
- Use built-in functions to convert between integers, floating point numbers, and strings.
Challenges: 10 min
- What type of value is 3.4?
- What type of value is 3.25 + 4?
- What type of value would you use to represent:
  - Number of days since the start of the year.
  - Time elapsed since the start of the year.
  - Etc.
- How can you use // (integer division) and % (modulo)?
- What does int("3.4") do?
- Given these float, int, and string values, which expressions will print a particular result?
- What do you expect 1+2j + 3 to produce?
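
The challenges above on `//`, `%`, `int("3.4")`, and `1+2j + 3` could be demonstrated with a short cell like the following sketch. The value of `elapsed_seconds` is a made-up example, not lesson data.

```python
elapsed_seconds = 7384                    # hypothetical measurement

# Integer division discards the remainder; modulo keeps only the remainder.
whole_minutes = elapsed_seconds // 60
leftover_seconds = elapsed_seconds % 60
print(whole_minutes, "min", leftover_seconds, "s")

# Mixing a float and an int produces a float.
print(type(3.4))
print(type(3.25 + 4))

# int() cannot parse a string that contains a decimal point...
try:
    int("3.4")
except ValueError as error:
    print("ValueError:", error)

# ...so convert to float first, then truncate to an integer.
as_integer = int(float("3.4"))
print(as_integer)

# Adding an integer to a complex number changes only the real part.
complex_sum = 1+2j + 3
print(complex_sum)
```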

Built-in Functions and Help (09:55)

Teaching: 15 min
- Explain the purpose of functions.
- Correctly call built-in Python functions.
- Correctly nest calls to built-in functions.
- Use help to display documentation for built-in functions.
- Correctly describe situations in which SyntaxError and NameError occur.
Challenges: 10 min
- Explain the order of operations in the following complex expression.
- What will each nested combination of min and max calls produce?
- Why don’t max and min return None when given no arguments?
- Given what we have seen so far, what index expression will get the last character in a string?
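
A sketch of how the nested-call and last-character challenges above might look in practice. The strings `rich`, `poor`, and `element` are illustrative assumptions, not fixed lesson content.

```python
rich = "gold"   # hypothetical example strings
poor = "tin"

# Inner calls run first: len() produces numbers, then max() compares them.
longest_length = max(len(rich), len(poor))
print(longest_length)

# min() on strings compares them alphabetically, character by character.
first_alphabetically = min(rich, poor)
print(first_alphabetically)

# Using only len() and indexing, the last character sits at position len - 1.
element = "helium"
last_character = element[len(element) - 1]
print(last_character)

# max() with no arguments raises an error instead of returning None.
try:
    max()
except TypeError as error:
    print("TypeError:", error)
```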

Coffee: 15 min (10:20)

Libraries (10:35)

Teaching: 10 min
- Explain what software libraries are and why programmers create and use them.
- Write programs that import and use libraries from Python’s standard library.
- Find and read documentation for standard libraries interactively (in the interpreter) and online.
Challenges: 10 min
- What function from the standard math library could you use to calculate a square root?
- What library would you use to select a random value from data?
- If help(math) produces an error, what have you forgotten to do?
- Fill in the blanks in code below so that the import statement and program run.
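
For reference, one way the square-root and random-value challenges above could be answered, using only the standard library. The list `bases` and the seed value are illustrative assumptions.

```python
import math
import random

# math.sqrt returns a float even when the input is a perfect square.
root = math.sqrt(16)
print(root)

# random.choice selects one element from a non-empty sequence.
bases = ["A", "C", "G", "T"]   # hypothetical data
random.seed(1)                 # fixed seed so repeated runs pick the same item
picked = random.choice(bases)
print(picked)
```

Running `help(math)` before `import math` raises a `NameError`, which is the situation the third challenge is probing.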

Reading Tabular Data (10:55)

Teaching: 10 min
- Import the Pandas library.
- Use Pandas to load a simple CSV data set.
- Get some basic information about a Pandas DataFrame.
Challenges: 10 min
- Read the data for the Americas and display its summary statistics.
- What do .head and .tail do?
- What string(s) should you pass to read_csv to read files from other directories?
- How can you write CSV data?

DataFrames (11:15)

Teaching: 15 min
- Select individual values from a Pandas dataframe.
- Select entire rows or entire columns from a dataframe.
- Select a subset of both rows and columns from a dataframe in a single operation.
- Select a subset of a dataframe by a single Boolean criterion.
Challenges: 15 min
- What expression will find the Per Capita GDP of Serbia in 2007?
- What rule governs what is (or isn’t) included in numerical and named slices in Pandas?
- What does each line in the following short program do?
- What do idxmin and idxmax do?
- Write expressions to get the GDP per capita for all countries in 1982, for all countries after 1985, etc.
- Given the way its borders have changed since 1900, what would you do if asked to create a table of GDP per capita for Poland for the Twentieth Century?

Plotting (11:45)

Teaching: 15 min
- Create a time series plot showing a single data set.
- Create a scatter plot showing relationship between two data sets.
Challenges: 15 min
- Fill in the blanks to plot the minimum GDP per capita over time for European countries.
- Modify the example to create a scatter plot of GDP per capita in Asian countries.
- Explain what each argument to plot does in the following example.

Lunch (12:15): 45 min

Lists (13:00)

Teaching: 10 min
- Explain why programs need collections of values.
- Write programs that create flat lists, index them, slice them, and modify them through assignment and method calls.
Challenges: 10 min
- Fill in the blanks so that the program produces the output shown.
- How large are the following slices?
- What do negative index expressions print?
- What does a “stride” in a slice do?
- How do slices treat out-of-range bounds?
- What are the differences between sorting these two ways?
- What is the difference between new = old and new = old[:]?
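
The slicing, sorting, and copying challenges above could be backed by a cell along these lines. This is a sketch with made-up list contents rather than the lesson's actual exercise text.

```python
letters = ["a", "b", "c", "d", "e", "f", "g"]   # hypothetical list

every_other = letters[::2]    # a stride of 2 takes every second item
past_the_end = letters[5:100] # out-of-range bounds are silently clipped
last_item = letters[-1]       # negative indices count from the end
print(every_other, past_the_end, last_item)

# sorted() returns a new list; list.sort() sorts in place and returns None.
numbers = [3, 1, 2]
sorted_copy = sorted(numbers)
print(numbers, sorted_copy)
sort_result = numbers.sort()
print(numbers, sort_result)

# new = old makes a second name for the same list; old[:] makes a copy.
old = [1, 2, 3]
alias = old
copy = old[:]
old[0] = 99
print(alias, copy)
```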

Loops (13:20)

Teaching: 10 min
- Explain what for loops are normally used for.
- Trace the execution of a simple (unnested) loop and correctly state the values of variables in each iteration.
- Write for loops that use the Accumulator pattern to aggregate values.
Challenges: 15 min
- Is an indentation error a syntax error or a runtime error?
- Trace which lines of this program are executed in what order.
- Fill in the blanks in this program so that it reverses a string.
- Fill in the blanks in this series of examples to get practice accumulating values.
- Reorder and indent these lines to calculate the cumulative sum of the list values.
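
As an illustration of the Accumulator pattern named in the objectives above, the following sketch shows a completed version of the string-reversal and cumulative-sum challenges. The input values are arbitrary examples.

```python
# Accumulate a reversed string by adding each character to the front.
original = "tin"          # hypothetical input
reversed_text = ""
for character in original:
    reversed_text = character + reversed_text
print(reversed_text)

# Accumulate a running total and record it after every step.
values = [1, 2, 3, 4]     # hypothetical input
running_total = 0
cumulative = []
for value in values:
    running_total = running_total + value
    cumulative.append(running_total)
print(cumulative)
```

A Parsons Problem version would present the lines of the second loop shuffled and unindented.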

Looping Over Data Sets (13:45)

Teaching: 5 min
- Be able to read and write globbing expressions that match sets of files.
- Use glob to create lists of files.
- Write for loops to perform operations on files given their names in a list.
Challenges: 10 min
- Which filenames are not matched by this glob expression?
- Modify this program so that it prints the number of records in the shortest file.
- Write a program that reads and plots all of the regional data sets.
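
The glob-and-loop idea above can be sketched without the lesson data. The example below is an assumption-laden stand-in: it writes a few throwaway files named `demo_*.csv` into a temporary directory and counts lines with plain file reads, whereas the lesson itself would read the gapminder files with Pandas.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as data_dir:
    # Create hypothetical files: two match the pattern below, one does not.
    for name, n_rows in [("demo_africa.csv", 3), ("demo_asia.csv", 5), ("notes.txt", 1)]:
        with open(os.path.join(data_dir, name), "w") as handle:
            handle.write("country,value\n")          # header line
            for row in range(n_rows):
                handle.write(f"country{row},{row}\n")

    # '*' matches any run of characters, so only the demo CSV files are found.
    pattern = os.path.join(data_dir, "demo_*.csv")
    filenames = sorted(glob.glob(pattern))
    matched = [os.path.basename(filename) for filename in filenames]

    # Loop over the files and keep the smallest record count seen so far.
    fewest = float("inf")
    for filename in filenames:
        with open(filename) as handle:
            n_records = len(handle.readlines()) - 1  # do not count the header
        fewest = min(fewest, n_records)

print(matched)
print("fewest records:", fewest)
```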

Writing Functions (14:00)

Teaching: 10 min
- Explain and identify the difference between function definition and function call.
- Write a function that takes a small, fixed number of arguments and produces a single result.
Challenges: 15 min
- This code defines and calls a function - what does it print when run?
- Explain why this short program prints things in the order it does.
- Fill in the blanks to create a function that finds the minimum value in a data file.
- Fill in the blanks to create a function that finds the first negative value in a list. What does your function do if the list is empty?
- Why is it sometimes useful to pass arguments by naming the corresponding parameters?
- Fill in the blanks and turn this short piece of code into a function.
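
One possible completed form of the "first negative value" challenge above is sketched here. The function name `first_negative` and the choice to return `None` when nothing is found are assumptions made for this example.

```python
# Defining the function only stores it; nothing runs until it is called.
def first_negative(values):
    """Return the first negative number in values, or None if there is none."""
    for value in values:
        if value < 0:
            return value
    return None   # reached for an empty list or a list with no negatives


# Calling the function runs its body with the given argument.
found = first_negative([3, 7, -2, -9])
nothing_found = first_negative([])
print(found, nothing_found)

# Naming the parameter at the call site makes the argument's role explicit.
by_keyword = first_negative(values=[-1, 4])
print(by_keyword)
```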

Variable Scope (14:25)

Teaching: 10 min
- Identify local and global variables.
- Identify parameters as local variables.
- Read a traceback and determine the file, function, and line number on which the error occurred.
Challenges: 10 min
- Trace the changes to the values in this program, being careful to distinguish local from global values.
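
A tracing exercise of the kind described above might use a program like this sketch. The names `limit`, `clip`, and `value` are hypothetical and exist only to illustrate the local/global distinction.

```python
limit = 100   # global variable

def clip(value):
    # This assignment creates a local variable that hides the global one
    # for the duration of the call; the global limit is not changed.
    limit = 10
    return min(value, limit)

clipped = clip(50)
print(clipped)   # uses the local limit
print(limit)     # the global limit is unchanged

# Parameters are local too, so 'value' does not exist outside the function.
try:
    print(value)
    parameter_visible = True
except NameError:
    parameter_visible = False
print(parameter_visible)
```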

Coffee (14:45): 15 min

Conditionals (15:00)

Teaching: 10 min
- Correctly write programs that use if and else statements and simple Boolean expressions (without logical operators).
- Trace the execution of unnested conditionals and conditionals inside loops.
Challenges: 15 min
- Trace the execution of this conditional statement.
- Fill in the blanks so that this function replaces negative values with zeroes.
- Modify this program so that it only processes files with fewer than 50 records.
- Modify this program so that it always finds the largest and smallest values in a list no matter what the list’s values are.
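
The second and fourth challenges above could be completed roughly as follows. Both function names are invented for this sketch, and `smallest_and_largest` assumes a non-empty list.

```python
def replace_negatives(values):
    """Return a copy of values with every negative number replaced by zero."""
    cleaned = []
    for value in values:
        if value < 0:
            cleaned.append(0)
        else:
            cleaned.append(value)
    return cleaned


def smallest_and_largest(values):
    """Return (smallest, largest) for a non-empty list of numbers."""
    # Starting from the first element, rather than from 0, keeps the result
    # correct even when every value is negative.
    smallest = values[0]
    largest = values[0]
    for value in values:
        if value < smallest:
            smallest = value
        if value > largest:
            largest = value
    return smallest, largest


print(replace_negatives([4, -1, 0, -7]))
print(smallest_and_largest([-3, -1, -2]))
```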

Programming Style (15:25)

Teaching: 15 min
- How can I make my programs more readable?
- How do most programmers format their code?
- How can programs check their own operation?
Challenges: 15 min
- Which lines in this code will be available as online help?
- Turn the comments in this program into docstrings.
- Rewrite this short program to be more readable.
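
To make the docstring and self-checking ideas above concrete, here is a small sketch. The function `mean_value` is a hypothetical example, not part of the lesson materials.

```python
# This comment is only visible to people reading the source code.
def mean_value(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    # The assertion lets the program check its own input before computing.
    assert len(values) > 0, "values must not be empty"
    return sum(values) / len(values)


# The docstring, not the comments, is what help(mean_value) displays.
print(mean_value.__doc__)
print(mean_value([2, 4, 6]))

try:
    mean_value([])
except AssertionError as error:
    print("AssertionError:", error)
```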

Wrap-Up (15:55)

Teaching: 20 min
- Name and locate scientific Python community sites for software, workshops, and help.
Challenges: 0 min
- None.

Feedback (16:15)

Teaching: 0 min
Challenges: 15 min
- Collect feedback

Finish (16:30)