"Scripting" vs. "Programming" in python, and the relation to Jupyter Notebooks

Hi all,

I’ve been teaching with Jupyter notebooks for several years now, and also using them myself. In my efforts to convince more people to use notebooks, I have encountered a lot of confusion about when / how to use them, and also about why people should use them at all instead of just using “scripting” and an “IDE”.

I’ve spent some time thinking about this, in particular about what the word “scripting” even means, and I would like to share my thoughts here. Any thoughts / comments / discussion are welcome!

Cheers,
Gary

Scripting vs. coding

What is scripting? I was trying to explain this to the TAs of my first-year python course. To help, I did some googling, but afterwards I was even more confused about how to define “scripting” and how it differs from “programming” or “coding”… it seems it’s not so easy to define.

https://stackoverflow.com/questions/20829541/scripting-vs-coding

Some of this is also historical. In the olden days, there was a clear difference: “bash” was scripting, “C” was programming. Interpreted was “scripting”, compiled was “programming”.

But developments in high-level interpreted languages, combined with massively faster computers, have changed all of that in practice. So what is a good description of the difference between “scripting” and “programming” in python?

Here are some of my thoughts:

I would use the word “scripting” for the case in which I have a single flat python file that I run non-interactively by executing it in one go on the python interpreter:

$ python my_script.py

It would take inputs and produce outputs. You could run it by pushing a button in Spyder, or from the command line.

A script, of course, contains lines of python code that are executed from top to bottom.
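To make this concrete, here is a minimal sketch of what I mean by a flat script. The file name my_script.py and the measurement values are just made up for illustration: there are no functions, the inputs are set at the top, and running `python my_script.py` executes everything once, in order, and prints the result.

```python
# my_script.py (hypothetical): a flat script with no functions,
# run top to bottom in one go with `python my_script.py`.
import statistics

# Inputs: hard-coded here for simplicity; a real script might read a file instead.
measurements = [9.8, 9.7, 9.9, 9.81, 9.79]

# Processing: each line runs exactly once, in order.
mean_value = statistics.mean(measurements)
spread = statistics.stdev(measurements)

# Output: printed to the terminal when the script finishes.
print(f"mean = {mean_value:.3f}, stdev = {spread:.3f}")
```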

What things would I not use the word “script” for?

I would definitely not use the word “script” for code that defines “library” functions. For example, a python file that I then import functions from. This code would typically not “do” anything other than define functions. It would make no sense to run it on the command line (although you could).
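For contrast, a sketch of the kind of “library” file I mean, assuming a hypothetical file called mylib.py with made-up physics helpers: it only defines functions, so running it on its own produces no output. It only becomes useful when another file or a notebook imports it and calls the functions.

```python
# mylib.py (hypothetical): a "library" file that only defines functions.
# Running `python mylib.py` would do nothing visible; it is meant to be imported.
import math


def projectile_range(speed, angle_deg, g=9.81):
    """Horizontal range of a projectile launched from ground level (no air drag)."""
    angle = math.radians(angle_deg)
    return speed**2 * math.sin(2 * angle) / g


def max_height(speed, angle_deg, g=9.81):
    """Peak height reached by the same projectile."""
    angle = math.radians(angle_deg)
    return (speed * math.sin(angle)) ** 2 / (2 * g)
```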

I would also not typically use the word “script” for a python file that interacts with the user. For example, if the code opens up a window, allows the user to plot stuff, add labels, save a file, open another file, etc., and continues until the user pushes an “exit” button, then I would call it not a “python script” but a “python program”.
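A text-based stand-in for that kind of interactive program might look like the sketch below. A real one would open a window, but the structure is the same: a loop that keeps handling user commands until the user asks to exit. The function name run_program and the command names are hypothetical, and the command reader is passed in as a parameter so the loop can be driven without a keyboard.

```python
def run_program(read_command=input):
    """Handle user commands one at a time until the user types 'exit'."""
    handled = []
    while True:
        command = read_command("command> ").strip()
        if command == "exit":
            print("bye")
            break
        # A real GUI program would plot, add labels, save files, ... here;
        # this sketch just records and reports each command.
        handled.append(command)
        print(f"done: {command}")
    return handled


# Start it interactively with: run_program()
```

Unlike the flat script, what this program does (and when it finishes) depends on what the user types while it is running.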

Jupyter Notebooks

And then finally, we have Jupyter notebooks. In general, the code that we put in the code cells of Jupyter notebooks is small and “does stuff”, like a script. But it is not, in my picture above, a “script”: you cannot execute it on the command line. (You can run a notebook non-interactively from the command line if you are a nerd… but that is not typical usage.) A code block in a Jupyter notebook can be built to run independently (a single code block where you push the run button and it does everything). In this case, each code block is “like” a script, and the notebook can then contain many scripts.

But in Jupyter notebooks you can also “split” your script so that you can run each separate bit one step at a time. This is really useful, as you can then interleave debugging, take a look at variables along the way, etc., as you write and develop your code. This is why I quite like developing new code inside notebooks: it’s a nice way to build up a complicated script, function, or even class step by step.
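As a sketch of what “splitting” looks like, here is a small made-up calculation broken into three steps. In a notebook, each step would be its own cell; the `# %%` markers are one convention that some editors (Spyder, for example) also recognise as cell boundaries in a plain .py file, so the same file still runs top to bottom as a normal script.

```python
# %% Step 1: set up the inputs
import math

n_points = 5
x = [i / (n_points - 1) for i in range(n_points)]

# %% Step 2: compute from the inputs (a good moment to inspect x before moving on)
y = [math.sin(math.pi * xi) for xi in x]

# %% Step 3: summarise the result
peak = max(y)
print(f"peak value: {peak:.3f}")
```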

When NOT to use Jupyter Notebooks for your code

Finally, a comment on when NOT to use Jupyter notebooks: once the code blocks in a notebook become really long, the notebook concept loses its usefulness and becomes unmanageable. Notebooks are just not a good tool for large code blocks. Once you start writing long code blocks, the proper thing to do is to put your code into functions (or, even better, classes) in a plain-text .py file. (And at some point, it then becomes useful to edit this code in an IDE that is more sophisticated than the simple text editor the notebook server provides…)
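As an illustration of that refactoring step, here is a sketch of what the contents of one long cell might turn into once moved to a .py file. The file name analysis.py, the class Measurement, and the threshold logic are all hypothetical: the point is that the loose variables become attributes of a small class, and the processing becomes a function you can call from anywhere.

```python
# analysis.py (hypothetical): code that used to live in one long notebook
# cell, reorganised into a small class and a function.
from dataclasses import dataclass


@dataclass
class Measurement:
    """Groups the data with a setting that used to be a loose cell variable."""
    values: list[float]
    threshold: float = 0.0

    def above_threshold(self) -> list[float]:
        """Return only the values larger than the threshold."""
        return [v for v in self.values if v > self.threshold]


def summarize(measurement: Measurement) -> dict:
    """Count and add up the values that pass the threshold."""
    selected = measurement.above_threshold()
    return {"count": len(selected), "total": sum(selected)}
```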

Once you have abstracted your code into functions / classes, you can “use” it both in “script” .py files and in Jupyter notebooks by importing the functions from your longer, abstracted code. In your script, or in the cells of your Jupyter notebook, you can then directly “call” the functions / classes defined in your “library” .py file. In this case, I also like using Jupyter notebooks to run code from my library, since I can document what I am doing more thoroughly using markdown cells, and because the output of each cell is recorded directly below the cell that produced it.
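Here is a self-contained sketch of the import-and-call pattern. Normally the library file would simply sit in the same folder as your script or notebook; so that the example runs anywhere, it first writes a tiny library file (with a hypothetical module name, my_library) to a temporary folder and adds that folder to the import path. Only the last two lines are what you would actually type in a script or notebook cell.

```python
import sys
import tempfile
from pathlib import Path

# Stand-in for a "library" .py file that would normally sit next to your
# script or notebook. The module name my_library is hypothetical.
library_source = '''
def double(x):
    """Return twice the input."""
    return 2 * x
'''

lib_dir = Path(tempfile.mkdtemp())
(lib_dir / "my_library.py").write_text(library_source)
sys.path.insert(0, str(lib_dir))  # only needed because the file is in a temp folder

# In a real script or notebook cell, only the lines from here on are needed.
from my_library import double

print(double(21))  # in a notebook, the 42 appears directly below the cell
```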


I think “script” and “program” need to be defined with respect to the execution environment, not by whether the code is interpreted or compiled. For example, C++ code at CERN was called a script when it was intended to be executed in a ROOT REPL.

Personally, I would use the word script for code that is intended to be executed in a specific environment (a bash, python, julia, etc. REPL) to mutate its state, for example, to do interactive introspection or to put the system in the state needed by the next script. On the other hand, I would call code a program when its execution does not depend on the state of the running environment, for example, when all simulation parameters are set in the program or passed as arguments. Thus, if a piece of code does not commute with other code with respect to the environment of interest, it is a script; otherwise it is a program.
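A toy illustration of the “does not commute” criterion, as I understand it: below, a plain dict stands in for a REPL session’s global namespace (that stand-in, and the names script_a, script_b, run_in_env and program, are my own, not from the post). Running the two “scripts” in different orders leaves the environment in different states, while the “program” gets everything through its argument and gives the same result no matter what ran before.

```python
# Two "scripts": each reads and mutates a shared environment.
script_a = "x = x * 2"
script_b = "x = x + 3"


def run_in_env(snippets, start):
    """Run code snippets one after another in one shared namespace."""
    env = {"x": start}
    for code in snippets:
        exec(code, env)  # each snippet changes the shared state
    return env["x"]


# Order matters, so the two scripts do not commute.
ab = run_in_env([script_a, script_b], start=1)  # (1 * 2) + 3
ba = run_in_env([script_b, script_a], start=1)  # (1 + 3) * 2


def program(x0):
    """A "program": its result depends only on its argument."""
    return x0 * 2 + 3
```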

I had a question from a student in my course about whether a certain way of programming in a notebook counted as “hard coding”. In answering, I realised that this is closely interrelated with what notebooks are and how people use them, which I commented on in my reply.

I find that there is often a lot of confusion (and emotional discussion) surrounding coding in notebooks, so I add my reply below, as I think it helps express how one can use notebooks for coding and some important things to remember when you do.

======

It is a good question! It’s not super easy or cut-and-dried to answer when working in notebooks.

First, the fact that you are using variables already suggests it’s not too serious a case of hard coding. Hard coding would be where a parameter is used in multiple places in a piece of code, each time typed out as a number (like 10001 as an array length), so that if you change it you have to “search and replace” it 10 times to keep your code consistent and working. This is usually pretty clear when you are working in single scripts / libraries.
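A minimal sketch of the difference, using the 10001 array length from above (the variable names are just for illustration):

```python
# Hard-coded: the array length 10001 is typed out in three places.
# Changing it means a search-and-replace, and missing one breaks the code.
x_hard = [i / (10001 - 1) for i in range(10001)]
y_hard = [0.0] * 10001

# Not hard-coded: the length is defined once and every use refers to it,
# so changing n_points in one place keeps everything consistent.
n_points = 10001
x = [i / (n_points - 1) for i in range(n_points)]
y = [0.0] * n_points
```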

But working in notebooks is a bit more subtle. It also depends on how you view the code in the notebook: is each code cell a “program” on its own? Or is the combination of all the cells a “single program”? Or is the combination of some of the cells “one program”, and the combination of another bunch of cells a “second program”?

It’s a bit of a fuzzy line, and it depends on how you use the notebook.

If code cells have interdependencies, like if I need to run cell 1 to be able to run cell 2, and if that is clear to the user of the notebook, then it’s probably best not to redefine a, b in each cell: if you decide to change a, b at some point, you would have to edit them in all the interdependent code cells to keep things consistent.

But if code cells are not interdependent, then you should definitely redefine your variables. This is often the case in our assignments: all the code cells of Q1, for example, may need to be run sequentially to get the correct result, but the code cells in Q2 do not depend on the ones in Q1, and you can run them on their own without running any of the Q1 cells.
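A sketch of how this might look in practice, with `# %%` marking where each notebook cell would start (the values of a and b and the calculations are made up). The Q1 cells share a, b defined once; the Q2 cell is meant to run on its own, so it defines its own a, b rather than relying on whatever Q1 left behind.

```python
# %% Q1, cell 1: define the shared parameters once.
a, b = 2.0, 3.0

# %% Q1, cell 2: depends on Q1 cell 1 (run that first). a and b are NOT
# redefined here, so changing them in cell 1 keeps all of Q1 consistent.
q1_result = a * b

# %% Q2, cell 1: independent of Q1, so it redefines its own parameters
# and gives the same answer whether or not the Q1 cells were run.
a, b = 5.0, 1.0
q2_result = a - b
```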

One of the big flexibilities of Jupyter notebooks is that you can use them in many different ways: storing and documenting either a single “program” (a group of interdependent code cells that you run step by step) or multiple independent “programs”, related or unrelated. This “power” comes at a price, though: it becomes your responsibility as a programmer to make sure you communicate to the reader of your notebook which cells are “interrelated” and which are not. With power comes responsibility 🙂

A bit of a long ramble, but I think it’s a useful explanation.

Cheers,
Gary
