> I want to also store/attach intermediates (say a named numpy array) into the notebook for further analysis.
Is the issue that you do not want to save the data and report into a folder and distribute that? That is, you want an entirely self-contained notebook? Or is there something else going on here?
I'm sure that's possible but it seems kind of wrong to put your binary "data" in with your analysis and presentation-making code.
Somewhat. I want it to be difficult to separate the data from the report. Basically I want the report itself to be ingestible as input to other steps, so it's more of "documented data" with the analysis results attached: the inputs are documented, and the final results can be restored without re-evaluating the entire notebook. I don't want the entire workspace saved, just the final results. I was hoping there was some magic to inject a Python object into a cell and restore it later using some form of introspection.
The testing I do is annual equipment performance evaluation. I'd like to be able to process each test and then feed the results into longitudinal monitoring. One thing I'm considering is adding a library or extension on top of papermill that automatically creates a workspace file (HDF5, dill, or whatever) that I can store individual variables into. After studying the ipynb JSON, it just seems odd that you can't store blobs as attachments. But as I understand it, this has to do with the kernels and notebooks running as separate processes and passing things around as messages. So basically the kernel doesn't have any access to the cells or any sort of introspection.
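The sidecar-workspace idea could be sketched like this. This is a hypothetical helper, not an existing library: `Workspace` and its file-naming convention are invented names, and it uses stdlib `pickle` (dill would be a drop-in replacement if you need broader object coverage; HDF5 via h5py would suit large arrays better):

```python
import pickle
from pathlib import Path

class Workspace:
    """Hypothetical sidecar store: named results saved next to the notebook.

    report.ipynb gets a companion report.workspace.pkl holding only the
    variables explicitly saved -- not the whole kernel namespace.
    """

    def __init__(self, notebook_path):
        self.path = Path(notebook_path).with_suffix(".workspace.pkl")
        self._vars = {}
        if self.path.exists():
            # Restore previously saved results without re-running anything.
            self._vars = pickle.loads(self.path.read_bytes())

    def save(self, name, value):
        """Store one named result and flush the sidecar file."""
        self._vars[name] = value
        self.path.write_bytes(pickle.dumps(self._vars))

    def load(self, name):
        """Retrieve a named result saved by an earlier run."""
        return self._vars[name]
```

Inside the evaluated notebook you'd call `ws = Workspace("report.ipynb"); ws.save("calibration", arr)`, and the longitudinal-monitoring step would open the same path and `ws.load("calibration")` without re-executing the notebook. It's not truly self-contained (the data sits beside the .ipynb rather than inside it), but the report and its results travel as a pair.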
With papermill you have parameters; there just aren't any "return values" in the processed notebooks. If there were, you could treat reports more easily as cached function evaluations.
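One way to fake a return value is to lean on the fact that an executed .ipynb already stores cell outputs in its JSON: tag one cell as the "result" cell and parse its `execute_result` output back out. This is a sketch of a convention, not a papermill feature; `notebook_result` and the `"results"` tag are invented here, and only the plain-text repr comes back (anything richer would need the workspace approach above):

```python
import json

def notebook_result(path, tag="results"):
    """Hypothetical 'return value' convention for executed notebooks.

    Scans the notebook's JSON for the first cell carrying `tag` in its
    metadata tags and returns that cell's text/plain execute_result.
    """
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    for cell in nb["cells"]:
        if tag in cell.get("metadata", {}).get("tags", []):
            for out in cell.get("outputs", []):
                if out.get("output_type") == "execute_result":
                    text = out["data"]["text/plain"]
                    # nbformat stores text either as a string or a list of lines
                    return "".join(text) if isinstance(text, list) else text
    return None
```

After `papermill.execute_notebook("test.ipynb", "out.ipynb", parameters={...})`, calling `notebook_result("out.ipynb")` would hand the tagged cell's repr to the next pipeline stage, which is roughly the cached-function-evaluation picture: parameters in, executed notebook as the cache entry, tagged output as the return value.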
(I just got it by googling “pickle an object in jupyter,” so sorry if this is something obvious that you’ve already seen and doesn’t quite solve your problem).
Yeah, it got my hopes up when I found it. But when I tested it, the data never actually made it into the .ipynb. It turns out it's a global store in your home directory and doesn't go into the notebook at all, so different notebooks overwrite each other's values if they use the same variable name.