I have been using VSCode notebooks with .ipynb files. This gives me many advantages, as I am able to configure things I can't in JupyterLab. I also have access to a very rich ecosystem of plugins. If anyone is aware of VSCode as a solution but keeps using JupyterLab, could they explain why?
I too use VS Code as my Jupyter platform (running remotely on a powerful EC2 instance with 32 CPUs and 256GB RAM; my own desktop is a 7-year-old Intel Core i7 with 8GB RAM). VS Code’s Remote extension is amazing, works over any SSH host and seamlessly blends local and remote. It’s also fast since the UI is local while the filesystem and execution are remote.
The experience is a lot better than JupyterLab (which I am forced to use from time to time on SageMaker). The VS Code UI is cleaner plus I get a full language server which means I can rename variables and refactor fearlessly.
Really interesting setup. What kind of monthly expense does this run?
And on a separate but related note, does it change the way you think about how you spend your time coding? (Assuming the costs do ramp up with usage such that time literally does equal money?)
I’ve no idea what the cost is since the company pays for it — I do need the horsepower to run some really large models and I suspect most people don’t need this kind of spec. But for my company it’s just part of the cost of doing business.
There’s no IT and I can provision instances of any type (subject to limits) at any time.
VSCode is free if you self-host it. There are corporate tiers with virtual desktops etc., and you can pay for services such as GitHub Copilot if you want.
Anyhow, there is a wealth of free extensions to customize it, and the setup is really straightforward. I keep my work under Git version control in a private GitHub project. You can add extensions for rendering graphs in good quality, and importing and exporting stuff is easy.
I have not been able to figure out why some people prefer to use Jupyter Notebook as it is.
A 32 vCPU / 256 GB instance like r6a.8xlarge is about $900/month (r6ad, which has local disk, is about $100/month more). I don't see there being many other major costs with such a setup?
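To put the "time equals money" question in numbers, here is a rough back-of-the-envelope sketch. It takes the $900/month figure quoted above at face value and assumes it covers an instance left running all month, that a month is about 730 hours, and that compute cost scales linearly with hours used. The 8 hours a day, 22 days a month schedule is a made-up example, not anyone's actual usage.

```python
HOURS_PER_MONTH = 730   # assumption: average hours in a month (8760 / 12)
MONTHLY_COST_USD = 900  # figure quoted in the thread for an always-on instance


def hourly_rate(monthly_cost: float, hours: float = HOURS_PER_MONTH) -> float:
    """Implied hourly price if the monthly figure covers every hour of the month."""
    return monthly_cost / hours


def cost_for_hours(hours_used: float, monthly_cost: float = MONTHLY_COST_USD) -> float:
    """Compute-only cost for a given number of running hours."""
    return hours_used * hourly_rate(monthly_cost)


rate = hourly_rate(MONTHLY_COST_USD)
part_time = cost_for_hours(8 * 22)  # hypothetical: 8 hours a day, 22 working days
print(f"~${rate:.2f}/hour, ~${part_time:.0f}/month if stopped outside working hours")
```

Under those assumptions the instance works out to roughly $1.23 per hour, or about $217 a month if it is stopped outside working hours. Storage attached to a stopped instance is typically still billed, so treat this as a lower bound.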
Unlike individuals, large enterprises rarely pay sticker price; they pay a heavily discounted negotiated rate for software and services. I can’t say how much exactly but it’s less than that.
I'm curious about your setup as I might have to do something similar soon due to my machine's performance constraints. I understand you can connect to a Jupyter server remotely, but how do you sync your code? Do you have your Git repo cloned on the remote host and just run Git commands over SSH? Or does VS Code have some kind of integration for remote file systems with version control?
Check out the Remote - SSH (https://code.visualstudio.com/docs/remote/ssh) extension. It makes everything really painless. I'm currently developing a website that resides in a Docker container on a DigitalOcean server, and editing the code in VSCode feels literally no different from running it locally. If you have the SSH keys set up, there is nothing more to configure.
The other nice thing about VSCode is that you can extend it with VSCode Neovim (https://marketplace.visualstudio.com/items?itemName=asvetlia...), which runs a headless version of Neovim and allows you to do all the wonderful things that that entails, including stuff like VSCode's native multiple cursor implementation (and Lua config files!). All in all it's a great workflow, it's pretty light, and if you're paying for (or self-hosting) a beefy server it can turn any laptop into a powerhouse.
My code resides entirely on the remote EC2 instance while VS Code UI runs locally. There is no sync as such. You’re working off the remote copy via SSH. (I do back up my code periodically)
VS Code takes care of spinning up the remote Jupyter server. All I have to do is create a new .ipynb file and everything happens automatically. Execution and disk are remote, only the UI is local. This is the magic.
It’s exactly like SSH except you have a rich client IDE in VS Code. The only data that moves over the network are your keystrokes and pastes and what is needed to display output in VS Code. You have to try it to see.
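If you want to convince yourself that execution really is remote, a quick sanity check is to run something like the cell below in a notebook opened over the SSH connection. This is just a sketch using the standard library; `where_am_i` is a made-up helper name. The hostname and CPU count should be those of the remote box, not your laptop.

```python
import os
import platform
import socket


def where_am_i() -> dict:
    """Report which machine this Python process (the kernel) is running on."""
    return {
        "hostname": socket.gethostname(),
        "system": platform.system(),
        "cpu_count": os.cpu_count(),
    }


print(where_am_i())
```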
That sounds amazing. I am going to try this today. I have been irrational to the point of not trying this because I just love the notebook so much even though I love VS Code too. I imagine you just would never go back to notebooks.
That is nice the company pays for all that cloud compute but for an individual it would just seem more practical to build a beast of a machine.
VSCode Jupyter notebooks + Github Copilot is my favorite way to interact with notebooks. The autocomplete is super helpful for assisting with discovery of matplotlib or numpy operations.
Yeah, I'm looking forward to Copilot chat for matplotlib stuff. Right now I have to wrangle Copilot to do what I want with comments, but with Chat you can just ask it to write the whole cell of code.
Nice! That's the Matlab way of doing things. I used to miss the Matlab workflow a lot when I was transitioning from Matlab to Python. Although I'm somewhat surprised they didn't just go with "## title" to match Matlab's "%% title" (the only difference is the comment character, and "# %%" reads like a Python comment of a Matlab cell; I suppose it makes sense if you start with an m-file, comment the whole thing out in Python and then work your way down translating cells from Matlab to numpy/scipy).
Personally I've since gone full literate programming mode, to the point that I care so much about the narrative and documentation (of methods and results) that I will build and modify tools rather than go back to the Matlab way. I have been looking at Quarto but haven't had the time to see if I can transition my existing (and target/ideal) workflows.
I know it gets a lot of hate but ipynb have a lot of advantages as a format for building small custom tools for modification/transformation. Most of the complaints ultimately seem to boil down to not having tools that do what you want. Only want to diff the code cells? That's easy in a python utility that loads the notebook and looks at it intelligently. You can also use pre-commit to modify the notebook and strip out things that don't belong in git.
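As a rough illustration of how little code such a utility needs, here is a minimal sketch using only the standard library. It assumes the nbformat 4 JSON layout (a top-level "cells" list, each cell with "cell_type" and "source"); the function names and the tiny in-memory notebook are made up for the example.

```python
import json


def code_cells(notebook: dict) -> list[str]:
    """Return the source of each code cell, ignoring markdown and outputs."""
    cells = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            source = cell.get("source", "")
            # nbformat stores source either as one string or as a list of lines
            cells.append(source if isinstance(source, str) else "".join(source))
    return cells


def strip_outputs(notebook: dict) -> dict:
    """Return a copy of the notebook with outputs and execution counts cleared."""
    cleaned = json.loads(json.dumps(notebook))  # deep copy via a JSON round trip
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# A tiny in-memory notebook, invented for illustration
example = {
    "nbformat": 4, "nbformat_minor": 5, "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": 3,
         "source": ["x = 1\n", "print(x)"],
         "outputs": [{"output_type": "stream", "name": "stdout", "text": ["1\n"]}]},
    ],
}

print(code_cells(example))
print(json.dumps(strip_outputs(example)["cells"][1], indent=1))
```

For a real file you would `json.load` the .ipynb first. To diff only the code, feed `code_cells` from two versions into `difflib.unified_diff`; for the git side, existing tools such as nbstripout can be wired into pre-commit instead of rolling your own.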
(Also nbdev... exists... and is a good example of how tools can help. Unfortunately it's too tied to GitHub functionality and the developer is a GitHub zealot who is oddly brittle and takes offense and demands justification if anyone mentions not wanting to rely on GitHub)
I must be missing something because I am not immediately seeing the value-add. Do you prefer the separation of input/output, or is it something else? I believe all of the debug, extensions, and hinting work the same as the standard notebook.
I'm glad you asked, because you made me think about why. Initially, I guess I just thought it was kinda neat and stuck with it, but on reflection this is what I personally feel I get out of it:
- Same interface for analysis, scripting, and building more complex multi-file pipelines. I can also use the #%% notation to break up and debug scripts, which is probably teaching me all sorts of bad habits but it's something I find helpful.
- Similarly, as another commenter in this thread notes, .ipynbs just don't play as nicely with the other dev tools (e.g., Git, Black) and generally feel like second-class citizens in VSCode.
- I much prefer having the VSCode interactive window on the right, as opposed to having my output dumped out below my code block. I now find using the classic notebook style makes the document much longer and harder to navigate, particularly as I work with text a lot and I'm often outputting large chunks of text for inspection.
This noted, I think this is all possible because I'm rarely producing my final products in notebook format. Neither my boss nor the stakeholders I typically present to can (or have any inclination to) read code, so I don't really need a format others can execute or inspect. I just take the charts and figures and dump to presentations and other normie-friendly documents.
I am torn on the separation of the output from input. I expect half the time I would be happy for the extra vertical space and the other half I would be annoyed I could not immediately correlate code with output.
Anyway, thanks for spreading the workflow, and I will definitely try it out in the coming weeks.
I wish more data scientists used light percent format notebooks (`#%%`). They can now be combined with other powerful tools (linting, formatting and git), which is impossible with the `ipynb` format.
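For anyone who hasn't seen the format, here is a minimal sketch of what such a file looks like, assuming an editor that treats `# %%` lines as cell boundaries (the VS Code Python extension does). The text and variable names are made up. Because the file is plain Python, it also runs top to bottom as an ordinary script, which is why formatters, linters and git diffs work on it unchanged.

```python
# %% [markdown]
# # Word counts
# A markdown cell in percent format is just a block of comments.

# %%
from collections import Counter

text = "the quick brown fox jumps over the lazy dog the end"

# %%
# Each "# %%" marker starts a new cell that can be run on its own
counts = Counter(text.split())
top_word, top_count = counts.most_common(1)[0]

# %%
print(f"{top_word!r} appears {top_count} times")
```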
I always like this better than true notebooks for a lot of purposes, but it's long overdue that we standardize on a format here. Knitr and RMarkdown never caught on, and Org Mode and VS Code both just do their own thing. It's a shame there isn't something more "portable".
I like to have the relevant code and output side-by-side, and dislike scrolling past outputs to get at code. Again, pure preference.
My screen copes fine with two tabs and the sidebar hidden most of the time, but more real estate would be nice.
What I'd love would be to pull tabs out into separate windows, like in a browser, and have the Jupyter output and variable inspector on a second screen. If anyone knows a way to do this (not new window) I'd love to hear. Last time I looked seriously this wasn't possible.
Maybe because Jupyter is set up on the compute server, where everyone needs to log in to do their work? This was the case at my last 4 companies.
I use VSCode (with the remote extension) for the situation you’re describing, and find it works extremely well. There are a couple little pain points, but none related to the remote part of the equation.
Big positives are how it integrates with the rest of the IDE so go to definition, debug cell, and data explorer just work.
Some negatives are a possibly onerous setup if not already using VSCode as your IDE (to get some of the IDE-like stuff to work), and how there isn’t exact parity on hot keys so muscle memory fails you occasionally.
The VSCode remote access is IMO better than just accessing files remotely; it’s a large part of why I use it instead of PyCharm.
It splits the editor into a UI that runs locally and a server that does the heavy lifting on the remote machine. Conceptually it’s very similar to Jupyter, where you have a user-facing front end, with the UI written in JavaScript and rendered by your browser, and a Python kernel back end, and the two communicate over pipes that can be run over the internet.
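The front end / back end split can be sketched with a toy example. This is not how Jupyter or VS Code actually talk to each other (real Jupyter kernels use a ZeroMQ-based messaging protocol); it is only an in-process illustration where two queues stand in for the network connection. `toy_kernel` and `run_session` are made-up names.

```python
import contextlib
import io
import queue
import threading


def toy_kernel(requests: queue.Queue, replies: queue.Queue) -> None:
    """Back end: execute each code string and send back whatever it printed."""
    namespace = {}  # state persists between requests, like a kernel session
    while True:
        code = requests.get()
        if code is None:  # sentinel: shut the kernel down
            break
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, namespace)
            replies.put(buffer.getvalue())
        except Exception as exc:
            # Report the failure instead of leaving the front end waiting
            replies.put(f"error: {exc!r}")


def run_session(cells):
    """Front end: send cells one at a time and collect the text outputs."""
    requests, replies = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=toy_kernel, args=(requests, replies))
    worker.start()
    outputs = []
    for cell in cells:
        requests.put(cell)             # only the code text crosses the "wire"
        outputs.append(replies.get())  # only the rendered output comes back
    requests.put(None)
    worker.join()
    return outputs


print(run_session(["x = 2", "print(x * 21)"]))
```

The point of the sketch is that the front end never holds the data or the interpreter state. It sends text and receives text, which is why the UI can stay responsive on a weak local machine.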
What it effectively means for VSCode is that you get a more seamless experience than I had with PyCharm remote development.
Would be curious how long ago you tried this and what your setup is? The lowest spec machine I have is a Windows 10 machine with an i7 and 8 GB RAM and it is super responsive on the latest version of VS Code.
A few weeks ago with Ubuntu 22.04 on i7 with 16GB RAM. Perhaps it's related to using the devcontainer feature of VSCode, although I run jupyterlab in the same container
That might be it. I’m not sure about devcontainers (I don’t use them) but as a data point I dual boot into Kubuntu on a 7-year-old i7 with 8 gigs of RAM and there are no responsiveness issues.
I would love to use VS Code for notebooks, but I just cannot get the interactive console to work as I would like. Right now I don’t exactly recall the problem, but it had to do with the keyboard shortcuts and running a piece of code in the interactive console. For some reason other shortcuts took precedence, even if I disabled them, so I couldn’t get my code to reliably run in the console.
Well, I reset my VS Code environment in order to start fresh and check if my problem was still there. My issue is that VS Code interactive windows do not share the same "working space" as the cells. I usually use the interactive window to try stuff, which then gets crystallized in the notebooks. The interactive VS Code consoles are disconnected from the notebook, making it a pain to test stuff with already-defined variables and libraries.
For me it's a cleaner UI for experimentation and when you run a cell it doesn't jump to weird places depending on the output. I've been using jupyter for so long that I find it very familiar and easy to use. With that said, VSCode has improved so much for notebooks that I'll usually reach for it over jupyter.
Ctrl+Enter executes a VSCode ipynb cell without jumping to the next cell, although it doesn’t solve the issue of jumping when you re-run all cells with figures in your notebook.
Does anyone else have issues with cells overlapping or shrinking in VSCode? I've had this issue for months. I created an issue on GitHub, but there have been no improvements. It can be very frustrating; sometimes I have to completely close VSCode and reopen it to stop the constant glitching. I tried switching the GPU to the nvidia card, but it somehow uses 30% of the GPU when a cell is running.
I’ve tried to convince junior researchers to make this jump in the past and they have not done so. I think it’s a combination of lack of time and familiarity. A lot of researchers only use Jupyter notebooks occasionally between their more time-intensive lab work, or possibly use similar R tooling instead.
VSCode interactive notebooks are amazing, I think that should be how it is for all environments. I only dabble with notebooks but I dream of the day I can easily have all of my functions be interactive in a REPL as I code them with VSCode.
Only because I can use it anywhere that way, including on machines that I am not allowed to install things on. But if it works in vscode.dev then I am out of excuses, because you’re correct, it’s much much better in vscode.