Remote working
I really like working on a remote server. I find there are a number of benefits over coding locally that are worth taking the time to set it all up.
- Work wherever I want. Either on my desktop machine or on my laptop and I can always access the the same environment. I don’t have to worry if I installed something new on one machine or if I moved the correct data around. It all stays in the same place and I can access it as I need.
- Running long programs. If something takes a couple of hours or days, you don’t need to worry about shutting down or moving your laptop. You especially don’t have to deal with lag or memory issues while you wait for it to finish (I am very impatient).
In summary: more ram, more memory, more fun.
VSCode
The main tool I use to work remotely is vscode. VSCode is an IDE developed by microsoft. It is lightweight, there are a lot of extensions that make coding nice and it has a great setup for remote work. I believe Pycharm has a similar setup and as students we can get a licence but honestly my main problem is it takes ages to open up and I am a true believer in making things as smooth as possible to start work (otherwise procrasination can take over).
Using vscode to work remotely
Running vscode on a remote computer or server is very simple. There are 3 steps
- install remote server extension
- add server details to
.ssh/config
. The simplest way is toctrl+shift+p
Remote-SSH: Add New SSH Host- type ssh connection e.g.
ssh username@server.address
- An entry has now been made to your
ssh config
(will explain below)
- Connect to server using
ctrl+shift+p
Remote-SSH: Connect to Host
After that you just open a folder/file and can code away with all code running remotely.
Tips for Remote Work
Some tips I find helpful.
Backup code: write code in github repos and commit regularly! It is so important to have a backup. If something goes wrong the code is gone forever! Plus if you don’t like changes you make you can revert to code you wrote 3 months ago.
Passwordless ssh connections: Add your public key from your local computer to the accepted keys on the server. You won’t have to to type passwords again and again. Do same thing for git, add public key to accepted keys on github. 2
Create source code: I like doing analysis in jupyter notebooks but I often find they get cluttered, messy and it is tedious to look back through notebooks and access functions or scripts that I wrote. Good practice is to move functions/classes you use regularly to
.py
as source code and create scripts that accept command line arguments for specific analysis/tasks you will want to do regularly. 3Move data: Use
rsync
to move data from your computer to the server and vice versa.rsync -azP source destination
-z
flag compresses the data as it is copied so is very efficient.-a
flag,which stands for archive, syncs recursively and allows to transfer folders. Is similar to-r
flag but keeps permissions, timestamps, ownership and other things (more likely to stop things breaking).-P
gives progress bar and resumes interrupted transfers.
- The format is:
rsync -azP local_path_2_folder username@server.address:path_on_remote_server
e.g.rsync -azP ~/dir1 amarnane@myserver.inf.ed.ac.uk:~/docs/
(flip order to move from remote to local) - There are many more options that I haven’t explored yet but this a nice tutorial
You will not remember this [code/analysis/function]: Some general PhD advice but make sure to write out explanations in notebooks, comment regularly in your code and try to datestamp or include copy of the code used to generate any figures or models or data that you publish/use in a presentation/report. At some point you will drop what you are currently working on and will come back to it after a couple of months/years. You will not remember what exactly it was you did. Make it easier for yourself! 4
Next step/Advanced/I have not done this:
- Use docker instances so that you can easily share your environment/model and have it run right away. It allows you to save a copy of your setup so people can just run your code e.g. run a model developed on linux in a windows machine.
- Use a workflow system like snakemake or nextflow to combine scripts and make and end to end pipline for your project. If you have multiple steps like preprocess data -> create data stucture -> train model -> analyse output you can combine them easily using a workflow system and pass the output from one section to the other easily. It is more modular and adaptable than writing an end to end script (if your code is python only) and allows you to pass output from one language to a script using different software e.g. python train model, R analyse and plot. Can also add checkpoints to start the pipeline from any intermediate step e.g. change model and skip rerunning preprocessing.
ssh connections
ssh
or Secure shell is a protocol used to access remote computers over a network. The link will describe it better than I can but the main thing to know is that it is a way to access other computers in a safe way. It is very secure and as a phd student the main thing to know is you can use it work remotely, create passwordless connections and passwordless git repos. If you ever want to use a cluster or connect to a remote server, ssh
is the way to do it. On linux the connections are done through the terminal (or vscode as I described above). On windows, it used to be a bit trickier but you can now access it through powershell and vscode.
To connect to a remote computer you go to the terminal and type the simple command
ssh username@server.addresss
As an example suppose I have an account amarnane
on myserver.inf.ed.ac.uk
. To connect to myserver
I type
ssh amarnane@myserver.ed.ac.uk
You will then receive a prompt to enter my password and are connected.
Other links
- Guide to add ssh key to your github account (passwordless git commit, push, pull etc)
- More in depth guide to add ssh key to authorized keys to allow passwordless ssh connections
ssh config
I discovered the ssh config
through vscode (it is how it handles remote connections) but it is incredibly useful. It is particularly useful for complex ssh
commands that involve a number of options like portforwarding or use a proxy to connect to a server. It really slims down how much you have to type and how many usernames server address etc you have to remember. The format is very simple. You write out the options for your connections and give it a name.
So instead of
ssh username@server.address
You can type
ssh myserver
By adding
Host myserver
HostName server.address
User username
to the ssh config file.
Creating the config file
First check the config file exists, your config should be located at ~/.ssh/config
ls ~/.ssh
If not create it using touch
touch ~/.ssh/config
The config file is simply a text file that has a specific format
Host hostname1
SSH_OPTION value
SSH_OPTION value
Host hostname2
SSH_OPTION value
Example connection
Here is an example of my setup to allow me to connect to myserver.inf.ed.ac.uk
Host myserver
HostName myserver.inf.ed.ac.uk
User amarnane
ServerAliveInterval 30
Note: I have made ServerAliveInterval
smaller than default to try reduce the number of disconnects that happen while I work.
So now if I want to connect via ssh
or copy files from my computer to the server I can just use myserver
. For example moving public key to the server
scp ~/.ssh/id_ed25519.pub myserver:~/