First Notes from a UNIX, Git and R Workshop
Though I had expressed in my first notebook, in late 2018 at nerve.fancy.chained, that I wanted to "learn terminal" (I obviously didn't know what "terminal" was...), and though I had used Excel extensively over the summer of 2018–2019 while working in Sue Schenk's lab, I didn't actually do any "proper" programming until relatively recently.
I only really went to the workshop for R, but I am so glad I did. I guess it was the push-start I needed. After this, I talked to D. about Git, and we ended up starting to make a Minecraft mod together. The rest is history, I suppose. I started using Julia in late September 2019, after using Java with D. for a month or so. Those were great times. It was getting warmer after August, and D. and I were still at university together. We would come home and work on the mod, sometimes with a drink of Coke, and keep at it until it was dark and we were hungry. During that time, I did a lot of programming in Bash. I created my scripts repository, which hosted all of these very buggy programmes written in Bash. It was in this playground that I also learned a little about other languages: namely Perl, Ruby, Rust, and Python.
Transcribed here are my initial, very shorthand notes from a workshop on UNIX, Git and R that I went to at V.U.W.
UNIX, Git, and R Workshop: Day 1
Bash (Bourne Again Shell)
Bash is programmable!
- The GUI is intuitive and user-friendly
- Bash can do the same things, but automatically, and more than 1,000 times over.
ls = listing command
- The file system arranges things in a hierarchy.
pwd = print working directory
Note: the ls command, unless specified otherwise, will list the contents of the present working directory.
cd (on its own) = takes you back to your home directory
'-' = marks an option for a command (also called a "flag")
cd 〈path〉 = sets working directory
ls 〈path〉 = lists contents of path
man ls = find the man(ual) page
- Press q to quit.
- Some systems may require you to use ls --help instead.
ls -l = long listing (sizes in bytes)
ls -l -h ≡ ls -lh = human-readable form
ls ~ ≡ ls $HOME (≡ plain ls when you are already in your home directory)
Absolute paths start with /
cd = change directory
cd .. = moves to parent directory
cd ../../ = parent of parent
ls -a = list all (including hidden files)
. = shortcut for current directory
~/〈path〉 = expands to home directory + path
cd - = previous working directory
Relative vs. absolute paths
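A minimal sketch of the difference, assuming a throwaway directory from mktemp -d and made-up directory names (projects, data): an absolute path starts at / and means the same thing from anywhere, while a relative path is read from the present working directory.

```shell
# Scratch space so nothing real is touched
base=$(mktemp -d)
mkdir -p "$base/projects/data"

cd "$base/projects"   # absolute path: starts with /
cd data               # relative path: read from the current directory
here=$(pwd)           # .../projects/data
cd ..                 # relative: up to the parent, projects
cd -                  # back to the previous directory (cd - prints it)
back=$(pwd)
echo "$here"
echo "$back"
```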
mkdir = make directory
mv = move (also used to rename)
touch = make (empty) file
cp = copy
rm = remove
rm -i = asks "are you sure you want to delete?" before removing
more 〈file〉 = shows file
- '*' = matches any number of characters, e.g. *〈name end〉 matches names ending with 〈name end〉
- '?' = single-character wildcard
clear = clear terminal (note: you can still scroll up unless configured otherwise)
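The file commands and wildcards above, sketched in a scratch directory from mktemp -d; the notes directory and the file names are hypothetical.

```shell
work=$(mktemp -d)
cd "$work"

mkdir notes                        # make a directory
touch notes/day1.txt               # make an (empty) file
cp notes/day1.txt notes/day2.txt   # copy
mv notes/day2.txt notes/day3.txt   # move (here, a rename)
touch notes/draft.md

ls notes/*.txt       # '*' matches any run of characters: day1.txt and day3.txt
ls notes/day?.txt    # '?' matches exactly one character: the same two files
rm notes/draft.md    # remove; there is no undo (rm -i would ask first)
```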
Note: spaces in path names need to be "escaped" using a backslash:
cd Victoria\ University/
open <path> = opens a path (on macOS)
wc = word count (lines, words and bytes)
wc -c = bytes (the same as characters for plain ASCII text)
wc -l *.pdb > lengths.txt = makes a lengths file and puts the line counts of the pdb files in there
cat = concatenate (prints files; similar to more)
- '>' = redirects output into a file (overwriting it)
- '>>' = appends to a file (adds to it instead of overwriting it)
head = prints the first n lines of each file
tail = ... the last n lines
head -n 2 *.pdb = first 2 lines of all pdb files
- '|' = "pipe": feeds the output of one command into the next
E.g.,
sort -n lengths.txt | head -n 1
sorts lengths.txt numerically and then, of those lines, prints only the first (the smallest).
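A sketch of the redirection and pipe examples, using three small made-up .pdb files in place of the workshop's data (only their line counts matter here) inside a mktemp -d scratch directory. Note that wc -l also writes a "total" line when given several files.

```shell
dir=$(mktemp -d)
cd "$dir"
# Hypothetical stand-ins for the workshop's .pdb files
printf 'ATOM\nATOM\nATOM\n' > cubane.pdb   # 3 lines
printf 'ATOM\nATOM\n' > ethane.pdb         # 2 lines
printf 'ATOM\n' > methane.pdb              # 1 line

wc -l *.pdb > lengths.txt          # '>' creates or overwrites lengths.txt
sort -n lengths.txt | head -n 1    # smallest count first: the methane.pdb line
shortest=$(sort -n lengths.txt | head -n 1)
echo "an extra line" >> lengths.txt   # '>>' appends instead of overwriting
```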
man = manual
cut = cuts each line into fields
- -d = delimiter
- -f = give back fields
- -d , = comma as your delimiter
man <command> is your friend!
uniq = filters out adjacent matching lines (only unique lines remain, so sort first)
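A short sketch of cut and uniq on a made-up comma-separated file (survey.csv and its contents are hypothetical). uniq only compares neighbouring lines, which is why the output of cut is sorted first.

```shell
dir=$(mktemp -d)
cd "$dir"
printf 'date,species,count\n2019-01-01,kiwi,3\n2019-01-02,tui,5\n2019-01-03,kiwi,2\n' > survey.csv

cut -d , -f 2 survey.csv    # -d , : comma delimiter; -f 2 : the second field

# Skip the header, sort so duplicates are adjacent, then collapse them
species=$(cut -d , -f 2 survey.csv | tail -n +2 | sort | uniq)
echo "$species"             # kiwi, then tui
```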
Note: After day 1 of the workshop, I actually went home and changed my Bash prompt. I put the following code in my .bashrc:
# get current branch in git repo
function parse_git_branch() {
    BRANCH=$(git branch 2> /dev/null | sed -e '/^[^*]/d' -e 's/* \(.*\)/\1/')
    if [ ! "${BRANCH}" == "" ]
    then
        STAT=$(parse_git_dirty)
        echo "[${BRANCH}${STAT}]"
    else
        echo ""
    fi
}

# get current status of git repo
function parse_git_dirty {
    status=$(git status 2>&1 | tee)
    dirty=$(echo -n "${status}" 2> /dev/null | grep "modified:" &> /dev/null; echo "$?")
    untracked=$(echo -n "${status}" 2> /dev/null | grep "Untracked files" &> /dev/null; echo "$?")
    ahead=$(echo -n "${status}" 2> /dev/null | grep "Your branch is ahead of" &> /dev/null; echo "$?")
    newfile=$(echo -n "${status}" 2> /dev/null | grep "new file:" &> /dev/null; echo "$?")
    renamed=$(echo -n "${status}" 2> /dev/null | grep "renamed:" &> /dev/null; echo "$?")
    deleted=$(echo -n "${status}" 2> /dev/null | grep "deleted:" &> /dev/null; echo "$?")
    bits=''
    if [ "${renamed}" == "0" ]; then
        bits=">${bits}"
    fi
    if [ "${ahead}" == "0" ]; then
        bits="*${bits}"
    fi
    if [ "${newfile}" == "0" ]; then
        bits="+${bits}"
    fi
    if [ "${untracked}" == "0" ]; then
        bits="?${bits}"
    fi
    if [ "${deleted}" == "0" ]; then
        bits="x${bits}"
    fi
    if [ "${dirty}" == "0" ]; then
        bits="!${bits}"
    fi
    if [ ! "${bits}" == "" ]; then
        echo " ${bits}"
    else
        echo ""
    fi
}

# make prompt pretty (escape sequences must be wrapped in \[ and \] so Bash can work out the prompt's visible length)
PS1="\n\[\033[0;31m\]\342\224\214\342\224\200\$()[\[\033[1;38;5;2m\]\u\[\033[0;1m\]@\[\033[1;33m\]\h: \[\033[1;34m\]\W\[\033[1;33m\]\[\033[0;31m\]]\[\033[0;32m\] \[\033[1;33m\]\`parse_git_branch\`\[\033[0;31m\]\n\[\033[0;31m\]\342\224\224\342\224\200\342\224\200\342\225\274 \[\033[0;1m\]\$\[\033[0;38m\] "
export PS1
Around this time I also decided on a colour scheme for my terminal. Choose something that is nice to look at and comfortable for you.
Loops
> : if this appears after you press Enter, the shell is telling you that it needs more input to complete the command.
cd ~/Desktop/Git_Unix_R_Workshop/Day_1/data-shell/creatures
for filename in *.dat
do
    head -n 2 "$filename" | tail -n 1
done
"Don't name your variables cheesecake; you'll have no idea what they're doing when you come back to it in three weeks' time."
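Note that the following code excerpts are NOT equivalent:
for datafile in *.pdb; do ls *.pdb; done
for datafile in *.pdb; do ls $datafile; done
The first ignores $datafile and lists every .pdb file on each pass of the loop; the second lists only the current file on each pass.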
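To see the difference concretely, here is a sketch with three empty, hypothetical .pdb files in a mktemp -d scratch directory (using *.pdb in both loops).

```shell
dir=$(mktemp -d)
cd "$dir"
touch a.pdb b.pdb c.pdb

# Loop 1 ignores $datafile: every pass lists ALL the .pdb files
for datafile in *.pdb; do ls *.pdb; done

# Loop 2 lists only the current file on each pass
for datafile in *.pdb; do ls "$datafile"; done

# Count the lines each loop prints (tr removes any padding wc adds)
n1=$(for datafile in *.pdb; do ls *.pdb; done | wc -l | tr -d ' ')
n2=$(for datafile in *.pdb; do ls "$datafile"; done | wc -l | tr -d ' ')
echo "loop 1: $n1 lines, loop 2: $n2 lines"   # loop 1: 9 lines, loop 2: 3 lines
```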
<cmd> && <cmd 2> = run <cmd>; if successful, run <cmd 2>
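A small sketch contrasting && with ;, using directories inside a mktemp -d scratch space (results and missing are hypothetical names). The failing cd runs inside $( ... ), a subshell, so it cannot move the main shell.

```shell
dir=$(mktemp -d)
# mkdir succeeds, so the cd after && runs too
mkdir "$dir/results" && cd "$dir/results"

# This cd fails (the directory doesn't exist), so echo is skipped
msg_and=$(cd "$dir/missing" 2>/dev/null && echo "ran")
# With ';' instead, echo runs whether or not cd succeeded
msg_semi=$(cd "$dir/missing" 2>/dev/null; echo "ran")

echo "&&: '${msg_and}'   ;: '${msg_semi}'"   # &&: ''   ;: 'ran'
```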
grep = Global Regular Expression Print (finding text)
- E.g., grep <word> <file>
-i = case insensitive
-w = whole word
-n = line numbers
-v = lines that don't match (invert)
- E.g., grep -E '^.o' haiku.txt (lines whose second character is "o")
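The grep options above, sketched against a made-up three-line file (poem.txt stands in for the workshop's haiku.txt). grep -c, which counts matching lines instead of printing them, is handy for checking the results.

```shell
dir=$(mktemp -d)
cd "$dir"
printf 'Tomorrow the moon\nA quiet night\nMoonlight on the water\n' > poem.txt

grep moon poem.txt          # line 1 only (matching is case sensitive)
grep -i moon poem.txt       # -i: lines 1 and 3
grep -w -i moon poem.txt    # -w: whole word only, so "Moonlight" is dropped
grep -n moon poem.txt       # -n: prefix the line number, "1:Tomorrow the moon"
grep -v moon poem.txt       # -v: lines NOT matching, lines 2 and 3
grep -E '^.o' poem.txt      # lines whose second character is "o": lines 1 and 3
```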
find = finding files
- E.g.,
find . -type f (files) or find . -type d (directories)
find . -name '*.txt'
find . -type f -mtime -1 -user jakeireland = finds files modified in the past day and owned by user jakeireland
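A sketch of the find examples in a scratch directory with hypothetical files. Instead of a specific username, -user is given $(id -u), your numeric user ID, which find also accepts.

```shell
dir=$(mktemp -d)
cd "$dir"
mkdir -p writing/drafts
touch writing/haiku.txt writing/drafts/old.txt writing/notes.md

find . -type d                # directories only: ., ./writing, ./writing/drafts
find . -type f                # files only
find . -name '*.txt'          # quoted, so find (not the shell) expands the pattern
find . -type f -mtime -1 -user "$(id -u)"   # modified in the past day and owned by you
```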
$<name> = use (expand) the value of a variable named <name>
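A minimal sketch of setting and using a variable; the name and value are arbitrary examples.

```shell
name="cubane"            # no spaces around '='
echo "$name"             # cubane
echo "${name}.pdb"       # braces show where the name ends: cubane.pdb
echo '$name'             # single quotes stop the expansion: prints $name
literal=$(echo '$name')
```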
git
Version control system: tracks changes to files over time.
Git has become very popular for version control, and it is scalable! It was originally developed by the creator of the Linux kernel.
Note: since this workshop, I have learned that the Linux guy (whose name is Linus Torvalds) has two tools named after him: the Linux operating system (similar-sounding to his name), and git, because he is one (self-proclaimed; I am not insulting him)!
(Android is built on the Linux kernel! So yes, you have heard of Linux.)
A kernel is the part of the operating system that talks to the hardware.
We first need to configure our git environment so that git knows our credentials. We run
git config --global user.name "<name>"
(and likewise git config --global user.email "<email>").
If collaborating, be careful of differing operating systems: different line endings can cause merge issues.
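One common way to avoid line-ending trouble is git's core.autocrlf setting. This sketch sets it for a single throwaway repository only; the usual advice is input on macOS and Linux and true on Windows, typically set with --global instead.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# macOS/Linux: convert CRLF to LF when committing, leave checkouts as they are
git config core.autocrlf input
# Windows would usually use: git config core.autocrlf true

git config --get core.autocrlf   # input
```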
Now we want to say
git init
to initialise the repository.
Running ls -la will give us a long listing that includes the hidden files within the directory (files beginning with ., such as the new .git directory). Some files need to be there but aren't useful to us (only to the computer), so they are often hidden.
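We can run
git status
to see the current state of the repository: which files have changed, which changes are staged, and which files are untracked (it is git log that shows the commits).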
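A sketch of the steps so far in a scratch directory (the planets project and mars.txt are hypothetical names). No commit is made, so no user.name is needed yet.

```shell
repo="$(mktemp -d)/planets"
mkdir -p "$repo"
cd "$repo"

git init -q           # creates the hidden .git directory
ls -a                 # .  ..  .git
git status            # "No commits yet" and nothing to commit
touch mars.txt
git status --short    # ?? mars.txt   (?? marks an untracked file)
```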
Note: you can create a git repository anywhere you have write access.
rm -rf .git
will remove any trace of the git repository, recursively and forcefully.
Tracking changes
If you are using the terminal-based text editor nano, Ctrl + O = write Out = save.
git add <filename>
The previous command tells git that you are "staging" that file. The staging area is where you collect the changes you want git to record in the next commit.
git commit actually records the staged changes. Each commit has a unique hash code, which is often shortened (the first few characters are usually enough to identify it) for ease of reference.
git log helps to see what you have committed in the past.
nano <file> = edit file
git diff = difference (looks at changes)
- a/<file> = initial version (removed lines start with -)
- b/<file> = final version (added lines start with +)
git diff --staged = differences between the staged changes and the last commit
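git log -1 = the log entry for the last commit only
- Other useful options, e.g.,
git log --oneline --graph --all --decorate
(a compact, one-line-per-commit graph of all branches)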
So what we do:
- We write some changes to our code, then run:
git log
git diff
git add <file>
git commit -m "commit message (something helpful to read later)"
git log
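The workflow above as a runnable sketch. It uses a throwaway repository and sets a placeholder name and e-mail with plain git config (no --global), so your real settings are untouched; the file contents and commit messages are made up.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Person"         # placeholder identity, this repo only
git config user.email "person@example.com"

echo "Cold and dry" > mars.txt
git add mars.txt
git commit -q -m "Start notes on Mars"

echo "The two moons may be a problem" >> mars.txt
git diff             # the new line appears with a leading +
git add mars.txt     # stage the change...
git diff --staged    # ...so it now shows up here instead of in plain git diff
git commit -q -m "Add concerns about the moons"
git log --oneline    # two commits, newest first
```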
mkdir <dir> → git add <dir>
- This does not track the directory! Low key, because it doesn't actually need to.
- I didn't elaborate on this at the time, but I believe this is because git only tracks files, not directories: an empty directory gives it nothing to track, and once a file inside it is added, the directory comes along with it.
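A sketch showing this behaviour in a throwaway repository (the spaceships directory and its file names are hypothetical).

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

mkdir spaceships
git add spaceships
before=$(git status --short)   # empty: an empty directory gives git nothing to track
echo "before: '$before'"

touch spaceships/apollo-11 spaceships/sputnik-1
git add spaceships             # now the files inside are staged
git status --short             # A  spaceships/apollo-11 and A  spaceships/sputnik-1
```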
HEAD = the last commit (the last change you committed)
git diff HEAD <file> = changes since the last commit
git diff HEAD~n <file> = changes since n commits ago (n = number of commits you want to go back)
But what if you have lots of these changes? This is where hash codes are useful!
- git diff <hash code>
- git checkout -- mars.txt = discards uncommitted changes, going back to the last committed version (git checkout <hash code> -- mars.txt goes back to an older one). Like rm, this cannot be undone, so be careful!
- git <cmd> --help is equivalent to man git-<cmd>.
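A sketch of HEAD~n and git checkout in a throwaway repository with two commits; the identity, file contents and commit messages are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Person"
git config user.email "person@example.com"

echo "line 1" > mars.txt
git add mars.txt && git commit -q -m "First"
echo "line 2" >> mars.txt
git commit -q -am "Second"         # -a stages changes to tracked files first

git diff HEAD~1 mars.txt           # compare with one commit before HEAD: +line 2

echo "a mistake" >> mars.txt       # an uncommitted change...
git checkout -- mars.txt           # ...discarded: back to the last committed version
after_discard=$(cat mars.txt)

first=$(git log --format=%h | tail -n 1)   # short hash of the oldest commit
git checkout "$first" -- mars.txt  # mars.txt as it was in that commit (also staged)
```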
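Recall that touch creates an empty file. nano .gitignore creates a hidden file (hidden because its name begins with .). Files matching the patterns listed in .gitignore are ignored by git.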
git add -f <file> forces git to add a file even though it is ignored (in .gitignore).
git status --ignored tells us what we have ignored.
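A sketch of .gitignore, git status --ignored and git add -f in a throwaway repository; the ignore patterns and file names are hypothetical.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

mkdir results
touch a.dat b.dat results/a.out notes.txt
printf '*.dat\nresults/\n' > .gitignore   # ignore .dat files and the results directory

git status --short              # only ?? .gitignore and ?? notes.txt
git status --short --ignored    # ignored paths appear too, marked with !!
git add -f a.dat                # -f: stage a.dat even though it is ignored
git status --short              # now also shows A  a.dat
```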
An important note: git ≠ GitHub.
- git is the tool that we have been using.
- GitHub is a web service that hosts git repositories so that people can collaborate on them. Other such web services include GitLab, Bitbucket, etc.
These are some notes on changing my bash prompt, and other things, after day 2... To change your prompt:
sudo nano /etc/bashrc # or use your favourite text editor
This is actually wrong, in hindsight: you should change your own $HOME/.bashrc file instead (no sudo needed)... Then add a line like export PS1="<desired prompt>" to it.
To list colour codes in their respective colours, I ran this loop:
for colour in {1..255} # this is a sequence of integers from 1 to 255 inclusive
do
    echo -en "\033[38;5;${colour}m38;5;${colour}\n"
done | column -x
Let me attempt to explain this.
echo prints its argument. The -n option for echo tells the command not to print the trailing newline character. The -e option tells echo to interpret backslash escapes within the argument. In our case, the escape is \033, the (octal) code for the escape character; when the terminal sees it followed by [ and then a code ending in m, it treats that code as formatting instructions rather than as text to print. In our case, we get echo -en "\033[38;5;${colour}m38;5;${colour}\n". The first 38;5;${colour}, between [ and m, is our text formatting code, which tells whatever follows after m and before the closing " what colour to be (38;5;n selects foreground colour n from the 256-colour palette). Finally, the second 38;5;${colour} writes out the colour code itself, in its own colour. We have this embedded in a loop for all numbers in 1–255.
The \n at the end of the echo argument creates a new line.
Note: I now realise that the -n is redundant when we are adding \n anyway: echo -e with neither would print the same thing...
I also discovered the tput command for colours. I'm not sure how this works with bold, but I have the following:
for colour in {1..255}
do
    echo -en "$(tput setaf ${colour})\$(tput setaf ${colour})\n"
done | column -x
echo
tput sgr0 # reset the colours afterwards
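To see what the first loop's escape sequence actually contains, here is a sketch using printf, which also understands \033 in its format string. 38;5;n sets the foreground to colour n of the 256-colour palette and \033[0m resets all formatting; colour 196 is an arbitrary choice.

```shell
colour=196
# printf turns \033 (octal for the escape character) into ESC, as echo -e does
seq=$(printf '\033[38;5;%sm' "$colour")

# Print some text in that colour, then reset with \033[0m
printf '%sthis is colour %s\033[0m\n' "$seq" "$colour"

# Show the raw characters: ESC (octal 033) followed by [38;5;196m
printf '%s' "$seq" | od -c
```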
Git continued (with Wes Harrell, now)
GitHub allows you to share your changes with other people.
- Pull requests exist (they allow people to suggest changes).
- Password-less identification (e.g. using SSH keys)
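git push = sends your commits to the remote repository (e.g. on GitHub)
git pull = fetches changes from the remote and merges them into yours
- Merge conflicts can sometimes happen! Conflict resolution is not easy (sometimes it takes hours, or even days), but git gives you the tools to do this.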
Note: these are some notes I made about the night before, and about changing my prompt.
I had a bit of trouble last night when trying to change .bashrc to include PS1: after editing it with nano .bashrc and restarting the terminal, it would only update once I typed
exec bash
So then I went into
/etc/bashrc