Sha (Joe) Zhu | How to version control your code?

A recent conversation with a colleague about svn took me down memory lane. I decided to write this for the future me, to update again when git becomes obsolete: a brief history of how I have been versioning my code over the last decade.

The dumb version
SVN
GIT
Summary

The dumb version

Back in the summer of 2007 to 2008 (yes, summer, I was in New Zealand at the time), I took on an 8-week summer project in computational mathematics, focused on floating point arithmetic. It was not the first time that I had written lots of code for one project, but it was the first time I decided to put more structure into the code and started versioning.

In the first few weeks, I literally made a new copy of the entire project at the end of each week and named the folder with the date. My supervisor told me to keep a research diary (a neat skill that I learned and still use to this very day), which I also used to keep a log of the changes. The code started growing quite rapidly and the structure started to look messy, so I paused the work and started to use soft links to take care of the repeated parts. At the end of the 8th week, the project directory was nowhere near where I had planned it to be. My first versioning project was a total disaster. Luckily the course was assessed only on the report and presentation, so it wasn't a total failure.

A valuable lesson that I learned from this is how important it is to keep a clear log of major changes and assumptions. Small change logs are not very useful.

In the following years, I did more research work and postgraduate studies, and that was when I started doing proper version control with svn. This was when Google Code was still a thing.

SVN

svn is a centralized version control system. What does this mean? Suppose that I am collaborating with persons B and C. We must have a central server to host all the files and change logs. Persons B, C and I have to commit our changes to the central server in order for the version control to work. svn adopts a tree structure: the main line of development on the central server is known as the trunk. Each developer works on a local working copy checked out from the trunk (or from a branch, which is itself a copy of the trunk kept on the server). So how does it work in practice?

Assuming a remote repo has been set up, we can make a local copy of a remote revision with checkout:


svn checkout URL_OF_A_REV LOCAL_DIR

Note: svn co is short for svn checkout.

Saving changes

Suppose you have created an R script, namely SOME_FILE.R, and you want to start tracking changes in this file. You can add this file to your svn records by:


svn add SOME_FILE.R

Note: you can always use svn status to check the current status before you commit. A newly added file will have status A; other files can show M for modified or D for deleted.

One important thing: always pull the remote changes first, before you commit and publish your own:


svn update

This helps avoid conflicts and overwriting each other's code.

Once the changes have been made and you want to save them to the history / log, commit them:

svn commit -m "I just made some changes"

Type svn log, and you will be able to see this message (if the new revision does not show up, run svn update first so that the working copy knows about it).
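The whole svn cycle above can be tried without a real server. The following is a minimal sketch, assuming svn and svnadmin are installed: a throwaway repository on the local disk stands in for the central server, and the paths under the temporary directory are made up for illustration. If svn is not installed, the script simply skips the demo.

```shell
set -eu
work=$(mktemp -d)
svn_demo="skipped"                     # stays "skipped" when svn is not installed

if command -v svn >/dev/null 2>&1 && command -v svnadmin >/dev/null 2>&1; then
    svnadmin create "$work/repo"                      # stand-in for the central server
    svn checkout -q "file://$work/repo" "$work/wc"    # local working copy
    cd "$work/wc"

    echo 'x <- 1' > SOME_FILE.R
    svn add -q SOME_FILE.R             # schedule the new file for addition
    svn status                         # the file is listed with status A

    svn update -q                      # pull remote changes before committing
    svn commit -q -m "I just made some changes"

    svn update -q                      # refresh the working copy so the log includes the new revision
    svn log
    svn_demo="committed"
fi
echo "svn demo: $svn_demo"
```

With a real server, the file:// URL would be replaced by the URL of the hosted repository.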

GIT

Working with a local repository

In 2013, I switched to git. Git is much more powerful than svn. Firstly, it decentralizes the repository: every clone holds the full history, so you don't need an internet connection to commit or browse the log, whereas svn relies on communication with the central server.

To start a local git repo, you can simply do the following:

mkdir MY_LOCAL_REPO
cd MY_LOCAL_REPO
git init

Once we have done this, we can use git remote add origin REMOTE_URL to link the repo, where REMOTE_URL is the url of an empty (uninitialized) git repo on GitHub or GitLab. (git remote set-url origin REMOTE_URL only changes the url of a remote that already exists.) If the repo has already been initialized on GitHub, clone it instead of running git init:

git clone REMOTE_URL MY_LOCAL_GIT_REPO
cd MY_LOCAL_GIT_REPO
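Both ways of getting a local repo can be tried end to end without touching GitHub. This is a minimal sketch, assuming a bare repository in a temporary directory as a stand-in for REMOTE_URL; the user name, email and file contents are placeholders.

```shell
set -eu
work=$(mktemp -d)

# Stand-in for an empty repository on GitHub or GitLab (an assumption of this sketch).
git init -q --bare "$work/remote.git"
REMOTE_URL="$work/remote.git"

# Path 1: start locally, then link the remote.
mkdir "$work/MY_LOCAL_REPO"
cd "$work/MY_LOCAL_REPO"
git init -q
git config user.name "Example User"           # placeholder identity, local to this repo
git config user.email "user@example.com"
git remote add origin "$REMOTE_URL"

echo "hello" > README.md
git add README.md
git commit -q -m "Initial commit"
branch=$(git symbolic-ref --short HEAD)       # name of the initial branch, which depends on your git configuration
git push -q -u origin "$branch"

# Path 2: the remote now has content, so clone it instead of running git init.
git clone -q "$REMOTE_URL" "$work/MY_LOCAL_GIT_REPO"
cat "$work/MY_LOCAL_GIT_REPO/README.md"
```

git clone creates the target directory itself and sets up origin, so no separate git remote step is needed on the second path.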

The normal workflows with git and svn are very similar: new files need to be added before they can be committed. The difference is that in git, once a file is tracked, further changes to it are not staged until you run git add again. For simplicity, you can run git add -u, which stages all changes to files that are already tracked.
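The staging behaviour is easiest to see in a scratch repository. A minimal sketch, with made-up file names and a placeholder identity:

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "Example User"           # placeholder identity, local to this repo
git config user.email "user@example.com"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

echo "second draft" >> notes.txt               # change to a tracked file
echo "scratch" > untracked.txt                 # brand new, never added

unstaged=$(git diff --name-only)               # modified but not staged
staged_before=$(git diff --cached --name-only) # nothing is staged yet

git add -u                                     # stage changes to tracked files only
staged_after=$(git diff --cached --name-only)

echo "unstaged before add -u: $unstaged"
echo "staged before add -u:   ${staged_before:-nothing}"
echo "staged after add -u:    $staged_after"
```

Note that untracked.txt is left alone by git add -u; it still needs an explicit git add.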

For svn, one should always 'update' then 'commit'. For git, one should either commit and then pull, or stash, pull, and stash pop, depending on the team workflow.
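The stash, pull, stash pop variant can be sketched with two clones of a local bare repository. Everything here is hypothetical scaffolding: the clone names alice and bob, the file names, and the bare repo standing in for the team's remote.

```shell
set -eu
work=$(mktemp -d)
git init -q --bare "$work/remote.git"          # stand-in for the team's remote

# First clone: create some history and publish it.
git clone -q "$work/remote.git" "$work/alice"
cd "$work/alice"
git config user.name "Alice Example"
git config user.email "alice@example.com"
echo "v1" > shared.txt
echo "todo" > mine.txt
git add shared.txt mine.txt
git commit -q -m "Initial commit"
branch=$(git symbolic-ref --short HEAD)
git push -q -u origin "$branch"

# Second clone, taken before the next change is published.
git clone -q "$work/remote.git" "$work/bob"

# Back in the first clone: publish a new change.
echo "v2" > shared.txt
git commit -q -am "Update shared.txt"
git push -q

# In the second clone there is uncommitted local work.
cd "$work/bob"
git config user.name "Bob Example"
git config user.email "bob@example.com"
echo "draft" >> mine.txt

git stash -q          # put the uncommitted work aside
git pull -q           # bring in the remote changes
git stash pop -q      # reapply the local work on top
cat shared.txt mine.txt
```

The commit-then-pull variant replaces the three final git commands with git commit followed by git pull, which may create a merge commit if both sides have new commits.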

Branching and merging

Branches are effectively pointers to your commits. From time to time, it is easier to implement new features or fix bugs on a branch, without affecting the rest of the core code. In svn, branching requires copying the trunk, which I found time consuming. It is much simpler and clearer in git.

Suppose the main branch is called the Master branch. From left to right, each circle is a commit. After two commits (white circles), the first branch is created by


git branch Feature

This allows you to make changes independently on both the Master and Feature branches. To switch between branches, use git checkout BRANCH_NAME. Suppose you have finished implementing the new feature, and in the meantime there are two new commits on the Master branch. git merge always merges into the branch you currently have checked out, so to merge the changes from Master into Feature, switch to Feature first and then merge: git checkout Feature followed by git merge Master.

A good set of branching exercises can be found at learngitbranching.
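The branch-and-merge scenario can be replayed in a scratch repository. A minimal sketch, assuming made-up file names and a placeholder identity; the initial branch is named Master explicitly so that it matches the text.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q
git symbolic-ref HEAD refs/heads/Master        # call the initial branch Master
git config user.name "Example User"            # placeholder identity, local to this repo
git config user.email "user@example.com"

# Two commits on Master.
echo "core" > core.txt
git add core.txt
git commit -q -m "First commit"
echo "more core" >> core.txt
git commit -q -am "Second commit"

# Create the Feature branch at the current commit and work on it.
git branch Feature
git checkout -q Feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

# Meanwhile, two more commits land on Master.
git checkout -q Master
echo "fix 1" > fix.txt
git add fix.txt
git commit -q -m "First fix on Master"
echo "fix 2" >> fix.txt
git commit -q -am "Second fix on Master"

# Merge Master into Feature: check out the receiving branch first.
git checkout -q Feature
git merge -q -m "Merge Master into Feature" Master
git log --oneline --graph
```

After the merge, Feature contains both feature.txt and fix.txt, while Master is unchanged.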

Rebase and squash

Rebasing lets you move a branch to a new base by replaying its commits there, which makes git very powerful when tidying up history. Take the previous example: instead of merging the Master branch onto the end of the Feature branch, a rebase on branch Feature detaches its commits and reapplies them at the end of branch Master. This keeps the history much cleaner. Suppose you are rebasing Feature onto Master:

git checkout Master
git pull
git checkout Feature
git rebase -i Master

Try to squash all commits into one: in the editor, leave pick on the first line and enter s for the rest. Go through the rebase process by resolving any conflicts, then run git add . and git rebase --continue. Once all conflicts are resolved, run git push origin Feature -f (the force flag is needed because the rebase rewrote the branch history).

After this you can create a pull request on GitHub and ask someone to approve the merge into Master.

During an interactive rebase, git lets you revise the commits on the branch. At this stage, you can choose the squash option to combine multiple commits into one and keep the log shorter.
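The rebase-and-squash steps can be rehearsed locally before trying them on a shared branch. The sketch below is scripted, so instead of editing the todo list by hand it sets GIT_SEQUENCE_EDITOR to a sed command that keeps the first pick and turns the rest into squash, and GIT_EDITOR=true to accept the default combined commit message. The file names and identity are placeholders, the sed invocation assumes GNU sed, and the git pull and git push steps are left out because there is no remote here.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q
git symbolic-ref HEAD refs/heads/Master        # call the initial branch Master
git config user.name "Example User"            # placeholder identity, local to this repo
git config user.email "user@example.com"

echo "core" > core.txt
git add core.txt
git commit -q -m "Core"

# Three small commits on Feature.
git checkout -q -b Feature
for i in 1 2 3; do
    echo "step $i" >> feature.txt
    git add feature.txt
    git commit -q -m "Feature step $i"
done

# Master moves on in the meantime.
git checkout -q Master
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Fix on Master"

# Rebase Feature onto Master and squash its commits into one.
git checkout -q Feature
GIT_SEQUENCE_EDITOR="sed -i -e '2,\$s/^pick/squash/'" GIT_EDITOR=true git rebase -i Master

git log --oneline
```

Run by hand, git rebase -i Master opens the same todo list in your editor, and the squash step opens the editor once more for the combined commit message.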

Submodules

Another powerful advantage of git compared to svn is the submodule, which can be used to link multiple git repositories. Each repository can be developed independently.

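To add a submodule, you can simply use the following:

git submodule add REPO_PATH_OR_URL

git submodule add clones the repository and records it in .gitmodules, so no separate git submodule init is needed at this point.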

If a git repo with submodules is freshly cloned, you need to run git submodule update --init --recursive to make sure all submodules are also checked out and updated.

Also refer to the GitHub git cheatsheet.
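Adding a submodule and then restoring it in a fresh clone can be sketched with local repositories. The repository names lib and main, the file contents and the identity are made up for illustration. The -c protocol.file.allow=always option is only there because this demo uses local paths, which recent git versions refuse for submodules by default; with a normal https or ssh URL it should not be needed.

```shell
set -eu
work=$(mktemp -d)

# A small repository that will serve as the submodule (hypothetical "lib").
git init -q "$work/lib"
cd "$work/lib"
git config user.name "Example User"            # placeholder identity
git config user.email "user@example.com"
echo "library code" > lib.txt
git add lib.txt
git commit -q -m "Library commit"

# The parent repository links to it.
git init -q "$work/main"
cd "$work/main"
git config user.name "Example User"
git config user.email "user@example.com"
git -c protocol.file.allow=always submodule add "$work/lib" lib
git commit -q -m "Add lib as a submodule"

# In a fresh clone, lib/ stays empty until the submodules are initialized.
git clone -q "$work/main" "$work/main-clone"
cd "$work/main-clone"
git -c protocol.file.allow=always submodule update --init --recursive
cat lib/lib.txt
```

The parent repository only records which commit of lib it points to, so lib can keep evolving on its own.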

Summary

To summarize:

Operation	`svn`	`git`
To create a repo	`svnadmin create`	`git init`
To clone repo	`svn checkout`	`git clone`
Update from remote	`svn update`	`git pull`
Stage a change/commit	`svn add`	`git add`
Check repo status	`svn status`	`git status`
Make a commit	`svn commit`	`git commit`
Diff	`svn diff`	`git diff`
Stop tracking / Remove	`svn delete`	`git rm`
Make a branch	`svn copy`	`git branch`
Revert changes	`svn revert`	`git checkout`
To view the log	`svn log`	`git log`

Git has so many more features compared to svn: submodules, rebase, pre-commit settings. So … use git.

Migrate from svn to git.

If you ever fancy moving from svn to git, this blog gives very clear and simple instructions.