Git is the most widely used version control software today. It lets you track the different versions of the files on your computer over time. Each version the user decides to record is called a commit. Unlike working in Dropbox (for example), each commit is a discrete moment chosen by the user; there is no continuous update. This lets you decide which changes are relevant and separate the work into stages.
A git repository is a folder where the latest version of each file is visible, while the entire commit history of the files remains available and can be explored or rolled back. With git the user knows what was added, modified, or deleted in each commit, and therefore does not need to create duplicate versions of files or rename them along the way.
Git works locally, but it also allows you to set up remote repositories. This makes it possible to work from different computers, with different users, and to keep a remote backup. In this sense, git is said to be a distributed version control system, where the loss of a “central” computer or user does not imply the loss of the entire work.
Currently, GitHub (www.github.com) is the most popular hosting service for git repositories. However, institutions can set up their own servers to act as remotes, and there are other similar services such as GitLab (www.gitlab.com, our favorite <3) and Bitbucket (www.bitbucket.com).
Git can be used in any folder on the computer and is independent of the R workflow. It was developed by Linus Torvalds to collaborate with the many authors of Linux and to be able to work offline (between commits). In this tutorial we will configure the computer so that data analysis projects can take advantage of git and workflows are better organized.
This tutorial is inspired by the R course by Page Piccinini.
First let’s do the git configuration on the computer. To do this, open a terminal window in RStudio.
Every git command in the terminal starts with git ;) Let’s enter a name and email for identification.
First check whether they are already set. Type:
git config --global user.name
git config --global user.email
The first time, nothing should appear; if something appears, it has already been set.
If there is no response or if an error is returned, run:
git config --global user.name [your name]
git config --global user.email [your github email!]
For example:
git config --global user.name "Andrea Sánchez-Tapia"
git config --global user.email katori@gmail.com
The quotes around the name let git understand that the full name, including spaces, is the value of user.name.
To check that the data was saved, type again:
git config user.name
git config user.email
The values you entered should appear.
At this point git is configured on the computer and knows who you are.
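The check-then-set logic above can be sketched as a small shell helper. This is only an illustration: ensure_git_config is a hypothetical function name, not a git command, and the demo redirects HOME to a scratch folder (an assumption made here so the sketch never touches your real settings).

```shell
set -eu

# Hypothetical helper: set a global git setting only when it is still empty.
ensure_git_config() {
    key=$1
    value=$2
    # --get prints the stored value, or nothing (with exit code 1) if unset
    current=$(git config --global --get "$key" || true)
    if [ -z "$current" ]; then
        git config --global "$key" "$value"
        echo "$key set to: $value"
    else
        echo "$key already set to: $current"
    fi
}

# Demo only: point HOME at a scratch folder so the real ~/.gitconfig is
# left alone. Drop these three lines to configure your actual computer.
demo_home=$(mktemp -d)
export HOME="$demo_home"
export XDG_CONFIG_HOME="$demo_home/.config"

ensure_git_config user.name "Andrea Sánchez-Tapia"
ensure_git_config user.email katori@gmail.com
ensure_git_config user.name "Someone Else"   # no change: already set
```

The third call leaves the name untouched, which mirrors the advice above: only run the setting commands when the check returns nothing.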
There are several ways to create a local git repository that can communicate with a remote on GitHub, GitLab or Bitbucket. In this case we already have a local folder, so we just need to initialize git locally, create a remote repository, and add it to the local one.
In other workflows, you may want to create the repository directly on GitHub and clone it to your computer, and only add content later.
In general, read the instructions available on the hosting services :) The GitHub, GitLab and Bitbucket (Atlassian) help pages are very useful.
git status
This is not yet a git repository, so git answers:
fatal: not a git repository (or any of the parent directories): .git
git init
Initialized empty Git repository in /Users/andreasancheztapia/Desktop/project_work_area/.git/
git remote -v
Nothing, right? Let’s create a remote repository on GitHub and add it.
Always remember to check:
git status
At this point the message in the terminal should be:
On branch master
No commits yet
Untracked files:...
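The sequence above (status, init, remote, status) can be tried safely in a throwaway folder. The sketch below assumes a scratch folder created with mktemp and a placeholder README.md; both are stand-ins for your own project folder and files.

```shell
set -eu

# Scratch folder standing in for the project folder (hypothetical location)
project=$(mktemp -d)
cd "$project"
echo "# My project" > README.md

# Before init, git does not recognise the folder as a repository
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    echo "already a git repository"
else
    echo "not a git repository yet, running git init"
    git init
fi

# No remotes have been added yet, so this prints nothing
git remote -v

# README.md is listed as untracked ("??" in the short format)
git status --short
```

git status --short is a compact form of git status; the full form prints the longer message shown above.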
Let’s make a modification to the README.md, add the changes (add), commit (commit) and push (push).
Edit your README.md in an interesting and meaningful way.
Add your README.md: this means git will start monitoring this file.
git add README.md
Always run git status between steps to understand what is happening.
git commit -m "I made the changes because it felt good"
[master b9cdaf7] I made the changes because it felt good
1 file changed, 1 insertion(+)
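To see what each step does to the state of the file, the edit, add, commit cycle can be replayed in a scratch repository with git status between steps. This is a sketch under assumptions: the folder is temporary, and the identity is set for that repository only (without --global) so the demo does not depend on the computer's configuration.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init --quiet
# Identity for this scratch repository only (local, not --global)
git config user.name "Andrea Sánchez-Tapia"
git config user.email katori@gmail.com

echo "# My project" > README.md
git status --short            # ?? README.md  (untracked)

git add README.md
git status --short            # A  README.md  (staged for the next commit)

git commit --quiet -m "I made the changes because it felt good"
git status --short            # prints nothing: the working tree is clean

git log --oneline             # one line: short hash followed by the message
```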
YOU ONLY NEED TO DO THIS ONCE ON EACH COMPUTER
To push to GitHub, the computer first has to be identified: we generate a security key that identifies the computer and copy it to GitHub.
This key is specific to each computer. You may have only one GitHub account but work on different computers, and each will have its own key.
In RStudio, go to Preferences > git/svn.
The git executable should be git.exe on Windows, and /usr/bin/git on Mac and Linux.
Click Create RSA Key. If you already have a key, go to the next step.
So far GitHub and your computer can communicate :D This key configuration only needs to be done once on each computer. The rest needs to be run every time you create a repository.
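For readers who prefer the terminal, an RSA key pair can also be generated with ssh-keygen. The sketch below is not what RStudio runs internally (that is not documented here). It is a hedged command-line equivalent that writes to a scratch folder with an empty passphrase, both assumptions chosen so the demo cannot overwrite an existing key. Real keys usually live in ~/.ssh.

```shell
set -eu

# Scratch folder for the demo; real keys usually live in ~/.ssh
key_dir=$(mktemp -d)
key_file="$key_dir/id_rsa"

# -t rsa: key type, -b 4096: key size in bits, -C: a comment label,
# -N "": empty passphrase (demo only), -f: output file, -q: quiet
ssh-keygen -t rsa -b 4096 -C "katori@gmail.com" -N "" -f "$key_file" -q

# The .pub file is the public half: this is the text that is copied to GitHub
cat "$key_file.pub"

# Print the fingerprint of the public key
ssh-keygen -l -f "$key_file.pub"
```

Only the .pub file is ever shared; the file without the extension is the private key and stays on the computer. Once the public key is registered on GitHub, ssh -T git@github.com is a common way to test the connection.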
We are going to add the remote. On the GitHub repository page, copy the SSH option from the top frame:
git@github.com:AndreaSanchezTapia/blah.git
Go back to the local terminal and add this remote:
Type git remote add origin and paste the copied content with ctrl + v:
git remote add origin git@github.com:AndreaSanchezTapia/blah.git
You just added the remote you created on GitHub
Check if it exists:
git remote -v
The response should be something similar to this:
origin git@github.com:AndreaSanchezTapia/blah.git (fetch)
origin git@github.com:AndreaSanchezTapia/blah.git (push)
So far you have a remote and a local repository
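Running git remote add twice with the same name fails, so a script that may be re-run can check first. A minimal sketch, assuming a scratch repository and the example address used above; adding a remote only stores the address and does not contact GitHub.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init --quiet

remote_url="git@github.com:AndreaSanchezTapia/blah.git"

# git remote get-url fails when the remote does not exist yet
if git remote get-url origin >/dev/null 2>&1; then
    echo "origin already points to: $(git remote get-url origin)"
else
    git remote add origin "$remote_url"
    echo "origin added"
fi

# Lists the fetch and push addresses of every remote
git remote -v
```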
git push -u origin master
The -u option marks an “upstream”: it links the local branch to the branch on the remote, so later git push and git pull know where to send and retrieve changes.
The push message should look like this:
Warning: Permanently added the RSA host key for IP address '18.228.52.138' to the list of known hosts.
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Writing objects: 100% (3/3), 313 bytes | 313.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To github.com:AndreaSanchezTapia/blah.git
af45751..b9cdaf7 master -> master
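The effect of -u can be observed without a network connection by letting a local bare repository play the role of GitHub. This is a sketch under that assumption: the folder names are hypothetical, and the branch name is read from git rather than hard-coded, since it may be master or main depending on the git settings.

```shell
set -eu

scratch=$(mktemp -d)

# A bare repository in a local folder stands in for GitHub here
git init --quiet --bare "$scratch/remote.git"

git init --quiet "$scratch/local"
cd "$scratch/local"
git config user.name "Andrea Sánchez-Tapia"
git config user.email katori@gmail.com
git remote add origin "$scratch/remote.git"

echo "# My project" > README.md
git add README.md
git commit --quiet -m "First commit"

# The branch may be called master or main depending on the git settings
branch=$(git symbolic-ref --short HEAD)

git push --quiet -u origin "$branch"

# After -u, the local branch remembers which remote branch it follows
git rev-parse --abbrev-ref "@{u}"

# so later pushes need no extra arguments
echo "Second line" >> README.md
git commit --quiet -am "Second commit"
git push --quiet
```

With a real GitHub remote the commands are the same; only the address given to git remote add changes.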
Make one more edit to the README.md and repeat steps 2 to 4: add, commit, push.