Guidance for the item(s) below:
Given this is a first course in SE, tradition demands that we start by defining the subject. However, let's not spend a lot of time going through lengthy/formal definitions of SE. Instead, let's look at an extract from the very first chapter of a very famous SE book, with the aim of providing some inspiration, but also an appreciation of the challenges ahead.
The following description of the Joys of the Programming Craft was taken (and emphasis added) from Chapter 1 of the famous book The Mythical Man-Month, by Frederick P. Brooks.
Why is programming fun? What delights may its practitioner expect as his reward?
First is the sheer joy of making things. As the child delights in his mud pie, so the adult enjoys building things, especially things of his own design. I think this delight must be an image of God's delight in making things, a delight shown in the distinctness and newness of each leaf and each snowflake.
Second is the pleasure of making things that are useful to other people. Deep within, you want others to use your work and to find it helpful. In this respect the programming system is not essentially different from the child's first clay pencil holder "for Daddy's office."
Third is the fascination of fashioning complex puzzle-like objects of interlocking moving parts and watching them work in subtle cycles, playing out the consequences of principles built in from the beginning. The programmed computer has all the fascination of the pinball machine or the jukebox mechanism, carried to the ultimate.
Fourth is the joy of always learning, which springs from the nonrepeating nature of the task. In one way or another the problem is ever new, and its solver learns something: sometimes practical, sometimes theoretical, and sometimes both.
Finally, there is the delight of working in such a tractable medium. The programmer, like the poet, works only slightly removed from pure thought-stuff. He builds his castles in the air, from air, creating by the exertion of the imagination. Few media of creation are so flexible, so easy to polish and rework, so readily capable of realizing grand conceptual structures....
Yet the program construct, unlike the poet's words, is real in the sense that it moves and works, producing visible outputs separate from the construct itself. It prints results, draws pictures, produces sounds, moves arms. The magic of myth and legend has come true in our time. One types the correct incantation on a keyboard, and a display screen comes to life, showing things that never were nor could be.
Programming then is fun because it gratifies creative longings built deep within us and delights sensibilities you have in common with all men.
Not all is delight, however, and knowing the inherent woes makes it easier to bear them when they appear.
First, one must perform perfectly. The computer resembles the magic of legend in this respect, too. If one character, one pause, of the incantation is not strictly in proper form, the magic doesn't work. Human beings are not accustomed to being perfect, and few areas of human activity demand it. Adjusting to the requirement for perfection is, I think, the most difficult part of learning to program.
Next, other people set one's objectives, provide one's resources, and furnish one's information. One rarely controls the circumstances of his work, or even its goal. In management terms, one's authority is not sufficient for his responsibility. It seems that in all fields, however, the jobs where things get done never have formal authority commensurate with responsibility. In practice, actual (as opposed to formal) authority is acquired from the very momentum of accomplishment.
The dependence upon others has a particular case that is especially painful for the system programmer. He depends upon other people's programs. These are often maldesigned, poorly implemented, incompletely delivered (no source code or test cases), and poorly documented. So he must spend hours studying and fixing things that in an ideal world would be complete, available, and usable.
The next woe is that designing grand concepts is fun; finding nitty little bugs is just work. With any creative activity come dreary hours of tedious, painstaking labor, and programming is no exception.
Next, one finds that debugging has a linear convergence, or worse, where one somehow expects a quadratic sort of approach to the end. So testing drags on and on, the last difficult bugs taking more time to find than the first.
The last woe, and sometimes the last straw, is that the product over which one has labored so long appears to be obsolete upon (or before) completion. Already colleagues and competitors are in hot pursuit of new and better ideas. Already the displacement of one's thought-child is not only conceived, but scheduled.
This always seems worse than it really is. The new and better product is generally not available when one completes his own; it is only talked about. It, too, will require months of development. The real tiger is never a match for the paper one, unless actual use is wanted. Then the virtues of reality have a satisfaction all their own.
Of course the technological base on which one builds is always advancing. As soon as one freezes a design, it becomes obsolete in terms of its concepts. But implementation of real products demands phasing and quantizing. The obsolescence of an implementation must be measured against other existing implementations, not against unrealized concepts. The challenge and the mission are to find real solutions to real problems on actual schedules with available resources.
This then is programming, both a tar pit in which many efforts have floundered and a creative activity with joys and woes all its own. For many, the joys far outweigh the woes....
Guidance for the item(s) below:
Now, let's switch our focus to the project management aspect of SE.
Broadly speaking, there are two approaches to doing a software project. Those two approaches are also highly relevant to the way this course is run, and how it is different from most SE courses elsewhere.
Let's learn about those two approaches early so that we can better understand how this course works.
Software development goes through different stages such as requirements, analysis, design, implementation and testing. These stages are collectively known as the software development lifecycle (SDLC). There are several approaches, known as software development lifecycle models (also called software process models), that describe different ways to go through the SDLC. Each process model prescribes a 'roadmap' for the software developers to manage the development effort. The roadmap describes the aims of the development stages, the outcome of each stage, and the workflow i.e. the relationship between stages.
The sequential model, also called the waterfall model, views software development as a linear process, in which the project is seen as progressing through the development stages. The name waterfall stems from how the model is drawn to look like a waterfall (see below).
When one stage of the process is completed, it produces some artifacts to be used in the next stage. For example, the requirements stage produces a comprehensive list of requirements, to be used in the design phase.
A strict sequential model project moves only in the forward direction i.e., each stage is completed before starting the next. For example, once the requirements stage is over, there is no provision for revising the requirements later.
This model can work well for a project that produces software to solve a well-understood problem, in which case the requirements can remain stable and the effort can be estimated accurately. Furthermore, as each stage has a well-defined outcome, it is easy to track the progress of the project because one can gauge the project progress by monitoring which stage the project is in.
However, real-world projects often tackle problems that are not well-understood at the beginning, making them unsuitable for this model. For example, target users of a software product may not be able to state their requirements accurately at the start of the project, if they have not used a similar product before.
The iterative model advocates producing the software by going through several iterations. Each of the iterations could potentially go through all the stages of the SDLC, from requirements gathering to deployment.
Each iteration produces a new version of the product, building upon the version produced in the previous iteration. Feedback from each iteration is factored into the subsequent iterations. For example, if an implementation task took longer than expected, the effort estimates for similar tasks in future iterations can be adjusted accordingly. Similarly, if a feature introduced in the current iteration was not well-received by target users, it can be removed or tweaked in the next iteration.
The iterative model can be done using a breadth-first or a depth-first approach.
Taking a Minesweeper game as an example,
A project can be done as a mixture of breadth-first and depth-first iterations i.e., an iteration can contain some breadth-first work as well as some depth-first work, or, some iterations can be breadth-first while others are depth-first.
Follow up notes for the item(s) above:
Scanning a TLDR version of a topic: As mentioned in 'Using this Website' page, the more important layer of information is given in bold text. For example, you can quickly scan the essential points of a topic by reading the bold text only (this could be useful when you want to quickly recap a previous topic, or to get an idea of what a topic covers without reading all the details).
Guidance for the item(s) below:
This week, you are starting your individual project (iP). As you are adding code to the iP in rapid succession, you'll need a way to keep track of all the changes you do. The tool we are going to use for that is called Git, and we need to learn Git basics pretty quickly.
Let's jump in and learn how to get started using Git in your own computer.
Destination: To be able to use Git to systematically record the history of a folder in your own computer. More specifically, to use Git to save a snapshot of the folder at specific points of time.
Motivation: Recording the history of files in a folder (e.g., code files of a software project, case notes, files related to an article/book that you are authoring) can be useful in case you need to refer to past versions.
Lesson plan:
→ Lesson: Introduction to Revision Control covers that part.
→ Lesson: Preparing to Use Git covers that part.
→ Lesson: Putting a Folder Under Git's Control covers that part.
→ Lesson: Specifying What to include in a Snapshot covers that part.
→ Lesson: Saving a Snapshot covers that part.
Before learning about Git, let us first understand what revision control is.
Given below is a general introduction to revision control, adapted from bryan-mercurial-guide:
Revision control is the process of managing multiple versions of a piece of information. In its simplest form, this is something that many people do by hand: every time you modify a file, save it under a new name that contains a number, each one higher than the number of the preceding version.
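The manual approach described above can be sketched as a few shell commands. This is only an illustration: the file name report.txt and the _v1/_v2 naming scheme are hypothetical, and the work happens in a throwaway temporary folder.

```shell
# Work in a throwaway folder so that nothing else is affected.
workdir="$(mktemp -d)"
cd "$workdir"

echo "first draft" > report.txt
cp report.txt report_v1.txt      # keep a numbered copy of the first version

echo "second draft" > report.txt
cp report.txt report_v2.txt      # the next copy gets the next higher number

ls report_v*.txt                 # lists the versions saved so far
```

Every saved version is a full copy that you have to name and number yourself, which is where mistakes tend to creep in.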
Manually managing multiple versions of even a single file is an error-prone task, though, so software tools to help automate this process have long been available. The earliest automated revision control tools were intended to help a single user to manage revisions of a single file. Over the past few decades, the scope of revision control tools has expanded greatly; they now manage multiple files, and help multiple people to work together. The best modern revision control tools have no problem coping with thousands of people working together on projects that consist of hundreds of thousands of files.
There are a number of reasons why you or your team might want to use an automated revision control tool for a project.
Most of these reasons are equally valid, at least in theory, whether you're working on a project by yourself, or with a hundred other people.
A revision is a state of a piece of information at a specific time that is a result of some changes to it e.g., if you modify the code and save the file, you have a new revision (or a new version) of that file. Some seem to use this term interchangeably with version while others seem to distinguish the two -- here, let us treat them as the same, for simplicity.
Revision Control Software (RCS) are the software tools that automate the process of Revision Control i.e. managing revisions of software artifacts. RCS are also known as Version Control Software (VCS), and by a few other names.
Git is the most widely used RCS today. Other RCS tools include Mercurial, Subversion (SVN), Perforce, CVS (Concurrent Versions System), Bazaar, TFS (Team Foundation Server), and Clearcase.
GitHub is a web-based project hosting platform for projects using Git for revision control. Other similar services include GitLab, BitBucket, and SourceForge.
Before you start learning Git, you need to install some tools in your computer.
First, install Git.
Next, ensure you have a suitable terminal app. Our instructions assume you use a Bash terminal.
Optionally, install a Git client e.g., Sourcetree, which is Git + a GUI for Git.
If you are new to Git, we recommend you learn both the GUI method and the CLI method -- The GUI method will help you visualize the result better while the CLI method is more universal (i.e., you will not be tied to any GUI) and more flexible/powerful.
It is fine to learn the CLI way only (using a GUI is optional), especially if you normally prefer to work with CLI over GUI.
To be able to save snapshots of a folder using Git, you must first put the folder under Git's control by initialising a Git repository in that folder.
Normally, we use Git to manage a revision history of a specific folder, which gives us the ability to revision-control any file in that folder and its subfolders.
To put a folder under the control of Git, we initialise a repository (short name: repo) in that folder. This way, we can initialise repos in different folders, to version-control different clusters of files independently of each other e.g., files belonging to different projects.
You can follow the hands-on practical below to learn how to initialise a repo in a folder.
What is this? HANDS-ON panels contain hands-on activities you can do as you learn Git.
1 First, choose a folder. The folder may or may not have any files in it already. For this practical, let us create a folder named things for this purpose.
cd my-projects
mkdir things
2 Then CD into it.
cd things
3 Run the git status command to check the status of the folder.
git status
fatal: not a git repository (or any of the parent directories): .git
Don't panic. The error message is expected. It confirms that the folder currently does not have a Git repo.
4 Now, initialise a repository in that folder.
Use the command git init, which should initialise the repo.
git init
Initialized empty Git repository in things/.git/
The output might also contain a hint about a name for an initial branch (e.g., hint: Using 'master' as the name for the initial branch ...). You can ignore that for now.
Note how the output mentions the repo being created in things/.git/ (not things/). More on that later.
Windows: Click File → Clone/New… → Click on the + Create button on the top menu bar.
Enter the location of the directory and click Create.
Mac: New... → Create Local Repository (or Create New Repository) → Click the ... button to select the folder location for the repository → click the Create button.
Initialising a repo results in two things:
To confirm, you can run the git status command. It should respond with something like the following:
git status
On branch master
No commits yet
nothing to commit (create/copy files and use "git add" to track)
Don't worry if you don't understand the output (we will learn about it later); what matters is that it no longer gives an error message as it did before.
A subfolder named .git is created inside the things folder. This folder will be used by Git to store meta-data about this repository.
What is this? UNDER-THE-HOOD panels explain how a certain Git feature works under the hood i.e., some implementation details.
They can be skipped the first time you are taking a tour. But we recommend that you delve into some of them at some point. Reason: While Git can be used without knowing much about its internal workings, knowing those details will allow you to be more confident when using Git, and harness more of its awesome power.
UNDER-THE-HOOD: How Git stores meta-data about the repository
A Git-controlled folder is divided into two main parts:
The .git subfolder, which contains all the metadata and history.
The working directory, i.e., everything else in the folder, where you create and edit files.
What is this? EXERCISE panels contain a Git-Mastery exercise that you can download using the Git-Mastery app, and you can use the same app to verify that your solution is correct.
EXERCISE: under-control
What is this? DETOUR panels contain related directions you can optionally explore. We recommend that you only skim them the first time you are going through a tour (i.e., just to know what each detour covers); you can revisit them later, to deepen your knowledge further, or when you encounter a use case related to the concepts covered by the detour.
DETOUR: How to undo a repo initialisation
When Git initialises a repo in a folder, it does not touch any files in the folder, other than creating the .git folder and its contents. So, reversing the operation is as simple as deleting the newly-created .git folder.
git status #run this to confirm a repo exists
rm -rf .git #delete the .git folder
git status #this should give an error, as the repo no longer exists
To save a snapshot, you start by specifying what to include in it, also called staging.
Git considers new files that you add to the working directory as 'untracked' i.e., Git is aware of them, but they are not yet under Git's control. The same applies to files that existed in the working folder at the time you initialised the repo.
A Git repo has an internal space called the staging area which it uses to build the next snapshot. Another name for the staging area is the index.
We can stage an untracked file to tell Git that we want its current version to be included in the next snapshot. Once you stage an untracked file, it becomes 'tracked' (i.e., under Git's control).
In the example below, you can see how staging files changes the status of the repo as you go from (a) to (c).
(a) Before staging any files:
staging area
[empty]
other meta data ...
├─ fruits.txt (untracked!)
└─ colours.txt (untracked!)
(b) After staging fruits.txt:
staging area
└─ fruits.txt
other meta data ...
├─ fruits.txt (tracked)
└─ colours.txt (untracked!)
(c) After staging colours.txt:
staging area
├─ fruits.txt
└─ colours.txt
other meta data ...
├─ fruits.txt (tracked)
└─ colours.txt (tracked)
1 First, add a file (e.g., fruits.txt) to the things folder.
Here is an easy way to do that with a single terminal command.
echo -e "apples\nbananas\ncherries\n" > fruits.txt
apples
bananas
cherries
2 Stage the new file.
2.1 Check the status of the folder using the git status command.
git status
On branch master
No commits yet
Untracked files:
(use "git add <file>..." to include in what will be committed)
fruits.txt
nothing added to commit but untracked files present (use "git add" to track)
2.2 Use the add command to stage the file.
git add fruits.txt
You can replace add with stage (e.g., git stage fruits.txt) and the result is the same (they are synonyms).
2.3 Check the status again. You can see the file is no longer 'untracked'.
git status
On branch master
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: fruits.txt
As before, don't worry if you don't understand the content of the output (we'll unpack it in a later lesson). The point to note is that the file is no longer listed as 'untracked'.
2.1 Note how the file is shown as ‘unstaged’. The question mark icon indicates the file is untracked.
If the newly-added file does not show up in the Sourcetree UI, refresh the UI (Windows: F5 | Mac: ⌥+R).
2.2 Stage the file by selecting fruits.txt and clicking on the Stage Selected button.
2.3 Note how the file is staged now i.e., fruits.txt appears in the Staged files panel now.
If Sourcetree shows a \ No newline at the end of the file message below the staged lines (i.e., below the cherries line in the above screenshot), that is because you did not hit enter after entering the last line of the file (hence, Git is not sure if that line is complete). To rectify, move the cursor to the end of the last line in that file and hit enter (like you are adding a blank line below it). This new change will now appear as an 'unstaged' change. Stage it as well.
If you modify a staged file, it goes into the 'modified' state i.e., the file contains modifications that are not present in the copy that is waiting (in the staging area) to be included in the next snapshot. If you wish to include these new changes in the next snapshot, you need to stage the file again, which will overwrite the copy of the file that was previously in the staging area.
The example below shows how the status of a file changes when it is modified after it was staged.
staging area
Alice
other meta data ...
Alice
staging area
Alice
other meta data ...
Alice
Bob
staging area
Alice
Bob
other meta data ...
Alice
Bob
1 First, add another line to fruits.txt, to make it 'modified'.
Here is a way to do that with a single terminal command.
echo "dragon fruits" >> fruits.txt
apples
bananas
cherries
dragon fruits
2 Now, verify that Git sees that file as 'modified'.
Use the git status command to check the status of the working directory.
$ git status
On branch master
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: fruits.txt
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: fruits.txt
Note how fruits.txt now appears twice, once as new file: ... (representing the version of the file we staged earlier, which had only three lines) and once as modified: ... (representing the latest version of the file which now has a fourth line).
Note how fruits.txt appears in the Staged files panel as well as 'Unstaged files'.
3 Stage the file again, the same way you added/staged it earlier.
4 Verify that Git no longer sees it as 'modified', similar to step 2.
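The same staged-then-modified cycle can also be watched through the compact git status --short format, in which the first status column describes the staging area and the second describes the working directory. The sketch below is a minimal illustration in a throwaway repo; the file name names.txt is hypothetical.

```shell
# Create a scratch repo in a temporary folder.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

echo "Alice" > names.txt
git add names.txt
git status --short     # A  names.txt  (staged; no newer changes in the working directory)

echo "Bob" >> names.txt
git status --short     # AM names.txt  (staged copy, plus unstaged modifications)

git add names.txt      # stage again to overwrite the staged copy
git status --short     # A  names.txt  (the staged copy now has both lines)
```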
Git does not track empty folders. You can test this by adding an empty subfolder inside the things folder (e.g., things/more-things) and checking if it shows up as 'untracked' (it will not). If you add a file to that folder (e.g., things/more-things/food.txt) and then stage that file (e.g., git add more-things/food.txt), the folder will now be included in the next snapshot.
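As a quick check of the behaviour described above, the following sketch repeats the experiment in a throwaway repo (rather than in things), using the same hypothetical folder and file names.

```shell
# Create a scratch repo in a temporary folder.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

mkdir more-things
git status --short               # prints nothing: the empty folder is not reported

echo "rice" > more-things/food.txt
git status --short               # ?? more-things/  (untracked, now that it contains a file)

git add more-things/food.txt
git status --short               # A  more-things/food.txt
```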
EXERCISE: stage-fright
After staging, you can now proceed to save the snapshot, aka creating a commit.
Saving a snapshot is called committing and a saved snapshot is called a commit.
A git commit is a snapshot of your project based on the files you have staged, more precisely, a record of the exact state of all files in the staging area (index) at that moment -- even the files that have not changed since the last commit. Consequently, a commit has all the information it needs to recreate the tracked files in the working folder at the time the commit was created.
A commit also includes metadata such as the author, date, and an optional commit message describing the change.
A Git commit is a snapshot of all tracked files, not simply a delta of what changed since last commit.
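One way to see the snapshot nature of a commit is to list the files recorded in a commit that changed only one of them. The sketch below assumes a throwaway repo and a placeholder author identity (Demo User), which is set only so that committing works on an unconfigured machine.

```shell
# Create a scratch repo in a temporary folder.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Demo User"            # placeholder identity, for this repo only
git config user.email "demo@example.com"

echo "apples" > fruits.txt
echo "red" > colours.txt
git add fruits.txt colours.txt
git commit --quiet -m "Add fruits.txt, colours.txt"

echo "bananas" >> fruits.txt                # change only one of the two files
git add fruits.txt
git commit --quiet -m "Update fruits.txt"

# The latest commit still records both files, although only fruits.txt changed.
git ls-tree -r --name-only HEAD
```

The last command should list colours.txt as well as fruits.txt.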
Assuming you have previously staged changes to fruits.txt, go ahead and create a commit.
1 First, let us do a sanity check using the git status command.
git status
On branch master
No commits yet
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: fruits.txt
2 Now, create a commit using the commit command. The -m switch is used to specify the commit message.
git commit -m "Add fruits.txt"
[master (root-commit) d5f91de] Add fruits.txt
1 file changed, 5 insertions(+)
create mode 100644 fruits.txt
3 Verify the staging area is empty using the git status command again.
git status
On branch master
nothing to commit, working tree clean
Note how the output says nothing to commit which means the staging area is now empty.
Click the Commit button, enter a commit message (e.g. add fruits.txt) into the text box, and click Commit.
Git commits form a timeline, as each corresponds to a point in time when you asked Git to take a snapshot of your working directory. Each commit (other than the very first one) links to at least one previous commit, forming a structure that we can traverse.
A timeline of commits is called a branch. By default, Git names the initial branch master -- though many now use main instead. You'll learn more about branches in future lessons. For now, just be aware that the commits you create in a new repo will be on a branch called master (or main) by default.
gitGraph
    %%{init: { 'theme': 'default', 'gitGraph': {'mainBranchName': 'master (or main)'}} }%%
    commit id: "Add fruits.txt"
    commit id: "Update fruits.txt"
    commit id: "Add colours.txt"
    commit id: "..."
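The link from a commit back to the previous one can be inspected directly. The following is a minimal sketch in a throwaway repo with a placeholder author identity; it compares the commit that precedes the latest one (HEAD~1) with the parent hash recorded inside the latest commit (%P).

```shell
# Create a scratch repo in a temporary folder.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Demo User"            # placeholder identity, for this repo only
git config user.email "demo@example.com"

echo "apples" > fruits.txt
git add fruits.txt
git commit --quiet -m "Add fruits.txt"

echo "bananas" >> fruits.txt
git add fruits.txt
git commit --quiet -m "Update fruits.txt"

git log --format=%s                         # commit messages, newest first

first="$(git rev-parse HEAD~1)"             # the commit just before the latest one
parent="$(git log -1 --format=%P HEAD)"     # parent hash stored in the latest commit
echo "$first"
echo "$parent"                              # same hash as the line above

git log -1 --format=%P "$first"             # empty line: the first commit has no parent
```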
Git can show you the list of commits in the Git history.
1 View the list of commits, which should show just the one commit you created just now.
You can use the git log command to see the commit history.
git log
commit d5f91de... (HEAD -> master)
Author: ... <...@...>
Date: ...
Add fruits.txt
Use the Q key to exit the output screen of the git log command.
Note how the output has some details about the commit you just created. You can ignore most of it for now, but notice it also shows the commit message you provided.
Expand the BRANCHES menu and click on master to view the history graph, which contains only one node at the moment, representing the commit you just added. For now, ignore the label master attached to the commit.
2 Create a few more commits (i.e., a few rounds of add/edit files -> stage -> commit), and observe how the list of commits grows.
Here is an example list of Bash commands to add two commits while observing the list of commits:
$ echo "figs" >> fruits.txt # add another line to fruits.txt
$ git add fruits.txt # stage the updated file
$ git commit -m "Insert figs into fruits.txt" # commit the changes
$ git log # check commits list
$ echo "a file for colours" >> colours.txt # add a colours.txt file
$ echo "a file for shapes" >> shapes.txt # add a shapes.txt file
$ git add colours.txt shapes.txt # stage both files in one go
$ git commit -m "Add colours.txt, shapes.txt" # commit the changes
$ git log # check commits list
The output of the final git log should be something like this:
commit 18300... (HEAD -> master)
Author: ... <...@...>
Date: ...
Add colours.txt, shapes.txt
commit 2beda...
Author: ... <...@...>
Date: ...
Insert figs into fruits.txt
commit d5f91...
Author: ... <...@...>
Date: ...
Add fruits.txt
To see the list of commits, click on the History item (listed under the WORKSPACE section) on the menu on the right edge of Sourcetree.
After adding two more commits, the list of commits should look something like this:
EXERCISE: grocery-shopping
What you learned: You should now be able to initialise a Git repository in a folder and commit snapshots of its files at times of your choice. So far, you did not learn how to actually make use of those snapshots (other than to show a list of them) -- we will do that in later tours.
What's next: Tour 2: Backing up a Repo on the Cloud
Destination: To be able to back up a Git repository on a cloud-based Git service such as GitHub.
Motivation: One (of several) benefits of maintaining a copy of a repo on a cloud server: it acts as a safety net (e.g., against the folder becoming inaccessible due to a hardware fault).
Lesson plan:
→ Lesson: Remote Repositories covers that part.
→ Lesson: Preparing to use GitHub covers that part.
→ Lesson: Creating a Repo on GitHub covers that part.
→ Lesson: Linking a Local Repo With a Remote Repo covers that part.
→ Lesson: Updating the Remote Repo covers that part.
→ Lesson: Omitting files from revision control covers that part.
To back up your Git repo on the cloud, you’ll need to use a remote repository service, such as GitHub.
A repo you have on your computer is called a local repo. A remote repo is a repo hosted on a remote computer and allows remote access. Some use cases for remote repositories:
It is possible to set up a Git remote repo on your own server, but an easier option is to use a remote repo hosting service such as GitHub, GitLab, or BitBucket. In our case, we will be using GitHub.
The first step of backing up a local repo on GitHub: create an empty repository on GitHub.
You can create a remote repository based on an existing local repository, to serve as a remote copy of your local repo. For example, suppose you created a local repo and worked with it for a while, but now you want to upload it onto GitHub. The first step is to create an empty repository on GitHub.
1 Login to your GitHub account and choose to create a new repo.
2 In the next screen, provide a name for your repo but keep the Initialize this repo ... tick box unchecked.
3 Note the URL of the repo. It will be of the form https://github.com/{your_user_name}/{repo_name}.git.
e.g., https://github.com/johndoe/foobar.git (note the .git at the end)
EXERCISE: remote-control
The second step of backing up a local repo on GitHub: link the local repo with the remote repo on GitHub.
A Git remote is a reference to a repository hosted elsewhere, usually on a server like GitHub, GitLab, or Bitbucket. It allows your local Git repo to communicate with another remote copy, for example, to upload commits that you created locally but that are missing in the remote copy.
By adding a remote, you are giving the local repo the details of a remote repo it can communicate with, for example, where the repo exists, what name to use to refer to the remote, and which network protocol to use to communicate with it (e.g., HTTPS vs SSH).
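Adding a remote only records those details in the local repo's configuration; it does not contact the server. The sketch below illustrates this in a throwaway repo, using the placeholder URL from this lesson (the johndoe/foobar repo is hypothetical and is never accessed).

```shell
# Create a scratch repo in a temporary folder.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Register the remote under the name origin (no network access happens here).
git remote add origin https://github.com/johndoe/foobar.git

git remote                             # origin
git remote get-url origin              # the URL recorded for that name
git config --get remote.origin.url     # the same URL, read from the repo's config
```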
Add the empty remote repo you created on GitHub as a remote of a local repo you have.
1 In a terminal, navigate to the folder containing the local repo.
2 List the current list of remotes using the git remote -v command, for a sanity check. No output is expected if there are no remotes yet.
3 Add a new remote repo using the git remote add command.
command: git remote add {remote_name} {remote_repo_url}
e.g., git remote add origin https://github.com/johndoe/foobar.git
4 List the remotes again to verify the new remote was added.
git remote -v
origin https://github.com/johndoe/foobar.git (fetch)
origin https://github.com/johndoe/foobar.git (push)
The same remote will be listed twice, to show that you can do two operations (fetch and push) using this remote. You can ignore that for now. The important thing is the remote you added is being listed.
1 Open the local repo in Sourcetree.
2 Choose the Repository → Repository Settings menu option.
3 Add a new remote to the repo with the following values.
Remote name: the name you want to assign to the remote repo e.g., upstream1
URL/path: the URL of your repo (ending in .git) e.g., https://github.com/johndoe/foobar.git
Username: your GitHub username
The third step of backing up a local repo on GitHub: push a copy of the local repo to the remote repo.
You can push content of one repository to another. Pushing transfers Git history (e.g., past commits), including the file versions recorded in those commits; changes that have not been committed are not pushed. Note that pushing to a remote repo requires you to have write-access to it.
When pushing to a remote repo, you typically need to specify the following information:
The name of the remote (e.g., origin).
The name of the branch to push (e.g., master).
If this is the first time you are pushing this branch to the remote repo, you can also ask Git to track this remote/branch pairing (e.g., remember that this local master branch is tracking the master branch in the upstream repo origin i.e., local master branch is tracking upstream origin/master branch), so in future you can push the same remote/branch without needing to specify them again.
Here's how you can push the content of a local repo to an empty remote repo (assuming you already have a local repo that is connected to an empty remote repo, from previous hands-on practicals):
# format: git push -u <remote-repo-name> <branch-name>
git push -u origin master
Explanation:
push: the Git sub-command that pushes the current local repo content to a remote repo
origin: name of the remote
master: branch to push
-u (or --set-upstream): the flag that tells Git to remember that this local master branch is tracking the origin/master branch
Click the Push button on the main menu, check the settings in the next dialog, ensure the Track option is selected, and click the Push button on the dialog.
The push command can be used repeatedly to send further updates to another repo e.g., to update the remote with commits you created since you pushed the first time.
Add a few more commits to your local repo, and push those commits to the remote repo, as follows:
1 Commit some changes in your local repo.
2 Push the new commits to the remote repo on GitHub.
Any of the following commands should work:
git push origin master
git push origin (due to the tracking you set up earlier, Git will assume you are pushing the master branch)
git push (due to the tracking, Git will also assume you are pushing to the remote origin)
Click the Push button on the main menu, check the settings in the next dialog, and click the Push button on the dialog.
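The way a tracking setup lets you leave out the remote and branch names can be pictured with a toy model. The Python sketch below is only an illustration of the idea, not Git's actual logic; the function resolve_push_target and the tracking dictionary are assumptions made for this example.

```python
from typing import Optional


def resolve_push_target(current_branch: str,
                        tracking: dict[str, tuple[str, str]],
                        remote: Optional[str] = None,
                        branch: Optional[str] = None) -> tuple[str, str]:
    """Fill in whichever of remote/branch was left out of the push command."""
    if remote is None:
        if current_branch not in tracking:
            raise ValueError("no remote given and no tracking information recorded")
        # Fall back to the remote remembered by an earlier 'push -u'
        remote = tracking[current_branch][0]
    if branch is None:
        # Fall back to the branch that is currently checked out
        branch = current_branch
    return remote, branch


# Tracking information as it would look after 'git push -u origin master'.
tracking = {"master": ("origin", "master")}

print(resolve_push_target("master", tracking, "origin", "master"))  # git push origin master
print(resolve_push_target("master", tracking, "origin"))            # git push origin
print(resolve_push_target("master", tracking))                      # git push
```

In this model all three calls resolve to the same remote/branch pair, which mirrors why the shorter forms of the command work once tracking has been set up.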
Note that you can push between two repos only if those repos have a shared history among them (i.e., one should have been created by copying the other).
EXERCISE: push-over
DETOUR: Pushing to multiple repos
You can push to any number of repos, as long as the target repos and your repo have a shared history.
Add the target repo as a remote, giving it a suitable name (e.g., upstream, central, production, myOtherRemote ...), if you haven't done so already.
Push to the target remote e.g., git push myOtherRemote master.
Git allows you to specify which files should be omitted from revision control.
You can specify which files Git should omit from revision control. While you can always omit files from revision control simply by not staging them, having an 'ignore-list' is more convenient, especially if there are files inside the working folder that are not suitable for revision control (e.g., temporary log files) or files you want to avoid accidentally including in a commit (e.g., files containing confidential information).
A repo-specific ignore-list of files can be specified in a .gitignore file, stored in the root of the repo folder.
1 Add a file into your repo's working folder that you presumably do not want to revision-control e.g., a file named temp.txt. Observe how Git has detected the new file.
2 Configure Git to ignore that file:
Create a file named .gitignore in the working directory root and add the following line in it.
temp.txt
The file should be currently listed under Unstaged files. Right-click it and choose Ignore…, then choose Ignore exact filename(s) and click OK.
Observe that a file named .gitignore has been created in the working directory root and has the following line in it.
temp.txt
The .gitignore file
The .gitignore file tells Git which files to ignore when tracking revision history. That file itself can be either revision controlled or ignored.
To version control it (the more common choice, which allows you to track how the .gitignore file changes over time), simply commit it as you would commit any other file.
To ignore it, follow the same steps you followed above when you set Git to ignore the temp.txt file.
It supports file patterns e.g., adding temp/*.tmp to the .gitignore file prevents Git from tracking any .tmp files in the temp directory.
More information about the .gitignore file: git-scm.com/docs/gitignore
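To get a feel for how such patterns select files, here is a minimal Python sketch that handles only a small subset of the .gitignore rules: a pattern without a slash is compared against the file name at any depth, and a pattern with slashes is compared segment by segment from the repo root. The function is_ignored is an assumption made for illustration; the real rules (negation, directory patterns, ** and so on) are described in the Git documentation linked above.

```python
from fnmatch import fnmatchcase


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Check a repo-relative path against a simplified subset of .gitignore rules."""
    path_parts = path.split("/")
    for pattern in patterns:
        pattern_parts = pattern.split("/")
        if len(pattern_parts) == 1:
            # No slash in the pattern: compare against the file name only
            if fnmatchcase(path_parts[-1], pattern):
                return True
        elif len(pattern_parts) == len(path_parts) and all(
            fnmatchcase(part, pat) for part, pat in zip(path_parts, pattern_parts)
        ):
            # Slash in the pattern: every path segment must match in order
            return True
    return False


patterns = ["temp.txt", "temp/*.tmp"]
for candidate in ["temp.txt", "temp/session.tmp", "temp/notes.txt", "temp/old/session.tmp"]:
    print(candidate, "->", "ignored" if is_ignored(candidate, patterns) else "tracked")
```

Matching segment by segment keeps the * wildcard from crossing a folder boundary, so temp/*.tmp covers .tmp files directly inside temp but not those in its sub-folders.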
Files recommended to be omitted from version control
Binary files generated when building the project e.g., *.class, *.jar, *.exe (reasons: 1. there is no need to version control these files as they can be generated again from the source code; 2. revision control systems are optimized for tracking text-based files, not binary files).
What you learned: You should now be able to create a copy of your repo on GitHub, and keep it updated as you add more commits to your local repo. If something goes wrong with your local repo (e.g., disk crash), you can now recover the repo using the remote repo (this tour did not cover how exactly you can do that -- it will be covered in a future tour).
What's next: Tour 3: Using the Revision History of a Repo
Destination: To be able to make use of the revision history stored by Git.
Motivation: Having put in effort to record the revision history of the working folder, it only makes sense that we use the revision history to our benefit. For example, to be able to answer questions such as "What did I change in this file since last Monday?"
Lesson plan:
→ Lesson: Examining the Revision History covers that part.
→ Lesson: Traversing to a Specific Snapshot covers that part.
→ Lesson: Tagging Commits covers that part.
→ Lesson: Comparing Points of History covers that part.
It is useful to be able to visualise the commits timeline, aka the revision graph.
The Git data model consists of two types of entities: objects and refs (short for references). In this lesson, you will encounter examples of both.
A Git revision graph is a visualisation of a repo's revision history, and contains examples of both objects and refs. First, let us learn to work with simpler revision graphs consisting of one branch, such as the one given below.
Nodes in the revision graph represent commits.
Each commit is identified by a SHA hash (e.g., f761ea63738a67258628e9e54095b88ea67d95e2) that acts like a fingerprint, ensuring that every commit can be referenced unambiguously.
Edges in the revision graph represent links between a commit and its parent commit(s). In some revision graph visualisations, you might see arrows (instead of lines) showing how each commit points to its parent commit.
Git uses refs to name and keep track of various points in a repository’s history. These refs are essentially 'named-pointers' that can serve as bookmarks to reach a certain point in the revision graph using the ref name.
In the revision graph above, there are two refs: ← master and ← HEAD.
master is a branch ref, pointing to the latest commit on the master branch.
HEAD is a special ref that normally points to the current branch (here, the master branch), and moves together with the branch ref.
In the revision graph above you see a third type of ref (↖ origin/master). This is a remote tracking branch ref that represents the state of a branch in a remote repository (if you previously set up the branch to track a remote branch). In this example, the master branch in the remote origin is also at the commit C3 (which means you have not created new commits after you pushed to the remote).
If you now create a new commit C4, the state of the revision graph will be as follows:
Explanation: When you create C4, the current branch ref master moves to point to C4, and HEAD moves along with it. However, the master branch in the remote origin remains at C3 (because you have not pushed C4 yet).
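The way these refs move can be mimicked with a small toy model. The Python sketch below is an illustration only, not how Git stores its data; the Commit and Repo classes are assumptions made for this example, and commits are named C1 to C4 as in the explanation above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    name: str
    parent: Optional["Commit"] = None  # link to the parent commit, if any


class Repo:
    def __init__(self) -> None:
        self.refs: dict[str, Commit] = {}  # ref name -> commit it points to
        self.head = "master"               # HEAD points to the current branch

    def commit(self, name: str) -> Commit:
        # The new commit's parent is whatever the current branch pointed to
        new_commit = Commit(name, self.refs.get(self.head))
        # The current branch ref moves; HEAD follows because it points to the branch
        self.refs[self.head] = new_commit
        return new_commit

    def push(self, remote: str = "origin") -> None:
        # The remote tracking ref catches up only when you push
        self.refs[f"{remote}/{self.head}"] = self.refs[self.head]

    def head_commit(self) -> Commit:
        return self.refs[self.head]


repo = Repo()
for name in ["C1", "C2", "C3"]:
    repo.commit(name)
repo.push()        # master and origin/master are both at C3
repo.commit("C4")  # master (and HEAD) move to C4; origin/master stays at C3

for ref_name, target in repo.refs.items():
    print(ref_name, "->", target.name)
print("HEAD ->", repo.head, "->", repo.head_commit().name)
```

Because HEAD is modelled as pointing to the branch name rather than to a commit, it needs no separate update when the branch ref moves.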
Let us use Git features to examine the revision graph of a simple repo. For this hands-on practical, use a repo with just a few commits and only one branch.
1 First, use a simple git log to view the list of commits.
git log
commit f761ea63738a... (HEAD -> master, origin/master)
Author: ... <...@...>
Date: Sat ...
Add colours.txt, shapes.txt
commit 2bedace69990...
Author: ... <...@...>
Date: Sat ...
Add figs to fruits.txt
commit d5f91de5f0b5...
Author: ... <...@...>
Date: Fri ...
Add fruits.txt
For comparison, given below is the visual representation of the same revision graph. As you can see, the log output shows the refs slightly differently, but it is not hard to see what they mean.
2 Use the --oneline flag to get a more concise view. Note how the commit SHA has been truncated to the first seven characters (the first seven characters of a commit SHA are usually enough for Git to identify a commit, as long as no other commit in the repo starts with the same characters).
git log --oneline
f761ea6 (HEAD -> master, origin/master) Add colours.txt, shapes.txt
2bedace Add figs to fruits.txt
d5f91de Add fruits.txt
3 The --graph flag makes the result closer to a graphical revision graph. Note the * that indicates a node in a revision graph.
git log --oneline --graph
* f761ea6 (HEAD -> master, origin/master) Add colours.txt, shapes.txt
* 2bedace Add figs to fruits.txt
* d5f91de Add fruits.txt
The --graph option is more useful when examining a more complicated revision graph consisting of multiple parallel branches.
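The reason a truncated SHA can stand in for the full one is that a commit can be looked up by any prefix that matches exactly one commit. The Python sketch below illustrates that idea under simplifying assumptions: resolve_prefix is a hypothetical helper, and apart from the first hash (taken from the text above) the hashes are made-up placeholders.

```python
def resolve_prefix(prefix: str, full_hashes: list[str]) -> str:
    """Return the single full hash starting with the given prefix."""
    matches = [full_hash for full_hash in full_hashes if full_hash.startswith(prefix)]
    if len(matches) != 1:
        # Either nothing matches, or the prefix is too short to be unambiguous
        raise LookupError(f"prefix {prefix!r} matches {len(matches)} commits")
    return matches[0]


full_hashes = [
    "f761ea63738a67258628e9e54095b88ea67d95e2",  # from the example in the text
    "f7" + "0" * 38,                             # hypothetical placeholder hash
    "ab" + "1" * 38,                             # hypothetical placeholder hash
]

print(resolve_prefix("f761ea6", full_hashes))  # seven characters are enough here
try:
    resolve_prefix("f7", full_hashes)          # too short: two commits match
except LookupError as error:
    print("ambiguous:", error)
```

A longer prefix is needed only when two commits happen to share their leading characters, which becomes more likely as a repo accumulates commits.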
Click History to see the revision graph.
The HEAD ref may not be shown -- it is implied that the HEAD ref is pointing to the same commit that the currently active branch ref is pointing to.
If the remote tracking branch ref (e.g., origin/master) is not showing up, you may need to enable the Show Remote Branches option.
...
Git can load a specific version of the history to the working directory. Note that if you have uncommitted changes in the working directory, you need to stash them first to prevent them from being overwritten.
Use the checkout <commit-identifier> command to change the working directory to the state it was in at a specific past commit.
git checkout v1.0: loads the state as at the commit tagged v1.0
git checkout 0023cdd: loads the state as at the commit with the hash 0023cdd
git checkout HEAD~2: loads the state that is 2 commits behind the most recent commit
For now, you can ignore the warning about ‘detached HEAD’.
If you checkout a commit that comes before the commit in which you added the .gitignore file, Git will now show ignored files as ‘unstaged modifications’ because, at that point in the history, Git had not yet been told to ignore those files.
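An identifier such as HEAD~2 can be read as "start at the most recent commit and step back through two parent links". The Python sketch below illustrates that reading for a single-branch history; the Commit class and the ancestor function are assumptions made for this example, and the commit messages are borrowed from the earlier git log output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    message: str
    parent: Optional["Commit"] = None


def ancestor(commit: Commit, steps: int) -> Commit:
    """Follow the parent link the given number of times (like <ref>~<steps>)."""
    for _ in range(steps):
        if commit.parent is None:
            raise ValueError("the history does not go back that far")
        commit = commit.parent
    return commit


# A three-commit history, oldest first.
first = Commit("Add fruits.txt")
second = Commit("Add figs to fruits.txt", first)
head = Commit("Add colours.txt, shapes.txt", second)

print(ancestor(head, 0).message)  # HEAD itself
print(ancestor(head, 2).message)  # HEAD~2
```

Here HEAD~2 lands on the oldest commit, so checking it out would load the working directory as it was when only fruits.txt had been added.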
Double-click the commit you want to load to the working directory, or right-click on that commit and choose the Checkout... option.
Click OK to the warning about ‘detached HEAD’ (similar to below).
The specified version is now loaded to the working folder, as indicated by the HEAD label. HEAD is a reference to the currently checked out commit.
If you checkout a commit that comes before the commit in which you added the .gitignore file, Git will now show ignored files as ‘unstaged modifications’ because, at that point in the history, Git had not yet been told to ignore those files.
To go back to the latest commit, double-click it.
...
Each Git commit is uniquely identified by a hash e.g., d670460b4b4aece5915caf5c68d12f560a9fe3e4. As you can imagine, using such an identifier is not very convenient for our day-to-day use. As a solution, Git allows adding a more human-readable tag to a commit e.g., v1.0-beta.
Here's how you can tag a commit in a local repo:
To add a tag to the current commit as v1.0:
$ git tag v1.0
To view tags:
$ git tag
To learn how to add a tag to a past commit, go to the ‘Git Basics – Tagging’ page of the git-scm book and refer to the ‘Tagging Later’ section.
Right-click on the commit (in the graphical revision graph) you want to tag and choose Tag….
Specify the tag name e.g., v1.0 and click Add Tag.
The added tag will appear in the revision graph view.
After adding a tag to a commit, you can use the tag to refer to that commit, as an alternative to using the hash.
Annotated vs Lightweight Tags: The Git tags explained above are known as lightweight tags. There is another type of Git tag called annotated tags. See git-scm.com/book for more info.
Tags are different from commit messages, in purpose and in form. A commit message is a description of the commit that is part of the commit itself. A tag is a short name for a commit, which exists as a separate entity that points to a commit.
...
Git can show you what changed in each commit.
$ git show < part-of-commit-hash >
Example:
$ git show 5bc0e306
commit 5bc0e30635a754908dbdd3d2d833756cc4b52ef3
Author: … < … >
Date: Sat Jul 8 16:50:27 2017 +0800
fruits.txt: replace banana with berries
diff --git a/fruits.txt b/fruits.txt
index 15b57f7..17f4528 100644
--- a/fruits.txt
+++ b/fruits.txt
@@ -1,3 +1,3 @@
apples
-bananas
+berries
cherries
To see which files changed in a commit, click on the commit. To see what changed in a specific file in that commit, click on the file name.
Git can also show you the difference between two points in the history of the repo.
The diff command can be used to view the differences between two points of the history.
git diff: shows the changes (uncommitted) since the last commit.
git diff 0023cdd..fcd6199: shows the changes between the points indicated by commit hashes.
git diff v1.0..HEAD: shows changes that happened from the commit tagged as v1.0 to the most recent commit.
Select the two points you want to compare using Ctrl+Click. The differences between the two selected versions will show up in the bottom half of Sourcetree, as shown in the screenshot below.
The same method can be used to compare the current state of the working directory (which might have uncommitted changes) to a point in the history.
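The line-by-line comparison format used by git show and git diff (lines removed are prefixed with - and lines added with +) can be approximated outside Git too. The Python sketch below uses the standard difflib module on the two versions of fruits.txt from the earlier git show example; it illustrates the output format only and is not how Git computes its diffs.

```python
import difflib

# The two versions of fruits.txt from the 'git show' example.
before = ["apples", "bananas", "cherries"]
after = ["apples", "berries", "cherries"]

diff_lines = list(difflib.unified_diff(
    before, after,
    fromfile="a/fruits.txt", tofile="b/fruits.txt",
    lineterm="",  # the input lines carry no newline characters
))
print("\n".join(diff_lines))
```

Unchanged lines appear with a leading space as context, which is what lets a reader locate the change within the file.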
What you learned: ...
What's next: coming soon ...
Guidance for the item(s) below:
As you are likely to be using an IDE for the iP, let's learn at least enough about IDEs to get you started using one.
🤔 In case you are puzzled by the sudden change of topic, it's because we take an iterative approach to covering topics, as explained in the panel below:
Professional software engineers often write code using Integrated Development Environments (IDEs). IDEs support most development-related work within the same tool (hence, the term integrated).
An IDE generally consists of:
Examples of popular IDEs:
Some web-based IDEs have appeared in recent times too e.g., Amazon's Cloud9 IDE.
Some experienced developers, in particular those with a UNIX background, prefer lightweight yet powerful text editors with scripting capabilities (e.g. Emacs) over heavier IDEs.
Guidance for the item(s) below:
As you start adding features to your project iteratively, you'll need a way to detect if the new code breaks the existing code. Next, let's learn a rather simple way to do that using a certain type of testing (we'll be learning more sophisticated methods in later weeks).
This also means we are now switching focus from the implementation aspect to the testing aspect of SE.
Testing: Operating a system or component under specified conditions, observing or recording the results, and making an evaluation of some aspect of the system or component. -- source: IEEE
When testing, you execute a set of test cases. A test case specifies how to perform a test. At a minimum, it specifies the input to the software under test (SUT) and the expected behavior.
Example: A minimal test case for testing a browser:
Input: load longfile.html located in the test data folder.
Expected behavior: the scrollbar should be enabled upon loading longfile.html.
Test cases can be determined based on the specification, reviewing similar existing systems, or comparing to the past behavior of the SUT.
For each test case you should do the following:
A test case failure is a mismatch between the expected behavior and the actual behavior. A failure indicates a potential defect (or a bug) -- we say 'potential' because the error could be in the test case itself.
Example: In the browser example above, a test case failure is implied if the scrollbar remains disabled after loading longfile.html. The defect/bug causing that failure could be an uninitialized variable.
When you modify a system, the modification may result in some unintended and undesirable effects on the system. Such an effect is called a regression.
Regression testing is the re-testing of the software to detect regressions. The typical way to detect regressions is retesting all related components, even if they had been tested before.
Regression testing is more effective when it is done frequently, after each small change. However, doing so can be prohibitively expensive if testing is done manually. Hence, regression testing is more practical when it is automated.
An automated test case can be run programmatically and the result of the test case (pass or fail) is determined programmatically. Compared to manual testing, automated testing reduces the effort required to run tests repeatedly and increases precision of testing (because manual testing is susceptible to human errors).
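As a rough illustration of what "run programmatically, judged programmatically" can look like, here is a minimal Python sketch. The function under test (add_contact), the test data, and the run_tests helper are all hypothetical, chosen only to show the idea; real projects normally use a testing framework instead.

```python
def add_contact(book: list[str], name: str) -> list[str]:
    """Hypothetical function under test: returns a new, sorted contact list."""
    return sorted(book + [name])


# Each test case: a description, the input, and the expected behavior.
TEST_CASES = [
    ("add to an empty book", ([], "Amy"), ["Amy"]),
    ("keeps the book sorted", (["Bob"], "Amy"), ["Amy", "Bob"]),
]


def run_tests(function, test_cases) -> list[str]:
    """Run every test case and return the descriptions of those that failed."""
    failures = []
    for description, args, expected in test_cases:
        actual = function(*args)
        if actual != expected:  # mismatch between expected and actual behavior
            failures.append(description)
    return failures


print("failures:", run_tests(add_contact, TEST_CASES))


# A later modification that accidentally drops the sorting (a regression).
def add_contact_modified(book: list[str], name: str) -> list[str]:
    return book + [name]


print("failures after modification:", run_tests(add_contact_modified, TEST_CASES))
```

Re-running the same test cases after every change costs almost nothing here, which is what makes frequent regression testing practical.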
A simple way to semi-automate testing of a CLI (Command Line Interface) app is by using input/output re-direction. Here are the high-level steps:
Let's assume you are testing a CLI app called AddressBook. Here are the detailed steps:
Store the test input in the text file input.txt.
Example input.txt
Store the output you expect from the SUT in another text file expected.txt.
Example expected.txt
Run the program as given below, which will redirect the text in input.txt as the input to AddressBook and, similarly, will redirect the output of AddressBook to a text file output.txt. Note that this does not require any changes to the AddressBook code.
java AddressBook < input.txt > output.txt
The way to run a CLI program differs based on the language.
e.g., In Python, assuming the code is in the AddressBook.py file, use the command
python AddressBook.py < input.txt > output.txt
If you are using Windows, use a normal Command Prompt window (i.e., cmd.exe) to run the app, not a PowerShell window, because PowerShell does not support the < input redirection operator.
Next, you compare output.txt with expected.txt. This can be done using a utility such as Windows' FC (i.e. File Compare) command, Unix's diff command, or a GUI tool such as WinMerge.
FC output.txt expected.txt
Note that the above technique is only suitable when testing CLI apps, and only if the exact output can be predetermined. If the output varies from one run to the other (e.g. it contains a time stamp), this technique will not work. In those cases, you need more sophisticated ways of automating tests.
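The redirect-then-compare steps above can also be scripted end to end. The Python sketch below is one possible way to do that, under stated assumptions: the stand-in program (SUT_CODE, which simply echoes its input in upper case) and the sample file contents are hypothetical, used in place of AddressBook so that the example can run on its own.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the SUT: echoes each input line in upper case.
SUT_CODE = "import sys\nfor line in sys.stdin:\n    print(line.strip().upper())\n"


def run_io_test(command, input_path, expected_path, output_path) -> bool:
    """Run the command with redirected input/output, then compare the files."""
    with open(input_path) as stdin, open(output_path, "w") as stdout:
        # Equivalent in spirit to: command < input.txt > output.txt
        subprocess.run(command, stdin=stdin, stdout=stdout, check=True)
    return Path(output_path).read_text() == Path(expected_path).read_text()


with tempfile.TemporaryDirectory() as folder_name:
    folder = Path(folder_name)
    (folder / "input.txt").write_text("add amy\nlist\n")       # hypothetical test input
    (folder / "expected.txt").write_text("ADD AMY\nLIST\n")    # hypothetical expected output
    passed = run_io_test(
        [sys.executable, "-c", SUT_CODE],
        folder / "input.txt",
        folder / "expected.txt",
        folder / "output.txt",
    )

print("PASS" if passed else "FAIL")
```

To try this on a real app, the command list would be replaced with the one that launches it (for instance, the java or python commands shown earlier), and the input and expected files would hold that app's actual commands and output.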
Follow up notes for the item(s) above:
Congrats! You've made it to the end of this week's topics. It feels like a lot right now but now that we got an early start, this stuff will be second nature to you by the time you are done with the semester. 😃