Day 006: Git from scratch

In modern software development, managing code across multiple versions, multiple collaborators, and multiple platforms is critical.
Git is a distributed version control system. It has become the de facto standard for version control.

Software Version Management

Software Version Management (SVM) is the practice of tracking and controlling changes to the files in a code base, and of managing how those changes are integrated when several collaborators work on it.

A version management tool should:

Track changes to files
Revert to a previous version
Offer seamless collaboration capabilities

Introduction to Git

Git is a distributed version control system created by Linus Torvalds in 2005. It was created to manage the Linux kernel source code after the kernel project lost its free access to BitKeeper, the proprietary VCS it had been using.

The first version of Git was designed to be simple, with an emphasis on speed and efficiency in handling very large code bases like the Linux kernel itself.

Git gained widespread adoption after platforms like GitHub launched in 2008, making it easier for developers to collaborate on open-source repositories.

Git Commands

Git provides a vast array of commands to manage a code base, but a handful are used all the time and form the core of Git. Understanding these offers insight into Git's internal workings.

git init: Creates a new Git repository in the current directory. It creates a .git folder, which contains all the internal files Git needs to track changes and manage the code base.
git add: Stages changes. It selects which files, and which changes to them, will be included in the next snapshot (commit).
git commit: Commits are the heart of Git. This command records the staged changes as a snapshot of the code base, together with metadata such as the timestamp, author name and email, and commit message. Each commit is identified by a unique hash.
git push: Uploads local commits to a remote repository so that other collaborators can pull them.
git pull: Fetches the latest changes from a remote repository and merges them into the current local branch.
git clone: Creates a local copy of a remote repository, downloading the project's full history and branches.
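
To see the first three commands working together, here is a minimal sketch that drives Git from Python inside a throwaway directory. It assumes the git executable is installed and on the PATH; the function name run_basic_workflow, the file name hello.txt and the demo identity are made up for illustration. push, pull and clone are left out because they need a remote.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_basic_workflow():
    """Run init -> add -> commit in a temporary directory.

    Returns the hash of the new commit, or None when git is not installed.
    """
    if shutil.which("git") is None:
        return None

    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)

        def git(*args):
            # The identity is passed per command so the sketch does not
            # depend on (or modify) any global Git configuration.
            cmd = [
                "git",
                "-c", "user.name=Demo User",
                "-c", "user.email=demo@example.com",
                "-c", "commit.gpgsign=false",
                *args,
            ]
            done = subprocess.run(
                cmd, cwd=repo, check=True, capture_output=True, text=True
            )
            return done.stdout.strip()

        git("init", "-q")                            # creates the .git folder
        (repo / "hello.txt").write_text("hello world\n")
        git("add", "hello.txt")                      # stage the file
        git("commit", "-q", "-m", "First commit")    # snapshot the staged changes
        return git("rev-parse", "HEAD")              # hash of the new commit


if __name__ == "__main__":
    print(run_basic_workflow())
```

The printed hash differs from run to run, because the commit records the current timestamp.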

Git Internals

While Git feels easy and intuitive to use, under the hood it has a sophisticated, well-built system to ensure robust version control.

Git’s Data Model

At its core, Git's data model consists of 3 main types of objects (a fourth type, the annotated tag, is used less often).

Blob: A Binary Large Object (blob) stores the contents of a file. Every file tracked in Git is stored as a blob.
Tree: A tree object represents a directory. It maps each file name to the blob holding that file's contents, and it can also reference other trees (subdirectories).
Commit: A commit object records a snapshot of the tracked files. It references the top-level tree of that snapshot and its parent commit(s), and contains metadata such as the timestamp, author name and email, and commit message.
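
The three object types can be sketched in a few lines of Python. This is a simplified illustration, not Git's implementation: it keeps objects in a plain dict, whereas real Git also zlib-compresses them and stores them under .git/objects. The helper names (store, write_blob, write_tree, write_commit) and the demo author are made up. The byte layout, a "<type> <size>" header followed by a NUL byte and the body, follows Git's object format, so the blob id matches what Git itself computes for the same content.

```python
import hashlib


def store(objects, kind, body):
    # An object id is the SHA-1 of "<kind> <size>", a NUL byte, then the body.
    data = kind.encode() + b" " + str(len(body)).encode() + b"\x00" + body
    oid = hashlib.sha1(data).hexdigest()
    objects[oid] = (kind, body)
    return oid


def write_blob(objects, content):
    # A blob holds file contents only; the file name lives in the tree.
    return store(objects, "blob", content)


def write_tree(objects, entries):
    # entries: list of (mode, name, oid), e.g. ("100644", "hello.txt", blob_id).
    # Simplification: entries are sorted by plain name; Git's real ordering
    # treats directory names slightly differently.
    body = b""
    for mode, name, oid in sorted(entries, key=lambda e: e[1]):
        body += mode.encode() + b" " + name.encode() + b"\x00" + bytes.fromhex(oid)
    return store(objects, "tree", body)


def write_commit(objects, tree_id, parents, author, message):
    # A commit points at one tree and zero or more parent commits.
    lines = ["tree " + tree_id]
    lines += ["parent " + p for p in parents]
    lines += ["author " + author, "committer " + author, "", message, ""]
    return store(objects, "commit", "\n".join(lines).encode())


objects = {}
blob_id = write_blob(objects, b"hello world\n")
tree_id = write_tree(objects, [("100644", "hello.txt", blob_id)])
commit_id = write_commit(
    objects, tree_id, [], "Demo User <demo@example.com> 0 +0000", "First commit"
)

print(blob_id)  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(objects[commit_id][1].decode())
```

Notice that the commit does not contain the files themselves. It only names a tree, and the tree names the blobs.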

The SHA-1 Hash

Each object in Git is identified by a SHA-1 hash: 20 bytes, usually written as 40 hexadecimal characters. The hash guards the integrity of the data and is cheap to compute for a file of any size.

If any part of the content changes, even a single bit, the hash value changes drastically. Because the hash is derived from the content, Git can use it both to look objects up and to detect changes.
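
Both claims are easy to check with Python's hashlib. The sketch below hashes raw bytes directly, which is a simplification (Git hashes a short header plus the content); the helper names flip_first_bit and differing_bits are made up for illustration.

```python
import hashlib


def flip_first_bit(data):
    # Return a copy of data with the lowest bit of its first byte flipped.
    return bytes([data[0] ^ 0b00000001]) + data[1:]


def differing_bits(a, b):
    # Count the bit positions where two equal-length byte strings differ.
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


original = b"hello world\n"
changed = flip_first_bit(original)

h1 = hashlib.sha1(original)
h2 = hashlib.sha1(changed)

print(len(h1.digest()))     # 20 (bytes)
print(len(h1.hexdigest()))  # 40 (hex characters)
print(differing_bits(original, changed))           # 1
print(differing_bits(h1.digest(), h2.digest()))    # typically close to half of the 160 bits
print(h1.hexdigest())
print(h2.hexdigest())
```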

Staging Area

The index, or staging area, sits between the current state of the working directory and the commit history. Files are placed in the index with git add. When a commit is created with git commit, it is the contents of the index, not of the working directory, that go into the commit.

This separation allows developers to choose exactly what goes into a commit before committing.
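
A toy model makes the three areas concrete. This is only a sketch under heavy simplification: the ToyRepo class and its attributes are invented for illustration, and real Git stores the index as a binary file that points at blobs rather than holding file contents in a dict.

```python
class ToyRepo:
    def __init__(self):
        self.working_dir = {}  # path -> content as it currently is on disk
        self.index = {}        # path -> content staged for the next commit
        self.commits = []      # list of (message, snapshot) tuples

    def add(self, path):
        # Like `git add`: copy the current content of one file into the index.
        self.index[path] = self.working_dir[path]

    def commit(self, message):
        # Like `git commit`: the snapshot is taken from the index,
        # never directly from the working directory.
        self.commits.append((message, dict(self.index)))


repo = ToyRepo()
repo.working_dir["a.txt"] = "first draft"
repo.working_dir["b.txt"] = "notes"

repo.add("a.txt")                           # only a.txt is staged
repo.working_dir["a.txt"] = "second draft"  # edited again, but not re-added
repo.commit("Add a.txt")

message, snapshot = repo.commits[-1]
print(snapshot)  # {'a.txt': 'first draft'}
```

The commit contains neither b.txt (never staged) nor the second draft of a.txt (edited after staging).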

Branching

Git allows us to create branches, which act as independent lines of development. HEAD points to the current line of development.

You can switch between branches using git checkout or git switch.

HEAD: It refers to the current branch or commit that the working directory is on.
Branch: It is simply a movable pointer to a commit in the history. It advances when a new commit is made on that branch and can be moved independently of other branches.
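
The pointer idea can be sketched as a small Python model. The ToyRefs class and the commit ids c1, c2, ... are made up for illustration, and the model ignores the case where HEAD points straight at a commit instead of a branch (a detached HEAD).

```python
class ToyRefs:
    def __init__(self):
        self.parents = {}               # commit id -> parent commit id (or None)
        self.branches = {"main": None}  # branch name -> commit id it points to
        self.head = "main"              # HEAD stores the name of the current branch
        self._next = 1

    def commit(self):
        # A new commit's parent is wherever the current branch points;
        # the current branch then moves forward to the new commit.
        cid = f"c{self._next}"
        self._next += 1
        self.parents[cid] = self.branches[self.head]
        self.branches[self.head] = cid
        return cid

    def branch(self, name):
        # Creating a branch just adds another pointer to the current commit.
        self.branches[name] = self.branches[self.head]

    def switch(self, name):
        # Switching only changes which branch HEAD refers to.
        if name not in self.branches:
            raise KeyError(f"no such branch: {name}")
        self.head = name


refs = ToyRefs()
refs.commit()            # c1 on main
refs.commit()            # c2 on main
refs.branch("feature")   # feature -> c2
refs.switch("feature")
refs.commit()            # c3 on feature; main stays at c2

print(refs.branches)  # {'main': 'c2', 'feature': 'c3'}
print(refs.head)      # feature
```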

Merging and Rebasing

When working on multiple branches, Git provides mechanisms to combine the work.

Merging: Git merges two branches by creating a new commit that has two parent commits, one from each branch (unless one branch is directly ahead of the other, in which case Git can simply move the pointer forward). If there are conflicts, Git will alert the user to resolve them before completing the merge.
Rebasing: Rebasing rewrites the commit history by replaying the current branch's commits on top of another branch. This results in a cleaner, linear history but requires more caution: the replayed commits get new hashes, so rebasing commits that have already been shared can cause trouble for collaborators.
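
The difference in the resulting history can be sketched with a toy commit graph. This is an illustration only: the ToyHistory class and the ids c1, c2, ... are invented, commits carry a text label instead of file contents, and conflicts are not modeled at all.

```python
import itertools


class ToyHistory:
    def __init__(self):
        self.parents = {}  # commit id -> tuple of parent commit ids
        self.changes = {}  # commit id -> label describing the change
        self._ids = itertools.count(1)

    def commit(self, parents, change):
        cid = f"c{next(self._ids)}"
        self.parents[cid] = tuple(parents)
        self.changes[cid] = change
        return cid

    def reachable(self, cid):
        # Every commit that can be reached by following parent links.
        seen, stack = set(), [cid]
        while stack:
            cur = stack.pop()
            if cur not in seen:
                seen.add(cur)
                stack.extend(self.parents[cur])
        return seen

    def merge(self, ours, theirs):
        # A merge is one new commit with two parents; old commits are untouched.
        return self.commit([ours, theirs], f"merge {theirs} into {ours}")

    def rebase(self, tip, onto):
        # Collect the commits that are on `tip` but not on `onto`, newest first.
        base = self.reachable(onto)
        to_replay = []
        cur = tip
        while cur is not None and cur not in base:
            to_replay.append(cur)
            cur = self.parents[cur][0] if self.parents[cur] else None
        # Replay them oldest first on top of `onto`, as brand-new commits.
        new_tip = onto
        for old in reversed(to_replay):
            new_tip = self.commit([new_tip], self.changes[old])
        return new_tip


h = ToyHistory()
root = h.commit([], "initial")          # c1
main = h.commit([root], "main work")    # c2
feat1 = h.commit([root], "feature A")   # c3
feat2 = h.commit([feat1], "feature B")  # c4

merged = h.merge(main, feat2)
print(merged, h.parents[merged])    # c5 ('c2', 'c4')

rebased = h.rebase(feat2, main)
print(rebased, h.parents[rebased])  # c7 ('c6',)
```

After the merge, c3 and c4 still exist unchanged and c5 ties the two lines together. After the rebase, the same changes live in new commits c6 and c7 stacked on c2, which is why the history looks linear and why the old ids no longer match.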

Conclusion

Git is a powerful and versatile version control system that suits small and very large projects alike. Its collaboration features support parallel development of features and bug fixes, which can later be merged into the main branch.

Advanced commands and Git under-the-hood can be found here.

Lakshyajeet.