Git Data Model

There are many ways to use Git, but its foundation consists of only eight simple core models. Without understanding these models, you may not be able to use Git the way it is meant to be used. I often hear that “Git is not easy,” but in my opinion, the people who say so do not understand Git’s model. The model is actually very simple, so I will explain the core models that make up Git and how the objects behind Git’s basic commands relate to one another.

Git Basic Concepts

Git’s basic concepts can be split into these two:

  • GitObject
  • Reference

A GitObject is an object that Git manages. A Commit is a GitObject. The files stored under the objects directory inside .git, the Git repository, are GitObjects. Each GitObject is named after the hash of its contents: the first 2 characters of the hash string become a subdirectory name and the remaining characters become the filename.
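The naming rule can be sketched in a few lines of Python. This is a minimal illustration, assuming a repository that uses SHA-1 (the long-standing default) and objects stored as individual "loose" files rather than in packfiles. The function names git_object_id and object_path are made up for this example. Git hashes a short header (the object type and the content length) followed by the content itself.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 ID Git gives to an object of this type and content."""
    # Git prepends "<type> <length>\0" to the content before hashing.
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def object_path(object_id: str) -> str:
    """Return the loose-object path, relative to the .git directory."""
    # First 2 characters: subdirectory. Remaining characters: filename.
    return f"objects/{object_id[:2]}/{object_id[2:]}"


blob_id = git_object_id("blob", b"hello world\n")
print(blob_id)
print(object_path(blob_id))
```

The same ID should be reported by `echo 'hello world' | git hash-object --stdin`, which is a quick way to check the sketch against a real Git installation.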

A Reference refers to a Commit. A Branch is a Reference. When you open .git, the Git repository, the files stored under refs are the References. The content of a Reference is a Commit ID.
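Because a Reference is just a small file holding a Commit ID, reading References takes very little code. The sketch below builds a throwaway directory that mimics the refs layout and reads it back. The function name read_refs and the commit ID are made up for illustration, and the sketch ignores refs that Git has packed into a single packed-refs file.

```python
import tempfile
from pathlib import Path


def read_refs(git_dir: Path) -> dict:
    """Map each loose ref name (e.g. refs/heads/master) to the ID stored in it."""
    refs = {}
    for path in sorted((git_dir / "refs").rglob("*")):
        if path.is_file():
            name = path.relative_to(git_dir).as_posix()
            refs[name] = path.read_text().strip()
    return refs


# Mimic a repository layout in a temporary directory; the commit ID is fake.
with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    branch_file = git_dir / "refs" / "heads" / "master"
    branch_file.parent.mkdir(parents=True)
    branch_file.write_text("1" * 40 + "\n")
    print(read_refs(git_dir))
```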

This is the model of these basic core concepts and the relations between them.

Git data model around Commit with Astah

    • Commit – This concept represents a node in the commit graph. A Commit contains not only the commit message but also the parent Commit, the Tree that represents the directory structure, the Users who are the author and the committer, and the dates of authoring and committing. When you run rebase, cherry-pick, or commit --amend, the Commit ID changes because the parent Commit or the recorded dates change.

 

    • User – This concept represents a person related to a commit. GitHub displays the User as a user icon. The User’s email address is used to identify the person.

 

    • Tree – This concept represents a directory. A Tree contains Trees, which represent subdirectories, and Blobs, which represent files. Each entry has a name that refers to the Tree or Blob.

 

    • Blob – This concept represents a file. Every committed file in the working tree is copied into .git as a Blob, so .git stores every version of every file that has changed. Because Blob objects hold whole file contents, .git/objects tends to become large. To keep it from growing too large, Git compresses the contents and, because the filename is the hash of the contents, stores identical contents only once instead of creating a separate file.

 

    • Branch – A Branch can belong to either the local repository or a remote repository. A Reference under .git/refs/heads is a local branch and a Reference under .git/refs/remotes is a remote branch. The ref named in .git/HEAD is the branch you are currently on. When you commit while a branch is checked out, the branch follows the new commit, but all that really happens is that the contents of the branch file pointed to by .git/HEAD are updated.

 

    • Tag – A Tag is a Reference under .git/refs/tags.
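To see why rebase, cherry-pick, and commit --amend produce a new Commit ID, it helps to hash a commit by hand. The sketch below is a simplified illustration and not Git's own code: the function name commit_id, the user string, and the parent IDs are made up, the same user and timestamp are used for both author and committer, and the timezone is fixed to +0000. The tree ID is the well-known ID of Git's empty tree.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # ID of an empty Tree


def commit_id(tree, parent, user, timestamp, message):
    """Hash a simplified commit object the way Git hashes its objects."""
    lines = [f"tree {tree}"]
    if parent:
        lines.append(f"parent {parent}")
    # Simplification: author and committer share one user and one timestamp.
    lines.append(f"author {user} {timestamp} +0000")
    lines.append(f"committer {user} {timestamp} +0000")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    header = f"commit {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


user = "Alice <alice@example.com>"  # hypothetical User
original = commit_id(EMPTY_TREE, "1" * 40, user, 1700000000, "Add feature")
# Same tree and message, but a different parent (as after a rebase).
rebased = commit_id(EMPTY_TREE, "2" * 40, user, 1700000000, "Add feature")
# Same tree, parent, and message, but a later timestamp (as after an amend).
amended = commit_id(EMPTY_TREE, "1" * 40, user, 1700000060, "Add feature")

print(original != rebased)
print(original != amended)
```

The Tree and the message are identical in all three commits, yet each ID differs, because the parent and the dates are part of the hashed content.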

Git offers many features, but these are its only core models, and the objects based on these concepts work together.

Astah_Git2

Git gives meaning to each object through its filename and directory. I modeled four frequently used Git commands: clone, fetch, push, and pull.

clone

Astah Git Clone

When you run clone, Git gets the GitObjects and References from the specified remote repository and creates the master branch in the local repository.

fetch

Git Fetch

When you run fetch, Git gets the GitObjects and References from the specified remote repository. It updates only the remote branches in the local repository; the local branches themselves are not updated.
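The effect of fetch on References can be modeled with plain dictionaries. This is a toy model under stated assumptions, not how Git is implemented: the function name fetch and the short commit IDs are made up, the remote is assumed to be named origin, and the transfer of GitObjects is left out entirely.

```python
def fetch(local_refs, remote_refs, remote_name="origin"):
    """Return a copy of local_refs with the remote branches refreshed."""
    updated = dict(local_refs)
    prefix = "refs/heads/"
    for name, commit in remote_refs.items():
        if name.startswith(prefix):
            branch = name[len(prefix):]
            # Only refs/remotes/<remote>/... is written; refs/heads/... is not.
            updated[f"refs/remotes/{remote_name}/{branch}"] = commit
    return updated


local = {
    "refs/heads/master": "aaa111",
    "refs/remotes/origin/master": "aaa111",
}
remote = {"refs/heads/master": "bbb222"}  # the remote has moved ahead

after = fetch(local, remote)
print(after["refs/heads/master"])           # aaa111 (local branch untouched)
print(after["refs/remotes/origin/master"])  # bbb222 (remote branch updated)
```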

push

Git Push

When you run push, Git sends GitObjects and References to the specified remote repository. If updating the remote repository’s branch to the local branch is not a fast-forward, push returns an error. push -f skips the fast-forward check. When you specify a branch, Git pushes that branch together with the GitObjects related to it.
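A push is a fast-forward when the commit the remote branch currently points to is an ancestor of the commit being pushed. The following toy model sketches that check on a tiny, made-up commit graph. The names is_ancestor and push, the single-letter commit IDs, and the use of ValueError for a rejected push are all assumptions for illustration.

```python
def is_ancestor(parents, ancestor, descendant):
    """True if `ancestor` can be reached from `descendant` via parent links."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return False


def push(parents, remote_refs, ref, new_commit, force=False):
    """Update a remote ref, rejecting non-fast-forward updates unless forced."""
    old = remote_refs.get(ref)
    if old is not None and not force and not is_ancestor(parents, old, new_commit):
        raise ValueError("rejected: not a fast-forward")
    remote_refs[ref] = new_commit


# Commit graph: A <- B <- C, and D branches off from A.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["A"]}
remote = {"refs/heads/master": "B"}

push(parents, remote, "refs/heads/master", "C")      # fast-forward: accepted
try:
    push(parents, remote, "refs/heads/master", "D")  # C is not an ancestor of D
except ValueError as error:
    print(error)
push(parents, remote, "refs/heads/master", "D", force=True)  # like push -f
print(remote["refs/heads/master"])
```

In this model a forced push simply overwrites the remote branch, which is why commit C is no longer reachable from master afterwards.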

pull

Git Pull

When you run pull, Git gets the GitObjects and References from the specified remote repository and then merges the remote branch into the local branch.

Repository Setup

Repository settings are stored in .git/config. When you open it, it looks like this:


[core]
        repositoryformatversion = 0
        filemode = true
        bare = false
        logallrefupdates = true
        ignorecase = true
        precomposeunicode = false
[remote "origin"]
        url = git@github.com:ChangeVision/astah.git
        fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
        remote = origin
        merge = refs/heads/master

In this sample, the following are defined:

Repository settings

You can find these in [core]. The line bare = false shows that this repository was created as a non-bare repository.

Name and URL of the remote repository

You can find these in [remote "origin"]. The URL of the remote repository named origin is git@github.com:ChangeVision/astah.git, and its fetch refspec is +refs/heads/*:refs/remotes/origin/*.
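The fetch value maps branch names on the remote to remote-branch names in the local repository. As a rough sketch, the mapping for this simple wildcard form can be written as below. The function name map_refspec is made up, and the sketch handles only patterns of the form src/*:dst/*, not every refspec Git accepts. The leading + tells Git to update the local side even when the update is not a fast-forward.

```python
def map_refspec(refspec, remote_ref):
    """Map a ref on the remote to its local name, or None if it does not match."""
    # A leading "+" permits non-fast-forward updates; it is not part of the pattern.
    spec = refspec[1:] if refspec.startswith("+") else refspec
    src, dst = spec.split(":", 1)
    if not (src.endswith("*") and dst.endswith("*")):
        raise ValueError("this sketch only supports 'src/*:dst/*' refspecs")
    src_prefix, dst_prefix = src[:-1], dst[:-1]
    if not remote_ref.startswith(src_prefix):
        return None
    return dst_prefix + remote_ref[len(src_prefix):]


REFSPEC = "+refs/heads/*:refs/remotes/origin/*"
print(map_refspec(REFSPEC, "refs/heads/master"))  # refs/remotes/origin/master
print(map_refspec(REFSPEC, "refs/tags/v1"))       # None
```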

Upstream tracking between local and remote branches

You can find this in [branch "master"]. The upstream of the master branch is set to origin’s refs/heads/master.
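For simple files like this sample, the same information can be read programmatically. The sketch below uses Python's standard configparser module, which happens to cope with this layout but is not a full Git config parser; for real repositories the git config command is the safer choice. The function name upstream_of is made up, and CONFIG_TEXT is a shortened copy of the sample above.

```python
import configparser

CONFIG_TEXT = '''\
[core]
        bare = false
[remote "origin"]
        url = git@github.com:ChangeVision/astah.git
        fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
        remote = origin
        merge = refs/heads/master
'''


def upstream_of(config_text, branch):
    """Return (remote name, upstream ref, remote URL) for a local branch."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    # Section names keep their quotes, e.g. 'branch "master"'.
    section = parser[f'branch "{branch}"']
    remote = section["remote"]
    url = parser[f'remote "{remote}"']["url"]
    return remote, section["merge"], url


print(upstream_of(CONFIG_TEXT, "master"))
```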

That was my brief explanation of Git’s core concepts. I hope you can see that Git is actually simple.
I believe modeling helps you visualize and understand relations between a system’s concepts that are hard to see clearly just by reading the documentation.

How about modeling the concepts of a tool you use? It will help you understand the system more deeply and find better ways to use it!

I used Astah to create these diagrams.
If you want to view and edit the diagrams directly, you can download the file from here and open it with Astah (the free Astah Community edition or the 50-day free trial of Astah Professional).
