This is a simple version, but it does work and has all the basic, most-used capabilities. I built another version earlier that physically moved files around; it was more of a fancy file mover, LOL. This one works more like the actual Git.
Also, some things might not work as expected, so apologies from my side. I am still learning while building, so any suggestions are appreciated.
Not only did I build the Git part, it also has CLI command capabilities.
Credits/References: I used lots of references to build this. I don't wanna lie and say I built it all by myself with no help, as if the code just came to my mind. I used ChatGPT to debug, and I read these blogs:
https://benhoyt.com/writings/pygit/
https://wyag.thb.lt/
I spent 2 days just reading blogs about how others built theirs and which language would be best for my case.
This is the third project I built without watching any YouTube tutorials; everything was done by reading articles, the official Python docs, and blogs. Slowly I am getting into the habit of reading PDFs now; it's more fun, and faster.
To use it : GitHub Link
The commands are listed on my GitHub.
Use the mygit_cli.py version to test the CLI. There is another Python file there called mygit.py; it does not have a CLI and uses print commands for the user instead.
MyGit CLI Overview:
File System: MyGit works with your local file system, tracking changes and creating snapshots.
Git Data Structures:
Tree: Represents the directory structure of your project.
Commit: Snapshots of your project at specific points in time.
Blob: Individual file contents stored as objects.
Hashing:
- SHA-1 Hashing: Identifies every object by a hash computed from its content, so identical content always gets the same ID and any change produces a different one.
Commands:
- mygit init: Initializes a new MyGit repository.
- mygit add: Stages changes for commit, creating Blobs and updating the Tree.
- mygit commit: Creates a new Commit by referencing the current Tree.
- mygit create_branch: Adds new branches.
- mygit switch_branch: Switches between branches.
- mygit log: Displays the commit history.
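The real mygit_cli.py is on GitHub and isn't reproduced here, so what follows is only a sketch of how these subcommands could be wired up with argparse. The attribute names (repo_path, branch, author, message, branch_name, filename, recursive) come from the functions explained below. The option spellings (--repo-path, -m, -r, --branch, --author), their defaults and the build_parser name are assumptions.

```python
import argparse

def build_parser():
    """Hypothetical parser: option spellings are assumptions, attribute names match the functions below."""
    parser = argparse.ArgumentParser(prog="mygit")
    # Shared option; the default mirrors REPO_PATH = "mygit"
    parser.add_argument("--repo-path", dest="repo_path", default="mygit")
    sub = parser.add_subparsers(dest="command", required=True)

    sub.add_parser("init")

    add_p = sub.add_parser("add")
    add_p.add_argument("filename")  # a path, or "." for everything
    add_p.add_argument("-r", "--recursive", action="store_true")

    commit_p = sub.add_parser("commit")
    commit_p.add_argument("-m", "--message", required=True)
    commit_p.add_argument("--author", default="Anonymous")
    commit_p.add_argument("--branch", default=None)  # None means "use the current branch"

    for name in ("create_branch", "switch_branch"):
        branch_p = sub.add_parser(name)
        branch_p.add_argument("branch_name")

    log_p = sub.add_parser("log")
    log_p.add_argument("--branch", default=None)  # None triggers the interactive branch prompt
    return parser
```

For example, `build_parser().parse_args(["commit", "-m", "first"])` gives a namespace with message "first", branch None and repo_path "mygit". That is the kind of object commit(args) expects.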
How MyGit Works - Step by Step:
Initialization (mygit init):
- Creates an empty MyGit repository with the essential directories.
- Initializes the default branch (named "default" in the code below) and records an initial commit.
Adding Changes (mygit add):
- Stages changes by creating Blobs for each file.
- Updates the Tree to reflect the current directory structure.
Committing (mygit commit):
- Creates a Commit object that:
  - References the current Tree.
  - Records the author and commit message.
  - Optionally references the parent commit for history.
- Updates the current branch reference to the new Commit.
Branching (mygit create_branch):
- Adds a new branch by creating a branch reference.
- Allows parallel development on separate branches.
Switching Branches (mygit switch_branch):
- Changes the current working branch.
- Lets developers work on different features or versions.
Viewing Commit History (mygit log):
- Lists the available branches (when no branch is given).
- Displays the detailed commit history for the selected branch.
Code working and Explanation:
ayo master, this is the main stuff you came for, also gib me moni and job if you like me
```python
REPO_PATH = "mygit"
```
Input: There is no direct input to this constant. It simply defines the path to the Git repository.
Output: No output is produced. It's a constant value used throughout the code.
Algorithm (Initialization):
- The constant REPO_PATH is set to "mygit", which is the path to the Git repository.
Mathematics and Intuition:
- REPO_PATH serves as a single place to define the repository path, making it easy to reference throughout the code.
TreeEntry Class:
```python
class TreeEntry:
    def __init__(self, name, is_directory, hash):
        self.name = name
        # Git's mode strings: "40000" for a directory (tree), "100644" for a regular file
        self.mode = "40000" if is_directory else "100644"
        self.hash = hash
```
Input:
- name: Name of the file or directory.
- is_directory: Boolean indicating whether it's a directory (True) or a file (False).
- hash: Hash of the file content (if it's a file) or of the tree object (if it's a directory).
Output: An instance of the TreeEntry class is created with the specified attributes.
Algorithm (Initialization):
- The __init__ method initializes a TreeEntry object with three attributes: name, mode, and hash.
- name is the name of the file or directory.
- is_directory is a boolean flag that determines whether the entry is a directory or a file.
- hash stores the hash associated with the object (the content hash for files and the tree object hash for directories).
- The mode attribute is set based on whether the entry is a directory or a file.
Mathematics and Intuition:
- The TreeEntry class encapsulates file and directory information in Git's tree structure. It abstracts the details of storing mode, name, and hash for each entry.
- The mode attribute follows Git's conventions: "40000" for directories and "100644" for regular files.
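Tree Class:
```python
class Tree:
    def __init__(self):
        self.entries = {}  # maps entry name -> TreeEntry

    def hash(self):
        data = []
        # Sort by name so the same set of entries always produces the same hash
        for name, entry in sorted(self.entries.items()):
            # MyGit's format: "<mode> <name>\0<hex hash>" (real Git stores the hash as 20 raw bytes)
            data.append(f"{entry.mode} {name}\0{entry.hash}")
        return hash_object("".join(data).encode(), "tree")
```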
Input: The Tree class has no direct input. It initializes an empty tree with an entries dictionary.
Output: The hash method returns the SHA-1 hash of the tree object represented by entries.
Algorithm (Initialization):
- The __init__ method initializes an empty Tree object with an empty entries dictionary.
Algorithm (Hash Calculation - hash method):
To calculate the hash of the Tree object:
1. Create an empty list data to store the serialized entries.
2. Iterate through the entries dictionary, sorted by entry name.
3. For each entry, build the string "<mode> <name>\0<hash>": a space between mode and name, and a null character between name and hash.
4. Append that string to the data list.
5. Join all strings in data into a single string and encode it to bytes.
6. Compute the SHA-1 hash of the encoded data using the hash_object function (which also stores the object).
7. Return the computed SHA-1 hash.
Mathematics and Intuition:
- The Tree class represents a Git tree object, which organizes file and directory entries within a Git repository.
- The hash method calculates a unique identifier (hash) for the tree object from the mode, name, and hash of each entry. Because the entries are sorted first, the result depends only on the contents, not on insertion order. This hash is crucial for tracking and managing changes within the repository.
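MyGit's Tree.hash writes each entry's hash as 40 hex characters. For comparison, real Git's tree format stores each entry as `<mode> <name>` plus a null byte, followed by the raw 20-byte hash. Git also sorts directory names as if they ended in "/". The sketch below shows that format using only hashlib. The function name git_tree_hash and the (mode, name, hex_hash) tuple input are made up for illustration. It is meant as a comparison, not a drop-in replacement for Tree.hash.

```python
import hashlib

def git_tree_hash(entries):
    """entries: list of (mode, name, hex_sha) tuples. Returns the Git-style tree SHA-1."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git compares directory names as if they had a trailing "/"
        return name + "/" if mode == "40000" else name

    body = b""
    for mode, name, hex_sha in sorted(entries, key=sort_key):
        # "<mode> <name>", a NUL separator, then the 20 raw bytes of the hash
        body += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_sha)
    header = f"tree {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()
```

The sorting happens inside the function, so callers can pass entries in any order. An empty list gives the well-known empty-tree ID 4b825dc642cb6eb9a060e54bf8d69288fbee4904.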
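hash_object Function:
```python
import hashlib

def hash_object(data, obj_type):
    # Header: object type, a space, the content length in bytes, then a null byte
    header = f"{obj_type} {len(data)}\0"
    full_data = header.encode() + data
    sha1_hash = hashlib.sha1(full_data).hexdigest()
    # Store the object under its hash (uncompressed; real Git zlib-compresses it)
    with open(f"{REPO_PATH}/objects/{sha1_hash}", "wb") as f:
        f.write(full_data)
    return sha1_hash
```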
Input:
- data: The bytes to be hashed, typically the content of a Git object (e.g., file content or a serialized tree).
- obj_type: The type of the Git object, such as "blob" for file content, "tree" for a tree, or "commit" for a commit.
Output: The function returns the SHA-1 hash (a 40-character hex string) of the header plus data, and writes the object to the repository.
Algorithm:
To calculate the SHA-1 hash of the provided data for a Git object:
1. Construct a header string containing the object type and the length of the data, separated by a space and terminated by a null character.
2. Encode the header as bytes and prepend it to data, storing the result in full_data.
3. Calculate the SHA-1 hash of full_data.
4. Create a file in the repository's "objects" directory with the SHA-1 hash as the filename, and write full_data to it.
5. Return the computed SHA-1 hash.
Mathematics and Intuition:
- The hash_object function generates the identifier (hash) for Git objects, which is essential for Git's version control system. It constructs a header that includes metadata about the object type and data length.
- The SHA-1 hash is computed over the header and data together, so two objects with the same bytes but different types get different identifiers.
- This hash is then used for storing and retrieving Git objects within the repository.
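MyGit writes each object uncompressed to objects/<full hash>. Real Git hashes exactly the same header-plus-content bytes, but it stores them zlib-compressed under a two-character fan-out directory (objects/d6/70460b...). Here is a minimal sketch of that storage layout. The function names write_object and read_object are hypothetical, and the repository path is passed in explicitly.

```python
import hashlib
import os
import zlib

def write_object(repo_path, data, obj_type):
    """Store an object Git-style: zlib-compressed, under objects/<first 2 hex>/<other 38>."""
    full_data = f"{obj_type} {len(data)}".encode() + b"\0" + data
    sha1_hash = hashlib.sha1(full_data).hexdigest()
    obj_dir = os.path.join(repo_path, "objects", sha1_hash[:2])
    os.makedirs(obj_dir, exist_ok=True)
    obj_path = os.path.join(obj_dir, sha1_hash[2:])
    if not os.path.exists(obj_path):  # same hash means same bytes, so never rewrite
        with open(obj_path, "wb") as f:
            f.write(zlib.compress(full_data))
    return sha1_hash

def read_object(repo_path, sha1_hash):
    """Return (obj_type, data) for an object stored by write_object."""
    path = os.path.join(repo_path, "objects", sha1_hash[:2], sha1_hash[2:])
    with open(path, "rb") as f:
        full_data = zlib.decompress(f.read())
    header, _, data = full_data.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(data):
        raise ValueError(f"corrupt object {sha1_hash}: size mismatch")
    return obj_type, data
```

Storing b"test content\n" as a blob yields d670460b4b4aece5915caf5c68d12f560a9fe3e4, the same ID `git hash-object` reports for that content. The fan-out directory keeps any single folder from holding every object in the repository.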
create_tree Function:
```python
import os

def create_tree():
    tree = Tree()
    for root, dirs, files in os.walk(".", topdown=True):
        if root == ".":
            # Don't snapshot the repository's own storage (or a real .git folder)
            dirs[:] = [d for d in dirs if d not in (REPO_PATH, ".git")]
        subtree = Tree()
        for dir_name in dirs:
            # Subdirectories are listed with an empty hash; each gets its own entry in `tree`
            subtree.entries[dir_name] = TreeEntry(dir_name, True, "")
        for file_name in files:
            file_path = os.path.join(root, file_name)
            with open(file_path, "rb") as f:
                file_content = f.read()
            file_hash = hash_object(file_content, "blob")
            subtree.entries[file_name] = TreeEntry(file_name, False, file_hash)
        dir_name = os.path.relpath(root, start=".")
        if dir_name == ".":
            dir_name = ""
        tree.entries[dir_name] = TreeEntry(dir_name, True, subtree.hash())
    return tree.hash()
```
Input: None
Output: The function returns the SHA-1 hash of a tree object that represents the current state of the working directory.
Algorithm:
1. Create an empty tree object.
2. Traverse the directory structure using os.walk, starting from the current directory (".").
3. For each directory encountered, create a subtree object and add entries for its subdirectories (with empty hashes).
4. For each file in that directory, calculate the SHA-1 hash of its content using hash_object and add it to the subtree.
5. Determine the directory's path relative to "." (dir_name, or "" for the top level) and add an entry for it in tree, pointing to the subtree's hash.
6. Repeat steps 3-5 for every directory in the working directory.
7. Return the SHA-1 hash of the tree object.
Mathematics and Intuition:
- The create_tree function generates a tree object that represents the current state of the working directory in a Git repository. It walks the whole directory structure with os.walk and hashes the content of every file.
- The result is one level deep: the top-level tree maps each relative directory path (for example "src/utils") to that directory's subtree, instead of nesting trees inside trees the way Git does. Any file change still changes its subtree's hash, and therefore the top-level hash.
- The tree's SHA-1 hash serves as a unique identifier for the state of the working directory at a specific point in time.
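In create_tree, the subdirectory entries inside each subtree carry an empty hash, so the trees are not nested. A recursive alternative builds each child tree first and puts its hash into the parent's entry, which is how Git nests trees. The sketch below is hypothetical. build_tree and hash_bytes are made-up names. It reuses MyGit's textual tree format (mode, space, name, NUL, hex hash) but keeps objects in an in-memory dict instead of writing files. It also skips any entry named in the assumed skip parameter at every level.

```python
import hashlib
import os

def hash_bytes(data, obj_type, store):
    """Hash header + data (the same scheme as hash_object) and keep the object in a dict."""
    full_data = f"{obj_type} {len(data)}".encode() + b"\0" + data
    sha1_hash = hashlib.sha1(full_data).hexdigest()
    store[sha1_hash] = full_data
    return sha1_hash

def build_tree(path, store, skip=("mygit", ".git")):
    """Recursively hash `path`: files become blobs, subdirectories become nested trees."""
    lines = []
    for name in sorted(os.listdir(path)):
        if name in skip:
            continue
        full = os.path.join(path, name)
        if os.path.isdir(full):
            mode, obj_hash = "40000", build_tree(full, store, skip)  # child tree first
        else:
            with open(full, "rb") as f:
                mode, obj_hash = "100644", hash_bytes(f.read(), "blob", store)
        lines.append(f"{mode} {name}\0{obj_hash}")
    return hash_bytes("".join(lines).encode(), "tree", store)
```

A nice side effect is deduplication. Two folders with identical contents produce the same subtree hash, so for a root holding two identical folders the store ends up with one blob, one shared subtree and the root tree.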
commit Function:
```python
def commit(args):
    repo_path = args.repo_path  # Get the repository path from args
    tree_hash = create_tree()
    # Determine the current branch dynamically if not provided
    current_branch = args.branch
    if current_branch is None:
        current_branch = get_current_branch(repo_path)
    parent_commit = get_head_commit(current_branch)
    # Header lines first (tree, optional parent, author, committer), then a blank line and the message
    commit_data = f"tree {tree_hash}\n"
    if parent_commit:
        commit_data += f"parent {parent_commit}\n"
    commit_data += f"author {args.author}\ncommitter {args.author}\n\n{args.message}\n"
    commit_hash = hash_object(commit_data.encode(), "commit")
    # Update the branch reference to the new commit
    branch_ref_path = os.path.join(repo_path, "refs", "heads", current_branch)
    with open(branch_ref_path, "w") as ref_file:
        ref_file.write(commit_hash)
    print(f"Committed: {commit_hash[:7]} to branch {current_branch}")
    return current_branch
```
Input:
- args: An object with attributes (such as the parsed command-line arguments) holding the repository path, author, message, and optional branch name.
Output: The function returns the name of the branch to which the commit was made.
Algorithm:
1. Get the repository path from the provided args.
2. Calculate the SHA-1 hash of the tree object representing the current state of the working directory using the create_tree function.
3. Use the branch from args if one was provided; otherwise determine the current branch with the get_current_branch function.
4. Get the SHA-1 hash of the head commit on the current branch (if it exists) using the get_head_commit function.
5. Construct the commit data:
   - Include the tree hash, author, committer, and commit message.
   - Include the parent commit hash if there is a previous commit on the branch.
6. Calculate the SHA-1 hash of the commit data using the hash_object function.
7. Update the branch reference to point to the new commit in the repository.
8. Print a confirmation message with the commit's short SHA-1 hash and the branch it was committed to.
9. Return the name of the current branch.
Mathematics and Intuition:
- The commit function is responsible for creating a new commit in a Git repository. It calculates the SHA-1 hash of the tree object representing the current state of the working directory and constructs commit data that includes this tree hash, author information, committer information, and a commit message.
- If it's not the initial commit, it also includes a reference to the parent commit.
- The resulting commit object is identified by its SHA-1 hash, and the branch reference is updated to point to this new commit, effectively advancing the branch.
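Git writes the commit header in a fixed order: the tree line, then zero or more parent lines, then author and committer, a blank line, and finally the message. Keeping parent in the header, rather than after the message, means a message line that happens to start with "parent " can't be mistaken for a parent reference. Below is a minimal sketch of formatting and parsing that layout. format_commit and parse_commit are hypothetical names. Real Git author lines also carry an email address and a timestamp; they are left out here, as MyGit leaves them out.

```python
def format_commit(tree_hash, author, message, parents=()):
    """Build commit text with the header fields in Git's order."""
    lines = [f"tree {tree_hash}"]
    lines += [f"parent {p}" for p in parents]  # none for a root commit, two for a merge
    lines += [f"author {author}", f"committer {author}", "", message]
    return "\n".join(lines) + "\n"

def parse_commit(text):
    """Split commit text into a header dict (parents as a list) and the message."""
    header_text, _, message = text.partition("\n\n")  # first blank line ends the header
    fields = {"parent": []}
    for line in header_text.split("\n"):
        key, _, value = line.partition(" ")
        if key == "parent":
            fields["parent"].append(value)
        else:
            fields[key] = value
    return fields, message.rstrip("\n")
```

With this layout, `parse_commit(format_commit("abc123", "John Doe", "parent of all bugs", ["def456"]))` reports exactly one parent, "def456", even though the message itself starts with "parent".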
get_head_commit Function:
```python
def get_head_commit(branch="master"):
    head_file_path = os.path.join(REPO_PATH, "refs", "heads", branch)
    if os.path.isfile(head_file_path):
        # An existing but empty ref (a branch with no commits yet) returns ""
        with open(head_file_path, "r") as ref_file:
            return ref_file.read().strip()
    return None
```
Input:
- branch (optional): The name of the branch to retrieve the head commit from. If not provided, it defaults to "master".
Output: The function returns the SHA-1 hash of the commit at the head of the specified branch, or None if the branch doesn't exist.
Algorithm:
1. Determine the path to the branch reference file based on the provided or default branch name.
2. If the branch reference file exists: a. Read its content, which is the SHA-1 hash of the head commit. b. Return the SHA-1 hash.
3. If the branch reference file doesn't exist (the branch doesn't exist), return None.
Mathematics and Intuition:
- The get_head_commit function retrieves the SHA-1 hash of the commit at the head of a specified branch. It does this by reading the branch reference file, which stores the SHA-1 hash of the head commit.
- If the branch doesn't exist, it returns None.
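commit calls get_current_branch(repo_path), which isn't shown in this post. update_head_ref writes HEAD as `ref: refs/heads/<branch>`, so a minimal hypothetical version can read that line back and strip the prefix. Two behaviours here are assumptions: it falls back to "default" when HEAD is missing, and it raises an error when HEAD holds something other than a branch reference.

```python
import os

def get_current_branch(repo_path):
    """Return the branch HEAD points to, e.g. 'default' for 'ref: refs/heads/default'."""
    head_path = os.path.join(repo_path, "HEAD")
    if not os.path.isfile(head_path):
        return "default"  # assumption: fall back to the branch init creates
    with open(head_path, "r") as head_file:
        content = head_file.read().strip()
    prefix = "ref: refs/heads/"
    if not content.startswith(prefix):
        raise ValueError(f"HEAD is not a branch reference: {content!r}")
    return content[len(prefix):]
```

Combined with the function above, `get_head_commit(get_current_branch(repo_path))` gives the commit the repository is currently on.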
create_default_branch Function:
```python
def create_default_branch(repo_path):
    branch_name = "default"
    branch_ref_path = os.path.join(repo_path, "refs", "heads", branch_name)
    # An empty ref file means "the branch exists but has no commits yet"
    with open(branch_ref_path, "w") as ref_file:
        ref_file.write("")
    update_head_ref(repo_path, branch_name)  # point HEAD at the new branch
    return branch_name
```
Input:
- repo_path: The path to the Git repository.
Output: The function returns the name of the default branch, which is "default".
Algorithm:
1. Set the default branch name to "default".
2. Determine the path to the branch reference file for the default branch.
3. Create an empty branch reference file for the default branch.
4. Update the HEAD reference to point to the default branch.
5. Return the name of the default branch, "default".
Mathematics and Intuition:
- The create_default_branch function sets up a new repository's first branch, named "default". It makes HEAD point to that branch, making it the initial branch of the repository.
create_branch Function:
```python
def create_branch(args):
    branch_name = args.branch_name
    branch_ref_path = os.path.join(args.repo_path, "refs", "heads", branch_name)
    if os.path.isfile(branch_ref_path):
        print(f"Branch '{branch_name}' already exists.")
    else:
        # New branches start with an empty ref (no commits yet)
        with open(branch_ref_path, "w") as ref_file:
            ref_file.write("")
        print(f"Created branch '{branch_name}'.")
        # Automatically switch to the newly created branch
        switch_branch(args)
    return branch_name
```
Input:
- args: An object with attributes holding the repository path (repo_path) and the name of the branch to be created (branch_name).
Output: The function returns the branch name.
Algorithm:
1. Get the desired branch name from the provided args.
2. Determine the path to the branch reference file for the new branch.
3. Check whether the branch reference file already exists.
4. If it exists, print a message saying that the branch already exists.
5. If it doesn't exist (new branch), create an empty branch reference file for it and print a message confirming that the branch was created.
6. Automatically switch to the newly created branch using the switch_branch function.
7. Return the branch name.
Mathematics and Intuition:
- The create_branch function creates a new branch in the Git repository. It checks whether the branch already exists and, if not, creates a new branch reference file for it.
- It then switches to the newly created branch, so you start working on it immediately.
- Because the new reference file is empty, the branch does not start at the current commit: the first commit made on it has no parent.
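Real Git starts a new branch at the commit you are currently on, so the two branches share history. Below is a hypothetical variant, create_branch_at_head, that does the same by copying the current branch's commit hash into the new ref. It takes plain arguments instead of args. It assumes HEAD is in the `ref: refs/heads/<branch>` form that update_head_ref writes.

```python
import os

def create_branch_at_head(repo_path, new_branch):
    """Hypothetical: create refs/heads/<new_branch> at the commit the current branch points to."""
    prefix = "ref: refs/heads/"
    with open(os.path.join(repo_path, "HEAD")) as head_file:
        head = head_file.read().strip()
    if not head.startswith(prefix):
        raise ValueError(f"HEAD is not a branch reference: {head!r}")
    current_branch = head[len(prefix):]
    heads_dir = os.path.join(repo_path, "refs", "heads")
    new_ref = os.path.join(heads_dir, new_branch)
    if os.path.exists(new_ref):
        raise FileExistsError(f"Branch '{new_branch}' already exists.")
    with open(os.path.join(heads_dir, current_branch)) as ref_file:
        commit_hash = ref_file.read().strip()  # "" if the current branch has no commits yet
    with open(new_ref, "w") as ref_file:
        ref_file.write(commit_hash)
    return commit_hash
```

commit looks up the parent through get_head_commit. So after this, the first commit on the new branch would record the old tip as its parent, and log would walk back into the shared history.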
switch_branch Function:
```python
def switch_branch(args):
    branch_name = args.branch_name
    # Remember which branch we are on, so it can be returned if the switch is aborted
    current_branch = get_current_branch(args.repo_path)
    if not branch_exists(branch_name, args.repo_path):
        print(f"Branch '{branch_name}' does not exist. Would you like to create it? (y/n)")
        choice = input().strip()
        if choice.lower() == 'y':
            create_branch(args)
        else:
            print("Aborted branch switch.")
            return current_branch
    # Only HEAD moves; files in the working directory are not changed
    update_head_ref(args.repo_path, branch_name)
    return branch_name
```
Input:
- args: An object with attributes holding the repository path (repo_path) and the name of the branch to switch to (branch_name).
Output: The function returns the name of the branch that was switched to (or the current branch if the switch was aborted).
Algorithm:
1. Get the desired branch name from the provided args.
2. Look up the current branch with get_current_branch.
3. Check whether the desired branch exists using the branch_exists function.
4. If the desired branch does not exist: a. Ask whether you want to create the branch. b. Read the user's choice ('y' for yes, anything else for no). c. If the user chooses 'y', create the branch using the create_branch function. d. Otherwise, print a message saying that the branch switch was aborted and return the current branch.
5. Update the HEAD reference to point to the desired branch.
6. Return the name of the branch that was switched to.
Mathematics and Intuition:
- The switch_branch function lets you switch to a different branch within the Git repository. It first checks whether the desired branch exists by calling branch_exists.
- If the branch exists, it updates the HEAD reference to point to the desired branch, effectively switching branches.
- If the branch doesn't exist, it offers to create it if the user chooses to do so.
branch_exists Function:
```python
def branch_exists(branch_name, repo_path):
    branch_ref_path = os.path.join(repo_path, "refs", "heads", branch_name)
    return os.path.isfile(branch_ref_path)
```
Input:
- branch_name: The name of the branch to check.
- repo_path: The path to the Git repository.
Output: The function returns True if the branch exists, and False otherwise.
Algorithm:
1. Determine the path to the branch reference file for the specified branch.
2. Check whether the branch reference file exists.
3. Return True if the file exists (the branch exists), or False if it doesn't.
Mathematics and Intuition:
- The branch_exists function checks whether a specified branch exists within the Git repository. It does this by checking for the presence of the branch's reference file.
- If the file exists, it returns True, indicating that the branch exists. Otherwise, it returns False.
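branch_exists and create_branch join the branch name straight into a file path. That means a name like ../../something would point outside refs/heads. Below is a simplified check that could run before any of these functions touch the file system. It is loosely based on a few of the rules in git check-ref-format and is not the complete rule set. is_valid_branch_name is a hypothetical helper.

```python
FORBIDDEN_CHARS = set(" ~^:?*[\\")

def is_valid_branch_name(name):
    """Return False for names that break a subset of the git check-ref-format rules."""
    if not name or name == "@" or name.startswith("-"):
        return False
    if ".." in name or "@{" in name or name.endswith("."):
        return False
    if any(ch in FORBIDDEN_CHARS or ord(ch) < 32 or ord(ch) == 127 for ch in name):
        return False
    # Each slash-separated part must be non-empty, must not start with "."
    # and must not end with ".lock" (this also rejects leading, trailing and double slashes)
    return all(part and not part.startswith(".") and not part.endswith(".lock")
               for part in name.split("/"))
```

Names such as "feature/login" pass this check. They would also need the intermediate refs/heads/feature directory created before open() can write the ref file.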
log Function:
```python
def log(args):
    repo_path = args.repo_path  # Get the repository path from args
    branch_name = args.branch
    if branch_name is None:
        heads_dir = os.path.join(repo_path, "refs", "heads")
        branches = [branch for branch in os.listdir(heads_dir) if os.path.isfile(os.path.join(heads_dir, branch))]
        if not branches:
            print("No branches found.")
            return
        print("Available branches:")
        for branch in branches:
            print(f"- {branch}")
        branch_choice = input("Enter the branch name to view commits (or 'default' for default): ").strip()
        branch_name = branch_choice if branch_choice else "default"
    if not branch_exists(branch_name, repo_path):
        print(f"Branch '{branch_name}' does not exist.")
        return
    with open(os.path.join(repo_path, "refs", "heads", branch_name), "r") as ref_file:
        latest_commit = ref_file.read().strip()
    commit = latest_commit
    while commit:
        commit_path = os.path.join(repo_path, "objects", commit)
        with open(commit_path, "r") as commit_file:
            # Drop the "commit <size>\0" header that hash_object stored in front of the data
            commit_data = commit_file.read().split("\0", 1)[-1]
        print(f"Commit: {commit[:7]}")
        lines = commit_data.split("\n")
        for line in lines:
            if line.startswith("author "):
                print(line)
        print(commit_data)
        parent_commit = None
        for line in lines:
            # commit writes "parent <hash>" (no colon)
            if line.startswith("parent "):
                parent_commit = line.split(" ", 1)[1]
                break
        if parent_commit:
            print(f"Parent: {parent_commit}\n")
            commit = parent_commit
        else:
            break
```
Input:
- args: An object with attributes holding the branch to log (branch) and the repository path (repo_path).
Output: The function prints the commit history for the specified or chosen branch.
Algorithm:
1. Get the repository path and branch name from the provided args.
2. If no branch is specified, list the available branches in the repository and prompt the user to enter the branch they want to view. If the user enters nothing, use "default".
3. Check whether the branch exists using the branch_exists function. If it doesn't, print a message saying so and stop.
4. Read the SHA-1 hash of the latest commit on the branch from the branch's reference file.
5. Loop through the commit history: read and display the commit information, including the author, the commit message, and the parent commit.
6. Set commit to the parent commit and repeat until there are no more parent commits.
Mathematics and Intuition:
- The log function shows the commit history for a specified branch (or one chosen interactively if none is specified). It retrieves the commit information, including the author, commit message, and parent commit, and prints it in a structured format.
- The function walks the history by following parent links, one commit at a time, until it reaches a commit with no parent.
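Following parent links is essentially a linked-list walk that starts at the branch tip. Below is a minimal sketch of that loop, kept separate from the file system. read_object is any callable that returns the commit text for a hash; for MyGit that would mean reading objects/<hash> and dropping the "commit <size>" header. A seen set guards against a corrupted history that loops back on itself. The sketch assumes the parent line sits in the header, before the first blank line, as Git orders it. walk_history is a hypothetical name.

```python
def walk_history(read_object, start_hash):
    """Yield (hash, commit_text) from start_hash back through first parents."""
    seen = set()
    commit_hash = start_hash
    while commit_hash and commit_hash not in seen:
        seen.add(commit_hash)
        text = read_object(commit_hash)
        yield commit_hash, text
        parent = None
        for line in text.split("\n"):
            if line == "":  # blank line ends the header; the message follows
                break
            if line.startswith("parent "):
                parent = line[len("parent "):]
                break
        commit_hash = parent
```

Because it stops scanning at the blank line, a commit message containing "parent c9" is never treated as a link. An empty start hash, such as the ref of a brand-new branch, yields nothing.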
add Function:
```python
def add(args):
    if args.filename == ".":
        # Adds all files in the current directory (and subdirectories when recursive is set)
        for root, dirs, files in os.walk(".", topdown=True):
            if root == ".":
                # Never stage the repository's own files
                dirs[:] = [d for d in dirs if d != args.repo_path]
            for file_name in files:
                file_path = os.path.join(root, file_name)
                add_file(args.repo_path, file_path)
            if not args.recursive:
                break
    else:
        # Adds the specified file
        add_file(args.repo_path, args.filename)
```
Input:
- args: An object with attributes holding the file to be added (filename), whether to add files recursively (recursive), and the repository path (repo_path).
Output: The function adds the specified file(s) to the Git repository's index.
Algorithm:
1. Check whether filename in args is "." (meaning: add all files in the current directory).
2. If it is ".", iterate through the directory structure using os.walk: a. For each file encountered, call the add_file function to add it to the repository's index. b. If the recursive flag is not set, stop after the top-level directory.
3. If filename is not ".", call the add_file function to add that single file to the repository's index.
Mathematics and Intuition:
- The add function is responsible for adding files to the Git repository's index, preparing them for the next commit. It can add either a single specified file or all files in the current directory (and, with the recursive flag, its subdirectories).
- The function iterates through the directory structure to find and add files, and it respects the recursive flag to control the depth of the search.
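add_file is called above but not shown in the post, and the overview describes staging as creating blobs. One hypothetical way to implement a staging area is to store the blob the same way hash_object does and record path -> blob hash in an index. The index.json file name and its JSON format are assumptions; real Git uses a binary index file.

```python
import hashlib
import json
import os

def add_file(repo_path, file_path):
    """Hypothetical add_file: store the file as a blob and record it in <repo>/index.json."""
    with open(file_path, "rb") as f:
        data = f.read()
    full_data = f"blob {len(data)}".encode() + b"\0" + data
    blob_hash = hashlib.sha1(full_data).hexdigest()
    os.makedirs(os.path.join(repo_path, "objects"), exist_ok=True)
    with open(os.path.join(repo_path, "objects", blob_hash), "wb") as f:
        f.write(full_data)

    index_path = os.path.join(repo_path, "index.json")
    index = {}
    if os.path.isfile(index_path):
        with open(index_path) as f:
            index = json.load(f)
    # Normalise "./a.txt" and "a.txt" to the same key
    index[os.path.normpath(file_path)] = blob_hash
    with open(index_path, "w") as f:
        json.dump(index, f, indent=2, sort_keys=True)
    return blob_hash
```

With an index like this, commit could build its tree from the staged hashes instead of re-walking the working directory. That is the role the index plays in real Git.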
init Function:
```python
def init(repo_path):
    # os.makedirs raises FileExistsError if the repository already exists
    os.makedirs(os.path.join(repo_path, "objects"))
    os.makedirs(os.path.join(repo_path, "refs", "heads"))
    create_default_branch(repo_path)
    # Snapshot the working directory as the first commit
    tree_hash = create_tree()
    commit_data = f"tree {tree_hash}\nauthor Anonymous\ncommitter Anonymous\n\nInitial commit\n"
    initial_commit_hash = hash_object(commit_data.encode(), "commit")
    update_head_ref(repo_path, "default", initial_commit_hash)
    print("Initialized empty MyGit repository with an initial commit.")
```
Input:
- repo_path: The path where the Git repository will be initialized.
Output: The function creates the repository structure with an initial commit and prints a confirmation message.
Algorithm:
1. Create the directory structure for the Git repository:
   - Create the "objects" directory to store Git objects.
   - Create the "refs/heads" directory to store branch references.
2. Create the default branch using the create_default_branch function.
3. Calculate the SHA-1 hash of the tree object representing the initial state of the working directory using the create_tree function.
4. Construct the commit data for the initial commit, including the tree hash, author, committer, and commit message.
5. Calculate the SHA-1 hash of the commit data using the hash_object function.
6. Update the HEAD reference to point to the default branch, and the default branch to point to the initial commit.
7. Print a confirmation message.
Mathematics and Intuition:
- The init function creates a new Git repository and sets up the initial default branch. It calculates the tree hash and creates an initial commit that represents the initial state of the working directory.
- It points HEAD at the default branch, making it the starting point for version control.
update_head_ref Function:
```python
def update_head_ref(repo_path, branch_name, commit_hash=None):
    # HEAD always names the current branch (a symbolic reference)
    head_ref_path = os.path.join(repo_path, "HEAD")
    with open(head_ref_path, "w") as head_file:
        head_file.write(f"ref: refs/heads/{branch_name}\n")
    # If a commit is given, move the branch itself to that commit
    if commit_hash is not None:
        branch_ref_path = os.path.join(repo_path, "refs", "heads", branch_name)
        with open(branch_ref_path, "w") as branch_file:
            branch_file.write(commit_hash)
```
Input:
- repo_path: The path to the Git repository.
- branch_name: The name of the branch HEAD should point to.
- commit_hash (optional): The SHA-1 hash of the commit to point the branch to. If it is not provided, only HEAD is updated.
Output: The function updates HEAD to point to the specified branch and, if a commit hash is given, updates that branch to point to the commit.
Algorithm:
1. Determine the path to the HEAD reference file.
2. If a commit_hash is provided: a. Write a reference to the specified branch in the HEAD file. b. Determine the path to the branch reference file for the specified branch. c. Write commit_hash to the branch reference file, so the branch points to that commit.
3. If no commit_hash is provided, only write a reference to the specified branch in the HEAD file, marking it as the current branch.
Mathematics and Intuition:
- The update_head_ref function is responsible for updating the HEAD reference in a Git repository. HEAD always points to a branch, never directly to a commit; when a commit hash is given, the branch reference is moved to that commit.
- Together, these two files determine which branch and commit the repository considers current. The files in the working directory are not touched.
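update_head_ref and commit overwrite ref files in place with open(..., "w"). A crash in the middle of that write could leave a truncated or empty ref. A common pattern is to write to a temporary file in the same directory and then swap it in with os.replace, which renames atomically on POSIX systems. Git itself uses <ref>.lock files for a similar purpose. write_ref is a hypothetical helper, not part of MyGit.

```python
import os
import tempfile

def write_ref(path, value):
    """Replace the file at `path` with `value` plus a newline, atomically on POSIX."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-ref-")
    try:
        with os.fdopen(fd, "w") as tmp_file:
            tmp_file.write(value + "\n")
            tmp_file.flush()
            os.fsync(tmp_file.fileno())  # make sure the bytes are on disk before the swap
        os.replace(tmp_path, path)  # readers see either the old ref or the new one
    except BaseException:
        os.remove(tmp_path)
        raise
```

get_head_commit and log both call .strip() on what they read, so the trailing newline is harmless there.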
Code ends here, buddy; below is just the theory behind the data structures I used.
Data Structure Used:
1. Tree Structure and TreeEntry
In the Git code, the tree structure (Tree class) represents the hierarchical organization of files and directories within the repository. Mathematically, it can be defined as follows:
- Let T represent a Git tree object. T can be defined as an ordered set of TreeEntry objects: T = {entry_1, entry_2, ..., entry_n}, where each entry_i corresponds to a file or directory.
- Each TreeEntry object has the following attributes:
  - name_i: The name of the file or directory (a string).
  - mode_i: The mode or permissions of the entry (a string, e.g., "100644" for files and "40000" for directories).
  - hash_i: The SHA-1 hash of the file's content, or of the subtree for a directory (a string).
Example:
Consider a simple directory structure within the repository:
```
Root/
|-- File1.txt (mode: 100644, hash: abc123)
|-- Directory1/
|   |-- File2.txt (mode: 100644, hash: def456)
|-- Directory2/
|   |-- File3.txt (mode: 100644, hash: ghi789)
```
In this example, the Tree object T can be represented as:
```
T = {
    TreeEntry(name="File1.txt", mode="100644", hash="abc123"),
    TreeEntry(name="Directory1", mode="40000", hash="..."),
    TreeEntry(name="Directory2", mode="40000", hash="..."),
}
```
Why This Choice?
Efficient Representation: The tree structure efficiently captures the hierarchical nature of files and directories within a repository. It organizes them into a structured format that is easy to traverse and manipulate.
Hashing: Each entry's hash ensures that changes in file content or structure result in a different tree hash, providing a means to track changes.
2. Hash Calculation (hash_object Function)
In the Git code, the hash_object function calculates the SHA-1 hash of Git objects, including blobs (file content), trees (directory structure), and commits (version snapshots). Here's a more mathematical explanation:
Mathematical Explanation:
- The SHA-1 hash function, denoted SHA1, takes an input binary string data and computes a 160-bit (20-byte) hash value. It can be expressed as SHA1(data) = hash_value, where hash_value is usually written as a 40-character hexadecimal string.
- SHA1 operates on binary sequences of any length and always produces a fixed-size output.
- hash_object applies it to the header plus the content: id = SHA1("<type> <size>\0" + data).
Example:
Let's say we have a blob object with binary data data_blob:
```
data_blob = "This is some content."
```
Using the SHA-1 hash calculation, we get a 40-character hexadecimal digest:
```
SHA1(data_blob) = "<40 hex characters>"
```
Why This Choice?
Cryptographic Properties: SHA-1 was chosen because it is a cryptographic hash: different inputs are overwhelmingly likely to get different hash values, and accidental collisions are practically impossible. Deliberate SHA-1 collisions have been demonstrated (the 2017 SHAttered attack), which is why Git has added optional support for SHA-256 object IDs.
Efficiency: SHA-1 produces a fixed-size output (160 bits) and is fast to compute, which suits Git's purposes. It balances the need for uniqueness and efficiency in tracking changes.
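hash_object doesn't hash the content alone. It hashes the header "blob <size>" plus a null byte, followed by the content, so an object's ID differs from a plain SHA-1 of the file. Here is a short hashlib check using empty content, for which both values are well known. raw_sha1 and blob_id are illustrative names.

```python
import hashlib

def raw_sha1(data):
    """Plain SHA-1 of the bytes, with no Git header."""
    return hashlib.sha1(data).hexdigest()

def blob_id(data):
    """Object ID the way hash_object computes it: header, NUL byte, then the content."""
    return hashlib.sha1(f"blob {len(data)}".encode() + b"\0" + data).hexdigest()

print(raw_sha1(b""))  # da39a3ee5e6b4b0d3255bfef95601890afd80709
print(blob_id(b""))   # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(len(raw_sha1(b"This is some content.")))  # 40
```

The second value matches what `git hash-object` reports for an empty file. This is why MyGit's object IDs line up with real Git blob IDs for the same content.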
3. Commit Data Structure
In the Git code, a commit object represents a version snapshot of the repository. Here's a mathematical view:
Mathematical Explanation:
- A Git commit can be represented as a tuple of attributes: Commit = (tree_hash, author, committer, message, parent_commit).
- Each attribute can be described as:
  - tree_hash: The SHA-1 hash of the corresponding tree object (a string).
  - author: Author information (a string).
  - committer: Committer information (a string).
  - message: Commit message (a string).
  - parent_commit: An optional reference to the parent commit (a string, or null for the initial commit).
Example:
Consider a simple commit:
```
Tree Hash: abc123
Author: John Doe
Committer: John Doe
Message: Initial commit
Parent Commit: None
```
Mathematically, this commit can be represented as:
```
Commit = ("abc123", "John Doe", "John Doe", "Initial commit", None)
```
Why This Choice?
Structured Data: Using a structured tuple to represent commit data allows Git to organize and store essential information in a consistent format.
Parent Commit Reference: Including a reference to the parent commit allows Git to build a directed acyclic graph (DAG) of commits, representing the entire history of the repository.
4. Branching and HEAD Reference
Mathematical Explanation:
- A Git branch is stored as a pointer to its latest commit (in MyGit, the file refs/heads/<branch> holds that commit's hash). Let B represent a branch; Commits(B) is the set of commits reachable from B's latest commit by following parent links.
- The HEAD reference (HEAD) can be thought of as a pointer to a branch. It marks the currently checked-out branch. In mathematical terms, it can be written as HEAD -> B, where B is the active branch.
- Switching branches means updating the HEAD reference to change its target: HEAD -> B', where B' is the new branch.
Example:
Consider two branches, master and feature, with their respective commit histories:
```
Master Branch:  A -- B -- C
                      \
Feature Branch:        D -- E
```
If the feature branch is checked out, the HEAD reference can be represented as:
```
HEAD -> Feature
```
Why This Choice?
Branch as a Set of Commits: Treating a branch as the set of commits reachable from its tip fits Git's internal data structure, which is essentially a directed acyclic graph (DAG) of commits, even though the branch itself stores only one commit hash.
HEAD as a Pointer: The concept of HEAD as a pointer to a branch makes it easy to switch between branches and denote the currently active branch.
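To make Commits(B) concrete: a branch ref stores only the tip commit, and the commits "on" the branch are everything reachable from that tip through parent links. The sketch below uses the example history and assumes feature branched off at commit B. In that case the commits on feature but not on master are Commits(feature) - Commits(master) = {D, E}, which is the same idea as `git log master..feature`. The parents dict stands in for reading commit objects, and reachable is an illustrative name.

```python
def reachable(parents, tip):
    """Return the set of commits reachable from `tip` (including it) via parent links."""
    seen, stack = set(), [tip]
    while stack:  # iterative depth-first search over the commit DAG
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen

# Example history: A <- B <- C (master) and B <- D <- E (feature)
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"]}
branches = {"master": "C", "feature": "E"}

master_commits = reachable(parents, branches["master"])
feature_commits = reachable(parents, branches["feature"])
print(sorted(feature_commits - master_commits))  # ['D', 'E']
```

The intersection, {A, B}, is the shared history. Its newest member, B, is what Git calls the merge base.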
5. Traversal of File System (os.walk)
In the MyGit code, the os.walk function is used to traverse the directory structure of the working directory. Here's a mathematical perspective on directory traversal:
Mathematical Explanation:
- Directory traversal can be viewed as a directed graph traversal, where directories and files are nodes, and containment relationships are edges.
- Let G be a directed graph representing the file system hierarchy, where nodes represent directories or files, and edges represent parent-child relationships.
- Traversal means visiting nodes in a specific order, as in graph traversal algorithms like depth-first search (DFS) or breadth-first search (BFS). os.walk visits directories depth-first, yielding each directory before its subdirectories when topdown=True.
Example:
Consider a directory structure:
```
Root/
|-- File1.txt
|-- Directory1/
|   |-- File2.txt
|-- Directory2/
|   |-- File3.txt
```
In graph terms, this structure can be represented as:
```
G = {
  Nodes: {Root, File1.txt, Directory1, File2.txt, Directory2, File3.txt},
  Edges: {(Root, File1.txt), (Root, Directory1), (Directory1, File2.txt), (Root, Directory2), (Directory2, File3.txt)}
}
```
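The same `G` can be built from `os.walk`'s output. Every `(dirpath, dirnames, filenames)` triple contributes one edge from the directory to each of its children. In this sketch, `build_graph()` is a hypothetical helper. It uses bare names to match the example above. Real code should key nodes on full paths, because the same file name can appear in different directories.

```python
import os
import tempfile
from typing import Set, Tuple


def build_graph(root: str) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Return (nodes, edges) for the tree under root, using bare names."""
    nodes = {os.path.basename(root)}
    edges: Set[Tuple[str, str]] = set()
    for dirpath, dirnames, filenames in os.walk(root):
        parent = os.path.basename(dirpath)
        for name in dirnames + filenames:
            nodes.add(name)
            edges.add((parent, name))  # containment edge: parent -> child
    return nodes, edges


with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "Root")
    for rel in ("File1.txt", "Directory1/File2.txt", "Directory2/File3.txt"):
        full = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()
    nodes, edges = build_graph(root)

print(sorted(nodes))
print(sorted(edges))
```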
Why This Choice?
Graph Representation: Representing the file system as a graph allows for efficient traversal and exploration of directory structures.
Graph Algorithms: Concepts from graph theory, like DFS and BFS, provide well-defined strategies for navigating and manipulating the file system.
Functions I Used: Overview
## `init()`
- **Purpose:** Initializes a new Git repository in the specified directory.
- **Function Calls:**
  - `os.makedirs()`: Creates the necessary directories for the repository.
  - `hash_object()`: Creates the initial tree and commit objects.
  - `create_tree()`: Generates the initial tree structure.
  - `commit()`: Creates the initial commit.
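Here is a minimal sketch of the directory layout `init()` might create, assuming a Git-like structure. The `.mygit` directory name and the `ref: refs/heads/master` format of `HEAD` are assumptions borrowed from real Git, not confirmed MyGit details. The sketch covers only the `os.makedirs()` step and leaves out the initial tree and commit.

```python
import os
import tempfile


def init(repo: str, git_dir_name: str = ".mygit") -> str:
    """Create objects/, refs/heads/ and HEAD; return the repository data path."""
    git_dir = os.path.join(repo, git_dir_name)  # assumed directory name
    # makedirs raises FileExistsError if the repository was already initialised.
    os.makedirs(os.path.join(git_dir, "objects"))
    os.makedirs(os.path.join(git_dir, "refs", "heads"))
    # HEAD names the current branch; master has no commits yet, so no ref file.
    with open(os.path.join(git_dir, "HEAD"), "w") as f:
        f.write("ref: refs/heads/master\n")
    return git_dir


with tempfile.TemporaryDirectory() as repo:
    git_dir = init(repo)
    print(sorted(os.listdir(git_dir)))  # ['HEAD', 'objects', 'refs']
```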
## `create_tree()`
- **Purpose:** Creates a tree object that represents the current state of the working directory.
- **Function Calls:**
  - `Tree()`: Initializes a new tree object.
  - `add_file()`: Adds files and directories to the tree.
  - `hash_object()`: Creates a tree object by hashing the serialized tree data.
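The sketch below shows how a tree could be serialized and hashed, assuming MyGit follows Git's tree format. In that format, each entry is `<mode> <name>`, a NUL byte, and then the raw 20-byte hash of the child. A few simplifications apply:

- This `hash_object()` only computes the hash and stores nothing.
- The skipped `.mygit` name is an assumption.
- Symlinks and special files are ignored, and empty subdirectories are kept, unlike in Git.
- It sorts plain names, whereas Git sorts directories as if their names ended in `/`, so hashes can differ from real Git in some cases.

```python
import hashlib
import os
import tempfile


def hash_object(data: bytes, obj_type: str) -> str:
    """SHA-1 over the header '<type> <size>', a NUL byte, and the content."""
    header = f"{obj_type} {len(data)}".encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def create_tree(path: str) -> str:
    """Recursively hash a directory as a tree object and return its hash."""
    entries = []
    with os.scandir(path) as it:
        children = sorted(it, key=lambda entry: entry.name)
    for entry in children:
        if entry.name == ".mygit":  # assumed repository directory: never snapshot it
            continue
        if entry.is_dir(follow_symlinks=False):
            mode, sha = "40000", create_tree(entry.path)
        elif entry.is_file(follow_symlinks=False):
            with open(entry.path, "rb") as f:
                mode, sha = "100644", hash_object(f.read(), "blob")
        else:
            continue  # symlinks, sockets, etc. are skipped in this sketch
        # Entry layout: b"<mode> <name>\0" + the 20 raw bytes of the child's hash.
        entries.append(f"{mode} {entry.name}".encode() + b"\0" + bytes.fromhex(sha))
    return hash_object(b"".join(entries), "tree")


with tempfile.TemporaryDirectory() as tmp:
    empty_tree = create_tree(tmp)
    with open(os.path.join(tmp, "hello.txt"), "wb") as f:
        f.write(b"hello\n")
    one_file_tree = create_tree(tmp)

print(empty_tree)  # 4b825dc642cb6eb9a060e54bf8d69288fbee4904, Git's empty-tree hash
print(one_file_tree)
```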
## `add(filename, recursive=False)`
- **Purpose:** Stages changes made to files or directories for the next commit.
- **Function Calls:**
  - `add_file()`: Stages a specific file.
  - `create_tree()`: Updates the tree structure.
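Here is a sketch of `add()` with a hypothetical staging area. The index is a JSON file mapping each path to its blob hash. That format is an assumption: Git uses a binary index format, and MyGit's may differ. With `recursive=True`, the function walks the directory with `os.walk` and prunes the assumed `.mygit` directory. It only records hashes and skips the `create_tree()` step.

```python
import hashlib
import json
import os
import tempfile
from typing import Dict, List


def blob_hash(path: str) -> str:
    """Git-style blob hash of a file's contents."""
    with open(path, "rb") as f:
        data = f.read()
    return hashlib.sha1(f"blob {len(data)}".encode() + b"\0" + data).hexdigest()


def add(filename: str, index_path: str, recursive: bool = False) -> Dict[str, str]:
    """Stage one file, or every file under a directory when recursive=True."""
    index: Dict[str, str] = {}
    if os.path.exists(index_path):  # hypothetical JSON index: path -> blob hash
        with open(index_path) as f:
            index = json.load(f)
    if os.path.isdir(filename):
        if not recursive:
            raise IsADirectoryError(f"{filename} is a directory; use recursive=True")
        paths: List[str] = []
        for dirpath, dirnames, files in os.walk(filename):
            dirnames[:] = [d for d in dirnames if d != ".mygit"]  # prune repo data
            paths.extend(os.path.join(dirpath, name) for name in files)
    else:
        paths = [filename]
    for p in paths:
        index[os.path.normpath(p)] = blob_hash(p)  # re-adding overwrites the old hash
    with open(index_path, "w") as f:
        json.dump(index, f, indent=2, sort_keys=True)
    return index


with tempfile.TemporaryDirectory() as repo:
    os.makedirs(os.path.join(repo, "src"))
    with open(os.path.join(repo, "src", "app.py"), "w") as f:
        f.write("print('hi')\n")
    staged = add(os.path.join(repo, "src"), os.path.join(repo, "index.json"), recursive=True)
    staged_names = sorted(os.path.relpath(p, repo) for p in staged)

print(staged_names)
```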
## `add_file(file_path)`
- **Purpose:** Stages a specific file for the next commit.
- **Function Calls:**
  - `hash_object()`: Hashes the file's content.
  - Updates the staging area.
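Here is a sketch of what `hash_object()` and `add_file()` might look like if MyGit stores objects the way Git does. The content is prefixed with a `blob <size>` header and a NUL byte, hashed with SHA-1, zlib-compressed, and written to `objects/<first two hex digits>/<remaining 38>`. The `objects_dir` parameter is hypothetical, and updating the staging area is left out.

```python
import hashlib
import os
import tempfile
import zlib
from typing import Optional


def hash_object(data: bytes, obj_type: str, objects_dir: Optional[str] = None) -> str:
    """Return the SHA-1 of a Git-style object; store it if objects_dir is given."""
    full = f"{obj_type} {len(data)}".encode() + b"\0" + data
    sha = hashlib.sha1(full).hexdigest()
    if objects_dir is not None:
        path = os.path.join(objects_dir, sha[:2], sha[2:])
        if not os.path.exists(path):  # identical content is stored only once
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as f:
                f.write(zlib.compress(full))
    return sha


def add_file(file_path: str, objects_dir: str) -> str:
    """Store a file's contents as a blob object and return the blob's hash."""
    with open(file_path, "rb") as f:
        return hash_object(f.read(), "blob", objects_dir)


with tempfile.TemporaryDirectory() as repo:
    objects = os.path.join(repo, "objects")  # hypothetical object store location
    note = os.path.join(repo, "note.txt")
    with open(note, "wb") as f:
        f.write(b"hello\n")
    sha = add_file(note, objects)
    with open(os.path.join(objects, sha[:2], sha[2:]), "rb") as f:
        stored = zlib.decompress(f.read())

print(sha, stored)
```

Because the file name is derived from the content hash, adding an unchanged file again writes nothing new.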
## `commit(message, author="Anonymous", current_branch="master")`
- **Purpose:** Creates a new commit with the staged changes.
- **Function Calls:**
  - `create_tree()`: Creates the tree object representing the current state.
  - `get_head_commit()`: Retrieves the current branch's latest commit (if it exists).
  - `hash_object()`: Creates a new commit object.
  - Updates the branch reference.
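Here is a sketch of `commit()` under Git-like assumptions. The commit object lists its tree, its parent (the branch's previous latest commit, if any), the author, and the message. The branch file under `refs/heads` is then moved to the new hash. The `git_dir` and `tree_sha` parameters and the `write_object()` helper are hypothetical. This version receives the tree hash instead of calling `create_tree()` itself, and its author line omits the e-mail address that real Git expects.

```python
import hashlib
import os
import tempfile
import time
import zlib


def write_object(objects_dir: str, obj_type: str, data: bytes) -> str:
    """Store a zlib-compressed Git-style object and return its SHA-1."""
    full = f"{obj_type} {len(data)}".encode() + b"\0" + data
    sha = hashlib.sha1(full).hexdigest()
    path = os.path.join(objects_dir, sha[:2], sha[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(full))
    return sha


def commit(git_dir: str, tree_sha: str, message: str,
           author: str = "Anonymous", current_branch: str = "master") -> str:
    """Write a commit object for tree_sha and move the branch reference to it."""
    ref_path = os.path.join(git_dir, "refs", "heads", current_branch)
    parent = None
    if os.path.exists(ref_path):  # the first commit on a branch has no parent
        with open(ref_path) as f:
            parent = f.read().strip()
    stamp = f"{int(time.time())} +0000"  # seconds since the epoch, UTC offset
    lines = [f"tree {tree_sha}"]
    if parent:
        lines.append(f"parent {parent}")
    lines += [f"author {author} {stamp}", f"committer {author} {stamp}", "", message, ""]
    sha = write_object(os.path.join(git_dir, "objects"), "commit", "\n".join(lines).encode())
    os.makedirs(os.path.dirname(ref_path), exist_ok=True)
    with open(ref_path, "w") as f:
        f.write(sha + "\n")  # the branch now points at the new commit
    return sha


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # hash of an empty tree
with tempfile.TemporaryDirectory() as repo:
    git_dir = os.path.join(repo, ".mygit")  # assumed directory name
    first = commit(git_dir, EMPTY_TREE, "initial commit")
    second = commit(git_dir, EMPTY_TREE, "second commit")
    with open(os.path.join(git_dir, "refs", "heads", "master")) as f:
        tip = f.read().strip()
    with open(os.path.join(git_dir, "objects", second[:2], second[2:]), "rb") as f:
        second_body = zlib.decompress(f.read()).split(b"\0", 1)[1].decode()

print(tip == second, f"parent {first}" in second_body)  # True True
```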
## `get_head_commit(branch="master")`
- **Purpose:** Retrieves the latest commit hash for the specified branch.
- **Function Calls:**
  - Reads the branch reference file.
## `create_branch(branch_name)`
- **Purpose:** Creates a new branch with the given name.
- **Function Calls:**
  - Checks if the branch already exists.
  - Creates a new branch reference.
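Here is a sketch of `create_branch()`, assuming branches are plain files under `refs/heads`. A new branch is just a new file holding the current branch's latest commit hash. The `git_dir` and `current_branch` parameters are hypothetical. This version leaves HEAD alone, whereas the HEAD section below says MyGit's version also points HEAD at the new branch.

```python
import os
import tempfile


def create_branch(git_dir: str, branch_name: str, current_branch: str = "master") -> str:
    """Create refs/heads/<branch_name> pointing at the current branch's tip."""
    heads = os.path.join(git_dir, "refs", "heads")
    new_ref = os.path.join(heads, branch_name)
    if os.path.exists(new_ref):
        raise FileExistsError(f"branch {branch_name!r} already exists")
    # Raises FileNotFoundError if the current branch has no commits yet.
    with open(os.path.join(heads, current_branch)) as f:
        tip = f.read().strip()
    with open(new_ref, "w") as f:
        f.write(tip + "\n")
    return tip


with tempfile.TemporaryDirectory() as git_dir:
    os.makedirs(os.path.join(git_dir, "refs", "heads"))
    with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as f:
        f.write("a" * 40 + "\n")  # stand-in commit hash
    feature_tip = create_branch(git_dir, "feature")
    branch_files = sorted(os.listdir(os.path.join(git_dir, "refs", "heads")))

print(branch_files)  # ['feature', 'master']
```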
## `switch_branch(branch_name, current_branch)`
- **Purpose:** Switches to the specified branch.
- **Function Calls:**
  - Checks if the branch exists.
## `branch_exists(branch_name)`
- **Purpose:** Checks if the specified branch exists.
- **Function Calls:**
  - Checks if the branch reference file exists.
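Here is a sketch of `branch_exists()` and `switch_branch()`. It assumes branch files under `refs/heads` and a Git-style `ref: refs/heads/<name>` line in HEAD; that format is an assumption. The sketch only re-points HEAD. It leaves out the step the HEAD section describes, rewriting the working directory to match the target branch.

```python
import os
import tempfile


def branch_exists(git_dir: str, branch_name: str) -> bool:
    """A branch exists when its reference file exists."""
    return os.path.isfile(os.path.join(git_dir, "refs", "heads", branch_name))


def switch_branch(git_dir: str, branch_name: str, current_branch: str) -> str:
    """Point HEAD at branch_name and return the name of the now-current branch."""
    if not branch_exists(git_dir, branch_name):
        raise ValueError(f"branch {branch_name!r} does not exist")
    if branch_name != current_branch:
        with open(os.path.join(git_dir, "HEAD"), "w") as f:
            f.write(f"ref: refs/heads/{branch_name}\n")  # assumed HEAD format
    return branch_name


with tempfile.TemporaryDirectory() as git_dir:
    os.makedirs(os.path.join(git_dir, "refs", "heads"))
    for name in ("master", "feature"):
        with open(os.path.join(git_dir, "refs", "heads", name), "w") as f:
            f.write("a" * 40 + "\n")  # stand-in commit hash
    current = switch_branch(git_dir, "feature", current_branch="master")
    with open(os.path.join(git_dir, "HEAD")) as f:
        head_line = f.read().strip()

print(current, head_line)  # feature ref: refs/heads/feature
```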
## `log(current_branch="master")`
- **Purpose:** Displays the commit history for the current branch or the specified branch.
- **Function Calls:**
  - Reads commit objects and displays commit details.
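Here is a sketch of `log()`, assuming Git-style, zlib-compressed commit objects stored under `objects/`. It starts at the branch's latest commit, reads each commit, and follows the first `parent` line until it reaches the root commit. The `read_object()` helper and the small `write_object()` used to build a demo history are hypothetical.

```python
import hashlib
import os
import tempfile
import zlib
from typing import List, Optional, Tuple


def write_object(objects_dir: str, obj_type: str, data: bytes) -> str:
    """Demo helper: store a zlib-compressed Git-style object, return its hash."""
    full = f"{obj_type} {len(data)}".encode() + b"\0" + data
    sha = hashlib.sha1(full).hexdigest()
    os.makedirs(os.path.join(objects_dir, sha[:2]), exist_ok=True)
    with open(os.path.join(objects_dir, sha[:2], sha[2:]), "wb") as f:
        f.write(zlib.compress(full))
    return sha


def read_object(objects_dir: str, sha: str) -> Tuple[str, bytes]:
    """Decompress an object and split it into (type, content)."""
    with open(os.path.join(objects_dir, sha[:2], sha[2:]), "rb") as f:
        raw = zlib.decompress(f.read())
    header, _, content = raw.partition(b"\0")
    return header.decode().split(" ")[0], content


def log(git_dir: str, current_branch: str = "master") -> List[Tuple[str, str]]:
    """Return (hash, message) pairs from the branch tip back to the first commit."""
    objects_dir = os.path.join(git_dir, "objects")
    with open(os.path.join(git_dir, "refs", "heads", current_branch)) as f:
        sha: Optional[str] = f.read().strip()
    history = []
    while sha:
        obj_type, content = read_object(objects_dir, sha)
        if obj_type != "commit":
            raise ValueError(f"{sha} is a {obj_type}, not a commit")
        headers, _, message = content.decode().partition("\n\n")
        history.append((sha, message.strip()))
        # Follow the first parent; the root commit has none, which ends the loop.
        parents = [line[7:] for line in headers.splitlines() if line.startswith("parent ")]
        sha = parents[0] if parents else None
    return history


with tempfile.TemporaryDirectory() as git_dir:
    objects = os.path.join(git_dir, "objects")
    tree = write_object(objects, "tree", b"")  # empty tree
    c1 = write_object(objects, "commit", f"tree {tree}\n\nfirst\n".encode())
    c2 = write_object(objects, "commit", f"tree {tree}\nparent {c1}\n\nsecond\n".encode())
    os.makedirs(os.path.join(git_dir, "refs", "heads"))
    with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as f:
        f.write(c2 + "\n")
    history = log(git_dir)

for commit_sha, message in history:
    print(commit_sha[:7], message)
```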
Working of HEAD:
Branch References:
In MyGit, branch references are used to keep track of the latest commit in each branch. These branch references are files stored in the `refs/heads` directory. Each file corresponds to a branch and contains the SHA-1 hash of the latest commit in that branch. For example, if you have a branch called "master", there will be a file named `refs/heads/master`, and its content will be the SHA-1 hash of the latest commit on the "master" branch.
HEAD Reference:
The HEAD reference points to the currently checked-out branch. In MyGit, this is stored in the `HEAD` file in the main directory of your repository. When you create a new branch (e.g., "feature-branch"), the HEAD reference is updated to point to this new branch, which means you are now on "feature-branch".
When you make a new commit on the current branch, the branch reference for that branch is updated to point to the new commit's hash, while HEAD keeps pointing to the same branch, so you stay on it.
When you switch branches using commands like `switch_branch`, the HEAD reference is updated to point to the new branch, and the working directory is updated to match the files in that branch.
When you use commands like `log`, they read the HEAD reference to determine which branch you are currently on, and then use that branch's reference to retrieve its commit history.
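The two-step lookup described here can be sketched as follows: HEAD names a branch, and the branch file holds a commit hash. The `ref: refs/heads/<name>` format inside HEAD is an assumption borrowed from real Git. `resolve_head()` is a hypothetical helper, not a MyGit function.

```python
import os
import tempfile
from typing import Optional, Tuple


def resolve_head(git_dir: str) -> Tuple[str, Optional[str]]:
    """Follow HEAD to (branch name, latest commit hash or None if no commits)."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    prefix = "ref: refs/heads/"  # assumed HEAD format
    if not head.startswith(prefix):
        raise ValueError(f"HEAD does not name a branch: {head!r}")
    branch = head[len(prefix):]
    ref_path = os.path.join(git_dir, "refs", "heads", *branch.split("/"))
    if not os.path.isfile(ref_path):  # e.g. right after init, before any commit
        return branch, None
    with open(ref_path) as f:
        return branch, f.read().strip()


with tempfile.TemporaryDirectory() as git_dir:
    os.makedirs(os.path.join(git_dir, "refs", "heads"))
    with open(os.path.join(git_dir, "HEAD"), "w") as f:
        f.write("ref: refs/heads/master\n")
    before = resolve_head(git_dir)  # ('master', None)
    with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as f:
        f.write("b" * 40 + "\n")  # stand-in commit hash
    after = resolve_head(git_dir)

print(before, after)
```

Committing changes only the second step, the branch file. Switching branches changes only the first step, the HEAD file.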