# Git & Version Control

## 1. Core Architecture

```
Working Directory --> Staging Area --> Local Repo --> Remote Repo
   (your files)         (index)          (.git)         (GitHub)
        |                  |                |               |
     git add           git commit       git push        git pull
     git restore       git restore      git fetch       git clone
                       --staged
```

| Zone | Description | Command to Exit |
| :--- | :--- | :--- |
| Working Directory | Files you are editing, untracked changes | `git add <file>` |
| Staging Area | Files marked for the next commit | `git commit -m "msg"` |
| Local Repo | Committed snapshots in `.git` folder | `git push` |
| Remote Repo | GitHub/GitLab cloud copy | `git pull` or `git fetch` |

---

## 2. Git vs GitHub

| Tool | What it is |
| :--- | :--- |
| **Git** | Version control software (runs locally) |
| **GitHub** | Cloud hosting for Git repos + collaboration |
| **GitLab** | Alternative to GitHub (self-hostable) |
| **Bitbucket** | Another alternative by Atlassian |

---

## 3. Daily Workflow Commands

```bash
# Check state
git status                     # what changed?
git diff                       # exact content diff of unstaged changes ✅
git diff --staged              # diff of staged changes vs last commit
git log --oneline --graph      # visual commit history

# Stage and commit
git add file.py                # stage specific file
git add .                      # stage ALL changes in the current directory and below ✅
git add -p                     # stage interactively (hunk by hunk)
git commit -m "message"        # save snapshot ✅
git commit -am "message"       # stage tracked files + commit in one step
git commit --amend             # rewrite last commit (message and/or staged changes)

# Upload
git push                       # push to tracked branch
git push origin main           # push to specific branch
git push -u origin feature     # push + set upstream tracking ✅
git push --force-with-lease    # safer force push (fails if others pushed) ✅

# Download
git pull origin main           # fetch + merge remote changes ✅
git fetch origin               # download only, no merge (safe) ✅
git fetch --force              # overwrite local remote-tracking refs
```

### pull vs fetch (Exam Critical)

| Command | What it does |
| :--- | :--- |
| `git fetch` | Downloads changes from remote. Does NOT modify your working files. Safe. |
| `git pull` | `git fetch` + `git merge`. Modifies your working files. |

---

## 4. `.gitignore` Reference

```gitignore
# Python virtual envs and caches
venv/
.venv/
__pycache__/
*.pyc

# Secrets (most important!) ✅
.env
.env.*
secrets.json
credentials.json

# Large data files
data/raw/
*.csv
*.parquet
*.pkl

# IDE
.vscode/
.idea/

# OS: macOS
.DS_Store
# OS: Windows
Thumbs.db

# Jupyter
.ipynb_checkpoints/
```

Comments in `.gitignore` must sit on their own line; a `#` after a pattern is read as part of the pattern.

- `.gitignore` is committed to the repo so all team members ignore the same files ✅
- ❌ NOT for tracking contributors
- ❌ NOT for defining repository settings (that is `.git/config`)

---

## 5. Branching

```bash
# Create
git branch feature-analysis          # create branch
git checkout -b feature-analysis     # create + switch ✅ (most common)
git switch -c feature-analysis       # newer syntax (Git 2.23+)

# Switch
git checkout main                    # switch to main
git switch main                      # newer syntax

# List
git branch                           # local branches
git branch -a                        # all branches (local + remote)
git branch -r                        # remote branches only

# Delete
git branch -d feature                # safe delete (must be merged)
git branch -D feature                # force delete
git push origin --delete feature     # delete remote branch
```

### Branch Naming Conventions

```
feature/data-cleaning        new features
bugfix/fix-null-handling     bug fixes
hotfix/critical-api-fix      urgent production fixes
release/v1.2.0               release preparation
docs/update-readme           documentation updates
```

---

## 6. Undoing Changes

| Command | Effect | Destructive? |
| :--- | :--- | :--- |
| `git restore .` | Discard all unstaged changes ✅ | Yes (local only) |
| `git checkout -- .` | Same as above (older syntax) ✅ | Yes (local only) |
| `git restore --staged file` | Unstage a file (undo `git add`) | No |
| `git reset --soft HEAD~1` | Undo last commit, keep staged | No |
| `git reset --mixed HEAD~1` | Undo last commit, unstage changes (default) | No |
| `git reset --hard HEAD~1` | Undo last commit, discard all changes ✅ | Yes |
| `git revert HEAD` | New commit that undoes previous (safe for shared branches) | No |
| `git stash` | Temporarily save current changes | No |
| `git stash pop` | Apply most recent stash + delete it | No |
| `git stash apply` | Apply most recent stash (keep it) | No |
| `git clean -fd` | Remove all untracked files and dirs | Yes |

### Recovery: Reflog

```bash
git reflog                   # see past HEAD positions, even after reset --hard (entries expire, 90 days by default)
git checkout <commit-hash>   # recover any lost commit from reflog
```

---

## 7. Merging & Conflict Resolution

```bash
# Merge feature into main
git checkout main
git merge feature-analysis     # merge
git merge --no-ff feature      # always create merge commit (no fast-forward)
git merge --abort              # abort if conflict is too complex
```

### Merge Conflict Markers

```
<<<<<<< HEAD (your branch)
df.dropna(subset=['name'])
=======
df.dropna(subset=['email'])
>>>>>>> feature-branch
```

**Resolution steps:**

1. Open conflicted file
2. Choose which change to keep (or combine both)
3. Remove all conflict markers (`<<<<<<<`, `=======`, `>>>>>>>`)
4. `git add file.py`
5. `git commit`

- ✅ Merge conflicts MUST be resolved manually
- ✅ Conflicts occur when two people edit the SAME LINE on different branches

---

## 8. History Inspection

```bash
git log                            # full commit history
git log --oneline                  # compact one-line per commit
git log --oneline --graph --all    # visual branch graph ✅
git log -n 5                       # last 5 commits
git log --author="John"            # commits by specific author
git log --since="2024-01-01"       # commits after date
git log -- file.py                 # commits touching specific file
git log -p file.py                 # commits + their full diffs for a file
git log -S "secret_key"            # find commits that added/removed a string ✅
git show abc123                    # show specific commit content
git show HEAD~1                    # show second-to-last commit
git blame file.py                  # who changed each line, when
```

---

## 9. Hotfix Workflow (Exam Critical)

```bash
# Step 1: Branch from production
git checkout main
git checkout -b hotfix/critical-bug

# Step 2: Fix and commit
git add .
git commit -m "Fix critical routing bug"

# Step 3: Push and create PR for expedited review
git push origin hotfix/critical-bug

# Step 4: Merge into BOTH main AND develop ✅ (exam answer)
git checkout main
git merge hotfix/critical-bug
git tag -a v1.0.1 -m "Hotfix"
git push origin main --tags

git checkout develop
git merge hotfix/critical-bug
git push origin develop

# Step 5: Delete hotfix branch
git branch -d hotfix/critical-bug
git push origin --delete hotfix/critical-bug
```

- ✅ Merge hotfix into BOTH main AND develop
- ❌ Apply fix directly to main without review
- ❌ Delay to next regular release cycle

---

## 10. Removing Files / Secrets from History

### Remove file from last commit (not yet pushed)

```bash
git rm --cached secret.env    # remove from index, keep file locally
git commit --amend            # amend the last commit
```

### Find and remove file from ALL history (after push)

```bash
# Option 1: filter-branch (older)
git filter-branch --force --index-filter \
  "git rm --cached --ignore-unmatch credentials.txt" \
  --prune-empty --tag-name-filter cat -- --all

# Option 2: filter-repo (modern, preferred) ✅
# Note: filter-repo normally removes the origin remote; re-add it before pushing
git filter-repo --path .env --invert-paths

# Force push all branches and tags
git push origin --force --all
git push origin --force --tags
```

- ✅ After scrubbing history, ROTATE THE API KEYS IMMEDIATELY
- ❌ Deleting from latest commit does NOT remove from Git history

### Find a specific file in history

```bash
git log --all -- path/to/deleted_file.txt     # find commit that had it
git show <commit>:path/to/file.txt            # view file at that commit
git checkout <commit> -- path/to/file.txt     # restore it
```

---

## 11. GitFlow Branch Strategy

```
main (production-ready, always deployable)
 |
 +-- hotfix/emergency-fix      → merge back to main + develop
 |
develop (integration branch)
 |
 +-- feature/data-cleaning     → PR → merge back to develop
 +-- feature/api-integration
 |
 +-- release/v1.2.0            → merge to main + develop, then tag
```

### GitHub Flow (simpler alternative)

```
main (always deployable)
 +-- feature/x   → PR → main
 +-- bugfix/y    → PR → main
 +-- hotfix/z    → PR → main (expedited)
```

---

## 12. Pull Requests (PR)

```
1. Create feature branch
2. Make changes + commits
3. Push branch to remote
4. Create PR on GitHub (base: main, compare: feature)
5. Request reviewers
6. Reviewers comment, author makes changes, push more commits
7. Reviewer approves
8. Merge PR into main
9. Delete feature branch
```

- ✅ PRs allow code review + approval before merging
- ❌ PRs are NOT faster than direct merging
- ❌ PRs do NOT automatically fix bugs

---

## 13. Tags

```bash
git tag -a v1.0.0 -m "Release 1.0.0"   # annotated tag ✅
git tag v1.0.0                         # lightweight tag
git push origin --tags                 # push all tags
git tag                                # list all tags
git checkout v1.0.0                    # checkout a tag (detached HEAD)
```

---

## 14. Remote Management

```bash
git remote -v                      # show remote URLs
git remote add origin <url>        # add remote
git remote add upstream <url>      # add upstream (for forks)
git remote set-url origin <url>    # change remote URL

# Syncing a fork
git fetch upstream
git merge upstream/main
```

---

## 15. Exam Scenario Quick Answers

| Scenario | Command |
| :--- | :--- |
| Get latest team updates | `git pull` ✅ |
| See exact content that changed | `git diff` ✅ |
| Save and upload work | `git add .` -> `git commit` -> `git push` ✅ |
| Create and switch to branch | `git checkout -b feature-name` ✅ |
| Discard all local changes | `git restore .` or `git checkout -- .` ✅ |
| Files Git should not track | `.gitignore` ✅ |
| Collaborative review before merge | Pull Request (PR) ✅ |
| Critical production bug fix | Hotfix branch -> merge to main + develop ✅ |
| Two people edit same line | Merge conflict -> manual resolution ✅ |
| Download without merging | `git fetch` ✅ |
| Find commit that deleted a file | `git log --all -- path/to/file` ✅ |
| Remove secret from all history | `git filter-repo --path secret --invert-paths` ✅ |
| Safely undo last commit | `git reset --soft HEAD~1` ✅ (unpushed commits; use `git revert HEAD` on shared branches) |
| Recover after `reset --hard` | `git reflog` -> `git checkout <commit-hash>` ✅ |
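
The last scenario in the table above (recovering after `reset --hard`) can be replayed safely in a throwaway repository. The script below is a minimal sketch, not a prescribed procedure: the scratch directory, the file `notes.txt`, the demo identity, and the branch name `recovered-work` are all made up for illustration, and it assumes `git` and `mktemp` are installed. Instead of copying a hash from the reflog by hand, it uses `HEAD@{1}`, the reflog entry for where HEAD pointed just before the reset.

```bash
#!/bin/sh
# Demo: recover a commit "lost" by `git reset --hard`, inside a scratch repo.
set -e

demo_dir=$(mktemp -d)               # hypothetical scratch location
cd "$demo_dir"
git init -q .

# Set identity per repo so the demo does not depend on global config.
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "v1" > notes.txt
git add notes.txt
git commit -q -m "first commit"

echo "v2" > notes.txt
git commit -q -am "second commit"
lost_commit=$(git rev-parse HEAD)   # kept only to verify the recovery later

# Destructive step: the branch no longer points at the second commit.
git reset -q --hard HEAD~1
after_reset=$(cat notes.txt)        # working file is back to the first version

# The reflog still records the previous HEAD position.
git reflog -n 3
recovered_commit=$(git rev-parse 'HEAD@{1}')

# Check out the recovered commit (detached HEAD), then give it a branch
# so it stays reachable. The branch name is hypothetical.
git checkout -q "$recovered_commit"
git branch recovered-work
after_recovery=$(cat notes.txt)     # second version is available again
```

Creating a branch at the recovered commit is a design choice for the sketch: a detached HEAD alone leaves the commit reachable only through the reflog, which expires over time.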