roadrunnertwice | Why Git is hard

Julia Evans’ Strange Loop 2023 keynote was about digging into the different reasons a tool can be hard to learn, and it was a real good talk! At the end, as a tossed-off addendum to a conclusion about continuing to learn things, she said “I still don’t know why Git is hard.”

I happen to have thoughts about that one!

I have had to teach a fair number of (generally clever and persistent) people how to get around in Git or how to use its more advanced features. While doing so, I have often failed to get the basics to stick, which is incredibly aggravating to someone who prides themselves on explaining things. Sometimes this devolves into me talking about wave/particle duality as a crucial metaphor for getting through a rebase intact, and everyone in the room looking at me like my second head just tried to convert them to Gnosticism.

So, I’ve spent some time thinking about this before this weekend.

It’s not the bad CLI

There's an obvious superficial problem, which is that the Git CLI design sucks ass. But it's not a sufficient explanation, because we’ve all seen people cope with worse. (Or... at least, very nearly just as bad.)

As long as you have a detailed understanding of what you want to accomplish, it should be possible to work around it with memorization and external references. It's a drag, but there's a known path to that kind of learning. That's not what we see happen with Git learners.

It’s the alien mental models

I’ve come to a tentative two-and-a-half-part conclusion:

- It is impossible to safely use Git unless you have a reliable mental model and metaphor kit for Git’s core concepts — the objects it knows about and the actions it can take on them.
- Git’s core concepts are aggressively incompatible with the way most people view the world. Thus, users tend to adopt more comfortable (but non-viable) metaphors, which lead to confused and unreliable predictions.
  - (Also, Git's CLI aggressively obscures those core concepts, so any learner is in an uphill battle even if their mind does happen to work the right way.)

The real problem I end up seeing is that novice users are unable to formulate a valid plan for what they want to accomplish. Try pressing a learner to describe what they’re doing, sometime! Often what you get back isn’t just incomplete or naïve, it’s completely garbled.

Here’s a few core facts about Git that tend to overwhelm the imagination and beggar comprehension:

- A commit is its entire worldline
- Commit content is both a snapshot and a patch
- Remotes have fuzzy identities and time lag
- Branches aren't quite branches, they're more like little bookmark go-karts
- Merge conflicts are actually just difficult
- git pull exists

A commit is its entire worldline

Commits are unique and immutable, and are anchored to a specific point in the graph of history and causality. This means that a commit’s identity is made up of both its content and its context. (If a commit has the same content but a different parent, it’s NOT the same commit.) This in turn means that you need to be comfortable and fluent in a branching many-worlds cosmology, so you can distinguish between changes and snapshots that have the same intent and content but which are completely non-interchangeable and imply entirely different flows of historical events.

This turns out to not be a common proficiency, and unfortunately I think this is the model that causes the most havoc if you don’t grasp it.

A more natural way of thinking seems to be that content alone constitutes identity. This belief will get you in a ton of trouble.
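
Here's a minimal sketch of that claim, run in a scratch repo. The file name, the "Demo" identity, and the pinned timestamp are all made up for illustration; the point is only that two commits built from the same tree get different IDs as soon as their parents differ.

```shell
set -eu
# Throwaway identity and pinned dates, so the parent is the ONLY difference below.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_AUTHOR_DATE="2023-09-24 12:00:00 +0000"
export GIT_COMMITTER_DATE="2023-09-24 12:00:00 +0000"

cd "$(mktemp -d)"
git init -q
echo "one" > file.txt && git add file.txt && git commit -q -m "first"
echo "one and two" > file.txt && git commit -q -am "second"

# Build two commits by hand: same tree (content), same message, different parent.
tree=$(git rev-parse 'HEAD^{tree}')
on_first=$(git commit-tree "$tree" -p HEAD~1 -m "same content")
on_second=$(git commit-tree "$tree" -p HEAD -m "same content")

echo "trees:   $(git rev-parse "$on_first^{tree}") / $(git rev-parse "$on_second^{tree}")"
echo "commits: $on_first / $on_second"
```

The two tree IDs printed should match while the two commit IDs should not, which is the "same content, different worldline" situation in miniature.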

Commit content is both a snapshot and a patch

Git commits are stored on disk as snapshots, but that's not necessarily how Git treats them! Various Git commands treat a commit as either a complete repo snapshot or as a change set, depending on what you’re doing. (The patch version of a commit isn’t stored anywhere persistent, but is instead derived at need by comparing against its parent(s).)

For example, commands like cherry-pick or revert treat commits like patches. show treats commits as both, depending on its arguments!

Novice Git users tend to have a hard time understanding whether they’re trying to treat a commit as a snapshot or a patch, which adds some extra difficulty to formulating reliable plans.
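
A small sketch of the two views, assuming a scratch repo with a hypothetical notes.txt: the same commit can be asked for a whole file (snapshot) or for what it changed relative to its parent (patch).

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
printf 'line one\n' > notes.txt && git add notes.txt && git commit -q -m "add notes"
printf 'line one\nline two\n' > notes.txt && git commit -q -am "extend notes"

# Snapshot view: the complete file as it exists in that commit.
snapshot=$(git show HEAD:notes.txt)

# Patch view: not stored anywhere, computed by comparing the commit to its parent.
patch=$(git diff HEAD~1 HEAD)

printf '%s\n' "--- snapshot ---" "$snapshot" "--- patch ---" "$patch"
```

The snapshot contains both lines; the patch only mentions the line that was added.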

Remotes have fuzzy identities and time lag

Ooh boy. So, you’ve got:

- Your local repo
- The “real” version of the upstream remote, located across the network on another computer
- Your locally cached copy of the upstream remote’s branches, from the last time you fetched
- A branch in your local repo that is annotated as tracking a branch in the upstream remote
- More instances of the last three, because you probably have multiple remotes and many tracking branches

Novice Git users are bad at telling these things apart! The CLI does a terrible job of disambiguating, and also the relationships and interactions are just complex. Like, consider:

- Your local clone is also a remote.
- Branch names are wholly independent across remotes. (The natural mental model is that names denote global identity. Nope! Your main has no special tie to either upstream main or your fork’s main, it’s all just a social construct.)
- …but commits have a single stable global identity! And tags are, in practice, somewhere in between.
- By the way, what does origin mean? There's nothing special about it, it's just the default name that git clone creates, but is it the upstream repo, your GitHub fork of it, or something else? Renaming remotes is taught super late, which means you get a lot of opportunities to get real confused about where you're pushing something. (FWIW I advise always renaming origin so any remote is either upstream or some_name, but what do I know.)
- “Tracking branches” aren’t actually special, they’re just a way to sign up for some CLI conveniences. You can push or merge from anything to anything.
- The forward slash is sometimes a special namespace separator to denote a cached remote branch, and sometimes not.
- ...but when pushing a branch to a remote, never use a slash, use a space instead. (Because, you're acting on the real remote, not the cache? Or something?)
- When reporting the state of a tracking branch, Git uses the cached remote content, but that’s likely out of date, and your push can get rejected based on the real remote state.
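
To make the time lag concrete, here's a rough sketch using local directories as stand-ins for machines on a network. The names upstream.git, alice, and bob are hypothetical. Bob's origin/main (slash, the local cache) stays put until he fetches, even though the real remote has already moved.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

# A stand-in "upstream" (bare repo) holding one commit, plus two clones of it.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main   # name the first branch "main"
echo "v1" > seed/file.txt
git -C seed add file.txt
git -C seed commit -q -m "v1"
git clone -q --bare seed upstream.git
git clone -q upstream.git alice
git clone -q upstream.git bob

# Alice pushes. Note the space: "origin main" means <remote> <branch>.
echo "v1 plus v2" > alice/file.txt
git -C alice commit -q -am "v2"
git -C alice push -q origin main

# The real remote has moved, but Bob's cached origin/main has not.
real=$(git -C upstream.git rev-parse main)
cached_before=$(git -C bob rev-parse origin/main)

# Fetching refreshes the cache. Bob's own main is still untouched.
git -C bob fetch -q origin
cached_after=$(git -C bob rev-parse origin/main)
bob_main=$(git -C bob rev-parse main)

echo "real remote:       $real"
echo "cache before/after: $cached_before / $cached_after"
echo "bob's local main:   $bob_main"
```

That's three different answers to "where is main?" inside a single clone, before you add a second remote.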

Branches aren't quite branches, they're more like little bookmark go-karts

I don't have a fully developed explanation for this, but basically: the branching directed acyclic graph structure is inherent to a given history of immutable commits, and a Git "branch" is really just a piece of metadata that can help you navigate that structure more conveniently:

- Your set of branches are like bookmarks you can quickly switch between.
- The current branch moves forward with every new commit you make, so it's easier to track your work.
- But, you can pick up a branch and set it down at any arbitrary point on the graph (git reset).
- And branches don't mutually exclude each other, they don't "own" the branch of history that they happen to be placed on.

So, they're little mobile bookmarks. If you picture them as something more, you'll eventually run into something real confusing about how they act.
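
A quick sketch of the bookmark behavior, in a scratch repo with a hypothetical branch named bookmark. It uses git branch -f to move a branch you aren't standing on and git reset to move the one you are; neither changes any commit.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch "main"
for n in 1 2 3; do
  echo "change $n" >> log.txt
  git add log.txt
  git commit -q -m "commit $n"
done

# A new branch is just another name for the commit you are standing on.
git branch bookmark
tip=$(git rev-parse main)
same_spot=$(git rev-parse bookmark)

# Pick the bookmark up and set it down two commits back.
git branch -f bookmark main~2
moved=$(git rev-parse bookmark)

# git reset does the same thing to the branch that is currently checked out.
git reset -q --hard main~1
main_now=$(git rev-parse main)

# The old tip commit is no longer bookmarked, but it still exists.
still_there=$(git cat-file -t "$tip")
echo "bookmark: $moved, main: $main_now, old tip is still a $still_there"
```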

Merge conflicts are actually just difficult

This one isn’t really a conceptual problem; people tend to understand the basic scenario of “we made conflicting edits.” It’s just that conflicts are mechanically difficult to parse and resolve. "What would I have changed if I'd seen your change first" is already a fairly complex question. On top of that, only some of your changes conflicted, and most available interfaces for viewing the problem are pretty dizzying and burn through your working memory almost instantly.

`git pull` exists

Sorry, maybe this one is just me, but it absolutely boggles my poor stupid mind that the first thing taught to new users is to rely on a command that translates to "immediately mutate my local branch to incorporate upstream edits that, by definition, I have not had a chance to review, using an arbitrary resolution behavior governed by implicit config that may or may not fracture the branch's existing causal history."

My favorite version of this is when the novice has followed someone's dodgy advice to set pull.rebase = true, then they pull a shared branch that they're collaborating on, into which their coworker has just merged origin/main. Instant Sorcerer's Apprentice-scale chaos!

"Pull" presents the illusion that you can just ask Git to make everything okay for you so that you're allowed to push again, without having to understand what you're causing to happen, which, as I mentioned at the top, is literally the opposite of how Git operates. It's an incredibly harmful illusion! Can everyone please start teaching novices the basic "fetch, observe, then consciously choose whether to fast-forward / do a merge commit / rebase / reset" workflow instead, thanks.

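One possible shape for the "fetch, observe, then choose" workflow described above, as a sketch. The repo names (upstream.git, coworker, mine) are hypothetical, and the final step only handles the easy fast-forward case; anything else is left as a deliberate decision.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"

# Setup: a bare "upstream", a coworker who pushes one commit, and my clone.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
echo "v1" > seed/file.txt
git -C seed add file.txt
git -C seed commit -q -m "v1"
git clone -q --bare seed upstream.git
git clone -q upstream.git coworker
git clone -q upstream.git mine
echo "v1 plus v2" > coworker/file.txt
git -C coworker commit -q -am "v2"
git -C coworker push -q origin main

cd mine
# 1. Fetch: refresh the cached origin/main without touching my branch.
git fetch -q origin

# 2. Observe: what would come in, and what do I have that upstream lacks?
incoming=$(git rev-list --count HEAD..origin/main)
outgoing=$(git rev-list --count origin/main..HEAD)
echo "incoming: $incoming, outgoing: $outgoing"
git log --oneline --graph HEAD..origin/main

# 3. Choose: fast-forward only when nothing of mine would be disturbed.
if [ "$outgoing" -eq 0 ]; then
  git merge -q --ff-only origin/main
else
  echo "Histories diverged: pick merge, rebase, or reset on purpose."
fi
```
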
So what's to be done?

idk.

Well, no, ok, I have a very strong suspicion: when teaching a new Git user, you must keep hammering on the core concepts. Keep pushing the learner to do the dreary work of explaining how they're conceptualizing their tasks, and clarifying their mental models until they can make valid predictions about how Git will respond to various commands. Keep trying different alternate explanations and metaphor chains until you find ones that work for this particular mind. And, accept that you might not ever be the one who finally gets through to them.

I have not yet found any reusable universal explanations for these concepts. There's some fun visualizations you can do with tinker toys to explain DAGs, and that helps a bit, but it doesn't do shit for "code content alone doesn't sufficiently denote identity".

There's some tools that can help users leverage their partial understandings of these concepts to make increasingly useful predictions and evaluations. I especially love exposing the history graph, both via external apps like gitx/gitk and via my beloved log --date=format:'%a %b %e, %Y' --pretty=format:'%C(yellow)%h%C(reset) %s %C(cyan)%cd%C(reset) %C(blue)%an%C(reset) %C(green)%d%C(reset)' --graph command (alias it to lg in your .gitconfig, and use the -S command in your pager if the line wrapping is vandalizing it). But without at least a partially-developed mental model, the graph is just another dizzyingly chaotic input.
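
For reference, here's a sketch of wiring up that lg alias. It sets the alias in a scratch repo's local config so it leaves your real setup alone; adding --global to the git config call would make it available everywhere (which amounts to the same thing as putting it in your .gitconfig). The "Demo" identity and file name are made up.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
echo "hello" > file.txt && git add file.txt && git commit -q -m "first commit"

# Repo-local alias; use "git config --global alias.lg ..." for all repos.
git config alias.lg "log --date=format:'%a %b %e, %Y' --pretty=format:'%C(yellow)%h%C(reset) %s %C(cyan)%cd%C(reset) %C(blue)%an%C(reset) %C(green)%d%C(reset)' --graph"

# Each line: graph marker, short hash, subject, date, author, ref decorations.
graph=$(git lg)
printf '%s\n' "$graph"
```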

Anyway, that's (at least part of) why learning Git is hard.

I have been using git professionally for over ten years and I have never, never successfully used cherry-pick. I think I've tried twice? And both times I've just ended up deleting my local repo and cloning again. I'm not 100% sure what it does, but I know damn well I want no part of it.

Cherry-pick means “treat this arbitrary commit as though it were a set of edits, and try to perform those same edits on the current code (which presumably has never had those edits applied before).”

So, if the commit added a new function and changed another function to call it, cherry-picking would make a new commit on your current branch to add that new function and call it.

It’s a speculative history re-write: “what if I had made those edits here and now, instead of on another branch somewhere?”

If those edits wouldn’t have made sense on the current branch (like, the file you added the function to is gone), then you get a conflict and probably want to abort, since you’re probably outside the set of situations where cherry-picking would save you any time or effort.
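
A tiny sketch of the happy path, assuming a scratch repo with hypothetical branch and file names. The picked commit lands on main as a brand-new commit with a new ID, and the original on the feature branch is left alone.

```shell
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
echo "base" > app.txt && git add app.txt && git commit -q -m "base"

# On a side branch, one commit adds a helper file.
git checkout -q -b feature
echo "helper" > helper.txt && git add helper.txt && git commit -q -m "add helper"
original=$(git rev-parse HEAD)

# Meanwhile, main moves on with unrelated work.
git checkout -q main
echo "more" >> app.txt && git commit -q -am "unrelated work"
main_before=$(git rev-parse HEAD)

# Replay the edits from "add helper" as a new commit on top of main.
git cherry-pick "$original" > /dev/null
picked=$(git rev-parse HEAD)

echo "original: $original"
echo "picked:   $picked (parent: $(git rev-parse HEAD~1))"
```

If the pick stops with a conflict instead, git cherry-pick --abort puts the branch back where it was before you started.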

Yeah.

BTW have you shown Julia this post yet?

Yes — I think she liked it! I let her boost it and then my masto notifications went a lil berserk. 😅 It’s being received pretty well in general, though I think a lot of readers are enduringly mad about git and probably don’t know that I genuinely love it and just want it to be better and less confusing while retaining its strengths. 😮‍💨

Isn't "git, but less confusing" just mercurial? :)

I had only been using svn for 6 months when I started using git.
Let's say I was lucky that git follows the same logic I do.
For me, doing something in git is (relatively) easy, as long as you know what your current state is and what final state you want to reach.
I still have problems with some behaviours that are illogical to me, like submodules.

Back to mercurial.
At work, I had to migrate from using git to hg around 2012–2013.
hg was so confusing to me.
It was always getting in the way of how I used version control, and I had to add a plugin or something to regain some control over my local repo, to edit commits and local history as I saw fit.
I still don't understand what is gained by hard-linking a commit to a branch.
Why can a branch have more than one tip?
It was very enabling for my colleague, who never did a merge and just pushed his work as a new tip.
You can't be that lazy in git. Or you can try, and others will easily see that you "forgot" to merge your working branch (or push -f 🤬).

Let's just say I went happily back to git after that and let time away from hg erase what I had learned of it.

I've been using Git professionally for over 8 years. It is one of my favorite tools precisely because it is so powerful, but also fast and lightweight compared to alternatives. It is also very intuitive once you understand the fundamentals, but I do agree: the CLI design sucks ass. Still, it is the "wrench" of the software developer. Would you trust a mechanic with your car if they can't use a wrench?
