Letters From A Maintainer
This blog post is a series of letters from me, an OSS maintainer, to anyone contributing code to OSS projects. My goal is condense haphazard advice into a single document and help contributors understand a maintainer's perspective on OSS.
For some context, I've been an OSS maintainer of go-ipfs and go-libp2p for over 3 years now, along with some personal projects including tempfile (rust), and horrorshow (rust).
These letters mostly draw on my experiences maintaining go-ipfs and go-libp2p. While they should be generally applicable, they're most applicable to systems programming in go.
Contributing
If every stumbled on a new project and wanted to help out but didn't know where to start? This is for you.
Do: Triage issues, new and old
Triaging issues, new and old, is time consuming. Any amount of time a contributor can commit to this is helpful as the task often falls to the project's maintainers:
- Reproducing old issues to trim down the backlog.
- Asking clarifying questions on new issues.
- Finding and calling out duplicates.
Do: Help on the forums (and IRC)
You don't need a deep understanding of a project to start answering questions on a project's forums. Most questions can be answered with a bit of searching. The difference between you and the person asking the question is that you have confidence in your ability to find the answers yourself.
Additionally, that bit of searching will teach you about the project and open up additional avenues for contributing to the project.
Do: Add Documentation/Examples
Again, this is a great way to start learning about a project as it forces you to learn how it works.
Do: Add Tests
Nobody likes writing tests. Unfortunately, this means that tests are one of the things many projects need most. If you want to start coding on a new project immediately and want to better understand how its internals work, one of the easiest ways is to start writing new tests.
- Look through the code for some example tests.
- Look through the issue tracker for issues labeled "needs test".
- Use code coverage analysis tools to find areas of the code that are poorly tested.
- Open an issue tracking which areas you plan on testing. This gives the maintainer a chance to give you feedback and suggestions before you start actually coding.
Do: Fix Bugs
Go through bugs marked "good first issue" and/or "help wanted" and start fixing them. Unless the project is rigorous about project management, it's probably a good idea to leave a quick comment asking if you should fix the bug just in case the labels are out of date.
Honestly, many issues in OSS projects are small and just need someone to spend a few hours debugging them. However, these kinds of issues tend to pile up into an mountain that nobody wants to climb.
Don't: Airdrop (large) Patches
I know you really want that amazing feature. However, that amazing feature may not fit in with the rest of the project and may conflict other in-flight endeavors. Read Writing Code For Review before submitting patches.
Don't: Spam Ideas
We're all guilty of this. We see a shiny new project and have so many ideas about how it could be improved! Concrete feature requests with concrete designs are awesome! However, if you're new to a project, that's probably not the best place to start.
A deluge of "I think this project should do X" requests are quite frustrating for maintainers. Unfortunately, your idea probably isn't new to the maintainer. By raising it in an issue, the maintainer now needs to address the issue. Addressing and managing expectations (i.e., "we don't have time for that right now") around half-baked ideas is really time consuming.
If you've ever written up a long, revolutionary issue on how a project could improve by 10x and found yourself completely ignored, this is probably why.
If you have an idea, write up a short post on the project's forum (if it exists). This makes it clear that the idea is for discussion and that there are no expectation that any maintainers and/or devs will implement and/or respond any time soon.
Note: This advice really depends on the project size and activity. For small projects with little activity, a thoughtful feature request here and there can very be welcome.
Don't: Hijack
Please be very careful not to hijack a discussion. If you think you have a similar idea/bug as an existing issue but aren't sure, just open a new issue. Unfortunately, popular OSS collaboration tools like GitHub don't support threaded conversations. This means any off topic discussion will stick out as a large blob of confusion in the middle of a long technical discussion.
Code For Humans
While there are many techniques for structuring readable code, the three I find most useful when reviewing are:
- Modularity/Composability
- Isolation
- Good Abstractions
The best way to learn how to write good code is to read code. Read lots of code, both good and bad. Writing code is not a substitute.
Note: This post is not a substitute for reading a good book on software architecture or taking a class. Please take the time to do so.
Modularity/Composability
When a system is built of smaller components, the reader can independently understand each piece in isolation and how they fit together instead of having to understand the entire system all at once.
This sounds obvious because it is. The non-obvious part is how to put this into practice. The main take away is: modular designs are worth the effort; if you find yourself working on a large, messy service, try breaking it into smaller pieces.
On the other hand, modularity can make things more complicated by introducing indirection. Be careful: If you have co-dependent services (services that know about and/or call into each other frequently), consider merging them.
The best way to avoid this is to clearly define a component's responsibility up-front and define the interfaces between components as early as possible (ideally in a design document before you even begin coding).
Isolation
Independent subsystems should behave and/or fail independently. For example, if two subsystems use a shared service, and one of these subsystems cancels a request, this cancellation shouldn't affect the other subsystem.
This rule is never up for debate as interference between seemingly isolated services is extremely difficult to debug. Always decide and document up-front whether a service/component is shared or owned.
Good Abstractions
Abstractions/APIs are the UX of programming. The actual behavior of an abstraction/API must conform to the user's expectation of that abstraction/API. Unfortunately, this is mostly subjective.
Advice:
- Read tons of code. The more code you read, the more abstractions/idioms you'll see and the better code you'll write.
- Use your code. This is the fastest way to find pain-points and affordance mismatches.
- Take advice. If an experienced user thinks something is funky with your abstraction, pay attention. They're your user.
- Explain your abstractions to a rubber duck.
- If you have to break your abstraction to access internals, it's probably wrong or at least sub-optimal.
Examples:
- A close method should close and cleanup all resources.
- A close method should generally be idempotent (depends on the language) simply because it usually is (and is therefore expected).
- In go, if a function returns a value and an error, the caller should be able to bubble the error up the chain and walk away. Specifically, the caller should not have to do anything with the value (e.g., close it).
- If your code looks like it follows a well-known abstraction (e.g., one of the Go interfaces, etc.), it should actually conform to that abstraction. Familiarize yourself with your programming language's standard & common libraries.
Breaking Changes
Breaking changes are changes that require downstream projects to react to changes made upstream. From least to most severe, IPFS and libp2p have the following interfaces that can "break":
- Compile-time APIs
- Datastores/Databases
- Network Protocols
- Run-time APIs
- Data Formats
Compile-time APIs
The go-ipfs Go API.
While annoying, compile-time API changes are the easiest to deal with as downstream projects usually either lock (e.g., with go modules) or vendor dependencies.
However,
- Please don't make breaking API changes for aesthetic reasons (unless you're already breaking the API for other reasons). These kind of changes are extremely frustrating to downstream users.
- Explicitly call-out all breaking changes when submitting a patch. These changes need to receive extra-careful attention, need to be called out in release notes, and may need additional review from stakeholders.
Database/Datastore
Database schemas.
Changes to datastores/databases can be handled through an automatic migrations. However, while they're supposed to be transparent, such migrations often take time and can fail catastrophically if not implemented correctly. Either avoid these kinds of changes or test your migration thoroughly.
Network Protocols
Bitswap, Yamux, etc.
Network-protocols can evolve over time as long as a deprecation period is provided. However, this deprecation period gets longer and longer the more mature a system becomes.
The process usually looks like:
- Add a new feature.
- Wait some time (adoption period).
- Enable the new feature by default and deprecate the old feature.
- Wait a long time (deprecation period).
- Remove the old feature.
Luckily, IPFS and libp2p are not yet mature enough for this to be a major issue. Users tend to update regularly so network-protocol deprecation periods can be as short as a few months.
Run-time APIs
The IPFS HTTP API or the Linux kernel syscall API.
Changes to run-time APIs are problematic because, unlike compile-time APIs, the end-user is responsible for dealing with incompatible run-time APIs. That is, the developer deals with compile-time API changes when compiling/distributing their app. The end-user deals with run-time API changes when they update their IPFS daemon or their Linux kernel and their app breaks.
Worse, downstream apps are often unmaintained. For compile-time APIs, this isn't really an issue: different apps can be compiled against different versions of the same dependency (usually). For run-time APIs, this is a significant issue as one can usually only have one copy of the runtime dependency.
This is what Linus is talking about when he says "we don't break userspace". The Linux kernel breaks compile-time APIs all the time and expects module maintainers to update their code. However, ancient, unmaintained applications should continue to "just work".
Data Formats
IPLD Formats, UnixFS, etc.
Removing support for old data/file formats is difficult to impossible. When data is structured and managed by some database, one can use migrations to migrate from one format to another. However, this doesn't apply to the general case.
- Users will backup files and then try to read them years later. They should be able to do so.
- Data format changes are difficult to reconcile with content addressing (used everywhere in IPFS).
Really, it's the second point here that makes removing support for old data formats (nearly) impossible. Re-formatting data changes the content address (at least in IPLD) which would break links to the data. Basically:
- IPFS doesn't break links.
- IPFS can't force-migrate data.
- IPFS can't remove support for old data formats.
Parallel Programming & Locks
Concurrent programming is easy(ish); just take a global lock and try to avoid deadlocking. Parallel systems programming is hard because it usually involves multiple complex synchronization primitives and multiple services all trying to maximally use multiple cores without blocking each other.
While it can be learned through practice, I highly suggest that you read a book on the theory (e.g., The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit). However, if you only learn one thing, learn this: locks are not pepper. Don't pepper your code with locks and expect it to work.
First, we need to understand what a lock does. While a useful simplification, a lock does not, strictly speaking, "protect some variable(s)".
- The primary purpose of a lock is to protect a critical section where invariants are violated. For example, adding a key/value to a map (in Go) takes multiple instructions. Overall, this sequence of instructions leaves the map in a consistent state. However, the state of the map is not consistent while these instructions are being executed. If another thread attempts to read the map while it's being updated, the read will try to operate on an invalid data structure and fail in unpredictable ways.
- The secondary purpose of a lock (and most other synchronization primitives) is to prevent certain optimizations that are valid in a single-threaded context but invalid when observed from a separate thread. Before attempting to write a parallel program in a new programming language, you should read that programming language's memory model (for example, the go memory model.
The most important part of using locks and synchronization primitives is to be consistent and deliberate:
- Clearly document thread-safety. Specifically, which functions can be called at the same time and which can't.
- Never use locks "just in case". If you don't know if you need a lock or not, your code is buggy.
- When you decide to use a lock, clearly document and consider the invariants maintained by the lock. For example, in Go, use the mutex "hat".
Writing Code For Review
This post covers how to take a patch from an idea to a merged feature.
Before writing code
Before writing code, sit down and write up an issue describing the change you intend to make. Up-front discussion in can avoid having to scrap/rewrite/rework code during a review.
Please do this even if your code is perfect and well tested:
- It may interact with other subsystems or in-flight projects that you aren't aware of.
- The feature may simply not be a high enough priority to warrant the maintainers attention. Reviewing code takes time.
- The subsystem you're modifying may have undocumented errata/bugs that need to be fixed first. While these issues should have been documented, should has little impact on reality.
- The feature may introduce a significant maintenance burden. Code is never written and forgotten, the maintainer will need to move, fix, and update your code as the project changes.
- The feature may not fit in with the goals of the project. That doesn't mean it's not useful, it just means it's out of scope.
Remember, you see the project through the lens of what you need to achieve your goals. The maintainer sees it as a collection of interacting features, interfaces, and subsystems that they'll need to maintain into the future.
Before submitting a patch/PR
Before submitting a patch/PR, run through the following list.
Cleanup
The first step when submitting a patch is to clean it up.
- Avoid dead code.
- Even when documented as "dead", it will bit-rot.
- When not clearly documented as dead, your reviewer will waste time reviewing it.
- Never leave in commented-out code with no explanation of why it's commented out. This usually indicates that some debugging logic got left behind.
- Use a linter. For Go, I recommend golangci-lint.
Testing
After cleaning up your code, the next step is to make sure that the remaining code is well tested. Bugs caught in review waste quite a bit of time.
- Test coverage is nice but not the end-all-be-all. Never trust your test coverage and don't bother shooting for 100%. You'll end up wasting time testing every error path instead of stressing the critical paths.
- Systems programming is different. The concern is rarely "does this function work in isolation",
it's usually "does this function/subsystem work when interleaved with everything else?"
- Write parallel stress tests. Even if they're not deterministic, this will catch quite a few bugs.
- If you're using Go, use the race detector
go test -race
.
- Use it and/or integrate it before asking for review. You'll likely want to rework your interfaces.
- Explain your code to a rubber ducky (or your partner, a child, a wall, a friend (real or imaginary)). If you can't easily do this, consider rewriting your code.
Note: TDD (test driven development) isn't sufficient for systems programming. You'll need to go back and write test-cases that stress your edge-cases and critical paths.
Documentation
Once your code has been tested, it's time to go back through and fill in any documentation you may have left out along the way. It's best to document as you go but you should always do an additional pass before submitting a patch for review.
- Document most types and functions.
- Even if the type name is obvious to you, it may not be to others.
- Small internal or "idiomatic" types/functions don't necessarily need to be documented.
- Private error & result types.
- Single-use helper functions.
- Be brief. Use short sentences, short paragraphs, and plenty of punctuation and bullet points. Nobody wants to read an essay.
- Small examples go a long way. Write them.
- Specify implicit contracts: those not expressed in the types themselves.
- Is the function thread-safe? Under what conditions/assumptions?
- Can the function be called twice?
- Is it idempotent?
- Does it take ownership of the parameters?
- Can it panic?
- Use comments to sign-post your code and call out interesting quirks.
- Comments are great for sectioning long functions. This is especially true for setup functions and/or request handlers.
- If you had to think about something while writing it, it should have a comment explaining your thoughts.
- If you had to fix a non-obvious bug, there should be a comment to prevent someone from making the same mistake.
- If it looks like something can happen but can't, write that down. You'll save someone some head-scratching later.
- If you write some code that looks useless, explain why it's not useless or someone will remove it.
- NEVER leave a comment that tells the reader to do/not do something without an explanation. Even if you're the only one working on the project, you will almost certainly forget why when you revisit the code later.
Break It Up
Finally, when possible, pull small, stand-alone pieces out of large patches and submit them separately. This allows the reviewer to focus on the meaty changes.
Alternatively, break these separate patches into separate commits and clearly communicate that the patch should be read commit by commit.
At the end of the day, this step depends on the style of patch you're submitting and the project's policies. In go-ipfs, for example, cleanup patches with multiple small changes batched into a single patch-set are usually fine and save everyone time fixing merge conflicts.
Submitting a PR
The most important part of submitting a PR/patch is communication.
- Clearly explain what's being changed, why, and how.
- Clearly communicate when something is ready and not ready for review.
For large changes, leave yourself a self-review.
- Call out interesting changes.
- Ask questions.
- Ask for advice.
- Call out alternative solutions that you've considered/tried.
For early feedback, submit a WIP patch with specific questions. WIP "please look, maybe" patches with no specific questions (code or design) are frustrating for reviewers as it's impossible to know what's done and what's still WIP.
After submitting a PR
Code review is a discussion. It's the responsibility of both the code author and the reviewer to work together to either merge or drop the patch.
Scope Creep and MVP
First, only you can prevent scope creep: It's perfectly acceptable (and good!) to say "let's punt that to a later PR".
However, bugs, data races, potential DoS vectors, etc. are never acceptable. MVP means minimum viable product. Viable means bug free and useful.
On the other hand, explicitly unimplemented and/or partially implemented features are OK. However, a patch must never include a component that looks done but isn't and, critically, should never reduce the stability of the program as a whole. Unimplemented/unfinished features that aren't documented as such are bugs.
While partially implementing a feature, you may need to create a new interface that you know you'll need to extend in the near future. When possible, make interfaces extensible from the start. When not, clearly document these interfaces as experimental/partial. Unfortunately, this isn't always enough and the maintainer may ask you to finish the feature up-front.
Finally, don't be a moving target. You will likely realize a better way to do something after submitting a PR. However, be careful to not repeatedly rewrite an in-progress PR as the previous review cycles will have been wasted.
Communication
You may have noticed the theme of this post: communicate clearly.
- Clearly communicate when something is ready for re-review.
- Clearly communicate when a review comment has been addressed.
A Two-Way Street
Review is a two-way street. The reviewer's job is to find problems; it's the author's job to find and propose solutions. A good reviewer/maintainer will offer potential solutions and insights but don't make your reviewer do all the heavy lifting.
This is especially true if you're implementing functionality that the maintainer hasn't explicitly requested. If you make the maintainer figure out how to plug the feature you need into the rest of the application, your patch will sit at the bottom of the pile until the maintainer has enough free time to figure it out (possibly indefinitely).
Finally, the less work a maintainer has to do, the faster they'll be able to review code and the faster your patches will be merged. If a maintainer points out issues in your patch, offer potential solutions and indicate the ones you prefer. If a maintainer points out related changes that need to be made to take your change all the way through to completion, offer to make those changes. At the end of the day, unless you're fixing a critical bug, you probably need your patches merged more than the maintainer does.
Becoming A Maintainer
Becoming a maintainer of an existing codebase involves understanding it to the point where you can confidentially review new patches and understand how they fit into the system as a whole.
There are three paths:
- Rewriting/refactoring the codebase.
- Reviewing code. This option forces you to read and understand the codebase.
- Debugging issues, answering questions, and triaging bugs. Even more so than reviewing, debugging requires you to read and understand a codebase.
Unfortunately, the first option is only usually available when you're the only maintainer so the only remaining choices are the second two.
Reviewing Code
Reviewing code is time consuming and it never gets easier; you just get better at it. Don't expect to wake up one day and suddenly get a speed-code-review superpower.
Treat a code review like you would async pair programming. You should understand the code you're reviewing better (literally, I'm not exaggerating just to make a point) than the author. You should be executing the code in your head. For parallel code, you'll need to execute multiple parallel copies at the same time.
If you've been asked to review some code and you find yourself thinking "I don't think I'm qualified to review this code", that's fine:
- Review it as best you can anyways.
- Ask someone to review the code as well.
If you do this enough, you'll eventually be able to review patches on your own.
Debugging
Debugging is a skill and, the more you work at it, the better you'll get, independent of the codebase. As an added benefit, the better you are at debugging, the better you'll be at spotting potential bugs in your code as you write it. You'll learn to recognize code "smells" and avoid them.
You will not magically get better at debugging by adding features or by fixing known/solved bugs. Once you're practiced at debugging, being familiar with a codebase will help you debug faster however, the best way to get better at debugging itself is to debug unfamiliar systems with minimal information.
The key takeaway here is: debugging, especially debugging unfamiliar systems with little information, is a skill you need to invest time in. If you hand off hard issues to "someone who knows better", you'll never get better at it.
Being A Maintainer
Maintaining a large OSS codebase with many users is a rewarding experience. You often get to work directly with users to address their problems and with contributes to merge their patches. You make things happen.
However, maintaining an OSS project with a sufficiently large user-base will be frustrating at times.
Saying No
As a maintainer, your job is to have a high-level view of the project, figure out how all the pieces will fit together, and, ultimately, to say yes or no.
You will say no to many well-intentioned patches:
- Good patches that implement useful features that are out of scope for the project.
- Patches that require more work than they're worth to bring into a mergeable state.
- Correct but convoluted patches that you just don't have time to review and/or simplify.
- Features that aren't worth the maintenance burden. There are many good features that aren't general-purpose enough to be worth maintaining.
You will not have time to fix everything:
- You will not have time to fix every bug.
- You will not have time to implement every feature, no matter how awesome.
- You will not have time to patiently answer every user's question.
- You will not have time to shepherd every patch.
- Users and contributors are distracting. The more you have, the less time you'll have to actually fix their issues.
Safety Tips
Maintaining OSS software is draining.
In most cases, front-line tech support and actual development work are isolated: those working front-line support aren't ultimately responsible for fixing bugs and those fixing bugs aren't directly in contact with customers. This means tech support doesn't feel personally responsible for customer issues and allows developers to focus on their work.
As an OSS maintainer, you won't have that luxury. Your users will come directly to you directly to complain about code for which you're directly responsible.
- Start training a replacement immediately and treat every new contributor as a potential maintainer. Even if you don't completely burn out, you'll need to hand off subsystems as the project grows.
- Users don't file bugs when something works. You will always hear complaints, you'll have to listen hard to hear happy users.
- You will feel like your users aren't listening and are making the same mistakes over-and-over. That's because every user is a different user, every contributor is a different contributor. The only solution is documentation or the patience of a kindergarten teacher. If you have to repeat yourself, it should be documented.
- Clearly say no, maybe later ("postponed"), or maybe if someone else takes the time to implement this ("help wanted"). Saying no is always better than uncertainty and you won't have time to address everything.
- Create processes and follow them. They allow you to draw boundaries and say no while keeping the discussion impersonal. This is why governments use processes and couldn't work without them.
- Don't take it personally or constantly second guess yourself. Just do the best you can then move on when you can't take it any more.
- Remember that there is only one of you. You can (and should) say "I don't have time for that" and/or "that is not a priority right now".