Up till now the only prose checker/linter I’ve used in Emacs is jinx. It’s an amazing spell checker for programmers and technical writers: it handles snake_case and CamelCase symbols, knows what to spell-check based on the text’s face, etc. Unfortunately, it’s just a spell checker and won’t catch any grammatical errors.

Now I’m looking for a local (offline) grammar checker. Something to catch stupid mistakes: repeated words, cut-off sentences, tense disagreements, homonyms, etc. I tend to jump around a lot when writing and want a tool that’ll tell me when I started a sentence in one tense and finished it in another, or failed to finish it entirely. Ideally the tool would also provide general writing advice (wording improvements, etc.), but I’ll probably need a full language model for that.

Vale

I first tried Vale because it’s by far the simplest option: it has great support for markup languages and can even lint prose within source code. To be honest, it looks like a great choice for teams that want custom rule-based prose linting. Its primary power is the expressivity of its rules, which lets organizations set and enforce writing standards (e.g., “don’t start a sentence with ‘but’”). Unfortunately, it’s not a general-purpose grammar checker, and I’m not trying to conform to a specific set of stylistic rules.

However, if that’s what you want, Vale has great Emacs integration via flymake-vale. I say great because it (a) uses flymake (no language server setup required) and (b) only checks changed text, making it very fast and lightweight.
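For readers who do want Vale, hooking it up might look something like the sketch below. It assumes the flymake-vale package is installed, that the vale binary is on your PATH, and that a .vale.ini exists for the files you edit. The helper name my/enable-vale is made up for illustration.

```elisp
;; A minimal sketch, assuming flymake-vale is installed and `vale' is on PATH.
(require 'flymake-vale)

(defun my/enable-vale ()
  "Register the Vale backend in this buffer, then turn on Flymake."
  (flymake-vale-load)
  (flymake-mode 1))

;; Prose-oriented modes generally derive from `text-mode'.
(add-hook 'text-mode-hook #'my/enable-vale)
```

Doing both steps in one function ensures the backend is registered before Flymake starts its first check.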

Harper

Next I tried Harper. Like Vale, it’s mostly rule-based. However, it’s more focused on providing general-purpose writing tips than on enforcing specific stylistic rules: it can catch repeated words and tell me when a sentence is too long; it has some very basic grammar rules and can catch article/noun disagreements (a/an); it even has some rules to catch “it’s” versus “its”. I hate it.

  • Harper thinks “given its computed value” should be “given it’s computed value”.
  • Harper won’t catch errors like “Go to a the park.”
  • Harper likes to harp on “mistakes” like “long” sentences.

Harper is like that middle-school English teacher with their highly rigid rules about writing: great for teaching good habits to the novice writer but, at this point, I’ve ingrained them. When I break these rules, it’s (usually) because I meant to break them.

What I need is a tool to help me catch stupid errors and/or a tool to help improve the flow of my writing. Harper is not that tool.

Worse, the only Emacs integration I found was harper-ls, a language server. In Emacs specifically, this has a few downsides:

  • Eglot (the default language server package in Emacs) only supports a single language server per buffer, so I can’t use Harper along with some other language server.
  • Eglot expects language servers to be used in “projects” and will enumerate all project files when it starts the language server. Starting a language server in my home directory takes forever as it tries to enumerate all my files (including my backup snapshots…).
  • Eglot will start one language server per project (in this case, often per directory).
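For context, the setup these downsides apply to is the usual Eglot registration, roughly as sketched here. It assumes harper-ls is installed and on your PATH; the choice of text-mode is only an example.

```elisp
;; A minimal sketch, assuming `harper-ls' is installed and on PATH.
(with-eval-after-load 'eglot
  ;; Modes derived from `text-mode' will then offer harper-ls
  ;; when you run M-x eglot.
  (add-to-list 'eglot-server-programs
               '(text-mode . ("harper-ls" "--stdio"))))
```

With this in place, running M-x eglot in a text buffer starts harper-ls for whatever project Eglot thinks the buffer belongs to, which is where the problems above come from.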

However, if you really want to use Harper, you can trick Eglot into treating all documents (with the same major mode) managed by Harper as if they were in the same project. If you don’t care, skip to the next section.

First you’ll need to define a new project type for harper. In my case, I “root” the project in a guaranteed empty directory.

(cl-defmethod project-root ((project (eql harper)))
  "/var/empty")

(cl-defmethod project-name ((project (eql harper)))
  "harper")

This virtual “harper” project contains all files managed by the harper language server. Honestly, it’s probably OK to simply return nil here.

(cl-defmethod project-files ((project (eql harper)))
  (when-let* ((server (cl-find-if #'eglot--languageId
                                  (gethash (eglot--current-project)
                                           eglot--servers-by-project))))
    (mapcar #'buffer-file-name (eglot--managed-buffers server))))

Finally, add a project “finder” function that puts all buffers that would be managed by harper into the “harper” project, if and only if we’re acting on behalf of Eglot (eglot-lsp-context is non-nil).

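(defun project-try-harper (_dir)
  "Return the `harper' project when Eglot looks up a project for harper-ls."
  ;; `bound-and-true-p' avoids a void-variable error if this finder
  ;; runs before Eglot has been loaded.
  (and (bound-and-true-p eglot-lsp-context)
       (string= (cadr (eglot--lookup-mode major-mode)) "harper-ls")
       'harper))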

(add-to-list 'project-find-functions #'project-try-harper)
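As a quick sanity check, and assuming the definitions above have been evaluated, the following sketch exercises the new project type from a scratch buffer. Let-binding eglot-lsp-context only simulates Eglot's lookup. The result of the last form depends on which server is registered for the current buffer's major mode, so no expected value is shown for it.

```elisp
(require 'project)
(require 'eglot)

;; These dispatch on the symbol `harper' via the eql specializers.
(project-root 'harper)   ; => "/var/empty"
(project-name 'harper)   ; => "harper" (`project-name' needs Emacs 29 or newer)

;; Pretend to be Eglot: evaluate this in a buffer whose major mode
;; is served by harper-ls and inspect what comes back.
(let ((eglot-lsp-context t))
  (project-current nil))
```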

LanguageTool

Finally, I broke down and set up LanguageTool: it’s by far the most advanced free-software grammar tool. The next step up is Grammarly and/or LanguageTool’s online offering, but I absolutely refuse to use a remote grammar tool (I’m not sending all my notes, etc. to someone else’s server).

Unfortunately, it’s definitely a heavyweight. It uses at least 2.5GiB of memory, and you’ll need the 14GiB n-gram database if you really want it to do its job. On the other hand, it’s leagues beyond Harper and Vale in terms of catching grammatical mistakes.

In terms of Emacs integration, I ended up creating a fork of flymake-languagetool that has improved performance (it avoids re-checking the entire document on change); excludes markup, code, etc. before sending text to LanguageTool, rather than filtering errors afterward, for improved accuracy and performance; and renders diagnostics in the correct location even when emoji are present in the buffer. I considered using a language server, but I had enough of that with Harper.

I recommend you find a good LanguageTool Docker container or something because setting it up from scratch isn’t a fun exercise. However, if you’re like me and (a) use Arch Linux and (b) avoid sketchy Docker containers, read on.

On Arch, I installed the languagetool package from the extra repository (pick jre21-openjdk-headless when prompted), along with a few AUR packages: languagetool-ngrams-en, fasttext, and fasttext-langid-models. The fasttext packages are only used for language detection so they should be optional, but LanguageTool’s HTTP server complains if they’re not installed.

Next, you’ll need to create a configuration file to tell LanguageTool where to find everything:

fasttextModel=/usr/share/fasttext/lid.176.bin
fasttextBinary=/usr/bin/fasttext
languageModel=/usr/share/ngrams

You’ll need to pass --config /path/to/my/config.properties to the languagetool command (which you can configure via flymake-languagetool-server-command and flymake-languagetool-server-args):

(setopt flymake-languagetool-server-command "languagetool"
        flymake-languagetool-server-args
        '("--http" "--allow-origin" "*"
          "--config" "/path/to/my/config.properties"))

That’ll get you a basic functioning LanguageTool server.
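To confirm the server is actually answering, you can query LanguageTool's HTTP API directly from Emacs. This is only a sketch: it assumes the server listens on localhost port 8081 (adjust if yours differs) and that your Emacs has native JSON support. The function name my/languagetool-messages is made up for illustration.

```elisp
(require 'url)

(defun my/languagetool-messages (text)
  "Return the messages a local LanguageTool server reports for TEXT.
Assumes the server listens on http://localhost:8081."
  (let* ((url-request-method "POST")
         (url-request-extra-headers
          '(("Content-Type" . "application/x-www-form-urlencoded")))
         ;; Form-encode the two parameters the /v2/check endpoint needs.
         (url-request-data
          (url-build-query-string `(("language" "en-US") ("text" ,text))))
         (buffer (url-retrieve-synchronously
                  "http://localhost:8081/v2/check" t t 10)))
    (unless buffer
      (error "No response from LanguageTool"))
    (with-current-buffer buffer
      (goto-char (point-min))
      ;; Skip the HTTP headers; the JSON body starts after the first blank line.
      (re-search-forward "^\r?\n")
      (let* ((body (decode-coding-string
                    (buffer-substring-no-properties (point) (point-max))
                    'utf-8))
             (response (json-parse-string body :object-type 'alist)))
        (kill-buffer buffer)
        ;; `matches' is a JSON array (a vector here); collect each message.
        (mapcar (lambda (match) (alist-get 'message match))
                (alist-get 'matches response))))))
```

Evaluating (my/languagetool-messages "Go to a the park.") should return a list of strings, one per problem LanguageTool found. An empty list means the server responded but flagged nothing.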