Adventures with man color

As is usually the case with me, a simple fix sends me down an unending rabbit hole of complex issues. And this was no exception.

It all started when I tried to help the Git project move towards asciidoctor, a program that generates documentation from text files using a markup language. The original project was asciidoc, but it has been rotting for a while. The original asciidoc is written in Python (a language I detest), and the new asciidoctor in Ruby (a language I love), so I clearly saw an edge.

The problem starts with a feature asciidoctor has, but asciidoc doesn’t: generating man pages directly. Of course asciidoc can generate man pages, but it does so by first generating docbook XML, and then docbook stylesheets are used to convert that to man pages. The same can be done with asciidoctor, but additionally there’s an option to generate man pages directly.

Using docbook is very slow; generating man pages directly is way faster.

When I sent the initial patch (very trivial), a Git developer mentioned some “groff issues”. Apparently if your system uses groff (GNU troff), you are supposed to build git with GNU_ROFF=1:

Define GNU_ROFF if your target system uses GNU groff. This forces apostrophes to be ASCII so that cut&pasting examples to the shell will work.

It’s actually not “GNU groff”, but “GNU troff”, so this option is wrongly named, but regardless; virtually nobody is using it.

My Arch Linux system doesn’t build git with GNU_ROFF, it does use groff, and yet I don’t see any issues with apostrophes… So what’s going on?

The context

The original commit “Quote ‘ as \(aq in manpages” comes from 2009, and mentions the problem comes from “docbook/xmlto”, and apparently only affects groff. In the original thread “quote in help code example” you can see they mention the output of the git filter-branch command specifically, but when I look at that man page, it looks fine.

So I started to dig in, and for starters I don’t even know what groff is.

I tried different versions of asciidoc; they worked fine. Then I tried different versions of docbook stylesheets; they worked fine. I even tried different versions of asciidoctor, to see when they fixed the issue; they all worked fine. Weird.

So I decided to compile groff myself and find the point where it started to fail. First I tried a version from two years ago. It did not compile correctly, so I had to do some hacking to make it compile, and after I did, everything worked fine. I continued going further and further into the past, fixing the compilation issues, and not finding any problem. I reached 2006, and still did not see any change.

This was strange. Clearly at some point in time, with some combination of tools, there was a problem, but I couldn’t find either the point in time or the combination.

Then I decided to manually modify a man page, and put quotes directly… Bingo. I could see that ' was actually rendered as `, but what caused it?

I then moved forward in versions and to my surprise they all had this issue, even the most recent version of groff.

What on Earth is going on here?

While doing this I noticed a difference between my compiled version of groff and Arch Linux’s version. In the compiled version of groff I saw links rendered in blue. When I looked at the generated man pages I saw \m[blue], but I assumed that was for some other kind of troff program or something totally unrelated. But no, here I was seeing blue, but not with all the groff binaries.

So I tried to build groff with the same options as Arch Linux… Still blue. After trying a few things I eventually found a difference: Arch Linux installs a file /usr/share/groff/site-tmac/man.local, and if I remove that file the blue color returns. Inside the file there’s:

\" Shut off SGR by default (groff colors)
\" Require GROFF_SGR envvar defined to turn it on
if '\V[GROFF_SGR]'' \
  output x X tty: sgr 0

That’s it! If you export GROFF_SGR=1 on Arch Linux, you see man pages with colors, just like my compiled version. The reason my compiled version does this by default is that it doesn’t have Arch Linux’s man.local file.

GROFF_SGR

If you google GROFF_SGR you find that it’s not properly documented. Some distributions such as Debian and Arch Linux do disable groff’s colors, but they don’t document doing so. Debian “fixed it” by documenting it, but only in grotty’s man page. I don’t think most people are going to read the entirety of grotty’s man page, or even a little bit of it, so that doesn’t help, even if you are running Debian, where it’s documented.

However, if you read the man page, you will find another variable: GROFF_NO_SGR. Unlike GROFF_SGR, this one is standard, and it’s respected in all distributions.

This reminded me of a trick I learned while reading Arch Linux’s installation guide:

man() {
    LESS_TERMCAP_md=$'\e[01;31m' \
    LESS_TERMCAP_me=$'\e[0m' \
    LESS_TERMCAP_so=$'\e[01;44;33m' \
    LESS_TERMCAP_se=$'\e[0m' \
    LESS_TERMCAP_us=$'\e[01;32m' \
    LESS_TERMCAP_ue=$'\e[0m' \
    command man "$@"
}

This code automatically colorizes parts of man pages (e.g. bold and underlined text), which looks much better than normal man pages. But it turns out it only works if groff’s SGR is disabled, so in other distributions you need to set GROFF_NO_SGR=1 for the above to work.
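For distributions that don’t disable SGR for you, the wrapper above can carry the variable itself. This is only a sketch of that idea: it is the same function with GROFF_NO_SGR=1 added, and it still relies on the $'…' quoting of bash or zsh.

```shell
# Same wrapper as above, plus GROFF_NO_SGR=1 so that groff emits the old
# overstrike sequences that less can recolor through LESS_TERMCAP_*.
man() {
    GROFF_NO_SGR=1 \
    LESS_TERMCAP_md=$'\e[01;31m' \
    LESS_TERMCAP_me=$'\e[0m' \
    LESS_TERMCAP_so=$'\e[01;44;33m' \
    LESS_TERMCAP_se=$'\e[0m' \
    LESS_TERMCAP_us=$'\e[01;32m' \
    LESS_TERMCAP_ue=$'\e[0m' \
    command man "$@"
}
```

The variables are set only for the man invocation, so nothing leaks into the rest of the shell session.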

Cool. We found something.

Back to apostrophes

There’s something else in Arch Linux’s man.local:

char \' \N'39'

This converts \' to ', instead of groff’s default: \(aa (acute accent: ´). This is the reason why I could not reproduce the problem: Arch Linux was hiding it. However, this is the wrong way of fixing it. There is a reason groff developers decided to pick \(aa; they know better. Distribution packagers should not be overriding this.

Why did they do this? The change came due to task FS#9643 – man PKGBUILD shows slanted single quotes. This happened in 2008, which suggests there was indeed an issue around that time, and pacman documentation was built with asciidoc too.

Arch Linux fixed the issue in the wrong way, though. Debian chose a different path. In their bug report #507673 Shouldn’t parse ‘ to \’ they discussed the issue at length, and they correctly identified that the issue was in docbook-xsl (not groff), and if you are going to convert ' it should be to \(aq, but that would only work in groff. They also found that Pod::man had a portable solution:

.ie \n(.g .ds Aq \(aq
.el .ds Aq '

This creates an alias, from Aq to \(aq, but only when the program is groff; in all other programs it gets converted to '.

This is the correct solution in the correct layer, and generates the proper output everywhere.

But to check that it is the correct solution it would behoove us to understand what groff actually is. groff (or GNU troff) is a document formatting system; it receives text mixed with commands, and generates documents. A man page is just one of the many types of documents it can generate. It can for example generate a PDF.

So, let’s write a simple groff document:

.nf
single quotes: 'text'
single quoted quotes: \'text\'
apostrophe quote: \(aqtext\(aq
.fi

We can generate a PDF document using groff -T pdf test.groff > test.pdf. But this of course is not what we ultimately want, we want to generate a man page, in the same way as man does. To do that we need to specify the output device as utf8: groff -T utf8 test.groff > test.txt. This generates the following:

single quotes ': ’text’
single quoted quotes \': ´text´
apostrophe quote \(aq: 'text'

As you can see the output text is quite different from the input text; that’s what groff does. But this is only the case on a utf-8 system. If you specify the ascii output, then all the quotes above get translated to simply '.

The output with \(aq is correct in both utf-8 and ascii. And if we add the Aq alias:

apostrophe quote alias \*(Aq: 'text'

That is indeed correct. The Debian fix seems to work. To make sure we would need a non-GNU troff, like in Solaris, but alas, I don’t have access to something like that, so I’m just going to assume it works in other systems (as other people reported it did).
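To repeat the experiment, the whole test can be put in one small script. This is a sketch assembled from the pieces above: the file name test.groff is the one used earlier, the Aq definition is the Pod::man one, and the sed call is only my addition to drop the blank lines groff pads the page with.

```shell
# Recreate the test document, with the portable Aq definition on top.
cat > test.groff <<'EOF'
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.nf
single quotes: 'text'
single quoted quotes: \'text\'
apostrophe quote: \(aqtext\(aq
apostrophe quote alias: \*(Aqtext\*(Aq
.fi
EOF

# Render for both output devices, but only if groff is installed.
if command -v groff >/dev/null 2>&1; then
    groff -T utf8 test.groff | sed '/^$/d'
    groff -T ascii test.groff | sed '/^$/d'
fi
```

Comparing the two renderings side by side makes it easy to see which input forms survive as a plain apostrophe on both devices.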

This proper fix was eventually picked by docbook in 2010: Fixed bug #2412738 (apostrophe escaping) by applying the submitted patch. If you take a look at the code of git-filter-branch.1 you can see the fixed code in action:

git filter-branch --tree-filter \*(Aqrm filename\*(Aq HEAD

Therefore both Arch Linux’s and git’s workarounds are not necessary anymore. Yet they remain there ten years later.

Digging deeper

OK, so we found out what the issue was, and how it got fixed: in docbook, and also unnecessarily in git and Arch Linux (three different levels). But what caused it? Going back to groff from 2006 didn’t reproduce the issue, so what happened?

By looking back at the docbook stylesheets’ history with git blame, I found the commit that caused the issue: Reverted necessary escaping of backslash, dot, and dash. This commit happened in 2007, and it was made due to an internal limitation of docbook’s architecture.

So from 2007 to 2010 docbook stylesheets were generating wrong man pages, and different projects worked around the issue in different ways. Today, in 2021, these workarounds are not needed, and yet they remain in place.

Back to Git

After all this investigation I sent a patch to the Git project (doc: remove GNU troff workaround) to remove the GNU_ROFF option, which clearly has not been needed for at least ten years. But I also sent a comment about what I found regarding GROFF_SGR, and the trick to colorize man pages. In reply I received a suggestion to implement the LESS_TERMCAP trick in git help (which is basically an alias for man).

So I sent a patch (help: colorize man pages), and a big discussion cropped up (typical due to the bike-shed effect). In that thread it was mentioned: “why not let the user configure man to do this?”. The problem is that you have too many moving parts: groff, man, git, less, distribution configurations, environment variables, aliases, workarounds, docbook and asciidoc bugs… And of course the thing that started it all: asciidoctor.

But it made me think: what is indeed the best way to configure man to do this?

After several days of investigation, and several days of trying options, I arrived at what I think is the actual solution.

Solution

export MANPAGER="less -R --use-color -Dd+r -Du+b"
export MANROFFOPT="-c"

Unlike Arch Linux’s hack, the -D arguments to less are much more succinct, and they allow adding color (in addition to the style (e.g. underlined)), not removing information. So --color=d+r (long option for -D) converts d (Bold text) to r (red), and the + signifies add color (i.e. don’t remove the bold attribute). Moreover --use-color adds other colors to the less interface; the prompt is cyan, searches are in green, and warnings in yellow.

And instead of the ugly GROFF_NO_SGR=1, we can tell man to pass -c to groff.

So the full command is:

groff -T utf8 -m man -c git-filter-branch.1 | less -R --use-color -Dd+r -Du+b

No man involved. This is way simpler… Why is nobody using this?

After I updated my patch and other people tested it, it became clear it didn’t always work. In particular older versions of less did not have the -D options (at least not for Linux). So I checked the history of less and I found out that they enabled -D for Linux systems in 2021.

No wonder everyone is still using the LESS_TERMCAP_* variables. Nobody knows of the new option, because it’s too new!
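Since the option depends on the version of less, a shell profile can check for it and fall back to the old variables. What follows is only a sketch: the function name less_has_color_option is made up, and the cutoff of 581 is my assumption about the first release that has -D on Linux, so adjust it if your system says otherwise.

```shell
# less_has_color_option VERSION
# Succeeds when VERSION is at least the assumed minimum release.
less_has_color_option() {
    min=581                  # assumption: first release with -D on Linux
    ver=${1%%[!0-9]*}        # keep only the leading digits
    [ -n "$ver" ] && [ "$ver" -ge "$min" ]
}

# The first line of "less --version" looks like "less 590 (...)".
less_version=$(less --version 2>/dev/null | awk 'NR==1 { print $2 }') || less_version=""

if less_has_color_option "$less_version"; then
    # New way: let less add the colors itself.
    export MANPAGER="less -R --use-color -Dd+r -Du+b"
    export MANROFFOPT="-c"
else
    # Old way: LESS_TERMCAP_* variables, which need groff's SGR disabled.
    export GROFF_NO_SGR=1
    export LESS_TERMCAP_md="$(printf '\033[01;31m')"
    export LESS_TERMCAP_me="$(printf '\033[0m')"
    export LESS_TERMCAP_us="$(printf '\033[01;32m')"
    export LESS_TERMCAP_ue="$(printf '\033[0m')"
fi
```

Using printf for the escape sequences instead of $'…' keeps the fallback usable from a plain POSIX shell.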

So the patch to remove the GNU_ROFF option in Git (totally unnecessary in 2021) is there. And so are the updated Arch Linux instructions to use the new -D flags of less.

If you want to properly colorize man pages, you do this:

export MANPAGER="less -R --use-color -Dd+r -Du+b"
export MANROFFOPT="-c"

If you want to colorize other similar documentation (like Ruby’s documentation):

export RI_PAGER="less -R --use-color -Dd+r -Du+b"

And so on. And if you want less to format everything:

export LESS='-RXF --use-color -Dd+r$Du+b'

That’s it. If you are running a recent enough version of less, everything works perfectly with a simple configuration.

Oh, also, I realized asciidoctor didn’t have the portable fix, so I sent them a patch that is now merged. I found an issue with less colors and searches that is fixed now. And I reported their unnecessary workaround to Arch Linux too.
