Getting Accents / UTF8 in Mutt using Xterm with TrueType

Written by max on 2009-06-08

Overview

I’m starting to adopt Unicode/UTF-8 into my world to allow for more internationalization in my projects. Someday I might play with Perl6’s UTF-8 stuff too. I also want different character sets to render correctly while using Mutt. I often get email from people in Western Europe, Eastern Europe, and Southeast Asia, and they all use different character sets. UTF-8 is the “universal” one.

Shows Hebrew, Chinese and Western European characters all at the same time.

The Components

Xterm

To get an Xterm that plays nicely with UTF-8, 256 colors and TrueType, I have to build my own.

  1. Download the latest one from ftp://invisible-island.net/xterm/xterm.tar.gz.
  2. Compile and install it:
    ./configure \
         --disable-desktop \
         --with-x \
         --enable-256-color \
         --enable-load-vt-fonts \
         --enable-paste64 \
         --enable-readline-mouse \
         --enable-tcap-fkeys \
         --enable-tcap-query \
         --enable-wide-chars \
         --with-Xaw3d
     
    make
    make install

    Note the crucial --enable-wide-chars, which is needed for UTF-8. Your xterm may already be compiled with some or all of these options.
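
Before rebuilding, it may be worth checking whether the xterm you already have was built with wide-character support. This is only a sketch: it assumes that a wide-character build lists the -u8 option (shown as "-/+u8") in its help output, and has_u8_option is a hypothetical helper name. It reads the help text from standard input, so it can be tried without an X display.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the help text on stdin mentions -u8.
# Assumption: an xterm built with --enable-wide-chars lists the option
# as "-/+u8" in the output of "xterm -help".
has_u8_option() {
    grep -q -e '[-+]u8'
}
```

One way to use it is `xterm -help 2>&1 | has_u8_option && echo "wide chars available"`. If nothing is printed, a rebuild with the configure line above is probably needed.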

Once I have that, I create a small shell script called xt that launches it for me:

#!/bin/sh
# None of these have extended charsets :-(
font="Consolas"
#font="DejaVu Sans Mono"
#font="Bitstream Vera Sans Mono"
#font="luxi mono"
#font="Andale Mono"
#font="courier"
font_size=12
 
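# Launch in the background so the calling shell returns immediately.
# "$@" is quoted so extra arguments containing spaces survive intact.
xterm -bg black -fg white \
    -fa "$font" \
    -fd "$font" \
    -fs "$font_size" \
    -j -s \
    -sb -si -sk -vb -sl 1024 -rightbar \
    +sf +dc -cr darkgreen \
    -u8 -geometry 100x40 \
    "$@" &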

The important options here are -u8 (UTF-8 mode), -fa (the TrueType face name) and -fd (the face used for double-width characters). This would be perfect, except that I can’t find a good monospaced TrueType font that has all of the character sets. More on that in the font section.

If you want support for all character sets, you can use the “fixed” font built into X, which has a very complete character set. I use this script, inspired by Marjan Parsa, instead:

#!/bin/sh
 
# Launch in the background so the calling shell returns immediately.
# "$@" is quoted so extra arguments containing spaces survive intact.
xterm -bg black -fg white \
    -j -s \
    -sb -si -sk -vb -sl 1024 -rightbar \
    +sf +dc -cr darkgreen \
    -u8 -geometry 100x40 \
    -xrm "xterm*font:-misc-fixed-medium-r-normal--18-120-100-100-c-90-iso10646-1" \
    -xrm "xterm*wideFont:-misc-fixed-medium-r-normal-ja-18-120-100-100-c-180-iso10646-1" \
    "$@" &
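
A quick way to see what a given xterm and font combination can actually draw is to print a few known UTF-8 strings inside it. The snippet below is a small sketch of that idea; utf8_sample is a hypothetical name, and the bytes are written as octal escapes so the script itself stays plain ASCII.

```shell
#!/bin/sh
# Hypothetical helper: print one sample line per script as raw UTF-8 bytes.
utf8_sample() {
    printf 'Latin:   caf\303\251\n'                              # "cafe" with e-acute
    printf 'Hebrew:  \327\251\327\234\327\225\327\235\n'         # shin, lamed, vav, final mem
    printf 'Chinese: \344\270\255\346\226\207\n'                 # U+4E2D U+6587
}

utf8_sample
```

If a line shows boxes or question marks, the font is missing those glyphs; if it shows pairs of odd Latin characters, the terminal is probably not in UTF-8 mode.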

Fonts

You have two choices here: fixed fonts and TrueType fonts. In my experience, the TrueType fonts look better, and the fixed fonts support more characters.

TrueType

I can’t seem to find a perfect TrueType “console” font that is fixed-width, looks great, and covers all the scripts I need (European, Hebrew, Japanese, Chinese, etc.). So I compromise on character set support and just use the pretty one. So far I am happiest with Microsoft’s “Consolas”. Yes, Microsoft!

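The ones I have tried are listed in the xt script above: Consolas, DejaVu Sans Mono, Bitstream Vera Sans Mono, Luxi Mono, Andale Mono and Courier.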

If you would like to add TrueType fonts locally to your account / home directory, just create a ~/.fonts directory and copy the .ttf files into it. Try these commands to explore what fonts you have:

fc-list                 #  see what fonts are loaded.
xfd -fa "DejaVu Sans Mono"  #  explore a certain font
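
To narrow the search for a monospaced font that covers a particular language, fontconfig patterns can filter the fc-list output. This is a sketch under a few assumptions: mono_pattern and list_mono_fonts are hypothetical helper names, the language is given as a tag such as he, ja or zh-cn, and a font only shows up if its own metadata claims that coverage.

```shell
#!/bin/sh
# Hypothetical helper: build a fontconfig pattern for monospaced fonts
# that claim coverage of one language tag (for example he, ja, zh-cn).
mono_pattern() {
    printf ':lang=%s:spacing=mono\n' "$1"
}

# Hypothetical helper: list matching family names, one per line.
# Does nothing if fc-list is not installed.
list_mono_fonts() {
    command -v fc-list >/dev/null 2>&1 || return 0
    fc-list "$(mono_pattern "$1")" family | sort -u
}
```

For example, `list_mono_fonts he` lists monospaced families that report Hebrew coverage. A family that appears for every language you care about is a candidate for the -fa option.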

Fixed / System Fonts

I use the following two options when launching Xterm to specify the system fonts:

-xrm "xterm*font:-misc-fixed-medium-r-normal--18-120-100-100-c-90-iso10646-1"
-xrm "xterm*wideFont:-misc-fixed-medium-r-normal-ja-18-120-100-100-c-180-iso10646-1"
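
Those long font names follow the X Logical Font Description (XLFD) layout: fourteen hyphen-separated fields, ending in the character set registry and encoding. The helpers below are a small sketch for pulling fields out of such a name; xlfd_field and xlfd_charset are hypothetical names, not part of any X tool.

```shell
#!/bin/sh
# Hypothetical helper: print field N (1 to 14) of an XLFD font name.
# The name starts with "-", so cut's field 1 is empty and XLFD field N
# is cut's field N+1.
xlfd_field() {
    printf '%s\n' "$1" | cut -d- -f"$(( $2 + 1 ))"
}

# Hypothetical helper: registry and encoding are XLFD fields 13 and 14.
xlfd_charset() {
    printf '%s-%s\n' "$(xlfd_field "$1" 13)" "$(xlfd_field "$1" 14)"
}

font="-misc-fixed-medium-r-normal--18-120-100-100-c-90-iso10646-1"
xlfd_field "$font" 2      # family: fixed
xlfd_field "$font" 7      # pixel size: 18
xlfd_charset "$font"      # iso10646-1
```

The iso10646-1 suffix is what marks a font as Unicode-encoded. In the wideFont name, the average-width field (field 12) is 180 instead of 90, which is what makes it suitable for double-width characters.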

Try these commands to see what Unicode (iso10646) fonts you have and what’s in them:

xlsfonts | grep 10646
xfd -fn "-ibm-courier-medium-r-normal--0-0-0-0-m-0-iso10646-1"

These are the settings and fonts I used for the screenshot above. See also this Linux Font Tutorial.

Shell / Environment Variables

Key to all of this is having your LANG environment variable set correctly.

Edit ~/.bashrc and make sure this line is present:

export LANG=en_US.utf8

Remember, you will need this on two machines: the machine that displays the xterm window, and the machine that runs mutt. These may or may not be the same machine.

To get a list of the available locales, use this command:

locale -a | grep -i utf

Note that these are case-sensitive, so if you have it set to en_US.UTF8 instead of en_US.utf8 you will get some strange errors. Use the name exactly as locale -a prints it.
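
The two checks above (is LANG a UTF-8 locale, and does that exact name exist on this machine) can be sketched as a pair of small functions. Both names, is_utf8_locale and locale_installed, are hypothetical, and the list of accepted suffixes is only an assumption about the spellings you are likely to meet.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a locale name ends in a UTF-8 codeset.
is_utf8_locale() {
    case $1 in
        *.utf8|*.UTF-8|*.utf-8|*.UTF8) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: succeeds only if the exact, case-sensitive name
# appears in the output of "locale -a".
locale_installed() {
    locale -a 2>/dev/null | grep -qxF -e "$1"
}

if is_utf8_locale "${LANG:-}"; then
    echo "LANG=${LANG} names a UTF-8 locale"
else
    echo "LANG=${LANG:-unset} does not name a UTF-8 locale"
fi
```

Running `locale_installed "$LANG" || echo "not in locale -a"` on both the display machine and the mutt machine catches the spelling mismatch described above.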

Mutt

Mutt works pretty well out of the box with UTF-8. For reference, see The Mutt FAQ.

I used these options in my .muttrc:

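set allow_8bit                          # send 8-bit data as-is
set allow_ansi=yes                      # interpret ANSI color codes in msgs
charset-hook us-ascii iso-8859-1        # treat incoming us-ascii as iso-8859-1
set send_charset="us-ascii:iso-8859-1"  # outgoing: first charset that fits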

This keeps UTF-8 as my LANG setting and as the encoding for the terminal, but sends my messages out as us-ascii when that is enough and as iso-8859-1 otherwise. It also assumes that anything that comes in labelled “us-ascii” is actually iso-8859-1.
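
To get a feel for how a send_charset list is applied, the selection can be imitated with iconv: try each character set in order and keep the first one that converts the text without loss. This is only an illustration, not Mutt's own code. The name first_exact_charset is hypothetical, the utf-8 fallback reflects my reading of the Mutt manual (it falls back to the display charset), and the sketch assumes an iconv that exits non-zero on characters it cannot convert, as the glibc one does.

```shell
#!/bin/sh
# Hypothetical helper: first_exact_charset TEXT CHARSET...
# Prints the first CHARSET into which the UTF-8 TEXT converts without
# error, or "utf-8" if none of them fits.
first_exact_charset() {
    text=$1
    shift
    for cs in "$@"; do
        if printf '%s' "$text" | iconv -f UTF-8 -t "$cs" >/dev/null 2>&1; then
            printf '%s\n' "$cs"
            return 0
        fi
    done
    printf '%s\n' "utf-8"
}

first_exact_charset "plain text" us-ascii iso-8859-1
first_exact_charset "$(printf 'caf\303\251')" us-ascii iso-8859-1
```

Plain ASCII text stops at the first entry, while text with a Western European accent moves on to iso-8859-1. Running `mutt -Q send_charset` prints the value Mutt is actually using.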