Getting Accents / UTF8 in Mutt using Xterm with TrueType
Written by max on 2009-06-08
Overview
I’m starting to adopt Unicode/UTF-8 in my world to allow for more internationalization in my projects. Someday I might play with Perl6’s UTF-8 stuff too. I also want different character sets to render correctly while using Mutt. I often get email from people in Western Europe, Eastern Europe, and Southeast Asia, and they all use different character sets. UTF-8 is the “universal” one.
The Components
Xterm
To get an Xterm that plays nicely with UTF-8, 256 colors and TrueType fonts, I have to roll my own.
- Download the latest one from ftp://invisible-island.net/xterm/xterm.tar.gz.
- Compile it
./configure \
    --disable-desktop \
    --with-x \
    --enable-256-color \
    --enable-load-vt-fonts \
    --enable-paste64 \
    --enable-readline-mouse \
    --enable-tcap-fkeys \
    --enable-tcap-query \
    --enable-wide-chars \
    --with-Xaw3d
make
make install
Note the crucial --enable-wide-chars, which is needed for UTF-8. Your existing xterm may already be compiled with some or all of these options.
Once I have that, I create a small shell script called xt that launches it for me:
#!/bin/sh
# None of these have extended charsets :-(
font="Consolas"
#font="DejaVu Sans Mono"
#font="Bitstream Vera Sans Mono"
#font="luxi mono"
#font="Andale Mono"
#font="courier"
font_size=12
# -u8: UTF-8 mode; -fa/-fd: TrueType face for normal and double-width
# characters; -fs: point size. "$@" passes any extra arguments through intact.
exec xterm -bg black -fg white \
    -fa "$font" \
    -fd "$font" \
    -fs $font_size \
    -j -s \
    -sb -si -sk -vb -sl 1024 -rightbar \
    +sf +dc -cr darkgreen \
    -u8 -geometry 100x40 \
    "$@" &
exit
The important options here are -u8, -fa and -fd. This would be perfect, except that I can’t find a good mono-spaced TrueType font that has all of the character sets. More on that in the font section.
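Since no single TrueType font covers everything, one option is to let the script pick the first font from the candidate list that is actually installed, instead of swapping the commented-out font= lines by hand. This is a minimal sketch, not part of my xt script: the helper name first_installed_font is hypothetical, and it assumes the installed families arrive on stdin in the form printed by fc-list : family (one font per line, possibly with several comma-separated names). Matching is exact but ignores case.

```sh
#!/bin/sh
# first_installed_font (hypothetical helper): read installed font families
# on stdin, one font per line as printed by `fc-list : family` (a line may
# hold several comma-separated names), and print the first candidate
# argument that matches one of those names exactly, ignoring case.
# Returns 1 and prints nothing if no candidate is installed.
first_installed_font() {
    # Put each comma-separated name on its own line and trim leading spaces.
    installed=$(tr ',' '\n' | sed 's/^ *//')
    for candidate in "$@"; do
        # -x: whole-line match, -i: ignore case, -F: no regex, -q: quiet
        if printf '%s\n' "$installed" | grep -qixF -- "$candidate"; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}
```

In xt, the font= line could then become font=$(fc-list : family | first_installed_font "Consolas" "DejaVu Sans Mono" "luxi mono"); you would still want to decide what to do when nothing matches and font ends up empty.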
If you want support for all character sets, you can use the “fixed” font built in to X, which has a very complete character set. For that I use this script instead, inspired by Marjan Parsa:
#!/bin/sh
exec xterm -bg black -fg white \
    -j -s \
    -sb -si -sk -vb -sl 1024 -rightbar \
    +sf +dc -cr darkgreen \
    -u8 -geometry 100x40 \
    -xrm "xterm*font:-misc-fixed-medium-r-normal--18-120-100-100-c-90-iso10646-1" \
    -xrm "xterm*wideFont:-misc-fixed-medium-r-normal-ja-18-120-100-100-c-180-iso10646-1" \
    "$@" &
exit
Fonts
You have two choices here: fixed fonts and TrueType fonts. In my experience, the TrueType fonts look better, and the fixed fonts support more characters.
TrueType
I can’t seem to find a perfect TrueType “console” font that is fixed-width, looks great, and covers all the Unicode character sets: European, Hebrew, Japanese, Chinese, etc. So I compromise on character-set coverage and just pick the pretty one. So far I am happiest with Microsoft’s “Consolas”. Yes, Microsoft!
Here are some good ones to check out:
- Consolas
- DejaVu Sans Mono
- luxi mono
If you would like to add TrueType fonts locally to your own account, just create a ~/.fonts directory in your home directory and copy the .ttf files into it. Try these commands to explore which fonts you have:
fc-list                      # see which fonts are loaded
xfd -fa "DejaVu Sans Mono"   # explore a particular font
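The ~/.fonts step can be wrapped in a small helper that also refreshes fontconfig’s cache with fc-cache -f, so newly copied fonts show up in fc-list without logging out. A sketch under stated assumptions: install_user_fonts is a hypothetical name, it only copies files ending in .ttf, and it uses ~/.fonts as described above (recent fontconfig releases also read ~/.local/share/fonts).

```sh
#!/bin/sh
# install_user_fonts (hypothetical helper): copy the given .ttf files into
# the per-user font directory ~/.fonts, then rebuild fontconfig's cache for
# that directory so fc-list and xterm -fa can find the new fonts.
install_user_fonts() {
    dest="$HOME/.fonts"
    mkdir -p "$dest" || return 1
    for f in "$@"; do
        case $f in
            *.ttf|*.TTF) cp -- "$f" "$dest"/ || return 1 ;;
            *) echo "skipping non-TrueType file: $f" >&2 ;;
        esac
    done
    # fc-cache ships with fontconfig; skip the refresh if it is missing.
    if command -v fc-cache >/dev/null 2>&1; then
        fc-cache -f "$dest" || echo "fc-cache failed; run it by hand" >&2
    fi
    return 0
}
```

For example, install_user_fonts ~/Downloads/*.ttf followed by fc-list | grep -i consolas should show whether the font was picked up.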
Fixed / System Fonts
I use the following two options when launching Xterm to specify the system fonts:
-xrm "xterm*font:-misc-fixed-medium-r-normal--18-120-100-100-c-90-iso10646-1"
-xrm "xterm*wideFont:-misc-fixed-medium-r-normal-ja-18-120-100-100-c-180-iso10646-1"
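Those long -misc-fixed-… strings are X Logical Font Description (XLFD) names: 14 dash-separated fields after a leading dash. The sketch below pulls out one field so you can see how the two fonts differ: the wide font has the ja add-style and an average width of 180 instead of 90 (the width is in tenths of a pixel, so 18 px versus 9 px), while the point size 120 is in decipoints, i.e. 12 pt. The helper name xlfd_field and the lowercase field labels are my own, not taken from any X tool.

```sh
#!/bin/sh
# xlfd_field NAME FIELD (hypothetical helper): print one field of an XLFD
# font name. Splitting on "-" leaves an empty field 1 before the leading
# dash, so the 14 XLFD fields are cut fields 2 through 15.
xlfd_field() {
    name=$1
    field=$2
    case $field in
        foundry)      n=2 ;;
        family)       n=3 ;;
        weight)       n=4 ;;
        slant)        n=5 ;;
        setwidth)     n=6 ;;
        addstyle)     n=7 ;;
        pixel_size)   n=8 ;;
        point_size)   n=9 ;;   # decipoints: 120 = 12 pt
        resolution_x) n=10 ;;
        resolution_y) n=11 ;;
        spacing)      n=12 ;;  # c = character cell, m = monospace
        avg_width)    n=13 ;;  # tenths of a pixel: 90 = 9 px
        registry)     n=14 ;;
        encoding)     n=15 ;;
        *) echo "unknown XLFD field: $field" >&2; return 1 ;;
    esac
    printf '%s\n' "$name" | cut -d- -f"$n"
}
```

For instance, xlfd_field "-misc-fixed-medium-r-normal-ja-18-120-100-100-c-180-iso10646-1" avg_width prints 180.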
Try these commands to see which UTF-8 fonts you have and what’s in them:
xlsfonts | grep 10646
xfd -fn "-ibm-courier-medium-r-normal--0-0-0-0-m-0-iso10646-1"
These are the settings and fonts I used for the screenshot above. See also this Linux Font Tutorial.
Shell / Environment Variables
The key to all of this is having your LANG environment variable set correctly.
Edit ~/.bashrc and make sure it contains this line:
export LANG=en_US.utf8
Remember, you will need this on two machines: the machine where the xterm window is created, and the machine where you launch mutt. These may or may not be the same machine.
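Because the setting has to be right on both machines, it can help to have ~/.bashrc complain when it is not. A minimal sketch, assuming the standard locale utility is installed (locale charmap prints the character map your current LANG/LC_* settings select); the function name warn_if_not_utf8 is hypothetical.

```sh
#!/bin/sh
# warn_if_not_utf8 (hypothetical helper): ask which character map the
# current locale settings select and print a warning on stderr unless it
# is UTF-8. Returns 0 for UTF-8, 1 otherwise (including when `locale` is
# missing or the locale is not installed).
warn_if_not_utf8() {
    charmap=$(locale charmap 2>/dev/null)
    if [ "$charmap" = "UTF-8" ]; then
        return 0
    fi
    echo "warning: character map is '${charmap:-unknown}', not UTF-8 (LANG=${LANG:-unset})" >&2
    return 1
}
```

Calling warn_if_not_utf8 after the export line in ~/.bashrc on each machine makes a wrong locale visible as soon as a shell starts, before mutt gets a chance to mangle anything.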
To get a list of the available UTF-8 locales, use this command:
locale -a | grep -i utf
Use the name exactly as locale -a prints it. In my experience, a mismatch such as en_US.UTF8 instead of en_US.utf8 gives you some wonky errors.
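To copy the exact spelling your system uses, a small helper can look the name up in the locale -a list. A sketch: pick_locale and normalize_locale are hypothetical names, and treating names as equal when they differ only in letter case or in the dash of “UTF-8” is an assumption made for this lookup, not a description of how the C library itself matches locale names.

```sh
#!/bin/sh
# normalize_locale: lowercase a locale name and drop dashes, so that
# "en_US.UTF-8", "en_US.UTF8" and "en_US.utf8" all compare equal.
normalize_locale() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# pick_locale WANTED (hypothetical helper): read locale names on stdin, one
# per line as printed by `locale -a`, and print the first one whose
# normalized form equals the normalized WANTED, spelled exactly as listed.
# Returns 1 and prints nothing if there is no match.
pick_locale() {
    want=$(normalize_locale "$1")
    while IFS= read -r loc; do
        if [ "$(normalize_locale "$loc")" = "$want" ]; then
            printf '%s\n' "$loc"
            return 0
        fi
    done
    return 1
}
```

For example, locale -a | pick_locale en_US.UTF-8 prints en_US.utf8 on a system that lists it that way, which is the spelling to put in the export LANG= line.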
Mutt
Mutt works pretty well out of the box with UTF-8. For reference, see The Mutt FAQ.
I used these options in my .muttrc:
set allow_8bit                          # send 8-bit text as-is instead of encoding it
set allow_ansi=yes                      # interpret ANSI color codes in messages
charset-hook us-ascii iso-8859-1
set send_charset="us-ascii:iso-8859-1"
This keeps UTF-8 as my LANG setting and terminal encoding, but sends my outgoing messages in the first listed character set that can represent them exactly: us-ascii when the text is plain ASCII, otherwise iso-8859-1. The charset-hook also makes mutt treat anything that arrives labeled “us-ascii” as iso-8859-1.
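The send_charset rule (use the first listed character set that can represent the whole message exactly) can be approximated in a few lines of shell with iconv. This only illustrates the rule and is not mutt’s code: pick_send_charset is a hypothetical name, it assumes the message text is UTF-8 (matching LANG), and when no listed charset fits it simply reports failure rather than imitating whatever mutt does in that case.

```sh
#!/bin/sh
# pick_send_charset (hypothetical helper): read UTF-8 text on stdin and
# print the first charset in a colon-separated list (same format as mutt's
# send_charset, defaulting to the value above) that iconv can convert the
# text into without loss. Returns 1, printing nothing, if none can.
pick_send_charset() {
    list=${1:-us-ascii:iso-8859-1}
    text=$(cat)
    old_ifs=$IFS
    IFS=:
    set -- $list        # split the list on colons into "$@"
    IFS=$old_ifs
    for cs in "$@"; do
        # iconv exits non-zero when some character has no mapping in $cs.
        if printf '%s' "$text" | iconv -f UTF-8 -t "$cs" >/dev/null 2>&1; then
            printf '%s\n' "$cs"
            return 0
        fi
    done
    return 1
}
```

With the default list, plain English text comes out as us-ascii and text containing “é” as iso-8859-1, while Japanese text fits neither, which is the case where you would want utf-8 at the end of send_charset.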