(Last month's article: Using LaTeX to make PDF documents with Japanese characters)
I've found a better TeX tool for making Japanese PDFs: XeTeX. Below
I cover the technical advantages first, then look at community
support and sustainability.
XeTeX is a version of TeX that has been modified to use Unicode
(UTF-8) encoding internally. It is also configured to work with
modern font tools such as FreeType and fontconfig. With XeTeX,
the minimal example from my last article becomes:
\documentclass[12pt]{article}
\usepackage{fontspec}
\setmainfont{Sazanami Mincho}
\begin{document}
\section{What I learned today}
I can write this 私はキランです in Japanese.
\end{document}
This is converted to a PDF with the command line tool xelatex.
XeTeX has been part of the very common TeX Live bundle since
TeX-Live-2007. So if LaTeX is available for your GNU/Linux
distro, I'm sure TeX Live is too, and thus XeTeX. (TeX-Live-2008
will be released soon.)
(For a more complex example, see jlesson002.tex, and the output
jlesson002.pdf.)
One improvement in this example is that I wrote the file in the very
common UTF-8 encoding. This means I don't have to tell my
applications to use the JP-EUC format that LaTeX+CJK would have
required, and it means I'm less likely to have compatibility
problems with other text processing tools. (This article was
actually supposed to be about converting Japanese TeX to plain text,
but an application's lack of support for JP-EUC encoding led me
to research UTF-8 versions of TeX.)
A second improvement is that I could use the standard
"article" document class. When using CJK, you can only
use document classes that have been specifically written to work
with CJK. There is a CJK-enabled equivalent for
"article", called "scrartcl", but for some
other classes, there's no equivalent that works with CJK.
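As a minimal sketch of what this allows, the same preamble should carry over to other standard classes. The example below swaps in "report"; the chapter title and body text are placeholders, and it assumes the Sazanami Mincho font is installed, as in the example above.

```latex
% Sketch: same preamble as the article example, but with the standard "report" class.
% Assumes the Sazanami Mincho font is installed; compile with xelatex.
\documentclass[12pt]{report}
\usepackage{fontspec}
\setmainfont{Sazanami Mincho}
\begin{document}
\chapter{今日学んだこと}
私はキランです。
\end{document}
```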
Another improvement is that the font is specified in a much more
readable way ("Sazanami Mincho"), and if I want to use
another font, I can use this fontconfig command at the shell to find
all fonts on my system that include Japanese characters:
fc-list :lang=ja
On my system, this finds six fonts. The differences between Gothic
and Mincho are roughly equivalent to sans-serif and serif fonts in
Western scripts.
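To illustrate the serif/sans-serif analogy, here is a small sketch that uses a Mincho font as the main font and a Gothic font as the sans-serif font. It assumes both "Sazanami Mincho" and "Sazanami Gothic" appear in your fc-list output; substitute whichever names your system reports.

```latex
% Sketch: Mincho as the main (serif-like) font, Gothic as the sans-serif font.
% Assumes both Sazanami fonts are installed; compile with xelatex.
\documentclass[12pt]{article}
\usepackage{fontspec}
\setmainfont{Sazanami Mincho}
\setsansfont{Sazanami Gothic}
\begin{document}
私はキランです (Mincho)

\textsf{私はキランです} (Gothic)
\end{document}
```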
It's hard to find a list of free Japanese fonts. It seems that many
Japanese font developers have invented their own licences. Two free
fonts available are Kochi and Sazanami, of which some say the latter
is slightly better, but I can't see any difference. There is also a
font called "UmePlus", which seems to be free, but is
missing from some distributions (such as Debian) because the licence
is somewhat unclear (but it looks fine to me). When I say
"free", I mean it in the free software sense, i.e. that everyone
can use, copy, modify, and redistribute (modified or unmodified).
Note: I set the default font to a Japanese font because my documents
are wholly/mostly in Japanese. If you just wanted to add some
Japanese to a mostly English document, XeTeX is still a good option,
but I won't go into how to do that (it involves defining a Japanese
environment and beginning the environment, entering Japanese, then
ending the environment).
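For readers who do want to try it, the following is one possible sketch rather than a tested recipe. It uses fontspec's \newfontfamily to define a font switch and wraps it in an environment. The names \japanesefont and "jptext" are made up for this example, and the font is assumed to be installed.

```latex
% Sketch: Japanese inside a mostly English document; compile with xelatex.
% \japanesefont and the "jptext" environment are hypothetical names for this example.
\documentclass[12pt]{article}
\usepackage{fontspec}
% No \setmainfont here, so the main font stays at the fontspec default (a Latin font).
\newfontfamily\japanesefont{Sazanami Mincho}
% The environment forms a group, so the font switch ends with the environment.
\newenvironment{jptext}{\japanesefont}{}
\begin{document}
\section{What I learned today}
I can write this in Japanese:
\begin{jptext}
私はキランです
\end{jptext}
\end{document}
```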
A last, minor technical improvement is output file size. For a
one-line test file, pdflatex made a file of 19.6 kB,
and xelatex made one of only 7.5 kB. For a more
complex 1-page file (jlesson002.tex),
the XeTeX output was 15.1 kB, and when I converted it to
LaTeX-CJK, pdflatex made a file of 65.2 kB.
What about community support and sustainability?
Is it safe to move from the old reliable LaTeX+CJK package to this
new XeTeX thing? Will XeTeX still have a developer community in the
future? Will developers of other TeX tools take care to ensure
their packages work with XeTeX? What do Japanese TeX users use?
My searches suggest that Japanese TeX users are using a mix of
tools. Some use pTeX, which is a version of TeX modified
specifically to work with Japanese. Others use LaTeX+CJK. But
there seems to be consensus that these are tools of the past and
that Unicode is the future. So change is coming.
Top Japanese TeX expert Haruhiko Okumura said in April 2007:
"Since pTeX for Unicode is now
being developed and XeTeX is acquiring pTeX-like versatility, next
year I'll be using either the new pTeX or XeTeX."
The pTeX for Unicode project he's referring to is uptex.
It exists, but seems to be still in alpha (early testing) stage. It
isn't available in the Debian archives, but someone has made
Debian uptex packages. (I haven't tested them.)
If Mr. Okumura has now adopted upTeX or XeTeX, I bet he chose XeTeX.
Next, I got really scientific. I put a few combinations of words
into search engines, each time including "2008", a
Japanese word, and either "uptex" or "xetex".
Each time, XeTeX won by miles. So I guess Japanese people are not
currently using uptex. I think XeTeX is winning the battle for
Unicode TeX in Japan.
XeTeX being accepted into the TeX Live bundle
is also a strong endorsement that XeTeX's future is safe, and
the maintainer of LaTeX-CJK is discussing whether it and XeTeX can
be merged.
The only bad sign I saw about XeTeX is that the maintainer has
recently resigned his job, but he says this shouldn't
affect his ability to maintain XeTeX.
Ok, so that's this month's TeX wisdom from a newbie :-) Hopefully
next month's article will be about generating plain text files from
the same Japanese TeX source files used for generating PDFs. Final
note: I'm pretty sure all these tips work for Chinese, Korean, and
other foreign characters, but I haven't tried that yet.
For more info and links about computers, free software, and Japanese, see my Learning Japanese page.
UPDATE: I just found Dave Crossland's summary of the recent 4-day TeX Users Group conference: day 1,
day 2, day 3, and day 4. There are also videos of the event.
--
Ciarán O'Riordan,
Support free software: Join FSFE's
Fellowship