Ciaran O'Riordan's irregularly kept software freedom journal
(Last month's article: Using LaTeX to make PDF documents with Japanese characters)
I've found a better TeX tool for making Japanese PDFs: XeTeX. Below are first the technical advantages, and then an analysis of community and sustainability.
XeTeX is a version of TeX that has been modified to use Unicode internally and to read UTF-8 input by default. It is also configured to work with modern font tools such as FreeType and fontconfig. With XeTeX, the minimal example from my last article becomes:
\documentclass[12pt]{article}
\usepackage{fontspec}
\setmainfont{Sazanami Mincho}
\begin{document}
\section{What I learned today}
I can write this 私はキランです in Japanese.
\end{document}
This is converted to a PDF with the command line tool xelatex. XeTeX has been part of the very common TeX Live bundle since TeX-Live-2007. So if LaTeX is available for your GNU/Linux distro, I'm sure TeX Live is too, and thus XeTeX. (TeX-Live-2008 will be released soon.)
(For a more complex example, see jlesson002.tex, and the output jlesson002.pdf.)
One improvement in this example is that I wrote the file in the very common UTF-8 encoding. This means I don't have to tell my applications to use the JP-EUC format that LaTeX+CJK would have required, and it means I'm less likely to have compatibility problems with other text processing tools. (This article was actually supposed to be about converting Japanese TeX to plain text, but an application's lack of support for JP-EUC encoding led me to research UTF-8 versions of TeX.)
A second improvement is that I could use the standard "article" document class. When using CJK, you can only use document classes that have been specifically written to work with CJK. There is a CJK-enabled equivalent for "article", called "scrartcl", but for some other classes, there's no equivalent that works with CJK.
Another improvement is that the font is specified in a much more readable way ("Sazanami Mincho"), and if I want to use another font, I can use this fontconfig command at the shell to find all fonts on my system that include Japanese characters:
fc-list :lang=ja
On my system, this finds six fonts. The differences between Gothic and Mincho are roughly equivalent to sans-serif and serif fonts in Western scripts.
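As a rough sketch of putting that distinction to use, fontspec lets you assign a Mincho font as the main (serif) font and a Gothic font as the sans-serif one. The family names below are assumptions: they only work if both Sazanami fonts appear in your own fc-list output, and the names on your system may differ.

```latex
\documentclass[12pt]{article}
\usepackage{fontspec}
% Assumed family names; check yours with: fc-list :lang=ja
\setmainfont{Sazanami Mincho}  % serif-like, used for body text
\setsansfont{Sazanami Gothic}  % sans-serif-like, used by \textsf and \sffamily
\begin{document}
私はキランです (Mincho)

\textsf{私はキランです} (Gothic)
\end{document}
```

Like the minimal example, this file should be saved as UTF-8 and processed with xelatex.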
It's hard to find a list of free Japanese fonts. It seems that many Japanese font developers have invented their own licences. Two free fonts available are Kochi and Sazanami, of which some say the latter is slightly better, but I can't see any difference. There is also a font called "UmePlus", which seems to be free, but is missing from some distributions (such as Debian) because the licence is somewhat unclear (but it looks fine to me). When I say "free", I mean it in the free software sense, e.g. that everyone can use, copy, modify, and redistribute (modified or unmodified).
Note: I set the default font to a Japanese font because my documents are wholly/mostly in Japanese. If you just wanted to add some Japanese to a mostly English document, XeTeX is still a good option, but I won't go into how to do that (it involves defining a Japanese environment and beginning the environment, entering Japanese, then ending the environment).
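For the curious, a minimal sketch of that approach might look like the following. The command name \jpfont and the environment name jptext are made up for this illustration (they are not part of XeTeX or fontspec), and the font is assumed to be installed.

```latex
\documentclass[12pt]{article}
\usepackage{fontspec}
% \jpfont is a hypothetical name; \newfontfamily itself is a fontspec command
\newfontfamily\jpfont{Sazanami Mincho}
% jptext is a hypothetical environment that switches to the Japanese font;
% the font change ends automatically when the environment closes
\newenvironment{jptext}{\jpfont}{}
\begin{document}
This sentence uses the default Latin font.

\begin{jptext}
私はキランです
\end{jptext}

% For a short inline phrase, a braced group does the same job
Back to English, with an inline phrase: {\jpfont 日本語}.
\end{document}
```

The main font is left at the fontspec default here, so only the text inside jptext (or a \jpfont group) is set in the Japanese font.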
A last, minor technical improvement is output file size. For a one-line test file, pdflatex made a file of 19.6kb, and xelatex made one of only 7.5kb. For a more complex 1-page file (jlesson002.tex), the XeTeX output was 15.1kb, and when I converted it to LaTeX-CJK, pdflatex made a file of 65.2kb.
What about community support and sustainability?
Is it safe to move from the old reliable LaTeX+CJK package to this new XeTeX thing? Will XeTeX still have a developer community in the future? Will developers of other TeX tools take care to ensure their packages work with XeTeX? What do Japanese TeX users use?
My searches suggest that Japanese TeX users are using a mix of tools. Some use pTeX, which is a version of TeX modified specifically to work with Japanese. Others use LaTeX+CJK. But there seems to be consensus that these are tools of the past and that Unicode is the future. So change is coming.
Top Japanese TeX expert Haruhiko Okumura said in April 2007: "Since pTeX for Unicode is now being developed and XeTeX is acquiring pTeX-like versatility, next year I'll be using either the new pTeX or XeTeX."
The pTeX for Unicode project he's referring to is upTeX. It exists, but still seems to be at the alpha (early testing) stage. It isn't available in the Debian archives, but someone has made Debian upTeX packages. (I haven't tested them.)
If Mr. Okumura has now adopted upTeX or XeTeX, I bet he chose XeTeX.
Next, I got really scientific. I put a few combinations of words into search engines, each time including "2008", a Japanese word, and either "uptex" or "xetex". Each time, XeTeX won by miles. So I guess Japanese people are not currently using uptex. I think XeTeX is winning the battle for Unicode TeX in Japan.
XeTeX being accepted into the TeX Live bundle is also a strong endorsement that XeTeX's future is safe, and the maintainer of LaTeX-CJK is discussing whether it and XeTeX can be merged.
The only bad sign I saw about XeTeX is that the maintainer has recently resigned from his job, but he says this shouldn't affect his ability to maintain XeTeX.
Ok, so that's this month's TeX wisdom from a newbie :-) Hopefully next month's article will be about generating plain text files from the same Japanese TeX source files used for generating PDFs. Final note: I'm pretty sure all these tips work for Chinese, Korean, and other foreign characters, but I haven't tried that yet.
For more info and links about computers, free software, and Japanese, see my Learning Japanese page.
UPDATE: I just found Dave Crossland's summary of the recent 4-day TeX Users Group conference: day 1, day 2, day 3, and day 4. There are also videos of the event.
--
Ciarán O'Riordan,
Support free software: Join FSFE's
Fellowship
Even if you know nothing about LaTeX, you can make your first Japanese PDF document by taking a copy of this example file JIS.tex, going to a shell command line and typing "pdflatex JIS.tex". That should produce this output: JIS.pdf.
If that doesn't work for you, then you need to install some LaTeX software or Japanese fonts. On my Debian GNU/Linux system, I think I just installed texlive-latex-base and latex-cjk-japanese, and the package manager automatically installed the other packages needed by those two. I don't remember if I also had to install a fonts package.
Once you've got that working, you can start modifying and removing lines from that example file to see what you really need. I trimmed it down to eight lines:
\documentclass[12pt]{scrartcl}
\usepackage{CJK}
\begin{document}
\begin{CJK*}[dnp]{JIS}{min}
\section{What I learned today}
I can write this 私はキランです in Japanese.
\end{CJK*}
\end{document}
%%% Local Variables:
%%% coding: euc-japan
Ok ok, that's ten lines since I included two commands at the end to tell Emacs which character encoding to use when saving the file. This seems important since when I saved it as utf-8, the pdflatex program failed. Because these two lines start with percent signs, they will be ignored by LaTeX processors such as pdflatex, so it's safe to leave them there even if you're not using Emacs.
In the sixth line of my small example you should see seven mostly-simple Japanese characters. If that's not what you see, try setting your browser's character encoding to EUC-JP or maybe UTF-8. (This might be in [menu-bar]->View->Character Encoding->...)
Once you have this working, you should look at the other examples that came with the LaTeX CJK package. On my system, the examples are installed in the directory /usr/share/doc/latex-cjk-japanese/examples/ (Thanks for the tip, LUK ShunTim.) This is probably also the best way to get started with other complex fonts such as Chinese and Korean.
It took me four hours to figure out how to use LaTeX to make a PDF document with Japanese characters. At one point, I became so frustrated with the LaTeX documentation that I gave up and decided to use DocBook instead. Unfortunately, DocBook's documentation was just as bad.
I think I learned something from all this about what makes a good tutorial: get the user to a working example as quickly as possible. Once you have something working, then you can experiment and learning becomes fun.
For a start, I think I'll put the "ruby" commands from JIS.tex back in since they're a pretty useful reading aid for learners. "Ruby" here refers to the little superscript phonetic kana characters, usually called furigana. It has no relation to the Ruby programming language, which was developed by a Japanese guy.
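As an illustration of what such a ruby command looks like, here is a sketch based on my trimmed-down example. It assumes the ruby.sty file that is distributed with the CJK package, and its \ruby{base}{reading} command; the reading わたし for 私 is simply my choice for the illustration. Treat it as an untested sketch rather than a copy of what JIS.tex does.

```latex
\documentclass[12pt]{scrartcl}
\usepackage{CJK}
% ruby.sty ships with the CJK package; the CJK option tells it
% that the base and ruby text are CJK characters
\usepackage[CJK]{ruby}
\begin{document}
\begin{CJK*}[dnp]{JIS}{min}
\section{What I learned today}
% \ruby{base text}{reading printed in small kana above it}
\ruby{私}{わたし}はキランです
\end{CJK*}
\end{document}
```

As with the earlier example, the file needs to be saved in the euc-japan encoding for pdflatex to process it.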
To write Japanese hiragana, katakana, and kanji in Emacs, you just use the function M-x set-input-method and then type japanese at the prompt. The usual command (C-h I) will show the documentation for how the input method works. While using the japanese input method, typing qq will put you into the japanese-ascii input method, which you'll need for typing LaTeX commands and symbols "\{}". And qq again will bring you from the japanese-ascii input method back to the normal japanese input method.
If you want to use other applications, then you'll need to install some separate input method software. I installed the packages "anthy", "scim", and "scim-canna" and then was able to write Japanese in GNOME applications by right clicking in a text box and, from the "Input Methods" submenu, choosing "SCIM Input Method". It's annoying that SCIM uses Ctrl+Space as its activation sequence. You can change this by going to "Show command menu->SCIM Setup->Global Setup". I wasn't able to get OpenOffice.org to work. From looking around, it seems OpenOffice only supports "IIIMP", but I can't see any package that provides IIIMP.
You might find useful info on these pages:
Hope that helps!