Inputting Japanese text in Linux and some BSDs

First, the disclaimers--Japanese in Linux and the BSDs constantly improves, and parts of this page are often deprecated. If you find any errors here, feel free to send an email to scottro11[at]gmail.com.

This page is now listed, albeit with an incorrect URL, in the scim-anthy README.

As I get busier, it becomes harder to keep up with various distributions. When this page was first begun, around 2002, it was one of the few of its kind and there were far fewer distributions to follow. In many cases, you are now better off going to the main forum for your distribution, and looking for recent tutorials there.

It used to be far more difficult to get Japanese working. Nowadays, one can just go to translate.google.com, choose Japanese as the source, and begin typing phonetically. It will work like most Japanese input engines, using hiragana and giving kanji options if you hit the space bar.

If you read this page, you'll see that almost all of the information has been gathered from others--in other words, if you have a problem and write me, I will help if I can, but I don't know that much about it.

A few quick introductory links: For a far more detailed treatment of this subject, see Dr. Mike Fabian's page on the Suse website. Charles Muller's site has a page on Japanese in Mandrake, and David Thiel's page on Japanese in FreeBSD is brief, but very easy to understand and useful. David Oftedal has a page about Internationalization in Gentoo Linux which covers several languages.

There is a series of articles on using Fedora with Japanese and Chinese at voom.net. Its author has generously given me permission to link to it here.

JWS has a helpful page on multi-lingual text in Linux (which is again referenced in the printing section of this page.)

Note that this article only covers using Japanese in X. Nowadays, with ibus-fbterm, one can actually, using framebuffer, input Japanese in console, but so far, it hasn't worked very well for me, and only worked in Arch, not Fedora. Debian didn't have the ibus-fbterm package, and those are the only three that I've tried.

I most frequently use Japanese in an xterminal with things like vi and mutt as well as OpenOffice. Therefore, these are often the only things that I checked. In most cases, if it works in these applications, it will also work with firefox, thunderbird and the like.

The kinput2 cannaserver combination used to be the input method of choice. Then it was scim-anthy. Then ibus-anthy. Now ibus-mozc is also becoming common.

There are still many distributions that don't have it and some don't even have scim-anthy or uim-anthy or uim-scim packages. Although I give information below about compiling scim, anthy and scim-anthy from source, some of the more newcomer friendly distributions have trouble with compiling source code.

The ibus page is here, but, at least for Japanese, the instructions are a bit lacking.

CentOS 5.x

(Last update December, 2010) CentOS 5.x doesn't have ibus available. It can probably be built from source, but some of its dependencies are newer than what is available in CentOS. Judging from the RedHat 6 beta, CentOS 6 will have ibus as its default input manager.

However, RedHat, including Fedora, seems to have a slightly broken version of ibus-anthy. If one takes their default Gnome desktop, or the KDE one also offered, there is no problem. However, if like myself, you prefer to do a minimal installation, then add X, then add a window manager such as fluxbox or openbox, you'll find that ibus won't work.

The trick seems to be to also install dbus-x11. After installing it with yum -y install dbus-x11, ibus-anthy seems to work as expected in most applications. I have filed a bug at bugzilla, and it is already fixed in Fedora 14. If not, simply installing dbus-x11 should fix the issue. Apparently, the fix won't make it into RH 6.0 (and therefore CentOS 6.0 and Scientific Linux 6.0). See the RHEL bug on bugzilla for status updates.

Back to the current (as of December, 2010) CentOS: the scim-anthy package is available from the standard CentOS repos. One can choose to install Japanese support during installation, or just install the needed packages afterwards. As RedHat based distributions tend to throw in everything you might possibly need under any circumstances, you might prefer to not select Japanese support during installation and just add scim-anthy afterwards.

Install anthy, scim and scim-anthy

yum install scim-anthy
This will pull in scim and anthy as dependencies.

If you always want Japanese immediately available, add these lines to your .bash_profile. (For the absolute newcomer, this file is located in your home directory. That is, if your user name is john, it will be found in /home/john, often referred to as $HOME or even ~. Note that it is a dotfile, that is, its name begins with a period, so if using one of the graphic text editors, you might have to specify that it show hidden files or show all files. I haven't used such editors in years, so I'm not quite sure of the latest way to do that.)

export XMODIFIERS='@im=SCIM'
export GTK_IM_MODULE="scim"
export QT_IM_MODULE="scim"
export LC_CTYPE=en_US.UTF-8
scim -d
The above assumes that United States style English is your default language.

To put them into effect immediately

source .bash_profile

You should see a message that scim is running. (However, this doesn't always work--if it doesn't, just log off and log on again.)

With UTF-8 as your default encoding, theoretically, you shouldn't even have to set LC_CTYPE to ja_JP.UTF-8. You should be able to use your own language, e.g en_US.UTF-8. My experience has been that sometimes this works, and sometimes it doesn't. I would actually try setting LC_CTYPE to your own language (in my case, en_US.UTF-8) and seeing if things work properly. If they don't, then change LC_CTYPE from your native language to ja_JP.UTF-8.

Now when you start most applications, hitting ctrl+space will open up a little scim panel in the lower right of the screen. If you type romaji (Japanese spelled out in English letters), you will see hiragana appear. If you hit the space bar, it will select kanji. Note that the panel should have the word Anthy on it. If it doesn't, click the words RAW CODE or English, or whatever is shown there, and you should have an option for Japanese=>Anthy.

Using scim-anthy you should be able to use Japanese in most applications. If you have trouble inputting Japanese in an xterm, if, for example, you're using vi or mutt, use uxterm. (This can be called by simply typing uxterm from any command line.)

At some point, I got into a habit of only calling these variables when I needed them, and made a little lang.sh script.

#!/bin/sh 
XMODIFIERS='@im=SCIM' LC_CTYPE=en_US.UTF-8 ${1+"$@"} &

Then, I might call mutt, for example, with

lang.sh mutt

Whether or not this saves on resources, it became a habit.

If you choose to do it this way, it's not necessary to have the XMODIFIERS and LC_CTYPE lines in .bash_profile.

The advantage of this format shows when you want to pass more than one argument, that is, a command together with its options. I don't want to make this a treatise on shell scripting, but suppose, for example, that I use rxvt or aterm as my terminal. I may want to call it from a fluxbox menu with a black background and white text, while its default is a white background with black text. Because the script passes along everything after its own name, the terminal's color options can simply be added after the command.
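As a rough illustration of passing a command together with its options, here is the same idiom wrapped in a shell function so it can be tried without creating a file. The name run_with_lang is made up for this sketch, and the -bg and -fg options are the usual rxvt-style color flags, which I am assuming your build of aterm or rxvt accepts.

```shell
#!/bin/sh
# Sketch of the lang.sh idiom as a function (run_with_lang is a
# hypothetical name). The two variables are set only for the command
# that follows, and ${1+"$@"} passes every argument through unchanged,
# including arguments that contain spaces.
run_with_lang() {
    XMODIFIERS='@im=SCIM' LC_CTYPE=en_US.UTF-8 ${1+"$@"}
}

# Example call (not run here): a terminal with a black background and
# white text, assuming aterm accepts the rxvt-style -bg and -fg flags.
#   run_with_lang aterm -bg black -fg white &
```

With the script version, the equivalent fluxbox menu entry would call lang.sh aterm -bg black -fg white.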

For those with an interest in shell-scripting they can find a detailed explanation here. It was written by Cameron Simpson who has frequently helped me with scripting questions.

I've found that I can usually get away with leaving out the GTK and QT IM_MODULES variables, although it's probably better to include them.

This can be a bit confusing as it varies from distro to distro and application to application. Sometimes, one doesn't even need the LC_CTYPE line, if your distro's default is en_US.UTF-8. One has to play with the variables and see what works for their distribution or O/S of choice, as well as their favorite applications.

If you wish your menus and the like to be in Japanese as well, you can add, either to the lang.sh script or your .bash_profile

LANG=ja_JP.UTF-8

Now, most applications will also work in Japanese--some things may show up as mojibake (gibberish) but you will be able to use Sylpheed, xchat (an irc client) etc in Japanese without problem. You'll also be able to input kanji as text in GIMP.

On rare occasions, I've found that the Ctl+space hotkey combination wouldn't open up the scim widget, although clicking on the icon that would come up when it was started would work. Scim creates a $HOME/.scim/config file which should have the line
/Hotkeys/FrontEnd/Trigger = Control+space

If that line is missing from the config file, add it. In some recent Fedora versions, the $HOME/.scim/ directory was never created. In that case, one creates the directory and config file--the config file only needs that one line.
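The directory and config file can be created by hand, but a small function does the same thing and is safe to run twice. This is only a sketch: ensure_scim_trigger and its optional directory argument are names I made up, and it assumes the config file is the plain text file described above.

```shell
#!/bin/sh
# Sketch: make sure the scim config has the trigger line. The function
# name and its optional directory argument are hypothetical; by default
# it works on $HOME/.scim.
ensure_scim_trigger() {
    dir=${1:-$HOME/.scim}
    line='/Hotkeys/FrontEnd/Trigger = Control+space'
    mkdir -p "$dir" || return 1
    # Append the line only if no Trigger setting is present yet, so an
    # existing custom hotkey is left alone.
    if ! grep -q '^/Hotkeys/FrontEnd/Trigger' "$dir/config" 2>/dev/null; then
        printf '%s\n' "$line" >> "$dir/config"
    fi
}
```

Calling ensure_scim_trigger with no argument works on $HOME/.scim. Restart scim afterwards so that it rereads the file.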

If using Gnome and gnome-terminal, one can right click in the terminal, opening a menu that includes input method. One can choose scim from that menu. If you didn't choose Japanese support at installation, you will also need fonts. You can try doing
yum search fonts-ja

which should give some choices. Once fonts are installed, there should be no problem. In CentOS, for example, one will find ttfonts-ja. In older versions of Fedora, I think it was ttfonts-japanese, or something similar. (For the soon to be out CentOS 6, use the oneliner given in the Fedora section.)

As of late November, 2008, one can install OpenOffice-3.x from the tarball on OpenOffice's site. However, for some reason, scim-anthy doesn't work with it. There were some posts about a similar problem with Ubuntu that had some suggestions, generally consisting of linking some library files from /usr/lib to library files in OpenOffice, however, neither myself nor another Japanese speaker on the CentOS forums has had success with that method.

The problem is solely with OO-3 and scim. Otherwise, although installing 3rd party software in CentOS does sometimes risk the distribution's noted stability, OpenOffice-3 works without a problem. Hopefully, it will be fixed shortly.

Fedora

(Last update, January 2011) Fedora now uses ibus as its default input manager, though scim may still be available. Install ibus with yum.
yum -y install ibus-anthy

This should pull in all dependencies. However, note that as mentioned above in the CentOS section, if you are running a more minimalistic environment, using, for example, fluxbox, it fails to pull in dbus-x11, so you will need to install that as well with yum -y install dbus-x11. You will also need fonts.
yum search fonts |grep -i japanese 

will give you the names of the available fonts. If you want to install them all, the following one liner will do it
yum search fonts | grep -i japanese | awk '{print $1}' | xargs yum -y install
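The filtering part of that one-liner can be tried on its own before anything is handed to yum. The sketch below puts it in a function. The name japanese_font_packages is hypothetical, and it assumes yum search prints one "name.arch : description" line per package.

```shell
#!/bin/sh
# Sketch: pull package names out of "yum search" style output.
# japanese_font_packages is a hypothetical helper; it reads the search
# output on standard input and prints the first column of every line
# that mentions Japanese.
japanese_font_packages() {
    grep -i japanese | awk '{print $1}'
}

# Intended use on a real system (not run here):
#   yum search fonts | japanese_font_packages              # just look
#   yum search fonts | japanese_font_packages | xargs yum -y install
```

Running the first form shows exactly which packages the second form would install.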

Once installed run
ibus-setup

You will get a message saying that the ibus daemon isn't started and asking whether you want to start it. Choose yes.
When the dialog box opens, click the input method tab. Click select input method and scroll down to Japanese. Pick anthy if more than one method is listed.

Click add. Anthy will appear. If English is also listed, and Anthy is below English, click Anthy to highlight it and click the move up button so that it's above English.

It will give instructions that if it doesn't work, add the following to .bashrc
export XMODIFIERS=@im=ibus
export GTK_IM_MODULE=ibus
export QT_IM_MODULE=ibus

However, if added to .bashrc, it will be sourced whenever you open a new terminal. Adding it to $HOME/.bash_profile should be sufficient. (If, like me, you boot into text mode, then just add it to $HOME/.xinitrc, so that it's called whenever you start X.)

At this point, you should be able to input Japanese in most applications. If it doesn't work, then kill ibus and restart it with --xim.
pkill ibus
ibus-daemon --xim &
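If input still doesn't work, it can help to confirm that the variables actually reached the session you are typing in. The following is a minimal sketch, not part of ibus. check_im_env is a made-up name, and it only compares the three variables against the values shown above.

```shell
#!/bin/sh
# Sketch: report whether the three input method variables are set for a
# given input method. check_im_env is a hypothetical helper, not part of
# ibus. Returns 0 when everything matches, 1 otherwise.
check_im_env() {
    im=$1
    status=0
    if [ "${XMODIFIERS:-}" != "@im=$im" ]; then
        echo "XMODIFIERS is '${XMODIFIERS:-}', expected '@im=$im'"
        status=1
    fi
    if [ "${GTK_IM_MODULE:-}" != "$im" ]; then
        echo "GTK_IM_MODULE is '${GTK_IM_MODULE:-}', expected '$im'"
        status=1
    fi
    if [ "${QT_IM_MODULE:-}" != "$im" ]; then
        echo "QT_IM_MODULE is '${QT_IM_MODULE:-}', expected '$im'"
        status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "input method variables are set for $im"
    fi
    return "$status"
}
```

Run check_im_env ibus in a terminal started under X. A complaint about a variable usually means the file that sets it was never sourced for that session.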

As I mentioned in the CentOS section, I prefer to call the variables only when I need them, so I have a little $HOME/bin/lang.sh script. In this case, I use ibus rather than scim for it.
XMODIFIERS=@im=ibus LC_CTYPE=en_US.UTF-8 ${1+"$@"} &

If necessary, I can add the GTK_IM_MODULE and QT_IM_MODULE lines mentioned above, but in practice this seldom seems necessary. Now, to use mutt I can just type
lang.sh mutt

and mutt will be started with the XMODIFIERS and LC_CTYPE variables set. Although I haven't fully tested this, I am finding that as ibus improves, it may not even be necessary to set the variables, so long as ibus is running.

I don't use Gnome or KDE, but according to the forums, you may be able to accomplish all of this by just going to the Gnome Systems Menu, Personal, Preferences, Input method.

In Fedora, (and RHEL6), if you use openbox (though this isn't true for Fluxbox) it seems that ibus automatically starts running when you start X. The reason for this is that Openbox uses xdg, and in /etc/xdg/autostart, there is, at least in Fedora and Scientific Linux 6 alpha, an imsettings-start.desktop. Near the end of the file is an Exec line that will start ibus.

I don't necessarily consider this a good thing. It strikes me as Windows like, starting a program without me specifically asking for it. For what it's worth, I haven't seen Debian or Arch do this. With either of those, using Openbox, either add ibus to autostart.sh or start it manually.

I concede that the vast majority of users probably want ibus to start when they start X, and therefore consider this a feature. To me, though, it's one more example of how many of the desktop distributions are becoming more and more like Windows and Apple, in the bad sense. RedHat, despite its use in the server market, often seems to have the same desktop distribution mentality.

Gentoo Linux and Japanese

(Last update, June 2006)

I used to have an entire section on Gentoo. However, as I don't use it these days, when updating this page, a bit of research indicated that my method was entirely deprecated. Aside from the link to David's page at the top, the reader can also check this thread on Gentoo Forums.

Kevin W. (AKA sandcrawler on Gentoo Forums) was kind enough to send me his mini Gentoo howto.

He added the following USE variables

immqt-bc nls cjk unicode

Then

emerge --newuse world

Emerge the necessary programs

emerge scim anthy scim-anthy scim-qtimm

He added the following to his .bash_profile

export XMODIFIERS='@im=SCIM'
export GTK_IM_MODULE="scim"
export QT_IM_MODULE="scim"
export LC_CTYPE=ja_JP.UTF-8
scim -f socket -c socket -d

(If not booting into X, you might leave off the scim line and put it in .xinitrc or whatever file you use to start X.)

This enables him to input Japanese in most applications.

Ubuntu and Japanese

(Last tested with Lubuntu Raring Ringtail, April 2013). Although ibus may already be installed, I've found that with both Lubuntu and one quick test on Ubuntu, one first has to install Japanese language support. I'm using the Lubuntu menus here, see below for Ubuntu Unity.

From the start button go to Preferences, then Language Support. Click the Install/Remove Languages button and select Japanese. Click Apply Changes and it will download and install the necessary tools, including fonts.

At this point, if you're sure you always want ibus running when the system boots, you can change the Keyboard input method system from default to ibus. (See below). Next, one can run ibus-setup from the terminal or go back to Start, then Preferences and choose Keyboard Input Method. Either way, you will be asked if you want to start ibus. Choose yes, and when the dialog box opens, click the Input Method tab. Check the Customize active input methods box and then click Select an input method, choose Japanese and it should show Anthy. Choose it and then click Add. It should now show up in the input method box.

With Unity, I've never quite figured out the menu so I just open a terminal with ctl+alt+t and type ibus-setup. I'm sure it's easy, but I'm not a big Unity fan, so I've never even bothered to look. If I ever get around to finding it, I'll post it here, but it's just as easy to open a terminal and type ibus-setup. A little bit of googling indicates that at least some people go to Dash Home from the Unity launcher and type in Keyboard Input methods.

You may need to add the following to .bash_profile. (If there is no $HOME/.bash_profile, create it. See the Fedora section for my comments about adding it to .bashrc, and why I prefer .bash_profile.) Lately, I haven't found it necessary, but if you do find that you need it
export XMODIFIERS=@im=ibus
export GTK_IM_MODULE=ibus
export QT_IM_MODULE=ibus

If you want it running at boot, go back to Start, then Preferences and once again, choose Language Support. There is a Keyboard input method system choice button. It's probably set to default, change it to IBus.

If you just want to run it on occasion, it can be started by again going from Start to Preferences to Keyboard Input Methods. At that point, it will ask you if you want to start the ibus daemon and if you do, you will then be able to enter Japanese with the usual ctl+space keyboard shortcut.

There are almost always updated tutorials on using Japanese in Ubuntu on the Ubuntu Forums and the reader is advised to use their search function. If all of the above is done, it should work without issue whenever you hit ctl+space in most applications.

ArchLinux

(Last update April 2010) ArchLinux has packages for ibus-anthy. The ibus page mentions AUR, but that is no longer necessary. Install with pacman
pacman -S ibus-anthy

Install fonts. I've always used the arphic fonts, I'm not sure if better ones are now available.
pacman -S ttf-arphic-ukai ttf-arphic-uming

Once installed, setup is identical to the Fedora setup. Run ibus-setup, choose Anthy, and you should be good to go. I set the XMODIFIERS and various IM_MODULE variables in .xinitrc, before the line calling the window manager. (I also use it to start ibus.) For example, if your window manager is fluxbox

export XMODIFIERS=@im=ibus
export LC_CTYPE=en_US.UTF-8
ibus-daemon --xim &
exec startfluxbox
There is a package for rxvt-unicode, which is what I usually use. If you install rxvt-unicode, it's called with the command urxvt.

Desired locales should be created using locale-gen. Open /etc/locale.gen and you will see a list of locales, commented out with a # sign. Uncomment the ones that you want, for example, en_US.UTF-8 UTF-8 and ja_JP.UTF-8 UTF-8. Then run
locale-gen

You should see a message that the desired locales were created.
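Uncommenting a line is easy in any editor, but it can also be scripted. The function below is a sketch under a couple of assumptions: enable_locale is a hypothetical name, and the file is assumed to list each locale as a line beginning with # followed directly by the locale name, which is how I remember /etc/locale.gen looking.

```shell
#!/bin/sh
# Sketch: uncomment one locale in a locale.gen style file.
# enable_locale is a hypothetical helper. FILE is edited in place;
# NAME is the locale exactly as it is spelled in the file.
enable_locale() {
    file=$1
    name=$2
    tmp=$(mktemp) || return 1
    # Strip the leading "#" from lines that begin with "#NAME ".
    sed "s/^#\($name[[:space:]]\)/\1/" "$file" > "$tmp" && cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Intended use as root on a real system (not run here):
#   enable_locale /etc/locale.gen ja_JP.UTF-8 && locale-gen
```

Lines for other locales, including other ja_JP encodings, are left commented out.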

Back when scim was the input manager of choice (and of course, to others, it still is), there was a thread on the ArchLinux forums started by someone who had better luck using uim instead of scim for Japanese input. For those who would prefer to use uim, the thread can be found here.

Installing scim and anthy from source.

If your distribution doesn't have a package for scim, anthy and scim-anthy, they can easily be installed from source.

Scim can be downloaded here, and anthy here. Note that the anthy link sends you to a download selection page. You want the latest version of anthy, not anthy-ss. At time of writing, it's 7900.

The scim-anthy source can be found here.

Once downloaded, untar and install the three programs. Install anthy first, then scim, and scim-anthy last. In each case, the commands are the same. The versions given in these examples are current at time of writing, change the command to fit the version you download.

tar -zxvf anthy-7900.tar.gz
cd anthy-7900
./configure --prefix=/usr && make && make install

Do the same for scim and scim-anthy in that order. Restart X and you should be able to call up scim input in any program by hitting ctrl+space.
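Since the three builds use the same commands, they can be wrapped in one function. This is only a sketch: build_tarball and the DRY_RUN switch are my own inventions, and it assumes each tarball unpacks into a directory named after the file, as the anthy one does.

```shell
#!/bin/sh
# Sketch: repeat the unpack/configure/make steps for one tarball.
# build_tarball is a hypothetical helper. With DRY_RUN=1 it only prints
# the commands, which is a cheap way to check the order first.
build_tarball() {
    tarball=$1
    dir=${tarball%.tar.gz}
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "tar -zxvf $tarball"
        echo "cd $dir"
        echo "./configure --prefix=/usr && make && make install"
        return 0
    fi
    tar -zxvf "$tarball" &&
        ( cd "$dir" && ./configure --prefix=/usr && make && make install )
}

# Order matters: anthy, then scim, then scim-anthy (not run here; the
# scim and scim-anthy file names depend on the versions you downloaded).
#   for t in anthy-7900.tar.gz scim-VERSION.tar.gz scim-anthy-VERSION.tar.gz; do
#       build_tarball "$t" || break
#   done
```

The make install step needs root privilege, so the real run has to be done as root or with the install step adjusted to suit your system.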

You will also want Japanese fonts, especially if you are using Japanese in something like OpenOffice. Substitute kochi truetype fonts can be found at download.sourceforge.jp. You want the package kochi-substitute-20030809.tar.bz2.

Download it and untar it.

tar -jxvf kochi-substitute-20030809.tar.bz2

This will create a kochi-substitute-20030809 directory. You will see the kochi-mincho and kochi-gothic substitute fonts. They have a .ttf ending.

Move the fonts to /usr/X11R6/lib/X11/fonts/TrueType or /usr/X11R6/lib/X11/fonts/TTF if there is no TrueType directory.

(These are the typical directories called by the FontPath section in /etc/X11/xorg.conf. Doublecheck your system's xorg.conf and if the FontPath is different than the above, use that path.)

Slackware and some Slackware based distributions

(Last updated November, 2006) Slackware worked without problem when I installed anthy, scim, and scim-anthy from source. I also installed the kochi fonts.

However, with one of its offshoots, Vector, at first I would open, for example, an mlterm session. I hit ctl+space and the scim panel appeared. I then entered romaji, but rather than seeing hiragana, I saw dotted squares. If I typed correctly, and hit space (for example, typing nihongo and hitting space once), the word nihongo, in kanji, would appear, however, I didn't see this until I hit enter.

The scim faq indicates that this is because scim isn't finding the fonts it needs. I am not sure what packages were missing--however, choosing to install gimp during the initial installation fixed the problem. Afterwards, even if I uninstalled gimp, it would still work properly.

Vector's default editor, like Slackware's, is elvis, which didn't work properly. I had to grab the Slackware package for vim and install it. I used a Slackware CD that I had, but if you don't have one, go to Slackware's package search site. I used the version from 10.1, which may have changed by the time you read this.

As of November, 2006, Vector has a scim-anthy package, which pulls in scim and anthy. I didn't see a font package, and used those kochi fonts I've mentioned above, that I manually retrieved from sourceforge.

kinput2 with canna.

In some cases, the scim-anthy combination might not work or not be available for your distribution.

Two programming friends, Godwin Stewart and Stuart Bouyer (who has done a great deal of work on Japanese input packages for Gentoo Linux) made me a tarball of a modified kinput2 and canna installation. It is not perfect--when one starts cannaserver, you see the message Terminated. However, doing pgrep cannaserver shows that it is running and it works perfectly for me.

The tarball is available from qnd-guides.net.

Thanks to the generosity of the Tokyo Linux User Group it is also available on their site. To use it, first download and untar it. You will see two gzipped files there, one for Canna and one for kinput. Install canna first as the kinput file will be looking for it.

tar -jxvf vanillajpn.tar.bz2
tar -zxvf Canna36p1.tar.gz
cd Canna36p1
xmkmf
make Makefile
make canna
make install
make install.man

When done, you'll have a file /usr/sbin/cannaserver.

Now kinput2

tar -zxvf kinput2-v3.1-beta3.tar.gz
cd kinput2-v3.1-beta3
xmkmf
make Makefiles
make depend
make 
make install

As every distribution has its own way to make a program run at startup, that is an exercise I will leave to the reader. For example, in Slackware, you can add a few lines to /etc/rc.d/rc.M. As I said, you will see, after starting /usr/sbin/cannaserver the word Terminated. However, it can be ignored.

You will need a terminal that can display Japanese. As mentioned above, one can use the builtin uxterm. The mlterm and rxvt-unicode programs also work.

Add these lines to your .xinitrc above the line that calls your window manager.

export XMODIFIERS='@im=kinput2'
export LC_CTYPE=ja_JP.UTF-8
kinput2 -canna &

This should enable you to input Japanese in most programs.

If you get an error similar to "Unable to set locale", the usual reason is a mismatch in how the locale name is spelled. You have it as, for example, utf8, and the system is looking for UTF-8.
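One way to see which spelling your system uses is to look through the output of locale -a. The sketch below does the comparison with case and hyphens ignored. find_locale_spelling is a name I made up, and it assumes the locale list arrives one name per line on standard input.

```shell
#!/bin/sh
# Sketch: find how a locale is spelled in a list such as the output of
# "locale -a". find_locale_spelling is a hypothetical helper. It reads
# the list on standard input and compares names with case and hyphens
# ignored, so ja_JP.UTF-8 matches ja_JP.utf8.
find_locale_spelling() {
    want=$(printf '%s' "$1" | tr 'A-Z' 'a-z' | sed 's/-//g')
    while IFS= read -r name; do
        have=$(printf '%s' "$name" | tr 'A-Z' 'a-z' | sed 's/-//g')
        if [ "$have" = "$want" ]; then
            printf '%s\n' "$name"
            return 0
        fi
    done
    return 1
}

# Intended use (not run here):
#   locale -a | find_locale_spelling ja_JP.UTF-8
```

If it prints a name, use that exact spelling in LC_CTYPE. If it prints nothing, the locale isn't installed at all.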

To sum up, most people consider the scim-anthy combination better than kinput2 and canna, and many consider ibus better still. If your distribution doesn't have packages for scim and anthy, you can download and install them, following the instructions given above. If they don't work for you, then use the kinput2 canna combination, using the vanillajpn tarball, for I have found that to work in almost every distribution that I have tried.

Despite there being over 400 Linux distributions, most of them seem to be based on RedHat, Debian or Slackware so the instructions above should work for almost every distribution.

FreeBSD

FreeBSD has a scim-anthy combination. If one installs the scim-anthy port it installs both scim and anthy.
cd /usr/ports/japanese/scim-anthy
make install clean

There is a package message, suggesting setting the LANG variable to ja_JP.eucJP. However, I haven't found this necessary.

In your .xinitrc file

export XMODIFIERS='@im=SCIM'
export GTK_IM_MODULE="scim"
export QT_IM_MODULE="scim"
export LC_CTYPE=en_US.UTF-8
scim -d

One will need a terminal capable of displaying unicode. There is the builtin uxterm, mlterm and rxvt-unicode. One oddity I have found is that if I try to type Japanese directly into one of these terminals, it may not display correctly. However, if one tries to cat a text file written in Japanese, it will display the file correctly.

FreeBSD's vi is nvi. I haven't gotten this working properly with Japanese, so I install /usr/ports/editors/vim-lite. One can create an alias by editing their shell's rc file. For example, I use zsh, so in my $HOME/.zshrc file I have

alias vi=vim

For OpenOffice and the like, I need Japanese fonts. I use the substitute kochi fonts in /usr/ports/japanese/kochi-ttfonts.

NetBSD

(Last updated November 2006) NetBSD doesn't yet have scim or scim-anthy in pkgsrc. However, scim and scim-anthy are available in their Work In Progress collection. On my machine, the scim-anthy package failed to build, due to an unable to allocate memory error in gcc. However, googling indicated that adding
UNLIMIT_RESOURCES=	datasize

to the scim-anthy Makefile would fix the problem. I tried that solution and it worked.

If you want to stick with pkgsrc they do have anthy and uim. To use that combination
cd /usr/pkgsrc/inputmethod/anthy
make install clean; make clean-depends
cd /usr/pkgsrc/inputmethod/uim
make PKG_OPTIONS.uim="-canna" install clean; make clean-depends

(This will install uim with anthy and gtk).

Add the following to your .xinitrc above the line calling your window manager.
export XMODIFIERS=@im=uim
uim-xim --engine=anthy &
After starting X, you can then enter Japanese text by hitting shift+space. You turn off Japanese input in the same manner, hitting shift+space again.

Although NetBSD 3.x has en_US.UTF-8 as a locale, they don't have ja_JP.UTF-8. I've had mixed results using en_US.UTF-8 as my LC_CTYPE. Sometimes, if you set LC_CTYPE to en_US.UTF-8 though you can input Japanese, after hitting enter, all that appears are blank squares. You can create a ja_JP.UTF-8 locale by downloading en_US.UTF-8.src from
ftp://ftp.netbsd.org/pub/NetBSD/NetBSD-current/src/share/locale/ctype/en_US.UTF-8.src

and then using mklocale. If you downloaded it into your home directory, as root or with root privilege (assuming your user name is joe)
cd /usr/share/locale
mklocale < /home/joe/en_US.UTF-8.src > ja_JP.UTF-8

Hopefully, then a locale -a | grep ja_JP will show ja_JP.UTF-8.

I've found this was necessary to get UTF-8 working with, for example, thunderbird, though it wasn't necessary to input Japanese in a terminal.

In NetBSD, I've never gotten mlterm working properly, so I use rxvt-unicode. I haven't researched this deeply, but the unicode fonts in urxvt aren't as clean as the fonts used by, say, rxvt with eucJP encoding. The choice is up to the reader.

For eucJP encoding, I use either rxvt or mrxvt. If you use rxvt, after it's installed, there is a message telling you that double-byte encoding is disabled by default. You then have to edit /usr/pkg/lib/X11/app-defaults/Rxvt. You will see several lines marked !Rxvt.multichar_encoding
One of them has eucj at the end of it. Take out the ! at the beginning of the line. (Also, put a ! at the beginning of the top line, which ends with noenc).
If you use mrxvt then edit /usr/pkgsrc/x11/mrxvt/Makefile. You will see a section of CONFIGURE_ARGS+= enabling xft, text-shadow and the like. Add
CONFIGURE_ARGS+=	--enable-xim
CONFIGURE_ARGS+=	--enable-cjk
CONFIGURE_ARGS+=	--with-encoding=eucj

You may, of course, choose to use kinput2 and canna with NetBSD. If so
cd /usr/pkgsrc/inputmethod
cd canna; make install clean
cd ../kinput2; make PKG_OPTIONS.kinput2="-wnn4 -sj3" install clean

When done, you'll have a /usr/pkg/sbin/cannaserver as well as kinput2. Cannaserver should be started as a daemon upon the next reboot. (You'll see that it also provides a script in /usr/pkg/local/rc.d.)
Once again, set your variables in your .xinitrc.

export XMODIFIERS='@im=kinput2'
export LC_CTYPE=ja_JP.eucJP
kinput2 -canna &
Like FreeBSD, I've found that I have to use vim rather than vi.

You may get an error when trying to start kinput2. It will say it can't load the app-defaults file and that XFILESEARCHPATH might be set incorrectly. The variable can also be set in .xinitrc. However, be sure to add it ABOVE the kinput2 -canna & line.

export XFILESEARCHPATH=/usr/pkg/lib/X11/app-defaults/Kinput2

DragonFlyBSD

(Last updated June 2006) DragonFlyBSD now uses NetBSD's pkgsrc collection for third party software. The instructions above, for NetBSD, also work with DragonFly. Just use bmake instead of make when installing the various packages. DragonFly does have ja_JP.UTF-8 installed by default, so the reader can ignore the part about using mklocale.

A Digression about Terminals and UTF-8

Not all people need or want Japanese in an xterm. However many do. For example, I use mutt, so to use Japanese in an email, I need a terminal capable of handling Japanese. Others only use it in things like OpenOffice and Firefox.

Although most browsers can read Japanese encoding, you might have to manually select it. In opera, firefox and mozilla, it's in View => encodings. Although there is an autoselect for Japanese, it doesn't always work. If you get a page in Japanese that seems to be mojibake, then try different encodings, including Unicode (which isn't in the Japanese section) and one should work.

Dark Prince from bsdnexus.com forums was kind enough to send the following. If you are creating a web page with Japanese in UTF-8, this code should make the viewer's browser use UTF-8 on the page. He tested this on apache, but it should work with any server that can use php. At the top of the page put

<?php header('Content-Type: text/html; charset=UTF-8'); ?>

Again, this will only work if your server has php enabled. Many ISP provided web pages don't support php.

Then, code that tells the browser to read UTF-8. (Dark Prince says this may not be necessary, but it probably can't hurt.) This code should be between the <head> </head> tags

<meta HTTP-EQUIV="content-type" CONTENT="text/html;charset=UTF-8">

Having the meta tags will not be sufficient to make the viewer's browser use UTF-8, the php code is necessary. However, if the page is in straight html, the meta HTTP-EQUIV tags should be enough. (Martin Swift was kind enough to point out that I'd neglected them myself, causing some of the special characters on the page to display incorrectly. I've fixed it since. Thank you Martin.) :)

Lately, I've been playing with mrxvt. Again, there is no unicode support. If building from source, one needs to configure it as follows (in addition to any options you choose)

./configure --enable-xim --enable-cjk --with-encoding=eucj

If you are using FreeBSD, a patch I submitted has been accepted to add EUC input. When installing the port simply type

make -DWITH_JAPANESE install clean
Lately, it seems as if mlterm has become the most popular of the multilanguage terminals. We have a quick and dirty guide to it here.

Note that the guide suggests setting mlterm's font size to 14. I've found that with the default size of 16, setting LC_CTYPE to ja_JP.UTF-8 makes the terminal overly large. (I found this happened even if I set LC_CTYPE to en_US.UTF-8.) This doesn't seem to be distro or window manager specific. The QND guide mentioned above discusses setting mlterm with a transparent background. This is a matter of preference. Josh, who wrote the guide, has younger eyes than I do, but I prefer a gray background with black type. One can set the background at the command line, in .Xdefaults, or do as Josh suggests and create a directory in your home directory called .mlterm and a file in that directory called main.

One note for others with aging eyes. Recently, it seems to me that urxvt's fonts have gotten smaller. I've found that setting the font size in .Xdefaults helps. I have this entry
urxvt*font: a14

Another thing that I've found with rxvt-unicode, specifically on CentOS, is that installing the package doesn't create a termcap entry. This can cause problems with several programs, such as w3m, and even man. The solution is to run the "tic" command on the terminfo file. In /usr/share/doc/rxvt-unicode-<version-number>/etc you will find an rxvt-unicode.terminfo file. Change into that directory and run
tic rxvt-unicode.terminfo

and it usually fixes the problem, even without moving the file anywhere. If you are still getting an error such as can't find termcap entry, then, after running tic, copy the file over to /usr/share/terminfo/r.

As a side note, if you are running urxvt on one machine and ssh into another machine that doesn't have it installed, you may get errors, either something like no mention of urxvt in termcap or even WARNING: terminal is not fully functional. You can always temporarily fix it by typing
TERM=xterm
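To make that fallback automatic, something like the following could go in the shell startup file (~/.profile, for example) on the remote machine. It is only a sketch: use_known_term is a made-up function name, and it assumes the infocmp program from ncurses is available there. If infocmp is missing, the check fails and TERM falls back to xterm as well.

```shell
#!/bin/sh
# use_known_term is a hypothetical helper name.
# Fall back to xterm when the current TERM has no terminfo entry on this host.
use_known_term() {
    if ! infocmp "$TERM" >/dev/null 2>&1; then
        TERM=xterm
        export TERM
    fi
}

use_known_term
echo "TERM is now $TERM"
```

This leaves TERM alone on machines that do know about rxvt-unicode, so the full terminal description is still used where it exists.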

To get a list of available fonts one can type xlsfonts at a command prompt.

Using Putty from Windows

Much of this was taken from this page from umiacs.umd.edu.

If you are using Putty to open an ssh session, you can still view Japanese encodings. On the Windows machine, you will have to install Asian Language Support. (Control Panel, Regional Settings or Regional and Language Settings.) You will probably need the Windows installation CD for this. You should also choose the option to Install files for complex script and right to left languages. (The link given above has several screenshots.) Windows will suggest you reboot after installation. Do so.

Open your putty session, right click on the title bar, and choose Change Settings from the menu.
Go to Window, Appearance. Click the Change button in the Font settings section.
Choose MS Gothic or MS Mincho and Japanese as the script.
Go to the Translation section. In the dropdown box at top, choose UTF-8. Also check the box that says Treat ambiguous CJK characters as wide.

Once this is done, you should be able to view Japanese text in a putty ssh session from a Windows machine.

Using ibus

As mentioned throughout, ibus is becoming the input manager of choice in many distributions. Some of the advantages, according to a posting on the Fedora testing list:

* ibus has been rewritten in C. Scim, written in C++ using STL, has problems with weak symbol conflicts without the added complexity and lower stability of the scim-bridge layer to work around that.

* It is possible to write clients and engines for ibus in any language that supports dbus bindings.

* ibus loads engines on demand rather than all installed engines as scim does, which improves the startup time and memory footprint.

* scim loads engines as dl-modules, so a problem in any engine can take down scim, whereas in ibus, because the processes are separated, only a faulty process will die, leaving the rest of the system working normally.

* The architecture of ibus is bus-centric and so much closer to the CJK OSS Forum Workgroup 3 draft "Specification of IM engine Service Provide...".

It works quite well in most GTK applications that I've tried, such as firefox and gnome-terminal. It also works beautifully with openoffice. As time has passed, it also seems to work in just about everything else as well.

As I use fluxbox or openbox rather than gnome, I don't use im-chooser. Instead, I start it with
ibus-daemon &

This used to work perfectly. However, updates (as of June, 2009) seem to have changed it slightly. I'm not sure if this is a permanent change or not.

It would still work this way with any GTK application, such as gnome-terminal or firefox, and openoffice. However, it wouldn't work with my two most used xterminals, uxterm and urxvt.

To get it to work in those as well, I had to change this to
ibus-daemon --xim

I prefer to start it from the command line. However, experimentation indicated that in Fedora, starting it through im-chooser (by typing im-chooser at the command line) will also get it to work in everything. After you select ibus in im-chooser, it gives a message that you'll have to log out and log back in, and that until then it will only work in GTK apps. In practice, it seems to work in everything as long as I enable it.

As mentioned in some of the distribution specific sections, the usual variables to be set are
GTK_IM_MODULE=ibus
QT_IM_MODULE=ibus
XMODIFIERS=@im=ibus

LC_CTYPE can be set for your native language, ja_JP.UTF-8, or any other UTF-8 locale of your choice. Before using it the first time, run the ibus-setup program which will allow you to add anthy, as described in the distro sections.
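For those who start X without a desktop environment, the pieces above can be gathered into one startup fragment. This is a sketch rather than a tested recipe for any particular distribution: which file is read (~/.xinitrc, ~/.xprofile or something else) depends on how X is started, and the lines need to come before the one that launches the window manager.

```shell
#!/bin/sh
# Sketch of lines for ~/.xinitrc or ~/.xprofile (an assumption; the file
# that is actually read depends on how X is started).
GTK_IM_MODULE=ibus
QT_IM_MODULE=ibus
XMODIFIERS=@im=ibus
export GTK_IM_MODULE QT_IM_MODULE XMODIFIERS

# Start the daemon only if it is installed. --xim enables the XIM server
# that uxterm and urxvt need; --daemonize runs it in the background.
if command -v ibus-daemon >/dev/null 2>&1; then
    ibus-daemon --xim --daemonize
fi
```

Using --daemonize here plays the same role as the trailing & in the earlier command.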

Printing

(Last updated August, 2008) Several applications will translate a file to postscript level 2 (or possibly higher). Acroread, xpdf, OpenOffice, mozilla, firefox and seamonkey will all do this. With such applications, assuming you can already print from them, no further work is necessary and Japanese will print out as written.

Printing in *nix can be non-trivial in itself. CUPS is making it easier when it works--when it doesn't work, one spends a lot of time searching google, only to find many people with the same error messages and few solutions. I have a few simple CUPS solutions on another page.

Depending upon distribution, installing OpenOffice can be a major undertaking. FreeBSD for example, has the development version as a port that requires over 9 gigs of free space to compile. Building the port can take 6-8 hours on a reasonably fast machine.

In OpenOffice, be sure to enable Japanese support under Tools, Options, Language Support, Languages.

To print Japanese one needs the fonts (I use the kochi fonts mentioned above). Once this is done, you can use spadmin to add the fonts. In FreeBSD, they'll be in /usr/X11R6/lib/X11/fonts/TrueType. In some distributions, the path is the same save that it's called truetype. You may have to be root or have root privilege to run spadmin.

In FreeBSD, at least, rather than using spadmin, I just either copy or symlink the fonts to /usr/local/openoffice(version)/share/fonts/truetype. Without the fonts, you will be able to input Japanese in OpenOffice, but it won't print correctly.

One can use openoffice, firefox or seamonkey (as well as any other browser that does the postscript conversion for you) to print Japanese text files. Open the textfile in firefox, for example, and then print it.

Recently, looking through a page about UTF-8 I came across another solution for printing textfiles. The author mentions using the openoffice command with the -p option. For example, in FreeBSD, OpenOffice is called with openoffice.org. Suppose I have a text file in Japanese, called nihongo.txt
openoffice.org -p nihongo.txt

will print the text file. (Note that the openoffice command will vary between Linux distributions as well as the BSDs. Many distros use the command ooffice, others use soffice and no doubt other distros use something else.) This can be used with both UTF-8 and EUC encoded textfiles. This is simpler than opening the file in OpenOffice, as it will correctly print the file without having to open the OpenOffice application.

The author also mentions the paps program. It converts UTF-8 files to postscript. The mentions of FreeBSD below also apply to NetBSD.

It's available as a package in most distributions, and as a port in FreeBSD. (/usr/ports/print/paps).

It didn't work for me with the very basic cups laserjet or deskjet ppds. I also needed the hpijs program, which provides more specific drivers for various HP printers.

Most distributions (and FreeBSD) have hpijs available as a package. If not, you can go to the HPLIP home page and install from source. The HPLIP package includes the hpijs package.

Once the package is installed, you can modify your printer, using the cups web interface or lpadmin command to change your printer's ppd from the generic deskjet or laserjet to the hpijs driver for your printer.

Make sure you have some Japanese fonts. Sometimes, even though scim will input Japanese perfectly, if you don't have some specific Japanese fonts paps won't print correctly. If your distribution doesn't have fonts available, use the kochi substitute fonts from sourceforge.

To use paps to print a textfile called nihongo.txt
paps nihongo.txt | lp 

I use mutt, and paps works well with it. With mutt, hitting the pipe key, |, will, obviously enough, pipe the email to another command. If I want to print a Japanese email I would use
|paps|lp

which sends the mail to paps and from there to the lp command.

Evolution and Thunderbird do the postscript conversion on their own--in other words, they will print Japanese emails as easily as firefox prints a Japanese web page.

Sylpheed needs the print command set to feed the email to paps. Changing the standard lpr %s that they use for printing to
paps %s | lp

will enable Japanese emails to print. (This can be found in Configuration, Details, External Commands).

As JWS mentions in his page, if the file uses a different encoding, such as EUC, one can use iconv to first convert the file to UTF-8, then use paps. Using iconv with the -l (a lower case L, as in list) gives the program's naming conventions. For instance EUC encoding can be specified as EUC-JP or EUCJP (as well as typing the complete EXTENDED_UNIXCODE_PACKED_FORMAT_FOR_JAPANESE, but I doubt anyone would want to type that.)

The syntax is quite simple: one uses -f (as in from), -t (as in to) and the file name. So, let's say we wanted to print an EUC encoded textfile called nihongo_euc.txt. (This is in FreeBSD, where iconv -l shows supported encodings in upper case. I'm not sure about Linux; it may be lower case.)
iconv -f EUCJP -t UTF-8 nihongo_euc.txt | paps | lp
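Before sending anything to the printer, it can be worth checking that iconv on your system accepts the encoding names you plan to use. The following is a small self-contained sketch: it writes a short UTF-8 sample to a temporary directory, converts it to EUC-JP and back, and compares the result with the original. The file names are made up for the example, and it assumes the script itself is saved as UTF-8.

```shell
#!/bin/sh
# Round-trip check: UTF-8 -> EUC-JP -> UTF-8 should reproduce the input.
# The file names are temporary and made up for this example.
tmpdir=$(mktemp -d)
printf '日本語のテスト\n' > "$tmpdir/utf8.txt"

iconv -f UTF-8 -t EUC-JP "$tmpdir/utf8.txt" > "$tmpdir/euc.txt"
iconv -f EUC-JP -t UTF-8 "$tmpdir/euc.txt" > "$tmpdir/back.txt"

# cmp -s is silent and only sets the exit status.
if cmp -s "$tmpdir/utf8.txt" "$tmpdir/back.txt"; then
    roundtrip=ok
else
    roundtrip=failed
fi
echo "round trip: $roundtrip"
rm -rf "$tmpdir"
```

If iconv complains about the name EUC-JP, iconv -l will show the spelling that your version expects.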

For me at least, paps was one of the final pieces that makes Japanese almost as easy to use in Linux and some of the BSDs as it is in Windows. (I think Mac still has the edge as far as ease of Japanese use.)

Speaking of Mac, now that Apple bought CUPS, all of the above may become unnecessary. On Fedora starting from Fedora 8, and CentOS 5.2, running cups-1.3.4, if I run lp nihon.txt it correctly prints the Japanese text. However, this isn't the case on Ubuntu. So, whether this was something in the RH versions of CUPS, or somewhere else, I don't know.

One other minor issue that I had was with #!Crunchbang, a distribution based on Ubuntu that uses Openbox as its default window manager. Using paps nihon.txt|lp wasn't working. In that case, I had to specify paper size. The default is A4, and my printer uses the US letter size paper. If I just ran paps nihon.txt|lp, it would run through the printer and the cups logs showed everything working, but nothing would print. However, as long as I remember to specify paper size, using
paps --paper=letter nihon.txt|lp

it worked as it should.

Romaji

While only indirectly related, there are times when one wishes to use special characters when typing romaji. When writing for people studying Japanese, my own tendency is to imitate hiragana, and write, for example, juudou for the martial art. However, for people with no knowledge of the language, this can be somewhat confusing. Most systems that have locales will also have a Compose file for that locale. For example, in FreeBSD, there is a file /usr/local/lib/X11/locale/en_US.UTF-8/Compose. It will probably be elsewhere on a Linux system. That file will have several entries for special characters. Mine has things like
<Multi_key> <underscore> <a>  : "ā"   U0101 # LATIN SMALL LETTER A WITH MACRON

(If that didn't look like a small letter a with a line over it, then your browser is probably using a different encoding. Go to View=>Encodings and choose UTF-8 and you should see (among other things) a lower case "a" with a line over it.)

This means that if I use the Multi or Compose key, then hit underscore, then a, I will get an a with a line over it. The problem is setting the compose key.

This can be done globally (for all users) by making an entry in /etc/X11/xorg.conf. For example, to use the right Windows key one could add
   Option "XkbOptions"  "compose:rwin"

(Finding the right name can be tricky as it varies on different systems. In FreeBSD, one can find the name in /usr/local/share/X11/xkb/keycodes/xfree86. Usually, you're looking for xkb/keycodes/xfree86.) Another way is to use the xev program, find the numeric code and add it to xmodmap. For example, if I want to use the right hand Windows key, I run the xev program from an xterm, which opens up a little box. (The xev program is usually installed by default; if not, it's readily available for almost all systems. In FreeBSD it's in /usr/ports/x11/xev.)

In the box, hit a key. In the terminal you used to call xev you'll see various things including (if we hit the right Windows key) keycode 116. The number may differ on your system, so use whatever xev reports. Now, we create (if it doesn't exist) a $HOME/.xmodmaprc file. In it we add
keycode 116 = Multi_key

Next type
xmodmap ~/.xmodmaprc

If you now hit the right hand Windows key, then type _a, you'll see ā. So, I can use this to write, for example, jūdō.
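To see which other macron sequences a Compose file offers (ō and ū, for instance), one can simply search it. The sketch below is an illustration only: list_macrons is a hypothetical helper name, the first path is the FreeBSD one given earlier, and the second is a location commonly used on Linux, which is an assumption you should check on your own system.

```shell
#!/bin/sh
# list_macrons is a hypothetical helper: print the Compose-key sequences in
# the given Compose file that produce letters with a macron.
list_macrons() {
    grep '^<Multi_key>' "$1" | grep 'WITH MACRON'
}

# Try the FreeBSD path from the text, then a path commonly used on Linux
# (an assumption; check your own system).
for f in /usr/local/lib/X11/locale/en_US.UTF-8/Compose \
         /usr/share/X11/locale/en_US.UTF-8/Compose; do
    if [ -r "$f" ]; then
        echo "Entries in $f:"
        list_macrons "$f"
        break
    fi
done
```

Each line printed shows the keys to press after the Compose key, followed by the character they produce.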

Special thanks to Dr. Mike Fabian for all his help, as well as several other members of the Tokyo Linux Users Group (tlug).