Moheb's Coptic Pages

glibc support for Coptic


The glibc library provides the locales (internationalization files) on your Linux system, and many applications depend on it. A locale defines a set of attributes for a country or region, such as the character set of the language, the collation (sort order) of characters, the currency, and the date format.

Since there is no "Coptic" territory, it does not make sense to define a dedicated locale for Coptic. It is entirely sufficient to extend your current locale with a few more capabilities, so that at least the Coptic characters are defined.

Try typing locale; the output should look something like this:

LANG=de_DE.UTF-8
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=

If your current locale settings end in UTF-8, then your locale already uses UTF-8, and there is almost nothing left to do. I say almost because the UTF-8 definition will probably lack the Coptic range, since it is new (introduced in Unicode 4.1.0). In other words: if you wait until a new version of glibc is available, you will probably not have to do anything. But if you are impatient, read on.
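
Instead of reading the suffix by eye, the locale utility can also be asked for the charmap of the current locale. The following is only a sketch: the helper name is_utf8_charmap is hypothetical, and it assumes that glibc reports the charmap exactly as "UTF-8". It says nothing about whether the Coptic range is already defined.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given charmap name is UTF-8.
# Assumption: glibc's "locale charmap" spells the name exactly "UTF-8".
is_utf8_charmap() {
    [ "$1" = "UTF-8" ]
}

# Ask the locale utility for the charmap of the current locale.
charmap=$(locale charmap 2>/dev/null) || charmap=""

if is_utf8_charmap "$charmap"; then
    echo "current locale uses UTF-8"
else
    echo "current charmap: ${charmap:-unknown}"
fi
```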

What if your current locale does not have the UTF-8 extension? List all available locales by typing:
locale -a
If not a single locale ends with .utf8, you should consider updating your glibc (which is a critical operation, because other applications depend on it). It may be more convenient to update your whole distribution!
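
On a system with many locales the list is long, so it may be easier to filter it. A minimal sketch, assuming grep with -E support; the helper name filter_utf8_locales is hypothetical, and the pattern also accepts the spelling .UTF-8 and a trailing modifier such as @euro:

```shell
#!/bin/sh
# Hypothetical helper: read locale names on stdin and print only those
# whose codeset part is utf8 or UTF-8, with an optional @modifier.
filter_utf8_locales() {
    grep -iE '\.utf-?8(@.*)?$'
}

# Count the UTF-8 locales installed on this system (0 means none found).
count=$(locale -a 2>/dev/null | filter_utf8_locales | wc -l)
echo "UTF-8 locales found: $count"
```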

Otherwise, you need to consider modifying the following files:
  • /usr/share/i18n/locales/i18n
  • /usr/share/i18n/charmaps/UTF-8
optionally
  • /usr/share/i18n/locales/iso14651_t1

The i18n file can be updated systematically with the utility "gen-unicode-ctype", which is included in the glibc source tarball. You can also get it directly here. Compile it with "gcc gen-unicode-ctype.c -o gen-unicode-ctype", then download the latest Unicode definition file (UnicodeData.txt) from the Unicode.org server. Generate a new version of i18n with "./gen-unicode-ctype UnicodeData.txt", rename the output file to i18n, and copy it to the directory /usr/share/i18n/locales/. You can also get the version that I generated this way here.

Updating the UTF-8 file requires more manual work. I have prepared a version that only adds the Coptic range; you can get it here.

So far I have not updated my iso14651_t1 file.
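
Whether a given copy of i18n or UTF-8 already covers the Coptic block (U+2C80 to U+2CFF) can be checked roughly by searching for its first code point. This is only a sketch: the helper name mentions_coptic is hypothetical, and it assumes that the files are uncompressed and use the <Uxxxx> notation for code points.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given file mentions <U2C80>,
# assumed here to be the symbolic name used for U+2C80 in these files.
mentions_coptic() {
    grep -q '<U2C80>' "$1" 2>/dev/null
}

for f in /usr/share/i18n/locales/i18n /usr/share/i18n/charmaps/UTF-8; do
    if mentions_coptic "$f"; then
        echo "$f: Coptic range found"
    else
        echo "$f: no <U2C80> entry (or file missing or compressed)"
    fi
done
```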

After updating these files, you have to recompile your current locale so that it reflects the changes. If, for example, your current locale is en_US.UTF-8, then type as root:

localedef --charmap=UTF-8 --inputfile=en_US en_US.utf8

Make sure that the subdirectory en_US.utf8 under the directory /usr/lib/locale has now been updated.
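
One way to check this is to look at the modification time of the directory. A small sketch, assuming a find that supports -mmin (GNU and BSD find do; it is not in POSIX); the helper name recently_updated and the 10 minute window are arbitrary choices for illustration:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the directory exists and was modified
# less than the given number of minutes ago (default: 10).
recently_updated() {
    [ -d "$1" ] && [ -n "$(find "$1" -prune -mmin "-${2:-10}" 2>/dev/null)" ]
}

dir=/usr/lib/locale/en_US.utf8
if recently_updated "$dir"; then
    echo "$dir was updated within the last 10 minutes"
else
    echo "$dir is missing or was not updated recently"
fi
```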

If you would like to test whether your modified locale now works, try to compile and run this test code. It tests the conversion of the upper case Coptic character alpha (U+2C80) to lower case. It should output: 0x2c81
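
As a rough command-line alternative to the test code, the same conversion can be tried with sed. This is a sketch under two assumptions: that GNU sed is installed (the \L escape is a GNU extension) and that it takes its case mapping from the current UTF-8 locale. The helper name lower_hex is hypothetical. The result is printed as UTF-8 bytes, so e2b281 corresponds to 0x2c81.

```shell
#!/bin/sh
# Hypothetical helper: lower-case stdin with GNU sed's \L escape and
# print the resulting bytes as one hex string (newline stripped).
lower_hex() {
    sed 's/.*/\L&/' | tr -d '\n' | od -An -tx1 | tr -d ' \n'
}

# \342\262\200 is U+2C80 (Coptic capital alpha) encoded in UTF-8.
result=$(printf '\342\262\200\n' | lower_hex)
case "$result" in
    e2b281) echo "converted to U+2C81: the locale knows the Coptic case mapping" ;;
    e2b280) echo "unchanged: no Coptic case mapping in this locale" ;;
    *)      echo "unexpected bytes: $result" ;;
esac
```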



last updated: 31.05.2006
Moheb Mekhaiel