trans130 Introduction

trans Character Encoding Converter Generator Package

This is a snapshot of a work in progress. Things may change, other things may be added/removed. Please wait for the release. Sorry, no date scheduled, yet.

Currently there are 80 different Character Encoding Description Files supplied with this package, not counting the following files:

iso6429, iso646 (which are included by many other files)
iso10646, iso10646.mes (these are here for reference purposes)

trans covers 7-bit encodings such as ISO 646, 8-bit encodings such as many MS-DOS Codepages (also for IBM OS/2), Microsoft Windows Codepages, ISO 8859, HP, Adobe, Apple Macintosh, Atari, NeXTSTEP Character Encodings, some EBCDIC Encodings, koi8-r and a few more...

Should your favourite Character Encoding be missing, please contribute!

Where to get updates

The latest version of this package should be available at:

How to create a Character Encoding Converter

To create translators, first use make to compile, link and install all trans tools. Installing means copying the executables into a directory that is included in your command search path. The Makefile uses "/usr/local/bin" as the default directory for installing binaries; check it first, since you will probably want to change that directory.

Example: U*IX (e. g. Linux)

# use gcc
make # compile trans executables

Makefile offers a couple of options which you may want to use:

make install # this copies executables to /usr/local/bin
make clean # this deletes objects and executables - never mind warnings

make check # check cedf files (create error.log)

make html # create HTML tables from cedf files (check destination in Makefile)
make list # create list of cedf files (create encoding.lis)

make date # for my personal use only ;)
make pack # for my personal use only ;)
make uni # for my personal use only :) (unierror.lis)

SunOS using gcc seems to require


After that, please change your working directory to the "bin" directory.

Please set an environment variable TRANS that points to the directory where this package resides on your computer, *including* the trailing directory separator character,

e. g.: TRANS="/usr/local/src/charsets/trans130/"

All Character Encoding Description Files reside in the cedf subdir.

If you don't set a variable TRANS the default location "/usr/local/lib/trans/" will be assumed (see file "tab.h", DIR_TRANS).
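The lookup described above can be sketched in C roughly as follows. This is only an illustration, not the code in gettrans.c: the names sketch_pick_dir, sketch_cedf_path and SKETCH_DIR_TRANS are hypothetical, and the use of "/" as the separator inside "cedf/" is an assumption. It shows why the trailing separator matters: the file name is simply appended to the directory string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed fallback; the real constant is DIR_TRANS in tab.h. */
#define SKETCH_DIR_TRANS "/usr/local/lib/trans/"

/* Pick the package directory from an environment value (may be NULL). */
static const char *sketch_pick_dir(const char *env_value)
{
    if (env_value == NULL || env_value[0] == '\0')
        return SKETCH_DIR_TRANS;      /* TRANS not set: use the default */
    return env_value;
}

/* Build "<dir>cedf/<name>" in buf by plain concatenation.
   Returns 0 on success, -1 if the result does not fit into buf. */
static int sketch_cedf_path(char *buf, size_t size,
                            const char *dir, const char *name)
{
    assert(buf != NULL && dir != NULL && name != NULL);
    if (strlen(dir) + strlen("cedf/") + strlen(name) + 1 > size)
        return -1;
    strcpy(buf, dir);                 /* dir must end with the separator */
    strcat(buf, "cedf/");
    strcat(buf, name);
    return 0;
}
```

A caller would pass getenv ("TRANS") from stdlib.h to sketch_pick_dir () and hand the result to sketch_cedf_path () together with a file name such as "cp850".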

To test the translator generator after you have run at least "make" and "make install", type

cd "$TRANS"bin

This should generate two translators between ISO 8859-1 and MS-DOS Codepage 850. Each translator consists of three files (e.g.):

isox850.c   the main program
isox850.h   the header file
            the translation table file

Each translator will #include the files

trans.c   the main invariant program
trans.h   the main invariant header file

You should be able to compile and link isox850.c and 850xiso.c easily. Read to learn more about the syntax for transtab.
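For orientation, an 8-bit translator of this kind essentially boils down to a 256-entry lookup table. The following is a minimal sketch under that assumption, not the generated code: sketch_init_table and sketch_translate are hypothetical names, and only seven ISO 8859-1 to Codepage 850 pairs (the German letters) are filled in. All other bytes are passed through unchanged, which a real translation table would not do.

```c
#include <assert.h>
#include <stddef.h>

/* Start from the identity mapping, then override a few
   ISO 8859-1 -> Codepage 850 pairs (illustration only). */
static void sketch_init_table(unsigned char table[256])
{
    int i;
    for (i = 0; i < 256; i++)
        table[i] = (unsigned char)i;
    table[0xC4] = 0x8E;   /* capital A with diaeresis */
    table[0xD6] = 0x99;   /* capital O with diaeresis */
    table[0xDC] = 0x9A;   /* capital U with diaeresis */
    table[0xE4] = 0x84;   /* small a with diaeresis   */
    table[0xF6] = 0x94;   /* small o with diaeresis   */
    table[0xFC] = 0x81;   /* small u with diaeresis   */
    table[0xDF] = 0xE1;   /* small sharp s            */
}

/* Translate a buffer in place, one table lookup per byte. */
static void sketch_translate(unsigned char *buf, size_t len,
                             const unsigned char table[256])
{
    size_t i;
    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++)
        buf[i] = table[buf[i]];
}
```

The reverse direction (850xiso in the example above) would use a second table built the other way round.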

Have a look at maketabs respectively to get an inspiration for program names.

This package is written in ANSI-C using the two non-ANSI functions strdup () and strupr (). Sources for these functions are supplied should your compiler/library not contain them. Should you encounter any problems while trying to compile this package, your compiler is very likely not ANSI-C compliant. Should your compiler be ANSI-C compliant and still report warnings and/or errors, please let me know. I'll need the following data in order to help you:

Directory tree for this package

The directory tree for this utility should look like this:

directory   file   description
./       contains the complete package
    index.htm   this file
    Makefile   sample makefile for U*IX using gcc
    encoding.lis   list of Character Encoding Description Files
    error.log   output created by checkall
    unierror.log   diffs between cedf and selected Unicode files
src/       contains the translation table generator source
    Makefile   makefile for gcc (e. g. Linux)
    comptran.c   compute translation table and output
    comptran.h   header file for comptran.c
    datatype.h   handy data types
    gettrans.c   get TRANS directory
    gettrans.h   header file for gettrans.c
    head_c.h   generic translator main program
    head_h.h   generic translator header file
    head_tab.h   generic translator table file header
    head_u.h   generic translator Unicode FormatA file header
    loadtab.c   read xlt binary table and Unicode FormatA
    loadtab.h   header file for loadtab.c
    os-stuff.h   OS/compiler dependent definitions
    readtab.c   read character encoding description file
    readtab.h   header file for readtab.c
    scanflag.c   parse program parameters and flags
    scanflag.h   header file for scanflag.c
    strdup.c   in case your compiler doesn't have it
    strdup.h   header file for strdup.c
    strupr.c   in case your compiler doesn't have it
    strupr.h   header file for strupr.c
    tab.h   table constants
    taberr.h   trans error codes and messages
    checkiso.c   checks character encoding description names
    checkiso.h   header file for above program
       man page for above program
    checkuni.c   compares cedf file with Unicode Format A table
    checkuni.h   header file for above program
       man page for above program - for internal use
    transiso.c   translator generator to ISO 10646 main program
    transiso.h   header file for above program
       man page for above program
    transtab.c   translator generator main program
    transtab.h   header file for above program
       man page for above program
    transce8.c   translator program (8-bit) main program
    transce8.h   header file for above program
       man page for above program
    transhtm.c   program that displays HTML tables
    transhtm.h   header file for above program
       man page for above program
    checkall   check all tables
    chkuni   for internal use only
    mklist   create list of all tables
    mkhtml   create HTML table (mkxlt may be required before running this one)
    mkxlt   create XLT files (binary translation files)
bin/       contains the translator main program (invariant part) and a few scripts to create translators
    compile   compile one program
    makeall   compile all programs
    maketabs   create many translator sources
    one   create one translator
    trans.c   invariant main translator program
    trans.h   invariant main translator header file
    utf.c   convert from/to plain 16-bit Unicode/UTF
    utf.h   header for utf.c
    utimbuf.h   helps to keep file date stamps
htm/       contains information in HTML format about the description files and other more general information
cedf/       contains Character Encoding Description Files
    adobeiso   Adobe ISOLatin1Encoding Encoding Vector
    adobestd   Adobe StandardEncoding Encoding Vector
    adobesym   Adobe Symbol Encoding Vector
    applecro   Apple Macintosh Croatian
    applegk2   Apple ][ Greek extended for Macintosh
    applegrk   Apple Macintosh Greek
    appleice   Apple Macintosh Icelandic
    applerom   Apple Macintosh Roman
    applerum   Apple Macintosh Romanian
    appletur   Apple Macintosh Turkish
    atarist   Atari ST/TT
    cp1250   Microsoft Windows Codepage 1250 (EE)
    cp1251   Microsoft Windows Codepage 1251 (Cyrl)
    cp1252   Microsoft Windows Codepage 1252 (ANSI)
    cp1253   Microsoft Windows Codepage 1253 (Greek)
    cp1254   Microsoft Windows Codepage 1254 (Turk)
    cp1255   Microsoft Windows Codepage 1255 (Hebr)
    cp1256   Microsoft Windows Codepage 1256 (Arab)
    cp1257   Microsoft Windows Codepage 1257 (BaltRim)
    cp1258   Microsoft Windows Codepage 1258 (Viet)
    mslinedr   Microsoft Windows MS LineDraw
    symbol   Microsoft Windows Symbol Encoding Vector
    wingding   Microsoft Windows Wingdings Encoding Vector
    cp437   IBM Codepage 437 (US)
    cp737   IBM Codepage 737 (Greek de facto Standard)
    cp775   IBM Codepage 775 (BaltRim)
    cp850   IBM Codepage 850 (Multilingual Latin 1)
    cp851   IBM Codepage 851 (Greece) - obsolete
    cp852   IBM Codepage 852 (Multilingual Latin 2)
    cp853   IBM Codepage 853 (Multilingual Latin 3)
    cp855   IBM Codepage 855 (Russia) - obsolete
    cp857   IBM Codepage 857 (Multilingual Latin 5)
    cp860   IBM Codepage 860 (Portugal)
    cp861   IBM Codepage 861 (Iceland)
    cp862   IBM Codepage 862 (Israel)
    cp863   IBM Codepage 863 (Canada (French))
    cp864   IBM Codepage 864 (Arabic)
    cp865   IBM Codepage 865 (Norway)
    cp866   IBM Codepage 866 (Russia)
    cp869   IBM Codepage 869 (Greece)
    cp874   IBM Codepage 874 (Thai)
    cp895   IBM Codepage 895 (Czech Kamenicky)
    decmcs   DEC Multinational Character Set (DEC MCS)
    ebc037   EBCDIC Codepage 037
    ebc500   EBCDIC Codepage 500
    ebc875   EBCDIC Codepage 875 (Greek)
    ebc1026   EBCDIC Codepage 1026 (Turkish)
    ebc1047   EBCDIC Codepage 1047
    hp48   HP 48 Character Set
    hproman8   HP Roman-8
    iso10646   ISO 10646 (sorted by name, 16-bit)
    iso6429   ISO 6429 Control Characters (00-1F, 7F)
    iso646   ISO 646 (common character base)
       ISO 646 (French Canadian)
       ISO 646 (Swiss)
       ISO 646 (German)
       ISO 646 (Spanish)
       ISO 646 (Finnish)
       ISO 646 (French)
       ISO 646 (United Kingdom)
    iso646.irv   ISO 646 (International Reference Version)
       ISO 646 (Italian)
       ISO 646 (Dutch)
       ISO 646 (Norwegian/Danish)
       ISO 646 (Portuguese)
       ISO 646 (Swedish)
    iso8859.1   ISO 8859-1 (Latin 1)
    iso8859.2   ISO 8859-2 (Latin 2)
    iso8859.3   ISO 8859-3 (Latin 3)
    iso8859.4   ISO 8859-4 (Latin 4)
    iso8859.5   ISO 8859-5 (Latin/Cyrillic)
    iso8859.6   ISO 8859-6 (Latin/Arabic)
    iso8859.7   ISO 8859-7 (Latin/Greek)
    iso8859.8   ISO 8859-8 (Latin/Hebrew)
    iso8859.9   ISO 8859-9 (Latin 5)
    iso8859.10   ISO 8859-10 (Latin 6)
    iso8859.13   ISO 8859-13 (Latin 7 - Baltic Rim)
    iso8859.14   ISO 8859-14 (Latin 8 - Celtic)
    iso8859.15   ISO 8859-15 (Latin 9)
    koi8-r   Cyrillic encoding as defined in RFC-1489
    nextstep   NeXTSTEP Encoding Vector
       TeX dcr input (contains non-ISO 10646 names)
    tex-dcr.out   TeX dcr output (contains non-ISO 10646 names)
xlt/       contains binary conversion tables (default is little endian)
        all files listed in cedf/ should be here, except for iso6429, iso646, iso10646, iso10646.mes

Should you not have a "little endian" CPU (Intel i386, i486, Pentium and many other brands), please do a "make bintab" to create the very same tables using your native byte order. This will most likely only work on U*IX (like) systems.
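To illustrate the byte order issue, here is a small sketch in C. It assumes that table entries are 16-bit values, which is a guess; the actual xlt file layout is not described here. The function names are hypothetical and are not part of the package.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 on a little-endian host, 0 otherwise. */
static int sketch_is_little_endian(void)
{
    unsigned short probe = 1;
    return *(unsigned char *)&probe == 1;
}

/* Decode a 16-bit value stored low byte first.
   This gives the same result on any host byte order. */
static unsigned int sketch_read_le16(const unsigned char *p)
{
    assert(p != NULL);
    return (unsigned int)p[0] | ((unsigned int)p[1] << 8);
}

/* Swap the two bytes of a 16-bit value, e.g. after reading
   a little-endian entry directly into memory on a big-endian host. */
static unsigned int sketch_swap16(unsigned int v)
{
    return ((v & 0xFFu) << 8) | ((v >> 8) & 0xFFu);
}
```

Reading byte by byte, as in sketch_read_le16 (), avoids the need for separate tables per byte order, at the cost of a little extra work per entry.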