Russian (original and up-to-date) version here. Mirrors: Newmail, ROL.

Languages, fonts and encodings as Unicode incarnation


Last update: 03-feb-2001. Konstantin Kazarnovsky

 

3. Fonts and Unicode applications



Office-97

The main feature of Office-97, first of all Winword-97, in both its files and its design, is an almost complete transition to Unicode. In Word-97 documents the text is stored in Unicode, and while the program runs (input and text processing) it most probably works exclusively in Unicode, although this is, of course, an assumption. However, just as in previous versions of Word, normal work with a language requires that Word itself support that language, regardless of whether the font is a Unicode font.
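A minimal Python sketch of why storing text in Unicode removes the dependence on a code page. It does not reproduce Word-97's file format; it only assumes UTF-16LE (16-bit little-endian Unicode) as an example of a Unicode representation. The same three bytes read as Russian under CP1251 and as Western European letters under CP1252, while the UTF-16LE form records each character's code point directly.

```python
# The same three 8-bit bytes, interpreted through two Windows code pages.
raw = bytes([0xCF, 0xF0, 0xE8])

as_cyrillic = raw.decode("cp1251")  # Windows Cyrillic
as_western = raw.decode("cp1252")   # Windows Western European
print(as_cyrillic, as_western)      # При Ïðè

# In a Unicode form each character carries its own code point,
# so reading it back needs no knowledge of the original code page.
utf16 = as_cyrillic.encode("utf-16-le")
print(utf16.hex())                  # 1f0440043804
assert utf16.decode("utf-16-le") == as_cyrillic
```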

Internet Explorer 4

  Internet Explorer 4, including most of its components (the MSIE browser, the Outlook Express mail and news client and, to a lesser degree, the FrontPage Express HTML editor), uses Unicode as its main encoding internally and in input windows. As described in a number of Microsoft Knowledge Base articles, MSIE implements the concept of a "multilingual object": a class and functions for working with it are defined. This allows MSIE to support practically any encoding used on the Internet, including the Unicode-based ones (UTF-7 and UTF-8). To render characters of a language (character set) that the default font does not support, MSIE selects a font in which that language is present. This font substitution works for those Unicode characters that can be associated with a particular character set (a problem similar to recognizing which character sets a font supports). No substitution is made if the font does contain the character set but lacks some of its characters; those characters are then shown as small squares.
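The substitution rule described above can be modelled in a short Python sketch. It is not MSIE's code: the fonts are hypothetical, and a "character set" is approximated by a Unicode block. A font is chosen per character set, so a character that the chosen font lacks becomes a square even when another installed font contains it.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Assumption: a "character set" is approximated by a Unicode block.
CHARSETS = {
    "western": range(0x0000, 0x0250),
    "greek": range(0x0370, 0x0400),
    "cyrillic": range(0x0400, 0x0500),
}
SQUARE = "\u25a1"  # stands in for the "small square" of a missing glyph


@dataclass
class Font:
    name: str
    charsets: Set[str]  # character sets the font declares
    glyphs: Set[str]    # characters the font actually contains


def charset_of(ch: str) -> Optional[str]:
    """Map a character to the character set (block) it belongs to."""
    for name, block in CHARSETS.items():
        if ord(ch) in block:
            return name
    return None


def render(text: str, default: Font, installed: List[Font]) -> List[Tuple[str, str]]:
    """Return (font name, displayed character) for each character of text."""
    out = []
    for ch in text:
        font = default
        cs = charset_of(ch)
        if cs is not None and cs not in default.charsets:
            # Substitute the first installed font that declares this charset.
            font = next((f for f in installed if cs in f.charsets), default)
        # The choice is made per character set only: if the chosen font
        # lacks the glyph, a square is shown even if another font has it.
        out.append((font.name, ch if ch in font.glyphs else SQUARE))
    return out


# Hypothetical fonts: "Default" declares Cyrillic but lacks "ё".
default = Font("Default", {"western", "cyrillic"}, set("abcабв"))
full_cyr = Font("FullCyr", {"cyrillic"}, set("абвгдеёж"))
greek = Font("GreekFont", {"greek"}, set("λμ"))
result = render("aбёλ", default, [default, full_cyr, greek])
print(result)
# [('Default', 'a'), ('Default', 'б'), ('Default', '□'), ('GreekFont', 'λ')]
```

Greek triggers a substitution because the default font does not declare it, while "ё" is shown as a square: the default font claims Cyrillic, so "FullCyr" is never consulted.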

Internet Explorer 5 beta 2

Multilingual support in MSIE5 has been extended even further: besides the pan-European language pack, the standard installation options now include Hebrew, Arabic, Thai, Vietnamese, Japanese, Korean and Chinese. KOI8-U has become the standard label for the Ukrainian encoding. Keyboard support for Hebrew, Arabic and Thai is not installed, but at least for Hebrew, if you register the language in the system and install a keyboard layout yourself, Outlook Express will support correct bidirectional input. Hebrew fragments in Unicode pages (UTF-8 etc.) are now displayed "logically", i.e. with the change of direction, as in products from Accent, instead of "visually" as in the previous version.
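Why a separate KOI8-U label matters can be checked with Python's standard `koi8_r` and `koi8_u` codecs. KOI8-U adds the Ukrainian letters (Є, І, Ї, Ґ and their lowercase forms) in positions that hold pseudo-graphics in KOI8-R, while the Russian letters stay where they are. The helper name `koi8r_can_encode` is just for this illustration.

```python
def koi8r_can_encode(text: str) -> bool:
    """True if every character of text exists in KOI8-R."""
    try:
        text.encode("koi8_r")
        return True
    except UnicodeEncodeError:
        return False


word = "Київ"                         # Ukrainian, contains "ї"
print(koi8r_can_encode(word))         # False: KOI8-R has no "ї"
koi8u_bytes = word.encode("koi8_u")   # KOI8-U does encode it
print(koi8u_bytes.decode("koi8_u"))   # Київ

# Russian letters occupy the same positions in both tables.
print("Москва".encode("koi8_r") == "Москва".encode("koi8_u"))  # True
```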
As promised, managing encodings through the Registry (the MIME section) has been drastically cut down in this version and moved into tables inside mlang.dll (and not into its resources at all, which mainly hold the names of encodings in different languages, such as "Cyrillic (Windows)").


Netscape Communicator 4

  In version 4, Netscape became Unicode-aware and multilingual, with support for UTF-7 and UTF-8; however, this support is much less complete than in MSIE4. Unicode fonts are required (or at least can be used), but for single-byte encodings Netscape does not interpret the encoding itself: it converts the text into the main Windows encoding for that language. In particular, all Russian encodings are recoded into CP1251, losing the pseudo-graphics of KOI8-R. Only predefined "ampersand" entities are understood, except in the Unicode encodings UTF-7 and UTF-8: there, the character for any numeric "ampersand" Unicode reference is taken from the specific font by its Unicode value.
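A small Python model of the two behaviours described above; it is not Netscape's code, and the `?` replacement is Python's `errors="replace"` choice, not necessarily what Netscape displayed. Recoding KOI8-R into CP1251 has to drop the box-drawing characters, because CP1251 has none, whereas a numeric "ampersand" reference names a Unicode code point directly.

```python
import html

# Text with KOI8-R pseudo-graphics (box-drawing) around a Russian word.
koi8_bytes = "┌Меню┐".encode("koi8_r")

# Recode KOI8-R -> CP1251: the box-drawing characters have no CP1251
# equivalent, so they are lost ("?" marks the loss).
recoded = koi8_bytes.decode("koi8_r").encode("cp1251", errors="replace")
print(recoded.decode("cp1251"))  # ?Меню?

# A numeric "ampersand" reference, decimal or hex, names a code point.
print(html.unescape("&#1071;&#x42F;"))  # ЯЯ
```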


Netscape 6
For the first time in a long stretch of "newest browser" history, one can speak of substantial innovations, some of them absent in MSIE: they appeared in the recently published Netscape 6 Preview Releases 1, 2 and 3 and, finally, in the Netscape 6 release. (Note in particular that the interface aims at full customizability: many interface elements are built with XML and JavaScript.)
Language support is now based entirely on Unicode and includes Hebrew, Arabic, Thai and Vietnamese (what Microsoft calls "complex scripts", although their "complexity" is not supported), as well as Armenian (ARMSCII-8 encoding). There are nine Cyrillic encodings (KOI8-R, Windows-1251, ISO-8859-5, ISO-IR-111, IBM-866, IBM-855, MacCyrillic, KOI8-U, MacUkrainian). Encoding auto-detection works separately for several groups of languages (MSIE detects across the whole spectrum of languages at once). In addition, the encoding menu can be customized to keep only the encodings you need, for example only the Cyrillic ones.
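To show what auto-detection within one language group involves, here is a deliberately naive Python sketch. It is a hypothetical letter-frequency heuristic, not Netscape's detector, and it tries only four of the Cyrillic encodings: each candidate decoding is scored by how many frequent lowercase Russian letters it produces. MacCyrillic is left out because nearly all its lowercase letters sit at the same byte positions as in Windows-1251, so this simple count could not tell the two apart.

```python
# Hypothetical frequency heuristic; NOT Netscape's actual detector.
CANDIDATES = ["koi8_r", "cp1251", "iso8859_5", "cp866"]
COMMON = set("оеаинтср")  # frequent lowercase Russian letters


def score(text: str) -> int:
    """Count characters that are frequent lowercase Russian letters."""
    return sum(ch in COMMON for ch in text)


def guess_cyrillic(data: bytes) -> str:
    """Return the candidate encoding whose decoding scores highest."""
    best, best_score = CANDIDATES[0], -1
    for enc in CANDIDATES:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue  # some byte is undefined in this code page
        s = score(text)
        if s > best_score:  # ties keep the earlier candidate
            best, best_score = enc, s
    return best


sample = "три сестры"
print(guess_cyrillic(sample.encode("koi8_r")))  # koi8_r
print(guess_cyrillic(sample.encode("cp1251")))  # cp1251
```

The heuristic works here because KOI8-R and Windows-1251 put lowercase letters in opposite halves of the upper byte range, so a wrong guess mostly yields uppercase or non-letter characters.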
The encoding definitions are contained in the ucv*.dll libraries in the subdirectory Netscape 6\components\, and there is probably no "legal" way to change them. However, the set of encodings shown in the menu can be configured in the file Netscape 6\res\charsetData.properties: to remove an encoding from the View - Character Coding menu, add a line like this to the file:
windows-1255.notForBrowser = true
(By default, all or nearly all Hebrew and Arabic encodings are hidden.)
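A Python sketch that lists which encodings a charsetData.properties-style text hides from the menu. It assumes only the simple `key = value` form shown above, with `#` or `!` comment lines as in Java-style .properties files; the real file may contain other keys, and the sample lines below are illustrative, not copied from Netscape's file.

```python
def hidden_encodings(properties_text: str) -> set:
    """Collect charsets marked '<charset>.notForBrowser = true'."""
    hidden = set()
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # blank line or comment
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key = value line
        charset, dot, prop = key.strip().rpartition(".")
        if dot and prop == "notForBrowser" and value.strip().lower() == "true":
            hidden.add(charset)
    return hidden


# Illustrative input (hypothetical lines in the format shown above).
sample = """
# hide two Hebrew encodings from View - Character Coding
windows-1255.notForBrowser = true
iso-8859-8.notForBrowser = true
koi8-r.notForBrowser = false
"""
print(sorted(hidden_encodings(sample)))  # ['iso-8859-8', 'windows-1255']
```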
The search for characters missing from the current font apparently works not through the Unicode ranges and languages of a TrueType font, as in MSIE 4-5, but at the level of individual Unicode values. That is, if a proper set of fonts is installed (under Windows, a set of TrueType fonts that are Unicode-compatible in both form and meaning), a character used on a page will be displayed, and displayed correctly, as long as it is present in at least one installed font. See the samples of multilingual pages.
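For contrast with the per-character-set approach of MSIE 4-5, here is a Python sketch of fallback at the level of individual Unicode values. The fonts are hypothetical, and the preference order is an assumption: every character is looked up separately, so a letter missing from the first font is taken from the next font that has it.

```python
from typing import List, Optional, Set, Tuple

SQUARE = "\u25a1"  # shown only if no installed font has the character


def pick_font(ch: str, fonts: List[Tuple[str, Set[str]]]) -> Optional[str]:
    """Return the first font (in preference order) containing ch, or None."""
    for name, glyphs in fonts:
        if ch in glyphs:
            return name
    return None


def render(text: str, fonts: List[Tuple[str, Set[str]]]) -> List[Tuple[Optional[str], str]]:
    """Choose a font separately for every single character."""
    out = []
    for ch in text:
        name = pick_font(ch, fonts)
        out.append((name, ch if name is not None else SQUARE))
    return out


# Hypothetical fonts: "Default" covers some Cyrillic but lacks "ё".
fonts = [("Default", set("abcабв")), ("FullCyr", set("абвгдеёж"))]
result = render("aбёλ", fonts)
print(result)
# [('Default', 'a'), ('Default', 'б'), ('FullCyr', 'ё'), (None, '□')]
```

A square now appears only for "λ", which no installed font contains.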
Adobe Photoshop 5 and Illustrator 7

   The latest versions of Adobe products have become Unicode-aware, apparently with support for Chinese/Japanese/Korean ideographs. Of the European code pages, however, only Western CP1252 is supported: when the keyboard is switched, for example, to Russian, the layout changes but the language does not, and Latin characters with diacritics appear on screen instead of Russian letters.
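The symptom can be reproduced in Python under one assumption about the mechanism: the Russian keystrokes arrive as CP1251 byte values, and the program interprets every byte through CP1252. This models what appears on screen, not Adobe's internals.

```python
# Model of the symptom: Russian keystrokes arrive as CP1251 byte values,
# but the program interprets every byte as CP1252.
typed = "Русский"
shown = typed.encode("cp1251").decode("cp1252")
print(shown)  # Ðóññêèé
```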
For a single language (for example, Russian) the problem can be worked around by replacing the NLS file for code page 1252 (CP_1252.NLS in Windows 9x, C_1252.NLS in NT) with the NLS file for code page 1251. This can be done either by copying directly (CP_1251.NLS over CP_1252.NLS, in DOS mode) or by editing the Registry and rebooting the computer:
REGEDIT4

[HKEY_LOCAL_MACHINE\System\CurrentControlSet\control\Nls\Codepage]
"1252" = "cp_1251.nls"

(For NT, in the same way:
"1252" = "c_1251.nls").

As one might expect, after this all Latin diacritics (French, Spanish, German and other languages) also become unusable in other Unicode-aware programs that work through NLS: in the Western encoding in MSIE4, and when typing in Word 97 and WordPad 98. At the same time, data that is already in Unicode (UTF-7/UTF-8 in MSIE4, previously created files in Word 97) is still displayed correctly.
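A hypothetical Python model of this side effect. After the Registry change, the table loaded for code page 1252 is really the CP1251 table, so 8-bit Western text is decoded with Cyrillic letters, while UTF-8 data is decoded by its own codec and never touches that table. The name `PATCHED_TABLES` is invented for the illustration.

```python
# Hypothetical model of the patched Registry entry: the table the system
# loads for code page 1252 is really the CP1251 table.
PATCHED_TABLES = {"windows-1252": "cp1251"}


def display(data: bytes, charset: str) -> str:
    """Decode as the patched system would: windows-1252 via the swapped
    table, every other charset with its own codec."""
    return data.decode(PATCHED_TABLES.get(charset, charset))


western = "café élève"
print(display(western.encode("cp1252"), "windows-1252"))  # cafй йlиve
print(display(western.encode("utf-8"), "utf-8"))           # café élève
```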
Forte Agent

  The commercial mail and news reader Forte Agent is not a Unicode-based program; however, in recent versions (1.6 and later) encoding handling is performed via Unicode. The program now also supports UTF-7/UTF-8. Russian-oriented settings for all versions of the program, as well as a (Russian-language) article about Forte Agent 1.5 (which, with small modifications, also applies to the latest versions), can be found on the page with lovely software.
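"Encoding handling via Unicode" usually means pivot conversion: decode the source encoding into Unicode once, then encode into whatever the target is. A minimal Python sketch of the idea (not Forte Agent's code; the helper name `recode` is hypothetical) shows KOI8-R text converted to the 7-bit, mail-safe UTF-7, to UTF-8, and back to a single-byte code page.

```python
def recode(data: bytes, source: str, target: str) -> bytes:
    """Convert between encodings by pivoting through Unicode (a Python str)."""
    return data.decode(source).encode(target)


koi8 = "Привет".encode("koi8_r")
as_utf7 = recode(koi8, "koi8_r", "utf-7")     # 7-bit, safe for old mail paths
as_utf8 = recode(koi8, "koi8_r", "utf-8")
as_1251 = recode(as_utf8, "utf-8", "cp1251")  # back to a single-byte code page
print(as_utf7, as_utf8, as_1251)
```

With a Unicode pivot, supporting N encodings needs N decoders and N encoders rather than a separate table for every pair of encodings.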
