Unicode is intended to address the need for a workable, reliable world text encoding. Unicode could be roughly described as "wide-body ASCII" that has been stretched to 16 bits to encompass the characters of all the world's living languages. In a properly engineered design, 16 bits per character are more than sufficient for this purpose.
The idea of expanding the basis for character encoding from 8 to 16 bits is so sensible, indeed so obvious, that the mind initially recoils from it.
The major catch is simply that the 16-bit approach requires перестройка (perestroika), i.e. restructuring our old ways of thinking. Rather than struggling to salvage obsolete 8-bit encodings via horrendous "extension" contrivances, we need to recognize that the current absence of a standard international/multilingual encoding is a unique opportunity to rethink and revitalize the design concepts behind text encoding.
Unicode 88
by Joseph D. Becker
1988
http://www.unicode.org/history/unicode88.pdf