david on 21 Nov 2000 19:09:06 -0000



Re: <nettime> Asia and domain names, etc. (@)


At 08:59 +0900 00.11.17, Benjamin Geer wrote:

> On Fri, Nov 17, 2000 at 12:25:30AM +0900, david@2dk.net wrote:
> > Unicode, as I understand it, is a project to develop a global
> > localisation standard -- a way to learn how to write one(=uni)
> > source code that will be expedient to localise for any market.
>
> Well, not really.  It's actually a single character encoding that
> includes all the characters of nearly all known languages that have
> writing systems.

I really have very little faith that this discussion is serving more than
two individuals' egos. We've left the original topic completely and are
simply trying to help David and Benjamin find some point where they both
understand that they've not been proven 'wrong' in public. How boring.
Let's try to take the conversation somewhere it has some relationship
to nettime-like topics.

As the 'David' half of the aforementioned ego issue, let me please try to
offer a solution in the following:

Yes, Unicode is a single encoding system, with a very large database
including many of the characters necessary to represent most of the
languages on earth. We've both addressed that question more than once.

Problem one is that Unicode was made without the participation of those
using the languages involved. (Which probably brings us back to my
knee-jerk reaction to Benjamin's 'I would hope that' tone.)
Unicode includes really blatant errors in 'far eastern' languages, like the
inclusion of one character for both the 'chinese' numeral one and a
hyphen, a distinction which is really, really useful in languages like PRC
and ROC Chinese and Japanese and Korean. It is omissions like this that are
alienating and that lead to its lack of adoption in Asia. And gee, isn't
the introduction of new cultural spheres into the net (their *intact*
introduction) what we're meant to be discussing here anyway?

Anyone can access Unicode's web page and make their own judgements about
the general theory behind a universal linguistic encoding, and about
whether the world is now suddenly going to become multilingual, using
Hebrew and Chinese and French in any given document, or whether the
primary advantage of having a Unicode is the conversion of basic protocols
like multinational software, so that Microsoft can monopolise all
word-processing software for everyone in the entire world. Yes, I know...
I'm not being cool and rational, but I do still believe that it is a valid
position.
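
Since the discussion keeps returning to what Unicode does or does not
distinguish, here is a minimal sketch (Python, my own illustration; the
sample characters are my choice, not from the original exchange) that
prints the code point and the name Unicode assigns to two visually
confusable 'far eastern' characters, so readers can judge such cases for
themselves:

    import unicodedata

    # Two characters that look nearly identical in many CJK fonts:
    # the ideograph for 'one' and the katakana long-vowel mark.
    for ch in ('\u4e00', '\u30fc'):
        print('U+%04X  %s' % (ord(ch), unicodedata.name(ch, 'unnamed')))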

> > This is a technical issue for software manufacturers who wish to
> > become multinationals, and not one for finding universal ways of
> > integrating living languages onto 'the' net.
>
> I think you've misunderstood it; it's the latter.  A Japanese, Thai,
> Russian, or English document that's encoded in Unicode (as opposed to
> one of the many older, language-specific encodings) can be displayed
> without modification in any web browser or word processor that
> understands Unicode and has a Unicode font.  This is why all of
> Microsoft's products, the Java platform, and the XML standard use
> Unicode as their underlying encoding.  It completely removes the need
> to `translate' documents from one encoding into another in order to
> display them in an environment other than the one they were written
> in.

We're both assuming each other's ignorance too much. That's right, they can
be displayed. Displayed, yes, displayed, because Unicode features a large
database. Right. Display is not the problem once you've got the code
written, as in cases of localising multinational software products. The
problem is that the input methods of each nation involve very interesting
linguistic aspects, which deserve a certain mention. And having one global
standard for something always makes for tricky political questions about
who gets to determine it, then who benefits and who doesn't. If all
cultures adapted their on-line linguistic culture to the needs of Unicode,
the world might be an easier place. But the fact is that the four cultures
listed, which include a substantial portion of the world's brainpower, also
have other ideas. They may eventually be swayed to the Unicode camp, but
they haven't been yet, and the reasons why they might not be could be of
some interest to some nettime readers. I'd like to get that discussion
started, please.
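
Since we agree at least on the display half, here is a minimal sketch
(Python, my own illustration) of that half of the argument: once text is
encoded in Unicode, carrying several scripts in one document is mechanical.
Note that nothing in it says anything about how a writer inputs those
characters, which is where the questions above begin:

    # One UTF-8 byte stream carries four scripts at once.
    mixed = 'File / Fichier / ファイル / 파일'
    data = mixed.encode('utf-8')
    assert data.decode('utf-8') == mixed
    print(len(mixed), 'characters in', len(data), 'bytes')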

> > ISO 10646 is an international standard in that somebody recognises
> > that there is an issue here. It isn't a functioning initiative that
> > has been actually globally adopted.
>
> It's been adopted by every major software manufacturer, and my
> impression is that it's pretty well-supported on most operating
> systems.  To the best of my knowledge, if you use a recent version of
> Microsoft Word, you're writing documents in Unicode.

Actually, I use Jedit. I avoid Microsoft products whenever possible,
because Microsoft products make many of my other software products crash.
Their files are often inconvertible. They're huge. They've done a shitty
job integrating double-byte character recognition. They don't modify their
software so that the word count functions meaningfully in double-byte
projects. They're intrusive into other elements of the operating system.
Etc. etc. etc. But that's not the point. The point is that we've got a
considerable portion of humanity (and many yet-to-be 'major software
manufacturers') still to come on line, and that they probably have
fascinating things to contribute in doing so. If Unicode becomes the de
facto standard by popular adoption, that's groovy, but I'm interested in
what else is out there, and there still seems to be a bit out there. At
least among a third of humankind.
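
As an aside, the word-count complaint is easy to demonstrate. A hedged
sketch (Python, my own example): a word count that splits on whitespace,
which is roughly what western word processors do, is meaningless for
Japanese, which puts no spaces between words:

    english = 'the quick brown fox'
    japanese = 'すばしこい茶色の狐'  # roughly the same phrase in Japanese
    print(len(english.split()))   # 4: a plausible word count
    print(len(japanese.split()))  # 1: the whole sentence is one 'word'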

> > I, with my Japanese system have immense problems sending the exact
> > same 'chinese' characters (though I also have a PRC chinese
> > character OS which I can reboot into) to my friends in Korea or
> > Taiwan. This is not a Unicode problem, nor anything that it will
> > solve in the forseeable future. Unicode means that all of us in
> > these various countries may be attempting to send these files in
> > various localised versions of MSWord which all function well in our
> > markets.
>
> Not at all.  In fact, that's exactly the problem Unicode is meant to
> solve.  Localisation and encoding are basically separate issues.
> Localisation means, for example, that the a menu is marked `File' in
> the English version of MS Word, and `Fichier' in the French version.
> Encoding, on the other hand, is the way characters are represented as
> bytes in the document produced.  The idea of Unicode is to enable
> every language to use the same encoding; therefore, you should be able
> to use any Unicode-compliant version of MS Word, regardless of its
> localisation, to read a document containing, say, a mixture of
> Japanese, Hungarian, Korean, and Arabic.

In single-byte character systems, 'File' can be converted into 'Fichier.'
Groovy. Now try converting that into 'ファイル' or '封筒' or '引き出し' or
whatever you please. Then try inputting that in Japanese, Korean, ROC and
PRC Chinese. You'll find that there are cultural relations between the
spoken, printed and digitised word. There are linguistic conventions. Input
cultures. And just having linguistic databases does not yet solve the
linguistic needs of each language.

But maybe (most probably) we're asking the wrong question. Let me put it
this way: does anyone on this list really believe that the Americans will
be able to convince the Chinese (whichever Chinese) to adapt their on-line
conventions to a foreign-developed linguistic database? Would the Americans
adopt a language set developed in Beijing?
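
To make the localisation/encoding distinction concrete, here is a minimal
sketch (Python; the menu catalogue below is my own hypothetical example).
Localisation swaps the strings per market, and a single Unicode encoding
carries every one of them; what the sketch does not, and cannot, model is
the input culture that produced each string:

    # A hypothetical menu-label catalogue: localisation as string lookup.
    menu_file = {
        'en': 'File',
        'fr': 'Fichier',
        'ja': 'ファイル',
        'ko': '파일',
    }
    for locale, label in menu_file.items():
        # One encoding, UTF-8, serialises all four localisations.
        print(locale, label, label.encode('utf-8'))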

> > (You should see what a nettime post sent from someone with a French
> > character set looks like when received on a double-byte OS. It's a
> > mess!!)
>
> The problem there is that French (along with some other European
> languages) is traditionally encoded in ISO-8859-1, and Japanese
> traditionally uses JIS or EUC.

Yes, those are two of them (though there is more than one JIS). Does
Unicode elegantly solve these things? Is the fact that it has not been
uni-versally adopted just a matter of everybody in these language spheres
being poor sports?
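
For anyone who has never seen the 'mess' in question, a hedged sketch
(Python, my own reconstruction of the failure mode): French text written
out as ISO-8859-1 bytes and then read back as EUC-JP, which is roughly what
happens when a French nettime post lands on a Japanese double-byte system:

    french = 'café déjà reçu'
    # How the sender's mailer writes it:
    raw = french.encode('iso-8859-1')
    # How a mailer expecting EUC-JP reads those same bytes:
    print(raw.decode('euc-jp', errors='replace'))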

> Most people on Nettime use ISO-8859-1
> (probably without realising it).  But if we all use Unicode-compliant
> mail readers, and we all write our French and Japanese emails in
> Unicode, everyone's emails will appear correctly on everyone else's
> computers.

That certainly is one solution. The world could also use one financial
instrument, one currency... or one language, for that matter.

The issue that I initially proposed to Diwakar Agnihotri had to do with the
'one China' issue being highlighted by the introduction of written-script
issues (both input and display) on-line.

The question posed had to do with differing input methods for the
Roman-character-based keyboard, and with various encryption methods. There
was a great discussion here in Japan, for example, when the Clinton
administration passed a 'communications act' in February of '96 or so. Some
of the encryption methods considered illegal were in use for the simple
transmission of language on-line. This discussion is what I was alluding to
in saying that someday each domain might have its own encryption, like
secure cyber wallets that exchange information as transactions. Once you're
dealing with complex character sets which require encryption, etc., the
linguistic questions change. It highlights the fact that languages (codes)
are themselves needed to input and display languages. I'm sure there must
be something that we can talk about related to this.

#  distributed via <nettime>: no commercial use without permission
#  <nettime> is a moderated mailing list for net criticism,
#  collaborative text filtering and cultural politics of the nets
#  more info: majordomo@bbs.thing.net and "info nettime-l" in the msg body
#  archive: http://www.nettime.org contact: nettime@bbs.thing.net