Re: [AC] [WWW] Fix validation, add Ville as Board member.

Wednesday, 29 January 2003

        On Wed, 2003-01-29 at 11:48, Stephen J. Turnbull wrote:
...
 >>>>> "Terje" == Terje Bless <link(a)pobox.com> writes:

     Terje> If the signature you are referring to is the UNICODE
     Terje> Byte-Order Mark then XML 1.0 Second Edition contained an

 Thank you for clarifying the standard.

 But, uhm, ZERO-WIDTH NO-BREAK SPACE, if you please.  ;-)  There is no
 BOM in UTF-8; not even Microsoft could get away with advocating
 little-endian UTF-8.

No, but I think the term "BOM" is a synonym/alias for "Encoding
Signature". "ZERO-WIDTH NO-BREAK SPACE" is the actual character when
used for its normal purpose, but this character has a dual function;
when it appears as the very first thing in an entity it takes on the
role of an Encoding Signature which for hysterical raisins is called a
"Byte-Order Mark".

IOW, AFAICT, when discussing the usage as an encoding signature, it is
appropriate to refer to it as either the "Byte-Order Mark" or the
"Encoding Signature" and not "ZERO-WIDTH NO-BREAK SPACE", despite
"Byte-Order Mark" being something of a misnomer, since both "BOM" and
"Encoding Signature" refer to the _role_ and not the character itself.

It's there in UTF-8 to allow an heuristic parser to identify this as a
flavour of UNICODE, and to distinguish UTF-8 from the other
transformation formats (such as UTF-16).
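
To make the dual role concrete, here is a small Python sketch (Python
is my choice for illustration, not something from the thread). It shows
that the single character U+FEFF serialises to a different byte
sequence in each transformation format, which is what lets a parser
tell them apart. The "utf-8-sig" codec is Python's name for "UTF-8 with
an encoding signature"; the variable names are made up for this example.

```python
# U+FEFF is ZERO-WIDTH NO-BREAK SPACE as a character; at the very start
# of an entity the same code point acts as the encoding signature.
ZWNBSP = "\ufeff"

# The same character yields different bytes per transformation format.
signatures = {name: ZWNBSP.encode(name)
              for name in ("utf-8", "utf-16-be", "utf-16-le")}

for name, raw in signatures.items():
    print(name, raw.hex(" "))

# Decoding plain "utf-8" keeps the character; "utf-8-sig" treats a
# leading U+FEFF as a signature and drops it.
data = b"\xef\xbb\xbf<?xml"
as_character = data.decode("utf-8")
as_signature = data.decode("utf-8-sig")
```

In UTF-8 the three bytes carry no byte-order information at all; they
only work as a signature.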

e.g. Appendix F of the XML 1.0 Recommendation specifies an algorithm for
automatically determining the encoding of the entity by sniffing for the
encoding signature and falling back on various bit patterns matching
"<?xml" in the various candidate encodings if a signature is not
present.
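
A rough sketch of that kind of sniffing, in Python, might look like the
following. It is a simplification under my own assumptions, not the
Recommendation's text: the function name sniff_encoding and the
returned labels are hypothetical, the unusual UCS-4 byte orders are left
out, and a real parser would still go on to read the encoding
declaration to pin down the exact encoding.

```python
import codecs

# Encoding signatures. The 4-byte forms come first so that
# FF FE 00 00 (UTF-32 little-endian) is not mistaken for FF FE (UTF-16).
SIGNATURES = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
]

# Fallback: byte patterns for the start of "<?xml" in candidate encodings.
DECLARATION_PATTERNS = [
    (b"\x00\x00\x00\x3c", "UTF-32BE"),
    (b"\x3c\x00\x00\x00", "UTF-32LE"),
    (b"\x00\x3c\x00\x3f", "UTF-16BE"),
    (b"\x3c\x00\x3f\x00", "UTF-16LE"),
    (b"\x3c\x3f\x78\x6d", "ASCII-compatible"),  # UTF-8, ISO 8859-x, ...
    (b"\x4c\x6f\xa7\x94", "EBCDIC"),
]


def sniff_encoding(data: bytes):
    """Return (encoding family, signature_found) for the start of an entity."""
    for signature, name in SIGNATURES:
        if data.startswith(signature):
            return name, True
    for pattern, name in DECLARATION_PATTERNS:
        if data.startswith(pattern):
            return name, False
    # No signature and no recognisable declaration: assume UTF-8.
    return "UTF-8", False
```

For example, sniff_encoding(b"\xef\xbb\xbf<?xml") reports UTF-8 with a
signature, while the same document without the three leading bytes is
only known to be in some ASCII-compatible encoding until the
declaration is read.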

Then again, this all tends to give me a headache so I'm probably just
hopelessly confused. :-)
