|
File Encoding

Auto
The file encoding is resolved automatically. First, the
beginning of the file is checked for the zero-width no-break
space character (also known as the byte order mark). If it is
found, the file encoding is determined accordingly. If it is not
found, the content of the file is examined. If the majority of
even bytes are non-zero and the majority of odd bytes are zero,
the file is considered Unicode. If the ratio of non-zero to zero
bytes is reversed, the file is considered Unicode (big endian).
If this method doesn't yield a result, the content is searched
for an occurrence of a byte higher than 127. If one is found, the
sequence of bytes is examined, and if it doesn't violate the
rules of UTF-8 encoding, the file is considered UTF-8.
Otherwise, the encoding defaults to ANSI.
ANSI
The file is processed as if it were ANSI encoded, regardless
of its content.
UTF-8
The file is processed as if it were UTF-8 encoded, regardless
of its content.
Unicode
The file is processed as if it were Unicode encoded,
regardless of its content. The byte order mark is also
ignored.
Unicode (big endian)
The file is processed as if it were Unicode (big endian)
encoded, regardless of its content. The byte order mark is
also ignored.
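
The resolution steps described under Auto can be sketched roughly as follows. This is only an illustration, not the program's actual code. The function name `detect_encoding` is hypothetical, "even" and "odd" bytes are assumed to mean zero-based offsets, "Unicode" is assumed to mean a 16-bit encoding with a two-byte byte order mark, and Python's strict UTF-8 decoder stands in for whatever UTF-8 rule check the program performs.

```python
import codecs


def detect_encoding(data: bytes) -> str:
    """Guess an encoding label following the Auto steps (illustrative only)."""
    # Step 1: look for a byte order mark at the very beginning.
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "Unicode"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "Unicode (big endian)"

    # Step 2: compare zero and non-zero bytes at even and odd offsets
    # (assumption: offsets are counted from zero).
    even, odd = data[0::2], data[1::2]

    def mostly_zero(chunk: bytes) -> bool:
        return chunk.count(b"\x00") * 2 > len(chunk)

    def mostly_nonzero(chunk: bytes) -> bool:
        return (len(chunk) - chunk.count(b"\x00")) * 2 > len(chunk)

    if mostly_nonzero(even) and mostly_zero(odd):
        return "Unicode"
    if mostly_zero(even) and mostly_nonzero(odd):
        return "Unicode (big endian)"

    # Step 3: if any byte is higher than 127, test for valid UTF-8.
    if any(b > 127 for b in data):
        try:
            data.decode("utf-8")
            return "UTF-8"
        except UnicodeDecodeError:
            pass

    # Fallback: everything else is treated as ANSI.
    return "ANSI"


print(detect_encoding("hello".encode("utf-16-le")))
print(detect_encoding("café".encode("utf-8")))
print(detect_encoding(b"plain text"))
```

Note that a file without a byte order mark and without any byte higher than 127 falls through to ANSI in this sketch, which matches the "Otherwise" rule above.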
The file encoding in use also determines the encoding of the
replace-with strings. If the source file doesn't contain the
byte order mark and doesn't contain a character higher than 127,
it can be considered a valid file in both ANSI and UTF-8
encoding. A find-what string that doesn't contain a character
higher than 127 will be found in either encoding. If the
replace-with string contains characters higher than 127 (such as
letters with diacritics), the proper file encoding should be
selected explicitly. In all other cases the option Auto
will yield correct results.
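
A short sketch of why this matters: text limited to characters below 128 has the same bytes in ANSI and UTF-8, while a letter with a diacritic does not. The example assumes that "ANSI" means Windows-1252 (`cp1252`); the actual code page depends on the system. The helper name `encodes_identically` is hypothetical.

```python
# Assumption: "ANSI" is Windows-1252 here; the real code page is system dependent.
ANSI_CODEC = "cp1252"


def encodes_identically(text: str) -> bool:
    """Return True if text yields the same bytes in ANSI and in UTF-8."""
    return text.encode(ANSI_CODEC) == text.encode("utf-8")


find_what = "cafe"     # only characters below 128
replace_with = "café"  # contains a letter with a diacritic

print(encodes_identically(find_what))     # True
print(encodes_identically(replace_with))  # False

# The replace-with string is written differently depending on the encoding:
print(replace_with.encode(ANSI_CODEC))    # b'caf\xe9'
print(replace_with.encode("utf-8"))       # b'caf\xc3\xa9'
```

Writing the ANSI bytes into a file that is really UTF-8 (or the reverse) would leave an invalid or wrong character in the result, which is why the encoding should be chosen by hand in this case.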
See also
Overview, Setting It Up, Files
|