SQL Server - Relationship between collation and encoding
Does SQL Server enforce or control the encoding of varchar columns in any way? The documentation I've browsed does not make a clear distinction between collation (sort and compare rules) and encoding (the byte representation of a given character).
I have a SQL Server instance using Modern_Spanish_CI_AS (including the database, tables and columns), and I got the impression that this meant Windows-1252. The database is populated by an application that uses Windows-1252 as well. Recently, a misconfigured application that uses UTF-8 has been writing data for a while and, to my surprise, SQL Server happily accepts the complete Unicode catalogue; not only that, other clients I've tried appear to read the data correctly no matter which application the table belongs to.
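(For reference, a quick way to check which code page a collation implies, using the standard COLLATIONPROPERTY and DATABASEPROPERTYEX metadata functions; Modern_Spanish_CI_AS should report 1252:)

select databasepropertyex(db_name(), 'Collation') as db_collation,
       collationproperty('Modern_Spanish_CI_AS', 'CodePage') as code_page;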
When I cast to hex:
select foo, cast(foo as varbinary(max)) as hex ...;
... I see different encodings depending on which application the table belongs to:
First app:
€Á 0x80c1
Second app:
€Á 0xac20c100
... yet in both cases the characters display properly.
How can SQL clients know the source encoding?
Edit: If both apps write to the same table, I find this:
€Á 0x80c1
ۈ 0xe282acc381
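(This is a sketch of the inspection query used above, with placeholder table and column names; adding DATALENGTH makes the byte-count difference between the two writers visible:)

select foo,
       cast(foo as varbinary(max)) as hex,
       datalength(foo) as byte_count   -- 2 bytes for the Windows-1252 row, 5 for the UTF-8 row
from dbo.some_table;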
This guess seems to be supported by testing and assorted documentation browsing. Special binary collations aside, SQL Server considers two types of string data:
- legacy (single-byte)
- Unicode (multi-byte)

Legacy data is expected to be encoded in whatever code page the underlying Windows system is configured to use. Unicode is less of a problem since the character repertoire is the same. In either case, it is the driver used by the client that takes care of conversions (if any), and the usual driver configurations include a few options that reflect that fact (e.g. raw, ansi, utf-8). For these reasons, SQL Server does not have a setting or directive to choose the character set, as other DBMSs do, and you only need to choose a collation in the usual meaning of the term (sort and compare rules).
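A minimal sketch of that distinction (variable names are mine; it assumes the database default collation maps to code page 1252, as Modern_Spanish_CI_AS does): the same two characters take one byte each as varchar and two bytes each (UTF-16 LE) as nvarchar, matching the byte patterns shown in the question.

declare @legacy varchar(10) = '€Á';     -- stored in the collation's code page (1252 here)
declare @unicode nvarchar(10) = N'€Á';  -- stored as UTF-16 LE regardless of collation
select cast(@legacy as varbinary(10)) as legacy_bytes,    -- 0x80C1
       cast(@unicode as varbinary(10)) as unicode_bytes;  -- 0xAC20C100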
As for how to distinguish between the two possible encodings, it depends on the column type:
- char, varchar, text ... imply ANSI
- nchar, nvarchar, ntext ... imply Unicode
If you use the incorrect encoding for a given column type you'll get garbage like €Ã.
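As a rough illustration of that failure mode (assuming a code page 1252 collation), you can feed known byte sequences straight into a cast and let the column type decide how to decode them:

-- UTF-8 bytes for '€Á' decoded as single-byte ANSI: each byte becomes its own
-- Windows-1252 character, so the result is mojibake rather than '€Á'.
select cast(0xE282ACC381 as varchar(10)) as utf8_read_as_ansi;
-- The reverse mistake: the two code-page-1252 bytes read as UTF-16 collapse
-- into a single unrelated code point (U+C180).
select cast(0x80C1 as nvarchar(10)) as ansi_read_as_utf16;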