Peter Gulutzan and Alexander Barkov
The MySQL version 4.1 alpha, now available in binary form, should please users who have been wanting multiple, or just non-English, character sets. The big new features are "many character sets per database / per server / per table", "many collations (sort orders) per character set", and "Unicode".
MANY CHARACTER SETS PER DATABASE / PER SERVER / PER TABLE
With version 4.0, you certainly have a choice of character sets ... but once you've made the choice, you have to stick with it. For example, with version 4.0 you can't say "this database has character set X (by default), but Table1.column1 will have character set Y while Table1.column2 will have character set Z." With version 4.1 it's a doddle:
CREATE DATABASE d CHARACTER SET latin1;
...
CREATE TABLE Table1 (
  column1 CHAR(5) CHARACTER SET latin2,
  column2 VARCHAR(777) CHARACTER SET latin5);
In the example above, we've made a database with a "default" character set of latin1 (a character set popular in Western Europe). But then we overrode the default by saying that Table1.column1 would have values in latin2 (a character set popular in Eastern Europe). Meanwhile Table1.column2 will have values in latin5, a Turkish character set.
The system looks imposing because there are so many defaults: you can specify the default character set at the server level, the database level, the table level, the column level, or the connection level. But it's easy to summarize: you can associate a different character set with any database object, or you can arrange it so that most objects have the default character set and a few objects have whatever else you choose.
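The layering of defaults described above can be sketched roughly as follows. This is only an illustration: the names d2 and Table3 are made up for the example, the server-level default is not shown, and the exact syntax accepted by a given 4.1 alpha build may differ.

```sql
-- Database level: objects created in d2 default to latin1.
CREATE DATABASE d2 DEFAULT CHARACTER SET latin1;
USE d2;

-- Table level: the table default (latin2) overrides the database default.
CREATE TABLE Table3 (
  column1 CHAR(5),                       -- no charset given: inherits latin2 from the table
  column2 CHAR(5) CHARACTER SET latin5   -- column level: overrides the table default
) DEFAULT CHARACTER SET latin2;

-- Connection level: tell the server which character set this client sends and expects.
SET NAMES 'latin1';

-- Inspect which character set each column actually ended up with.
SHOW CREATE TABLE Table3;
```

The idea is that each level only needs to say something when it differs from the level above it.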
MANY COLLATIONS (SORT ORDERS) PER CHARACTER SET
Even if two strings are in the same character set, they might have different rules for sorting. This can happen within a language (for example, a phone book might have different sorting rules than a dictionary); however, the main concern is that sorting rules differ among languages. We'll call these sets of rules "collations" because that's the SQL Standard term. For example, Swedish and English have different collations. You probably won't notice this unless you use accented characters, but here's an example:
(Swedish)   (English)
FRY         FRÜZ
FRÜZ        FRY
ZENDA       ÖTZI
ÖTZI        ZENDA
What's especially surprising is that the Swedish (not the English) collation is MySQL's built-in default! But not to worry. Collations, like character sets, can be changed in several places. Here's one way:
CREATE TABLE Table2
  (column1 CHAR(5),
   column2 CHAR(5) COLLATE latin1_general_ci);
In the above example, we've allowed the database and table to have defaults. So unless we start the server with some non-default --with-character-set specification, we'll have a Table2.column1 with a default character set (latin1) and a default collation for that character set (latin1_swedish_ci). Table2.column2, on the other hand, will have a non-default collation: latin1_general_ci.
(By the way, the "ci" at the end of the collation name means the collation is case insensitive: A and a are treated as equal. A "cs" at the end of a collation name means the collation is case sensitive.)
Every character set has at least one collation, and some character sets, like latin1, have several collations.
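To see the effect of a collation choice, one can sort the four words from the Swedish/English comparison above under two different collations. The sketch below assumes a hypothetical table named CollationDemo and assumes that the client connection can send the accented characters correctly; the COLLATE clause in ORDER BY overrides the column's collation for that one query only.

```sql
-- Hypothetical table for illustration; the column uses the default latin1 collation explicitly.
CREATE TABLE CollationDemo (
  word CHAR(5) CHARACTER SET latin1 COLLATE latin1_swedish_ci
);

INSERT INTO CollationDemo VALUES ('FRY'), ('FRÜZ'), ('ZENDA'), ('ÖTZI');

-- Sort using the column's own (Swedish) collation.
SELECT word FROM CollationDemo ORDER BY word;

-- Sort the same data with a different collation, for this query only.
SELECT word FROM CollationDemo ORDER BY word COLLATE latin1_general_ci;

-- List the collations the server offers for latin1.
SHOW COLLATION LIKE 'latin1%';
```

Comparing the two result sets is a quick way to check which collation matches the ordering your users expect before committing to it in a table definition.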
UNICODE
Of especial interest, and as a result of huge demand, MySQL 4.1 supports two new "Unicode" character sets: ucs2 and utf8. Both ucs2 and utf8 have the same repertoire of characters (about 40,000 of them); the difference between them is that only ucs2 is a fixed-width character set (always 16 bits per character), while utf8 is a variable-width character set (between 8 and 24 bits per character).
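The fixed-width versus variable-width distinction can be observed by comparing character counts with byte counts. The following is a small sketch using a hypothetical table named UnicodeDemo; CHAR_LENGTH() counts characters while LENGTH() counts bytes.

```sql
-- Hypothetical table: one column per Unicode character set.
CREATE TABLE UnicodeDemo (
  fixed_width    VARCHAR(10) CHARACTER SET ucs2,
  variable_width VARCHAR(10) CHARACTER SET utf8
);

INSERT INTO UnicodeDemo VALUES ('abc', 'abc');

-- ucs2 stores every character in 2 bytes, so its byte length is twice its character length.
-- utf8 stores plain ASCII letters in 1 byte each, so both lengths match for this value.
SELECT CHAR_LENGTH(fixed_width)    AS ucs2_chars,
       LENGTH(fixed_width)         AS ucs2_bytes,
       CHAR_LENGTH(variable_width) AS utf8_chars,
       LENGTH(variable_width)      AS utf8_bytes
FROM UnicodeDemo;
```

For mostly-ASCII data utf8 is therefore the more compact of the two, while ucs2 gives predictable storage per character.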
The point of having Unicode character sets is that, with such a large repertoire available, MySQL can support strings from pretty well any language, or from all languages together. As well as saving you a lot of fiddling with different character sets, Unicode promises to be a major factor for new developments in XML, in other computer languages, and in international Internet connectivity.
In the early alpha releases for Windows there is a problem with Unicode and other complex character sets, but we're working on it.
HOW THE FEATURES COMPARE TO STANDARD SQL AND TO OTHER DBMS PRODUCTS
The new character set and collation features match the ANSI/ISO SQL Standard specifications (non-core Features F461 and F691). The flexibility of MySQL's defaults, and the ability to specify multiple character sets and collations within a single database or table, put MySQL on a par with Oracle and SQL Server, and ahead of Sybase 12 or DB2 7. The addition of Unicode means that host languages which use Unicode as a base character set (such as Java) will have an easier time interacting with MySQL.
AVAILABILITY
You can download a copy of MySQL 4.1 now, from www.mysql.com. You'll have to wait a short time before the new documentation appears -- there's so much to say that it takes a while to incorporate into the manual. Since enhancements and bug fixes are still going on, you should expect that some character-set names and some syntax details may change without notice.
