Comparison of UTF-8 and 16-bit encoding, and a Java encoding-conversion program



Posted by Deogtae Kim (김덕태) on July 07, 1997 at 17:28:38:

In Reply to: Supplement... posted by Deogtae Kim (김덕태) on July 05, 1997 at 12:34:40:


bangjy@geocities.com wrote:
>
> The point is that Hangul is represented in 3 bytes. Of course, since it is
> an encoding designed for English-speaking users, it has the advantage that
> their language can be represented in 1 byte... but I wonder, which side
> benefits? I mean the choice between writing both Hangul and English in
> 2 bytes, and writing Hangul in 3 bytes and English in 1 byte.



First, I do not see UTF-8 as an encoding designed only for English-speaking
users.
The reason is that EUC-KR, the encoding in general use in Korea, already
includes ASCII, and your own post already uses ASCII characters: whitespace
(' ', '\t'), digits ('1', '2', '3'), punctuation ('.', '?'), and invisible
control characters ('\r', '\n').


How many text documents actually use nothing but the Hangul defined in
KS C 5601?


One criterion for choosing a general-purpose encoding for text documents and
character data exchange is the space needed to represent characters. Because
that space should be examined rigorously, I analyzed it in several ways below.



====== Mathematical comparison of 16-bit Unicode, EUC-KR, and UTF-8 =====


The 16-bit Unicode encoding represents every character in 2 bytes, so ASCII
characters and Hangul each take 2 bytes as well. UTF-8, by contrast, uses
1 byte per ASCII character and 3 bytes per Hangul syllable. If the size of
character data written in 16-bit Unicode is taken as 1, the size of the same
data in UTF-8 is 0.5 in the best case (ASCII characters only) and 1.5 in the
worst case (pure Hangul only). In other words, a 1-kilobyte Unicode document
converted to UTF-8 occupies at least 0.5 kilobytes and at most 1.5 kilobytes.
In ordinary Korean documents, the ratio of ASCII characters to Hangul
characters is roughly 1:2 to 1:4.
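The per-character sizes can be checked with the standard `String.getBytes(Charset)` method. The hypothetical class `CharSizes` below is a minimal sketch. It assumes a current JDK that includes the `EUC-KR` charset (standard JDK builds normally ship it). It uses `UTF-16BE` to stand in for 16-bit Unicode, so that no byte-order mark is counted.

```java
import java.nio.charset.Charset;

// Hypothetical helper class: prints how many bytes one ASCII letter and one
// Hangul syllable occupy in each of the encodings compared in this post.
public class CharSizes {
    // Number of bytes the string s occupies when encoded with the named charset.
    public static int byteCount(String s, String encoding) {
        return s.getBytes(Charset.forName(encoding)).length;
    }

    public static void main(String[] args) {
        // The ASCII letter A (U+0041) and the Hangul syllable HAN (U+D55C)
        String[] samples = { "A", "\uD55C" };
        // UTF-16BE stands in for 16-bit Unicode without a byte-order mark
        String[] encodings = { "UTF-16BE", "EUC-KR", "UTF-8" };
        for (String s : samples) {
            StringBuilder line = new StringBuilder(
                    String.format("U+%04X:", (int) s.charAt(0)));
            for (String enc : encodings) {
                line.append(' ').append(enc).append('=').append(byteCount(s, enc));
            }
            System.out.println(line);
        }
    }
}
```

Running `java CharSizes` should print `U+0041: UTF-16BE=2 EUC-KR=1 UTF-8=1` and `U+D55C: UTF-16BE=2 EUC-KR=2 UTF-8=3`. These are the 2/1/1 and 2/2/3 byte costs the comparison relies on.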


I will take this ratio to be 1:3.


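The average number of bytes per character in 16-bit Unicode is, of course, 2.
Let x be the average number of UTF-8 bytes per character in a Korean document.
With a 1:3 ratio, 1/4 of the characters are ASCII (1 byte each) and 3/4 are
Hangul (3 bytes each), so x = 1 * 1/4 + 3 * 3/4 = 2.5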


That is, Unicode : UTF-8 = 2 : 2.5 = 1 : 1.25.


The sizes needed to represent a Korean document therefore compare as
follows (16-bit Unicode = 100 %):


Unicode 16-bit   UTF-8 minimum   UTF-8 average   UTF-8 maximum
--------------   -------------   -------------   -------------
     100 %            50 %           125 %           150 %
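The weighted average above can be written as a small function of the ASCII share and the byte costs. The hypothetical `AverageBytes` class below is a minimal sketch of that arithmetic under the post's assumption of a 1:3 ASCII-to-Hangul ratio. It reproduces the 2.5 bytes per character and the 50 % / 125 % / 150 % row of the table.

```java
// Hypothetical sketch of the estimate above: a Korean text with one ASCII
// character for every three Hangul syllables (ASCII share 1/4, Hangul share 3/4).
public class AverageBytes {
    // Expected bytes per character when a fraction asciiShare of the characters
    // is ASCII and the remainder is Hangul.
    public static double average(double asciiShare, int asciiBytes, int hangulBytes) {
        return asciiShare * asciiBytes + (1 - asciiShare) * hangulBytes;
    }

    public static void main(String[] args) {
        double unicode = average(0.25, 2, 2);  // 16-bit Unicode: 2 bytes for everything
        double utf8 = average(0.25, 1, 3);     // UTF-8: ASCII 1 byte, Hangul 3 bytes
        System.out.printf("Unicode %.2f, UTF-8 %.2f bytes/char%n", unicode, utf8);
        // UTF-8 size as a percentage of Unicode size: ASCII-only, 1:3 mix, Hangul-only
        System.out.printf("minimum %.0f%%, average %.0f%%, maximum %.0f%%%n",
                100 * average(1.0, 1, 3) / 2,
                100 * utf8 / unicode,
                100 * average(0.0, 1, 3) / 2);
    }
}
```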



We also need to compare what happens when documents already encoded in EUC-KR
are converted to 16-bit Unicode and to UTF-8.
To keep this comparison simple, I first compared the byte counts in Unicode and
in EUC-KR.


Let x be the average number of bytes per character of a Korean document when
it is encoded in EUC-KR, where ASCII characters take 1 byte and Hangul takes
2 bytes. Then x = 1 * 1/4 + 2 * 3/4 = 1.75


That is, the byte ratio of 16-bit Unicode to EUC-KR is
2 : 1.75 = 1 : 0.875.


Unicode 16-bit   EUC-KR minimum   EUC-KR average   EUC-KR maximum
--------------   --------------   --------------   --------------
     100 %             50 %            87.5 %           100 %



Average EUC-KR : average UTF-8 = 87.5 : 125 ~= 1 : 1.43 (both relative to 16-bit Unicode = 100)


EUC-KR           UTF-8 minimum   UTF-8 average   UTF-8 maximum
--------------   -------------   -------------   -------------
     100 %           100 %           143 %           150 %
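The 1:3 ratio is an assumption taken from the 1:2 to 1:4 range given earlier. The hypothetical `RatioSweep` class below is a minimal sketch that repeats the same weighted-average arithmetic for each ratio in that range. It assumes the same per-character sizes as above:

- ASCII takes 1 byte in EUC-KR and UTF-8.
- Hangul takes 2 bytes in EUC-KR and 3 bytes in UTF-8.
- Every character takes 2 bytes in 16-bit Unicode.

```java
// Hypothetical sketch: recomputes the EUC-KR comparison for ASCII:Hangul
// ratios of 1:2, 1:3 and 1:4 instead of the single assumed 1:3.
public class RatioSweep {
    // Average bytes per character for `ascii` ASCII characters per `hangul` Hangul syllables.
    public static double average(int ascii, int hangul, int asciiBytes, int hangulBytes) {
        return (double) (ascii * asciiBytes + hangul * hangulBytes) / (ascii + hangul);
    }

    public static void main(String[] args) {
        for (int hangul = 2; hangul <= 4; hangul++) {
            double unicode = average(1, hangul, 2, 2);
            double euc = average(1, hangul, 1, 2);
            double utf8 = average(1, hangul, 1, 3);
            System.out.printf("1:%d  EUC-KR/Unicode = %.1f%%  UTF-8/EUC-KR = %.1f%%%n",
                    hangul, 100 * euc / unicode, 100 * utf8 / euc);
        }
    }
}
```

The arithmetic gives these EUC-KR/Unicode and UTF-8/EUC-KR figures:

- 1:2 gives 83.3 % and 140.0 %.
- 1:3 gives 87.5 % and 142.9 % (the 143 % above, rounded).
- 1:4 gives 90.0 % and 144.4 %.

So across the whole range, UTF-8 stays roughly 40 % to 44 % larger than EUC-KR.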



====== Experimental comparison of 16-bit Unicode, EUC-KR, and UTF-8 ======


To obtain experimental figures, I saved the text you posted as test data on
Windows, as shown below. I then compared the file sizes using the character
set/encoding conversion program that follows it.


------- sample.kr --------------------------------------------
한글이 3바이트로 표현된다는 점입니다. 물론 영어권 사용자들을
위해 만든 코딩 방식이니 그쪽 언어는 1바이트로 표현이 가능하다
는 장점이 있지만...글쎄, 어느 쪽이 이익이 될까요? 한글이나
영어나 모두 2바이트로 쓰는 것과 한글은 3바이트, 영어는 1바이트
로 쓰는 것 중에서 말입니다.
---------------------------------------------------------------



Sun has not yet released character set/encoding conversion classes; they are
to be provided in a later release. As a result, I could not make the
following Java conversion program work for every character set/encoding
conversion. Nor could I make it report more useful information and
appropriate warning messages when a conversion fails.
Nevertheless, it works without problems for most common encoding conversions.
Also, JDK 1.1.2 reportedly has some problems with encoding handling, so please
install and use JDK 1.1.3.


------- EncodingConverter.java -----------------------------------
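// Usage: java EncodingConverter <input-encoding> <output-encoding> < infile > outfile
//   e.g. java EncodingConverter KSC5601 UTF8 < sample.kr > sample.utf8
import java.io.*;


public class EncodingConverter
{
    public static void main(String args[])
        throws IOException
    {
        if (args.length != 2)
        {   System.err.println("usage: java EncodingConverter"
                + " <input-encoding> <output-encoding>");
            System.exit(1);
        }
        // Decode input bytes (encoding args[0]) into Unicode characters;
        // buffering avoids one stream access per character.
        Reader in = new BufferedReader(
            new InputStreamReader(System.in, args[0]));
        // Re-encode those characters to the output in encoding args[1].
        Writer out = new BufferedWriter(
            new OutputStreamWriter(System.out, args[1]));
        for (int ch; (ch = in.read()) != -1;)
        {   out.write(ch);
            // U+FFFD (REPLACEMENT CHARACTER) is substituted for input the
            // decoder cannot map, so its appearance signals a conversion problem.
            if (ch == 0xfffd)
                System.err.println("warning: some input character"
                    + " cannot be converted to Unicode character");
        }
        out.close();   // also flushes the buffered output
    }
}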
------------------------------------------------------------------
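The limitation described before the listing was later addressed. Since Java 1.4, the public `java.nio.charset` API provides `CharsetDecoder` and `CharsetEncoder`. These can be told to report malformed or unmappable input (`CodingErrorAction.REPORT`) instead of silently substituting a replacement character.

The hypothetical `StrictConverter` below is a minimal sketch of such a stricter variant, assuming a modern JDK (`System.in.readAllBytes` needs Java 9 or later). It is not the author's program.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical strict variant of EncodingConverter: instead of passing U+FFFD
// through with a warning, it fails with a CharacterCodingException.
public class StrictConverter {
    public static byte[] convert(byte[] input, String from, String to)
            throws CharacterCodingException {
        // Decode, rejecting malformed or unmappable input instead of replacing it.
        CharBuffer chars = Charset.forName(from).newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(input));
        // Encode, rejecting characters the target encoding cannot represent.
        ByteBuffer bytes = Charset.forName(to).newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(chars);
        byte[] out = new byte[bytes.remaining()];
        bytes.get(out);
        return out;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java StrictConverter <from-encoding> <to-encoding>");
            System.exit(1);
        }
        byte[] input = System.in.readAllBytes();
        try {
            System.out.write(convert(input, args[0], args[1]));
            System.out.flush();
        } catch (CharacterCodingException e) {
            // MalformedInputException / UnmappableCharacterException report the input length
            System.err.println("conversion failed: " + e);
            System.exit(2);
        }
    }
}
```

The exception subclasses tell malformed source bytes apart from characters that the target encoding cannot hold. That distinction is the kind of extra diagnostic the original program could not give.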



--------- Experimental results ----------------------------------------------
C:\> javac EncodingConverter.java
C:\> java EncodingConverter KSC5601 Unicode < sample.kr > sample.uni
C:\> java EncodingConverter KSC5601 UTF8 < sample.kr > sample.utf8
C:\> dir sample.*
sample kr 278 97-07-07 13:05 sample.kr
sample uni 336 97-07-07 13:07 sample.uni
SAMPL~C5 UTF 389 97-07-07 13:09 sample.utf8


For this test data, the UTF-8 document (sample.utf8) is 389 bytes.
Relative to 16-bit Unicode (sample.uni, 336 bytes), that is
389/336 * 100 ~= 116 %, an increase of only 16%. Relative to EUC-KR
(sample.kr, 278 bytes), it is 389/278 * 100 ~= 140 %, an increase of 40%.
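As a cross-check, the three sizes can be rebuilt from character counts. A hand count of sample.kr gives 111 Hangul syllables and 56 one-byte characters. The one-byte characters are spaces, digits, punctuation, and a CR/LF pair ending each of the five lines. This count is an assumption added here and is not part of the original experiment.

The hypothetical `SampleSizes` class applies the per-character sizes from the mathematical comparison. It assumes the 2 extra bytes in sample.uni are the byte-order mark that the JDK's `Unicode` encoding writes at the start of its output.

```java
// Hypothetical cross-check: rebuilds the sizes of sample.kr, sample.uni and
// sample.utf8 from hand-counted character totals (an assumption, not a measurement).
public class SampleSizes {
    static final int HANGUL = 111; // Hangul syllables in sample.kr
    static final int ASCII = 56;   // spaces, digits, punctuation and CR/LF pairs

    // EUC-KR: 1 byte per ASCII character, 2 bytes per Hangul syllable.
    public static int eucKr() { return ASCII * 1 + HANGUL * 2; }

    // 16-bit Unicode: 2 bytes per character, plus an assumed 2-byte byte-order mark.
    public static int unicode() { return 2 + (ASCII + HANGUL) * 2; }

    // UTF-8: 1 byte per ASCII character, 3 bytes per Hangul syllable.
    public static int utf8() { return ASCII * 1 + HANGUL * 3; }

    public static void main(String[] args) {
        System.out.println("EUC-KR " + eucKr() + ", Unicode " + unicode()
                + ", UTF-8 " + utf8());
        System.out.printf("UTF-8/Unicode %.1f%%, UTF-8/EUC-KR %.1f%%%n",
                100.0 * utf8() / unicode(), 100.0 * utf8() / eucKr());
    }
}
```

The formulas give exactly 278, 336 and 389 bytes, matching the dir listing. The sample's ASCII:Hangul ratio is therefore 56:111, roughly 1:2, at the low end of the range assumed earlier. For a 1:2 mix, the mathematical model predicts UTF-8 at 7/6 (about 117 %) of 16-bit Unicode and 7/5 (140 %) of EUC-KR. Both are close to the measured 116 % and 140 %.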



======== Conclusion ====================================================


Korean text encoded in UTF-8 is
at most 50% and on average about 15% to 25% larger than in 16-bit Unicode,
and at most 50% and on average about 40% larger than in EUC-KR.
(A more precise average would, of course, require experiments on a wide
variety of Korean documents.)


As far as the criteria for choosing a general-purpose encoding standard are
concerned, I judge that this much size increase is not a serious problem,
for the following reasons:


1. Hard disks and memory keep getting larger and cheaper.
2. Network speeds keep improving.
3. Most disk space and network traffic is taken up not by text but by image
files (still pictures, video and so on) and audio files. Compared with these,
the share taken by text is shrinking further and further.
(Note: I once saw a study reporting that pornographic image files make up
most Internet traffic, but I cannot recall the source.)


Of course, there may be cases where even this much size increase matters.
In such cases, the better choice is to store, search, and transmit the data
using a more efficient compression algorithm. I do not consider the increase
important enough to significantly affect the choice of a general-purpose
encoding.
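The compression remedy can be tried with the standard `java.util.zip.GZIPOutputStream`. The hypothetical `CompressedSizes` class below is a sketch, not a benchmark. It reads a text from standard input (assumed to be UTF-8; `readAllBytes` needs Java 9 or later) and encodes it in EUC-KR and in UTF-8. It then prints the raw and gzip-compressed size of each. No particular result is implied: the outcome depends on the text, and for very short texts gzip's fixed header and trailer dominate.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: compares raw and gzip-compressed sizes of one text
// encoded in EUC-KR and in UTF-8.
public class CompressedSizes {
    // Number of bytes gzip (DEFLATE plus header and trailer) produces for data.
    public static int gzipSize(byte[] data) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(data);
        } // closing the stream writes the remaining compressed data and the trailer
        return buf.size();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the input text arrives on stdin encoded as UTF-8.
        String text = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        for (String enc : new String[] { "EUC-KR", "UTF-8" }) {
            // Characters EUC-KR cannot represent are replaced, not reported.
            byte[] raw = text.getBytes(Charset.forName(enc));
            System.out.println(enc + ": " + raw.length + " bytes raw, "
                    + gzipSize(raw) + " bytes gzipped");
        }
    }
}
```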


One more point to consider is the standardization trend in the United States
and Europe. We do not merely use systems and software imported from these
countries. We also want the systems and software we build in Korea to
interoperate with theirs without problems, so that we can use them to create
added value more efficiently and export the results in turn. For that, we
cannot ignore those countries' standardization trends.


I have not been able to survey the standardization trends in those countries,
but consider the following:
1. Unlike East Asian cultures, they use mostly ASCII characters, so choosing
UTF-8 as the general-purpose encoding standard is the natural course for them.
2. I do not know how influential RFC 2130 is, but that document recommends
UTF-8 as the default encoding.

Given these points, I expect the move to UTF-8 to become a worldwide trend.



--
Deogtae Kim (김덕태)
CA Lab. CS Dept. KAIST
dtkim@camars.kaist.ac.kr


