In computing, a code page is a character encoding and as such it is a specific association of a set of printable characters and control characters with unique numbers. Typically each number represents the binary value in a single byte. (In some contexts these terms are used more precisely; see Character encoding § Terminology.)
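To make the definition concrete, the sketch below decodes one byte value, 0xE9, under three code pages using Python's built-in codecs. The codec names `cp437`, `cp1252` and `cp1251` are Python's own; the helper function is illustrative. The same number resolves to a different character in each code page.

```python
import unicodedata


def decode_everywhere(byte_value, code_pages=("cp437", "cp1252", "cp1251")):
    """Return the character one byte value denotes in each code page."""
    raw = bytes([byte_value])
    return {name: raw.decode(name) for name in code_pages}


for name, char in decode_everywhere(0xE9).items():
    # Print the code point and Unicode name to keep the output ASCII-only.
    print(f"{name}: U+{ord(char):04X} {unicodedata.name(char)}")
# cp437: U+0398 GREEK CAPITAL LETTER THETA
# cp1252: U+00E9 LATIN SMALL LETTER E WITH ACUTE
# cp1251: U+0439 CYRILLIC SMALL LETTER SHORT I
```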

The term "code page" originated from IBM's EBCDIC-based mainframe systems,[1] but Microsoft, SAP,[2] and Oracle Corporation[3] are among the vendors that use this term. Most vendors identify their own character sets by a name. Where a vendor has a very large number of character sets (as IBM does), identifying each one by a number is a convenient way to distinguish them. Originally, the code page numbers referred to page numbers in the IBM standard character set manual,[4][5][6] but this has not been the case for a long time. Vendors that use a code page system allocate their own code page number to a character encoding, even if it is better known by another name; for example, UTF-8 has been assigned code page numbers 1208 at IBM, 65001 at Microsoft, and 4110 at SAP.
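Because each vendor allocates its own numbers, a code page number is only meaningful together with the vendor that assigned it. The sketch below keys a small, hypothetical lookup table by (vendor, number). The UTF-8 entries are the ones given above, and the two 932 entries reflect the IBM/Microsoft conflict recorded in the DOS code page list. The table and the `resolve` helper are illustrative only, not any vendor's API.

```python
# Hypothetical registry: (vendor, vendor-specific number) -> encoding description.
CODE_PAGE_REGISTRY = {
    ("IBM", 1208): "UTF-8",
    ("Microsoft", 65001): "UTF-8",
    ("SAP", 4110): "UTF-8",
    # The same number can denote different encodings at different vendors.
    ("IBM", 932): "IBM-PC Japan MIX (DOS/V)",
    ("Microsoft", 932): "Windows Japanese (IBM 943)",
}


def resolve(vendor: str, number: int) -> str:
    """Return the encoding a given vendor means by a code page number."""
    try:
        return CODE_PAGE_REGISTRY[(vendor, number)]
    except KeyError:
        raise LookupError(f"no entry for {vendor} code page {number}") from None


print(resolve("IBM", 1208) == resolve("Microsoft", 65001))  # True
print(resolve("IBM", 932) == resolve("Microsoft", 932))     # False
```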

Hewlett-Packard uses a similar concept in its HP-UX operating system and in its Printer Command Language[7] (PCL) protocol for printers (whether made by HP or not). The terminology, however, differs: what others call a character set, HP calls a symbol set, and what IBM or Microsoft call a code page, HP calls a symbol set code. HP developed a series of symbol sets,[8][9] each with an associated symbol set code, to encode both its own character sets and those of other vendors.
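A symbol set code such as `8U` (HP Roman-8) or `0N` (ISO 8859-1) consists of a number followed by a letter. The sketch below parses such a code and builds two derived forms, under two assumptions about PCL 5 that should be checked against HP's PCL reference before use: that the primary symbol set is selected with the escape sequence `ESC ( <code>`, and that the code's numeric value is number × 32 plus the letter's position in the alphabet. All function names are hypothetical.

```python
import re

ESC = "\x1b"


def parse_symbol_set(code):
    """Split a code like '8U' or '10U' into (8, 'U') or (10, 'U')."""
    match = re.fullmatch(r"(\d+)([A-Z])", code)
    if match is None:
        raise ValueError(f"not a symbol set code: {code!r}")
    return int(match.group(1)), match.group(2)


def symbol_set_value(code):
    """Numeric form, assuming value = number * 32 + (A=1, B=2, ..., Z=26)."""
    number, letter = parse_symbol_set(code)
    return number * 32 + (ord(letter) - ord("A") + 1)


def select_primary_symbol_set(code):
    """Escape sequence assumed to select `code` as the primary symbol set."""
    number, letter = parse_symbol_set(code)
    return f"{ESC}({number}{letter}"


print(symbol_set_value("8U"))                 # 277
print(repr(select_primary_symbol_set("0N")))  # '\x1b(0N'
```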

The multitude of character sets leads many vendors to recommend Unicode.

The code page numbering system

IBM introduced the concept of systematically assigning a small but globally unique 16-bit number to each character encoding that a computer system or collection of computer systems might encounter. The IBM origin of the numbering scheme is reflected in the fact that the smallest numbers are assigned to variants of IBM's EBCDIC encoding, and slightly larger numbers to variants of the extended ASCII encoding IBM used in its PC hardware.

With the release of PC DOS version 3.3 (and the near identical MS-DOS 3.3) IBM introduced the code page numbering system to regular PC users, as the code page numbers (and the phrase "code page") were used in new commands to allow the character encoding used by all parts of the OS to be set in a systematic way.[10]

 
[Table: IBM code page numbers (CPGIDs and CCSIDs) used for CJK encodings; where Microsoft's numbers differ, they are noted in brackets.]

After IBM and Microsoft ceased to cooperate in the 1990s, the two companies maintained their lists of assigned code page numbers independently of each other, resulting in some conflicting assignments. At least one third-party vendor (Oracle) also has its own, different list of numeric assignments.[3] IBM's current assignments are listed in its CCSID repository, while Microsoft's assignments are documented in MSDN.[11] Additionally, a list of the names and approximate IANA (Internet Assigned Numbers Authority) abbreviations for the code pages installed on a given Windows machine can be found in that machine's Registry (this information is used by Microsoft programs such as Internet Explorer).

Most well-known code pages, excluding those for the CJK languages and Vietnamese, fit all their code points into eight bits and involve nothing more than mapping each code point to a single character; techniques such as combining characters and complex script shaping are not involved.
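The difference between single-byte code pages and the multibyte CJK encodings shows up directly when comparing byte and character counts. The sketch uses Python's `cp437` and `cp932` codecs; the `counts` helper is illustrative.

```python
def counts(text, code_page):
    """Return (bytes used, characters) for `text` encoded in `code_page`."""
    return len(text.encode(code_page)), len(text)


# Single-byte code page: one byte per character.
print(counts("Café", "cp437"))  # (4, 4)
# Multibyte (DBCS) code page: each kanji here takes two bytes.
print(counts("日本", "cp932"))   # (4, 2)

# In cp437 every one of the 256 byte values decodes to exactly one character.
assert len(bytes(range(256)).decode("cp437")) == 256
```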

The text mode of standard (VGA-compatible) PC graphics hardware is built around an 8-bit code page, though two can be used at once at the cost of some color depth, and up to eight may be stored in the display adapter for quick switching.[12] A selection of third-party code page fonts could be loaded into such hardware. Today, however, operating system vendors commonly provide their own character encoding and rendering systems that run in a graphics mode and bypass this hardware limitation entirely. The practice of referring to character encodings by a code page number nevertheless remains in use, as an efficient alternative to string identifiers such as those specified by the IETF and IANA for protocols such as e-mail and web pages.
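A VGA text-mode cell pairs a character byte with an attribute byte. The sketch below is a hypothetical model, not a driver. It assumes the common attribute layout (bits 0-3 foreground color, bits 4-6 background, bit 7 blink). It also assumes that when two 256-character fonts are loaded, software dedicates bit 3 to font selection, which halves the available foreground colors. Actual behavior also depends on how the adapter's registers and palette are programmed.

```python
def decode_text_cell(char_byte, attribute, dual_font=False):
    """Split one text-mode cell into (font, glyph, foreground, background).

    Assumed layout: attribute bits 0-3 = foreground, 4-6 = background,
    bit 7 = blink (ignored here). With two fonts loaded, bit 3 selects
    the font instead of brightening the foreground.
    """
    background = (attribute >> 4) & 0x07
    if dual_font:
        font = (attribute >> 3) & 0x01
        foreground = attribute & 0x07  # only 8 foreground colors remain
    else:
        font = 0
        foreground = attribute & 0x0F  # 16 foreground colors
    return font, char_byte, foreground, background


print(decode_text_cell(0x41, 0x1F))                  # (0, 65, 15, 1)
print(decode_text_cell(0x41, 0x1F, dual_font=True))  # (1, 65, 7, 1)
```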

Relationship to ASCII

The majority of code pages in current use are supersets of ASCII, a 7-bit code representing 128 control codes and printable characters. In the distant past, 8-bit implementations of the ASCII code set the top bit to zero or used it as a parity bit in network data transmissions. When the top bit was made available for representing character data, a total of 256 characters and control codes could be represented. Most vendors (including IBM) used this extended range to encode characters used by various languages and graphical elements that allowed the imitation of primitive graphics on text-only output devices. No formal standard existed for these "extended ASCII character sets" and vendors referred to the variants as code pages, as IBM had always done for variants of EBCDIC encodings.
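The two historical uses of the eighth bit can be sketched as follows: setting it as a parity bit computed over the 7-bit ASCII code, and masking it off again on receipt. Even parity is assumed here for illustration; real links could use odd parity or none at all, and the function names are hypothetical.

```python
def add_even_parity(code):
    """Set bit 7 so that the byte has an even number of 1 bits."""
    if not 0 <= code < 0x80:
        raise ValueError("ASCII codes fit in 7 bits")
    ones = bin(code).count("1")
    return code | 0x80 if ones % 2 else code


def strip_parity(byte):
    """Recover the 7-bit ASCII code by clearing the top bit."""
    return byte & 0x7F


print(hex(add_even_parity(ord("A"))))  # 0x41 (two 1 bits, already even)
print(hex(add_even_parity(ord("C"))))  # 0xc3 (three 1 bits, so bit 7 is set)
```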

Relationship to Unicode

Unicode is an effort to include all characters from all currently and historically used human languages in a single character enumeration (effectively one large code page), removing the need to distinguish between different code pages when handling digitally stored text. Unicode tries to retain backward compatibility with many legacy code pages, copying some of them 1:1 during the design process. An explicit design goal of Unicode was to allow round-trip conversion between all common legacy code pages, although this goal has not always been achieved. Some vendors, namely IBM and Microsoft, have anachronistically assigned code page numbers to Unicode encodings. This convention allows code page numbers to be used as metadata identifying the correct decoding algorithm for stored binary data.
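Round-trip conversion through Unicode can be checked mechanically. The sketch below uses Python's codecs, and the helper names are illustrative. It confirms that all 256 bytes of code page 437 survive a bytes-to-Unicode-to-bytes trip. It also shows that converting between two legacy code pages via Unicode can still lose characters when the target has no slot for them.

```python
def round_trips(data, code_page):
    """True if bytes -> Unicode -> bytes reproduces the original data."""
    return data.decode(code_page).encode(code_page) == data


def transcode(data, source, target, errors="strict"):
    """Convert between two legacy code pages by way of Unicode."""
    return data.decode(source).encode(target, errors)


all_bytes = bytes(range(256))
print(round_trips(all_bytes, "cp437"))                   # True
# Box-drawing characters from cp437 have no slot in cp1252:
print(transcode(b"\xc4", "cp437", "cp1252", "replace"))  # b'?'
```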

IBM code pages

EBCDIC-based code pages

These code pages are used by IBM in its EBCDIC character sets for mainframe computers.[13]

  • 1 – USA WP, Original
  • 2 – USA
  • 3 – USA Accounting, Version A
  • 4 – USA
  • 5 – USA
  • 6 – Latin America
  • 7 – Germany F.R. / Austria
  • 8 – Germany F.R.
  • 9 – France, Belgium
  • 10 – Canada (English)
  • 11 – Canada (French)
  • 12 – Italy
  • 13 – Netherlands
  • 14 – Spain
  • 15 – Switzerland (French)
  • 16 – Switzerland (French / German)
  • 17 – Switzerland (German)
  • 18 – Sweden / Finland
  • 19 – Sweden / Finland WP, version 2
  • 20 – Denmark/Norway
  • 21 – Brazil
  • 22 – Portugal
  • 23 – United Kingdom
  • 24 – United Kingdom
  • 25 – Japan (Latin)
  • 26 – Japan (Latin)
  • 27 – Greece (Latin)
  • 29 – Iceland
  • 30 – Turkey
  • 31 – South Africa
  • 32 – Czechoslovakia (Czech / Slovak)
  • 33 – Czechoslovakia
  • 34 – Czechoslovakia
  • 35 – Romania
  • 36 – Romania
  • 37 – USA/Canada - CECP (same with euro: 1140)
  • 37-2 – The real 3279 APL codepage, as used by C/370. This is very close to 1047, except for caret and not-sign inverted. It is not officially recognized by IBM, even though SHARE has pointed out its existence.[14]
  • 38 – USA ASCII
  • 39 – United Kingdom / Israel
  • 40 – United Kingdom
  • 251 – China
  • 252 – Poland
  • 254 – Hungary
  • 256 – International #1 (superseded by 500)
  • 257 – International #2
  • 258 – International #3
  • 259 – Symbols, Set 7
  • 260 – Canadian French - 116
  • 264 – Print Train & Text processing extended
  • 273 – Germany F.R./Austria - CECP (same with euro: 1141)
  • 274 – Old Belgium Code Page
  • 275 – Brazil - CECP
  • 276 – Canada (French) - 94
  • 277 – Denmark, Norway - CECP (same with euro: 1142)
  • 278 – Finland, Sweden - CECP (same with euro: 1143)
  • 279 – French - 94[14]
  • 280 – Italy - CECP (same with euro: 1144)
  • 281 – Japan (Latin) - CECP
  • 282 – Portugal - CECP
  • 283 – Spain - 190[14]
  • 284 – Spain/Latin America - CECP (same with euro: 1145)
  • 285 – United Kingdom - CECP (same with euro: 1146)
  • 286 – Austria / Germany F.R. Alternate
  • 287 – Denmark / Norway Alternate
  • 288 – Finland / Sweden Alternate
  • 289 – Spain Alternate
  • 290 – Japanese (Katakana) Extended
  • 293 – APL
  • 297 – France (same with euro: 1147)[14]
  • 298 – Japan (Katakana)
  • 300 – Japan (Kanji) DBCS (For JIS X 0213)
  • 310 – Graphic Escape APL/TN
  • 320 – Hungary
  • 321 – Yugoslavia
  • 322 – Turkey
  • 330 – International #4
  • 340 – EBCDIC, OCR (same as 893, superseded by 892 and 893)
  • 351 – GDDM default
  • 352 – Printing and publishing option
  • 353 – BCDIC-A
  • 354 – BCDIC-B
  • 355 – PTTC/BCD standard option
  • 357 – PTTC/BCD H option
  • 358 – PTTC/BCD Correspondence option
  • 359 – PTTC/BCD Monocase option
  • 360 – PTTC/BCD Duocase option
  • 361 – EBCDIC Publishing International
  • 363 – Symbols, set 8
  • 382 – EBCDIC Publishing Austria, Germany F.R. Alternate
  • 383 – EBCDIC Publishing Belgium
  • 384 – EBCDIC Publishing Brazil
  • 385 – EBCDIC Publishing Canada (French)
  • 386 – EBCDIC Publishing Denmark, Norway
  • 387 – EBCDIC Publishing Finland, Sweden
  • 388 – EBCDIC Publishing France
  • 389 – EBCDIC Publishing Italy
  • 390 – EBCDIC Publishing Japan (Latin)
  • 391 – EBCDIC Publishing Portugal
  • 392 – EBCDIC Publishing Spain, Philippines
  • 393 – EBCDIC Publishing Latin America (Spanish Speaking)
  • 394 – EBCDIC Publishing China (Hong Kong), UK, Ireland
  • 395 – EBCDIC Publishing Australia, New Zealand, USA, Canada (English)
  • 396 – BookMaster Specials
  • 410 – Cyrillic (revisions: 880, 1025, 1154)
  • 420 – Arabic
  • 421 – Maghreb/French
  • 423 – Greek (superseded by 875)
  • 424 – Hebrew (Bulletin Code)
  • 425 – Arabic / Latin for OS/390 Open Edition
  • 435 – Teletext Isomorphic
  • 500 – International #5 (ECECP; supersedes 256) (same with euro: 1148)
  • 803 – Hebrew Character Set A (Old Code)
  • 829 – Host Math Symbols - Publishing
  • 830 – Math Format
  • 831 – Portugal (Alternate) (same as 37)
  • 833 – Korean Extended (SBCS)
  • 834 – Korean Hangul (KSC5601; DBCS with UDCs)
  • 835 – Traditional Chinese DBCS
  • 836 – Simplified Chinese Extended
  • 837 – Simplified Chinese DBCS
  • 838 – Thai with Low Marks & Accented Characters (same with euro: 1160)
  • 839 – Thai DBCS
  • 870 – Latin 2 (same with euro: 1153) (revision: 1110)
  • 871 – Iceland (same with euro: 1149)[14]
  • 875 – Greek (supersedes 423)
  • 880 – Cyrillic (revision of 410) (revisions: 1025, 1154)
  • 881 – United States - 5080 Graphics System
  • 882 – United Kingdom - 5080 Graphics System
  • 883 – Sweden - 5080 Graphics System
  • 884 – Germany - 5080 Graphics System
  • 885 – France - 5080 Graphics System
  • 886 – Italy - 5080 Graphics System
  • 887 – Japan - 5080 Graphics System
  • 888 – France AZERTY - 5080 Graphics System
  • 889 – Thailand
  • 890 – Yugoslavia
  • 892 – EBCDIC, OCR A
  • 893 – EBCDIC, OCR B
  • 905 – Latin 3
  • 918 – Urdu Bilingual
  • 924 – Latin 9
  • 930 – Japan MIX (290 + 300) (same with euro: 1390)
  • 931 – Japan MIX (37 + 300)
  • 933 – Korea MIX (833 + 834) (same with euro: 1364)
  • 935 – Simplified Chinese MIX (836 + 837) (same with euro: 1388)
  • 937 – Traditional Chinese MIX (37 + 835) (same with euro: 1371)
  • 939 – Japan MIX (1027 + 300) (same with euro: 1399)
  • 1001 – MICR
  • 1002 – EBCDIC DCF Release 2 Compatibility
  • 1003 – EBCDIC DCF, US Text subset
  • 1005 – EBCDIC Isomorphic Text Communication
  • 1007 – EBCDIC Arabic (XCOM2)
  • 1024 – EBCDIC T.61
  • 1025 – Cyrillic, Multilingual (same with euro: 1154) (Revision of 880)
  • 1026 – EBCDIC Turkey (Latin 5) (same with euro: 1155) (supersedes 905 in that country)
  • 1027 – Japanese (Latin) Extended (JIS X 0201 Extended)
  • 1028 – EBCDIC Publishing Hebrew
  • 1030 – Japanese (Katakana) Extended
  • 1031 – Japanese (Latin) Extended
  • 1032 – MICR, E13-B Combined
  • 1033 – MICR, CMC-7 Combined
  • 1037 – Korea - 5080/6090 Graphics System
  • 1039 – GML Compatibility
  • 1047 – Latin 1/Open Systems[14]
  • 1068 – DCF Compatibility
  • 1069 – Latin 4
  • 1070 – USA / Canada Version 0 (Code page 37 Version 0)
  • 1071 – Germany F.R. / Austria (Code page 273 Version 0)
  • 1072 – Belgium (Code page 274 Version 0)
  • 1073 – Brazil (Code page 275 Version 0)
  • 1074 – Denmark, Norway (Code page 277 Version 0)
  • 1075 – Finland, Sweden (Code page 278 Version 0)
  • 1076 – Italy (Code page 280 Version 0)
  • 1077 – Japan (Latin) (Code page 281 Version 0)
  • 1078 – Portugal (Code page 282 Version 0)
  • 1079 – Spain / Latin America Version 0 (Code page 284 Version 0)
  • 1080 – United Kingdom (Code page 285 Version 0)
  • 1081 – France Version 0 (Code page 297 Version 0)
  • 1082 – Israel (Hebrew)
  • 1083 – Israel (Hebrew)
  • 1084 – International#5 Version 0 (Code page 500 Version 0)
  • 1085 – Iceland (Code page 871 Version 0)
  • 1087 – Symbol Set
  • 1091 – Modified Symbols, Set 7
  • 1093 – IBM Logo[15]
  • 1097 – Farsi Bilingual
  • 1110 – Latin 2 (Revision of 870)
  • 1112 – Baltic Multilingual (same with euro: 1156)
  • 1113 – Latin 6
  • 1122 – Estonia (same with euro: 1157)
  • 1123 – Cyrillic, Ukraine (same with euro: 1158)
  • 1130 – Vietnamese (same with euro: 1164)
  • 1132 – Lao EBCDIC
  • 1136 – Hitachi Katakana
  • 1137 – Devanagari EBCDIC
  • 1140 – USA, Canada, etc. ECECP (same without euro: 37) (Traditional Chinese version: 1159)
  • 1141 – Austria, Germany ECECP (same without euro: 273)
  • 1142 – Denmark, Norway ECECP (same without euro: 277)
  • 1143 – Finland, Sweden ECECP (same without euro: 278)
  • 1144 – Italy ECECP (same without euro: 280)
  • 1145 – Spain, Latin America (Spanish) ECECP (same without euro: 284)
  • 1146 – UK ECECP (same without euro: 285)
  • 1147 – France ECECP with euro (same without euro: 297)
  • 1148 – International ECECP with euro (same without euro: 500)
  • 1149 – Icelandic ECECP with euro (same without euro: 871)
  • 1150 – Korean Extended with box characters
  • 1151 – Simplified Chinese Extended with box characters
  • 1152 – Traditional Chinese Extended with box characters
  • 1153 – Latin 2 Multilingual with euro (same without euro: 870)
  • 1154 – Cyrillic, Multilingual with euro (same without euro: 1025; an older version is 1166)
  • 1155 – Turkey with euro (same without euro: 1026) (same with lira: 1175)
  • 1156 – Baltic Multi with euro (same without euro: 1112)
  • 1157 – Estonia with euro (same without euro: 1122)
  • 1158 – Cyrillic, Ukraine with euro (same without euro: 1123)
  • 1159 – T-Chinese EBCDIC (Traditional Chinese euro update of 1140)
  • 1160 – Thai with Low Marks & Accented Characters with euro (same without euro: 838)
  • 1164 – Vietnamese with euro (same without euro: 1130)
  • 1165 – Latin 2/Open Systems
  • 1166 – Cyrillic Kazakh
  • 1175 – Turkey with euro and lira (same without lira: 1155)
  • 1278 – EBCDIC Adobe (PostScript) Standard Encoding
  • 1279 – Hitachi Japanese Katakana Host[6]
  • 1300 – Generic Bar Code/OCR-B
  • 1301 – Zip + 4 POSTNET Bar Code
  • 1302 – Facing Identification Marks
  • 1303 – EBCDIC Bar Code
  • 1364 – Korea MIX (833 + 834 + euro) (same without euro: 933)
  • 1371 – Traditional Chinese MIX (1159 + 835) (same without euro: 937)
  • 1376 – Traditional Chinese DBCS Host extension for HKSCS
  • 1377 – Mixed Host HKSCS Growing (37 + 1376)
  • 1378 – Traditional Chinese DBCS Host extension for HKSCS and Simplified Chinese (superset of 1376)
  • 1379 – Mixed Host HKSCS and Simplified Chinese Growing (37 + 1378) (superset of 1377)
  • 1388 – Simplified Chinese MIX (same without euro: 935) (836 + 837 + euro)
  • 1390 – Japan MIX (290 + 300 + euro) (same without euro: 930)
  • 1399 – Japan MIX (1027 + 300 + euro) (same without euro: 939)
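Python ships codecs for several of the EBCDIC code pages above, for example `cp037`, `cp500` and `cp1140`. The sketch below lists the byte values on which two single-byte code pages disagree. Comparing 37 with its euro counterpart 1140 shows the currency sign at 0x9F replaced by the euro sign. The helper is illustrative, and the printed characters are shown as ASCII escapes.

```python
def differing_bytes(code_page_a, code_page_b):
    """List (byte, char_a, char_b) wherever two single-byte code pages differ."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        char_a, char_b = raw.decode(code_page_a), raw.decode(code_page_b)
        if char_a != char_b:
            diffs.append((value, char_a, char_b))
    return diffs


for value, old, new in differing_bytes("cp037", "cp1140"):
    print(f"0x{value:02X}: {old!a} -> {new!a}")

# EBCDIC letters sit in different places from ASCII: 'A' is 0xC1 in code page 37.
print("A".encode("cp037"))  # b'\xc1'
```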

DOS code pages

These code pages are used by IBM in its PC DOS operating system. These code pages were originally embedded directly in the text mode hardware of the graphic adapters used with the IBM PC and its clones, including the original MDA and CGA adapters whose character sets could only be changed by physically replacing a ROM chip that contained the font. The interface of those adapters (emulated by all later adapters such as VGA) was typically limited to single byte character sets with only 256 characters in each font/encoding (although VGA added partial support for slightly larger character sets).

  • 301 – IBM-PC Japan (Kanji) DBCS
  • 437 – Original IBM PC hardware code page
  • 720 – Arabic (Transparent ASMO)
  • 737 – Greek
  • 775 – Latin-7
  • 808 – Russian with euro (same without euro: 866)
  • 848 – Ukrainian with euro (same without euro: 1125)
  • 849 – Belarusian with euro (same without euro: 1131)
  • 850 – Latin-1
  • 851 – Greek
  • 852 – Latin-2
  • 853 – Latin-3
  • 855 – Cyrillic (same with euro: 872)
  • 856 – Hebrew
  • 857 – Latin-5
  • 858 – Latin-1 with euro symbol
  • 859 – Latin-9
  • 860 – Portuguese
  • 861 – Icelandic
  • 862 – Hebrew
  • 863 – Canadian French
  • 864 – Arabic
  • 865 – Danish/Norwegian
  • 866 – Belarusian, Russian, Ukrainian (same with euro: 808)
  • 867 – Hebrew + euro (based on CP862) (conflicting ID: NEC Czech (Kamenicky), which predates this code page)
  • 868 – Urdu
  • 869 – Greek
  • 872 – Cyrillic with euro (same without euro: 855)
  • 874 – Thai with Low Tone Marks & Ancient Chars (conflicting ID with Windows 874; version with euro: 1161; Windows version is IBM 1162)
  • 876 – OCR A
  • 877 – OCR B
  • 878 – KOI8-R
  • 891 – Korean PC SBCS
  • 898 – IBM-PC WP Multilingual
  • 899 – IBM-PC Symbol
  • 903 – Simplified Chinese PC SBCS
  • 904 – Traditional Chinese PC SBCS
  • 906 – International Set #5 3812/3820
  • 907 – ASCII APL (3812)
  • 909 – IBM-PC APL2 Extended
  • 910 – IBM-PC APL2
  • 911 – IBM-PC Japan #1
  • 926 – Korean PC DBCS
  • 927 – Traditional Chinese PC DBCS
  • 928 – Simplified Chinese PC DBCS
  • 929 – Thai PC DBCS
  • 932 – IBM-PC Japan MIX (DOS/V) (DBCS) (897 + 301) (conflicting ID with Windows 932; Windows version is IBM 943)
  • 934 – IBM-PC Korea MIX (DOS/V) (DBCS) (891 + 926)
  • 936 – IBM-PC Simplified Chinese MIX (gb2312) (DOS/V) (DBCS) (903 + 928) (conflicting ID with Windows 936; Windows version is IBM 1386)
  • 938 – IBM-PC Traditional Chinese MIX (DOS/V, OS/2) (904 + 927)
  • 942 – IBM-PC Japan MIX (Japanese SAA (OS/2)) (1041 + 301)
  • 943 – IBM-PC Japan OPEN (897 + 941) (Windows CP 932)
  • 944 – IBM-PC Korea MIX (Korean SAA (OS/2)) (1040 + 926)
  • 946 – IBM-PC Simplified Chinese (Simplified Chinese SAA (OS/2)) (1042 + 928)
  • 948 – IBM-PC Traditional Chinese (Traditional Chinese SAA (OS/2)) (1043 + 927)
  • 949 – Korean (Extended Wansung (ks_c_5601-1987)) (1088 + 951) (conflicting ID with Windows 949 (Unified Hangul Code); Windows version is IBM 1363)
  • 951 – Korean DBCS (IBM KS Code) (conflicting ID with Windows 951, a hack of Windows 950 with Unicode mappings for some PUA Unicode characters found in HKSCS, based on the file name)
  • 1034 – Printer Application - Shipping Label, Set #2
  • 1040 – Korean Extended
  • 1041 – Japanese Extended (JIS X 0201 Extended)
  • 1042 – Simplified Chinese Extended
  • 1043 – Traditional Chinese Extended
  • 1044 – Printer Application - Shipping Label, Set #1
  • 1086 – IBM-PC Japan #1
  • 1088 – Revised Korean (SBCS)
  • 1092 – IBM-PC Modified Symbols
  • 1098 – Farsi
  • 1108 – DITROFF Base Compatibility
  • 1109 – DITROFF Specials Compatibility
  • 1115 – IBM-PC People's Republic of China
  • 1116 – Estonian
  • 1117 – Latvian
  • 1118 – Lithuanian (IBM's implementation of Lika's code page 774)
  • 1119 – Lithuanian and Russian (IBM's implementation of Lika's code page 772)
  • 1125 – Cyrillic, Ukrainian (same with euro: 848) (IBM modification of RUSCII)
  • 1127 – IBM-PC Arabic / French
  • 1131 – IBM-PC Data, Cyrillic, Belarusian (same with euro: 849)
  • 1139 – Japan Alphanumeric Katakana
  • 1161 – Thai with Low Tone Marks & Ancient Chars with euro (same without euro: 874)
  • 1167 – KOI8-RU
  • 1168 – KOI8-U
  • 1370 – Traditional Chinese MIX (Big5 encoding) (1114 + 947 + euro) (same without euro: 950)
  • 1380 – IBM-PC Simplified Chinese GB PC-DATA (DBCS PC IBM GB 2312-80)
  • 1381 – IBM-PC Simplified Chinese (1115 + 1380)
  • 1393 – Japanese JIS X 0213 DBCS
  • 1394 – IBM-PC Japan (JIS X 0213) (897 + 1393)

When dealing with older hardware, protocols and file formats, it is often necessary to support these code pages, but newer encoding systems, in particular Unicode, are encouraged for new designs.
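When legacy DOS text has to be carried into a Unicode-based system, the usual step is to decode it with the right code page and re-encode it as UTF-8. The sketch below assumes the data is in code page 437. The helper names are hypothetical, and file contents are handled as raw bytes so line endings pass through unchanged.

```python
from pathlib import Path


def legacy_to_utf8(data, code_page="cp437"):
    """Re-encode bytes from a DOS code page as UTF-8."""
    return data.decode(code_page).encode("utf-8")


def convert_file(src, dst, code_page="cp437"):
    """Convert a legacy text file; byte-level I/O keeps line endings intact."""
    Path(dst).write_bytes(legacy_to_utf8(Path(src).read_bytes(), code_page))


# The top edge of a box-drawing frame as stored in cp437: one byte per character.
frame = b"\xc9\xcd\xbb"
print(ascii(frame.decode("cp437")))  # '\u2554\u2550\u2557'
print(len(legacy_to_utf8(frame)))    # 9 (three bytes per character in UTF-8)
```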

DOS code pages are typically stored in .CPI files.[16][17][18][19][20]

IBM AIX code pages

These code pages are used by IBM in its AIX operating system. They emulate several character sets, chiefly the ISO-standard ones used on UNIX-like operating systems.

Code page 819 is identical to ISO/IEC 8859-1 (Latin-1) and, with slightly modified commands, permits MS-DOS machines to use that encoding. It was used with IBM AS/400 minicomputers.
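Because ISO 8859-1 maps every byte value n to the Unicode code point U+00nn, the identity of code page 819 and Latin-1 can be checked mechanically. The sketch relies on Python registering `cp819` as an alias of its Latin-1 codec. The helper is illustrative, and code page 437 is included as a contrasting case.

```python
import codecs


def is_identity_mapping(code_page):
    """True if byte n decodes to U+00nn for every n in 0..255."""
    text = bytes(range(256)).decode(code_page)
    return all(ord(char) == value for value, char in enumerate(text))


print(codecs.lookup("cp819").name)   # iso8859-1
print(is_identity_mapping("cp819"))  # True
print(is_identity_mapping("cp437"))  # False
```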

IBM OS/2 code pages

These code pages are used by IBM in its OS/2 operating system.

  • 1004 – Latin-1 Extended, Desk Top Publishing/Windows[21]

Windows emulation code pages

These code pages are used by IBM when emulating the Microsoft Windows character sets. Most of them have the same numbers as the corresponding Microsoft code pages, although they are not exactly identical. Some, however, were devised by IBM rather than by Microsoft.

Macintosh emulation code pages

These code pages are used by IBM when emulating the Apple Macintosh character sets.

  • 1275 – Apple Roman
  • 1280 – Apple Greek
  • 1281 – Apple Turkish
  • 1282 – Apple Central European
  • 1283 – Apple Cyrillic
  • 1284 – Apple Croatian
  • 1285 – Apple Romanian
  • 1286 – Apple Icelandic

Adobe emulation code pages

These code pages are used by IBM when emulating the Adobe character sets.

  • 1038 – Adobe Symbol Encoding
  • 1276 – Adobe (PostScript) Standard Encoding
  • 1277 – Adobe (PostScript) Latin 1

HP emulation code pages

These code pages are used by IBM when emulating the HP character sets.

DEC emulation code pages

These code pages are used by IBM when emulating the DEC character sets.

  • 1020 – 7-bit Canadian (French) NRC Set
  • 1021 – 7-bit Switzerland NRC Set
  • 1023 – 7-bit Spanish NRC Set
  • 1090 – Special Characters and Line Drawing Set
  • 1100 – DEC Multinational
  • 1101 – 7-bit British NRC Set
  • 1102 – 7-bit Dutch NRC Set
  • 1103 – 7-bit Finnish NRC Set
  • 1104 – 7-bit French NRC Set
  • 1105 – 7-bit Norwegian/Danish NRC Set
  • 1106 – 7-bit Swedish NRC Set
  • 1107 – 7-bit Norwegian/Danish NRC Alternate
  • 1287 – DEC Greek
  • 1288 – DEC Turkish

IBM Unicode code pages

Microsoft code pages

Windows code pages

These code pages are used by Microsoft in its Windows operating system. Microsoft defined a number of code pages known as the ANSI code pages (because the first of them, 1252, was based on an apocryphal ANSI draft of what became ISO 8859-1). Code page 1252 is built on ISO 8859-1 but uses the range 0x80-0x9F for extra printable characters rather than the C1 control codes from ISO 6429 referenced by ISO 8859-1.[24] Some of the others are based in part on other parts of ISO 8859, but are often rearranged to bring them closer to 1252.

Microsoft recommends new applications use UTF-8 or UCS-2/UTF-16 instead of these code pages.[25]
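The difference between code page 1252 and ISO 8859-1 is confined to bytes 0x80-0x9F, which the sketch below compares using Python's `latin-1` and `cp1252` codecs. A few positions are unassigned in Python's cp1252 table and are reported as `None`. The sketch ends with the move to UTF-8 recommended above. The helper is illustrative.

```python
def compare_c1_range():
    """Show what bytes 0x80-0x9F mean in ISO 8859-1 versus code page 1252."""
    rows = []
    for value in range(0x80, 0xA0):
        raw = bytes([value])
        latin1 = raw.decode("latin-1")  # always a C1 control code
        try:
            cp1252 = raw.decode("cp1252")  # usually a printable character
        except UnicodeDecodeError:
            cp1252 = None  # unassigned in Python's cp1252 table
        rows.append((value, latin1, cp1252))
    return rows


for value, latin1, cp1252 in compare_c1_range()[:4]:
    print(f"0x{value:02X}: latin-1 {latin1!a}, cp1252 {cp1252!a}")

# The euro sign (0x80 in cp1252) re-encoded as UTF-8:
print("\u20ac".encode("utf-8"))  # b'\xe2\x82\xac'
```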

DBCS code pages

These code pages represent DBCS character encodings for various CJK languages. In Microsoft operating systems, these are used as both the "OEM" and "Windows" code page for the applicable locale.
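In a DBCS code page, certain byte values act as lead bytes announcing that the following byte belongs to the same character. The sketch below splits code page 932 data into character units. It assumes the Shift_JIS lead-byte ranges 0x81-0x9F and 0xE0-0xFC and does not validate trail bytes. Other DBCS code pages such as 936, 949 and 950 use different ranges. The helper is hypothetical.

```python
def split_dbcs(data):
    """Group cp932 (Shift_JIS-style) bytes into one- or two-byte units."""
    units = []
    i = 0
    while i < len(data):
        b = data[i]
        if 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC:  # assumed lead-byte ranges
            units.append(data[i:i + 2])
            i += 2
        else:  # ASCII or single-byte half-width katakana
            units.append(data[i:i + 1])
            i += 1
    return units


data = "A\u3042\uff71".encode("cp932")  # Latin A, hiragana A, half-width katakana A
print(split_dbcs(data))  # [b'A', b'\x82\xa0', b'\xb1']
```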

MS-DOS code pages

These code pages are used by Microsoft in its MS-DOS operating system. Microsoft refers to these as the OEM code pages because they were defined by the original equipment manufacturers who licensed MS-DOS for distribution with their hardware, not by Microsoft or a standards organization. Most of these code pages have the same number as the equivalent IBM code pages, although some are not exactly identical.[26]
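On a US-English Windows system the OEM code page is typically 437 and the ANSI code page 1252, so the same text can be stored as different bytes depending on which one a program uses. The sketch below assumes those two defaults. The helper is illustrative.

```python
def oem_and_ansi_bytes(text, oem="cp437", ansi="cp1252"):
    """Encode the same text with an OEM (DOS) and an ANSI (Windows) code page."""
    return text.encode(oem), text.encode(ansi)


oem, ansi = oem_and_ansi_bytes("\u00e9")  # e-acute
print(oem, ansi)                          # b'\x82' b'\xe9'

# Reading OEM bytes as if they were ANSI produces mojibake:
print(ascii(oem.decode("cp1252")))        # '\u201a' (single low-9 quotation mark)
```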

Macintosh emulation code pages

These code pages are used by Microsoft when emulating the Apple Macintosh character sets.

Various other Microsoft code pages

The following code page numbers are specific to Microsoft Windows; IBM may use different numbers for these code pages. They emulate several character sets, chiefly the ISO-standard ones used on UNIX-like operating systems.

Microsoft Unicode code pages

HP Symbol Sets

HP developed a series of Symbol Sets (each with its associated Symbol Set Code) to encode either its own character sets or other vendors’ character sets. They are normally 7-bit character sets which, when moved to the higher part and associated with the ASCII character set, make up 8-bit character sets.
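The construction described above can be sketched as building a 256-entry table. ASCII fills the lower half, and each character of a 7-bit set is moved into the upper half by setting the top bit. The extension set used here is a hypothetical two-character example, not an actual HP symbol set, and the 7-bit graphic range is assumed to be 0x20-0x7F.

```python
def build_8bit_table(extension_7bit):
    """Combine ASCII (0x00-0x7F) with a 7-bit set moved to 0x80-0xFF.

    `extension_7bit` maps 7-bit codes (0x20-0x7F) to characters; each is
    placed at code | 0x80. Unmapped upper positions stay None.
    """
    table = [chr(code) if code < 0x80 else None for code in range(256)]
    for code, char in extension_7bit.items():
        if not 0x20 <= code <= 0x7F:
            raise ValueError(f"0x{code:02X} is outside the 7-bit graphic range")
        table[code | 0x80] = char
    return table


# Hypothetical two-character extension set, for illustration only.
toy_extension = {0x21: "\u00c0", 0x22: "\u00c2"}
table = build_8bit_table(toy_extension)
print(ascii(table[0x41]), ascii(table[0xA1]), ascii(table[0xA2]))  # 'A' '\xc0' '\xc2'
```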

HP own Symbol Sets

  • Symbol Set 0E — HP Roman Extension — 7-bit character set with accented letters (coded by IBM as code page 1050)
  • Symbol Set 0G — HP 7-bit German
  • Symbol Set 0L — HP 7-bit PC Line (coded by IBM as code page 1055)
  • Symbol Set 0M — HP Math-7
  • Symbol Set 0T — HP Thai-8
  • Symbol Set 1S — HP 7-bit Spanish
  • Symbol Set 1U — HP 7-bit Gothic Legal (coded by IBM as code page 1052)
  • Symbol Set 4Q — HP Line Draw (coded by IBM as code page 1056)
  • Symbol Set 4U — HP Roman-9 — Roman-8 + €
  • Symbol Set 7J — HP Desktop
  • Symbol Set 7S — HP 7-bit European Spanish
  • Symbol Set 8E — HP East-8
  • Symbol Set 8G — HP Greek-8 (based on IR 088; not on ELOT 927)
  • Symbol Set 8H — HP Hebrew-8
  • Symbol Set 8I — MS LineDraw (ASCII + HP PC Line)
  • Symbol Set 8K — HP Kana-8 (ASCII + Japanese Katakana)
  • Symbol Set 8L — HP LineDraw (ASCII + HP Line Draw)
  • Symbol Set 8M — HP Math-8 (ASCII + HP Math-8)
  • Symbol Set 8R — HP Cyrillic-8
  • Symbol Set 8S — HP 7-bit Latin American Spanish
  • Symbol Set 8T — HP Turkish-8
  • Symbol Set 8U — HP Roman-8 (ASCII + HP Roman Extension; coded by IBM as code page 1051)
  • Symbol Set 8V — HP Arabic-8
  • Symbol Set 9K — HP Korean-8
  • Symbol Set 9T — PC 8T (also known as Code Page 437-T; this is not code page 857)
  • Symbol Set 9V — Latin / Arabic for Windows (this is not code page 1256)
  • Symbol Set 11U — PC 8D/N (also known as Code Page 437-N; coded by IBM as code page 1058; this is not code page 865)
  • Symbol Set 14G — PC-8 Greek Alternate (also known as Code Page 437-G; almost the same as code page 737)
  • Symbol Set 18K —
  • Symbol Set 18T —
  • Symbol Set 19C —
  • Symbol Set 19K —

Symbol Sets from other vendors

  • Symbol Set 0D — ISO 60: 7-bit Norwegian
  • Symbol Set 0F — ISO 25: 7-bit French
  • Symbol Set 0H — HP 7-bit Hebrew — Practically the same as Israeli Standard SI 960
  • Symbol Set 0I — ISO 15: 7-bit Italian
  • Symbol Set 0K — ISO 14: 7-bit Japanese Katakana
  • Symbol Set 0N — ISO 8859-1 Latin 1 (Initially called "Gothic-1"; coded by IBM as code page 1053)
  • Symbol Set 0R — ISO 8859-5 Latin/Cyrillic (1986 version — IR 111)
  • Symbol Set 0S — ISO 11: 7-bit Swedish
  • Symbol Set 0U — ISO 6: 7-bit U.S.
  • Symbol Set 0V — Arabic
  • Symbol Set 1D — ISO 61: 7-bit Norwegian
  • Symbol Set 1E — ISO 4: 7-bit U. K.
  • Symbol Set 1F — ISO 69: 7-bit French
  • Symbol Set 1G — ISO 21: 7-bit German
  • Symbol Set 1K — ISO 13: 7-bit Japanese Latin
  • Symbol Set 1T — Windows Thai (Practically the same as 874)
  • Symbol Set 2K — ISO 57: 7-bit Simplified Chinese Latin
  • Symbol Set 2N — ISO 8859-2 Latin 2
  • Symbol Set 2S — ISO 17: 7-bit Spanish
  • Symbol Set 2U — ISO 2: 7-bit International Reference Version
  • Symbol Set 3N — ISO 8859-3 Latin 3
  • Symbol Set 3R — PC-866 Russia (Practically the same as code page 866)
  • Symbol Set 3S — ISO 10: 7-bit Swedish
  • Symbol Set 4N — ISO 8859-4 Latin 4
  • Symbol Set 4S — ISO 16: 7-bit Portuguese
  • Symbol Set 5M — PS Math Symbol (Practically the same as Adobe Symbols)
  • Symbol Set 5N — ISO 8859-9 Latin 5
  • Symbol Set 5S — ISO 84: 7-bit Portuguese
  • Symbol Set 5T — Windows 3.1 Latin-5 (Practically the same as code page 1254)
  • Symbol Set 6J — Microsoft Publishing
  • Symbol Set 6M — Ventura Math
  • Symbol Set 6N — ISO 8859-10 Latin 6
  • Symbol Set 6S — ISO 85: 7-bit Spanish
  • Symbol Set 7H — ISO 8859-8 Latin/Hebrew
  • Symbol Set 9E — Windows 3.1 Latin 2 (Practically the same as code page 1250)
  • Symbol Set 9G — Windows 98 Greek (Practically the same as code page 1253)
  • Symbol Set 9J — PC 1004
  • Symbol Set 9L — Ventura ITC Zapf Dingbats
  • Symbol Set 9N — ISO 8859-15 Latin 9
  • Symbol Set 9R — Windows 98 Cyrillic (Practically the same as code page 1251)
  • Symbol Set 9U — Windows 3.0
  • Symbol Set 10G — PC-851 Latin/Greek (Practically the same as code page 851)
  • Symbol Set 10J — PS Text (Practically the same as Adobe Standard)
  • Symbol Set 10L — PS ITC Zapf Dingbats (Practically the same as Adobe Dingbats)
  • Symbol Set 10N — ISO 8859-5 Latin/Cyrillic (1988 version — IR 144)
  • Symbol Set 10R — PC-855 Cyrillic (Practically the same as code page 855)
  • Symbol Set 10T — Teletex
  • Symbol Set 10U — PC-8 (Practically the same as code page 437; coded by IBM as code page 1057)
  • Symbol Set 10V — CP-864 (Practically the same as code page 864)
  • Symbol Set 11G — CP-869 (Practically the same as code page 869)
  • Symbol Set 11J — PS ISO Latin-1 (Practically the same as Adobe Latin-1)
  • Symbol Set 11N — ISO 8859-6 Latin/Arabic
  • Symbol Set 12G — PC Latin/Greek (Practically the same as code page 737)
  • Symbol Set 12J — MC Text (Practically the same as Macintosh Roman)
  • Symbol Set 12N — ISO 8859-7 Latin/Greek
  • Symbol Set 12R — PC Gost (Practically the same as PC GOST Main)
  • Symbol Set 12U — PC-850 Latin 1 (Practically the same as code page 850)
  • Symbol Set 13J — Ventura International
  • Symbol Set 13R — PC Bulgarian (Practically the same as MIK)
  • Symbol Set 13U — PC-858 Latin 1 + € (Practically the same as code page 858)
  • Symbol Set 14J — Ventura U. S.
  • Symbol Set 14L — Windows Dingbats
  • Symbol Set 14P — ABICOMP International (Practically the same as ABICOMP)
  • Symbol Set 14R — PC Ukrainian (Practically the same as RUSCII)
  • Symbol Set 15H — PC-862 Israel (Practically the same as code page 862)
  • Symbol Set 16U — PC-857 Latin 5 (Practically the same as code page 857)
  • Symbol Set 17U — PC-852 Latin 2 (Practically the same as code page 852)
  • Symbol Set 18N — UTF-8
  • Symbol Set 18U — PC-853 Latin 3 (Practically the same as code page 853)
  • Symbol Set 19L — Windows 98 Baltic (Practically the same as code page 1257)
  • Symbol Set 19M — Windows Symbol
  • Symbol Set 19U — Windows 3.1 Latin 1 (Practically the same as code page 1252)
  • Symbol Set 20U — PC-860 Portugal (Practically the same as code page 860)
  • Symbol Set 21U — PC-861 Iceland (Practically the same as code page 861)
  • Symbol Set 23U — PC-863 Canada - French (Practically the same as code page 863)
  • Symbol Set 24Q — PC-Polish Mazowia (Practically the same as Mazovia encoding)
  • Symbol Set 25U — PC-865 Denmark/Norway (Practically the same as code page 865)
  • Symbol Set 26U — PC-775 Latin 7 (Practically the same as code page 775)
  • Symbol Set 27Q — PC-8 PC Nova (Practically the same as PC Nova)
  • Symbol Set 27U — PC Latvian Russian (also known as 866-Latvian)
  • Symbol Set 28U — PC Lithuanian/Russian (Practically the same as code page 774)
  • Symbol Set 29U — PC-772 Lithuanian/Russian (Practically the same as code page 772)
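Each symbol set ID above combines a decimal number with a letter. The following minimal Python sketch assumes two usual PCL conventions: a primary symbol set is selected with the escape sequence ESC ( followed by the ID, and font headers store the ID as number × 32 + (letter code − 64). The function names are hypothetical, and for simplicity the sketch accepts only uppercase letters.

```python
def pcl_symbol_set_value(symbol_set_id: str) -> int:
    """Pack a PCL symbol set ID such as "19U" into its numeric value.

    Assumes the conventional packing: number * 32 + (ord(letter) - 64).
    """
    number, letter = symbol_set_id[:-1], symbol_set_id[-1:]
    # Simplification: accept only ASCII digits followed by one uppercase letter.
    if not (number.isascii() and number.isdigit()) or not ("A" <= letter <= "Z"):
        raise ValueError(f"not a PCL symbol set ID: {symbol_set_id!r}")
    return int(number) * 32 + (ord(letter) - 64)


def pcl_select_primary(symbol_set_id: str) -> bytes:
    """Build the escape sequence ESC ( <ID> that selects a primary symbol set."""
    pcl_symbol_set_value(symbol_set_id)  # validate before emitting anything
    return b"\x1b(" + symbol_set_id.encode("ascii")


for sid in ("0N", "10U", "19U"):
    print(sid, pcl_symbol_set_value(sid), pcl_select_primary(sid))
```

Under this packing, 0N (ISO 8859-1), 10U (PC-8) and 19U (Windows 3.1 Latin 1) become 14, 341 and 629.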

Code pages from other vendors

These code pages are independent assignments by third-party vendors. Since the original IBM PC code page (number 437) was not designed for international use, several partially compatible country- or region-specific variants emerged.

These code page number assignments are official for neither IBM nor Microsoft, and almost none of them is listed by IANA as a usable character set. The numbers assigned to these code pages are arbitrary and may clash with registered numbers in use by IBM or Microsoft. Some of them may predate code page switching, which was added in DOS 3.3.

  • 100 – DOS Hebrew hardware fontpage (Not from IBM; HDOS)[34]
  • 111 – DOS Greek (Not from IBM; AST Premium Exec DOS 5.0[35][36][37])
  • 112 – DOS Turkish (Not from IBM; AST Premium Exec DOS 5.0[35][36][37])
  • 113 – DOS Yugoslavian (Not from IBM; AST Premium Exec DOS 5.0[35][36][37])
  • 151 – DOS Nafitha Arabic (Not from IBM; ADOS)
  • 152 – DOS Nafitha Arabic (Not from IBM; ADOS)
  • 161 – DOS Arabic (Not from IBM; ADOS)[34]
  • 162 – DOS Arabic with vowel diacritics (Not from IBM; ADOS)
  • 163 – DOS Arabic and French (Not from IBM; ADOS)[34]
  • 164 – DOS Arabic and French with vowel diacritics (Not from IBM; ADOS)
  • 165 – DOS Arabic (864 Extended) (Not from IBM; ADOS)[34]
  • 166 – IBM Arabic PC (ADOS)[34]
  • 190 – DEC DOS German (appears to be identical to Code page 437)
  • 210 – DEC DOS Greek (NEC Jetmate printers)
  • 220 – DEC DOS Spanish (Not from IBM)
  • 489 – Czechoslovakian [OCR software 1993]
  • 620 – DOS Polish (Mazovia) (Not from IBM)
  • 667 – DOS Polish (Mazovia) (Not from IBM)
  • 668 – DOS Polish (Not from IBM)
  • 706 – MS-DOS Server Arabic Sakhr (Not from IBM; Sakhr Software from MSX Computers)
  • 707 – MS-DOS Arabic Sakhr (Not from IBM; Sakhr Software from MSX Computers)
  • 709 – MS-DOS Arabic (ASMO 449+/BCON V4)
  • 710 – MS-DOS Arabic (Transparent Arabic)
  • 711 – MS-DOS Arabic Nafitha Enhanced (Not from IBM)
  • 714 – MS-DOS Arabic Sakr (Not from IBM)
  • 715 – MS-DOS Arabic APTEC (Not from IBM)
  • 721 – MS-DOS Arabic Nafitha International (Not from IBM)
  • 768 – Arabic Al-Arabi (Not from IBM)
  • 770 – DOS Estonian, Latvian, Lithuanian[38] (From Lithuanian Lika Software;[39] Lithuanian RST 1095-89 National Standard)
  • 771 – DOS Lithuanian/Cyrillic — KBL[40] (From Lithuanian Lika Software[39])
  • 772 – DOS Lithuanian/Cyrillic[41] (From Lithuanian Lika Software;[39] Lithuanian LST 1284:1993 National Standard; adopted by IBM as code page 1119)
  • 773 – DOS Latin-7 — KBL (From Lithuanian Lika Software)
  • 774 – DOS Lithuanian[42] (From Lithuanian Lika Software;[39] Lithuanian LST 1283:1993 National Standard; adopted by IBM as code page 1118)
  • 775 – DOS Latin-7 Baltic Rim (From Lithuanian Lika Software;[39] Lithuanian LST 1590-1 National Standard; adopted by IBM and Microsoft as code page 775)
  • 776 – DOS Lithuanian (extended CP770)[43] (From Lithuanian Lika Software[39])
  • 777 – DOS Accented Lithuanian (old) (extended CP773) — KBL[43] (From Lithuanian Lika Software[39])
  • 778 – DOS Accented Lithuanian (extended CP775)[43] (From Lithuanian Lika Software[39])
  • 790 – DOS Polish (Mazovia) with curly quotation marks
  • 854 – Spanish[44][6]
  • 881 – Latin 1 (Not from IBM; AST Premium Exec DOS 5.0[35][36][37]) (conflicting ID with IBM EBCDIC 881)
  • 882 – Latin 2 (ISO 8859-2) (Not from IBM; same as Code page 912; AST Premium Exec DOS 5.0[35][36][37]) (conflicting ID with IBM EBCDIC 882)
  • 883 – Latin 3 (Not from IBM; AST Premium Exec DOS 5.0[35][36][37]) (conflicting ID with IBM EBCDIC 883)
  • 884 – Latin 4 (Not from IBM; AST Premium Exec DOS 5.0[35][36][37]) (conflicting ID with IBM EBCDIC 884)
  • 885 – Latin 5 (Not from IBM; AST Premium Exec DOS 5.0[35][36][37]) (conflicting ID with IBM EBCDIC 885)
  • 895 – Czech (Kamenicky) (Not from IBM; conflicting ID with IBM CP895: 7-bit EUC Japanese Roman)
  • 896 – DOS Polish (Mazovia) (Not from IBM; conflicting ID with IBM CP896: 7-bit EUC Japanese Katakana)
  • 900 – DOS Russian (Russian MS-DOS 5.0 LCD.CPI)
  • 928 – Greek (on Star[45] printers); same as Greek National Standard ELOT 928 (Not from IBM; conflicting ID with IBM CP928: Simplified Chinese PC DBCS)
  • 966 – Saudi Arabian (Not from IBM)
  • 972 – Hebrew (VT100) (Not from IBM)
  • 991 – DOS Polish (Mazovia) (Not from IBM)
  • 999 – DOS Serbo-Croatian I (Not from IBM); also known as PC Nova and CroSCII; lower part is JUSI.B1.002, upper part is code page 437; supports Slovenian and Serbo-Croatian (Latin script)
  • 1001 – Arabic (on Star[45] printers) (Not from IBM; conflicting ID with IBM CP1001: MICR)
  • 1261 – Windows Korean IBM-1261 LMBCS-17, similar to 1363
  • 1270 – Windows Sámi
  • 1300 – ANSI [PTS-DOS 6.70, not 6.51] (Not from IBM; conflicting ID with IBM EBCDIC 1300: Generic Bar Code/OCR-B)
  • 2001 – Lithuanian KBL (on Star[45] printers); same as code page 771
  • 3001 – Estonian 1 (on Star[45] printers); same as code page 1116
  • 3002 – Estonian 2 (on Star[45] printers); same as code page 922
  • 3011 – Latvian 1 (on Star[45] printers); same as code page 437-Latvian
  • 3012 – Latvian-2 (on Star[45] printers); same as code page 866-Latvian (Latvian RST 1040-90 National Standard)
  • 3021 – Bulgarian (on Star[45] printers); same as MIK
  • 3031 – Hebrew (on Star[45] printers); same as code page 862
  • 3041 – Maltese (on Star[45] printers); same as ISO 646 Maltese
  • 3840 – IBM-Russian (on Star[45] printers); nearly the same as CP 866
  • 3841 – Gost-Russian (on Star[45] printers); GOST 13052 plus characters for Central Asian languages
  • 3843 – Polish (on Star[45] printers); same as Mazovia
  • 3844 – CS2 (on Star[45] printers); same as Kamenicky
  • 3845 – Hungarian (on Star[45] printers); same as CWI
  • 3846 – Turkish (on Star[45] printers); same as PC-8 Turkish plus the old Turkish lira sign at code point A8
  • 3847 – Brazil-ABNT (on Star[45] printers); same as the Brazilian National Standard NBR-9614:1986
  • 3848 – Brazil-ABICOMP (on Star[45] printers); same as ABICOMP
  • 3850 – Standard KU (on Star[45] printers); variation of the Kasetsart University encoding for Thai
  • 3860 – Rajvitee KU (on Star[45] printers); variation of the Kasetsart University encoding for Thai
  • 3861 – Microwiz KU (on Star[45] printers); variation of the Kasetsart University encoding for Thai
  • 3863 – STD988 TIS (on Star[45] printers); variation of the TIS 620 encoding for Thai
  • 3864 – Popular TIS (on Star[45] printers); variation of the TIS 620 encoding for Thai
  • 3865 – Newsic TIS (on Star[45] printers); variation of the TIS 620 encoding for Thai
  • 28799 – FOCAL (on Star[45] printers); same as FOCAL character set
  • 28800 – HP RPL (on Star[45] printers); same as RPL
  • (number missing) – CWI-2 (for DOS) supports Hungarian
  • (number missing) – MIK (for DOS) supports Bulgarian
  • (number missing) – DOS Serbo-Croatian II; supports Slovenian and Serbo-Croatian (Latin script)
  • (number missing) – Russian Alternative code page (for DOS); this is the origin of IBM CP 866

List of code page assignments

List of known code page assignments (incomplete):

ID | Names | Description | Origin | Platform | DOS | OS/2 | Windows | Mac | Other | Encoding | Comment
0 | | Reserved | IBM, Microsoft | | 3.3+ | 1.0+ | ? | ? | ? | | Internal OS use[34]
437 | CP437, IBM437 | PC US | IBM[46] | IBM PC | 3.3+ | 1.0+ | Yes | ? | Yes | 8-bit SBCS |
57344–61439 | | Private use derivations | IBM | | | | | | | various | Private use code page derivations (E000h–EFFFh)
65280–65533 | | Private use definitions | IBM | | | | | | | various | Private use code page definitions (FF00h–FFFDh)
65534 | | Reserved | IBM, Microsoft | | ? | ? | ? | ? | ? | various | Internal OS use (FFFEh)
65535 | | Reserved | IBM, Microsoft | | 3.3+ | 1.0+ | ? | ? | ? | various | Internal OS use (FFFFh)[34]

Criticism

Many older character encodings (unlike Unicode) suffer from several problems. Some vendors insufficiently document the meaning of all code point values in their code pages, which makes it harder to handle textual data consistently across computer systems. Some vendors add proprietary extensions to established code pages, adding or changing certain code point values: for example, byte 0x5C in Shift JIS can represent either a backslash or a yen sign depending on the platform. Finally, to support several languages in a program that does not use Unicode, the code page used for each string or document needs to be stored.
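The need to store the code page can be illustrated with Python's built-in codecs: the same bytes yield different characters depending on which code page is assumed, so the bytes alone are not enough. `LegacyText` is a hypothetical helper that keeps the codec name alongside the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LegacyText:
    """Raw bytes plus the code page (Python codec name) needed to read them."""
    data: bytes
    codec: str

    def text(self) -> str:
        return self.data.decode(self.codec)


raw = bytes([0x80, 0x9C, 0xE9])
for codec in ("cp437", "cp1252", "latin-1"):
    decoded = LegacyText(raw, codec).text()
    # Show code points rather than glyphs, since some results are control characters.
    print(codec.ljust(8), " ".join(f"U+{ord(ch):04X}" for ch in decoded))
```

Under code page 437 these bytes read as Ç, £ and Θ. Under Windows-1252 they read as €, œ and é. Under ISO-8859-1 the first two are C1 control characters.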

Applications may also mislabel text in Windows-1252 as ISO-8859-1. The only difference between these code pages is that the code point values in the range 0x80–0x9F, which ISO-8859-1 uses for control characters, are instead used for additional printable characters in Windows-1252, notably quotation marks, the euro sign and the trademark symbol. Browsers on non-Windows platforms tended to show empty boxes or question marks for these characters, making the text hard to read. Most browsers worked around this by ignoring the declared character set and interpreting such text as Windows-1252. In HTML5, treating ISO-8859-1 as Windows-1252 is even codified as a W3C standard.[47] Although browsers were typically programmed to handle this behaviour, other software often was not. Consequently, when receiving a file transfer from a Windows system, non-Windows platforms would either ignore these characters or treat them as standard control characters and attempt to perform the corresponding control action.
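The browser workaround can be sketched in Python. Python's `cp1252` codec raises an error for the five bytes that Windows-1252 leaves unassigned (0x81, 0x8D, 0x8F, 0x90 and 0x9D). This sketch assumes the WHATWG behaviour of mapping those bytes to the C1 control character with the same value. The function name `decode_web_latin1` is hypothetical.

```python
def decode_web_latin1(data: bytes) -> str:
    """Decode text labelled ISO-8859-1 as Windows-1252, as HTML5 browsers do."""
    chars = []
    for byte in data:
        try:
            chars.append(bytes([byte]).decode("cp1252"))
        except UnicodeDecodeError:
            # Unassigned in Windows-1252: keep the C1 control with the same value.
            chars.append(chr(byte))
    return "".join(chars)


sample = b"\x93Caf\xe9\x94 costs \x80 5"
print(ascii(sample.decode("latin-1")))  # C1 controls where the punctuation should be
print(decode_web_latin1(sample))        # prints “Café” costs € 5

# How many bytes in 0x80-0x9F read differently under the two interpretations?
print(sum(decode_web_latin1(bytes([b])) != chr(b) for b in range(0x80, 0xA0)))
```

27 of the 32 bytes in 0x80–0x9F change meaning. Text that strictly follows ISO-8859-1 is unaffected, because it does not normally contain those control characters.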

Due to Unicode's extensive documentation, vast repertoire of characters and character stability policy, the problems listed above are rarely a concern for Unicode. UTF-8, which can encode over one million code points, has overtaken the code-page method as the most popular encoding on the Internet.[48][49]
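The following short Python illustration uses only the standard codecs. A string that mixes scripts, which would otherwise need separate code pages, round-trips through UTF-8 with no code page label. The "over one million" figure is the count of Unicode scalar values: all code points up to U+10FFFF except the 2,048 surrogates.

```python
# Latin-1, Greek, Cyrillic and Hebrew text would each need a different legacy code page.
mixed = "Café Ωμέγα Привет שלום"

encoded = mixed.encode("utf-8")
assert encoded.decode("utf-8") == mixed  # no code page label needed to read it back

try:
    mixed.encode("cp1252")  # a single legacy code page cannot hold all of it
except UnicodeEncodeError as exc:
    print("cp1252 cannot encode", ascii(exc.object[exc.start]))

# Code points U+0000..U+10FFFF minus the surrogates U+D800..U+DFFF.
scalar_values = 0x110000 - (0xDFFF - 0xD800 + 1)
print(scalar_values)  # 1112064
print(chr(0x10FFFF).encode("utf-8"))  # the highest code point takes four bytes
```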

Private code pages

Early in the history of personal computers, when users found that existing character encodings did not meet their requirements, they created private or local code pages using terminate-and-stay-resident utilities or by reprogramming BIOS EPROMs. In some cases, unofficial code page numbers were invented (e.g. CP895).

When broader character set support became available, most of those code pages fell into disuse, with some exceptions such as the Kamenicky (KEYBCS2) encoding for the Czech and Slovak alphabets. Another example is the Iran System encoding standard, created by the Iran System corporation for Persian language support. It was used in DOS-based programs in Iran and became obsolete after the introduction of Microsoft code page 1256. However, some Windows and DOS programs using this encoding are still in use, and some Windows fonts with this encoding exist.

To overcome such problems, level 2 of the IBM Character Data Representation Architecture (CDRA) specifically reserves ranges of code page IDs for user-definable and private-use assignments. Whenever such code page IDs are used, the user must not assume that the same functionality and appearance can be reproduced in another system configuration, or on another device or system, unless the user specifically takes care of this. The code page range 57344–61439 (E000h–EFFFh) is officially reserved for user-definable code pages (strictly, CCSIDs in the context of IBM CDRA), whereas the range 65280–65533 (FF00h–FFFDh) is reserved for any user-definable "private use" assignments. For example, a non-registered custom variant of code page 437 (1B5h) or 28591 (6FAFh) could become 57781 (E1B5h) or 61359 (EFAFh), respectively. This avoids potential conflicts with other assignments and preserves whatever internal numerical logic the original code page assignments follow. Examples that could be assigned a number in the private range, such as 65280 (FF00h), include an unregistered private code page not based on an existing code page, a device-specific code page such as a printer font that merely needs a logical handle to become addressable by the system, a frequently changing downloaded font, or a code page number with a symbolic meaning in the local environment.
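The two examples of custom variants follow a simple numerical pattern: the low 12 bits of the original ID are kept and placed in the E000h–EFFFh range. The following minimal Python sketch reproduces that pattern. The function `private_variant_id` is hypothetical, and the mapping is one way to derive such IDs, not necessarily one prescribed by CDRA.

```python
USER_DEFINABLE_BASE = 0xE000  # start of the user-definable range 57344-61439


def private_variant_id(code_page: int) -> int:
    """Derive a user-definable ID for a custom variant of `code_page`.

    Keeps the low 12 bits of the original ID, as in 437 -> 57781 and 28591 -> 61359.
    """
    if not 0 <= code_page <= 0xFFFF:
        raise ValueError("code page IDs are 16-bit values")
    return USER_DEFINABLE_BASE | (code_page & 0x0FFF)


for cp in (437, 28591):
    derived = private_variant_id(cp)
    print(f"{cp} ({cp:X}h) -> {derived} ({derived:X}h)")
```

Because only 12 bits survive, different originals can collide. For example, 4533 (11B5h) maps to the same 57781 as 437, so some local bookkeeping of derived IDs would still be needed.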

The code page IDs 0, 65534 (FFFEh) and 65535 (FFFFh) are reserved for internal use by operating systems such as DOS and must not be assigned to any specific code pages.
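The reserved and private ranges described in this section can be checked together with a small helper. This is a sketch. The function name and the category labels are illustrative rather than CDRA terminology, and any value outside these ranges is reported as an ordinary assignment.

```python
RESERVED_IDS = {0, 0xFFFE, 0xFFFF}          # internal OS use
USER_DEFINABLE = range(0xE000, 0xEFFF + 1)  # 57344-61439
PRIVATE_USE = range(0xFF00, 0xFFFD + 1)     # 65280-65533


def classify_code_page_id(cp: int) -> str:
    """Return which kind of code page ID range `cp` falls into."""
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("code page IDs are 16-bit values")
    if cp in RESERVED_IDS:
        return "reserved"
    if cp in USER_DEFINABLE:
        return "user-definable"
    if cp in PRIVATE_USE:
        return "private use"
    return "ordinary"


for cp in (0, 437, 57781, 65280, 65535):
    print(cp, classify_code_page_id(cp))
```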

References

  1. ^ "Contents". www.ibm.com.
  2. ^ "Code Page". sap.com. Archived from the original on 2025-08-06. Retrieved 2025-08-06.
  3. ^ a b "Glossary". oracle.com. Archived from the original on 2025-08-06. Retrieved 2025-08-06.
  4. ^ "VT510 Video Terminal Programmer Information". Digital Equipment Corporation (DEC). 7.1. Character Sets - Overview. Archived from the original on 2025-08-06. Retrieved 2025-08-06. In addition to traditional DEC and ISO character sets, which conform to the structure and rules of ISO 2022, the VT510 supports a number of IBM PC code pages (page numbers in IBM's standard character set manual) in PCTerm mode to emulate the console terminal of industry-standard PCs.
  5. ^ "7.1. Character Sets - Overview". VT520/VT525 Video Terminal Programmer Information (PDF). Digital Equipment Corporation (DEC). July 1994. p. 7-1. EK-VT520-RM. A01. Archived (PDF) from the original on 2025-08-06. Retrieved 2025-08-06. In addition to traditional DEC and ISO character sets the VT520 supports a number of IBM PC code pages (which refer to page numbers in IBM's standard character set manual) in PCTerm mode to emulate the console terminal of industry-standard PCs.
  6. ^ a b c Paul, Matthias R. (2025-08-06) [1995]. "Overview on DOS, OS/2, and Windows codepages" (CODEPAGE.LST file) (1.59 preliminary ed.). Archived from the original on 2025-08-06. Retrieved 2025-08-06.
  7. ^ "Printer Command Language Symbol Sets". www.pclviewer.com. Archived from the original on 2025-08-06. Retrieved 2025-08-06.
  8. ^ "HP Symbol Sets". pclhelp.com. Archived from the original on 2025-08-06. Retrieved 2025-08-06.
  9. ^ "PCL5 Comparison Guide" (PDF). Archived (PDF) from the original on 2025-08-06. Retrieved 2025-08-06.
  10. ^ Zbikowski, Mark; Allen, Paul; Ballmer, Steve; Borman, Reuben; Borman, Rob; Butler, John; Carroll, Chuck; Chamberlain, Mark; Chell, David; Colee, Mike; Courtney, Mike; Dryfoos, Mike; Duncan, Rachel; Eckhardt, Kurt; Evans, Eric; Farmer, Rick; Gates, Bill; Geary, Michael; Griffin, Bob; Hogarth, Doug; Johnson, James W.; Kermaani, Kaamel; King, Adrian; Koch, Reed; Landowski, James; Larson, Chris; Lennon, Thomas; Lipkie, Dan; McDonald, Marc; McKinney, Bruce; Martin, Pascal; Mathers, Estelle; Matthews, Bob; Melin, David; Mergentime, Charles; Nevin, Randy; Newell, Dan; Newell, Tani; Norris, David; O'Leary, Mike; O'Rear, Bob; Olsson, Mike; Osterman, Larry; Ostling, Ridge; Pai, Sunil; Paterson, Tim; Perez, Gary; Peters, Chris; Petzold, Charles; Pollock, John; Reynolds, Aaron; Rubin, Darryl; Ryan, Ralph; Schulmeisters, Karl; Shah, Rajen; Shaw, Barry; Short, Anthony; Slivka, Ben; Smirl, Jon; Stillmaker, Betty; Stoddard, John; Tillman, Dennis; Whitten, Greg; Yount, Natalie; Zeck, Steve (1988). "Technical advisors". The MS-DOS Encyclopedia: versions 1.0 through 3.2. By Duncan, Ray; Bostwick, Steve; Burgoyne, Keith; Byers, Robert A.; Hogan, Thom; Kyle, Jim; Letwin, Gordon; Petzold, Charles; Rabinowitz, Chip; Tomlin, Jim; Wilton, Richard; Wolverton, Van; Wong, William; Woodcock, JoAnne (Completely reworked ed.). Redmond, Washington, USA: Microsoft Press. ISBN 1-55615-049-0. LCCN 87-21452. OCLC 16581341. [1] Archived 2025-08-06 at the Wayback Machine (xix+1570 pages; 26 cm) (NB. This edition was published in 1988 after extensive rework of the withdrawn 1986 first edition by a different team of authors.)
  11. ^ "Code Page Identifiers". microsoft.com. Microsoft. Archived from the original on 2025-08-06. Retrieved 2025-08-06.
  12. ^ "VGA/SVGA Video Programming--VGA Text Mode Operation". osdever.net. Archived from the original on 2025-08-06. Retrieved 2025-08-06.
  13. ^ "IBM i Globalization: Code Pages". IBM. Archived from the original on 2025-08-06.
  14. ^ a b c d e f xlate - Transliterate Contents of Records, IBM Corporation, 2010 [1986], archived from the original on 2025-08-06, retrieved 2025-08-06
  15. ^ "Code Page CPGID 01093 (pdf)" (PDF). Archived from the original (PDF) on 2025-08-06.
  16. ^ Paul, Matthias R. (2025-08-06) [1995]. "Format description of DOS, OS/2, and Windows NT .CPI, and Linux .CP files" (CPI.LST file) (1.30 ed.). Archived from the original on 2025-08-06. Retrieved 2025-08-06.
  17. ^ Elliott, John C. (2025-08-06). "CPI file format". Seasip.info. Archived from the original on 2025-08-06. Retrieved 2025-08-06.
  18. ^ Brouwer, Andries Evert (2025-08-06). "CPI fonts". 0.2. Archived from the original on 2025-08-06. Retrieved 2025-08-06.
  19. ^ Haralambous, Yannis (September 2007). Fonts & Encodings. Translated by Horne, P. Scott (1 ed.). Sebastopol, California, USA: O'Reilly Media, Inc. pp. 601–602, 611. ISBN 978-0-596-10242-5.
  20. ^ MS-DOS Programmer's Reference. Microsoft Press. 1991. ISBN 1-55615-329-5.
  21. ^ "Codepage 1004 - Windows Extended". IBM. 2001. Archived from the original on 2025-08-06. Retrieved 2025-08-06.
  22. ^ "Character Data Representation Architecture". IBM. Archived from the original on 2025-08-06. Retrieved 2025-08-06.
  23. ^ a b c d e f g h i j k l "IBM Coded Character Set Identifier (CCSID)". IBM. Archived from the original on 2025-08-06.
  24. ^ ISO/IEC 8859-1:1998(E). ISO. 2025-08-06. p. 1. Archived from the original on 2025-08-06. Retrieved 2025-08-06. The coded characters in this set may be used in conjunction with coded control functions selected from ISO/IEC 6429.
  25. ^ "Code Pages". microsoft.com. Microsoft. Archived from the original on 2025-08-06. Retrieved 2025-08-06.
  26. ^ "pentaho/pentaho-reporting". GitHub. Archived from the original on 2025-08-06. Retrieved 2025-08-06.
  27. ^ a b c d e "Code Page Identifiers". Microsoft Developer Network. Microsoft. 2014. Archived from the original on 2025-08-06. Retrieved 2025-08-06.
  28. ^ a b c d e "Web Encodings - Internet Explorer - Encodings". WHATWG Wiki. 2025-08-06. Archived from the original on 2025-08-06. Retrieved 2025-08-06.
  29. ^ Foller, Antonin (2014) [2011]. "Western European (IA5) encoding - Windows charsets". WUtils.com - Online web utility and help. Motobit Software. Archived from the original on 2025-08-06. Retrieved 2025-08-06.
  30. ^ Foller, Antonin (2014) [2011]. "German (IA5) encoding - Windows charsets". WUtils.com - Online web utility and help. Motobit Software. Archived from the original on 2025-08-06. Retrieved 2025-08-06.
  31. ^ Foller, Antonin (2014) [2011]. "Swedish (IA5) encoding - Windows charsets". WUtils.com - Online web utility and help. Motobit Software. Archived from the original on 2025-08-06. Retrieved 2025-08-06.
  32. ^ Foller, Antonin (2014) [2011]. "Norwegian (IA5) encoding - Windows charsets". WUtils.com - Online web utility and help. Motobit Software. Archived from the original on 2025-08-06. Retrieved 2025-08-06.
  33. ^ Foller, Antonin (2014) [2011]. "US-ASCII encoding - Windows charsets". WUtils.com - Online web utility and help. Motobit Software. Archived from the original on 2025-08-06. Retrieved 2025-08-06.
  34. ^ a b c d e f g Paul, Matthias R. (2025-08-06), Technical info on undocumented DOS country info for LCASE, ARAMODE and CCTORC records, FreeDOS development list fd-dev at Topica, archived from the original on 2025-08-06, retrieved 2025-08-06
  35. ^ a b c d e f g h Brown, Ralf D. (2025-08-06). The x86 Interrupt List. 61.
  36. ^ a b c d e f g h Paul, Matthias R. (2025-08-06). NWDOS-TIPs — Tips & Tricks rund um Novell DOS 7, mit Blick auf undokumentierte Details, Bugs und Workarounds. MPDOSTIP (in German) (3 ed.). Archived from the original on 2025-08-06. Retrieved 2025-08-06. (NB. NWDOSTIP.TXT is a comprehensive work on Novell DOS 7 and OpenDOS 7.01, including the description of many undocumented features and internals. It is part of the author's yet larger MPDOSTIP.ZIP collection maintained up to 2001 and distributed on many sites at the time. The provided link points to a HTML-converted older version of the NWDOSTIP.TXT file.)
  37. ^ a b c d e f g h Paul, Matthias R. (2025-08-06). NWDOS-TIPs — Tips & Tricks rund um Novell DOS 7, mit Blick auf undokumentierte Details, Bugs und Workarounds. MPDOSTIP (in German) (3 ed.).
  38. ^ "770". Archived from the original on 2025-08-06. Retrieved 2025-08-06. From Lithuanian Lika Software
  39. ^ a b c d e f g h "LIKIT". www.likit.lt. Archived from the original on 2025-08-06. Retrieved 2025-08-06.
  40. ^ "771". Archived from the original on 2025-08-06. Retrieved 2025-08-06. From Lithuanian Lika Software
  41. ^ "772". Archived from the original on 2025-08-06. Retrieved 2025-08-06. From Lithuanian Lika Software
  42. ^ "774". Archived from the original on 2025-08-06. Retrieved 2025-08-06. From Lithuanian Lika Software
  43. ^ a b c "lietuvybė.lt - Rašmenų koduotės" [lietuvybė.lt - Character encodings] (in Lithuanian). Archived from the original on 2025-08-06. Retrieved 2025-08-06.
  44. ^ Hogan, Thom (1992). Die PC-Referenz für Programmierer (in German) (2 ed.). Systhema Verlag GmbH. ISBN 3-89390-272-4. (NB. This book is the German translation of "The Programmer's PC Sourcebook" by Microsoft Press. It mentions the code page ID 854 for Spain.)
  45. ^ a b c d e f g h i j k l m n o p q r s t u v w x y z "Star LC 8021 User's Manual" (PDF). Archived (PDF) from the original on 2025-08-06. Retrieved 2025-08-06.
  46. ^ IBM. "SBCS code page information document - CPGID 00437". Archived from the original on 2025-08-06. Retrieved 2025-08-06.
  47. ^ "Encoding". WHATWG. 2025-08-06. sec. 4.2 Names and labels. Archived from the original on 2025-08-06. Retrieved 2025-08-06.
  48. ^ "Usage Statistics of Character Encodings for Websites, (updated daily)". w3techs.com. Retrieved 2025-08-06.
  49. ^ "UTF-8 Usage Statistics". trends.builtwith.com. Archived from the original on 2025-08-06. Retrieved 2025-08-06.