Understanding Unicode Characters

Summary: Unicode is a character-encoding scheme that works with a huge variety of characters. This tip explains what Unicode is and how it works with Word and Windows. (This tip works with Microsoft Word 97, Word 2000, Word 2002, Word 2003, and Word 2007.)

You may have heard the term Unicode before and wondered what it meant. Single-byte encoding schemes (such as the ANSI code pages used by Windows) allow at most 256 unique characters to be encoded and displayed on the computer; ASCII, which uses only seven bits, allows just 128. In the global computer community, where people need to work in their own languages, this is a problem. There are far more than 256 characters in common use throughout the world.
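To see the single-byte limit in practice, here is a minimal sketch in Python 3. The helper name fits_in_single_byte is made up for this illustration, and cp1252 (the Western European Windows code page) is used as an assumed stand-in for "ANSI."

```python
# Characters drawn from several different scripts.
samples = ["A", "\u00e9", "\u0416", "\u4e2d"]  # A, e-acute, Cyrillic Zhe, a Han character


def fits_in_single_byte(ch: str, codec: str = "cp1252") -> bool:
    """Return True if ch can be stored as exactly one byte in the given codec.

    Hypothetical helper for illustration; cp1252 is an assumed example
    of a single-byte Windows code page.
    """
    try:
        return len(ch.encode(codec)) == 1
    except UnicodeEncodeError:
        # The code page has no slot for this character at all.
        return False


results = {ch: fits_in_single_byte(ch) for ch in samples}
for ch, ok in results.items():
    print(f"U+{ord(ch):04X} fits in cp1252: {ok}")
```

The Latin letters fit, but the Cyrillic and Han characters do not, because the code page has no room left for them.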

This is where Unicode comes into play. The Unicode standard was originally designed to use two bytes (sixteen bits) for encoding each character, which means that 65,536 unique characters can be defined. This standard, devised and promoted by the Unicode Consortium (http://www.unicode.org/), allows for the display of virtually all the unique language characters in the world. A team of computer professionals, linguists, and scholars worked on the actual development of Unicode.
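The two-byte arithmetic can be checked with a short sketch, again assuming Python 3. It uses the standard "utf-16-le" codec as one example of a two-byte form; the helper name two_byte_form is made up for this illustration.

```python
# Sixteen bits give 2**16 distinct values.
code_values = 2 ** 16
print(code_values)


def two_byte_form(ch: str) -> bytes:
    """Encode one character as little-endian UTF-16 (hypothetical helper)."""
    return ch.encode("utf-16-le")


# Latin, Cyrillic, and Han characters each occupy exactly two bytes.
for ch in ["A", "\u0416", "\u4e2d"]:
    encoded = two_byte_form(ch)
    print(f"U+{ord(ch):04X} -> {len(encoded)} bytes: {encoded.hex()}")
```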

Two bytes per character are enough to encode most of the characters used in the world's major languages. There is an extension mechanism built into the standard as well (pairs of reserved 16-bit values known as surrogates), which makes it possible to encode just over a million more characters, if necessary. This ability should be sufficient for all known language requirements, plus the encoding of all the historic scripts of the world. (This includes languages and symbols that are no longer in use.)
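The extension mechanism works by combining two 16-bit values, called a surrogate pair in the Unicode standard, to represent one character beyond the first 65,536. The following is a minimal sketch of that calculation, assuming Python 3; the function name to_surrogate_pair is made up for this illustration.

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into a high and a low surrogate.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the extended range")
    offset = code_point - 0x10000      # a 20-bit number
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


# How many characters the extension mechanism adds.
extra_characters = 0x10FFFF - 0x10000 + 1
print(extra_characters)

# U+1D11E is MUSICAL SYMBOL G CLEF, one of the musical symbols in Unicode.
high, low = to_surrogate_pair(0x1D11E)
print(f"{high:04X} {low:04X}")
```

Twenty bits split across the two values is where the figure of roughly a million extra characters comes from.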

As presently defined, Unicode 5.0 includes codes for characters used in the major written languages of the world, including Arabic, Armenian, Balinese, Bengali, Bopomofo, Buhid, Canadian Syllabics, Cherokee, Cyrillic, Deseret, Devanagari, Ethiopic, Georgian, Gothic, Greek, Gujarati, Gurmukhi, Han, Hangul, Hanunóo, Hebrew, Hiragana, Kannada, Katakana, Khmer, Lao, Latin, Malayalam, Mongolian, Myanmar, Ogham, Old Italic (Etruscan), Oriya, Phoenician, Runic, Sinhala, Syriac, Tagalog, Tagbanwa, Tamil, Telugu, Thaana, Thai, Tibetan, and Yi. Work is progressing to add more characters from lesser-known languages.

Unicode also includes many different symbols, including numbers, general diacritics, general punctuation, general symbols, dingbats, arrows, blocks, box drawing forms, geometric shapes, mathematical symbols, musical symbols (Western and Byzantine), technical symbols, braille patterns, and Kangxi radicals.
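If you are curious what a particular symbol is officially called, Python 3's standard unicodedata module can look it up. This is only a sketch; the helper name describe and the handful of sample symbols are chosen for illustration.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code value and official Unicode name of one character.

    Hypothetical helper for illustration only.
    """
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# One arrow, one block, one musical symbol, one braille pattern, one dingbat.
symbols = ["\u2192", "\u2588", "\u266a", "\u2803", "\u2702"]
for ch in symbols:
    print(describe(ch))
```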

Unicode is supported in all modern versions of Windows and Word.

Tip #1788 applies to Microsoft Word versions: 97 | 2000 | 2002 | 2003 | 2007

