{scrollbar:icons=false}

h1. Protégé and UTF-8

Protégé is UTF-8 compatible, which means it can process and display UTF-8 characters. Many editors prepare their concepts or concept information in Microsoft Word or Excel and then copy and paste the text into Protégé. This can cause problems because Microsoft software is not purely UTF-8 compatible: the paste operation can introduce characters, such as curly quotes, that Protégé does not know how to process. The instructions below show how to avoid these problems.

h2. Table of Microsoft characters and their UTF-8 equivalents

Thanks to [Intertwingly|http://intertwingly.net/stories/2004/04/14/i18n.html] for providing this table.

If you are running on a Microsoft platform, or cut and paste from documents produced by Microsoft software, or even allow comments to be posted by people who might be doing one of the above, you need to be aware of the 27 differences, summarized by the following table.

||character||win-1252 decimal||win-1252 hex||win-1252 octal||unicode html||unicode xml||unicode url||
|€|128|80|200|&euro;|&#x20AC;|%E2%82%AC|
|‚|130|82|202|&sbquo;|&#x201A;|%E2%80%9A|
|ƒ|131|83|203|&fnof;|&#x192;|%C6%92|
|„|132|84|204|&bdquo;|&#x201E;|%E2%80%9E|
|…|133|85|205|&hellip;|&#x2026;|%E2%80%A6|
|†|134|86|206|&dagger;|&#x2020;|%E2%80%A0|
|‡|135|87|207|&Dagger;|&#x2021;|%E2%80%A1|
|ˆ|136|88|210|&circ;|&#x2C6;|%CB%86|
|‰|137|89|211|&permil;|&#x2030;|%E2%80%B0|
|Š|138|8A|212|&Scaron;|&#x160;|%C5%A0|
|‹|139|8B|213|&lsaquo;|&#x2039;|%E2%80%B9|
|Œ|140|8C|214|&OElig;|&#x152;|%C5%92|
|Ž|142|8E|216|&#x17D;|&#x17D;|%C5%BD|
|‘|145|91|221|&lsquo;|&#x2018;|%E2%80%98|
|’|146|92|222|&rsquo;|&#x2019;|%E2%80%99|
|“|147|93|223|&ldquo;|&#x201C;|%E2%80%9C|
|”|148|94|224|&rdquo;|&#x201D;|%E2%80%9D|
|•|149|95|225|&bull;|&#x2022;|%E2%80%A2|
|--|150|96|226|&ndash;|&#x2013;|%E2%80%93|
|---|151|97|227|&mdash;|&#x2014;|%E2%80%94|
|˜|152|98|230|&tilde;|&#x2DC;|%CB%9C|
|™|153|99|231|&trade;|&#x2122;|%E2%84%A2|
|š|154|9A|232|&scaron;|&#x161;|%C5%A1|
|›|155|9B|233|&rsaquo;|&#x203A;|%E2%80%BA|
|œ|156|9C|234|&oelig;|&#x153;|%C5%93|
|ž|158|9E|236|&#x17E;|&#x17E;|%C5%BE|
|Ÿ|159|9F|237|&Yuml;|&#x178;|%C5%B8|

h2. Converting Microsoft Characters to UTF-8 in Word 2003

Thanks to [Liverpool John Moores University|http://www.ljmu.ac.uk/cis/webpublishing/81434.htm] for providing the following instructions:

* Once your editing in Word is complete, choose *File->Save As...* !saveas.gif|alt="screenshot illustrating step"!
* From the format drop-down menu, choose the option '*Plain Text (\*.txt)*'.
* Save the file to a known location, your desktop for example.
* Before the file saves, a dialog box will appear asking you about encoding. Choose '*other encoding*'.
* Then make sure you check the '*Allow Character Substitution*' box.
* Your document is then previewed, and you will see that all characters such as 'curly quotes' are replaced with 'safe' ones. !Unicode2003.gif|alt="screenshot illustrating step"!

You can then open the saved .txt file and safely copy the contents you require into a web page that uses UTF-8 encoding.

h2. Converting Microsoft Characters to UTF-8 in Word and Excel 2007

For Word and Excel 2007, the instructions are the same:

* Go to *File > Save As*.
* In the lower left of the dialog you will see the *Tools* option. !tools.jpg|alt="screenshot illustrating step"!
* Within the *Tools* drop-down, select *Web Options*. !WebOptions.jpg|alt="screenshot illustrating step"!
* In the *Web Options* dialog, go to the *Encoding* tab and select UTF-8. !unicode.jpg|alt="screenshot illustrating step"!

{scrollbar:icons=false}
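
The character table and the "character substitution" step described on this page can also be reproduced programmatically. The following is a minimal Python sketch, not part of Protégé or Microsoft Office. The function names (`cp1252_row`, `table_rows`, `to_utf8`, `substitute_safe`) and the `SAFE` mapping are hypothetical and chosen only for illustration, and the sketch assumes the pasted text was originally encoded as Windows-1252.

```python
from urllib.parse import quote

# Byte values in 0x80-0x9F that Windows-1252 leaves undefined
# (Python's "cp1252" codec refuses to decode them).
UNDEFINED = {0x81, 0x8D, 0x8F, 0x90, 0x9D}

# Hypothetical mapping of a few "smart" characters to plain ASCII stand-ins,
# loosely imitating Word's "Allow character substitution" option.
SAFE = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201C": '"', "\u201D": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "--",  # en dash, em dash
    "\u2026": "...",                # horizontal ellipsis
}


def cp1252_row(byte_value):
    """Return (character, decimal, hex, octal, xml, url) for one Windows-1252 byte."""
    char = bytes([byte_value]).decode("cp1252")
    return (
        char,
        byte_value,
        format(byte_value, "X"),
        format(byte_value, "o"),
        "&#x{:X};".format(ord(char)),  # numeric character reference
        quote(char),                   # percent-encoded UTF-8 bytes
    )


def table_rows():
    """Build one row for every defined Windows-1252 byte in 0x80-0x9F."""
    return [cp1252_row(b) for b in range(0x80, 0xA0) if b not in UNDEFINED]


def to_utf8(raw):
    """Re-encode bytes that were saved as Windows-1252 into UTF-8 bytes."""
    return raw.decode("cp1252").encode("utf-8")


def substitute_safe(text):
    """Replace the characters listed in SAFE with their ASCII stand-ins."""
    return text.translate(str.maketrans(SAFE))


if __name__ == "__main__":
    for row in table_rows():
        print(row)
```

For example, `cp1252_row(0x80)` returns `('€', 128, '80', '200', '&#x20AC;', '%E2%82%AC')`, matching the first row of the table, and `table_rows()` yields 27 rows. Whether to re-encode the text (`to_utf8`) or to replace the characters (`substitute_safe`) depends on whether the special characters need to be preserved in the target system.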