

Protégé and UTF-8

Protégé is UTF-8 compatible, which means it can process and display UTF-8 characters. Many editors prepare their concepts or concept information in Microsoft Word or Excel, then copy it and paste it into Protégé. This can cause problems because Microsoft applications are not purely UTF-8 compatible, and the paste operation can introduce characters that Protégé does not know how to process. The instructions below show how to avoid these problems.
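Before pasting, it can help to check a piece of text for the characters that typically cause trouble. The following is a minimal Python sketch, not part of Protégé: it assumes the problem characters are the ones Windows-1252 assigns to bytes 0x80-0x9F, and the function names `cp1252_specials` and `find_suspect_chars` are hypothetical.

```python
def cp1252_specials():
    """Return the characters that Windows-1252 defines in the range 0x80-0x9F."""
    chars = set()
    for byte in range(0x80, 0xA0):
        try:
            chars.add(bytes([byte]).decode("cp1252"))
        except UnicodeDecodeError:
            # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in Windows-1252.
            pass
    return chars


def find_suspect_chars(text):
    """List (index, character, code point) for each Windows-1252 special in text."""
    specials = cp1252_specials()
    return [
        (index, ch, f"U+{ord(ch):04X}")
        for index, ch in enumerate(text)
        if ch in specials
    ]


if __name__ == "__main__":
    # Curly quotes pasted from a word processor are reported with their positions.
    for hit in find_suspect_chars("He said \u201chello\u201d"):
        print(hit)
```

An empty result suggests the text contains none of these characters. It does not guarantee that every other character is handled correctly downstream.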

...

If you are running on a Microsoft platform, cut and paste from documents produced by Microsoft software, or allow comments to be posted by people who might do either, you need to be aware of the 27 characters that Windows-1252 (win-1252) assigns to decimal codes 128-159, which are encoded differently in Unicode. The following table summarizes them.

| character | win-1252 decimal | win-1252 hex | win-1252 octal | unicode html | unicode xml | unicode url |
|---|---|---|---|---|---|---|
| € | 128 | 80 | 200 | &euro; | &#x20AC; | %E2%82%AC |
| ‚ | 130 | 82 | 202 | &sbquo; | &#x201A; | %E2%80%9A |
| ƒ | 131 | 83 | 203 | &fnof; | &#x192; | %C6%92 |
| „ | 132 | 84 | 204 | &bdquo; | &#x201E; | %E2%80%9E |
| … | 133 | 85 | 205 | &hellip; | &#x2026; | %E2%80%A6 |
| † | 134 | 86 | 206 | &dagger; | &#x2020; | %E2%80%A0 |
| ‡ | 135 | 87 | 207 | &Dagger; | &#x2021; | %E2%80%A1 |
| ˆ | 136 | 88 | 210 | &circ; | &#x2C6; | %CB%86 |
| ‰ | 137 | 89 | 211 | &permil; | &#x2030; | %E2%80%B0 |
| Š | 138 | 8A | 212 | &Scaron; | &#x160; | %C5%A0 |
| ‹ | 139 | 8B | 213 | &lsaquo; | &#x2039; | %E2%80%B9 |
| Œ | 140 | 8C | 214 | &OElig; | &#x152; | %C5%92 |
| Ž | 142 | 8E | 216 | &#x17D; | &#x17D; | %C5%BD |
| ‘ | 145 | 91 | 221 | &lsquo; | &#x2018; | %E2%80%98 |
| ’ | 146 | 92 | 222 | &rsquo; | &#x2019; | %E2%80%99 |
| “ | 147 | 93 | 223 | &ldquo; | &#x201C; | %E2%80%9C |
| ” | 148 | 94 | 224 | &rdquo; | &#x201D; | %E2%80%9D |
| • | 149 | 95 | 225 | &bull; | &#x2022; | %E2%80%A2 |
| – | 150 | 96 | 226 | &ndash; | &#x2013; | %E2%80%93 |
| — | 151 | 97 | 227 | &mdash; | &#x2014; | %E2%80%94 |
| ˜ | 152 | 98 | 230 | &tilde; | &#x2DC; | %CB%9C |
| ™ | 153 | 99 | 231 | &trade; | &#x2122; | %E2%84%A2 |
| š | 154 | 9A | 232 | &scaron; | &#x161; | %C5%A1 |
| › | 155 | 9B | 233 | &rsaquo; | &#x203A; | %E2%80%BA |
| œ | 156 | 9C | 234 | &oelig; | &#x153; | %C5%93 |
| ž | 158 | 9E | 236 | &#x17E; | &#x17E; | %C5%BE |
| Ÿ | 159 | 9F | 237 | &Yuml; | &#x178; | %C5%B8 |
| (no-break space) | 160 | A0 | 240 | &nbsp; | &#xA0; | %C2%A0 |

Three of these characters are also listed by their Unicode code points, with decimal numeric references:

| character | unicode decimal | unicode hex | unicode html |
|---|---|---|---|
| – | 8211 | 2013 | &#8211; |
| — | 8212 | 2014 | &#8212; |
| ’ | 8217 | 2019 | &#8217; |
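The columns of the table can be derived mechanically from a single Windows-1252 byte value. The sketch below uses only the Python standard library. The function name `table_row` and the dictionary keys are hypothetical, and the choice to fall back to a hexadecimal reference when no named HTML entity exists is an assumption that mirrors the rows for Ž and ž.

```python
from html.entities import codepoint2name
from urllib.parse import quote


def table_row(byte):
    """Build one row of the table for a Windows-1252 byte value (0-255)."""
    ch = bytes([byte]).decode("cp1252")   # raises UnicodeDecodeError if unassigned
    codepoint = ord(ch)                   # Unicode code point of the character
    name = codepoint2name.get(codepoint)  # named HTML entity, if one exists
    hex_ref = f"&#x{codepoint:X};"
    return {
        "character": ch,
        "decimal": str(byte),
        "hex": f"{byte:02X}",
        "octal": f"{byte:o}",
        "html": f"&{name};" if name else hex_ref,
        "xml": hex_ref,
        "url": quote(ch),                 # percent-encoded UTF-8 bytes
    }


if __name__ == "__main__":
    for value in (0x80, 0x8E, 0x93):
        print(table_row(value))
```

The URL column shows that each of these single-byte Windows-1252 characters needs two or three bytes in UTF-8, so the two encodings cannot be read interchangeably.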

 

 

 

 

 


Converting Microsoft Characters to UTF-8 in Word 2003

Thanks to Liverpool John Moores University for providing the following instructions:

  • Once your editing in Word is complete, choose File > Save As...
    Plain Text option in the File Name field of the Save As dialog box.
  • From the format drop-down menu, choose the option 'Plain Text (*.txt)'.
  • Save the file to a known location, your desktop for example.
  • Before the file saves, a dialog box will appear asking you about encoding: Choose 'other encoding'.
  • Then make sure you check the 'Allow Character Substitution' box.
  • Your document is then previewed, and you will see all characters such as 'curly quotes' are replaced with 'safe' ones.
    File Conversion dialog box.
    You can then open the saved .txt file and safely copy the contents you require into a web page that uses UTF-8 encoding.
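A similar substitution can be applied to text that has already left Word. The following Python sketch replaces common "curly" punctuation with plain ASCII. The mapping is an illustrative assumption and is not the substitution table Word itself uses. The names `SAFE_REPLACEMENTS` and `to_safe_text` are hypothetical.

```python
# Assumed mapping from typographic characters to plain ASCII stand-ins.
SAFE_REPLACEMENTS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201A": ",",    # single low-9 quotation mark
    "\u201C": '"',    # left double quotation mark
    "\u201D": '"',    # right double quotation mark
    "\u201E": '"',    # double low-9 quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u2022": "*",    # bullet
    "\u00A0": " ",    # no-break space
}

_TRANSLATION = str.maketrans(SAFE_REPLACEMENTS)


def to_safe_text(text):
    """Return text with the mapped typographic characters replaced by ASCII."""
    return text.translate(_TRANSLATION)


if __name__ == "__main__":
    print(to_safe_text("\u201cIt\u2019s done\u201d \u2013 nearly\u2026"))
```

Characters that are not in the mapping pass through unchanged, so accented letters and other legitimate non-ASCII text are preserved.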

...

  • Go to File > Save As
  • In the lower left you will see the option "Tools"
    screenshot illustrating step
  • Within the Tools drop down, select Web Options
    screenshot illustrating step
  • In the Web Options dialog, go to the Encoding tab and select UTF-8
    screenshot illustrating step
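If a text file was saved without the UTF-8 option, it can be converted afterwards. This is a minimal Python sketch that assumes the file was written in Windows-1252. The function name `reencode_to_utf8` and its parameters are hypothetical.

```python
from pathlib import Path


def reencode_to_utf8(src, dst, source_encoding="cp1252"):
    """Read src in source_encoding and write the same text to dst as UTF-8."""
    # Work on raw bytes so that line endings are carried over unchanged.
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
```

If the source file is not actually Windows-1252, the decode step may raise `UnicodeDecodeError` or silently produce the wrong characters, so it is worth checking the converted output before pasting it anywhere.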
