[NCI DTP Data] Mail Feed · Confluence Syndication Feed · https://wiki.nci.nih.gov · Chemical Data · Zhang, Cary (NIH/NCI) [C] · tag:wiki.nci.nih.gov,2009:page-155844992-33 · 2024-03-26T14:32:21Z · 2013-09-03T12:16:14Z
<div class="feed"> <p>
Page
<b>edited</b> by
<a href="https://wiki.nci.nih.gov/display/~zhangz15">Zhang, Cary (NIH/NCI) [C]</a>
</p>
<div style="border-top: 1px solid #ddd; border-bottom: 1px solid #ddd; padding: 10px;">
<h2 id="ChemicalData-2Dstructures">2D structures</h2><p>The downloadable file contains 2D structures as MDL text in an SD file for all released NSCs. The SD file also includes the compound information listed below. These data can also be downloaded individually as separate files, as shown in the Other Compound Identifiers table.</p><ul><li>CAS Number</li><li>PubChem SID</li><li>SMILES Strings</li><li>Chemical Names</li></ul><div class="table-wrap"><table class="wrapped confluenceTable"><colgroup><col/><col/><col/><col/><col/></colgroup><tbody><tr><th scope="col" class="confluenceTh"><br/></th><th scope="col" class="confluenceTh">Data File Size</th><th scope="col" class="confluenceTh">Download Size</th><th style="text-align: center;" scope="col" class="confluenceTh">Link</th><th style="text-align: center;" scope="col" class="confluenceTh">Version</th></tr><tr><td class="confluenceTd">2D Structures</td><td style="text-align: center;" class="confluenceTd">512 MB</td><td style="text-align: center;" class="confluenceTd">66 MB</td><td class="confluenceTd"><a href="https://wiki.nci.nih.gov/download/attachments/155844992/2D_Structures.zip?version=1&modificationDate=1710550080260&api=v2" data-linked-resource-id="577241090" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="2D_Structures.zip" data-nice-type="Zip Archive" data-linked-resource-content-type="application/zip" data-linked-resource-container-id="155844992" data-linked-resource-container-version="33">2D_Structures.zip</a> </td><td class="confluenceTd">March 2024</td></tr></tbody></table></div><h2 id="ChemicalData-OtherCompoundIdentifiers">Other Compound Identifiers</h2><div class="table-wrap"><table class="wrapped relative-table confluenceTable" style="width: 63.221%;"><colgroup><col style="width: 31.7929%;"/><col style="width: 14.4729%;"/><col style="width: 21.2348%;"/><col style="width: 32.5013%;"/></colgroup><tbody><tr><th style="text-align: center;" scope="col" 
class="confluenceTh">Identifier</th><th style="text-align: center;" class="confluenceTh">File Size</th><th style="text-align: center;" scope="col" class="confluenceTh">Link</th><th style="text-align: center;" scope="col" class="confluenceTh">Version</th></tr><tr><td class="confluenceTd">CAS Numbers</td><td style="text-align: center;" class="confluenceTd">3.7 MB</td><td class="confluenceTd"><a href="https://wiki.nci.nih.gov/download/attachments/155844992/nsc_cas.csv?version=1&modificationDate=1710381783000&api=v2" data-linked-resource-id="575013769" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="nsc_cas.csv" data-linked-resource-content-type="text/csv" data-linked-resource-container-id="155844992" data-linked-resource-container-version="33">nsc_cas.csv</a></td><td style="text-align: center;" class="confluenceTd">March 2024</td></tr><tr><td class="confluenceTd">PubChem SID & CID</td><td style="text-align: center;" class="confluenceTd">6.9 MB</td><td class="confluenceTd"><a href="https://wiki.nci.nih.gov/download/attachments/155844992/nsc_sid_cid.csv?version=1&modificationDate=1710381805000&api=v2" data-linked-resource-id="575013770" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="nsc_sid_cid.csv" data-linked-resource-content-type="text/csv" data-linked-resource-container-id="155844992" data-linked-resource-container-version="33">nsc_sid_cid.csv</a></td><td style="text-align: center;" class="confluenceTd">March 2024</td></tr><tr><td class="confluenceTd">SMILES Strings</td><td style="text-align: center;" class="confluenceTd">16 MB</td><td class="confluenceTd"><a href="https://wiki.nci.nih.gov/download/attachments/155844992/nsc_smiles.csv?version=1&modificationDate=1710381820000&api=v2" data-linked-resource-id="575013771" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="nsc_smiles.csv" 
data-linked-resource-content-type="text/csv" data-linked-resource-container-id="155844992" data-linked-resource-container-version="33">nsc_smiles.csv</a></td><td style="text-align: center;" class="confluenceTd">March 2024</td></tr><tr><td class="confluenceTd">Chemical Names</td><td style="text-align: center;" class="confluenceTd">11 MB</td><td class="confluenceTd"><a href="https://wiki.nci.nih.gov/download/attachments/155844992/nsc_chemcal_name.csv?version=1&modificationDate=1710382716000&api=v2" data-linked-resource-id="575013774" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="nsc_chemcal_name.csv" data-linked-resource-content-type="text/csv" data-linked-resource-container-id="155844992" data-linked-resource-container-version="33">nsc_chemcal_name.csv</a></td><td style="text-align: center;" class="confluenceTd">March 2024</td></tr></tbody></table></div><h2 id="ChemicalData-OldVersions">Old Versions</h2><p><a href="https://wiki.nci.nih.gov/download/attachments/155844992/NSC_CAS_Sept2013.csv?version=1&modificationDate=1378729397000&api=v2" data-linked-resource-id="158334979" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="NSC_CAS_Sept2013.csv" data-linked-resource-content-type="text/csv" data-linked-resource-container-id="155844992" data-linked-resource-container-version="33">NSC_CAS_Sept2013.csv</a> NSC to CAS number. CAS numbers are available for only 130,893 NSCs.</p><p><a href="https://wiki.nci.nih.gov/download/attachments/155844992/NSC_PubChemSID.csv?version=1&modificationDate=1378730186000&api=v2" data-linked-resource-id="158334980" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="NSC_PubChemSID.csv" data-linked-resource-content-type="text/csv" data-linked-resource-container-id="155844992" data-linked-resource-container-version="33">NSC_PubChemSID.csv</a> NSC to PubChem SID. 
This is the SID from the DTP upload to PubChem.</p><p><a href="https://wiki.nci.nih.gov/download/attachments/155844992/divii_mlsmr.csv?version=1&modificationDate=1385557043000&api=v2" data-linked-resource-id="168919900" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="divii_mlsmr.csv" data-linked-resource-content-type="text/csv" data-linked-resource-container-id="155844992" data-linked-resource-container-version="33">divii_mlsmr.csv</a> NSC to PubChem SID for the Diversity Set.</p><h2 id="ChemicalData-2Dstructures(OldVersions)">2D structures (Old Versions)</h2><p><a href="https://wiki.nci.nih.gov/download/attachments/155844992/Chem2D_Jun2016.zip?version=1&modificationDate=1486993270000&api=v2" data-linked-resource-id="339380766" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="Chem2D_Jun2016.zip" data-nice-type="Zip Archive" data-linked-resource-content-type="application/zip" data-linked-resource-container-id="155844992" data-linked-resource-container-version="33">All Open (June 2016 Release)</a> 284176 compounds. 81 MB compressed, uncompresses to 710 MB</p><p><a href="https://wiki.nci.nih.gov/download/attachments/155844992/Open_2D_Oct2014.zip?version=1&modificationDate=1414408672000&api=v2" data-linked-resource-id="245366785" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="Open_2D_Oct2014.zip" data-nice-type="Zip Archive" data-linked-resource-content-type="application/zip" data-linked-resource-container-id="155844992" data-linked-resource-container-version="33">All Open (Sept 2014 Release)</a> 280816 compounds. 
78 MB compressed, uncompresses to 732 MB</p><p><a href="https://wiki.nci.nih.gov/download/attachments/155844992/March2012_2d.zip?version=1&modificationDate=1378211224000&api=v2" data-linked-resource-id="155975729" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="March2012_2d.zip" data-nice-type="Zip Archive" data-linked-resource-content-type="application/zip" data-linked-resource-container-id="155844992" data-linked-resource-container-version="33">All Open (March 2012 Release)</a> 273885 compounds. 64 MB compressed, uncompresses to 648 MB</p><p><a href="https://wiki.nci.nih.gov/download/attachments/155844992/mechset2d.sdf?version=1&modificationDate=1378211104000&api=v2" data-linked-resource-id="155975727" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="mechset2d.sdf" data-linked-resource-content-type="application/octet-stream" data-linked-resource-container-id="155844992" data-linked-resource-container-version="33">Mechanistic Set </a></p><h2 id="ChemicalData-3Dstructures(OldVersions)">3D structures (Old Versions)</h2><p><a href="https://wiki.nci.nih.gov/download/attachments/155844992/mechset3d.sdf?version=1&modificationDate=1378211142000&api=v2" data-linked-resource-id="155975728" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="mechset3d.sdf" data-linked-resource-content-type="application/octet-stream" data-linked-resource-container-id="155844992" data-linked-resource-container-version="33">Mechanistic Set</a></p><h2 id="ChemicalData-SMILESstrings(OldVersions)">SMILES strings (Old Versions)</h2><p>SMILES strings - 237,771 structures in <a href="http://www.daylight.com/dayhtml/smiles/" class="external-link" rel="nofollow">SMILES</a> format. This database contains essentially all open structures in the NCI database up until about June, 1995. 
It includes metal-containing compounds and other 'weird stuff'. It is therefore up to the user to ascertain the usefulness of any of these SMILES strings for the intended purpose. Because different conversion programs produce different output, two versions of the SMILES database are provided.</p><p><a href="https://wiki.nci.nih.gov/download/attachments/155844992/NCIOPENB_SMI.BIN?version=1&modificationDate=1378210574000&api=v2" data-linked-resource-id="155975725" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="NCIOPENB_SMI.BIN" data-linked-resource-content-type="application/macbinary" data-linked-resource-container-id="155844992" data-linked-resource-container-version="33">Converted using Babel</a> 4.2 MB compressed using standard Unix compress, uncompresses to ca. 15 MB. (<a href="https://wiki.nci.nih.gov/download/attachments/155844992/nciopenb_smi.zip?version=1&modificationDate=1476371553000&api=v2" data-linked-resource-id="329155474" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="nciopenb_smi.zip" data-nice-type="Zip Archive" data-linked-resource-content-type="application/zip" data-linked-resource-container-id="155844992" data-linked-resource-container-version="33">zip compressed</a>) The program <a href="http://openbabel.org/wiki/FAQ" class="external-link" rel="nofollow">Babel</a> v. 1.6 was used to convert 3D coordinates, which had been generated by the program <a href="http://www.molecular-networks.com/software/corina/index.html" class="external-link" rel="nofollow">Corina</a> v. 1.7 from the connection tables. (Babel needs 3D coordinates when reading SD files.) The resulting Babel output was modified by simple string substitution to solve the problem of nitro groups lacking formal charges, which leads many SMILES readers to create an -N-O-H group. 
Thus, N(=O)O was replaced by [N+](=O)[O-], and N(=O)(O) was replaced by [N+](=O)([O-]).</p><p><a href="https://wiki.nci.nih.gov/download/attachments/155844992/NCIOPENC_SMI.BIN?version=1&modificationDate=1378210574000&api=v2" data-linked-resource-id="155975726" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="NCIOPENC_SMI.BIN" data-linked-resource-content-type="application/macbinary" data-linked-resource-container-id="155844992" data-linked-resource-container-version="33">Converted using CACTVS</a> 4.4 MB compressed using standard Unix compress, uncompresses to ca. 15 MB. (<a href="https://wiki.nci.nih.gov/download/attachments/155844992/nciopenc_smi.zip?version=1&modificationDate=1476371593000&api=v2" data-linked-resource-id="329155475" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="nciopenc_smi.zip" data-nice-type="Zip Archive" data-linked-resource-content-type="application/zip" data-linked-resource-container-id="155844992" data-linked-resource-container-version="33">zip compressed </a>) The program <a href="http://www2.ccc.uni-erlangen.de/software/cactvs/" class="external-link" rel="nofollow">CACTVS</a> v. 3.2 was used to convert the connection tables to SMILES strings. 
Thanks to <a href="mailto:wdi@xemistry.com" class="external-link" rel="nofollow">Wolf-Dietrich Ihlenfeldt</a> for providing us with the conversion scripts that handle the formal charge problem and other 'unusual stuff' in the NCI database.</p><h2 id="ChemicalData-ChemicalNames(OldVersions)">Chemical Names (Old Versions)</h2><p><a href="https://wiki.nci.nih.gov/download/attachments/155844992/chemnames_Aug2013.zip?version=1&modificationDate=1378214926000&api=v2" data-linked-resource-id="155975748" data-linked-resource-version="1" data-linked-resource-type="attachment" data-linked-resource-default-alias="chemnames_Aug2013.zip" data-nice-type="Zip Archive" data-linked-resource-content-type="application/zip" data-linked-resource-container-id="155844992" data-linked-resource-container-version="33">chemnames_Aug2013.zip</a> All available chemical names. The first field is the NSC number, the second field is the name, and the third field is the name type (most are just the generic "Chemical Name"). The field separator is "|". 
Note that chemical names are not available for most of the compounds, and many of the names that are present are systematic names that might not be very useful for searching.</p><h4 id="ChemicalData-Nametypes">Name types</h4><ul><li>USAN - <a href="http://en.wikipedia.org/wiki/United_States_Adopted_Name" class="external-link" rel="nofollow">United States Adopted Name</a></li><li>VAN - CAS valid ambiguous name</li><li>DOT Name - Department of Transportation Name</li><li>Brand Name</li><li><p>6th C.I.</p></li><li><p>7th C.I.</p></li><li><p>8th C.I.</p></li><li><p>9th C.I.</p></li><li><p>Belgian Name</p></li><li><p>Czech Name</p></li><li><p>Dutch Name</p></li><li><p>French Name</p></li><li><p>German Name</p></li><li><p>Hindustan Name</p></li><li><p>INN:BAN</p></li><li><p>Italian Name</p></li><li><p>Japanese Name</p></li><li><p>Persian Name</p></li><li><p>Polish Name</p></li><li><p>Portuguese Name</p></li><li><p>Roumanian Name</p></li><li><p>Russian Name</p></li><li><p>Spanish Name</p></li><li><p>Swedish Name</p></li><li><p>Chemical Name - not in one of the other categories or unknown type</p></li></ul>
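<p>The three-field, "|"-separated layout described for chemnames_Aug2013.zip can be read with a few lines of Python. The following is only a minimal sketch: the names <code>ChemName</code>, <code>parse_chemnames</code> and <code>SAMPLE_LINES</code> are hypothetical, the sample rows are invented placeholders rather than real NSC data, and the sketch assumes there is no header row and that a "|" inside a name (if it ever occurs) belongs to the name field.</p>

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class ChemName(NamedTuple):
    nsc: str        # NSC number, kept as text so nothing is lost in conversion
    name: str       # chemical name
    name_type: str  # e.g. "USAN", "Brand Name", "Chemical Name"


def parse_chemnames(lines: Iterable[str]) -> Iterator[ChemName]:
    """Yield one record per 'NSC|name|name type' line."""
    for line in lines:
        parts = line.rstrip("\r\n").split("|")
        if len(parts) < 3:
            continue  # blank or malformed line: skipped (assumed policy)
        # Assumption: the first field is always the NSC number and the last
        # field is always the name type; anything in between is the name.
        name = "|".join(parts[1:-1]).strip()
        yield ChemName(parts[0].strip(), name, parts[-1].strip())


# Invented placeholder rows for illustration only; not taken from the real file.
SAMPLE_LINES = [
    "1|Example compound A|Chemical Name\n",
    "1|Examplinol|USAN\n",
    "2|1,1'-Biphenyl, example derivative|9th C.I.\n",
    "\n",
]

records = list(parse_chemnames(SAMPLE_LINES))

# Count how many names of each type are present.
type_counts = Counter(record.name_type for record in records)
print(type_counts)
```

<p>For the real archive, the lines could be supplied by <code>zipfile.ZipFile(...).open(...)</code> wrapped in <code>io.TextIOWrapper</code>. The member name and text encoding are not stated on this page, so both would need to be checked first.</p>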
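<p>The nitro-group correction described under "SMILES strings (Old Versions)" is a plain text substitution, so it can be illustrated directly. This is a sketch of the two replacements quoted there, not the script that was actually used: the function name <code>add_nitro_charges</code> is hypothetical, and the input strings are made-up examples rather than lines from the Babel output.</p>

```python
# The two literal substitutions quoted in the text, as (pattern, replacement).
NITRO_SUBSTITUTIONS = [
    ("N(=O)(O)", "[N+](=O)([O-])"),
    ("N(=O)O", "[N+](=O)[O-]"),
]


def add_nitro_charges(smiles: str) -> str:
    """Apply both textual replacements to one SMILES string."""
    for pattern, replacement in NITRO_SUBSTITUTIONS:
        smiles = smiles.replace(pattern, replacement)
    return smiles


# Made-up inputs for illustration; not taken from the NCI files.
for example in ("c1ccccc1N(=O)O", "N(=O)(O)c1ccccc1", "CCO"):
    print(example, "->", add_nitro_charges(example))
```

<p>Because the match is purely textual, any other substring that happens to spell one of the two patterns is rewritten as well. No chemical perception is involved, which is consistent with the caveat that users must judge the usefulness of these SMILES strings themselves.</p>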
</div>
<div style="padding: 10px 0;">
<a href="https://wiki.nci.nih.gov/display/NCIDTPdata/Chemical+Data">View Online</a>
·
<a href="https://wiki.nci.nih.gov/pages/diffpagesbyversion.action?pageId=155844992&revisedVersion=33&originalVersion=32">View Changes Online</a>
</div>
</div>