NIH | National Cancer Institute | NCI Wiki  

Versions Compared

...

divii_mlsmr.csv: NSC number to PubChem SID mapping for the Diversity Set.

2D structures

All Open (June 2016 Release) 284176 compounds. 81 MB compressed, uncompresses to 710 MB

All Open (Sept 2014 Release) 280816 compounds. 78 MB compressed, uncompresses to 732 MB

All Open (March 2012 Release) 273885 compounds. 64 MB compressed, uncompresses to 648 MB

...

Converted using Babel: 4.2 MB compressed using standard Unix compress, uncompresses to ca. 15 MB. (zip compressed) The program Babel v. 1.6 was used to convert the 3D structures, whose coordinates had been generated from the connection tables by the program Corina v. 1.7. (Babel needs 3D coordinates when reading SD files.) The resulting Babel output was modified by simple string substitution to fix nitro groups that lack formal charges, which lead many SMILES readers to create an -N-O-H group. Thus, N(=O)O was replaced by [N+](=O)[O-], and N(=O)(O) was replaced by [N+](=O)([O-]).
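The string substitution described above can be sketched in a few lines of Python. This is only an illustration of the two replacements named in the text, not the script that was actually used; the function name `fix_nitro` is hypothetical. Because it is a plain text replacement with no chemical perception, it would also rewrite any other occurrence of these substrings, for example one where the oxygen is bonded to a further atom.

```python
# Hypothetical helper: applies the two literal substitutions named in the text.
# It performs plain string replacement and does not parse or validate SMILES.
NITRO_SUBSTITUTIONS = [
    ("N(=O)(O)", "[N+](=O)([O-])"),
    ("N(=O)O", "[N+](=O)[O-]"),
]


def fix_nitro(smiles: str) -> str:
    """Replace uncharged nitro notations with their charge-separated forms."""
    for old, new in NITRO_SUBSTITUTIONS:
        smiles = smiles.replace(old, new)
    return smiles


if __name__ == "__main__":
    # Nitrobenzene written without formal charges
    print(fix_nitro("c1ccccc1N(=O)O"))
```

Neither search pattern occurs inside the other or inside either replacement string, so the order in which the two substitutions are applied does not affect the result.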

Converted using CACTVS: 4.4 MB compressed using standard Unix compress, uncompresses to ca. 15 MB. (zip compressed) The program CACTVS v. 3.2 was used to convert the connection tables to SMILES strings. Thanks to Wolf-Dietrich Ihlenfeldt for providing us with the conversion scripts that handle the formal charge problem and other 'unusual stuff' in the NCI database.

...