InChI generation mystery

I’ve been playing with InChIs quite a bit lately and have been quite surprised by some of the results. The most surprising thing is that the InChI one gets (and therefore the InChIKey, should one want to use it to index a chemical compound database) seems to depend on the input format fed to the InChI generator.

After seeing some more InChI-related things at Bio-IT World yesterday, I thought I’d have another go at InChIs. Again I was quite surprised by what I found, and actually even more confused.

Here’s what I did:

I took two molecules from PubChem, the diastereomers cis- and trans-1,2-dihydroxycyclohexane (1,2-cyclohexanediol), and created InChIKeys for both of them from different inputs. By no means did I consistently end up with two distinct InChIKeys for the two structures, let alone the same pair each time.

The SMILES strings and InChIKeys from PubChem are:
cis-1,2-cyclohexanediol: C1CC[C@@H]([C@@H](C1)O)O - PFURGBBHAOXLIO-OLQVQODUSA-N
trans-1,2-cyclohexanediol: C1CC[C@H]([C@@H](C1)O)O - PFURGBBHAOXLIO-PHDIDXHHSA-N
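The two PubChem keys already show the layout that the rest of this post relies on. A standard InChIKey is a 14-character hash of the main layer (formula and connectivity), then 8 hash characters covering the remaining layers (for these two molecules, only stereochemistry) followed by S for "standard" and A for version 1, and finally one protonation character. The minimal Python sketch below splits a key into those blocks; parse_inchikey and its field names are hypothetical helpers written for this illustration, not part of any InChI library. As far as I know, UHFFFAOY is the second block produced when there are no stereo or isotope layers at all, which makes keys without perceived stereochemistry easy to spot.

```python
import re

# Standard InChIKey layout (as I understand it):
#   14-char hash of the main layer (formula + connectivity)
#   '-' 8-char hash of the remaining layers, then 'S' (standard) or 'N',
#       then a version letter ('A' = version 1)
#   '-' one protonation character ('N' = no (de)protonation)
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")

# Second-block hash seen when there are no stereo/isotope layers.
EMPTY_LAYERS_HASH = "UHFFFAOY"


def parse_inchikey(key):
    """Split an InChIKey into named blocks (hypothetical helper)."""
    match = INCHIKEY_RE.match(key)
    if match is None:
        raise ValueError(f"not a valid InChIKey: {key!r}")
    skeleton, layers, flag, version, protonation = match.groups()
    return {
        "skeleton": skeleton,
        "layers": layers,
        "standard": flag == "S",
        "version": version,
        "protonation": protonation,
        "has_extra_layers": layers != EMPTY_LAYERS_HASH,
    }


cis = parse_inchikey("PFURGBBHAOXLIO-OLQVQODUSA-N")
trans = parse_inchikey("PFURGBBHAOXLIO-PHDIDXHHSA-N")
print(cis["skeleton"] == trans["skeleton"])  # True: same constitution
print(cis["layers"] == trans["layers"])      # False: stereo blocks differ
```

So the cis and trans isomers should share the first block and differ only in the second one.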

I used the Standard InChI generator (software version 1.02, released by IUPAC in January 2009), more specifically the Linux binary. Because this software doesn’t read SMILES, the SMILES of the input molecules have to be converted to SD files first. I did this with OpenBabel and with Pipeline Pilot. In addition, I used CML produced by OpenBabel as input.

And here is what I obtained:

Format | Source file from                                  | Coordinates     | InChIKey (cis)              | InChIKey (trans)            | Same as PubChem (cis/trans) | Distinct stereo
-------|---------------------------------------------------|-----------------|-----------------------------|-----------------------------|-----------------------------|----------------
SDF    | OpenBabel                                         | 0D              | PFURGBBHAOXLIO-UHFFFAOYSA-N | PFURGBBHAOXLIO-UHFFFAOYSA-N | N/N                         | N
SDF    | OpenBabel                                         | 3D (OB --gen3d) | PFURGBBHAOXLIO-OLQVQODUSA-N | PFURGBBHAOXLIO-OLQVQODUSA-N | Y/N                         | N
CML    | OpenBabel                                         | 0D              | FWITZFBVZWAIRX-OLQVQODUSA-N | FWITZFBVZWAIRX-PHDIDXHHSA-N | N/N                         | Y
CML    | OpenBabel                                         | 3D (OB --gen3d) | PFURGBBHAOXLIO-OLQVQODUSA-N | PFURGBBHAOXLIO-OLQVQODUSA-N | Y/N                         | N
SDF    | SciTegic Pipeline Pilot                           | 0D              | PFURGBBHAOXLIO-UHFFFAOYSA-N | PFURGBBHAOXLIO-UHFFFAOYSA-N | N/N                         | N
SDF    | SciTegic Pipeline Pilot                           | 2D              | PFURGBBHAOXLIO-OLQVQODUSA-N | PFURGBBHAOXLIO-PHDIDXHHSA-N | Y/Y                         | Y
SDF    | SciTegic Pipeline Pilot                           | 3D              | PFURGBBHAOXLIO-OLQVQODUSA-N | PFURGBBHAOXLIO-PHDIDXHHSA-N | Y/Y                         | Y
CML    | OpenBabel conversion of Pipeline Pilot files      | 0D              | DQQAEJAQEMHJBB-UHFFFAOYSA-N | DQQAEJAQEMHJBB-UHFFFAOYSA-N | N/N                         | N
CML    | OpenBabel conversion of Pipeline Pilot files      | 3D              | DQQAEJAQEMHJBB-UHFFFAOYSA-N | DQQAEJAQEMHJBB-UHFFFAOYSA-N | N/N                         | N
CML    | OpenBabel conversion of Pipeline Pilot 0D SD file | 3D (OB --gen3d) | PFURGBBHAOXLIO-OLQVQODUSA-N | PFURGBBHAOXLIO-OLQVQODUSA-N | Y/N                         | N
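To check the two right-hand columns mechanically, the table can be written down as data and the Y/N flags recomputed from the keys alone. This is a minimal Python sketch. The row labels are my abbreviations of the table rows. "Distinct stereo" is taken to mean that the second blocks of the cis and trans keys differ. That assumption only holds here because these structures carry no isotope labels, so stereo is the only non-main layer.

```python
PUBCHEM_CIS = "PFURGBBHAOXLIO-OLQVQODUSA-N"
PUBCHEM_TRANS = "PFURGBBHAOXLIO-PHDIDXHHSA-N"

# (label, cis key, trans key), copied from the results table.
RESULTS = [
    ("SDF, OpenBabel, 0D", "PFURGBBHAOXLIO-UHFFFAOYSA-N", "PFURGBBHAOXLIO-UHFFFAOYSA-N"),
    ("SDF, OpenBabel, 3D", "PFURGBBHAOXLIO-OLQVQODUSA-N", "PFURGBBHAOXLIO-OLQVQODUSA-N"),
    ("CML, OpenBabel, 0D", "FWITZFBVZWAIRX-OLQVQODUSA-N", "FWITZFBVZWAIRX-PHDIDXHHSA-N"),
    ("CML, OpenBabel, 3D", "PFURGBBHAOXLIO-OLQVQODUSA-N", "PFURGBBHAOXLIO-OLQVQODUSA-N"),
    ("SDF, Pipeline Pilot, 0D", "PFURGBBHAOXLIO-UHFFFAOYSA-N", "PFURGBBHAOXLIO-UHFFFAOYSA-N"),
    ("SDF, Pipeline Pilot, 2D", "PFURGBBHAOXLIO-OLQVQODUSA-N", "PFURGBBHAOXLIO-PHDIDXHHSA-N"),
    ("SDF, Pipeline Pilot, 3D", "PFURGBBHAOXLIO-OLQVQODUSA-N", "PFURGBBHAOXLIO-PHDIDXHHSA-N"),
    ("CML via OpenBabel from Pipeline Pilot, 0D", "DQQAEJAQEMHJBB-UHFFFAOYSA-N", "DQQAEJAQEMHJBB-UHFFFAOYSA-N"),
    ("CML via OpenBabel from Pipeline Pilot, 3D", "DQQAEJAQEMHJBB-UHFFFAOYSA-N", "DQQAEJAQEMHJBB-UHFFFAOYSA-N"),
    ("CML via OpenBabel --gen3d from PP 0D SDF", "PFURGBBHAOXLIO-OLQVQODUSA-N", "PFURGBBHAOXLIO-OLQVQODUSA-N"),
]


def yn(flag):
    return "Y" if flag else "N"


def summarize(cis_key, trans_key):
    """Recompute (same as PubChem cis, same as PubChem trans, distinct stereo)."""
    same_cis = cis_key == PUBCHEM_CIS
    same_trans = trans_key == PUBCHEM_TRANS
    # The second block holds the non-main layers (here: stereo) plus flags.
    distinct_stereo = cis_key.split("-")[1] != trans_key.split("-")[1]
    return yn(same_cis), yn(same_trans), yn(distinct_stereo)


if __name__ == "__main__":
    for label, cis_key, trans_key in RESULTS:
        same_cis, same_trans, distinct = summarize(cis_key, trans_key)
        print(f"{label:45} {same_cis}/{same_trans} {distinct}")
```

Only the Pipeline Pilot 2D and 3D rows come out as Y/Y Y, which is the pattern one would expect for every row if the input format made no difference.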

Now, this raises the following questions:

  1. First of all, why don’t I get the same InChI in every case? Isn’t InChI supposed to canonicalize the structure to a fair extent (I know it does) and come up with the same InChI regardless of the input?
  2. If there are different ways of getting a presumably valid InChI for a compound, which one should one use?
  3. Why do I get the same InChI for different stereoisomers when the 3D coordinates come from OpenBabel, but not when they come from Pipeline Pilot? Is the 3D coordinate generation in OpenBabel the problem?
  4. Why do I get the correct stereo information in the InChIKey when I use CML without any coordinates?
  5. Why do I get a different hash for the connectivity part (FWITZFBVZWAIRX instead of PFURGBBHAOXLIO) when I use CML without any coordinates?
  6. And if CML without coordinates gives the right stereochemistry in the InChI, why doesn’t 3D CML with coordinates from OpenBabel?
  7. Why does 3D CML based on Pipeline Pilot coordinates also fail to get the stereochemistry right, and why does it produce yet another connectivity part, DQQAEJAQEMHJBB?
  8. Why should the stereochemistry come out right when 2D coordinates are used, but not when there are no coordinates (0D)?
  9. Why not get InChIs directly from SMILES?
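Several of these questions (3, 5, 6 and 7) come down to which block of the key diverges from the PubChem reference. The hypothetical helper below is a stdlib-only Python sketch that compares a generated key with a reference key block by block. It only reports where the keys differ. It cannot say why a generator perceived a different skeleton or lost the stereo. It also treats any second-block difference as a stereo difference, an assumption that only holds for molecules without isotope labels, like these two.

```python
# Hypothetical diagnostic helper: compares InChIKeys block by block.
EMPTY_LAYERS_HASH = "UHFFFAOY"  # second block when no stereo/isotope layers


def diagnose(generated, reference):
    """Return a short label describing how generated differs from reference."""
    if generated == reference:
        return "identical"
    gen_skeleton, gen_rest, _ = generated.split("-")
    ref_skeleton, ref_rest, _ = reference.split("-")
    if gen_skeleton != ref_skeleton:
        # Main-layer hash differs: a different formula/connectivity was seen.
        return "different skeleton"
    gen_layers, ref_layers = gen_rest[:8], ref_rest[:8]
    if gen_layers == EMPTY_LAYERS_HASH and ref_layers != EMPTY_LAYERS_HASH:
        return "stereo lost"
    if gen_layers != ref_layers:
        return "stereo differs"
    # Only the flag/version/protonation characters differ.
    return "other difference"


PUBCHEM_TRANS = "PFURGBBHAOXLIO-PHDIDXHHSA-N"
for key in (
    "PFURGBBHAOXLIO-PHDIDXHHSA-N",  # SDF from Pipeline Pilot, 2D
    "PFURGBBHAOXLIO-OLQVQODUSA-N",  # SDF from OpenBabel, 3D
    "PFURGBBHAOXLIO-UHFFFAOYSA-N",  # SDF from OpenBabel, 0D
    "FWITZFBVZWAIRX-PHDIDXHHSA-N",  # CML from OpenBabel, 0D
):
    print(key, diagnose(key, PUBCHEM_TRANS))
```

Applied to the trans column of the results, this labels the OpenBabel 3D results "stereo differs" (the trans isomer received the cis stereo block) and the CML results "different skeleton". That separates the stereo questions from the connectivity questions.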

Any comments on this, anybody? I’m seriously puzzled.


7 Responses to “InChI generation mystery”

  1. baoilleach Says:

    Over the last two months Tim Vandermeersch has rewritten the stereo
    handling code in OpenBabel. Stereo flipping will be a thing of the
    past with OB 2.3. You can try the latest code from his or my Git
    repository on GitHub. I’ve just ported the InChI code but not yet CML.

    - Noel

  2. Geoffrey Hutchison Says:

    I think it’s pretty clear from the Open Babel 2.2.0 release notes that “stereochemistry may or may not be reliable” with 3D coordinate generation. It’s right there in the release notes and has been mentioned repeatedly on the mailing list.

    So that’s your answer to #3.

  3. Geoffrey Hutchison Says:

    Oh, your “take-home message” is about getting InChI directly from SMILES. Yes, that should clearly work. Of course, as your post indicates, stereochemistry is not as rock-solid as it should be.

  4. Flo Says:

    Thanks a lot Noel and Geoffrey for these answers. I will do the same experiments again tomorrow and see what I get with a newer version of OpenBabel!

  5. Flo Says:

    @ Geoffrey: It’s not intended to be the take-home message; it just ended up as the last point unintentionally. But since all the information is in the SMILES, I think it should be possible to go directly from SMILES to InChI without explicitly going via SDF, CML, or anything else. Yesterday I had a brief look at the InChI source to see where one could pass in a SMILES (parsed into the appropriate InChI input structure), which could easily come from the OpenBabel API.

    As an aside, I also didn’t know that OpenBabel does InChIs by now (I haven’t read up on the docs/usage info for a while it seems…).

  6. baoilleach Says:

    I just checked this myself. InChI and SMILES are working (and agree with PubChem). SDF is working for 3D but not yet 2D. CML has not yet been converted.

    You can do the SMILES -> InChIKey conversion through Python/OpenBabel. See http://baoilleach.blogspot.com/2008/10/generating-inchis-mini-me-inchikey.html

  7. chi Says:

    Hi guys,

    I’m looking for the simplest way of converting SMILES to JME format using a Java-based class. At the moment I’m using a cartridge and servlets to search the SMILES in my Oracle table. This returns the SMILES, but I need to visualise them using JME…

    chi
