Post, post, post
April 6th, 2010
Was just reading some interesting articles and thought it might be worth sharing some of them here:
I just remembered that I was meant to post this here, too. The problem is the following: you want to retrieve all values of a Python dictionary whose keys match a certain regular expression. Solution: make a custom dictionary class derived from the built-in one:
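import re

class redict(dict):
    """A dict whose [] lookup takes a regular expression and yields the
    values of all keys matching it (re.match, i.e. anchored at the start)."""
    def __init__(self, d):
        dict.__init__(self, d)

    def __getitem__(self, regex):
        r = re.compile(regex)
        mkeys = filter(r.match, self.keys())
        for i in mkeys:
            # bypass our own __getitem__ to fetch the plain value
            yield dict.__getitem__(self, i)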
With this you can do the following:
>>> keys = ["a", "b", "c", "ab", "ce", "de"]
>>> vals = range(0,len(keys))
>>> red = redict(zip(keys, vals))
>>> for i in red[r"^.e$"]:
... print i
...
5
4
>>>
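Because `__getitem__` above is a generator function, every lookup returns a generator, even one for an exact key such as `red["a"]`. If that is a concern, a plain helper function does the same job without changing how `[]` behaves. The following is a minimal Python 3 sketch; the name `values_matching` is made up for illustration.

```python
import re

def values_matching(d, pattern):
    """Return, in dict order, the values of d whose keys match pattern.

    re.match anchors at the start of the key only, so add '$' if the
    whole key has to match (as in r"^.e$")."""
    regex = re.compile(pattern)
    return [value for key, value in d.items() if regex.match(key)]

keys = ["a", "b", "c", "ab", "ce", "de"]
d = dict(zip(keys, range(len(keys))))
print(values_matching(d, r"^.e$"))  # [4, 5]
```

Since Python 3.7 dicts keep insertion order, so the result comes out as `[4, 5]`. The Python 2 session above printed 5 before 4 because dict order was arbitrary there.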
I’ve been playing with InChIs quite a bit these days and have been quite surprised by the results I got. What is very surprising indeed is that the InChI one gets (and therefore the InChIKey, should one want to use these to index a chemical compound database) seems to depend on the input format given to the InChI generator.
After seeing some more InChI-related things at the BioIT World yesterday, I thought I’d have another go at some InChI stuff. And again I was quite surprised by what I found, and actually even more confused.
Here’s what I did:
I took two molecules from PubChem, the diastereomers cis- and trans-1,2-dihydroxy-cyclohexane (1,2-cyclohexanediol). I created InChIKeys for both of them from different inputs, and I did not always end up with two different InChIs for these two structures.
The SMILES and InChIKey from PubChem are the following:
cis-1,2-cyclohexanediol: C1CC[C@@H]([C@@H](C1)O)O - PFURGBBHAOXLIO-OLQVQODUSA-N
trans-1,2-cyclohexanediol: C1CC[C@H]([C@@H](C1)O)O - PFURGBBHAOXLIO-PHDIDXHHSA-N
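Both PubChem keys share the first block and differ only in the second. A standard InChIKey has a fixed layout:

- 14 characters hashing the connectivity (the skeleton);
- a hyphen;
- 8 characters hashing the remaining layers, such as stereochemistry;
- a flag character (`S` for standard);
- a version character (`A` for version 1);
- a hyphen;
- one character for protonation.

A second block of `UHFFFAOY` is what you get when the InChI carries no stereo (or isotopic) layer at all. The following Python 3 sketch splits a key and compares two keys along these lines. The function names are made up for illustration.

```python
def split_inchikey(key):
    """Split a standard InChIKey (27 characters) into its parts."""
    parts = key.split("-")
    if len(parts) != 3 or [len(p) for p in parts] != [14, 10, 1]:
        raise ValueError("not an InChIKey: %r" % key)
    first, second, protonation = parts
    return {
        "skeleton": first,        # connectivity hash
        "layers": second[:8],     # hash of stereo/isotopic layers
        "flag": second[8],        # 'S' for a standard InChIKey
        "version": second[9],     # 'A' for InChI version 1
        "protonation": protonation,
    }

# Second-block hash of an InChI without any stereo (or isotopic) layer.
NO_EXTRA_LAYERS = "UHFFFAOY"

def compare(key_a, key_b):
    """Compare two InChIKeys block by block."""
    a, b = split_inchikey(key_a), split_inchikey(key_b)
    return {
        "same_skeleton": a["skeleton"] == b["skeleton"],
        "distinct_stereo": a["layers"] != b["layers"],
        "a_has_stereo": a["layers"] != NO_EXTRA_LAYERS,
        "b_has_stereo": b["layers"] != NO_EXTRA_LAYERS,
    }

print(compare("PFURGBBHAOXLIO-OLQVQODUSA-N", "PFURGBBHAOXLIO-PHDIDXHHSA-N"))
```

Applied to the PubChem pair, this reports the same skeleton, distinct stereo, and stereo layers present in both keys. A pair ending in `UHFFFAOYSA-N` on both sides means the stereo information was lost somewhere before the InChI was generated.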
I used the Standard InChI generator v1.2 released by IUPAC in January 2009, more specifically the Linux binary. Because this software doesn’t read SMILES, the SMILES input molecules first have to be converted to SD files. This was done with OpenBabel and Pipeline Pilot. I also used CML generated by OpenBabel as input.
And here is what I obtained:
| Source file format | Source file from | Coordinates | cis InChIKey | cis same as PubChem | trans InChIKey | trans same as PubChem | Distinct stereo |
|---|---|---|---|---|---|---|---|
| SDF | OpenBabel | 0D | PFURGBBHAOXLIO-UHFFFAOYSA-N | N | PFURGBBHAOXLIO-UHFFFAOYSA-N | N | N |
| SDF | OpenBabel | 3D (OB --gen3D) | PFURGBBHAOXLIO-OLQVQODUSA-N | Y | PFURGBBHAOXLIO-OLQVQODUSA-N | N | N |
| CML | OpenBabel | 0D | FWITZFBVZWAIRX-OLQVQODUSA-N | N | FWITZFBVZWAIRX-PHDIDXHHSA-N | N | Y |
| CML | OpenBabel | 3D (OB --gen3D) | PFURGBBHAOXLIO-OLQVQODUSA-N | Y | PFURGBBHAOXLIO-OLQVQODUSA-N | N | N |
| SDF | SciTegic Pipeline Pilot | 0D | PFURGBBHAOXLIO-UHFFFAOYSA-N | Y | PFURGBBHAOXLIO-UHFFFAOYSA-N | N | N |
| SDF | SciTegic Pipeline Pilot | 2D | PFURGBBHAOXLIO-OLQVQODUSA-N | Y | PFURGBBHAOXLIO-PHDIDXHHSA-N | Y | Y |
| SDF | SciTegic Pipeline Pilot | 3D | PFURGBBHAOXLIO-OLQVQODUSA-N | Y | PFURGBBHAOXLIO-PHDIDXHHSA-N | Y | Y |
| CML | OpenBabel conversion of SciTegic PP files | 0D | DQQAEJAQEMHJBB-UHFFFAOYSA-N | N | DQQAEJAQEMHJBB-UHFFFAOYSA-N | N | N |
| CML | OpenBabel conversion of SciTegic PP files | 3D | DQQAEJAQEMHJBB-UHFFFAOYSA-N | N | DQQAEJAQEMHJBB-UHFFFAOYSA-N | N | N |
| CML | OpenBabel conversion of SciTegic PP 0D SD file | 3D (OB --gen3D) | PFURGBBHAOXLIO-OLQVQODUSA-N | Y | PFURGBBHAOXLIO-OLQVQODUSA-N | N | N |
Now, this raises the following questions:
Any comments on this, anybody? I seriously wonder.
When importing MySQLdb 1.2.2 into Python 2.6.1, a deprecation warning is reported:
Python 2.6.1 (r261:67515, Dec 7 2008, 08:27:41)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import MySQLDB
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named MySQLDB
>>> import MySQLdb
/usr/lib/python2.6/site-packages/MySQLdb/__init__.py:34: DeprecationWarning: the sets module is deprecated
from sets import ImmutableSet
This is because the sets module has been deprecated since Python 2.6 in favour of the built-in set and frozenset types.
To get rid of this warning and use the more efficient built-in set types, make the following changes in the __init__.py file named in the warning:
* comment out line 34: from sets import ImmutableSet
* add after that line: ImmutableSet = frozenset
* comment out line 41 (line number in the original file): from sets import BaseSet
* add after that line: BaseSet = set
This way, the built-in types are used wherever BaseSet or ImmutableSet is referenced.
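Put together, the edited lines of MySQLdb/__init__.py would look roughly like this. This is a sketch: only the changed lines are shown, and the rest of the file is not reproduced.

```python
# Edited lines in MySQLdb/__init__.py (MySQLdb 1.2.2 on Python 2.6).

# line 34, commented out:
# from sets import ImmutableSet
ImmutableSet = frozenset   # built-in immutable, hashable set

# line 41, commented out:
# from sets import BaseSet
BaseSet = set              # built-in mutable set

# Caveat: in the old sets module, BaseSet was the common base class of
# Set and ImmutableSet, so isinstance(x, BaseSet) held for both. The
# built-in frozenset does not derive from set, so code that relies on
# that isinstance check would now get False for frozensets.
```

Nothing else in the file needs to change for the warning to go away, since the sets module is no longer imported.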
Now everything works fine:
Python 2.6.1 (r261:67515, Dec 7 2008, 08:27:41)
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import MySQLdb
>>> MySQLdb.version_info
(1, 2, 2, 'final', 0)
>>>
NOTE - UPDATE: While the following will install numpy 1.2.1, it’s perhaps not the best solution. If you are interested, read up on Apple’s system Python as opposed to a standalone framework Python installation.
This bugged me for a while; now it seems I’ve found a pretty decent solution.
Leopard comes with Python 2.5.1 preinstalled and also includes numpy 1.0.1 (if I remember correctly). If you want to install something that needs a newer numpy (e.g., matplotlib), you need to upgrade numpy. There are all sorts of issues with the differing versions/distributions of Python on Apple computers; I won’t dwell on that. If you don’t want to upgrade your Python distribution, then there’s an easier way:
1) Create the following symbolic link (see http://wiki.python.org/moin/MacPython/Leopard):
cd /Library/Frameworks
sudo ln -s /System/Library/Frameworks/Python.framework/ Python.framework
This allows you to use the ‘normal’ installer for numpy, see step 2.
2) Install the numpy 1.2.1 .dmg from here: Download NumPy
3) Your numpy installation ends up here:
/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/
3.5) You may want to make a backup of the old numpy if there is one here:
/Library/Python/2.5/site-packages/ (just rename it, delete it, chmod 000 it, whatever)
4) Move the new numpy out of the framework location into /Library/Python/2.5/site-packages/:
cd /System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/
sudo mv numpy* /Library/Python/2.5/site-packages/
5) DONE:
hostname$ python
Python 2.5.1 (r251:54863, Jan 13 2009, 10:26:13)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.__version__
'1.2.1'
>>>
And that’s it! The next step will be getting matplotlib to work. Installing it with easy_install doesn’t work yet, but that has nothing to do with numpy any more; this time the problem is rather freetype2…
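If `numpy.__version__` still reports the old version, it helps to see which copy Python actually picks up and whether an older copy is shadowing or being shadowed. On the Python 2.5 used here, `numpy.__file__` after importing shows the winning copy. The following is a small Python 3 sketch that also lists every candidate directory on `sys.path`. The helper names are made up, and the "first entry wins" rule in the comment ignores eggs, zip files and import hooks.

```python
import importlib.util
import os
import sys

def imported_from(package):
    """Return the file the import system would load for `package`,
    or None if it cannot be found (Python 3 importlib)."""
    spec = importlib.util.find_spec(package)
    return spec.origin if spec is not None else None

def copies_on_path(package):
    """List the sys.path entries containing a directory named `package`,
    in search order. For an ordinary package the first one normally wins;
    later ones are shadowed copies, e.g. an old numpy next to a new one."""
    return [entry for entry in sys.path
            if os.path.isdir(os.path.join(entry or os.curdir, package))]

if __name__ == "__main__":
    print("numpy would be imported from:", imported_from("numpy"))
    for entry in copies_on_path("numpy"):
        print("numpy directory found in:", entry)
```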
As I am leaving Cambridge, I am also not going to update my homepage there any more. Instead, I will put the relevant things online here.
My list of publications can now be accessed here: http://flo.nigsch.com/?page_id=43.
The old homepage can be found here: http://www-mitchell.ch.cam.ac.uk/florian.html
There are lots of free books available on the internet. This page summarises some very good sites for books on topics such as computer programming, literature, etc.
Just found some interesting stuff concerning functional programming using Python. Here are the links:
These articles are from a larger collection of pages on Python called Charming Python on the IBM website.
There are actually a few bugs in this version. I’ll put up a new version here shortly.
I couldn’t find a program that extracts all ring systems from an SDF file along with how often they occur, so I wrote a Python script that does exactly that. It gets the smallest set of smallest rings (SSSR) from pybel: once a molecule is read in (e.g., mol=pybel.readstring("smi", smiles)), the SSSR is available as mol.sssr, a vector of OBRing objects (see the OpenBabel documentation for more information).
You can iterate over this vector in the usual Pythonic way, e.g., for ring in mol.sssr: pass. The ring size is given by ring.PathSize(), and the atoms in the ring are stored in the member variable _path, i.e., ring._path gives you the ring’s atoms.
The script identifies ring systems by looking for atoms shared between members of the SSSR: it intersects the sets of member atoms of every pair of rings. Two rings are considered part of the same ring system if they share at least one atom, so strictly speaking this also joins spiro systems, not only fused ones. To count only fused rings (at least two shared atoms), change if len(intsec) in the function GetFusedRingsMatrix(mol) to if len(intsec) > 1.
If you want each ring of the SSSR individually, rather than fused/linked rings merged into one ring system, it should suffice to supply a hand-made all-zero matrix in place of the one returned by GetFusedRingsMatrix(), e.g. l=len(mol.sssr) and FusedRingsMatrix = [[0 for x in range(l)] for y in range(l)]. All rings are then treated as unconnected.
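The grouping described in the last two paragraphs can be sketched without pybel, using plain lists of atom indices for the rings (as ring._path would give them). This is an illustrative reimplementation, not the script’s actual code, and several choices here are assumptions:

- `rings_matrix` stands in for GetFusedRingsMatrix();
- `min_shared=1` reproduces the "share at least one atom" rule, and `min_shared=2` gives the fused-only variant;
- a ring system is taken to be a connected component of the resulting graph;
- the atom indices in the example are hypothetical.

```python
def rings_matrix(rings, min_shared=1):
    """Square 0/1 matrix: [i][j] is 1 when rings i and j share at least
    min_shared atoms (1: fused or spiro, 2: fused only)."""
    atom_sets = [set(ring) for ring in rings]
    n = len(atom_sets)
    return [[1 if i != j and len(atom_sets[i] & atom_sets[j]) >= min_shared else 0
             for j in range(n)]
            for i in range(n)]

def ring_systems(rings, matrix):
    """Merge rings into ring systems: the connected components of the graph
    whose edges are the 1-entries of matrix. Returns one sorted list of atom
    indices per system, in order of each system's first ring."""
    seen = set()
    systems = []
    for start in range(len(rings)):
        if start in seen:
            continue
        seen.add(start)
        stack, atoms = [start], set()
        while stack:                      # depth-first walk over linked rings
            i = stack.pop()
            atoms.update(rings[i])
            for j, linked in enumerate(matrix[i]):
                if linked and j not in seen:
                    seen.add(j)
                    stack.append(j)
        systems.append(sorted(atoms))
    return systems

if __name__ == "__main__":
    # Hypothetical atom indices: two fused six-rings, plus a five-ring
    # attached spiro-fashion (via atom 9) to the second one.
    rings = [[0, 1, 2, 3, 4, 5], [4, 5, 6, 7, 8, 9], [9, 10, 11, 12, 13]]
    print(ring_systems(rings, rings_matrix(rings)))                # one system
    print(ring_systems(rings, rings_matrix(rings, min_shared=2)))  # two systems
    zero = [[0] * len(rings) for _ in rings]
    print(ring_systems(rings, zero))                               # three rings
```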
The script also keeps exocyclic double bonds as part of the rings they are attached to.
All fragments are written to a file fragments.sdf, and the number of times each fragment was encountered in the supplied structure file is stored in the SDF field COUNT. If you view the output with something like mview fragments.sdf (MarvinView from ChemAxon), it will look similar to the picture displayed.
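Writing the COUNT field is plain SD-file bookkeeping. After a record’s `M  END` line, each data item is a header line `> <NAME>`, its value, and a blank line, and `$$$$` closes the record. The following stdlib-only sketch covers the counting and writing step. It assumes each fragment occurrence has already been reduced to some canonical string key (e.g. a canonical SMILES) plus a mol block. The names `sd_record` and `fragment_sdf` are hypothetical, not the script’s.

```python
from collections import Counter

def sd_record(molblock, fields):
    """One SD-file record: mol block, data items, then the $$$$ terminator."""
    lines = [molblock.rstrip("\n")]                  # ends with the 'M  END' line
    for name, value in fields.items():
        lines += ["> <%s>" % name, str(value), ""]   # header, value, blank line
    lines.append("$$$$")
    return "\n".join(lines) + "\n"

def fragment_sdf(fragments):
    """fragments: iterable of (canonical_key, molblock) pairs, one per
    occurrence. Returns SD text with each distinct fragment written once,
    most frequent first, carrying its occurrence count in the COUNT field."""
    counts = Counter()
    first_block = {}
    for key, molblock in fragments:
        counts[key] += 1
        first_block.setdefault(key, molblock)        # keep the first drawing seen
    return "".join(sd_record(first_block[key], {"COUNT": n})
                   for key, n in counts.most_common())
```

Writing the file is then a one-liner along the lines of `open("fragments.sdf", "w").write(fragment_sdf(fragments))`.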
The script can be found here: Python script for extraction of Murcko fragments
I just tried to install OpenBabel and its Python wrapper pybel on a Mac running OS X 10.4. OpenBabel compiled fine from source, no problems with that. But when I ran python setup.py build in the scripts/python subdirectory, the build failed at the linking stage with
/usr/bin/ld: for architecture ppc
/usr/bin/ld: can't locate file for: -lopenbabel
collect2: ld returned 1 exit status
The solution turned out to be (at least on my computer) setting the environment variable OPENBABEL_INSTALL to /usr/local explicitly. Normally, the library is looked for in the src directory, i.e., where OpenBabel was built, but that didn’t work for whatever reason (I haven’t checked up on that).
After setting the environment variable with export OPENBABEL_INSTALL=/usr/local, the build went through, the install as root finished without any problems, and pybel imports fine:
host:~/models/ flo$ python
Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pybel
>>>
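If the linker still can’t find the library, check that a libopenbabel.* file really sits under the prefix you exported. With -lopenbabel, the linker looks for libopenbabel.dylib, .so or .a in its library search path. The following Python 3 sketch assumes the build looks in $OPENBABEL_INSTALL/lib. That assumption is inferred from the fix above, not checked against the setup script, and `find_libopenbabel` is a made-up helper name.

```python
import glob
import os

def find_libopenbabel(prefix=None):
    """Return the libopenbabel.* files under <prefix>/lib, sorted.

    prefix defaults to $OPENBABEL_INSTALL, falling back to /usr/local
    (the value that worked in this post). An empty list suggests the
    link step will fail with "can't locate file for: -lopenbabel".
    """
    if prefix is None:
        prefix = os.environ.get("OPENBABEL_INSTALL", "/usr/local")
    return sorted(glob.glob(os.path.join(prefix, "lib", "libopenbabel.*")))

if __name__ == "__main__":
    found = find_libopenbabel()
    print("\n".join(found) if found else "no libopenbabel.* under <prefix>/lib")
```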