April 25th, 2008
There are actually a few bugs in this version. Shortly I’ll put up a new version here.
I couldn't find a program that extracts all ring systems from an SDF file along with how often they occur, so I wrote a Python script that does exactly that. It gets the smallest set of smallest rings (SSSR) from pybel: once a molecule is read in (e.g., mol=pybel.readstring("smi", smiles)), the SSSR is available as mol.sssr, which is a vector of OBRing objects (see the OpenBabel documentation for more info about that).
You can iterate over this vector in standard Pythonic fashion, e.g., for ring in mol.sssr: pass. The ring size is available through ring.PathSize(), and the atoms in the ring are stored in the member variable _path, i.e., ring._path gives you the atoms in the ring.
The script detects fused ring systems by looking for atoms shared between members of the SSSR, which is done by intersecting the sets of member atoms of any two rings. Two rings are considered part of the same ring system if they share at least one atom, so rings joined through a single spiro atom, which strictly speaking are not fused, are grouped together as well. This behaviour can be changed by replacing if len(intsec) in the function GetFusedRingsMatrix(mol) with if len(intsec) > 1.
Should you want all individual elements of the SSSR instead of the fused/linked rings as one ring system, it should suffice to supply a manually crafted matrix in place of the one returned by GetFusedRingsMatrix(). That would be something like l=len(mol.sssr) and FusedRingsMatrix = [[0 for x in range(l)] for y in range(l)]. All rings are then treated as unconnected.
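The set-intersection idea can be sketched without pybel. This is only an illustration, not the script's actual code: the function names fused_rings_matrix and ring_systems and the min_shared parameter are made up here, and rings are assumed to be plain lists of atom indices (for example, copies of ring._path).

```python
from itertools import combinations


def fused_rings_matrix(rings, min_shared=1):
    """Build a symmetric 0/1 matrix marking which rings share atoms.

    rings: list of iterables of atom indices (hypothetical input format).
    min_shared: number of shared atoms needed to link two rings;
                1 also links spiro rings, 2 requires a shared bond.
    """
    atom_sets = [set(ring) for ring in rings]
    n = len(atom_sets)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        if len(atom_sets[i] & atom_sets[j]) >= min_shared:
            matrix[i][j] = matrix[j][i] = 1
    return matrix


def ring_systems(matrix):
    """Group ring indices into connected components of the matrix."""
    n = len(matrix)
    seen = set()
    systems = []
    for start in range(n):
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            i = stack.pop()
            if i in component:
                continue
            component.add(i)
            stack.extend(j for j in range(n)
                         if matrix[i][j] and j not in component)
        seen |= component
        systems.append(sorted(component))
    return systems


# Rings 0 and 1 share one atom (spiro), rings 2 and 3 share two (fused),
# ring 4 stands alone.
example_rings = [
    [1, 2, 3, 4, 5],
    [5, 6, 7, 8, 9],
    [10, 11, 12, 13, 14, 15],
    [14, 15, 16, 17, 18, 19],
    [20, 21, 22],
]
print(ring_systems(fused_rings_matrix(example_rings)))
print(ring_systems(fused_rings_matrix(example_rings, min_shared=2)))
```

With min_shared=1 the spiro pair ends up in one ring system; with min_shared=2 only the pair sharing a bond is merged. An all-zero matrix passed to ring_systems yields every ring as its own system.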
The script also includes exocyclic double bonds as part of rings they may be linked to.
All fragments are written out to a file fragments.sdf, and the number of times each fragment was encountered in the supplied structure file is written into the SDF field COUNT. If you view the output with something like mview fragments.sdf (MarvinView from ChemAxon), it will look similar to the picture displayed.
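The counting step might look roughly like the following. This is a sketch under assumptions: fragments are taken to be identified by a canonical string such as canonical SMILES, every occurrence is counted (including repeats within one molecule), and the helper names count_fragments and sd_count_field are hypothetical rather than taken from the script.

```python
from collections import Counter


def count_fragments(fragments_per_molecule):
    """Tally how often each fragment key occurs over all molecules.

    fragments_per_molecule: iterable of lists of fragment keys, one list
    per input molecule (assumed here to be canonical SMILES strings).
    """
    counts = Counter()
    for fragments in fragments_per_molecule:
        counts.update(fragments)
    return counts


def sd_count_field(count):
    """Format a COUNT data item the way SD files store data fields:
    a header line, the value, then a blank line."""
    return "> <COUNT>\n%d\n\n" % count


counts = count_fragments([
    ["c1ccccc1", "C1CCNCC1"],   # fragments of a first molecule
    ["c1ccccc1"],               # fragments of a second molecule
])
for fragment, count in counts.most_common():
    print(fragment, count)
```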
The script can be found here: Python script for extraction of Murcko fragments
Tags: Cheminformatics, fragment, murcko, openbabel, pybel, scaffold
Posted in Cheminformatics, Code | 1 Comment »
March 20th, 2008
I just tried to install openbabel and its python wrapper pybel on a Mac running OS X 10.4. OpenBabel compiled fine from source, no problems with that. But when I tried to run python setup.py build in the scripts/python subdirectory, the build failed at the linking stage with
/usr/bin/ld: for architecture ppc
/usr/bin/ld: can't locate file for: -lopenbabel
collect2: ld returned 1 exit status
The solution, at least on my computer, was to set the environment variable OPENBABEL_INSTALL to /usr/local explicitly. Normally the library is looked for in the src directory, i.e., where openbabel was built, but that didn't work for whatever reason (I haven't looked into it).
After setting the variable with export OPENBABEL_INSTALL=/usr/local, the build went through, the install as root finished without any problems, and everything works:
host:~/models/ flo$ python
Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pybel
>>>
Tags: install problem, pybel, solution
Posted in Cheminformatics, Code, Computers, mac | 3 Comments »
March 14th, 2008
In a recent paper (How To Winnow Actives from Inactives: Introducing Molecular Orthogonal Sparse Bigrams (MOSBs) and Multiclass Winnow) I applied the Winnow algorithm to the prediction of biological activities of molecules.
I thought it would be useful to provide the code to anybody who may want to use this methodology as well. So here we go: the source code of the implementation can be downloaded here: multiclass winnow in C++, with MySQL support (covered by the GNU General Public Licence; for what that means either read the copy enclosed or read it online).
This program is provided as is without any guarantees that it will work or anything else. For the structure of the underlying MySQL tables, should you wish to use that feature, there is no documentation yet. You can easily find the required information by reading through the source code in the file Winnow.h. If I get round to doing it, I will provide some more documentation in a while here on this page. If you have any questions, please feel free to contact me and I will try to help you out.
The program can read the training and test data either from text files or from MySQL tables. It also supports bagging multiple classifiers, using a thick threshold, and using either orthogonal sparse bigrams or an exhaustive enumeration of all features provided. Invoking the program without any arguments will display a list of all available options.
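For readers unfamiliar with Winnow, here is a minimal Python sketch of a multiclass variant with a thick threshold. It is not a port of the C++ program: the class name, the one-weight-table-per-class layout, the averaging of weights over active features, and the parameter values are all assumptions chosen for illustration.

```python
from collections import defaultdict


class MulticlassWinnow:
    """One Winnow weight table per class, trained one-vs-rest."""

    def __init__(self, classes, promotion=1.25, demotion=0.8,
                 threshold=1.0, thickness=0.05):
        self.classes = list(classes)
        self.promotion = promotion    # multiplicative update > 1
        self.demotion = demotion      # multiplicative update < 1
        self.threshold = threshold
        self.thickness = thickness    # half-width of the thick threshold
        # Features that have not been seen yet start at weight 1.0.
        self.weights = {c: defaultdict(lambda: 1.0) for c in self.classes}

    def score(self, cls, features):
        """Mean weight of the active features for one class."""
        features = set(features)
        if not features:
            return 0.0
        table = self.weights[cls]
        return sum(table[f] for f in features) / len(features)

    def predict(self, features):
        """Return the class with the highest score."""
        return max(self.classes, key=lambda c: self.score(c, features))

    def train(self, features, label):
        """Mistake-driven update; the thick threshold also triggers
        updates for scores that are correct but too close to call."""
        features = set(features)
        for cls in self.classes:
            s = self.score(cls, features)
            table = self.weights[cls]
            if cls == label and s <= self.threshold + self.thickness:
                for f in features:
                    table[f] *= self.promotion
            elif cls != label and s >= self.threshold - self.thickness:
                for f in features:
                    table[f] *= self.demotion


# Toy data: feature sets with made-up feature names.
data = [({"a", "b"}, "active"), ({"a", "c"}, "active"),
        ({"d", "e"}, "inactive"), ({"d", "c"}, "inactive")]
model = MulticlassWinnow(["active", "inactive"])
for _ in range(3):
    for features, label in data:
        model.train(features, label)
print(model.predict({"a", "b"}), model.predict({"d", "e"}))
```

The multiplicative updates are what distinguish Winnow from the additive perceptron update; the thick threshold simply widens the region in which an example is still treated as a mistake.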
There are still a couple of very useful things that can be done with this algorithm in the area of cheminformatics. I won't spend an excessive amount of time pushing the project further myself, so I would be happy to share some ideas with people who may want to collaborate on a project.
I would be more than happy to receive any comments or suggestions. In particular, if there is anybody out there who would be willing to write a graphical interface, then I would be very happy to help out as much as I can. But I do not have the time to focus on that for the time being.
Tags: C++, Cheminformatics, chemistry, source, winnow
Posted in Cheminformatics, Code | 1 Comment »
March 2nd, 2008
Well, should you ever switch keyboards on your Mac, you may be interested in the following. The link below leads to an information sheet (that you may print out!!) which shows, for every symbol, which key combination you need to produce it on your Mac keyboard. Or use it the other way round: which combination results in which symbol. Either way, it's quite useful!
Reference for EVERY Character Key on a Mac
Tags: links, mac, symbols, useful things
Posted in Computers, Useful links | No Comments »
February 21st, 2008
After the last upgrade R didn’t seem to want to install new packages. The likely reason is that one of the UK CRAN mirrors (http://www.sourcekeg.co.uk/cran/) is not working (at least as of 21/02/08). R fails to install packages with
Warning: unable to access index for repository http://www.sourcekeg.co.uk/cran/bin/macosx/universal/contrib/2.6
To be able to choose a different mirror, just issue options("repos"=c(CRAN="@CRAN@")) on the R command line, and the next time a package has to be installed from CRAN you will be asked which mirror to use. See ?setRepositories and ?options for more info.
Tags: r, troubleshooting
Posted in Code, Computers, Useful links | 1 Comment »
February 6th, 2008
When writing documents, be they literary or scientific, formatting and style are very important. Here are a couple of pages that I found useful; they do not apply only to LaTeX documents.
Tags: documents, guides, latex, style, writing
Posted in Computers, Useful links | No Comments »
February 6th, 2008
Part 1. Dabbling with a wealth of statistical facilities: Introduces basic concepts of R and gives a general overview.
Part 2. Functional programming and data exploration: Shows vector slices, introduces dataframes and basic operations such as how to get histograms and linear regressions.
Part 3. Reusable and object-oriented programming: A concise but very useful introduction to the object-oriented programming that R is capable of.
Tags: IBM, links, r, statistics
Posted in Code, Useful links | No Comments »
February 5th, 2008
There is a very good short overview of bibliographies with BibTeX to be found here at the University of Colorado. It also features a PDF file with examples of quite a few bibliography styles: PDF with style examples
Tags: bibliography, bibliographystyle, bibtex, latex
Posted in Computers, Useful links | No Comments »
February 5th, 2008
I needed a Bayesian classifier with support for some custom add-ons. The implementation in the script provided here is written in Python. It reads data from tab-separated text files or, alternatively, from MySQL databases. It can also include combinations of features as orthogonal sparse bigrams (OSBs). The classifier gives the scores and class labels for a user-definable number of classes. The source code is well documented with comments. There is no usage() function as of yet, so the command line arguments are only documented within the source.
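To illustrate what orthogonal sparse bigrams are, here is a short sketch using one common formulation from text classification: each token is paired with each of the next few tokens, and the number of skipped tokens is recorded. The function name, the tuple representation, and the window default are assumptions made for this example; how the script itself orders and combines features is not shown here.

```python
def orthogonal_sparse_bigrams(tokens, window=5):
    """Return OSB features as (first, skipped, second) tuples.

    Each token is combined with up to window - 1 following tokens;
    'skipped' is the number of tokens lying between the two.
    """
    features = []
    for i, first in enumerate(tokens):
        for gap in range(1, window):
            j = i + gap
            if j >= len(tokens):
                break
            features.append((first, gap - 1, tokens[j]))
    return features


# Made-up token sequence; with window=3 each token pairs with at most
# the two tokens that follow it.
for feature in orthogonal_sparse_bigrams(["C", "N", "O", "S"], window=3):
    print(feature)
```

Compared with enumerating every combination of features, this keeps the number of generated features linear in the sequence length for a fixed window.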
Tags: bayes, classifier, mysql
Posted in Cheminformatics, Code | No Comments »