March 14th, 2008
In a recent paper (How To Winnow Actives from Inactives: Introducing Molecular Orthogonal Sparse Bigrams (MOSBs) and Multiclass Winnow) I applied the Winnow algorithm to the prediction of biological activities of molecules.
I thought it would be useful to provide the code to anybody who may want to use this methodology as well. So here we go: the source code of the implementation can be downloaded here: multiclass winnow in C++, with MySQL support (covered by the GNU General Public Licence; for what that means either read the copy enclosed or read it online).
This program is provided as is, without any guarantees that it will work or anything else. There is no documentation yet for the structure of the underlying MySQL tables, should you wish to use that feature, but you can find the required information by reading through the source code in the file Winnow.h. If I get round to it, I will provide some more documentation here on this page. If you have any questions, please feel free to contact me and I will try to help you out.
The program can read the training and test data either from text files or from MySQL tables. It also supports bagging multiple classifiers and a thick threshold, as well as orthogonal sparse bigrams or an exhaustive enumeration of all features provided. Invoking the program without any arguments displays a list of all available options.
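To give a feel for what a thick threshold does, here is a minimal Python sketch of plain two-class Winnow. It is not the C++ code from the download and not the multiclass variant from the paper. The function names, the promotion and demotion factors (1.23 and 0.83) and the thickness of 0.1 are illustrative assumptions.

```python
def score(weights, active):
    """Mean weight of the active features, so the decision threshold is 1.0."""
    return sum(weights[i] for i in active) / len(active)


def train_winnow(examples, n_features, promote=1.23, demote=0.83,
                 thickness=0.1, epochs=10):
    """Train binary Winnow; examples are (active_feature_indices, label) pairs.

    The factors and the thickness are assumed values for illustration only.
    """
    weights = [1.0] * n_features
    for _ in range(epochs):
        for active, label in examples:
            if not active:
                continue
            s = score(weights, active)
            # Thick threshold: update not only on mistakes but also when the
            # score is on the correct side yet too close to the threshold.
            if label == 1 and s < 1.0 + thickness:
                for i in active:
                    weights[i] *= promote
            elif label == 0 and s > 1.0 - thickness:
                for i in active:
                    weights[i] *= demote
    return weights


def predict(weights, active):
    """Return 1 (active) if the score exceeds the threshold, else 0."""
    return 1 if active and score(weights, active) > 1.0 else 0


# Toy data: feature 0 occurs only in actives, feature 2 only in inactives.
examples = [([0, 1], 1), ([1, 2], 0), ([0, 3], 1), ([2, 3], 0)]
weights = train_winnow(examples, n_features=4)
```

Bagging would then amount to training several such weight vectors on resampled training sets and combining their votes.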
There are still a couple of very useful things that can be done with this algorithm in the area of cheminformatics. I won’t spend an excessive amount of time pushing that project any further myself, so I would be happy to share some ideas with people who may want to collaborate on a project.
I would be more than happy to receive any comments or suggestions. In particular, if there is anybody out there who would be willing to write a graphical interface, then I would be very happy to help out as much as I can. But I do not have the time to focus on that for the time being.
Tags: C++, Cheminformatics, chemistry, source, winnow
Posted in Cheminformatics, Code | 1 Comment »
March 2nd, 2008
Well, should you ever be switching keyboards for your Mac computer, then you may be interested in the following. The link below is to an information sheet (that you may print out!!) which shows, for every symbol, which key combination you need to produce it on your Mac keyboard. Or use it the other way round: which combination results in what symbol. Either way, it’s quite useful!
Reference for EVERY Character Key on a Mac
Tags: links, mac, symbols, useful things
Posted in Computers, Useful links | No Comments »
February 21st, 2008
After the last upgrade R didn’t seem to want to install new packages. The likely reason is that one of the UK CRAN mirrors (http://www.sourcekeg.co.uk/cran/) is not working (at least as of 21/02/08). R fails to install packages with
Warning: unable to access index for repository http://www.sourcekeg.co.uk/cran/bin/macosx/universal/contrib/2.6
To be able to choose a different mirror, just issue options("repos"=c(CRAN="@CRAN@")) on the R command line and the next time a package has to be installed from CRAN you will be asked which mirror to use. See ?setRepositories and ?options for more info.
Tags: r, troubleshooting
Posted in Code, Computers, Useful links | 2 Comments »
February 6th, 2008
When writing documents, be it literary or scientific text, formatting and style are very important. Here are a couple of pages that I found useful; they apply not only to LaTeX documents.
Tags: documents, guides, latex, style, writing
Posted in Computers, Useful links | No Comments »
February 6th, 2008
Part 1. Dabbling with a wealth of statistical facilities: Introduces basic concepts of R and gives a general overview.
Part 2. Functional programming and data exploration: Shows vector slices, introduces dataframes and basic operations such as how to get histograms and linear regressions.
Part 3. Reusable and object-oriented programming: A concise but very useful introduction to the object-oriented programming that R is capable of.
Tags: IBM, links, r, statistics
Posted in Code, Useful links | No Comments »
February 5th, 2008
There is a very good short overview of bibliographies with BibTeX to be found here at the University of Colorado. It also features a PDF file with examples of quite a few bibliography styles: PDF with style examples
Tags: bibliography, bibliographystyle, bibtex, latex
Posted in Computers, Useful links | No Comments »
February 5th, 2008
I needed a Bayesian classifier with support for some custom add-ons. The implementation in the script provided here is written in Python. It reads data from tab-separated text files or, alternatively, from MySQL databases. It can also include combinations of features as orthogonal sparse bigrams (OSBs). The classifier gives the scores and class labels for a user-definable number of classes. The source code is well documented with comments. There is no usage() function as of yet, so the command line arguments are only documented within the source.
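For readers who have not met OSBs before, here is a rough Python sketch of how such feature combinations can be generated. It is not taken from the script. It assumes the features of a molecule come as an ordered list of tokens, and the function name, the window size and the "|" separator are made up for illustration.

```python
def orthogonal_sparse_bigrams(tokens, window=5):
    """Pair each token with each of the following window - 1 tokens.

    The number of skipped tokens is stored in the feature, so "C|0|N"
    (adjacent) and "C|1|N" (one token in between) stay distinct.
    """
    features = []
    for i, first in enumerate(tokens):
        for gap in range(1, window):
            j = i + gap
            if j >= len(tokens):
                break
            features.append(f"{first}|{gap - 1}|{tokens[j]}")
    return features


osbs = orthogonal_sparse_bigrams(["C", "N", "O"], window=3)
# osbs == ["C|0|N", "C|1|O", "N|0|O"]
```

The resulting strings can be fed to the classifier like any other feature.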
Tags: bayes, classifier, mysql
Posted in Cheminformatics, Code | No Comments »
February 4th, 2008
This script can be used to dump all molecules from an SDF file with all or a specified subset of corresponding SDF tags into an SQL database, in this case MySQL.
SDF parsing: The script parses SDF files and initialises an instance of class Molecule for each parsed molecule. A dictionary of associated SDF tags is available as Molecule.SDFdict, all SDF tags found are in the list Molecule.SDFtags, and the connection table is in Molecule.CTAB. Either all or a few selected tags are then written into a database.
MySQL: A database has to be created manually (create database ). The name of the table to which the structural info (i.e. CTAB) is written has to be specified on the command line. At least one SDF tag has to be specified on the command line. Additional SDF tags to be written to the database are specified as a colon-separated list.
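The parsing step could look roughly like the Python sketch below. Only the attribute names SDFdict, SDFtags and CTAB come from the description above. The function names, the choice to store each tag's value as a list of lines, and the tiny sample record are assumptions, and the database part is left out.

```python
class Molecule:
    """One SDF record: connection table plus its data items."""

    def __init__(self, ctab, sdf_dict):
        self.CTAB = ctab                 # molfile block up to "M  END"
        self.SDFdict = sdf_dict          # tag -> list of value lines (assumed)
        self.SDFtags = list(sdf_dict)    # tags in order of appearance


def build_molecule(lines):
    """Split one record into the connection table and the data items."""
    end = next((i for i, line in enumerate(lines)
                if line.startswith("M  END")), len(lines) - 1)
    sdf_dict, tag = {}, None
    for line in lines[end + 1:]:
        if line.startswith(">"):
            # Data header such as "> <ACTIVITY>": the tag sits between < and >.
            start, stop = line.find("<"), line.rfind(">")
            tag = line[start + 1:stop] if 0 <= start < stop else None
            if tag is not None:
                sdf_dict[tag] = []
        elif not line.strip():
            tag = None                   # a blank line ends the data item
        elif tag is not None:
            sdf_dict[tag].append(line)
    return Molecule("\n".join(lines[:end + 1]), sdf_dict)


def parse_sdf(text):
    """Return one Molecule per record; records are terminated by $$$$."""
    molecules, record = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if record:
                molecules.append(build_molecule(record))
            record = []
        else:
            record.append(line)
    if any(line.strip() for line in record):   # last record without $$$$
        molecules.append(build_molecule(record))
    return molecules


sample = "\n".join([
    "mol1", "  sketch", "", "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END", "> <ACTIVITY>", "active", "", "> <ID>", "42", "", "$$$$",
])
mols = parse_sdf(sample)
```

From here, each Molecule's CTAB and the selected entries of SDFdict could be passed as parameters to an INSERT statement.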
Tags: mysql, sdf
Posted in Cheminformatics, Code | 1 Comment »
February 4th, 2008
Add information to each molecule in an SDF file. The information for each molecule is specified as a colon-separated list in an additional file. This allows multiple values to be added under the same tag for each molecule. Alternatively, this allows for an easy extension to be used to add multiple tags with different values (not implemented, but easy to do).
Python script to add fields to an SDF file
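The idea can be sketched in a few lines of Python. This is not the script itself. It assumes that the additional file holds one line per molecule, in the same order as the SDF file, and the function name add_tag and the tag CLASS are hypothetical.

```python
def add_tag(sdf_text, values_text, tag):
    """Append a data item named `tag` to every record of an SDF file.

    Line n of values_text belongs to molecule n (assumed layout); its
    colon-separated entries become the value lines of the new data item.
    """
    value_lines = values_text.splitlines()
    out, index = [], 0
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            raw = value_lines[index] if index < len(value_lines) else ""
            values = [v for v in raw.split(":") if v]
            if values:
                out.append(f"> <{tag}>")
                out.extend(values)
                out.append("")           # a blank line closes the data item
            index += 1
        out.append(line)
    return "\n".join(out) + "\n"


sdf = "m1\n\n\n  0  0\nM  END\n$$$$\nm2\n\n\n  0  0\nM  END\n$$$$\n"
annotated = add_tag(sdf, "a:b\nc\n", "CLASS")
```

Adding several tags with different values would only need one call per tag, each with its own values file.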
Tags: python, sdf
Posted in Cheminformatics, Code | 2 Comments »