Posts Tagged ‘C++’

Winnow algorithm in C++ for multiclass classification

Friday, March 14th, 2008

In a recent paper (How To Winnow Actives from Inactives: Introducing Molecular Orthogonal Sparse Bigrams (MOSBs) and Multiclass Winnow) I applied the Winnow algorithm to the prediction of biological activities of molecules.

I thought it would be useful to provide the code to anybody who may want to use this methodology as well. So here we go: the source code of the implementation can be downloaded here: multiclass winnow in C++, with MySQL support (covered by the GNU General Public Licence; for what that means, either read the enclosed copy or read it online).

This program is provided as is, without any guarantees that it will work or anything else. There is no documentation yet for the structure of the underlying MySQL tables, should you wish to use that feature, but you can find the required information by reading through the source code in the file Winnow.h. If I get round to it, I will provide some more documentation here on this page. If you have any questions, please feel free to contact me and I will try to help you out.

The program can read the training and test data either from text files or from MySQL tables. It also supports bagging of multiple classifiers, a thick threshold, and orthogonal sparse bigrams or an exhaustive enumeration of all features provided. Invoking the program without any arguments will display a list of all available options.
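To give an idea of what a thick threshold means in practice, here is a minimal sketch of a single two-class Winnow learner. It is not taken from the downloadable code: the names (WinnowParams, Winnow, train, predict), the parameter values, and the choice to normalise the score by the number of active features are all assumptions made for illustration. A multiclass setup would keep one such set of weights per class.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical parameter names and values, chosen for illustration only.
struct WinnowParams {
    double promotion = 1.5;  // factor (> 1) applied to weights that should grow
    double demotion  = 0.5;  // factor (< 1) applied to weights that should shrink
    double threshold = 1.0;  // decision threshold on the normalised score
    double thickness = 0.1;  // half-width of the "thick" band around the threshold
};

class Winnow {
public:
    explicit Winnow(std::size_t numFeatures, WinnowParams params = WinnowParams())
        : weights_(numFeatures, 1.0), params_(params) {}

    // Mean weight of the active (present) features; 0 if none are active.
    // Normalising by the feature count is an assumption of this sketch.
    double score(const std::vector<std::size_t>& active) const {
        if (active.empty()) return 0.0;
        double sum = 0.0;
        for (std::size_t f : active) {
            assert(f < weights_.size());
            sum += weights_[f];
        }
        return sum / static_cast<double>(active.size());
    }

    bool predict(const std::vector<std::size_t>& active) const {
        return score(active) > params_.threshold;
    }

    // Multiplicative update. With a thick threshold, a score that falls on the
    // correct side but inside the band is still treated as a mistake.
    void train(const std::vector<std::size_t>& active, bool isPositive) {
        const double s = score(active);
        if (isPositive && s < params_.threshold + params_.thickness) {
            scale(active, params_.promotion);
        } else if (!isPositive && s > params_.threshold - params_.thickness) {
            scale(active, params_.demotion);
        }
    }

private:
    void scale(const std::vector<std::size_t>& active, double factor) {
        for (std::size_t f : active) weights_[f] *= factor;
    }

    std::vector<double> weights_;
    WinnowParams params_;
};
```

Each example is passed as the list of indices of the features it contains. Only the weights of those features are ever touched, which is what keeps Winnow cheap when the feature space is large and sparse.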

There are still a couple of very useful things that can be done with this algorithm in the area of cheminformatics. I won’t spend an excessive amount of time pushing that project any further myself, so I would be happy to share some ideas with people who may want to collaborate on a project.

I would be more than happy to receive any comments or suggestions. In particular, if there is anybody out there who would be willing to write a graphical interface, then I would be very happy to help out as much as I can. But I do not have the time to focus on that for the time being.

Kennard Stone algorithm in C++

Monday, February 4th, 2008

This algorithm can be used for the selection of a training set (see this PDF for a graphical depiction of how it works). It works in the following way:

1. Find the two points in the data set that are farthest apart and select them
2. For each candidate point, find the smallest distance to any point already selected
3. Select for the training set the candidate point which has the largest of these smallest distances
4. Repeat steps 2 and 3 until the training set has the desired size

As described above, this algorithm always gives the same result, because the two starting points are always the same. Alternatively, one could choose two random points to start with. In that way, different training sets can be obtained.
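
The selection procedure described above can be sketched in a few lines of C++. This is an illustration written for this page, not the downloadable program: the names (Point, euclidean, kennardStone) are made up, and plain Euclidean distance is an assumption. Any other distance measure could be substituted.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Point = std::vector<double>;

// Euclidean distance between two points of equal dimension.
double euclidean(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Returns the indices of `count` points, in the order they were selected.
std::vector<std::size_t> kennardStone(const std::vector<Point>& data,
                                      std::size_t count) {
    const std::size_t n = data.size();
    assert(n >= 2 && count >= 2 && count <= n);

    // Find the two points that are farthest apart.
    std::size_t first = 0, second = 1;
    double best = -1.0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            const double d = euclidean(data[i], data[j]);
            if (d > best) { best = d; first = i; second = j; }
        }
    }

    std::vector<std::size_t> selected{first, second};
    std::vector<bool> taken(n, false);
    taken[first] = true;
    taken[second] = true;

    // minDist[i] is the smallest distance from point i to any selected point.
    std::vector<double> minDist(n);
    for (std::size_t i = 0; i < n; ++i) {
        minDist[i] = std::min(euclidean(data[i], data[first]),
                              euclidean(data[i], data[second]));
    }

    while (selected.size() < count) {
        // Pick the candidate whose smallest distance is the largest.
        std::size_t next = n;
        double largest = -1.0;
        for (std::size_t i = 0; i < n; ++i) {
            if (!taken[i] && minDist[i] > largest) {
                largest = minDist[i];
                next = i;
            }
        }
        taken[next] = true;
        selected.push_back(next);

        // Only the newly selected point can reduce a candidate's smallest distance.
        for (std::size_t i = 0; i < n; ++i) {
            if (!taken[i]) {
                minDist[i] = std::min(minDist[i], euclidean(data[i], data[next]));
            }
        }
    }
    return selected;
}
```

Keeping the smallest distances in minDist and updating them after each selection avoids recomputing every distance to every selected point in each round. To obtain different training sets, the search for the two farthest points could be replaced by two randomly drawn indices.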

The source code can be downloaded here; it is provided as a gzipped tar file. The code is far from perfect, but it does the job. If you make any modifications to the code, please let me know so that I can include them on this page and in the original code. There are some comments in the source files that should explain how the program works. If you have any questions, don’t hesitate to drop me an email.