Analyze hits from CAP screen

This protocol reads the output SD file from a CAP DB search and will:

  • Clean the data.
  • Calculate logD.
  • Apply a fit value threshold, a molecular weight filter and a logD filter.
  • Cluster the molecules (the average cluster size can be set).
  • Write out the complete clusters and only the top 10 hits per cluster.
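A minimal sketch of the read, clean and filter steps in plain Python. It assumes MW and logD have already been calculated and stored as SD data fields; the field names `FitValue`, `MW` and `LogD` and all threshold values are hypothetical placeholders, not taken from the protocol. "Cleaning" is reduced here to dropping records with missing or non-numeric values, and the parser reads only the data fields that follow `M  END`, not the connection table.

```python
from typing import Dict, List

# Hypothetical thresholds; tune them for the hit list at hand.
MIN_FIT = 2.0
MAX_MW = 500.0
LOGD_MIN, LOGD_MAX = 0.0, 4.0


def parse_sd_fields(text: str) -> List[Dict[str, str]]:
    """Return the data fields of every record in an SD file, one dict per record.

    Records are separated by '$$$$' lines. Data fields follow the 'M  END'
    line; each starts with a header such as '> <FitValue>' and its value
    lines run until the next blank line.
    """
    records: List[Dict[str, str]] = []
    for block in text.split("$$$$"):
        if not block.strip():
            continue  # trailing text after the last '$$$$'
        lines = block.strip("\n").splitlines()
        # Skip the molfile part; data fields come after 'M  END'.
        start = next((j + 1 for j, ln in enumerate(lines) if ln.startswith("M  END")), 0)
        fields: Dict[str, str] = {}
        i = start
        while i < len(lines):
            line = lines[i]
            lt = line.find("<")
            gt = line.find(">", lt + 1) if lt != -1 else -1
            if line.startswith(">") and lt != -1 and gt != -1:
                name = line[lt + 1:gt]
                i += 1
                values = []
                while i < len(lines) and lines[i].strip():
                    values.append(lines[i])
                    i += 1
                fields[name] = "\n".join(values)
            i += 1
        records.append(fields)
    return records


def passes_filters(rec: Dict[str, str]) -> bool:
    """Apply the fit value threshold and the MW and logD filters to one record."""
    try:
        fit = float(rec["FitValue"])
        mw = float(rec["MW"])
        logd = float(rec["LogD"])
    except (KeyError, ValueError):
        return False  # "cleaning": drop records with missing or non-numeric values
    return fit >= MIN_FIT and mw <= MAX_MW and LOGD_MIN <= logd <= LOGD_MAX
```

For example, `hits = [r for r in parse_sd_fields(text) if passes_filters(r)]` keeps only the records that survive all three filters.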

    Subsequently:

    • The overall top N hits are written to an SD file.
    • The top 10 hits for each of the generated clusters are written to an SD file.
    • The best hit from each cluster with at least 5 members is also written to a separate SD file.
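The three output selections can be sketched as below, assuming each hit is a dict with hypothetical keys `id`, `fit` (the fit value) and `cluster` (the cluster id from the clustering step), and that a higher fit value means a better hit. Writing the selected records back to SD files is left out.

```python
from collections import defaultdict
from typing import Any, Dict, List

Hit = Dict[str, Any]  # hypothetical keys: "id", "fit", "cluster"


def top_n(hits: List[Hit], n: int) -> List[Hit]:
    """The n hits with the highest fit value."""
    return sorted(hits, key=lambda h: h["fit"], reverse=True)[:n]


def by_cluster(hits: List[Hit]) -> Dict[Any, List[Hit]]:
    """Group hits by their cluster id."""
    groups: Dict[Any, List[Hit]] = defaultdict(list)
    for h in hits:
        groups[h["cluster"]].append(h)
    return dict(groups)


def top_per_cluster(hits: List[Hit], k: int = 10) -> Dict[Any, List[Hit]]:
    """The top k hits of every cluster."""
    return {c: top_n(members, k) for c, members in by_cluster(hits).items()}


def best_of_large_clusters(hits: List[Hit], min_size: int = 5) -> List[Hit]:
    """The single best hit of every cluster with at least min_size members."""
    return [
        top_n(members, 1)[0]
        for _, members in sorted(by_cluster(hits).items(), key=lambda kv: kv[0])
        if len(members) >= min_size
    ]
```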

    To get some additional spread in logD:

    • The hits in each cluster are grouped into 3 logD clusters.
    • The top hit from each logD cluster is also written to an SD file.
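The text does not say which method groups the logD values, so the sketch below swaps in a simple one-dimensional k-means with a deterministic start; it is an assumption, not the protocol's actual component. Hits are dicts with hypothetical keys `fit` and `logd`, and the function is applied to the members of one structural cluster at a time.

```python
from typing import Any, Dict, List


def kmeans_1d(values: List[float], k: int = 3, max_iter: int = 100) -> List[int]:
    """Label each value with one of at most k groups using 1-D k-means."""
    if not values:
        return []
    distinct = sorted(set(values))
    k = min(k, len(distinct))
    if k == 1:
        return [0] * len(values)
    n = len(distinct)
    # Deterministic start: k distinct centres spread over the sorted values.
    centres = [distinct[i * (n - 1) // (k - 1)] for i in range(k)]

    def nearest(v: float) -> int:
        return min(range(k), key=lambda j: abs(v - centres[j]))

    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [nearest(v) for v in values]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Move each centre to the mean of the values assigned to it.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centres[j] = sum(members) / len(members)
    return labels


def top_hit_per_logd_group(members: List[Dict[str, Any]], k: int = 3) -> List[Dict[str, Any]]:
    """Best-fitting hit from each logD group within one structural cluster."""
    labels = kmeans_1d([float(h["logd"]) for h in members], k)
    best: Dict[int, Dict[str, Any]] = {}
    for h, lab in zip(members, labels):
        if lab not in best or h["fit"] > best[lab]["fit"]:
            best[lab] = h
    return [best[lab] for lab in sorted(best)]
```

Clusters with fewer than three distinct logD values simply yield fewer groups, so small clusters still contribute their best hit.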

    The protocol should be easy to modify to analyze other hit lists and to select on other criteria. Hope this might be useful to someone.