High-Performance Computing at the NIH

MaxCluster

Description

MaxCluster is a command-line tool for comparing protein structures. It provides a simple interface to a large number of common structure comparison tasks. A key feature of the program is its ability to process thousands of structures, either against a single reference protein or in an all-versus-all comparison.

Features:

  • Sequence-dependent or sequence-independent alignment
  • RMSD and URMSD comparison
  • Relative RMSD and Relative URMSD comparison
  • MaxSub and TM-score search algorithm
  • Global Distance Test (GDT) scoring
  • Single structure comparison
  • Single structure versus multiple structure processing
  • All-versus-all multiple structure comparison
  • Single, average and maximum linkage hierarchical clustering
  • Nearest-Neighbour clustering algorithm
  • 3D-Jury ranking method
  • Accepts .bz2 and .gz compressed input PDB files
  • Reads and writes all-versus-all comparison scores
  • Produces rasmol script alignment files
  • Adjustable output logging level
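
As a rough guide to the cost of the all-versus-all mode, n structures give n(n-1)/2 unordered pairs, assuming each pair is compared once and no structure is compared with itself. The sketch below is plain Python and not part of MaxCluster; the function name pair_count is made up for illustration.

```python
def pair_count(n: int) -> int:
    """Number of unordered pairs among n structures (no self-comparisons)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


if __name__ == "__main__":
    # Show how quickly the number of pairwise comparisons grows.
    for n in (10, 900, 5000):
        print(n, pair_count(n))
```

For 900 structures this gives 404550 pairs, which matches the RMSD count reported in the sample session on this page.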

Version

To see the installed version, type maxcluster -h on the command line.

Sample session

$ find . -name "*.pdb" > pdblist
$ maxcluster -l pdblist -C 4 -rmsd
INFO : Reading PDB list file 'pdblist'
INFO : Successfully read 900 / 900 PDBs from list file 'pdblist'
INFO : Successfully read 900 PDB structures
INFO : All vs. All RMSD
INFO : Processed 404540 of 404550 RMSDs
INFO : CPU time = 15.87 seconds
INFO : ======================================
INFO : Nearest Neighbour clustering
INFO : ======================================
INFO : Centroids
INFO : ======================================
INFO : Cluster   Centroid  Size  Spread
INFO :       1 :      134   229   3.185  002/S_00000034.pdb
INFO :       2 :      588    56   3.299  006/S_00000088.pdb
INFO :       3 :      497    23   3.574  005/S_00000097.pdb
INFO :       4 :      542    13   3.309  006/S_00000042.pdb
INFO :       5 :      715    11   3.060  008/S_00000015.pdb
INFO :       6 :      517    10   3.174  006/S_00000017.pdb
INFO :       7 :      782     9   3.407  008/S_00000082.pdb
INFO :       8 :      788     9   3.289  008/S_00000088.pdb
INFO :       9 :      139     8   3.121  002/S_00000039.pdb
INFO :      10 :       80     7   3.255  001/S_00000080.pdb
INFO :      11 :      647     7   3.193  007/S_00000047.pdb
INFO :      12 :      885     7   3.154  009/S_00000085.pdb
INFO :      13 :      530     6   3.156  006/S_00000030.pdb
INFO :      14 :      122     5   3.086  002/S_00000022.pdb
INFO :      15 :      169     5   3.055  002/S_00000069.pdb
INFO :      16 :      600     5   2.922  006/S_00000100.pdb
INFO : ======================================
INFO : 16 Clusters @ Threshold 4.000 (Assigned 410 / 900)
INFO : ======================================
INFO : Item   Cluster
INFO :    1 :       0  001/S_00000001.pdb
INFO :    2 :       0  001/S_00000002.pdb
INFO :    4 :       0  001/S_00000004.pdb
INFO :    9 :       0  001/S_00000009.pdb
...
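
The centroid table in the log above can be pulled into a script for further processing. The following is a minimal Python sketch, not a MaxCluster utility. It assumes the log was saved to a text file with one INFO record per line and that the centroid rows follow the "cluster : centroid size spread file" layout shown in the sample. The names CENTROID_RE and parse_centroids are hypothetical, and other MaxCluster versions or logging levels may format the output differently.

```python
import re

# One centroid row, e.g. "INFO : 1 : 134 229 3.185 002/S_00000034.pdb".
# Assumed layout: cluster id, centroid index, cluster size, spread, file name.
CENTROID_RE = re.compile(
    r"^INFO\s*:\s*(\d+)\s*:\s*(\d+)\s+(\d+)\s+(\d+\.\d+)\s+(\S+)\s*$"
)


def parse_centroids(log_text):
    """Return one dict per centroid row found in the log text."""
    rows = []
    for line in log_text.splitlines():
        match = CENTROID_RE.match(line.strip())
        if match is None:
            continue  # headers, separators and item rows do not match
        cluster, centroid, size, spread, path = match.groups()
        rows.append({
            "cluster": int(cluster),
            "centroid": int(centroid),
            "size": int(size),
            "spread": float(spread),
            "path": path,
        })
    return rows


# A few lines copied from the sample session, used as a self-check.
SAMPLE = """\
INFO : Cluster Centroid Size Spread
INFO : 1 : 134 229 3.185 002/S_00000034.pdb
INFO : 2 : 588 56 3.299 006/S_00000088.pdb
INFO : 16 Clusters @ Threshold 4.000 (Assigned 410 / 900)
INFO : Item Cluster
INFO : 1 : 0 001/S_00000001.pdb
"""

if __name__ == "__main__":
    for row in parse_centroids(SAMPLE):
        print(row["cluster"], row["size"], row["spread"], row["path"])
```

Item rows such as "INFO : 1 : 0 001/S_00000001.pdb" are skipped because they carry no spread value. Applied to the full centroid table in the sample session, the cluster sizes sum to 410, which agrees with the "Assigned 410 / 900" line.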

Documentation

http://www.sbg.bio.ic.ac.uk/~maxcluster/index.html