http://sourceforge.net/projects/tgicl/
A sample LSF job script (jobfile) to cluster an EST dataset using TGICL on lewis may include the following line:
tgicl fastdb
where fastdb is the multi-FASTA file containing all the sequences to be clustered.
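As a rough sketch, a complete jobfile built around that command might look like the following. The job name, log file names, and slot count are illustrative assumptions and not taken from the lewis documentation; your site may also require a queue or other settings. The -c, -p, -l and -v options are the ones documented in the TGICL usage text, shown here with their default values for -p, -l and -v.

```shell
#!/bin/bash
# Hypothetical LSF directives: adjust names and resources for your site.
#BSUB -J tgicl_est              # job name (assumed)
#BSUB -n 4                      # number of job slots requested (assumed)
#BSUB -R "span[hosts=1]"        # keep all slots on one host, since -c uses local CPUs
#BSUB -o tgicl_est.%J.out       # standard output log (%J expands to the job ID)
#BSUB -e tgicl_est.%J.err       # standard error log

FASTA_DB="fastdb"   # multi-FASTA file with all sequences to be clustered
NUM_CPUS=4          # should match the slot count requested with -n above

if command -v tgicl >/dev/null 2>&1; then
    # -p 94, -l 40 and -v 30 are the documented defaults, written out for clarity
    tgicl "$FASTA_DB" -c "$NUM_CPUS" -p 94 -l 40 -v 30
else
    echo "tgicl not found on PATH; load or install it first" >&2
fi
```

Such a script would typically be submitted with "bsub < jobfile".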
For more information about the usage of TGICL, type "tgicl -h" on the command line or see the "README" file.
Usage:
  tgicl <fasta_db> [-q <qualdb>] [-d <refDb>] [-c {<num_CPUs>|<PVM_nodefile>}]
        [-m <user>] [-O 'cap3_options'] [-l <min_overlap>] [-v <max_overhang>]
        [-p <pid>] [-n slicesize] [-s <maxsize>] [-a <cluster_file>]
        [-M] [-K] [-L] [-X] [-I] [-C] [-G] [-W <pairwise_script.psx>]
        [-A <asm_program.psx>] [-P <param_file>] [-u <seq_list>]
        [-f <prefix_filter>] [-D]

Options:
  -c : use the specified number of CPUs on local machine (default 1)
       or a list of PVM nodes in <PVM_nodefile>

Clustering phase options:
  -d  do not perform all-vs-all search, but search <fasta_db> against <refDb> instead; exit after the pairwise hits are generated
  -n  number of sequences in a clustering search slice (default 1000)
  -p  minimum percent identity for overlaps <PID> (default 94)
  -l  minimum overlap length (default 40)
  -G  store gap information for all pairwise alignments
  -v  maximum length of unmatched overhangs (default 30)
  -M  ignore lower-case masking in <fasta_db> sequences
  -W  use custom script <pairwise_script.psx> for the distributed pairwise searches instead of the default: tgicl_cluster.psx
  -Z  only run the distributed pairwise searches and exit (no sorting of the pairwise overlaps and no clusters generated)
  -Y  only run the distributed pairwise searches and the sorted & compressed *_hits.Z file
  -L  performs more restrictive, layout-based clustering instead of simple transitive closure

General options:
  -I  do not rebuild database indices
  -s  attempt to split clusters larger than <maxsize> based on seeded clustering (only works if there are 'et|' or 'np|'-prefixed entries provided in the input file)
  -O  use given 'cap3_options' instead of the default ones (-p 93)
  -u  skip the mgblast searches (assumed done) but restrict further clustering analysis to only the sequences in <seq_list>
  -C  (TIGR sequences only) always put in the same cluster all reads from the same clone
  -t  use <clone_list> file to put in the same cluster all sequence names on the same line
  -a  assemble clusters from file <cluster_file> (do not perform any pairwise clustering)
  -f  keep only sequence names with prefix <prefix>
  -K  skip the pairwise searches, only recreate the clusters by reprocessing the previously obtained overlaps
  -X  do not perform assembly, only generate the cluster file
  -A  use custom script as the slice assembly script (instead of tgicl_asm.psx)
  -P  pass the <param_file> as the custom parameter file to the assembly program <asmprog.psx>