API
dbCAN: Automatic CAZyme Annotation
usage: run_dbcan [-h] [--verbose] [--dbCANFile DBCANFILE]
[--dia_eval DIA_EVAL] [--dia_cpu DIA_CPU]
[--hmm_eval HMM_EVAL] [--hmm_cov HMM_COV]
[--hmm_cpu HMM_CPU] [--out_pre OUT_PRE]
[--out_dir OUT_DIR] [--db_dir DB_DIR]
[--tools {hmmer,diamond,dbcansub,all} [{hmmer,diamond,dbcansub,all} ...]]
[--use_signalP USE_SIGNALP] [--signalP_path SIGNALP_PATH]
[--gram {p,n,all}] [-v VERSION]
[--dbcan_thread DBCAN_THREAD] [--tf_eval TF_EVAL]
[--tf_cov TF_COV] [--tf_cpu TF_CPU] [--stp_eval STP_EVAL]
[--stp_cov STP_COV] [--stp_cpu STP_CPU]
[--cluster CLUSTER] [--cgc_dis CGC_DIS]
[--cgc_sig_genes {tf,tp,stp,tp+tf,tp+stp,tf+stp,all}]
[--only_sub] [--cgc_substrate] [--pul PUL] [-o OUT]
[-w WORKDIR] [-env ENV] [-odbcan_sub] [-odbcanpul]
[-upghn UNIQ_PUL_GENE_HIT_NUM]
[-uqcgn UNIQ_QUERY_CGC_GENE_NUM] [-cpn CAZYME_PAIR_NUM]
[-tpn TOTAL_PAIR_NUM] [-ept EXTRA_PAIR_TYPE]
[-eptn EXTRA_PAIR_TYPE_NUM] [-iden IDENTITY_CUTOFF]
[-cov COVERAGE_CUTOFF] [-bsc BITSCORE_CUTOFF]
[-evalue EVALUE_CUTOFF] [-hmmcov HMMCOV]
[-hmmevalue HMMEVALUE]
[-ndsc NUM_OF_DOMAINS_SUBSTRATE_CUTOFF]
[-npsc NUM_OF_PROTEIN_SUBSTRATE_CUTOFF]
[-subs SUBSTRATE_SCORS]
inputFile {protein,prok,meta}
Positional Arguments
- inputFile
User input file. Must be in FASTA format.
- inputType
Possible choices: protein, prok, meta
Type of sequence input. protein=proteome; prok=prokaryote; meta=metagenome
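As a minimal sketch of how the two positional arguments fit together, the Python snippet below assembles a `run_dbcan` argument list and checks the input type against the documented choices. The helper name `build_command` and the file name `genome.fna` are hypothetical, and the snippet only builds and prints the command; it does not run the tool.

```python
import shlex

# Input types listed under "Possible choices" for inputType.
INPUT_TYPES = ("protein", "prok", "meta")


def build_command(input_file, input_type):
    """Return the argument list for a basic run_dbcan call (hypothetical helper)."""
    if input_type not in INPUT_TYPES:
        raise ValueError(
            f"inputType must be one of {INPUT_TYPES}, got {input_type!r}"
        )
    return ["run_dbcan", input_file, input_type]


# "genome.fna" is a placeholder FASTA file name, not a file shipped with dbCAN.
command = build_command("genome.fna", "prok")
print(shlex.join(command))
```

Once run_dbcan is installed, a list like this can be handed to `subprocess.run(command, check=True)`.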
Named Arguments
- --verbose
Print out detailed procedure for each step.
Default: False
- --dbCANFile
File name of the HMM database, such as dbCAN.txt. Use the newest version from the dbCAN2 website.
Default: “dbCAN.txt”
- --dia_eval
DIAMOND E-value
Default: 1e-102
- --dia_cpu
Number of CPU cores that DIAMOND is allowed to use
Default: 8
- --hmm_eval
HMMER E-value
Default: 1e-15
- --hmm_cov
HMMER coverage value
Default: 0.35
- --hmm_cpu
Number of CPU cores that HMMER is allowed to use
Default: 8
- --out_pre
Output files prefix
Default: “”
- --out_dir
Output directory
Default: “output”
- --db_dir
Database directory
Default: “db”
- --tools, -t
Possible choices: hmmer, diamond, dbcansub, all
Choose a combination of tools to run
Default: “all”
- --use_signalP
Whether to run SignalP. SignalP must be set up first. Because of the SignalP license, the Docker version does not include SignalP.
Default: False
- --signalP_path, -sp
Path to the signalp executable.
Default: “signalp”
- --gram, -g
Possible choices: p, n, all
Choose gram+ (p) or gram- (n) for proteome/prokaryote nucleotide input. These are SignalP parameters and apply only when SignalP is used.
Default: “all”
- -v, --version
Default: “4.1.1”
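The named arguments above are appended to the command line as `--name value` pairs. A minimal sketch, assuming a hypothetical helper `options_to_args` and a placeholder input file `genome.fna`, shows one way to turn a dictionary of settings into arguments. List values are expanded into several tokens, which matches the usage line for `--tools`, where more than one tool may be given.

```python
def options_to_args(options):
    """Convert {"hmm_cov": 0.35, ...} into ["--hmm_cov", "0.35", ...].

    Hypothetical helper: it only handles options that take a value,
    not switches such as --verbose.
    """
    args = []
    for name, value in options.items():
        args.append(f"--{name}")
        if isinstance(value, (list, tuple)):
            # Options like --tools accept several values in a row.
            args.extend(str(item) for item in value)
        else:
            args.append(str(value))
    return args


# Values below repeat the documented defaults, except for --tools.
options = {
    "tools": ["hmmer", "diamond"],
    "hmm_eval": "1e-15",
    "hmm_cov": 0.35,
    "hmm_cpu": 8,
    "out_dir": "output",
}

# Positional arguments come first so that --tools does not consume them.
command = ["run_dbcan", "genome.fna", "prok"] + options_to_args(options)
print(" ".join(command))
```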
dbCAN-sub parameters
- --dbcan_thread, -dt
Default: 12
- --tf_eval
HMMER E-value for tf.hmm
Default: 0.0001
- --tf_cov
HMMER coverage value for tf.hmm
Default: 0.35
- --tf_cpu
Number of CPU cores that HMMER is allowed to use for tf.hmm
Default: 8
- --stp_eval
HMMER E-value for stp.hmm
Default: 0.0001
- --stp_cov
HMMER coverage value for stp.hmm
Default: 0.3
- --stp_cpu
Number of CPU cores that HMMER is allowed to use for stp.hmm
Default: 8
CGC_Finder parameters
- --cluster, -c
Predict CGCs via CGCFinder. This argument requires an auxiliary locations file if a protein input is being used.
- --cgc_dis
CGCFinder Distance value
Default: 2
- --cgc_sig_genes
Possible choices: tf, tp, stp, tp+tf, tp+stp, tf+stp, all
CGCFinder Signature Genes value
Default: “tp”
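The CGCFinder options can be sketched for the protein-input case, where the description of `--cluster` states that a locations file is required. The helper name `cgc_args_for_protein` and the file names `proteins.faa` and `proteins.gff` are hypothetical placeholders, and the format of the locations file is not specified here, so treat this as an illustration of the argument layout only.

```python
# Choices listed for --cgc_sig_genes.
SIG_GENE_CHOICES = ("tf", "tp", "stp", "tp+tf", "tp+stp", "tf+stp", "all")


def cgc_args_for_protein(locations_file, distance=2, sig_genes="tp"):
    """Build CGCFinder arguments for a protein input (hypothetical helper).

    Defaults mirror the documented defaults of --cgc_dis and --cgc_sig_genes.
    """
    if not locations_file:
        raise ValueError("protein input needs an auxiliary locations file")
    if sig_genes not in SIG_GENE_CHOICES:
        raise ValueError(f"--cgc_sig_genes must be one of {SIG_GENE_CHOICES}")
    return [
        "--cluster", locations_file,
        "--cgc_dis", str(distance),
        "--cgc_sig_genes", sig_genes,
    ]


# Placeholder file names for a proteome and its gene locations.
command = ["run_dbcan", "proteins.faa", "protein"] + cgc_args_for_protein(
    "proteins.gff", sig_genes="tp+tf"
)
print(" ".join(command))
```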
CGC_Substrate parameters
- --only_sub
Only run substrate prediction for PUL. If this flag is given, dbCAN skips the CAZyme annotation and CGC prediction.
Default: True
- --cgc_substrate
Run CGC substrate prediction.
Default: False
- --pul
The dbCAN-PUL PUL.faa file
- -o, --out
Default: “substrate.out”
- -w, --workdir
Default: “.”
- -env, --env
Default: “local”
- -odbcan_sub, --odbcan_sub
Output intermediate dbCAN-sub prediction results, for debugging.
Default: False
- -odbcanpul, --odbcanpul
Output intermediate dbCAN-PUL prediction results, for debugging.
Default: False
dbCAN-PUL homologous searching parameters
How to define homologous gene hits and PUL hits
- -upghn, --uniq_pul_gene_hit_num
Default: 2
- -uqcgn, --uniq_query_cgc_gene_num
Default: 2
- -cpn, --CAZyme_pair_num
Default: 1
- -tpn, --total_pair_num
Default: 2
- -ept, --extra_pair_type
Extra pair types to consider, such as signature gene pairs (for example TC-TC,STP-STP). None by default.
- -eptn, --extra_pair_type_num
Cutoff for each signature pair type, for example 1,2
Default: “0”
- -iden, --identity_cutoff
Identity cutoff to identify a homologous hit
Default: 0.3
- -cov, --coverage_cutoff
Query coverage cutoff to identify a homologous hit
Default: 0.3
- -bsc, --bitscore_cutoff
Bitscore cutoff to identify a homologous hit
Default: 50
- -evalue, --evalue_cutoff
E-value cutoff to identify a homologous hit
Default: 0.01
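To illustrate how the four hit-level cutoffs could work together, here is a minimal sketch of a filter over alignment hits. It is not dbCAN's implementation: the `Hit` class and `is_homologous` function are hypothetical, and it is an assumption that all four cutoffs must pass at once, that identity and coverage are fractions between 0 and 1, and that the comparisons are inclusive.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One alignment hit against a PUL gene (hypothetical structure)."""
    identity: float  # fraction of identical positions, 0..1 (assumed scale)
    coverage: float  # fraction of the query covered, 0..1 (assumed scale)
    bitscore: float
    evalue: float


def is_homologous(hit, iden=0.3, cov=0.3, bsc=50, evalue=0.01):
    """Return True if the hit passes every cutoff.

    Defaults mirror -iden, -cov, -bsc and -evalue. Combining them with
    'and' and using inclusive comparisons are assumptions of this sketch.
    """
    return (
        hit.identity >= iden
        and hit.coverage >= cov
        and hit.bitscore >= bsc
        and hit.evalue <= evalue
    )


strong = Hit(identity=0.45, coverage=0.80, bitscore=120.0, evalue=1e-30)
weak = Hit(identity=0.25, coverage=0.80, bitscore=120.0, evalue=1e-30)
print(is_homologous(strong), is_homologous(weak))
```

Under these assumptions the second hit is rejected only because its identity falls below the 0.3 cutoff.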
dbCAN-sub major voting parameters
How to define dbCAN-sub hits and dbCAN-sub subfamily substrates
- -hmmcov, --hmmcov
Default: 0.3
- -hmmevalue, --hmmevalue
Default: 0.01
- -ndsc, --num_of_domains_substrate_cutoff
Defines how many domains must share a substrate in a CGC. One protein may include several subfamily domains.
Default: 2
- -npsc, --num_of_protein_substrate_cutoff
Defines how many sequences must share a substrate in a CGC. One protein may include several subfamily domains.
Default: 2
- -subs, --substrate_scors
Each CGC with a predicted substrate must score higher than this value
Default: 2
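The difference between the domain cutoff and the protein cutoff is easiest to see with a small counting example. The sketch below is an illustration only, not dbCAN's voting code: the function `vote_substrates`, the input layout, and the choice to require both cutoffs at once are assumptions, and the `-subs` score is left out because its exact definition is not given here.

```python
from collections import defaultdict


def vote_substrates(domain_hits, ndsc=2, npsc=2):
    """Return substrates supported by enough domains and enough proteins.

    domain_hits: iterable of (protein_id, substrate) pairs, one pair per
    subfamily domain found in a CGC (hypothetical input layout).
    ndsc / npsc mirror the documented defaults of -ndsc and -npsc.
    """
    domain_count = defaultdict(int)
    proteins = defaultdict(set)
    for protein_id, substrate in domain_hits:
        domain_count[substrate] += 1          # every domain counts
        proteins[substrate].add(protein_id)   # each protein counts once
    return sorted(
        substrate
        for substrate in domain_count
        if domain_count[substrate] >= ndsc and len(proteins[substrate]) >= npsc
    )


# Made-up example CGC: p1 carries two xylan domains, p2 carries one.
hits = [
    ("p1", "xylan"),
    ("p1", "xylan"),
    ("p2", "xylan"),
    ("p3", "pectin"),
    ("p4", "starch"),
    ("p4", "starch"),
]
print(vote_substrates(hits))
```

In this made-up input, starch reaches two domains but both sit in the same protein, so it would pass a domain cutoff of 2 while failing a protein cutoff of 2.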