API

dbCAN: Automatic CAZyme Annotation

usage: run_dbcan -h [-h] [--verbose] [--dbCANFile DBCANFILE]
                    [--dia_eval DIA_EVAL] [--dia_cpu DIA_CPU]
                    [--hmm_eval HMM_EVAL] [--hmm_cov HMM_COV]
                    [--hmm_cpu HMM_CPU] [--out_pre OUT_PRE]
                    [--out_dir OUT_DIR] [--db_dir DB_DIR]
                    [--tools {hmmer,diamond,dbcansub,all} [{hmmer,diamond,dbcansub,all} ...]]
                    [--use_signalP USE_SIGNALP] [--signalP_path SIGNALP_PATH]
                    [--gram {p,n,all}] [-v VERSION]
                    [--dbcan_thread DBCAN_THREAD] [--tf_eval TF_EVAL]
                    [--tf_cov TF_COV] [--tf_cpu TF_CPU] [--stp_eval STP_EVAL]
                    [--stp_cov STP_COV] [--stp_cpu STP_CPU]
                    [--cluster CLUSTER] [--cgc_dis CGC_DIS]
                    [--cgc_sig_genes {tf,tp,stp,tp+tf,tp+stp,tf+stp,all}]
                    [--only_sub] [--cgc_substrate] [--pul PUL] [-o OUT]
                    [-w WORKDIR] [-env ENV] [-odbcan_sub] [-odbcanpul]
                    [-upghn UNIQ_PUL_GENE_HIT_NUM]
                    [-uqcgn UNIQ_QUERY_CGC_GENE_NUM] [-cpn CAZYME_PAIR_NUM]
                    [-tpn TOTAL_PAIR_NUM] [-ept EXTRA_PAIR_TYPE]
                    [-eptn EXTRA_PAIR_TYPE_NUM] [-iden IDENTITY_CUTOFF]
                    [-cov COVERAGE_CUTOFF] [-bsc BITSCORE_CUTOFF]
                    [-evalue EVALUE_CUTOFF] [-hmmcov HMMCOV]
                    [-hmmevalue HMMEVALUE]
                    [-ndsc NUM_OF_DOMAINS_SUBSTRATE_CUTOFF]
                    [-npsc NUM_OF_PROTEIN_SUBSTRATE_CUTOFF]
                    [-subs SUBSTRATE_SCORS]
                    inputFile {protein,prok,meta}

Positional Arguments

inputFile

User input file. Must be in FASTA format.

inputType

Possible choices: protein, prok, meta

Type of sequence input: protein = proteome, prok = prokaryote, meta = metagenome.
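As a minimal sketch, a wrapper script might assemble the two positional arguments as shown here. The helper name `build_command` and the file name `genome.fna` are illustrative and not part of dbCAN; the command is only constructed, not executed.

```python
VALID_INPUT_TYPES = ("protein", "prok", "meta")


def build_command(input_file, input_type):
    """Return an argument list for run_dbcan (hypothetical helper)."""
    if input_type not in VALID_INPUT_TYPES:
        raise ValueError(
            f"inputType must be one of {VALID_INPUT_TYPES}, got {input_type!r}"
        )
    # Positional arguments follow the order in the usage line:
    # inputFile first, then inputType.
    return ["run_dbcan", input_file, input_type]


cmd = build_command("genome.fna", "prok")
print(" ".join(cmd))  # run_dbcan genome.fna prok
```

Passing the resulting list to `subprocess.run` avoids shell quoting problems with unusual file names.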

Named Arguments

--verbose

Print out detailed procedure for each step.

Default: False

--dbCANFile

Name of the HMM database file, such as dbCAN.txt. Use the newest version from the dbCAN2 website.

Default: “dbCAN.txt”

--dia_eval

DIAMOND E-value threshold.

Default: 1e-102

--dia_cpu

Number of CPU cores that DIAMOND is allowed to use

Default: 8

--hmm_eval

HMMER E-value threshold.

Default: 1e-15

--hmm_cov

HMMER coverage threshold.

Default: 0.35

--hmm_cpu

Number of CPU cores that HMMER is allowed to use

Default: 8
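To illustrate how the `--hmm_eval` and `--hmm_cov` defaults could act together as a filter, here is a hedged sketch. The `HmmHit` class, its field names, the coverage formula (aligned HMM span divided by HMM length), and the strict comparisons are all assumptions made for this example; they are not taken from the dbCAN source.

```python
from dataclasses import dataclass


@dataclass
class HmmHit:
    family: str
    evalue: float
    hmm_from: int    # 1-based start of the alignment on the HMM
    hmm_to: int      # inclusive end of the alignment on the HMM
    hmm_length: int  # total length of the HMM


def passes_hmmer_cutoffs(hit, max_evalue=1e-15, min_coverage=0.35):
    """Keep a hit only if it beats both cutoffs (assumed semantics)."""
    coverage = (hit.hmm_to - hit.hmm_from + 1) / hit.hmm_length
    return hit.evalue < max_evalue and coverage > min_coverage


hits = [
    HmmHit("GH5", 1e-30, 10, 200, 300),   # strong E-value, good coverage
    HmmHit("GH13", 1e-5, 1, 250, 300),    # E-value too weak
    HmmHit("CBM50", 1e-20, 1, 40, 300),   # coverage too low
]
kept = [h.family for h in hits if passes_hmmer_cutoffs(h)]
print(kept)  # ['GH5']
```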

--out_pre

Output files prefix

Default: “”

--out_dir

Output directory

Default: “output”

--db_dir

Database directory

Default: “db”

--tools, -t

Possible choices: hmmer, diamond, dbcansub, all

Choose a combination of tools to run

Default: “all”
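The usage line shows that `--tools` accepts one or more of the listed choices. A standalone `argparse` sketch of an option with the same shape is given below; it mirrors only what the usage line shows and is not the actual run_dbcan parser. The program name `tools_demo` is made up.

```python
import argparse

parser = argparse.ArgumentParser(prog="tools_demo")
parser.add_argument(
    "--tools", "-t",
    nargs="+",  # one or more values, as in the usage line
    choices=["hmmer", "diamond", "dbcansub", "all"],
    default="all",
    help="Choose a combination of tools to run",
)

# Selecting two tools explicitly yields a list of the chosen names.
args = parser.parse_args(["--tools", "hmmer", "diamond"])
print(args.tools)  # ['hmmer', 'diamond']
```

Note that in this sketch the default is the plain string `"all"`, while explicit selections arrive as a list, so calling code would need to handle both forms.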

--use_signalP

Whether to use SignalP. SignalP must be set up first. Because of the SignalP license, the Docker version does not include SignalP.

Default: False

--signalP_path, -sp

Path to the signalp executable. The default location is signalp.

Default: “signalp”

--gram, -g

Possible choices: p, n, all

Choose gram+ (p) or gram- (n) for proteome/prokaryote nucleotide input. These are SignalP parameters and apply only if SignalP is used.

Default: “all”

-v, --version

Default: “4.1.1”

dbCAN-sub parameters

--dbcan_thread, -dt

Default: 12

--tf_eval

HMMER E-value threshold for tf.hmm.

Default: 0.0001

--tf_cov

HMMER coverage threshold for tf.hmm.

Default: 0.35

--tf_cpu

Number of CPU cores that HMMER is allowed to use for tf.hmm.

Default: 8

--stp_eval

HMMER E-value threshold for stp.hmm.

Default: 0.0001

--stp_cov

HMMER coverage threshold for stp.hmm.

Default: 0.3

--stp_cpu

Number of CPU cores that HMMER is allowed to use for stp.hmm.

Default: 8

CGC_Finder parameters

--cluster, -c

Predict CGCs via CGCFinder. This argument requires an auxiliary locations file if a protein input is being used.

--cgc_dis

CGCFinder distance value.

Default: 2

--cgc_sig_genes

Possible choices: tf, tp, stp, tp+tf, tp+stp, tf+stp, all

CGCFinder signature genes value.

Default: “tp”
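The `--cgc_sig_genes` choices read as combinations of three signature gene types joined by `+`. The following sketch expands a choice into a set of types. The function name is hypothetical, and treating `all` as shorthand for all three types is an assumption, not something this page states.

```python
SIG_GENE_CHOICES = ("tf", "tp", "stp", "tp+tf", "tp+stp", "tf+stp", "all")


def expand_signature_genes(value="tp"):
    """Map a --cgc_sig_genes choice to a set of gene types (illustrative)."""
    if value not in SIG_GENE_CHOICES:
        raise ValueError(f"unsupported choice: {value!r}")
    if value == "all":
        # Assumption: "all" stands for every individual type.
        return {"tf", "tp", "stp"}
    return set(value.split("+"))


print(sorted(expand_signature_genes("tp+tf")))  # ['tf', 'tp']
```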

CGC_Substrate parameters

--only_sub

Only run substrate prediction for PUL. If this parameter is present, dbCAN skips the CAZyme annotation and CGC prediction.

Default: True

--cgc_substrate

Run CGC substrate prediction.

Default: False

--pul

The dbCAN-PUL PUL.faa file.

-o, --out

Default: “substrate.out”

-w, --workdir

Default: “.”

-env, --env

Default: “local”

-odbcan_sub, --odbcan_sub

Output intermediate results of the dbCAN-sub prediction, for debugging.

Default: False

-odbcanpul, --odbcanpul

Output intermediate results of the dbCAN-PUL prediction, for debugging.

Default: False

dbCAN-PUL homologous searching parameters

How to define homologous gene hits and PUL hits.

-upghn, --uniq_pul_gene_hit_num

Default: 2

-uqcgn, --uniq_query_cgc_gene_num

Default: 2

-cpn, --CAZyme_pair_num

Default: 1

-tpn, --total_pair_num

Default: 2

-ept, --extra_pair_type

Extra signature pair types, e.g. TC-TC,STP-STP (default: None).

-eptn, --extra_pair_type_num

Specify the signature pair cutoff, e.g. 1,2.

Default: “0”

-iden, --identity_cutoff

Identity cutoff to identify a homologous hit.

Default: 0.3

-cov, --coverage_cutoff

Query coverage cutoff to identify a homologous hit.

Default: 0.3

-bsc, --bitscore_cutoff

Bitscore cutoff to identify a homologous hit.

Default: 50

-evalue, --evalue_cutoff

E-value cutoff to identify a homologous hit.

Default: 0.01
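The four cutoffs above can be read as a joint filter on each alignment. The sketch below shows one way they might combine. The `BlastHit` class, the use of fractions for identity and coverage, and the inclusive comparisons are assumptions for illustration and may differ from what dbCAN does internally.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    identity: float        # fraction between 0 and 1 (assumed scale)
    query_coverage: float  # fraction between 0 and 1 (assumed scale)
    bitscore: float
    evalue: float


def is_homologous_hit(
    hit,
    identity_cutoff=0.3,
    coverage_cutoff=0.3,
    bitscore_cutoff=50,
    evalue_cutoff=0.01,
):
    """Return True when the hit meets every cutoff (assumed semantics)."""
    return (
        hit.identity >= identity_cutoff
        and hit.query_coverage >= coverage_cutoff
        and hit.bitscore >= bitscore_cutoff
        and hit.evalue <= evalue_cutoff
    )


print(is_homologous_hit(BlastHit(0.45, 0.80, 120.0, 1e-30)))  # True
print(is_homologous_hit(BlastHit(0.25, 0.80, 120.0, 1e-30)))  # False
```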

dbCAN-sub major voting parameters

How to define dbCAN-sub hits and dbCAN-sub subfamily substrates.

-hmmcov, --hmmcov

Default: 0.3

-hmmevalue, --hmmevalue

Default: 0.01

-ndsc, --num_of_domains_substrate_cutoff

Defines how many domains must share a substrate in a CGC. One protein may include several subfamily domains.

Default: 2

-npsc, --num_of_protein_substrate_cutoff

Defines how many sequences must share a substrate in a CGC. One protein may include several subfamily domains.

Default: 2

-subs, --substrate_scors

The substrate score of each CGC must be greater than this value.

Default: 2
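One possible reading of the domain and protein cutoffs is a simple vote within a CGC: a substrate is kept only if enough domains, spread over enough distinct proteins, agree on it. The sketch below implements that interpretation only. The function name, the input layout, and the inclusive comparisons are assumptions, and the `-subs` score is not modelled because this page does not define how it is computed.

```python
from collections import defaultdict


def vote_substrates(domain_hits, min_domains=2, min_proteins=2):
    """Pick substrates supported by enough domains and proteins.

    domain_hits is a list of (protein_id, substrate) pairs, one per
    subfamily domain; a protein may appear more than once.
    """
    domain_counts = defaultdict(int)
    proteins = defaultdict(set)
    for protein_id, substrate in domain_hits:
        domain_counts[substrate] += 1
        proteins[substrate].add(protein_id)
    return sorted(
        s
        for s in domain_counts
        if domain_counts[s] >= min_domains and len(proteins[s]) >= min_proteins
    )


hits = [
    ("p1", "xylan"),
    ("p1", "xylan"),   # second domain on the same protein
    ("p2", "xylan"),
    ("p3", "pectin"),
]
print(vote_substrates(hits))  # ['xylan']
```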