Inputs
Overview
Pipeline parameters can be adjusted using the following methods:
- At the command line using --{parameter_name} (e.g., --input)
- In the nextflow.config file
- In a JSON file via the -params-file parameter
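As a minimal sketch of the JSON route, the snippet below writes a parameters file from Python. The parameter names are taken from this page; the values, the samplesheet path, the `params.json` file name, and the `write_params_file` helper are illustrative assumptions rather than recommended settings.

```python
import json
from pathlib import Path

# Keys are pipeline parameter names without the leading "--".
# All values below are placeholders chosen for illustration.
params = {
    "input": "/data/run01/samplesheet.csv",  # hypothetical absolute path
    "max_reads": 1000000,
    "ref_mode": "fast",
    "qc_depth": 30,
}


def write_params_file(params: dict, path: str = "params.json") -> Path:
    """Serialize pipeline parameters to a JSON file (hypothetical helper)."""
    out = Path(path)
    out.write_text(json.dumps(params, indent=2) + "\n")
    return out
```

The resulting file would then be supplied with -params-file params.json when launching the pipeline.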
It is also possible to pass arguments directly to a pipeline process using the ext.args variable in conf/modules.config (see example below):

    withName: 'IVAR_CONSENSUS' {
        ext.args = "-n 'N' -k"
        ext.when = { }
        publishDir = [
            [
                path: { "${params.outdir}/${meta.id}/assembly/" },
                pattern: "none",
                mode: 'copy'
            ],
            [
                path: { "${params.outdir}/${meta.id}/qc" },
                pattern: "*.csv",
                mode: 'copy'
            ]
        ]
    }

Input Options
--input
Path to the samplesheet.
Example samplesheet (samplesheet.csv):
sample,fastq_1,fastq_2
sample01,sample01_R1_001.fastq.gz,sample01_R2_001.fastq.gz
sample02,sample02_R1_001.fastq.gz,sample02_R2_001.fastq.gz
Samplesheet columns
- Required columns: sample, and either fastq_1 + fastq_2 or sra
- All file paths in the samplesheet must be absolute.
Column Name | Description |
---|---|
sample | Sample name |
fastq_1 | Absolute path to the forward (R1) Illumina read file. Must be supplied with fastq_2. Cannot be supplied with the sra column. |
fastq_2 | Absolute path to the reverse (R2) Illumina read file. Must be supplied with fastq_1. Cannot be supplied with the sra column. |
sra | NCBI SRA accession number. Cannot be used with fastq_1 or fastq_2. |
reference | Semicolon-separated list of reference genomes to use for reference-based genome assembly. This can include absolute file paths to a FASTA file or the reference name(s) in the reference set supplied to --refs. |
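The column rules above can be checked before launching a run. The following is a rough sketch, not the pipeline's own validation: `validate_samplesheet` is a hypothetical helper, and it assumes POSIX-style local paths (remote URIs are not handled).

```python
import csv
import io
from pathlib import PurePosixPath


def validate_samplesheet(text: str) -> list[str]:
    """Return a list of problems found in samplesheet CSV text."""
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    for n, row in enumerate(reader, start=2):  # line 1 is the header
        sample = (row.get("sample") or "").strip()
        fq1 = (row.get("fastq_1") or "").strip()
        fq2 = (row.get("fastq_2") or "").strip()
        sra = (row.get("sra") or "").strip()
        if not sample:
            problems.append(f"line {n}: missing sample")
        # sra is mutually exclusive with the FASTQ columns.
        if sra and (fq1 or fq2):
            problems.append(f"line {n}: sra cannot be combined with fastq_1/fastq_2")
        elif not sra and not (fq1 and fq2):
            problems.append(f"line {n}: need fastq_1 and fastq_2, or sra")
        # File paths must be absolute.
        for col, value in (("fastq_1", fq1), ("fastq_2", fq2)):
            if value and not PurePosixPath(value).is_absolute():
                problems.append(f"line {n}: {col} is not an absolute path")
    return problems
```

An empty list means no problems were detected under these assumptions.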
--refs
Path to a reference set
- Options: Path to a tar.gz or .csv file containing reference info.
- Default: ${baseDir}/assets/reference_sets/EPITOME_*.tar.gz
Learn more about reference sets here.
--max_reads
The maximum number of reads to include in the analysis.
- Options: 0...Inf
- Default: 2000000
Samples with more than this number of reads will be randomly down-sampled using seqtk sample. Read counts are based on the sum of the forward and reverse reads.
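To illustrate how the threshold relates to paired read counts, here is a small sketch that computes the fraction of reads that would need to be kept. The `downsample_fraction` helper is hypothetical, and expressing the result as a fraction is an assumption; this page does not state how the pipeline actually invokes seqtk sample.

```python
def downsample_fraction(n_forward: int, n_reverse: int,
                        max_reads: int = 2_000_000) -> float:
    """Fraction of reads to keep so the combined total stays within max_reads."""
    total = n_forward + n_reverse  # counts are summed across R1 and R2
    if total <= max_reads:
        return 1.0  # at or under the limit: no down-sampling
    return max_reads / total
```

For example, a sample with 1.5 million read pairs (3 million reads in total) exceeds the default limit, so about two thirds of its reads would be retained.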
Classification Options
--ref_mode
Reference selection mode
- Options: accurate, fast, kitchen-sink, and none.
- Default: accurate
Learn more about reference selection modes here.
--ref_genfrac
Minimum genome fraction used for reference selection.
- Options: 0...1
- Default: 0.1
Consensus assemblies will only be generated if the proportion of the reference detected in the sample exceeds this value. This differs from the genome fraction threshold used for the final quality assessment (--qc_genfrac). This value should generally be lower than the QC threshold, to account for gaps in the de novo assembly.
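The check described above can be sketched as follows. The `passes_ref_selection` helper and its inputs (covered bases and reference length) are assumptions made for illustration; how the pipeline measures the detected proportion is not described here.

```python
def passes_ref_selection(covered_bases: int, ref_length: int,
                         ref_genfrac: float = 0.1) -> bool:
    """True if the detected proportion of the reference exceeds ref_genfrac."""
    if ref_length <= 0:
        raise ValueError("ref_length must be positive")
    # "Exceeds" is read as a strict comparison.
    return covered_bases / ref_length > ref_genfrac
```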
--ref_covplot
Create reference coverage plots.
- Options: true or false
- Default: false
Only applies when --ref_mode is accurate.
--ref_denovo_assembler
De novo assembler to use for accurate reference selection.
- Options: spades, megahit, velvet, and skesa.
- Default: megahit
Only applies to accurate reference selection mode.
--ref_denovo_contigcov
Minimum depth of coverage for a contig to be retained in the de novo assembly.
- Options: 0...Inf
- Default: 10
Only applies to accurate reference selection mode.
--ref_denovo_contiglen
Minimum length for a contig to be retained in the de novo assembly.
- Options: 0...Inf
- Default: 300
Only applies to accurate reference selection mode.
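Taken together, --ref_denovo_contigcov and --ref_denovo_contiglen act as a filter on the de novo contigs. A minimal sketch of that filter is shown below; the `Contig` type and `filter_contigs` helper are hypothetical, and treating both thresholds as inclusive minimums is an assumption.

```python
from typing import NamedTuple


class Contig(NamedTuple):
    name: str
    length: int      # contig length in bases
    coverage: float  # mean depth of coverage


def filter_contigs(contigs, min_cov: float = 10, min_len: int = 300):
    """Keep contigs that meet both the coverage and length minimums."""
    return [c for c in contigs
            if c.coverage >= min_cov and c.length >= min_len]
```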
Assembly Options
--cons_assembler
Reference-based assembler
- Options: ivar or irma
- Default: ivar
--cons_assembly_type
Method used for creating the reference-based assembly.
- Options: plurality, consensus, or padded
- Default: consensus
This option only applies to IRMA (ignored by iVar).
--cons_assembly_elong
Enables IRMA’s optional elongation feature to extend assembled contigs.
- Options: true or false
- Default: false
Only applies when using --cons_assembler irma.
--cons_allele_qual
Minimum allele quality when making the reference-based assembly.
- Options: 0...Inf
- Default: 20
--cons_allele_ratio
Minimum allele support when making the reference-based assembly.
- Options: 0...1
- Default: 0.6
--cons_allele_depth
Minimum allele depth when making the reference-based assembly.
- Options: 0...Inf
- Default: 10
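To give an intuition for how the three allele thresholds interact, the sketch below calls a single consensus base from a pile of (base, quality) observations. This is a deliberately simplified illustration and not the algorithm used by iVar or IRMA: the `call_base` helper is hypothetical, depth is counted after quality filtering, and failing positions are reported as N. All of these are assumptions.

```python
from collections import Counter


def call_base(observations, min_qual: int = 20,
              min_ratio: float = 0.6, min_depth: int = 10) -> str:
    """Call one consensus base from (base, phred_quality) pairs."""
    # Discard observations below the minimum quality.
    kept = [base for base, qual in observations if qual >= min_qual]
    # Too few usable observations: no confident call.
    if len(kept) < min_depth:
        return "N"
    base, count = Counter(kept).most_common(1)[0]
    # The most common allele must reach the minimum support ratio.
    return base if count / len(kept) >= min_ratio else "N"
```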
--cons_drop_lowcov
Treat alleles with coverage less than the minimum depth specified by --cons_allele_depth as deletions.
- Options: true or false
- Default: false
Only applies when using --cons_assembler ivar.
--cons_condist
Average nucleotide difference used to condense duplicate assemblies (1 - (% ANI / 100)).
- Options: 0...1
- Default: 0.02
Use --cons_condist 0 to return all duplicated assemblies.
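The conversion from percent ANI to the value expected by --cons_condist follows directly from the formula above. The `ani_to_condist` helper is a hypothetical name used only for this sketch.

```python
def ani_to_condist(ani_percent: float) -> float:
    """Convert percent ANI to a nucleotide difference: 1 - (ANI / 100)."""
    if not 0 <= ani_percent <= 100:
        raise ValueError("ani_percent must be between 0 and 100")
    return 1 - ani_percent / 100
```

Under this formula the default of 0.02 corresponds to 98% ANI, and 100% ANI gives a distance of 0. Floating-point results may differ from the exact decimal by a tiny rounding error, so round before passing the value on the command line.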
Quality Control
--qc_depth
Minimum average depth of coverage used for quality assessment of reference-based assemblies.
- Options: 0...Inf
- Default: 30
--qc_genfrac
Minimum genome fraction used for quality assessment of reference-based assemblies.
- Options: 0...1
- Default: 0.8
This is separate from the minimum genome fraction used for reference selection (i.e., --ref_genfrac) and applies only to the final quality assessment step.
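As a rough illustration of the two quality control thresholds, the sketch below evaluates an assembly against both. The `passes_qc` helper is hypothetical; requiring both criteria and treating each as an inclusive minimum are assumptions, since this page does not spell out how the pipeline combines them.

```python
def passes_qc(mean_depth: float, genome_fraction: float,
              qc_depth: float = 30, qc_genfrac: float = 0.8) -> bool:
    """True if an assembly meets both the depth and genome fraction minimums."""
    return mean_depth >= qc_depth and genome_fraction >= qc_genfrac
```

For instance, an assembly with a genome fraction of 0.5 would clear the default reference selection threshold (0.1) yet fail this check, regardless of its depth.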