Tutorial
This tutorial will guide you through making a pipeline that trim and then align fastq files. You will learn how to:
- Create a new hydra-genetics pipeline
- Add hydra-genetics modules
- Create input files based on the test data
- Configure the pipeline
- Run the pipeline
- Make a rulegraph of the pipeline
- Do code testing
- Add your own rule
Pre-requirements:
- Python => 3.8 (with pip and venv)
- Singularity >= 3.8.6
- graphviz
Download test data
Download fastq and reference files from google drive
Setup environment
python3 -m venv hackaton_venv
source hackaton_venv/bin/activate
pip install hydra-genetics==1.0.0
Create pipeline
Create skeleton pipeline
hydra-genetics create-module \
--name simple_pipeline \
--description "A simple pipeline" \
--author "Patrik Smeds" \
--email patrik.smeds@scilifelab.uu.se \
--git-user smeds
cd simple_pipeline
Look through the generated files
Add hydra-genetics modules
Add the prealignment module to workflow/Snakefile (use tag=”v1.1.0”). See instructions in the module README.
Add the alignment module to workflow/Snakefile (use tag=”v0.4.0”). See instructions in the module README.
Create input files
hydra-genetics create-input-files \
-d path/to/fastq/ \
--every 2 \
--nreads 10
The above command will not find anything due to how the files are named, try to modify --read-number-regex to match the file names.
Check out the input files created: samples.tsv and units.tsv.
Config pipeline
The pipeline is supposed to trim the fastq files and then output merged and sorted bamfiles.
Look at the schemas, example configs and documentation for prealignment and alignment to find out what is required to be added in the config (config/config.yaml).
prealignment/workflow/schemas/config.schemas.yamlalignment/workflow/schemas/config.schemas.yamlprealignment/config/config.yamlalignment/config/config.yaml
Run pipeline
Install required programs (snakemake, …):
pip install -r requirements.txt
Make sure workflow can execute using dry-run. Here we specify what output file we expect in the command line.
snakemake -s workflow/Snakefile \
--use-singularity \
-n \
--configfile config/config.yaml \
--until alignment/samtools_merge_bam/HD827sonic-testing1_T.bam
Run pipeline
snakemake -s workflow/Snakefile \
--use-singularity \
-c1 \
--configfile config/config.yaml \
--until alignment/samtools_merge_bam/HD827sonic-testing1_T.bam
Modify workflow/rules/common.smk so that you don’t have to include “--until alignment/samtools_merge_bam/HD827sonic-testing1_T.bam” in your shell command and also so that all samples in samples.tsv are run without hard coding the file path. Run again.
snakemake -s workflow/Snakefile \
--use-singularity \
-c1 \
--configfile config/config.yaml
Make a rulegraph
Make a rulegraph of your pipeline, look at the figure and enjoy your success!
snakemake -s workflow/Snakefile --configfile config/config.yaml --rulegraph | dot -Tsvg > images/rulegraph.svg
Code testing
Before making a pull-request to a hydra-genetics module or pipeline it is recommended to run a number of tests locally.
Install test programs:
pip install -r requirements.test.txt
Check syntax of snakemake rules
snakefmt --compact-diff -l 130 workflow/
Check syntax of python scripts
pycodestyle --max-line-length=130 --statistics workflow/scripts/
Run pytest for scripts with implemented tests
python -m pytest workflow/scripts/test_dummy.py
Run linting of the pipeline
snakemake --lint -s workflow/Snakefile --configfile config/config.yaml
Add a rule
In this step we will add a new rule to the pipeline. The new rule should use a program of your choice from picard.
Create new rule template
hydra-genetics create-rule \
--module simple_pipeline \
--tool picard \
--command program_name \
--author "Patrik Smeds" \
--email patrik.smeds@scilifelab.uu.se
Modify rule
Modify ´simple_pipeline/workflow/rules/picard.smk´ so that the rule does what you what it to do.
Update pipeline
Update the pipeline to use the new rule.
The following files need to be modified:
simple_pipeline/workflow/rules/common.smk#Add new outputsimple_pipeline/workflow/Snakefile#Check that the new rule is importedsimple_pipeline/config/config.yaml#Check that the new rule with container and other options are added. Add more stuff if needed
Run pipeline
Run the pipeline to generate the new output files.
snakemake -s workflow/Snakefile \
--use-singularity \
-c1 \
--configfile config/config.yaml
Add documentation
When using the hydra-genetics create-pipeline and create-rule readthedocs documentation is already prepared for you. All you need to do is update the schemas. Follow the instruction to view a local copy of your corrent documentation and then update it.
Install local mkdocs server and plugins
pip install -r docs/requirements.txt
Start server
mkdocs serve
View documentation in browser
Update schemas
Update the descriptions in the schemas and the software documetantion page will update with this new information. (Might need restart of server). Look at docs/softwares.md to see the code that generates the documentation.
Schemas:
simple_pipeline/workflow/schemas/rule.schema.yaml#Description of the rule input and outputsimple_pipeline/workflow/schemas/config.schema.yaml#Description of the configurations (params, container, ...)simple_pipeline/workflow/schemas/resources.schema.yaml#Description of the computer resources (modify if extra resources are needed)