Usage
The querynator is used from the command line. Use the help function to display all options and choices.
querynator --help
querynator query-api-cgi --help
querynator query-api-civic --help
VCF Normalization
We recommend to normalize the input vcf’s using bcftools norm before running the querynator to unify the input:
The vcf file must be indexed to run bcftools norm, e.g. using tabix:
tabix /path/to/vcf_file.vcf
Then run bcftools norm in the following way:
bcftools norm \
-a \
--atom-overlaps . \
-f /path/to/reference_file \
/path/to/vcf
Query the cancergenomeinterpeter - CGI
A typical command to query the cancergenomeinterpeter - CGI:
querynator query-api-cgi \
-m mutations.[vcf,tsv,gtf] \
-o sample_name \
-g hg38 \
-c 'Any cancer type' \
--email your-cgi-account-mail@whatever.com \
--token your-cgi-token
The command above downloads the following files from CGI. For further information please refer to their FAQ.
sample_name
├── sample_name.cgi_results
| ├── drug_prescription.tsv
| ├── input01.tsv
| ├── metadata.txt
| └── mutation_analysis.tsv
Note
The input variants for the CGI query must be sorted based on their coordinates.
Note
Too many requests in too short amount of time can result in the email address used being blocked.
Mutation, CNA & translocation analysis
If you run the command with all possible input files, you will obtain:
# run command
querynator query-api-cgi \
--mutations mutations.[vcf,tsv,gtf] \
--cnas cnas.tsv \
--translocations translocations.tsv \
-o sample_name \
-g hg38 \
-c 'Any cancer type' \
--email your-cgi-account-mail@whatever.com \
--token your-cgi-token
# output
sample_name
├── sample_name.cgi_results
| ├── cna_analysis.tsv
| ├── drug_prescription.tsv
| ├── fusion_analysis.tsv
| ├── input01.tsv
| ├── input02.tsv
| ├── input03.tsv
| ├── metadata.txt
| └── mutation_analysis.tsv
└── sample_name.cgi_results.zip
Using the filter_vep flag, the querynator can filter out benign variants in vcf files before querying the knowledgebase (KB).
Input file formats
For detailed information please refer to CGI formats.
The genomic tabular format gtf is displayed below and contains partly the same columns as a vcf file (>v. 4.0) and is tab-separated.
The sample column is not mandatory, but recommended when more than one sample is contained in one file.
A mutations/variant file can have the extensions vcf, vcf.gz, tsv or gtf. The column names can also be uppercase letters as in a vcf.
sample |
#chrom/chr |
pos |
ref |
alt |
|---|---|---|---|---|
test1 |
chr4 |
121369475 |
A |
T |
test2 |
chr10 |
122630837 |
C |
G |
A copy number alterations file should be tsv and column names must be lowercase.
sample |
gene |
cna |
|---|---|---|
test1 |
ERBB2 |
amp |
test2 |
TP53 |
del |
A translocation file should be tsv and column names must be lowercase.
sample |
fus |
|---|---|
test1 |
BCR__ABL1 |
test2 |
PML__RARA |
Genome build versions
Note
The cancergenomeinterpeter will perform a liftover of the genomic coordinates to hg38 if the parameter --genome hg19 is used.
Query the Clinical Interpretations of Variants in Cancer - CIViC
A typical command to query the Clinical Interpretations of Variants in Cancer - CIViC:
querynator query-api-civic \
-v input_file.vcf \
-o outdir \
-g ref_genome [GRCh37, GRCh38, NCBI36]
The command above generates the following result files using CIViCpy.
sample_name
├── sample_name.civic_results.tsv
└── metadata.txt
The querynator performs an exact search, meaning that variants in the KB must match the given coordinates, reference allele(s) and alternate allele(s) precisely.
Using the filter_vep flag, the querynator can filter out benign variants in vcf files before querying the KB.
Input file format
The querynator requires a vcf file (>v. 4.0) in uncompressed or in bgzipped format vcf.gz to query CIViC.
It is recommended (although not required) to provide an index-file (vcf.gz.tbi) with the input vcf file, e.g. using tabix.
The index file must be stored in the same directory as the vcf file.
Filtering benign variants
Variants that are classified as low Impact and synonymous variants will be filtered out based on their Ensembl VEP
annotation if the additional flag filter_vep is set.
The filtering step can be applied before querying both KBs.
Currently filtering can only be applied on VEP annotated vcf files. In order to filter the file,
the querynator expects a vcf that was annotated using VEP’s standard key (CSQ).
To filter, the following fields are required in the VEP info column:
Consequence
IMPACT
If filter_vep is set, the filtered and removed variants are given out as results in the vcf_files directory.
A typical command for a CIViC query:
querynator query-api-civic \
-v input_file.vcf,tsv,gtf \
-o outdir \
-g ref_genome [GRCh37, GRCh38, NCBI36] \
--filter_vep
The command above generates the following result files using CIViCpy.
sample_name
├── vcf_files
| ├── sample_name.filtered_variants.vcf
| ├── sample_name.removed_variants.vcf
├── sample_name.civic_results.tsv
└── metadata.txt
Note
When the filter_vep flag is set a unique Querynator ID is added to the INFO column of each variant in the vcf file.
The same ID is added to the sample_name.civic_results.tsv if CIViC is queried.
Create an HTML Report
After querying the knowledgebases included in the querynator, it is possible to combine the results into one table and to create an HTML report summarizing the most important features of each variant.
Note
This functionality was specifically created to be included into the variantMTB nextflow pipeline
which can be found here. (Currently under development)
A typical command to create such a report:
querynator create-report \
--cgi_path path/to/cgi_results \
--civic_path path/to/civic_results \
--outdir path/to/save/results
The command above generates the following result directory:
outdir
├── combined_files
| ├── alterations_vep.tsv
| ├── biomarkers_linked.tsv
| ├── civic_cgi_vep.tsv
| └── civic_vep.tsv
├── report
| | ├── overall_report.html
| | ├── variant_reports
| | | ├── chr1-1234-A-T.html
| | | ├── chr1-1245-C-G.html
| | | └── ...
| | ├── plots
| | | ├── kb_upsetplot.png
└── └── └── └── tier_upsetplot.png
The command creates one overall report which includes some statistics and shows an overview of the most important variants in the project.
The Details column in the overall report links directly to a more detailed report on the variant in question.