Download File Name | Available at | Size | MD5 |
---|---|---|---|
58.pubsnpav.vcf.gz | CarrotOmics | 304.35MB | cf4b4a193fc19cb495dc0f238891460e |
58.pubsnpav.vcf.gz.tbi | CarrotOmics | 216.23KB | ef5000c025b0936f021d2af7d1584be9 |
carrot_79200 | ftp.ncbi.nih.gov | | |
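The MD5 values in the table above can be used to check that a download completed intact. The following is a minimal sketch using only the Python standard library; it assumes the files were saved under their listed names in the current directory, and the function names are illustrative rather than part of any CarrotOmics tooling.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the download table above.
EXPECTED_MD5 = {
    "58.pubsnpav.vcf.gz": "cf4b4a193fc19cb495dc0f238891460e",
    "58.pubsnpav.vcf.gz.tbi": "ef5000c025b0936f021d2af7d1584be9",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True if the file's MD5 matches the value listed for its name."""
    path = Path(path)
    return md5_of_file(path) == EXPECTED_MD5[path.name]


if __name__ == "__main__":
    # Assumption: the files sit in the current working directory.
    for name in EXPECTED_MD5:
        if Path(name).exists():
            print(name, "OK" if verify_download(name) else "MD5 MISMATCH")
        else:
            print(name, "not found in the current directory")
```

Reading in chunks keeps memory use small even for the roughly 300 MB VCF file.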
Collection of 1,393,425 SNP variants from the genome publication. This is the VCF file used for JBrowse. See the linked analysis below for further details.
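As a quick sanity check on the variant count stated above, the data lines of the downloaded VCF can be counted. This is a sketch under the assumption that each SNP occupies exactly one record in the file; if that holds, the count should come to 1,393,425. The function name is illustrative.

```python
import gzip


def count_vcf_records(path):
    """Count data lines (non-header, non-blank) in a gzip- or bgzip-compressed VCF."""
    count = 0
    # Python's gzip module reads multi-member files, which covers bgzip output.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                count += 1
    return count


if __name__ == "__main__":
    # Assumption: the file was downloaded to the current directory.
    print(count_vcf_records("58.pubsnpav.vcf.gz"))
```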
Relationships |
---|
The FASTA file, Carrot Genome Assembly DCARv2 Sequence Original Naming Scheme, is the reference genome for the VCF file, DCARv2 Genome Paper Variants 2016 JBrowse VCF. |
Analysis |
---|
Name | Description |
---|---|
We used BWA-MEM version 0.7.10 to map the resequencing reads from all carrot genotypes to the carrot reference genome with the parameters -a -M -t 42. Alignments were filtered with SAMtools version 0.1.19 to keep only primary alignments with a mapping quality of at least 30, i.e. parameters -q 30 -F 256. Duplicate reads were marked using MarkDuplicates from Picard tools version 1.119 (https://broadinstitute.github.io/picard/). GATK version 3.3-0 was used to identify SNP variants for each genotype, following the GATK best practices workflow with RealignerTargetCreator, IndelRealigner, HaplotypeCaller, and GenotypeGVCFs. SelectVariants was then used to separate SNPs, indels, and other variants. Reads used to construct the doubled haploid reference genome were also analyzed as a control, and variants also present in this control were filtered out with a custom Perl program. Variants were then filtered using VCFTools v0.1.12a with parameters --maf 0.1, --min-meanDP 5, and --max-missing 1. From the 39,695,937 SNP variants detected with GATK, filtering yielded 1,393,425 SNPs. These variants were submitted to dbSNP, but that database has since limited its coverage to human variants only, so the submitted files are now available only in archived form at ftp://ftp.ncbi.nih.gov/snp/organisms/archive/carrot_79200/. Post-publication, the variant file was further annotated with ANNOVAR, which assigns each variant to one of the following categories, or in some cases a combination of them: intergenic, upstream, downstream, splicing, intronic, exonic:synonymous SNV, exonic:nonsynonymous SNV, exonic:stopgain, and exonic:stoploss. This file can be downloaded from the link below. Data from this analysis can be viewed in JBrowse. |
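The two filtering steps described above can be illustrated in plain Python. This is a simplified sketch of what the stated parameters mean, not the SAMtools or VCFtools implementation: `keep_alignment` mirrors -q 30 -F 256, and `keep_site` mirrors --maf 0.1, --min-meanDP 5, and --max-missing 1 for biallelic sites only. The function names and the input representation (tuples of allele indices, `None` for a missing genotype) are assumptions made for illustration.

```python
SECONDARY_FLAG = 0x100  # SAM flag bit 256: not a primary alignment


def keep_alignment(flag, mapq, min_mapq=30, exclude_flags=SECONDARY_FLAG):
    """Keep a read if MAPQ >= min_mapq (-q) and no excluded flag bit is set (-F)."""
    return mapq >= min_mapq and (flag & exclude_flags) == 0


def keep_site(genotypes, depths, min_maf=0.1, min_mean_dp=5.0, max_missing=1.0):
    """Apply simplified --maf, --min-meanDP, and --max-missing checks to one site.

    genotypes: per-sample tuples of allele indices (0 = ref, 1 = alt), or None if missing.
    depths: per-sample read depths.
    """
    called = [g for g in genotypes if g is not None]
    if not called or not depths:
        return False
    # --max-missing 1: the fraction of called genotypes must reach the threshold,
    # so a value of 1 allows no missing data at all.
    if len(called) / len(genotypes) < max_missing:
        return False
    # --min-meanDP 5: mean depth across samples must be at least 5.
    if sum(depths) / len(depths) < min_mean_dp:
        return False
    # --maf 0.1: minor allele frequency among called alleles must be at least 0.1.
    alleles = [allele for genotype in called for allele in genotype]
    alt_freq = sum(1 for allele in alleles if allele != 0) / len(alleles)
    return min(alt_freq, 1 - alt_freq) >= min_maf


if __name__ == "__main__":
    print(keep_alignment(flag=0, mapq=42))                    # primary, high MAPQ
    print(keep_site([(0, 1), (0, 0), (1, 1)], [6, 5, 7]))     # complete, well covered
    print(f"{1_393_425 / 39_695_937:.2%} of raw SNPs retained")
```

With the counts reported above, roughly 3.5% of the raw SNP calls survive filtering.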
Name | Attribution 4.0 International (CC BY 4.0) |
---|---|
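License Summary | You are free to share (copy and redistribute the material in any medium or format) and adapt (remix, transform, and build upon the material) for any purpose, even commercially. The licensor cannot revoke these freedoms as long as you follow the license terms, which require that you give appropriate credit. |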
Full Legal Text | https://creativecommons.org/licenses/by/4.0/legalcode |