Improved Hybrid de novo Genome Assembly, Gene Prediction and Annotation of Carrot

Resource Type:

Project

Name:

Short Description:

The goals of the project are to improve the genome assembly of cultivated carrot and to improve gene predictions using the improved assembly.

Publication:

Coe K, Bostan H, Rolling W, Turner-Hissong S, Macko-Podgórni A, Senalik D, Liu S, Seth R, Curaba J, Mengist MF, Grzebelus D, Van Deynze A, Dawson J, Ellison S, Simon P, Iorizzo M. Population genomics identifies genetic signatures of carrot domestication and improvement and uncovers the origin of high-carotenoid orange carrots. Nature Plants. 2023 Oct; 9(10):1643-1658.
Liu S. Improved Hybrid De Novo Genome Assembly, Resistance Gene Prediction and Annotation of Carrot (Daucus carota). M.Sc. Thesis. 2020. North Carolina State University.
Iorizzo M et al. Improved Hybrid de novo Genome Assembly, Gene Prediction and Annotation of Carrot (Daucus carota). Jan. 11, 2020. XXVII Plant and Animal Genome Abstract W046.

Relationship:

The project, Improved Hybrid de novo Genome Assembly, Gene Prediction and Annotation of Carrot, is a subproject of project, Carrot Genome Sequencing Project.

Analysis:


Name	Description
Carrot Genome Assembly DCARv3 Gaps	This analysis represents the JBrowse annotation of gaps in the assembled carrot genome sequence "Carrot Genome Assembly DH1 v3.0". The JBrowse track shows all gaps in the genome assembly; each gap is a run of 100 'N's representing a gap of unknown size. The entire genome assembly contains 554 gaps.
Carrot Genome Assembly DH1 v3.0	Pacific Biosciences, Oxford Nanopore, Illumina paired-end (PE) and Hi-C sequencing data were used to develop an improved genome assembly and annotation of the doubled haploid orange Nantes-type carrot DH1 (NCBI BioSample SAMN03216637). The new DH1 v3 assembly spans 440.7 Mb assembled into nine chromosomes, representing ~93% of the estimated nuclear genome size (473 Mb). This genome is available in the CarrotOmics Blast Search.
Carrot Genome Assembly DH1 v3.0 Regulatory Gene Annotation	PlantTFcat, a reference plant transcription factor and transcriptional regulator categorization tool, was used to predict transcription factors and regulatory genes among the v3 gene models and, for comparison, among the DCARv2 genes. The data from this analysis are presented in Supplementary Table S17 of the carrot genome publication.
Carrot Genome Assembly DH1 v3.0 Repetitive DNA Annotation	De novo identification of carrot repetitive DNA was carried out with RepeatModeler v.2.0.1. The consensus sequences were annotated using a curated database of carrot LTR retrotransposons (LTR-RTs), Helitrons and MITEs, carrot satellite repeats, dicot plant repeats from RepBase (v.23.05) and DANTE. Masking was performed using RepeatMasker.
Carrot Genome Assembly DH1 v3.0 Resistance Gene Annotation	PRGdb 3.0, a comprehensive platform for prediction and analysis of plant disease resistance genes, was used to predict disease resistance genes among the v3 gene models and, for comparison, among the v2 gene models. The data from this analysis are presented in Supplementary Table S19 of the carrot genome publication.
Carrot Genome Assembly DH1 v3.0 Tandem Repeats annotation by trf	Tandem repeats were detected with Tandem Repeats Finder (TRF) version 4.07b.
Daucus carota genome V.3 SSR Detection with MISA	SSRs were detected in the Daucus carota V.3 assembly using MISA, and, where possible, primers were designed with Primer3. MISA detection parameters (repeat-unit size-minimum number of repeats): 2-6, 3-4, 4-3, 5-3, 6-3, 7-3, 8-3; maximum interruption between two SSRs in a compound SSR: 10. This analysis has not been published. Results of the microsatellite search: total number of sequences examined, 11; total size of examined sequences, 441,119,442 bp; total number of identified SSRs, 148,264; number of SSR-containing sequences, 11; number of sequences containing more than 1 SSR, 11; number of SSRs present in compound formation, 12,941. Distribution by repeat-unit size (number of SSRs): 2 bp, 52,385; 3 bp, 30,622; 4 bp, 38,281; 5 bp, 16,265; 6 bp, 7,217; 7 bp, 2,622; 8 bp, 872. Designed primer pairs: 128,207.
DCAR Gene annotation V1.0 locations on Carrot Genome Assembly DH1 V3.0	Gene predictions from the Carrot Genome Assembly DCARv2 were transferred to the Carrot Genome Assembly DCARv3 using GMAP. Organellar annotations were transferred separately using CrossMap with manual corrections for trans-spliced genes.
DCAR V3.0 Gene Prediction	A multi-step approach was used to produce the most comprehensive gene model catalog possible for the carrot genome v3. MAKER and GeMoMa were used to perform gene prediction by integrating de novo and evidence-based predictions. For MAKER, carrot ESTs, DH1 Illumina and IsoSeq transcriptome sequences, gene models from five closely related or model species, and proteins from UniProtKB/Swiss-Prot were used as evidence. AUGUSTUS v2.5.5 and SNAP were used for de novo prediction. Through this analysis, MAKER predicted 28,721 gene models. Next, GeMoMa was used to improve the accuracy of the splice junction sites predicted by MAKER and to predict gene models that MAKER had missed. The inputs to GeMoMa were: predicted genes from the five related or model species used for MAKER prediction; the final gene models produced by the MAKER pipeline; and splice sites mined from mapping the DH1 Illumina transcriptome data to DH1 v3. This analysis produced an intermediate set of 32,625 gene models. A final step was performed to refine all gene models and predict any missing ones. In this step, gene models predicted on the DH1 v2 assembly, namely DCARv2 (32,112 models) and RefSeq (44,484 models), were transferred or re-predicted on the DH1 v3 genome assembly using GMAP and GenomeThreader. DCARv2 or RefSeq gene models that were not predicted by MAKER+GeMoMa, had experimental evidence and were not masked were considered new gene models. Where the structures of the RefSeq and DCARv2 gene models disagreed, the correct structure was determined by manual inspection of the experimental evidence. Finally, high-quality IsoSeq transcripts were mapped to the DH1 v3 assembly using GMAP and GenomeThreader, and transcripts that mapped with an appropriate gene structure and had not been predicted in the previous steps were added to the gene model catalog.
NCBI Daucus carota subsp. sativus Annotation Release GCF_001625215.2-RS_2024_03	The genome sequence records for Daucus carota subsp. sativus RefSeq assembly GCF_001625215.2 (DH1 v3.0) were annotated by the NCBI Eukaryotic Genome Annotation Pipeline, an automated pipeline that annotates genes, transcripts and proteins on draft and finished genome assemblies. Further description is available from NCBI at https://www.ncbi.nlm.nih.gov/refseq/annotation_euk/Daucus_carota_subsp._sativus/GCF_001625215.2-RS_2024_03/
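The Carrot Genome Assembly DCARv3 Gaps analysis describes each gap as a run of 100 'N's. A minimal Python sketch of how such a track could be derived from the assembly FASTA follows. It is not the script used to build the CarrotOmics track. It assumes an uncompressed FASTA file, treats any run of at least 100 N/n characters as one gap, and reports 1-based inclusive GFF3 coordinates. The function names and the `gap_scan` source value are hypothetical.

```python
import re
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (sequence_id, sequence) pairs; the ID is the first word of the header."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            seq_id, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def find_gaps(seq: str, min_len: int = 100) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, end) coordinates of N runs >= min_len."""
    pattern = re.compile(r"[Nn]{%d,}" % min_len)
    # m.start() is 0-based; m.end() is exclusive, which equals the 1-based inclusive end.
    return [(m.start() + 1, m.end()) for m in pattern.finditer(seq)]


def gaps_to_gff3(records: Iterable[tuple[str, str]], min_len: int = 100) -> list[str]:
    """Format every gap as a GFF3 'gap' feature line (source column is hypothetical)."""
    lines = ["##gff-version 3"]
    for seq_id, seq in records:
        for i, (start, end) in enumerate(find_gaps(seq, min_len), 1):
            lines.append(
                f"{seq_id}\tgap_scan\tgap\t{start}\t{end}\t.\t.\t.\tID={seq_id}_gap{i}"
            )
    return lines
```

Applied to the DH1 v3.0 FASTA, this would emit one `gap` line per qualifying N run. Whether that count matches the 554 gaps reported for the track depends on how the original track was generated.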
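In the MISA definition string used for the SSR analysis, each size-count pair gives a repeat-unit size and the minimum number of consecutive repeats. For example, 2-6 means a dinucleotide motif repeated at least six times. The interruption value of 10 is the largest number of bases allowed between two SSRs for them to be reported as one compound SSR. The Python sketch below illustrates that logic for perfect repeats only. It is not MISA's implementation. It skips mononucleotide repeats, which the definition string above does not include, and it resolves overlapping hits with a simple greedy rule, so its counts should not be expected to reproduce the MISA results exactly. All names are hypothetical.

```python
import re
from dataclasses import dataclass

# Thresholds from the MISA definition string "2-6 3-4 4-3 5-3 6-3 7-3 8-3":
# repeat-unit size -> minimum number of consecutive repeats.
MIN_REPEATS = {2: 6, 3: 4, 4: 3, 5: 3, 6: 3, 7: 3, 8: 3}
MAX_INTERRUPTION = 10  # max bases between two SSRs merged into one compound SSR


@dataclass
class SSR:
    motif: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def is_primitive(motif: str) -> bool:
    """True if the motif is not a repeat of a shorter unit ('ATAT' and 'AA' are not)."""
    k = len(motif)
    return not any(k % d == 0 and motif[:d] * (k // d) == motif for d in range(1, k))


def find_ssrs(seq: str) -> list[SSR]:
    """Find perfect SSRs meeting MIN_REPEATS, resolved to non-overlapping calls."""
    seq = seq.upper()
    hits = []
    for k, n in MIN_REPEATS.items():
        # A k-mer captured once, then repeated at least n - 1 more times.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            if is_primitive(m.group(1)):
                hits.append(SSR(m.group(1), m.start(), m.end()))
    # Greedy overlap resolution: earliest start first, longer hit wins ties.
    hits.sort(key=lambda s: (s.start, -(s.end - s.start)))
    kept: list[SSR] = []
    for h in hits:
        if not kept or h.start >= kept[-1].end:
            kept.append(h)
    return kept


def group_compounds(ssrs: list[SSR], max_gap: int = MAX_INTERRUPTION) -> list[list[SSR]]:
    """Group consecutive SSRs separated by at most max_gap bases."""
    groups: list[list[SSR]] = []
    for s in ssrs:
        if groups and s.start - groups[-1][-1].end <= max_gap:
            groups[-1].append(s)
        else:
            groups.append([s])
    return groups
```

Groups containing two or more SSRs are this sketch's analogue of compound SSRs. Summing their sizes gives a figure comparable in spirit to MISA's "SSRs present in compound formation".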
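The final merging rule in the DCAR V3.0 Gene Prediction analysis keeps a transferred DCARv2 or RefSeq model only if MAKER+GeMoMa did not predict it, it has experimental evidence, and it is not masked. The Python sketch below encodes that rule under stated assumptions. "Not predicted" is approximated as having no coordinate overlap with any MAKER+GeMoMa model on the same sequence and strand. The evidence and masking checks are assumed to have been computed upstream and stored as boolean flags. The class, field and function names are hypothetical, and this is not the pipeline's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneModel:
    model_id: str
    seqid: str
    strand: str
    start: int  # 1-based, inclusive (GFF3 convention)
    end: int    # 1-based, inclusive
    has_evidence: bool = False  # hypothetical flag: transcript/protein support
    masked: bool = False        # hypothetical flag: lies in repeat-masked sequence


def overlaps(a: GeneModel, b: GeneModel) -> bool:
    """Same sequence and strand, and the inclusive coordinate ranges intersect."""
    return (a.seqid == b.seqid and a.strand == b.strand
            and a.start <= b.end and b.start <= a.end)


def select_new_models(predicted: list[GeneModel],
                      transferred: list[GeneModel]) -> list[GeneModel]:
    """Keep transferred models that (1) overlap no MAKER+GeMoMa model,
    (2) have experimental evidence and (3) are not masked."""
    return [t for t in transferred
            if t.has_evidence and not t.masked
            and not any(overlaps(t, p) for p in predicted)]
```

The pairwise overlap test is quadratic. With tens of thousands of models per set, an interval index or a per-sequence sort-and-sweep would be the practical choice. The sketch also does not deduplicate a locus supplied by both DCARv2 and RefSeq.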

Biomaterial:


Name	Description
SAMN03216637	This BioSample is also known by its germplasm accession, DH1; please see that record for more details.

File:


File	Type
Carrot Genome Assembly DCARv3 Sequence Original Naming Scheme	FASTA format
DCARv2 V3.0 Gene Prediction GFF3	GFF3
Carrot ESTs for DH1 v3.0 genome assembly evaluation	FASTA format
DCARv3 Tandem Repeats Annotation	GFF3
DCARv3 SSR Detection with MISA	GFF3
DCARv3 Genome Assembly gaps	GFF3
DCARv3 Repetitive DNA Annotation	GFF3
DCARv3 Regulatory Gene Annotation	GFF3
DCARv3 Resistance Gene Annotation	GFF3
DCARv2 Gene annotation V1.0 locations on DCARv3 assembly	GFF3
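Most of the files listed above are GFF3. A quick way to check what a downloaded file contains is to tally its feature types from column 3. The Python sketch below is a minimal reader that follows the basic GFF3 rules: nine tab-separated columns, `#` comment and directive lines, `key=value` attributes separated by `;`, comma-separated multiple values, percent-encoding, and an optional trailing `##FASTA` section. It does not validate the file, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator
from urllib.parse import unquote


def parse_gff3(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per feature line; comments, directives and FASTA are skipped."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more features
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # this sketch silently skips malformed lines
        seqid, source, ftype, start, end, score, strand, phase, attrs = cols
        attributes = {}
        for pair in attrs.split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                # Split multi-valued attributes before decoding %2C escapes.
                attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
        yield {"seqid": unquote(seqid), "source": source, "type": ftype,
               "start": int(start), "end": int(end), "strand": strand,
               "attributes": attributes}


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Tally column 3 (feature type) across all feature lines."""
    return Counter(f["type"] for f in parse_gff3(lines))
```

For example, passing an open file handle for one of the downloaded GFF3 files to `count_feature_types` returns a `Counter` such as the number of `gap` features in the gaps file. The local file name you use is up to you.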

Cross Reference:


GitHub	dsenalik/Carrot_Genome_DH1_v3, mishaploid/carrot-demography
NCBI BioProject	285926, 798760, 865166, 865653