Overview
Genus: Musa
Species: acuminata
Common Name: musa
Abbreviation: M. acuminata

DH-Pahang is a doubled-haploid of the germplasm collection accession named Pahang (2n=22). Pahang was collected in Malaysia’s Pahang province in the late 1940s. It is a wild accession that belongs to subspecies Musa acuminata ssp. malaccensis, whose genetic signature is commonly found in dessert and cooking bananas.

DH stands for doubled haploid, which refers to the induced doubling of the chromosomes in a haploid cell (a cell with one set of chromosomes). In this case, the haploid cell was pollen. The doubled haploid (DH-Pahang) was produced through anther culture and spontaneous chromosome doubling.

The Musa acuminata reference genome sequence results from a collaboration between Genoscope and CIRAD (UMR AGAP), funded by ANR. The sequence was analyzed in collaboration with several teams within the Global Musa Genomics Consortium (GMGC) and was published as version 1 in:

D’Hont A, Denoeud F, Aury JM, Baurens FC, Carreel F, Garsmeur O, Noel B, Bocs S, Droc G, Rouard M, et al. (2012) The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature. doi:10.1038/nature11241.

The reference sequence was then improved using a combination of methods and datasets, leading to the release of version 2 of the assembly and gene annotation, which are described in the following publication:

Martin G, Baurens FC, Droc G, Rouard M, Cenci A, Kilian A, Hastie A, Dolezel J, Aury JM, Alberti A, Carreel F, D’Hont A (2016) Improvement of the banana “Musa acuminata” reference sequence using NGS data and semi-automated bioinformatic methods. BMC Genomics.

If you have any questions, please don’t hesitate to contact us.

Assembly statistics version 1

1. Sequencing Method

Genomic sequence was generated by a whole-genome shotgun approach combining Sanger sequencing (ABI 3730xl) and next-generation sequencers (Roche/454 GSFLX and Illumina GAIIx).

Technology          Number of reads   Number of bases   Coverage   Insert size (bp)
Sanger              2,049,457         1,537,092,750     2.94       10,000
Sanger (BAC ends)   90,542            67,839,000        0.13       110,000
Single Roche/454    27,495,411        8,952,303,336     17.12      NA
Illumina            553,276,222       26,557,258,656    50.78      500

Table 1. Raw sequencing data overview.
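
The coverage column in Table 1 can be reproduced by dividing the number of bases by the estimated genome size. The sketch below assumes the 523 Mb estimate quoted in the Genome Size section is the denominator; the page does not state this explicitly, but the rounded values agree with the table.

```python
# Assumption: coverage = sequenced bases / estimated genome size (523 Mb).
GENOME_SIZE_BP = 523_000_000

# Number of bases per library, copied from Table 1.
libraries = {
    "Sanger": 1_537_092_750,
    "Sanger (BAC ends)": 67_839_000,
    "Single Roche/454": 8_952_303_336,
    "Illumina (500 bp insert)": 26_557_258_656,
}


def coverage(bases, genome_size=GENOME_SIZE_BP):
    """Return the fold coverage contributed by one library."""
    return bases / genome_size


coverages = {name: round(coverage(bases), 2) for name, bases in libraries.items()}

for name, cov in coverages.items():
    print(f"{name}: {cov:.2f}x")
```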

2. Assembly Method

Sanger and 454 reads were assembled with Newbler. We obtained 29,251 contigs that were linked into 7,513 scaffolds. The contig N50 was 28.3 kb, and the scaffold N50 was 1.3 Mb.

Assembly                                          Number   Total length (Mb)   Percentage of assembly   N50 (kb)   Longest (kb)
Contigs     All                                   24,425   390.6               -                        43.1       477
Scaffolds   All                                   7,513    472.9               100%                     1,311      11,965
            Anchored on chromosomes               258      331.8               70%
            Anchored on chromosomes and oriented  68       221.0               47%

Table 2. Global statistics on the sequence of Musa acuminata (DH-Pahang)
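
For readers unfamiliar with the N50 statistic reported in Table 2, the following is a minimal sketch of the standard definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The lengths used here are made-up illustrative values, not data from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50() requires at least one sequence length")


# Hypothetical scaffold lengths (arbitrary units), for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_lengths))
```

With these example values the total is 300, and the two longest sequences (80 + 70) already reach half of it, so the function returns 70.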

3. Anchoring the assembly on a genetic map

  • Construction of a genetic map

180 individuals were derived from self-fertilization of Musa acuminata 'Pahang' and used to build the genetic map.

  • Anchoring the assembly to the linkage groups

652 markers were used for the anchoring (589 SSRs + 63 DArT).
70% of the assembly (332 Mb) was anchored to the 11 Musa chromosomes of the Pahang genetic map, including 91.8% of the predicted genes.

4. Genome Size / Loci

The cumulative scaffold size was 472.9 Mb, about 10% smaller than the estimated genome size of 523 Mb.
Gene models were predicted in the Musa genome, based on a combination of evidence integrated using the GAZE computational framework.
36,538 protein-coding loci have been predicted.

Chromosome     Sequence length (bp)   Protein-coding loci
chr1           27,573,629             2,834
chr2           22,054,697             2,327
chr3           30,470,407             3,253
chr4           30,051,516             3,367
chr5           29,377,369             2,974
chr6           34,899,179             3,700
chr7           28,617,404             2,766
chr8           35,439,739             3,454
chr9           34,148,863             3,105
chr10          33,665,772             3,157
chr11          25,514,024             2,678
chrUn_random   141,147,818            2,927
Total          472,960,417            36,542

Table 3. Genome Statistics.
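
As a consistency check on Table 3, the per-chromosome values can be summed and compared with the Total row, and the anchored fraction recomputed. This is only a sketch; it assumes that "anchored" means every sequence except chrUn_random.

```python
# (sequence length in bp, protein-coding loci) per row of Table 3, version 1.
chromosomes = {
    "chr1": (27_573_629, 2_834),
    "chr2": (22_054_697, 2_327),
    "chr3": (30_470_407, 3_253),
    "chr4": (30_051_516, 3_367),
    "chr5": (29_377_369, 2_974),
    "chr6": (34_899_179, 3_700),
    "chr7": (28_617_404, 2_766),
    "chr8": (35_439_739, 3_454),
    "chr9": (34_148_863, 3_105),
    "chr10": (33_665_772, 3_157),
    "chr11": (25_514_024, 2_678),
    "chrUn_random": (141_147_818, 2_927),
}

total_bp = sum(bp for bp, _ in chromosomes.values())
total_loci = sum(loci for _, loci in chromosomes.values())

# Assumption: anchored sequence is everything not in chrUn_random.
anchored_bp = total_bp - chromosomes["chrUn_random"][0]
anchored_pct = round(100 * anchored_bp / total_bp, 1)

print(f"total length: {total_bp:,} bp")
print(f"total loci:   {total_loci:,}")
print(f"anchored:     {anchored_bp:,} bp ({anchored_pct}%)")
```

The sums agree with the Total row, and the anchored share (about 331.8 Mb, roughly 70%) agrees with the anchoring figures given above.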

Assembly statistics version 2

1. Sequencing Method

Genomic sequence was generated by a whole-genome shotgun approach combining Sanger sequencing (ABI 3730xl) and next-generation sequencers (Roche/454 GSFLX and Illumina GAIIx).

Technology          Number of reads   Number of bases   Coverage   Insert size (bp)
Sanger              2,049,457         1,537,092,750     2.94       10,000
Sanger (BAC ends)   90,542            67,839,000        0.13       110,000
Single Roche/454    27,495,411        8,952,303,336     17.12      NA
Illumina            553,276,222       26,557,258,656    50.78      500
Illumina            259,808,062       21,242,380,431    40.62      5,000

Table 1. Raw sequencing data overview.

2. Assembly Method

Sanger and 454 reads were assembled with Newbler. We obtained 29,251 contigs that were linked into 7,513 scaffolds. The contig N50 was 28.3 kb, and the scaffold N50 was 1.3 Mb.

Assembly                                          Number   Total length (Mb)   Percentage of assembly   N50 (kb)   Longest (kb)
Contigs     All                                   24,425   390.6               -                        43.1       477
Scaffolds   All                                   1,532    450.8               100%                     3,000      16,368
            Anchored on chromosomes               258      331.8               89.5%
            Anchored on chromosomes and oriented  68       221.0               47%

Table 2. Global statistics on the sequence of Musa acuminata (DH-Pahang)

3. Anchoring the assembly on a genetic map

  • Construction of a genetic map

268 individuals were derived from self-fertilization of Musa acuminata 'Pahang' and used to build the genetic map.

  • Anchoring the assembly to the linkage groups

23,430 markers were used for the anchoring (609 SSRs + 75 DArT, 20,919 DArTseq).
89.5% of the assembly (397 Mb) was anchored to the 11 Musa chromosomes of the Pahang genetic map, including 91.8% of the predicted genes.

4. Genome Size / Loci

The cumulative scaffold size was 450.8 Mb, about 13.8% smaller than the estimated genome size of 523 Mb.
Gene models were predicted in the Musa genome, based on a combination of evidence integrated using the GAZE computational framework.
36,538 protein-coding loci have been predicted.
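
The "smaller than the estimated genome size" percentages quoted for both assembly versions follow from simple arithmetic. A minimal sketch, using only the cumulative scaffold sizes and the 523 Mb estimate given on this page:

```python
ESTIMATED_GENOME_MB = 523.0

# Cumulative scaffold size per assembly version, in Mb (from this page).
assemblies_mb = {"version 1": 472.9, "version 2": 450.8}


def missing_pct(assembled_mb, estimated_mb=ESTIMATED_GENOME_MB):
    """Percentage of the estimated genome not covered by the assembly."""
    return 100 * (estimated_mb - assembled_mb) / estimated_mb


shortfall = {name: round(missing_pct(mb), 1) for name, mb in assemblies_mb.items()}

for name, pct in shortfall.items():
    print(f"{name}: {pct}% smaller than the estimate")
```

This gives about 9.6% for version 1 (quoted as "about 10%") and 13.8% for version 2.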

Chromosome      Sequence length (bp)   Protein-coding loci
chr1            29,070,452             2,834
chr2            29,511,734             2,327
chr3            35,020,413             3,253
chr4            37,105,743             3,367
chr5            41,853,232             2,974
chr6            37,593,364             3,700
chr7            35,028,021             2,766
chr8            44,889,171             3,454
chr9            41,306,725             3,105
chr10           37,674,811             3,157
chr11           27,954,350             2,678
chrUn_random    46,622,217             2,927
mitochondrion   7,218,240              ??
Total           450,848,473            36,542

Table 3. Genome Statistics.
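
The version 2 sequence lengths in Table 3 can be checked against the Total row and against the anchoring figures in the same way. In the sketch below, the reading that the 89.5% figure excludes the mitochondrion from the denominator is an inference from the numbers, not something the page states.

```python
# Sequence lengths in bp from Table 3, version 2.
lengths = {
    "chr1": 29_070_452,
    "chr2": 29_511_734,
    "chr3": 35_020_413,
    "chr4": 37_105_743,
    "chr5": 41_853_232,
    "chr6": 37_593_364,
    "chr7": 35_028_021,
    "chr8": 44_889_171,
    "chr9": 41_306_725,
    "chr10": 37_674_811,
    "chr11": 27_954_350,
    "chrUn_random": 46_622_217,
    "mitochondrion": 7_218_240,
}

total_bp = sum(lengths.values())

# Assumption: anchored sequence is chr1 to chr11 only.
unanchored = {"chrUn_random", "mitochondrion"}
anchored_bp = sum(bp for name, bp in lengths.items() if name not in unanchored)

# Assumption: the nuclear assembly is the total minus the mitochondrion.
nuclear_bp = total_bp - lengths["mitochondrion"]

pct_of_total = round(100 * anchored_bp / total_bp, 1)
pct_of_nuclear = round(100 * anchored_bp / nuclear_bp, 1)

print(f"total:    {total_bp:,} bp")
print(f"anchored: {anchored_bp:,} bp")
print(f"anchored share of total assembly:   {pct_of_total}%")
print(f"anchored share of nuclear assembly: {pct_of_nuclear}%")
```

The eleven chromosomes sum to about 397 Mb, which is 89.5% of the assembly once the mitochondrial sequence is left out.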

Feature Browser
The following browser provides a quick view for new visitors. Use the searching mechanism to find specific features.
Feature Name     Unique Name      Type
Ma00_p00010.1    Ma00_p00010.1    polypeptide
Ma00_p00010.2    Ma00_p00010.2    polypeptide
Ma00_p00020.1    Ma00_p00020.1    polypeptide
Ma00_p00030.1    Ma00_p00030.1    polypeptide
Ma00_p00040.1    Ma00_p00040.1    polypeptide
Ma00_p00050.1    Ma00_p00050.1    polypeptide
Ma00_p00050.2    Ma00_p00050.2    polypeptide
Ma00_p00060.1    Ma00_p00060.1    polypeptide
Ma00_p00070.1    Ma00_p00070.1    polypeptide
Ma00_p00080.1    Ma00_p00080.1    polypeptide
Ma00_p00090.1    Ma00_p00090.1    polypeptide
Ma00_p00100.1    Ma00_p00100.1    polypeptide
Ma00_p00110.1    Ma00_p00110.1    polypeptide
Ma00_p00110.2    Ma00_p00110.2    polypeptide
Ma00_p00120.1    Ma00_p00120.1    polypeptide
Ma00_p00130.1    Ma00_p00130.1    polypeptide
Ma00_p00140.1    Ma00_p00140.1    polypeptide
Ma00_p00150.1    Ma00_p00150.1    polypeptide
Ma00_p00160.1    Ma00_p00160.1    polypeptide
Ma00_p00170.1    Ma00_p00170.1    polypeptide
Ma00_p00180.1    Ma00_p00180.1    polypeptide
Ma00_p00190.1    Ma00_p00190.1    polypeptide
Ma00_p00200.1    Ma00_p00200.1    polypeptide
Ma00_p00210.1    Ma00_p00210.1    polypeptide
Ma00_p00220.1    Ma00_p00220.1    polypeptide
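
Feature names such as Ma00_p00010.1 follow a regular pattern, so they are easy to split programmatically. The sketch below is a guess at that pattern based only on the names listed above: a two-digit group after "Ma", a one-letter feature code, a five-digit locus number, and a numeric suffix that appears to distinguish alternative forms of the same locus. The field names (`group`, `code`, `locus`, `isoform`) are hypothetical labels, not official terminology.

```python
import re
from collections import defaultdict

# Assumed pattern, inferred from the names in the feature table.
FEATURE_RE = re.compile(
    r"^Ma(?P<group>\d{2})_(?P<code>[a-z])(?P<locus>\d{5})\.(?P<isoform>\d+)$"
)


def parse_feature(name):
    """Split a feature name into its assumed components."""
    match = FEATURE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognised feature name: {name!r}")
    fields = match.groupdict()
    fields["isoform"] = int(fields["isoform"])
    return fields


def group_by_locus(names):
    """Map each locus prefix (name without suffix) to its suffix numbers."""
    grouped = defaultdict(list)
    for name in names:
        parsed = parse_feature(name)
        prefix = name.rsplit(".", 1)[0]
        grouped[prefix].append(parsed["isoform"])
    return dict(grouped)


names = ["Ma00_p00010.1", "Ma00_p00010.2", "Ma00_p00020.1"]
by_locus = group_by_locus(names)
print(by_locus)
```

Grouping this way shows at a glance which loci carry more than one entry, for example Ma00_p00010 in the listing above.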
