It has been necessary to change some of the ST numbers in the C. glabrata MLST scheme. This came about because some STs had been assigned, but not entered into the database, before the scheme was hosted on PubMLST. Some of these were subsequently included in the published literature.
Since this came to light, we have tried to minimise the impact and have retrospectively added the profiles that have been formally published. Where these had been assigned STs that have since been re-issued, we have had to change the ST on the later-assigned profiles.
The following timeline indicates the changes that have been made:
- 2016
- Transfer of alleles, 68 STs, and isolates from mlst.net.
- 2016
- STs 5 and 9 were not part of the data received, although they are mentioned in the literature. ST-5 and ST-9 have been blocked from future use.
- 2012-2016
- ST numbers issued by mlst.net within this timeframe were not documented and were not passed to pubmlst.org during the data migration process.
- 08/2018
- Novel alleles and sequence types from Lott et al. 2010 (PMID 20190071) and Lott et al. 2012 (PMID 21838617) were added to the sequence database.
- 08/2018
- ST-83 in Lott et al. 2010 (PMID 20190071) is identical to ST-75 in Lott et al. 2012 (PMID 21838617). ST-83 was given preference due to the earlier publication date; consequently there is no ST-75 in the database, and the number is blocked from future use.
- 09/2018
- Novel sequence types from Amanloo et al. 2018 (PMID 28482076) were added. Because the numbers used in this study partially overlap with those used in Lott et al. 2010 and Lott et al. 2012, ST-71 to ST-79 (designation in Amanloo et al. 2018) are included as ST-101 to ST-109 in the database. This is noted in the respective ST records.
- 10/2018
- A total of 368 non-redundant isolates from Lott et al. 2010 (PMID 20190071, n=229) and Lott et al. 2012 (PMID 21838617, n=265) were added to the isolates database. Isolates included in these two studies partially overlap.
- 11/2018
- A total of 50 isolates from Amanloo et al. 2018 (PMID 28482076) were added to the isolates database, with ST numbering as explained above.
- 11/2018
- Added novel alleles, STs, and isolates from Bordallo-Cardona et al. 2019 (PMID 30397068) upon publication.
- 11/2018
- Added novel alleles, STs, and isolates from Achmad et al. 2019 (PMID 30455247) in parallel to publication process.
- 12/2018
- Added novel alleles, STs, and isolate information from Biswas et al. (PMID 30559734), Mushi et al. (PMID 30597052), and Bordallo-Cardona et al. (PMID 30397068) in parallel to publication process.
- 01/2019
- Retrospectively added information from Sasso et al. (PMID 29580647). Novel FKS allele "X" was added to the database as FKS29; ST "X" is now ST166.
- 06/2019
- Retrospectively added alleles, STs, and isolates from Healey et al. 2016 (PMID 27020939).
- 07/2019
- Added NCBI_BioProject field to isolates table.
- 07/2019
- Added novel alleles, STs, and isolates from five whole-genome-sequencing studies: Xu et al. 2016 (PMID 27713500, BioP PRJNA218162), Håvelsrud and Gaustad 2017 (PMID 28280017, BioP PRJNA297263), Vale-Silva et al. 2017 (PMID 28663342, BioP PRJNA374542), Carrete et al. 2018 (PMID 29249661, BioP PRJNA361477), and Barber et al. 2019 (PMID 30478162, BioP PRJNA483064). Novel STs for isolates “Norway 5 and 6” from Håvelsrud and Gaustad 2017 (PMID 28280017) are now ST137; novel STs for isolates “P35_2” and “P35_3” from Carrete et al. 2018 (PMID 29249661) are now ST136.
- 07/2019
- Finalized isolate data from Biswas et al. 2018 (PMID 30559734) in parallel to publication process. Labels that were misplaced in the original Figure 1 have subsequently been corrected by the authors (PMID 31608038).
- 07/2019
- Retrospectively added novel alleles, STs, and isolates from Biswas et al. 2017 (PMID 28344162, BioP PRJNA310957). Isolates CMRL-06, -07, and -08 were omitted due to ambiguous sites in our mapping obtained from data deposited at SRA.
- 07/2019
- Added isolate data from Rivero-Menendez et al. 2019 (PMID 31285229) upon publication. Consecutive isolates are indicated by patient numbers.
- 08/2019
- Reconstructed ST5 (5-7-8-1-3-6) and ST9 (1-2-2-7-2-1) from original publication (PMID 14662965) and mended records for isolates CE-02 (ST8→ST9) and CE-03 (ST3→ST5).
- 09/2019
- Added novel alleles, STs, and isolates deduced from raw data deposited in SRA by Guo et al. 2019 (PMID 31059831, BioP PRJEB20459). Isolate Y1644 “from ATCC archive, isolated from Iowa, USA” was found to be ST10, and therefore presumed to be ATCC90030 (==database isolate 1).
- 09/2019
- Added 3 isolates from Carrete et al. 2019 (PMID 30809200; BioP PRJNA506893).
- 09/2019
- Added 3 isolates deposited in SRA from Porto (Portugal) under BioP PRJNA525402 (2019).
- 09/2019
- Added novel alleles, STs, and isolates of the CDC (USA) deduced from raw data deposited in SRA under BioP PRJNA329124 (2016) and PRJNA524686 (2019). The isolates were partially redundant, both with each other and with those already present in the database (Lott et al. 2010, 2012; PMID 20190071; PMID 21838617) and Healey et al. (2016; PMID 28018323). The following modifications were made to join the datasets:
- Three records were omitted: SRR8697269 (CAS11-3129), which displayed a frameshift in FKS2 due to an 8 bp insertion in our assembly, SRR8697391 (CAS08-0631), which did not have sufficient sequencing depth to determine the ST, and SRR8697473 (CAS08-0629), which had no matches to Cg MLST loci (isolate might be C. parapsilosis).
- One novel ST derived from BioP PRJNA329124 (ST169) and 11 derived from BioP PRJNA524686 (STs 179-189) were added.
- In BioP PRJNA329124, isolate names are given with underscores; these were replaced by dashes to allow matching with other datasets.
- For three isolates, duplicate SRA entries were found: CAS08-0209, CAS08-0439, and CAS11-2978. Since the deduced STs were identical, each pair was merged into a single record.
- Thirty-eight isolates were already present in the database by isolate name and could be traced back to the same original. Since the deduced STs were identical to those previously recorded, the SRA information was added to the pre-existing records.
- Six isolates (CAS08-0069, CAS08-0094, CAS08-0525, CAS08-0569, CAS08-0725, and CAS09-0869) were already present in the database by isolate name as above, but the genome sequencing-derived STs did not match those previously recorded. These datasets were introduced with the postfix "_GS" added to the isolate name to flag the versions with genome sequencing-derived STs.
- In total, 26 novel isolates from BioP PRJNA329124 and 219 from BioP PRJNA524686 were added.
- 12/2022
- Amended records for Biswas et al 2018 with PMID and NCBI_Bioproject numbers, and added missing isolates.
- 12/2022
- Analyzed unknown STs from Arastehfar et al. (PMID: 34909054)
- ST "X" in isolate DPL209 corresponds to ST215 (PMID: 28018323).
- ST "Y" corresponds to ST16 and is merely mislabelled in Table 1. The isolates are already contained in the database from older studies.
- 12/2022
- Retrospectively added isolates from published studies, including those where novel alleles and STs had previously been added during the respective publication processes:
- all 16 isolates from Dong et al. (PMID: 25720562)
- only those 6 isolates from Achmad et al. (PMID: 30455247) where STs are given in table 2
- all 10 isolates from Canela et al. (PMID: 33611738)
- all 56 isolates from Khalifa et al. (PMID: 32571826)
- all 10 isolates from Boonsilp et al. (PMID: 34356956)
- all 133 isolates from Chen et al. (PMID: 35369470)
- all 3 isolates from Moorhouse et al. (PMID: 33796227)
- 12/2022
- Added data from Jensen et al. (PMID: 26711776)
- The sequence of the novel TRP1 allele in isolate RHJ_122 was no longer available from the authors; this isolate is therefore not represented in the database.
- Added 8 novel STs.
- Added 49 isolates.
- 12/2022
- Added 4 studies using genome sequencing:
- Added 46 non-redundant isolates from Helmstetter et al. (PMID: 35199143), always using only the first of sequentially obtained isolates.
- Isolate names were padded to 3 digits to allow easier alphanumeric sorting.
- STs for CG86 and CG185 (SRR12825233, SRR12825253) could not be determined, as the raw data did not yield sequences for the FKS or URA markers. These isolates are deposited in SRA but are not presented in the manuscript (i.e. they are missing from Supp. Table 2).
- The control assembly of CG151 (SRR12825241) showed a novel LEU allele (LEU37).
- The control assemblies of CG181 (SRR12825234) and CG151 (SRR12825241) yielded novel STs (ST224, ST225), deviating from the published STs (123 and 15).
- Added 3 isolates from Pais et al (PMID: 36448018).
- Control assembly of isolate 73281 (SRR14844978) shows a novel URA3 allele, leading to the novel ST 226, not ST6 as given in the publication.
- Added 30 isolates and two novel STs (227 and 228) from Stefanini et al. (PMID: 36354359). STs were derived from our own control assemblies.
- Added 8 isolates from Szervas et al. (PMID: 34829249). STs were derived from our own control assemblies. The MLST profiles not disclosed in the manuscript are ST128 (ERR4669795), ST148 (ERR4669757), and ST238, which stems from a novel TRP1 allele (ERR4669779).
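The joining rules described in the timeline (dashes instead of underscores in isolate names, SRA information attached to an existing record when the STs agree, a "_GS" postfix when they do not, and padding of isolate numbers to 3 digits) can be illustrated with a minimal Python sketch. This is not the procedure or schema actually used for the database: the `Record` class, the function names, and the isolate names, STs, and run labels in the usage example are all hypothetical and serve only to make the rules concrete.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical isolate record (not the PubMLST schema)."""
    isolate: str
    st: int
    sra_runs: list = field(default_factory=list)


def normalise_name(name: str) -> str:
    """Replace underscores with dashes so names match across datasets."""
    return name.replace("_", "-")


def pad_isolate_number(name: str, width: int = 3) -> str:
    """Zero-pad a trailing number, e.g. for easier alphanumeric sorting."""
    return re.sub(r"(\d+)$", lambda m: m.group(1).zfill(width), name)


def merge_genome_record(db: dict, name: str, st: int, run: str) -> Record:
    """Join one genome-sequencing-derived record into db (keyed by isolate name)."""
    key = normalise_name(name)
    existing = db.get(key)
    if existing is None:
        # Isolate not yet in the database: add it as a new record.
        db[key] = Record(key, st, [run])
        return db[key]
    if existing.st == st:
        # Same name and same ST: attach the SRA run to the existing record.
        # This also merges duplicate SRA entries for one isolate.
        if run not in existing.sra_runs:
            existing.sra_runs.append(run)
        return existing
    # Same name but a different ST: keep both versions and flag the
    # genome-sequencing-derived one with the "_GS" postfix.
    gs_key = f"{key}_GS"
    if gs_key not in db:
        db[gs_key] = Record(gs_key, st)
    db[gs_key].sra_runs.append(run)
    return db[gs_key]


# Usage with invented isolates, STs, and run labels (placeholders only).
db = {
    "ISO-0001": Record("ISO-0001", 3),
    "ISO-0002": Record("ISO-0002", 7),
}
merge_genome_record(db, "ISO_0001", 3, "RUN_A")   # STs agree: run is attached
merge_genome_record(db, "ISO_0002", 8, "RUN_B")   # STs differ: stored as ISO-0002_GS
merge_genome_record(db, "ISO_0003", 10, "RUN_C")  # not yet present: new record
print(sorted(db))
print(pad_isolate_number("CG86"))
```

Keeping the conflicting version as a separate "_GS" record, rather than overwriting the earlier ST, preserves both the originally recorded and the genome-derived result for later inspection.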
Control genome assembly methodology:
For control purposes, genome sequences are re-assembled from raw data downloaded from SRA. Raw data are only superficially checked using FastQC, and adapters are trimmed with Trimmomatic if needed. Reads are assembled de novo to scaffold level using SPAdes (standard options), and the scaffolds are used to check the ST designation given in the respective publication. Where this yields data deviating from the published STs, reads are extracted by mapping (BWA-MEM) to the reference allele (always allele 1), and the SNP is curated by manually inspecting the mapped reads. Where the deviation is confirmed, this is mentioned in the comment fields of the isolate record, and the new allele is assigned with reference to the SRA number.
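As an illustration only, the order of the steps above could be scripted roughly as follows. This sketch merely builds the command lines and does not run anything. It assumes paired-end reads named `<run>_1.fastq.gz` and `<run>_2.fastq.gz`, a `trimmomatic` wrapper on the PATH, an adapter file called `adapters.fa`, and the example ILLUMINACLIP settings 2:30:10; none of these details are stated in the methodology, and the function name and directory layout are hypothetical. The run label `SRRXXXXXXX` is a placeholder, not a real accession.

```python
from pathlib import PurePosixPath


def control_assembly_commands(run, ref_allele_fasta, trim=True,
                              adapters="adapters.fa", workdir="work"):
    """Return command lines for one paired-end run, in pipeline order.

    All file names and the output layout are assumptions for illustration.
    """
    out = PurePosixPath(workdir) / run
    r1, r2 = f"{run}_1.fastq.gz", f"{run}_2.fastq.gz"

    # Superficial quality check (the output directory must already exist).
    cmds = [["fastqc", r1, r2, "-o", str(out / "fastqc")]]

    if trim:
        # Adapter trimming, only if needed; paired (P) and unpaired (U) outputs.
        p1, u1 = str(out / "trim_1P.fastq.gz"), str(out / "trim_1U.fastq.gz")
        p2, u2 = str(out / "trim_2P.fastq.gz"), str(out / "trim_2U.fastq.gz")
        cmds.append(["trimmomatic", "PE", r1, r2, p1, u1, p2, u2,
                     f"ILLUMINACLIP:{adapters}:2:30:10"])
        r1, r2 = p1, p2

    # De novo assembly with default options.
    cmds.append(["spades.py", "-1", r1, "-2", r2, "-o", str(out / "spades")])

    # Only relevant when the assembly deviates from the published ST:
    # map reads to the reference allele (allele 1) for manual inspection.
    # bwa mem writes SAM to standard output.
    cmds.append(["bwa", "index", ref_allele_fasta])
    cmds.append(["bwa", "mem", ref_allele_fasta, r1, r2])
    return cmds


for cmd in control_assembly_commands("SRRXXXXXXX", "allele1.fa"):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run` once the tools are installed; the manual curation of SNPs in the mapped reads is not something this sketch attempts to automate.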