Recent Updates
March 18th, 2011.
Initial chimera checking and removal have been performed on all V6V4 and V3V5 sequences using the uchime option of USearch. We are currently using a combination of reference set comparison (similar to Chimera Slayer) and de novo checking (directly comparing sequences within each PCR amplification). We are also reviewing possible chimeras that were not identified and removed (false negatives), and we expect to refine our chimera removal process in the near future.
Information on USearch can be found here: http://drive5.com/usearch/
November 8th, 2010.
v6 clusters have been updated and are available here.
August 9th, 2010.
The VAMPS data has been updated to Silva 102.
For v6v4 and v3v5 projects, chimeras will be removed as soon as possible.
April 5th, 2010.
Clustering
The operational taxonomic units (OTUs) for the V6 sequencing data have been recalculated and are now available through the Clusters and Diversity page. All V6 data have been clustered together using the new SLP method, which is based on pairwise distances, a noise-reducing preclustering step, and average linkage clustering. The advantages of the new method are that it corrects for sequencing errors, it minimizes the propagation of OTUs with sampling depth, and it can be run across all projects. Unique OTU IDs can be used to compare between projects and datasets. Samples sequenced since March 16 have not yet been incorporated into the new clusters. The previous clusters, based on multiple sequence alignment and complete linkage, are still available through the VAMPS archive.
October 9th, 2009.
Upload Untrimmed Data.
The ability to upload untrimmed data has been added. Uploads include raw sequence data, quality data, primers, and run keys. After uploading, the data is automatically run through our trimming process. The data can then be viewed in the Community Visualization tools.
August 23rd, 2009.
Improved Trimming and GASTing
- The base-calling software internal to the GS-FLX was updated in late February 2008. Prior to February 2008, only the length of homopolymers dictated the quality scores. All runs from March 7, 2008 onward report an improved quality score, which puts us in a position to improve the quality of sequence reads. Data on the VAMPS site posted on March 7, 2008 or later no longer contain reads with an average base quality less than 30.
- The fuzzy trimming of the distal primer infrequently left the first two or three bases of the primer at the end of the read. This would only occur in cases where the "fuzz" was just after the first few bases, and the remaining bases were recognized as valid primer. We now require the first 3 bases of the primer to be part of the "fuzzy" find for it to be valid. In many cases, moving down the list of fuzzy matches, we were still able to remove the primer at the start. The clustering OTUs ignore terminal gaps, so the effect should be minimal on clustering.
- The current GAST process has a more stringent requirement for a valid prefiltering BLAST match. If the BLAST match of the read has an alignment length less than 80% of the read length, it is not considered an adequate match. In cases where no valid BLAST match now exists, the GAST process will assign "Unknown" as the taxonomy. In experiments with known sequences, we have found that this helps to remove non-V6 contamination. The reads are not deleted, since they may still be high-quality reads, but they will not contribute to the clusters and diversity because they are filtered out before the clusters are run.
The previous data have been archived at: http://vampsarchive.mbl.edu
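As a rough illustration of the two numeric thresholds described above (a minimum average base quality of 30, and a BLAST alignment covering at least 80% of the read), the checks might be expressed as follows. The function names and inputs are hypothetical and chosen only for this sketch; this is not the VAMPS trimming or GAST code.

```python
from statistics import mean

MIN_AVG_QUALITY = 30    # reads averaging below this are dropped (from the note above)
MIN_ALIGN_PERCENT = 80  # alignment must cover at least this percent of the read


def passes_quality(quality_scores):
    """Return True if the read's average base quality is at least 30."""
    return bool(quality_scores) and mean(quality_scores) >= MIN_AVG_QUALITY


def taxonomy_for(read_length, alignment_length, blast_taxonomy):
    """Return the BLAST-derived taxonomy, or "Unknown" if the alignment is too short.

    Integer arithmetic is used so the 80% boundary is compared exactly.
    """
    if read_length <= 0:
        return "Unknown"
    if alignment_length * 100 < MIN_ALIGN_PERCENT * read_length:
        return "Unknown"
    return blast_taxonomy
```

A read assigned "Unknown" in this sketch is kept rather than deleted, mirroring the note above; excluding it from clustering would be a separate, later step.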
July 30th, 2009.
Clusters and Diversity. A taxonomy file (*.tax) has been added to the OTU Cluster files. The taxonomy file shows the taxonomy corresponding to the reads in the *.list file for the 0.03, 0.06, and 0.10 widths.
April 2nd, 2009.
Export Taxcounts. The ability to export data for selected domains was added to the Export Taxcounts page. The user can now select Archaea, Bacteria, Eukarya, or Organelle sequences for download, as well as all domains together.
March 12th, 2009.
Exporting Fasta Sequences. The GAST distance was added to the definition line in the fasta files. The format of the definition line is now Sample ID | Project | Dataset | GAST Distance and Taxonomy | Count.
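For users who post-process the exported files, a minimal parsing sketch for the five-field definition line is shown here. It assumes the fields are separated by a literal "|" character and that the line begins with ">"; the example values are made up, and the fourth field is left unsplit because its internal layout is not described above.

```python
from typing import NamedTuple


class Defline(NamedTuple):
    sample_id: str
    project: str
    dataset: str
    gast_distance_and_taxonomy: str  # kept as one string; inner format not specified
    count: int


def parse_defline(line: str) -> Defline:
    """Split a fasta definition line into its five pipe-separated fields."""
    fields = [field.strip() for field in line.strip().lstrip(">").split("|")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    sample_id, project, dataset, gast, count = fields
    return Defline(sample_id, project, dataset, gast, int(count))


# Hypothetical definition line, for illustration only.
example = ">S0001 | ICM_XYZ_Bv6 | DS_01 | 0.012 Bacteria;Proteobacteria | 7"
record = parse_defline(example)
```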
March 9th, 2009.
Exporting Fasta Sequences. The ability to export data for selected domains was added. The user can now select Archaea, Bacteria, Eukarya, or Organelle sequences for download. The taxonomy of each sequence is included in the definition line.
March 9th, 2009.
Exporting Fasta Sequences. The compression of fasta files has been eliminated. It is no longer necessary to unzip the downloaded files.
February 13th, 2009.
Community Visualization. The normalization of data for custom datasets in the Community Visualization page was changed. Custom datasets were previously normalized as one unit. Normalization is now also done within the custom dataset, normalizing the individual dataset components against each other.
February 11th, 2009.
Export Taxcounts. The ability to select datasets through saved custom datasets was added. When selecting data as normalized to maximum or by frequency, the datasets are normalized within the custom dataset by normalizing the individual dataset components against each other.
February 5th, 2009.
DOTUR Clusters. Projects are now filtered by taxonomic domain before running DOTUR clusters. Projects using Bacterial primers, for instance, are filtered to only include Bacteria before generating OTUs. Similarly, projects using Archaeal primers will only have OTUs for Archaea, and Eukaryal projects will only have Eukaryal OTUs.
January 26th, 2009.
Project Naming.
- CoMM projects now begin with a prefix of ICM_
- Keck projects begin with KCK_
- LTER projects begin with LTR_
- All other projects begin with a three-letter code designating the research facility or the researcher's initials.
- All projects end with a suffix designating the domain and hypervariable region targeted by the primers. For instance, _Bv6 used primers for the V6 region in Bacteria, _Av6 is the V6 region in Archaea, and _Ev9 is the V9 region in Eukarya.
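The naming convention above can be sketched as a small parser. The regular expression, the function name, and the example project names are assumptions made for illustration; only the prefixes (ICM_, KCK_, LTR_) and the domain/region suffix pattern come from the list above.

```python
import re

PREFIX_GROUPS = {"ICM": "CoMM", "KCK": "Keck", "LTR": "LTER"}
DOMAINS = {"B": "Bacteria", "A": "Archaea", "E": "Eukarya"}

# Three-letter prefix, optional middle part, then _<domain letter>v<region number>.
PROJECT_PATTERN = re.compile(r"^([A-Za-z]{3})_(?:.*_)?([BAE])v(\d+)$")


def parse_project_name(name):
    """Return the group, domain, and region encoded in a project name, or None."""
    match = PROJECT_PATTERN.match(name)
    if match is None:
        return None
    prefix, domain_letter, region = match.groups()
    return {
        "group": PREFIX_GROUPS.get(prefix.upper(), "other"),
        "domain": DOMAINS[domain_letter],
        "region": f"V{region}",
    }


# Hypothetical project names, for illustration only.
icm = parse_project_name("ICM_ABC_Bv6")
other = parse_project_name("XYZ_Ev9")
```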
December 9th, 2008.
Export Taxcounts. The ability to select datasets through saved custom projects has been temporarily disabled.
December 1st, 2008.
Export Taxcounts. The ability to select datasets through saved custom projects was added.
November 5th, 2008.
Community Visualization and Export Taxcounts. V9 data were added to the database.
October 29th, 2008.
Community Visualization and Export Taxcounts. NOTICE TO USERS: The taxonomy data have been updated as of October 27th and 29th, 2008. The raw sequences have been retrimmed to improve the removal of primer sequences, and the taxonomy has been updated to SILVA 95, with taxonomic assignments contributed from Entrez Genome, Hugenholtz, RDP, SILVA, and several users.
October 10th, 2008.
Export Taxcounts. The ability to select individual datasets from different projects has been added. The user can now combine projects and datasets in order to do a customized export of data.
August 19th, 2008.
Community Visualization. The appearance of the Community Visualization entry page was updated. The Novice tool was renamed the Simple Taxonomy tool.
July 31st, 2008.
Clustering and Diversity. OTU data packages were updated. We regret that an error was discovered in the distance calculations used to generate the OTU data downloads. The error has been corrected and all the OTU data packages have been updated.
July 28th, 2008.
Clustering and Diversity. OTU files specifying OTU sizes were added to the OTU Cluster data package.
July 28th, 2008.
Databases. Download of RefV3 database was added.
July 28th, 2008.
Software. Download of GAST software was added.
July 3rd, 2008.
Export Taxonomic Counts. Updated downloads. Left-clicking a download link now opens a new window, so that the existing page and links are preserved.
July 2nd, 2008.
Export Taxonomic Counts. Updated processing of export data to improve download speed.
June 24th, 2008.
Export Taxonomic Counts. Updated the Normalize By Percent calculation to be written with ten decimals in order to minimize rounding errors.
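To illustrate why the number of decimals matters, here is a minimal sketch of a percent normalization written with ten decimals. It assumes that "percent" means each taxon's share of the dataset total; the function name and input layout are hypothetical and not taken from the VAMPS export code.

```python
def normalize_by_percent(counts):
    """Convert raw taxon counts to percentages formatted with ten decimals."""
    total = sum(counts.values())
    if total == 0:
        # Avoid division by zero for an empty dataset.
        return {taxon: f"{0:.10f}" for taxon in counts}
    return {taxon: f"{100 * n / total:.10f}" for taxon, n in counts.items()}


# Made-up counts, for illustration only.
percents = normalize_by_percent({"Bacteria": 1, "Archaea": 2})
```

With only two decimals, values such as one third would be written as 33.33, and the rounding error would accumulate when many taxa are summed; ten decimals keeps that error negligible.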
June 12th, 2008.
Export Taxonomic Counts. Updated processing of export data to improve download speed.
June 9th, 2008.
Export Taxonomic Counts. Removed Totals column from normalized data outputs.
June 6th, 2008.
Export Taxonomic Counts. Normalization By Percent was added to the output options. Species and strain were added to the taxonomy. Compression of output files was removed.