number of viruses in a fraction. PFGE analysis indicated the presence of three distinct genome sizes, while TEM showed four distinct morphological groups. Both PFGE and TEM can underestimate actual diversity, because genetically distinct viruses can have indistinguishable genome sizes [24] or morphologies [36]. With these caveats in mind, we found that the sequenced fraction contained at least four distinct groups of viruses. The sequence library matched no more than a few genes of any one known virus, suggesting that the viral genomes represented in the library have not been sequenced before. Most virus hits were to bacteriophages, consistent with the observed morphologies of the viruses in the sample, which mostly resembled tailed bacteriophages in the order Caudovirales. The distant relationships between our library sequences and known viral DNA polymerase sequences suggest that the viruses in the sequenced fraction are not closely related to any previously sequenced virus; consequently, their potential hosts cannot be inferred from the phylogenetic tree. However, the library sequences formed a well-supported clade, suggesting that the viruses in the fraction used to construct the library were relatively closely related to one another, at least with respect to their putative DNA polymerase sequences. The phylogenetic results also show that the sequenced fraction contained viruses belonging to at least five operational taxonomic units. Although we did not directly compare the fractionated viral assemblage with the whole, unfractionated viral community, assembling the sequence library from the fractionated sample generated many more contigs than comparable metagenomic analyses of whole viral assemblages [11–13,37,38]. In those studies, only 0.3?.5% of library sequences could be assembled, into contigs of at most 4 sequences each, whereas 65% of the sequences in our library were assembled into contigs containing up to 38 sequences. This sup…

Sequence Assembly and Contig Annotation

Assembly of the sequences resulted in 221 contigs composed of 2 to 38 sequences each (Figure 5A) and ranging in length from 370 to 6536 bp (Figure 5B); together, these contigs contained 65% of the sequences in the library. Identification of ORFs in the largest contigs (>4 kb) revealed 47 complete ORFs with an average length of 640 bp (Figure 6). Most of these contigs contained relatively large ORFs, but the seventh contig consisted entirely of short ORFs (111?13 bp) with no significant hits, and the ninth contig contained a much larger ORF (3672 bp) with similarity to a viral tape measure protein. Annotation of the ORFs showed that…

Figure 3. Taxonomic classification of the sequence library. Classification of all sequences (A) and of the families represented among the virus sequences (B), based on significant BLASTx hits (E-value ≤0.001) to the GenBank database. Numbers of sequences are in parentheses.
doi:10.1371/journal.pone.0060604.g

Assembly of a Viral Metagenome after Fractionation

Table 2. Categories of viral proteins in the sequence library.
Protein categories: unknown, oxygenase, helicase/primase, structural, DNA polymerase, exonuclease, ferrochelatase, DNA synthesis, peptidase, DNA packaging, DNA methylase, integrase, endolysin, endonuclease, DNA binding, heat shock protein, protease, transcriptional activator, transferase.
Number of sequences: 245, 63, 49, 37, 31, 25, 21, 13, 9, 5, 3, 3, 2, 2, 1, 1, 1, 1.
doi:10.1371/journal.pone.0060604.t
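The assembly comparison in this section rests on a few simple summary statistics: how many contigs were formed, how many sequences each contig holds, and what percentage of the library ended up in any contig. The following is a minimal Python sketch of that bookkeeping. It assumes the assembler's output has already been reduced to a mapping from each library sequence ID to the contig it joined (or None for unassembled singletons); the function name `assembly_summary` and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter
from typing import Dict, Optional


def assembly_summary(membership: Dict[str, Optional[str]]) -> dict:
    """Summarize how library sequences were distributed among contigs.

    membership maps each sequence ID to the ID of the contig it was
    assembled into, or None if it remained an unassembled singleton
    (hypothetical input format, assumed for illustration).
    """
    total = len(membership)
    # Count how many sequences fell into each contig; singletons are skipped.
    per_contig = Counter(c for c in membership.values() if c is not None)
    assembled = sum(per_contig.values())
    return {
        "contigs": len(per_contig),
        "min_seqs_per_contig": min(per_contig.values(), default=0),
        "max_seqs_per_contig": max(per_contig.values(), default=0),
        # Percentage of all library sequences that ended up in any contig.
        "percent_assembled": 100.0 * assembled / total if total else 0.0,
    }


# Toy example (hypothetical data): 10 sequences, 6 of them in two contigs.
toy = {f"seq{i}": None for i in range(4)}
toy.update({"seq4": "c1", "seq5": "c1", "seq6": "c2",
            "seq7": "c2", "seq8": "c2", "seq9": "c2"})
print(assembly_summary(toy))
# → {'contigs': 2, 'min_seqs_per_contig': 2, 'max_seqs_per_contig': 4, 'percent_assembled': 60.0}
```

In these terms, the library reported here corresponds to 221 contigs, 2 to 38 sequences per contig, and 65% of sequences assembled, whereas the cited whole-assemblage studies reported far lower assembled percentages and at most 4 sequences per contig.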
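The ORF counts reported for the contigs longer than 4 kb come from scanning each contig for complete open reading frames. This excerpt does not state which ORF-calling tool or minimum ORF length was used, so the following is only a generic six-frame sketch in Python, not the authors' method. It reports ATG-to-stop ORFs under the standard genetic code on both strands, keeping those at least `min_len` bp long; the function names `find_orfs` and `reverse_complement`, the default threshold of 100 bp, and the toy contig are all hypothetical.

```python
from typing import List, Tuple

STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(contig: str, min_len: int = 100) -> List[Tuple[str, int, int]]:
    """Return complete ORFs (ATG ... stop) found in all six reading frames.

    Each hit is (strand, start, end) with 0-based, end-exclusive coordinates
    on the strand that was scanned; the length includes the stop codon.
    Frames that reach the contig end without a stop codon are not reported.
    """
    contig = contig.upper()
    orfs = []
    for strand, seq in (("+", contig), ("-", reverse_complement(contig))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens an ORF
                elif start is not None and codon in STOPS:
                    end = i + 3  # include the stop codon
                    if end - start >= min_len:
                        orfs.append((strand, start, end))
                    start = None  # look for the next start codon
    return orfs


# Usage on a hypothetical toy contig with one 66-bp ORF on the + strand.
toy_contig = "CC" + "ATG" + "GCT" * 20 + "TGA" + "CC"
hits = find_orfs(toy_contig, min_len=60)
lengths = [end - start for _, start, end in hits]
print(hits, sum(lengths) / len(lengths))  # → [('+', 2, 68)] 66.0
```

Scanning from the first in-frame ATG reports the longest ORF ending at each stop codon. Dedicated prokaryotic gene callers additionally consider alternative start codons such as GTG and TTG, which this sketch ignores.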