Pathogen detection in milk samples by ligation detection reaction-mediated universal array method

Transcript

1 J. Dairy Sci. 92: doi: /jds American Dairy Science Association, Pathogen detection in milk samples by ligation detection reaction-mediated universal array method P. Cremonesi,* 1 G. Pisoni, M. Severgnini, C. Consolandi, P. Moroni, M. Raschetti,*# and B. Castiglioni* *Institute of Agricultural Biology and Biotechnology Italian National Research Council, Via Bassini 15, Milan, Italy Department of Veterinary Pathology, Hygiene and Public Health, University of Milan, Via Celoria 10, Milan, Italy Institute of Biomedical Technologies Italian National Research Council, Via Fratelli Cervi 93, Segrate, Milan, Italy Centro Interdipartimentale di Studi sulla Ghiandola Mammaria (CISMA), Via Celoria 10, Milan, Italy #Department of Veterinary Science and Technology for Food Safety, University of Milan, Via Trentacoste 2, Milan, Italy ABSTRACT This paper describes a new DNA chip, based on the use of a ligation detection reaction coupled to a universal array, developed to detect and analyze, directly from milk samples, microbial pathogens known to cause bovine, ovine, and caprine mastitis or to be responsible for foodborne intoxication or infection, or both. Probes were designed for the identification of 15 different bacterial groups: Staphylococcus aureus, Streptococcus agalactiae, nonaureus staphylococci, Streptococcus bovis, Streptococcus equi, Streptococcus canis, Streptococcus dysgalactiae, Streptococcus parauberis, Streptococcus uberis, Streptococcus pyogenes, Mycoplasma spp., Salmonella spp., Bacillus spp., Campylobacter spp., and Escherichia coli and related species. These groups were identified based on the 16S rrna gene. For microarray validation, 22 strains from the American Type Culture Collection or other culture collections and 50 milk samples were tested. The results demonstrated high specificity, with sensitivity as low as 6 fmol. Moreover, the ligation detection reaction-universal array assay allowed for the identification of Mycoplasma spp. in a few hours, avoiding the long incubation times of traditional microbiological identification methods. The universal array described here is a versatile tool able to identify milk pathogens efficiently and rapidly. Key words: foodborne pathogen, ligase detection reaction, deoxyribonucleic acid array, mastitis INTRODUCTION Mastitis, the inflammation of the mammary gland, is a multifactorial disease usually caused by a microbial infection, which causes primarily lower milk yield, reduced milk quality, and greater production costs. Many different contagious and environmental bacteria cause Received October 2, Accepted March 24, Corresponding author: paola.cremonesi@unimi.it 3027 mastitis, including Staphylococcus aureus, Streptococcus agalactiae, several Mycoplasma spp., Escherichia coli, Streptococcus dysgalactiae, Streptococcus uberis, and Bacillus spp. (Oviedo-Boyso et al., 2007). Staphylococcus aureus is one of the most significant pathogens causing IMI and mastitis in dairy ruminants in many countries, such as Slovenia, the Netherlands, and Italy (Pengov, 2006). Streptococcus uberis, isolated from mammary glands, is responsible for clinical and subclinical mastitis during both the nonlactating and lactation periods (Jayarao et al., 1999). There are many other streptococci that can generate unusual cases of mastitis, such as Streptococcus bovis, Streptococcus canis, Streptococcus pyogenes, and Streptococcus equi ssp. zooepidemicus (Watts et al., 1984; Devriese et al., 1999; Hassan et al., 2005; Blood et al., 2006; Pisoni et al., 2009). In addition, CNS, a group of staphylococcal species that are part of normal skin flora and that have been traditionally considered to be minor mastitis-related pathogens, have become the predominant pathogens causing subclinical or mild clinical mastitis in many countries (Honkanen- Buzalski et al., 1994; Pitkala et al., 2004). Finally, mastitis caused by Mycoplasma (especially Mycoplasma bovis in cattle) appears to be an emerging problem (Fox et al., 2005), because no effective treatment exists for this form of mastitis, and infected cows are considered positive for life. At the same time, raw milk and dairy products made from nonpasteurized milk have been responsible for Staph. aureus, Campylobacter, Salmonella, Bacillus, and E. coli outbreaks and could also represent a hazard for consumers (Allos, 2001; Granum, 2001; Little et al., 2008). Escherichia coli and related species (Bottero et al., 2004), Salmonella (Jayarao et al., 2006), and Bacillus cereus (Wong et al., 1988) are well-known organisms that are potentially very dangerous for human health. Pathogen identification has traditionally been based on microbiological techniques that are time-consuming, generally requiring at least 2 to 3 working days and, in some conditions, not having a satisfying diagnostic sensitivity. For example, isolation and microbiologi-

2 3028 Cremonesi et al. cal identification of M. bovis require different specific growth media and long incubation times (Gonzàlez and Wilson, 2003). From a clinical point of view, the limited diagnostic sensitivity of bacteriology is not sufficient to exclude any infection because a negative bacteriological result does not necessarily mean that the corresponding quarter is actually free of a pathogen infection. Consequently, a considerable number of infected cows remain undetected, and the control and eradication of pathogen mastitis in herds are therefore difficult. The development of PCR-based methods has provided a promising option for the rapid detection of bacteria, even in cases of difficult-to-detect subclinical mastitis. In recent years, DNA microarray technology has played an increasingly important role in bacterial studies, including detection and quantitative analyses (Castiglioni et al., 2004; Pingle et al., 2007). The DNA microarrays were developed for simultaneous analysis of Staph. aureus enterotoxin genes, Listeria spp., Campylobacter spp., and Clostridium perfringens toxin genes (Sergeev et al., 2004). More recently, Wang et al. (2007) described a 16S rrna oligonucleotide microarray to identify 22 common pathogenic species, whereas Kim et al. (2008) developed a microarray for the detection of the main foodborne pathogens. In this study, a microarray platform, based on the discriminative properties of the DNA ligation detection reaction (LDR), associated with a universal tag-array (UA), was developed and used to identify pathogens such as Staph. aureus, Strep. agalactiae, nonaureus staphylococci (NAS), Strep. bovis, Strep. equi, Strep. canis, Strep. dysgalactiae, Streptococcus parauberis, Strep. uberis, Strep. pyogenes, Mycoplasma spp., Salmonella spp., Bacillus spp., Campylobacter spp., and E. coli and related species. DNA Samples MATERIALS AND METHODS The specificity and sensitivity of the LDR probe pairs were tested with DNA extracted from 22 American Type Culture Collection (ATCC) reference strains (LGC Promochem, Middlesex, UK) or isolates listed in Table 1, using the DNA extraction protocol described in Cremonesi et al. (2006). The LDR probe pairs were further evaluated with DNA extracted from a total of 50 milk samples (42 bovine, 2 ovine and 3 caprine mastitic samples plus 3 bovine milk samples from healthy cows), also collected for bacteriological examinations. For the specificity and microarray validation experiments, 50 fmol of the purified 16S rrna PCR product were used. For the sensitivity assays, 50, 25, 12.5, and 6 fmol of the PCR product was used. Bacteriological Procedures Bacteriological culturing of quarter and udder half milk samples was performed according to standards of the National Mastitis Council (1999). Briefly, 10 μl of each milk sample was spread on blood agar plates (5% defibrinated sheep blood, Microbiol Diagnostici, Cagliari, Italy). The plates were incubated aerobically at 37 C and examined after 24 and 48 h. The colonies were provisionally identified on the basis of gram stain, morphology, and hemolysis patterns and the numbers of each colony type were recorded. The representative colonies were then subcultured on blood agar plates and incubated aerobically at 37 C for 24 h to obtain pure cultures. Production of catalase and coagulase was tested for gram-positive cocci. Specific identifications of staphylococci and streptococci were made using commercial micromethods (API Staph ID32 system and API 20 Strep, respectively, biomérieux, Rome, Italy). Gram-negative isolates were identified by use of colony morphology, gram-staining characteristics, oxidase, and biochemical reactions on MacConkey s agar and API 20E (biomérieux). Ten microliters of milk was also plated onto a Hayflick agar plate (Brown et al., 1990) containing 15% horse serum; agar plates were incubated at 35 C with 5% CO 2 and examined using a 40 stereomicroscope for Mycoplasma growth every 48 h until such time as colonies were obtained, or for 10 d if no colonies were identified. PCR Amplifications from DNA Samples The 16S rrna gene was amplified with universal primers 16S27F (5 -AGA GTT TGA TCC TGG CTC AG-3 ) and R1492 (5 -TAC GGY TAC CTT GTT ACG ACT T-3 ; Edwards et al., 1989). The PCR amplifications were performed with a GeneAmp PCR system 9700 thermal cycler (Applied Biosystems, Foster City, CA). The reaction mixtures included 0.85 μl of each primer (15 μm), 10 μl of 2 PyroStart Fast PCR Master Mix (Fermentas, M-Medical SRL, Milan, Italy), and 10 to 15 ng of genomic DNA in a final volume of 20 μl. Before amplification, DNA was denatured for 1 min at 95 C. Amplification consisted of 30 cycles at 94 C for 1 s, 61 C for 5 s, and 72 C for 38 s. After the cycles, an extension step (10 s at 72 C) was performed. The PCR products were purified by using a Wizard SV gel and PCR Clean-Up System purification kit (Promega Italia, Milan, Italy), according to the instructions of the manufacturer, eluted in 20 μl of molecular grade Journal of Dairy Science Vol. 92 No. 7, 2009

3 UNIVERSAL ARRAY FOR PATHOGEN DETECTION 3029 Table 1. Origin, signal-to-noise-ratio (SNR), and t-test unadjusted P-values for ligation detection reaction probe pair specificity tests 1 Organism Species Bacterial collection Probe name SNR-specific probe Average SNR of other probes (n = 14) P-value for the specific probe Bacillus Bacillus cereus ATCC Bacillus spp E E-09 Campylobacter Campylobacter coli ATCC 1061 Campylobacter spp._ E-07 Campylobacter spp._ E-04 Campylobacter lari ATCC BAA1060 Campylobacter spp._ E-06 Campylobacter spp._ E-08 Campylobacter jejuni ATCC Campylobacter spp._ E E-06 Campylobacter spp._ E E-05 Escherichia coli E. coli O157 ATCC E. coli_et_rel E E-09 Mycoplasma M. agalactiae DIPAEE 2 Mycoplasma spp E E-04 M. arginini DIPAEE E-04 M. bovis DIPAEE E E-12 M. capri DIPAEE E-06 M. capricolum DIPAEE E-04 M. mycoides DIPAEE E-05 Salmonella Salmonella enteritidis ATCC Salmonella spp E E-11 Staphylococcus Staph. aureus ATCC Staph. aureus E E-10 Staph. epidermidis ISPA-CNR 3 NAS E E-06 Staph. sciuri DIPAV E E-05 Staph. hemolyticus DIPAV E E-05 Streptococcus Strep. agalactiae ATCC BAA611 Str. agalactiae E E-07 Strep. uberis ATCC 9927 Str. uberis E E-11 Strep. parauberis DMS 6632 Str. parauberis E E-10 Strep. bovis DIPAV Str. bovis E E-06 Strep. canis DIPAV Str. canis E E-10 Strep. dysgalactiae DIPAV Str. dysgalactiae E E-08 Strep. equi DIPAV Str. equi E E-07 Strep. pyogenes ATCC Str. pyogenes E E-14 1 For each sample, the bacterial collection reference, the SNR of the specific probe, and the SNR of all the remaining (nonspecific) probes are reported. P-values are calculated as the results of a one-sided t-test comparison between the intensity of fluorescence of each probe and those of the Blank spots. 2 Department of Livestock Production, Epidemiology and Ecology, Faculty of Veterinary Medicine, University of Turin, Turin, Italy. 3 Institute of Sciences of Food Production Italian National Research Council, Milan, Italy. 4 NAS = nonaureus staphylococci. 5 Department of Animal Pathology, Hygiene and Veterinary Public Health, University of Milan, Milan, Italy. water, and quantified by capillary electrophoresis on an Agilent BioAnalyzer 2100 (Agilent Technologies, Palo Alto, CA). LDR Probe Design for the 16S rrna Gene We designed specific LDR probe pairs for the 16S rrna gene sequences of 15 different bacterial groups: Staph. aureus, Strep. agalactiae, NAS, Strep. bovis, Strep. equi, Strep. canis, Strep. dysgalactiae, Strep. parauberis, Strep. uberis, Strep. pyogenes, Mycoplasma spp., Salmonella spp., Bacillus spp., Campylobacter spp., and E. coli and related species. Probe pairs for Strep. equi and Strep. dysgalactiae were designed to identify any of the 2 subspecies of each. For each of these groups, 16S Journal of Dairy Science Vol. 92 No. 7, 2009

4 3030 Cremonesi et al. Figure 1. Schematic of the grouping and clustering of the 16S rrna gene sequences. For each node, the number of sequences used from the Ribosomal Database Project II database (length >1,200 bp and good quality) is reported. The dimension of the triangles is proportional to the number of sequences that were clustered together. The phylogenetic tree was built on the 15 consensus sequences by using the neighbor-joining algorithm and Phylip tree type. The scale represents the number of differences between sequences (e.g., 0.1 means 10% differences between 2 sequences). NAS = nonaureus staphylococci. rrna sequences longer than 1,200 bp and flagged as being of good quality, chosen among those available in the Ribosomal Database Project (RDP) II, release 9.51 ( were selected. A total of 744 sequences were used for the consensus determination and probe design (Figure 1). Sequences belonging to the same subgroups were then aligned together by ClustalW (Chenna et al., 2003) and imported into the Oligonucleotide Retrieving for Molecular Applications (ORMA) tool (M. Severgnini, P. Cremonesi, C. Consolandi, B. Castiglioni; and G. Caredda and G. De Bellis of the Institute of Biomedical Technologies Italian National Research Council, Segrate; unpublished data). This software is based on a set of scripts under Matlab (Mathworks, Natick, MA) to retrieve positions discriminating each sequence from the others, design optimal probes on the found positions, and quality filter the candidates, driving the user to the determination of thermodynamically optimized oligonucleotides. The consensus sequences for each subgroup were obtained according to the algorithm by Ludwig et al. (2004), setting a cutoff threshold of 40% and substituting, where necessary, multiple bases above the given threshold with the corresponding International Union of Pure and Applied Chemistry ambiguity code. The sequences were then realigned with ClustalW and used in the ORMA tool for probe design. We required the melting temperature to be within the 66 to 68 C range, with a length of between 25 and 50 bases; a maximum of 5 degenerated bases per oligonucleotide were allowed. Three rounds of design were performed: 1) Salmonella spp. was aligned against the E. coli sequence only; 2) Strep. canis and Strep. pyogenes were aligned against Streptococcus group sequences only (i.e., all Streptococcus species included in the LDR assay); and 3) all other probe pairs were selected considering the alignment of all the species. One probe per species or genus was designed, except for Campylobacter spp., for which 2 different positions were tested, to compare and verify their specificity and performances. The specificity of the probe pairs was checked with both the Probe Match tool in RDP II database and BLAST (Basic Local Alignment Search Tool) analysis (Altschul et al., 1997). Each common probe was synthesized to have a complementary ZipCode (czipcode) affixed to the 3 end, and a phosphate modification to the 5 end. A Cy3 label was attached to the 5 end of the discriminating probes. In LDR, if both probes are annealed to the target sequence, the fluorescent dye Journal of Dairy Science Vol. 92 No. 7, 2009

5 UNIVERSAL ARRAY FOR PATHOGEN DETECTION 3031 and the czipcode are linked into the same molecule. In hybridization, a czipcode pairs with its corresponding ZipCode, addressing the fluorescent signal to a specific spot on a glass slide. All the oligonucleotides (Table 2) were synthesized by Thermo Electron GmbH (Ulm, Germany). When a probe sequence contained an ambiguity code, this base was replaced with inosine during oligonucleotide synthesis. UA Preparation Phenylen-diisothiocyanate-activated chitosan glass slides were used as surfaces for the preparation of microarrays (Consolandi et al., 2006). Microarrays were prepared as described in Castiglioni et al. (2004), using a MicroGrid II Compact (Biorobotics, Cambridge, UK) contact pin spotter equipped with Silicon Microarray pins (Parallel Synthesis Technologies, Santa Clara, CA). For the present study, 17 ZipCodes, randomly selected from the 49 of the entire UA layout described in Chen et al. (2000), were assigned for the recognition of bacterial group based on the 16S rrna gene. ZipCode 66 (hybridization control) and ZipCode 63 (ligation control) were used to locate the submatrixes during the scanning and to verify the success of the ligation procedure, respectively. LDR, UA Hybridization, and Signal Detection All the group-specific probe pairs were added into an oligo-mix, at a concentration of 1 µm each. The oligo-mix also contained a discriminating and a common probe specific for the synthetic oligonucleotide used as the LDR control (5 -AGC CGC GAA CAC CAC GAT CGA CCG GCG CGC GCA GCT GCA GCT TGC TCA TG-3 ). The LDR was carried out in a final volume of 20 μl containing 1 μl of 10 Pfu DNA Ligase Buffer (Stratagene, La Jolla, CA), 1 μl of oligo-mix, 1 μl of Pfu DNA ligase (Stratagene), and 0.1 μl of the synthetic oligonucleotide. The reaction mixture was preheated for 5 min at 94 C and was then cycled for 30 rounds at 94 C for 30 s and at 65 C for 4 min. Hybridization, slide washing, and data acquisition were performed following the procedures described in Castiglioni et al. (2004). Briefly, the hybridization mixture had a total volume of 65 μl and contained 20 μl of LDR mixture, 5 saline sodium citrate (SSC, Sigma-Aldrich, Milan, Italy), 0.1 μm Cy3-labeled czip- Code 66 (complementary to ZipCode 66), and 0.1 mg/ ml of salmon sperm DNA. After heating at 94 C for 2 min and chilling on ice, the hybridization mixture was applied to the slide, on which the 8 arrays were separated by Press-To-Seal silicone isolators (1.0 9 mm; Schleicher and Schuell BioScience, Dassel, Germany). By using 8 subarrays per slide, associated with a multichamber hybridization system, 8 samples were run in parallel on the same slide. Hybridization was carried out in a dark chamber at 65 C for 1 h 30 min in a temperature-controlled water bath. After hybridization, the slide was washed at 65 C for 15 min in prewarmed 1 SSC. Finally, the slide was dried by spinning at 80 g for 3 min. The fluorescent signals were acquired at a 5-μm resolution by using a ScanArray Lite laserscanning system (PerkinElmer Life and Analytical Sciences, Boston, MA) with a green laser for Cy3 dye (λ ex, 543 nm; λ em, 570 nm). The power of both the laser and the photomultiplier tube was set between 70 and 95%, depending on the signal intensities, avoiding saturation in any case. Data Analysis Intensities of fluorescence (IF) were quantified by ScanArray Express 3.1 software (PerkinElmer), using the adaptive circle option, allowing diameters between 60 and 300 μm. No normalization was performed on IF because the experiments were analyzed independently of one another. All statistical tests and data analyses were performed in Matlab. To assess whether a probe was significantly above the background, a one-sided Student t-test (α = 0.01) was performed, using the IF of Blank spots (i.e., with no oligonucleotide spotted, n = 6), plus 2 times their standard deviations, as a conservative estimation of the null distribution. For each ZipCode, we considered the population of the IF of all the replicates (n = 4) and tested it for being significantly above the null distribution (H 0 : μ test = μ null ; H 1 : μ test > μ null ). The signal-to-noise ratio (SNR) was calculated as the ratio between the average IF of each probe and the average Blank spots for IF; coefficient of variation percentages represent the percentage ratio of the standard deviation of IF on their average values. RESULTS Group-Specific SNP Identification The ORMA probe design tool (M. Severgnini, P. Cremonesi, C. Consolandi, B. Castiglioni; and G. Caredda and G. De Bellis of the Institute of Biomedical Technologies Italian National Research Council, Segrate; unpublished data) was used in our study for SNP identification and LDR probe design on 16S rrna, allowing the precise determination and detection of the main pathogens infecting cattle, sheep, and goats or those responsible for foodborne intoxication or infection, or both. The BLAST analysis and the Probe Match tool in the RDP II database were used to confirm in silico Journal of Dairy Science Vol. 92 No. 7, 2009

6 3032 Cremonesi et al. Table 2. List of group-specific probes and corresponding ZipCodes Probe pair name Discriminating probe 1 (5 3 ) Common probe 2 (5 3 ) SNP position ZipCode 3 Staph. aureus TAA TAC CGG ATA ATA TTT TGA ACC GCA TGG TTC AAA AGT GAA AGA CGG TC Strep. canis CGT AAA GCT CTG TTG TTA GAG AAG AAC GGT AAT GGG AGT GGA AAA C Strep. dysgalactiae GGT CTA GAG ATA GGC TTT CCC TTC GGG GCA GG Salmonella spp. CCA TCA GAT GTG CCC AGA TGG GAT TAG CTT GTT GGT GA Campylobacter spp_179 CCC TAC ACA AGA GGA CAA CAG TTG GAA ACG ACT GCT AAT ACT Campylobacter spp_234 CTC TAT ACT CCT GCT TAA CAC AAG TTG AGT AGG GAA AGT TTT TCG G Bacillus spp. CTA AGT GTT AGA GGG TTT CCG CCC TTT AGT GCT GAA GT Strep. equi CTA ATA CCG CAT AAA AGT GGT TGA CCC ATG TTA ACI ATT TAA AAG GAG CAA CA Strep. agalactiae CGT GCC TAA TAC ATG CAA GTA GAA CGC TGA IGT TTG GTG TTT A Strep. bovis GCG TGC CTA ATA CAT GCA AGT AGA ACG CTG AAG ACT TTA GCT TGC TAA Strep. parauberis GCC GTG GCT CAA CCA TGG TTC GCT TTG GAA ACT GGA T Strep. uberis AAT ACC GCA TGA CAA TAG GGT ACA CAT GTA CCC TAT TTA AAA GGG GCA AA Mycoplasma spp. CGI CGC AGC TAA CGC ATT AAA TGA TCC GCC TGA GT NAS 4 CCG GAG CTA ATA CCG GAT AAT ATI TIG AAC CGC ATG GTT CIA T Strep. pyogenes GCA GGC GGT TTT TTA AGT CTG AAG TTA AAG GCA TTG GCT CAA CCA A E. coli_et_rel GGG TTG TAA AGT ACT TTC AGC GGG GAG GAA GGG AGT AAA GTT AAT AC TTG CTG TCA CTT ATA GAT GGA TCC GCG CTG CAT TAG CTA GTT GGT AA CCA TTA TGT GAC GGT AAC TAA CCA GAA AGG GAC GGC TAA CTA C AGT GAC AGG TGG TGC ATG GTT GTC GTC AGC TCG GGT AAC GGC TCA CCA AGG CGA CGA TCC CTA G CTA TAC TCC TGC TTA ACA CAA GTT GAG TAG GGA AAG TTT TTC GGT GTA TGT AGG ATG AGA CTA TAT AGT ATC AGC TAG TTG GTI AGG TAA TGG C TAA CGC ATT AAG CAC TCC GCC TGG GGA GTA CG GCT CCA CTA TGA GAT GGA CCT GCG TTG TAT TAG CTA GTT CAC TAG ACT GAT GAG TTG CGA ACG GGT GAG TAA CG AGT TGG AAG AGT TGC GAA CGG GTG AGT AAC GCG TAG GTA A AAC TTG AGT GCA GAA GGG GAG AGT GGA ATT CCA TGT GTA GC TGC TTC ACT ATG AGA TGG ACC TGC GTT GTA TTA GCT AGT TGG TAA GGT AA AGT AIG ITC GCA AGA ITI AAA CTT AAA GGA ATT GAC GGG GAI CC AGT GAA AGA IGG TTT TGC TIT CAC TTA TAG ATG GAI CCG CGC TGT ACG CTT TGG AAA CTG GAG AAC TTG AGT GCA GAA GGG CTT TGC TCA TTG ACG TTA CCC GCA GAA GAA GCA CCG G A Cy3 label attached to the 5 end; discriminating nucleotides at the 3 end are in bold face. 2 Designed to anneal to the target sequence immediately adjacent to the discriminating probe. 3 ZipCode numbers as in Chen et al. (2000). 4 NAS = nonaureus staphylococci. Journal of Dairy Science Vol. 92 No. 7, 2009

7 UNIVERSAL ARRAY FOR PATHOGEN DETECTION 3033 the specificity of the selected candidates. However, the NAS probe pair identified mostly the following species: Staphylococcus epidermidis, Staphylococcus hemolyticus, and Staphylococcus lugdunensis. Moreover, the probe pair designed for the E. coli group targeted some Shigella strains (data not shown). Because Shigella can also be found in milk samples and can potentially be harmful for human health, we decided to keep the probe pair in the assay, renaming it E. coli_et_rel. Where necessary, multiple rounds of design were performed to maximize the quality of the probe pairs found. In particular, the Salmonella spp. probe, whose consensus resulted in values significantly different from all the other species, was obtained by comparing its consensus with only the E. coli and related species consensus sequence. Specificity of the Probe Pairs The specificity of the probe pairs was tested separately with the 16S rrna products on a total of 22 ATCC or isolated bacterial strains (Table 1). Several examples of these results are also shown in Figure 2. No signal was detected using the Listeria monocytogenes strain ATCC as negative control (data not shown). All the LDR-UA experiments were replicated by duplicate independent LDR. Table 1 reports the SNR and the P-values of the t-tests of the assays for the probe pairs on single PCR product target. The procedure showed optimal specificity, with excellent signal-to-noise ratios. Results were in complete concordance with identification of ATCC strains and microbiological isolates: only probes associated with the expected species were present (P-values always <0.005), whereas all remaining probes were well below any acceptable P-value for the t-test. Hybridization and ligation control probes were always significantly present (data not shown). The SNR of the probes called as significantly over the background varied from 4.31 to 238.3, with an average of 37.98; at the same time, SNR of the nonsignificantly over-background spots varied between 0.12 and 1.55, with an average of 0.65 ± Both Campylobacter spp. probes performed nearly the same in terms of specificity, with P-values below Reproducibility and homogeneity of hybridization was excellent as well, with an average coefficient of variation percentage among replicates for the present probes of ± 7.35%. Sensitivity of the Probe Pairs To test the sensitivity of the method and the correlation between signal intensity and template concentration, the assay was evaluated by separate LDR on serial dilutions from 50 to 6 fmol of M. bovis, Strep. agalactiae, Strep. pyogenes, and Staph. aureus 16S rrna-pcr products. Figure 3 shows the SNR and the P-values (in logarithmic scale) of the serial dilutions of the 4 pathogen species. Considering a threshold value of 0.01 for significance of the presence of a probe, M. bovis, Strep. agalactiae, and Strep. pyogenes revealed a detection limit of 6 fmol, whereas Staph. aureus was limited to 12 fmol. As expected, SNR and P-values were inverselinearly correlated (Pearson correlation between 0.81 and 0.97), whereas the correlation between SNR and PCR product concentration or P-values and PCR product concentration are slightly lower (range 0.63 to 0.79 and 0.59 to 0.85, respectively). Evaluation of the Probe Pairs with Milk Samples The performance of the LDR probe pairs was evaluated using DNA extracted from 50 milk samples, 47 of which were from mastitic animals. With bacteriological analysis of the 50 milk samples, 9 samples were found to be positive for Staph. aureus, 24 were positive for Streptococcus spp., 5 were positive for Staphylococcus spp., 9 were positive for Mycoplasma spp., and 3 were negative controls. The LDR results obtained from the analysis of these samples revealed a good correlation with those obtained from the microbiological assays (Figure 4), also showing the presence of Bacillus spp. in 2 mastitic milk samples and specifying the presence of Strep. uberis (8), Strep. agalactiae (6), Strep. dysgalactiae (3), Strep. equi (2), Strep. bovis (4), and Strep. canis (3) for the Streptococcus spp., and NAS (7) for the Staphylococcus spp. Nine samples were positive for Staph. aureus, whereas 11 were positive for Mycoplasma spp. Finally, 3 samples were negative with both bacteriological and LDR analyses. DISCUSSION The objective of this work was to develop an assay that could rapidly identify the main pathogens known to cause mastitis in cattle, sheep, and goats or those responsible for foodborne intoxication or infection, or both. The bacterial pathogens on the assay panel were chosen to include 1) the organisms most frequently isolated from mastitis samples, such as Staph. aureus, Strep. agalactiae, Strep. uberis, and some NAS; 2) those that have been described recently as a contagious cause of subclinical mastitis, such as Mycoplasma spp. (Fox et al., 2005); and 3) those potentially dangerous for human health, such as E. coli and related species (Bottero et al., 2004), Salmonella spp. (Jayarao et al., 2006), Staph. aureus (De Buyser et al., 2001), B. cereus (Wong et al., Journal of Dairy Science Vol. 92 No. 7, 2009

8 3034 Cremonesi et al. Figure 2. A) Universal array scheme. Each ZipCode is spotted in quadruplicate; oligonucleotides associated with actual probe pairs are highlighted in gray. ZipCodes 66 and 63 are the hybridization and ligation controls, respectively. Blank spots are negative controls (no oligo; only spotting buffer). B) Table specifying the bacterial groups with the corresponding ZipCode. C) Specificity results of the amplified 16S rrna gene from Staphylococcus aureus (ATCC 19095, ZipCode 2), Salmonella enteritidis (ATCC 13076, ZipCode 5), Campylobacter coli (ATCC 1061, ZipCodes 8 and 9), Bacillus cereus (ATCC 14579, ZipCode 10), Streptococcus agalactiae (ATCC BAA611D, ZipCode 15), Streptococcus uberis (ATCC 9927, ZipCode 19), Mycoplasma bovis (ATCC 25523, ZipCode 20), and Escherichia coli O157 (ATCC , ZipCode 28). Each bacterial group has 4 replicate spots. ZipCode numbers of the specific probes are reported under each replicate. Lighted spots at the corners represent the hybridization and ligation controls (spots 66 and 63), as in the universal array scheme (A). NAS = nonaureus staphylococci. 1988), and Strep. equi ssp. zooepidemicus (Bergonier et al., 1999). In the last decade, many molecular diagnostic methods have been developed for direct detection of the specific genes of bacteria in clinical and food samples, and many species- or genus-specific assays have been developed for such detection. Some of them are PCRbased assays (Hsu and Tsen, 2001) in which the discrimination of the target species is accomplished by the specific amplification of a unique region. Recently, a commercial real-time PCR-based assay that identifies 11 major pathogen species or groups responsible for IMI and a gene coding for staphylococcal β-lactamase production was validated for its analytical specificity and sensitivity (Koskinen et al. 2009). Other authors have integrated the PCR assay, performing a postamplification hybridization of the target sequence to a very small number of oligonucleotides on membrane (Lin and Tsen, 1999) or glass arrays (Sergeev et al., 2004). Generally, all these assays have targeted a limited number of species (i.e., most of them focused on only 1 and never on more than 6 species) and have required careful optimization for finding the appropriate conditions to amplify the specific target, with subsequent increases in detection time and overall assay duration. These hindrances have been partially circum- Journal of Dairy Science Vol. 92 No. 7, 2009

9 UNIVERSAL ARRAY FOR PATHOGEN DETECTION 3035 Figure 3. The signal-to-noise ratios (SNR; top, black line) and the unadjusted P-values (bottom, gray line, logarithmic scale) of the serial dilutions of Mycoplasma bovis (ATCC 25523), Streptococcus pyogenes (ATCC 12344), Staphylococcus aureus (ATCC 19095), and Streptococcus agalactiae (ATCC BAA611D). The x-axis reports the quantity of PCR product (fmol). The threshold for significance (P < 0.01) is drawn as a dashed line. vented in most recent works, which have targeted 11 or 13 pathogenic species at a time by analyzing the 16S rrna gene (Wang et al., 2007) or designing the probes by means of comparative genomics (Kim et al., 2008), respectively. However, these assays have not been able to discriminate Streptococcus at the species level and have required probes on different genes for detecting E. coli and Salmonella (Wang et al., 2007) or have shown some degree of aspecificity in detecting Staphylococcus spp., requiring at least 10 h to complete and 500 ng of genomic DNA (Kim et al., 2008). In this work, the molecular procedure chosen to discriminate among the different pathogen groups was the LDR-UA approach. The PCR-LDR-UA assay, including DNA extraction, PCR amplification, LDR, and data analysis, can be completed in less than 7 h. This assay is based on the properties of DNA LDR and requires 2 specific probes for each target sequence, as first described by Gerry et al. (1999). The use of the LDR in addition to PCR greatly reduces the chances of false positives and results in high specificity, because the procedure is based on the discriminating power of the thermostable ligase that joins the ends of the 2 adjacent oligonucleotides designed for a given target [discriminating probe (DS) or common probe]. The 3 end of the DS lies on the base that discriminates the target from among the others, whereas its 5 end bears a fluorescent reporter. The common probe anneals immediately downstream of the discriminating position and has a phosphorylated 5 end. Only when there is Journal of Dairy Science Vol. 92 No. 7, 2009

10 3036 Cremonesi et al. Figure 4. Heat map of unadjusted P-values for ligation detection reaction experiments on 50 milk samples. The map (on the left) indicates the P-value of the one-sided t-tests. No multiple comparisons correction has been made. The white color indicates a P-value of or less, whereas the black color indicates a P-value of 0.05 or more. Threshold value of 0.01 is highlighted in light gray. Microbiological characterization and source of the samples are provided on the right. NAS = nonaureus staphylococci. a perfect match between the 2 oligonucleotides does a thermostable ligase join the 2 ends and can the ligation process take place; an artificial sequence (called the ZipCode; Chen et al., 2000), added at the 3 end, then drives the ligated product to a specific position on the UA, where it can be detected. Use of terminal 3 -specific DS to distinguish a species among the others maximizes the selectivity of the ligation process, and thus of the entire procedure. Another advantage of using LDR coupled with PCR in this context is the multiplexing capability of LDR. In the UA-based approach, the optimization of hybridization conditions Journal of Dairy Science Vol. 92 No. 7, 2009

11 UNIVERSAL ARRAY FOR PATHOGEN DETECTION 3037 for each probe set is not required because the melting temperatures of the ZipCodes are nearly the same (about 68 C). New probe pairs can be added to the array without further optimization, thus reducing costs and setup time (Castiglioni et al., 2004; Chessa et al., 2007). In the current UA layout, up to 48 different species can be targeted independently, but there are no particular hindrances in increasing this number to 100. Moreover, this method can be used to analyze 8 different milk samples simultaneously, reducing the cost of analysis. Proper design of probes is the key factor for successful oligonucleotide-based assays, relying either on hybridization or on enzymatic procedures. The 16S rrna is the most commonly used gene for designing oligonucleotide arrays because it has large conserved regions, useful for alignment and universal primer design and species-specific variations, capable of distinguishing genera and well-resolved species (Plays et al., 1997) while avoiding the use of several primer sets that might result into a poor amplification efficiency, and thus generating false-positive results (Wilson et al., 2002). The RDP II and BLAST were used in our study to achieve probes that showed optimal specificity and sensitivity for the target species. In this work, we aimed for precise discrimination of similar species based on 16S rrna pathogen-specific sequences. Our experimental data, analyzed by the typical P-value criterion on a t-distribution pattern (<0.01) to determine hybridization differences, confirmed the high specificity of the LDR assay. In most experiments, the calculated P-value was well below 0.01 (Table 1), indicating that our detection platform was statistically effective for discriminating between target pathogens. Significantly, the designed probes correctly differentiated the Streptococcus spp. causing mastitis: 1 probe was designed for the following species of the Streptococcus group (Strep. agalactiae, Strep. equi, Strep. dysgalactiae, Strep. uberis, Strep. parauberis, Strep. bovis, Strep. canis) to determine the prevalence of different species in dairy herds. The PCR-LDR-UA assay allowed for discrimination between Strep. uberis and Strep. parauberis on the basis of a single PCR on the 16S rrna gene, avoiding the use of species-specific PCR primers (Hassan et al., 2005). Probe pairs for Strep. equi and Strep. dysgalactiae were designed to identify any of the 2 subspecies of each. The discrimination of Strep. equi ssp. zooepidemicus from Str. equi ssp. equi and Str. dysgalactiae ssp. equisimilis from Str. dysgalactiae ssp. dysgalactiae is not possible with the present set of LDR probe pairs. A further refinement of the assay would be the design and testing of specific probe pairs for the zoonotically important subspecies. Moreover, the diagnosis of Mycoplasma spp., known to cause mastitis in cattle, sheep, and goats, is most commonly made by microbiological procedures, using different specific growth media and long incubation times (González and Wilson, 2003). Thus, isolation and identification of Mycoplasma spp. are notoriously difficult and timeconsuming. Strains of M. bovis, Mycoplasma agalactiae, Mycoplasma arginini, Mycoplasma capri, Mycoplasma capricolum, and Mycoplasma mycoides can be clearly identified as belonging to the Mycoplasma genus by the specific probe pair in the PCR-LDR-UA assay, providing an improved diagnostic method for this pathogen. In our study, we showed that carefully designed specific oligonucleotide probes can distinguish Salmonella spp. from E. coli and related species (Table 1), which share large stretches of homologous regions within the amplified segments of the 16S rrna gene. However, our probe pair, initially designed to be specific only for E. coli strains, was found in silico to recognize some Shigella strains as well, confirming that they cannot be distinguished on the basis of the 16S rrna gene, as stated by Wang et al. (2007). From the probe sensitivity experiments, the PCR- LDR-UA assay revealed a detection limit for Staph. aureus down to 12 fmol of PCR product, well below the limit imposed by the European Union [Commission Regulation (EC) No. 2073/2005; eu/joindex.do?ihmlang = en; i.e., 500 cfu/ml, corresponding to approximately 150 fmol]. The sensitivity for 3 other species (M. bovis, Strep. pyogenes, Strep. agalactiae) was even better, resulting in 6 fmol. Generally, the presence of typical endogenous microflora might mask the detection of foodborne pathogens. To determine the ability of the PCR-LDR-UA assay to identify pathogens in mixed bacterial populations, we prepared and analyzed artificially unbalanced DNA mixtures of different microbial species, obtaining efficient and highly specific detections (data not shown). Finally, this method was evaluated using 50 bovine, ovine, and caprine milk samples; all the obtained bacterial identifications were in agreement with the result of the microbiological methods; moreover, the presence of other pathogens was assessed. For example, sample number 30, which was found to be positive for Staph. aureus by microbiological test, also revealed the presence of Mycoplasma spp., Strep. uberis, and Strep. agalactiae by PCR-LDR-UA. In conclusion, we demonstrated the great potential of the DNA chip described herein for rapid, sensitive, and reliable identification, directly from milk samples, of the main pathogens known to cause mastitis in cattle, sheep, and goats or those responsible for foodborne intoxication or infection, or both. Further, the specific target spectra produced by this gene chip may be gradually expanded through addition of newly designed Journal of Dairy Science Vol. 92 No. 7, 2009

12 3038 Cremonesi et al. oligonucleotide probes into the microarray, without the optimization of hybridization conditions. In microbiological contexts, pathogen detection by means of DNA microarrays could be integrated into routine milk monitoring. The estimated unit cost is approximately 1 per sample, including the oligonucleotide probes and all the reagents. The initially high costs for the eventual setup of a microarray laboratory (because of the spotting robot and the laser scanner) can be amortized over the medium term (approximately 3 yr). Moreover, many facilities offer the spotting and scanning service for reasonable costs. The LDR-UA assay, offering the power of a microarray format, will be expected to facilitate the analyses of several pathogens at the same time in the near future. ACKNOWLEDGMENTS This work was supported by a grant from Regione Lombardia (contract no. 962 Safe Milk ). We are grateful to Giada Caredda (Institute of Biomedical Technologies Italian National Research Council, Segrate, Italy) and Stefania Chessa (Department of Veterinary Science and Technology for Food Safety, University of Milan, Milan, Italy) for their technical assistance. REFERENCES Allos, B. M Campylobacter jejuni infections: Update on emerging issues and trends. Clin. Infect. Dis. 32: Altschul, S. F., T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25: Bergonier, D., X. Berthelot, M. Romeo, A. Contreras, V. Coni, E. De Santis, S. Roselu, F. Barillet, G. Lagriffoul, and J. Marco Fréquence des différents germes responsables de mammites cliniques et subcliniques chez les petits ruminants laitiers. Pages in Milking and Milk Production of Dairy Sheep and Goats. F. Barillet and P. Zervas, ed. Wageningen Pers, Wageningen, the Netherlands. Blood, D. C., V. P. Studdert, and C. C. Gay Saunders Comprehensive Veterinary Dictionary. 3rd ed. Saunders Ltd./ Elsevier Inc., Amsterdam, the Netherlands. Bottero, M. T., A. Dalmasso, D. Soglia, S. Rosati, L. Decastelli, and T. Civera Development of a multiplex PCR assay for the identification of the pathogenic genes of Escherichia coli in milk and milk products. Mol. Cell. Probes 18: Brown, M. B., J. K. Shearer, and F. Elvinger Mycoplasmal mastitis in a dairy herd. J. Am. Vet. Med. Assoc. 196: Castiglioni, B., E. Rizzi, A. Frosini, K. Sivonen, P. Rajaniemi, A. Rantala, M. A. Mugnai, S. Ventura, A. Wilmotte, C. Boutte, S. Grubisic, P. Balthasart, C. Consolandi, R. Bordoni, A. Mezzelani, C. Battaglia, and G. De Bellis Development of a universal microarray based on the ligation detection reaction and 16S rrna gene polymorphism to target diversity of cyanobacteria. Appl. Environ. Microbiol. 70: Chen, J., M. A. Iannone, M.-S. Li, D. Taylor, P. Rivers, A. J. Nelsen, K. A. Slentz-Kesler, A. Roses, and M. P. Weiner A microsphere-based assay for multiplexed single nucleotide polymorphism analysis using single base chain extension. Genet. Res. 10: Chenna, R., H. Sugawara, T. Koike, R. Lopez, T. Gibson, D. G. Higgins, and J. D. Thompson Multiple sequence alignment with the Clustal series programs. Nucleic Acids Res. 31: Chessa, S., F. Chiatti, G. Cerotti, A. Caroli, C. Consolandi, G. Pagnacco, and B. Castiglioni Development of a single nucleotide polymorphism genotyping microarray platform for the identification of bovine milk protein genetic polymorphisms. J. Dairy Sci. 90: Consolandi, C., M. Severgnini, B. Castiglioni, R. Bordoni, A. Frosini, C. Battaglia, L. R. Bernardi, and G. De Bellis A structured chitosan-based platform for biomolecule attachment to solid surfaces: Application to DNA microarray preparation. Bioconjug. Chem. 17: Cremonesi, P., B. Castiglioni, G. Malferrari, I. Biunno, C. Vimercati, P. Moroni, S. Morandi, and M. Luzzana Technical Note: Improved method for rapid DNA extraction of mastitis pathogens directly from milk. J. Dairy Sci. 89: De Buyser, M.-L., B. Dufour, M. Maire, and V. Lafarge Implication of milk and milk products in food-borne diseases in France and in different industrialized countries. Int. J. Food Microbiol. 67:1 17. Devriese, L. A., J. Hommez, H. Laevens, B. Pot, P. Vandamme, and F. Haesebrouck Identification of aesculin-hydrolyzing streptococci, lactococci, aerococci and enterococci from subclinical intramammary infections in dairy cows. Vet. Microbiol. 70: Edwards, U., T. Rogall, H. Blocker, M. Emde, and E. C. Bottger Isolation and direct complete nucleotide determination of entire genes. Characterization of a gene coding for 16S ribosomal RNA. Nucleic Acids Res. 17: Fox, L. K., J. H. Kirk, and A. Britten Mycoplasma mastitis: A review of transmission and control. J. Vet. Med. B, Infect. Dis. Vet. Public Health 52: Gerry, N. P., N. E. Witowski, J. Day, R. P. Hammer, G. Barany, and F. Barany Universal DNA microarray method for multiplex detection of low abundance point mutations. J. Mol. Biol. 292: González, R. N., and D. J. Wilson Mycoplasmal mastitis in dairy herds. Vet. Clin. North Am. Food Anim. Pract. 19: Granum, P. E Bacillus cereus. Pages in Food Microbiology: Fundamentals and Frontiers. 2nd ed. M. P. Doyle, L. R. Beuchat, and T. J. Montville, ed. ASM Press, Washington, DC. Hassan, A. A., O. Akineden, and E. Usleber Identification of Streptococcus canis isolated from milk of dairy cows with subclinical mastitis. J. Clin. Microbiol. 43: Honkanen-Buzalski, T., V. Myllys, and S. Pyörälä Bovine clinical mastitis due to coagulase-negative staphylococci and their susceptibility to antimicrobials. J. Vet. Med. B 41: Hsu, S. C., and H. Y. Tsen PCR primers designed from malic acid dehydrogenase gene and their use for detection of Escherichia coli in water and milk samples. Int. J. Food Microbiol. 64:1 11. Jayarao, B. M., S. C. Donaldson, B. A. Straley, A. A. Sawant, N. V. Hegde, and J. L. Brown A survey of foodborne pathogens in bulk tank milk and raw milk consumption among farm families in Pennsylvania. J. Dairy Sci. 89: Jayarao, B. M., B. E. Gillespie, M. J. Lewis, H. H. Dowlen, and S. P. Oliver Epidemiology of Streptococcus uberis intramammary infections in a dairy herd. J. Vet. Med. B 46: Kim, H.-J., S.-H. Park, T.-H. Lee, B.-H. Nahm, Y.-R. Kim, and H.-Y. Kim Microarray detection of food-borne pathogens using specific probes prepared by comparative genomics. Biosens. Bioelectron. 24: Koskinen, M. T., J. Holopainen, S. Pyörälä, P. Bredbacka, A. Pitkälä, H. W. Barkema, R. Bexiga, J. Roberson, L. Sølverød, R. Piccinini, D. Kelton, H. Lehmusto, S. Niskala, and L. Salmikivi Analytical specificity and sensitivity of a real-time polymerase chain reaction assay for identification of bovine mastitis pathogens. J. Dairy Sci. 92: Lin, C. K., and H. Y. Tsen Comparison of the partial 16S rrna gene sequences and development of oligonucleotide probes Journal of Dairy Science Vol. 92 No. 7, 2009

13 UNIVERSAL ARRAY FOR PATHOGEN DETECTION 3039 for the detection of Escherichia coli cells in water and milk. Food Microbiol. 16: Little, C. L., J. R. Rhoades, S. K. Sagoo, J. Harris, M. Greenwood, V. Mithani, K. Grant, and J. McLauchlin Microbiological quality of retail cheeses made from raw, thermized or pasteurized milk in the UK. Food Microbiol. 25: Ludwig, W., O. Strunk, R. Westram, L. Richter, H. Meier, Yadhukumar, A. Buchner, T. Lai, S. Steppi, G. Jobb, W. Förster, I. Brettske, S. Gerber, A. W. Ginhart, O. Gross, S. Grumann, S. Hermann, R. Jost, A. König, T. Liss, R. Lüßmann, M. May, B. Nonhoff, B. Reichel, R. Strehlow, A. Stamatakis, N. Stuckmann, A. Vilbig, M. Lenke, T. Ludwig, A. Bode, and K. H. Schleifer ARB: A software environment for sequence data. Nucleic Acids Res. 32: National Mastitis Council Laboratory Handbook on Bovine Mastitis. Rev. ed. National Mastitis Council Inc., Madison, WI. Oviedo-Boyso, J., J. J. Valdez-Alarcón, M. Cajero-Juárez, A. Ochoa- Zarzosa, J. E. López-Meza, A. Bravo-Patiño, and V. M. Baizabal- Aguirre Innate immune response of bovine mammary gland to pathogenic bacteria responsible for mastitis. J. Infect. 54: Pengov, A Staphylococcus aureus Do we really have to live with it? Slov. Vet. Res. 43: Pingle, M. R., K. Granger, P. Feinberg, R. Shatsky, B. Sterling, M. Rundell, E. Spitzer, D. Larone, L. Golightly, and F. Barany Multiplexed identification of blood-borne bacterial pathogens by use of a novel 16S rrna gene PCR-ligase detection reactioncapillary electrophoresis assay. J. Clin. Microbiol. 45: Pisoni, G., R. N. Zadoks, C. Vimercati, C. Locatelli, M. Zanoni, and P. Moroni Epidemiological investigation of Streptococcus equi subspecies zooepidemicus involved in clinical mastitis in dairy goats. J. Dairy Sci. 92: Pitkala, A., M. Haveri, S. Pyorala, V. Myllys, and T. Honkanen- Buzalski Bovine mastitis in Finland 2001 Prevalence, distribution of bacteria, and antimicrobial resistance. J. Dairy Sci. 87: Palys, T., L. K. Nakamura, and F. M. Cohan Discovery and classification of ecological diversity in the bacterial world: The role of DNA sequence data. Int. J. Syst. Bacteriol. 47: Sergeev, N., M. Distler, S. Courtney, S. F. Al-Khaldi, D. Volokhov, V. Chizhikov, and A. Rasooly Multipathogen oligonucleotide microarray for environmental and biodefense applications. Biosens. Bioelectron. 20: Wang, X.-W., L. Zhang, L. Q. Jin, M. Jin, Z. Q. Shen, S. An, F. H. Chao, and J.-W. Li Development and application of an oligonucleotide microarray for the detection of food-borne bacterial pathogens. Appl. Microbiol. Biotechnol. 76: Watts, J. L., S. C. Nickerson, and J. W. Pankey A case study of Streptococcus group G infection in a dairy herd. Vet. Microbiol. 9: Wilson, W. J., C. L. Strout, T. Z. DeSantis, L. J. Stilwell, A. V. Carrano, and G. L. Andersen Sequence-specific identification of 18 pathogenic microorganisms using microarray technology. Mol. Cell. Probes 16: Wong, H. C., Y. L. Chen, and C. L. F. Chen Growth, germination and toxigenic activity of Bacillus cereus in milk products. J. Food Prot. 51: Journal of Dairy Science Vol. 92 No. 7, 2009

14 Nucleic Acids Research Advance Access published June 16, 2009 Nucleic Acids Research, 2009, 1 15 doi: /nar/gkp499 ORMA: a tool for identification of species-specific variations in 16S rrna gene and oligonucleotides design Marco Severgnini 1, *, Paola Cremonesi 2, Clarissa Consolandi 1, Giada Caredda 1, Gianluca De Bellis 1 and Bianca Castiglioni 2 1 Institute of Biomedical Technologies, Italian National Research Council, Via Fratelli Cervi 93, Segrate and 2 Institute of Agricultural Biology and Biotechnology, Italian National Research Council, Via Bassini 15, Milan, Italy Received February 3, 2009; Revised April 30, 2009; Accepted May 24, 2009 ABSTRACT 16S rrna gene is one of the preferred targets for resolving species phylogenesis issues in microbiological-related contexts. However, the identification of single-nucleotide variations capable of distinguishing a sequence among a set of homologous ones can be problematic. Here we present ORMA (Oligonucleotide Retrieving for Molecular Applications), a set of scripts for discriminating positions search and for performing the selection of high-quality oligonucleotide probes to be used in molecular applications. Two assays based on Ligase Detection Reaction (LDR) are presented. First, a new set of probe pairs on cyanobacteria 16S rrna sequences of 18 different species was compared to that of a previous study. Then, a set of LDR probe pairs for the discrimination of 13 pathogens contaminating bovine milk was evaluated. The software determined more than 100 candidate probe pairs per dataset, from more than S rrna sequences, in less than 5 min. Results demonstrated how ORMA improved the performance of the LDR assay on cyanobacteria, correctly identifying 12 out of 14 samples, and allowed the perfect discrimination among the 13 milk pathogenic-related species. ORMA represents a significant improvement from other contexts where enzyme-based techniques have been employed on already known mutations of a single base or on entire subsequences. INTRODUCTION During the last decades, different nucleic-acid-based detection techniques have been developed in order to employ identification based on single-nucleotide variations in both genotyping and detection experiments on a multiplicity of targets. These techniques allowed distinguishing alleles and correctly assessing the genotype at the singlebase level. In particular, 16S rrna gene sequences have been used to resolve bacterial phylogeny and taxonomy issues in different contexts. The DNA sequence coding for the small ribosomal subunit has been by far the most common genetic marker employed by the scientific community, because of: (i) its presence in almost all bacteria, often existing as a multi-gene family, or operons; (ii) the function of the 16S rrna gene has not changed over time, suggesting that random sequence changes are an accurate measure of time (evolution); and (iii) the 16S rrna gene (more than 1500 bp) is large enough for informatics purposes (1) with large stretches of conserved regions and few different loci. DNA microarrays represent one of the most popular platforms in molecular technologies, allowing a highthroughput format for the parallel detection of 16S rrna genes from environmental samples (2). DNA chips have been developed as a preferred device for the identification of different microorganisms based on 16S gene sequences. The multiplicity of species which can be arrayed on a single-dna chip allows a high multiplexing capability, with the possibility of identifying many different targets at one time (3). Single-base variations by microarray analysis can be detected by differential hybridization techniques using allele-specific oligonucleotide probes (4), or by enzyme-mediated detection methods (5). One of the most critical points of the molecular recognition procedures is the design of the specific probes needed to perform the entire analysis. In genotyping experiments, this is accomplished on the basis of the already-known information about each single-base variation. In detection experiments, on the other hand, in order to explore whether a certain target sequence is present in a DNA sample or not, the main problem is searching *To whom correspondence should be addressed. Tel: ; Fax: ; marco.severgnini@itb.cnr.it ß 2009 The Author(s) This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License ( by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

15 2 Nucleic Acids Research, 2009 for a priori not yet identified specific positions that can discriminate exactly between one target and another. In hybridization-based techniques, mutations are identified on the basis of the higher thermal stability of the perfectly-matched probes as compared to mismatched probes. Although this has been the most frequently applied technique, it is characterized by many hindrances which make hybridization-based strategy function poorly in high-complexity biological samples. Therefore, for analytical and diagnostic purposes, hybridization is generally combined with some other selection or enrichment procedures. Enzyme-mediated ligation methods, on the other hand, rely on interrogation of a mutation by a couple of oligonucleotides annealing immediately adjacent to each other on a target DNA, with one of the probes having its 3 0 -end complementary to the point mutation. In this case, the search is for a single base that characterizes a species against all the others in a group of interest. The presence of a point mutation is assessed by the ligation of the two adjacent oligonucleotides, which occurs only when both are correctly base-paired (6). The Ligation Detection Reaction (LDR) (7), for instance, represents a reliable technique for identifying one or more sequences differing by one or more single-base changes, insertions, deletions, or translocations in a plurality of targetnucleotide sequences. This enzymatic in vitro reaction is based on the design of two oligonucleotide probes for each target sequence: a probe specific for the variation (called Discriminating Probe, or DS), which is 5 0 -fluorescently labeled, and a 5 0 -phosphorylated Common Probe (or CP), starting one base 3 0 -downstream of the DS. The previously polymerase chain reaction (PCR)-amplified sample, the oligonucleotide probe pairs and a thermostable DNA ligase are blended to form a mixture: the two probes hybridize consecutively along the template and the DNA ligase joins their ends only in the case of a perfect match. This reaction is cycled to increase product yield. The PCR LDR approach, usually, is associated to the hybridization onto a Universal Array (UA), where a set of artificial sequences, called Zip-codes are arranged (7). This entire approach was proven to be rapid, flexible and easily adaptable from one target to another, useful, for example, in environmental monitoring (8,9), forensics (10) and the food industry (11,12). Here we present Oligonucleotide Retrieving for Molecular Applications (ORMA), a series of integrated scripts in Matlab, which performs an accurate search of all the positions able to specifically discriminate one species among homologous ones, based on the 16S rrna gene sequence. ORMA also performs an accurate selection of high-quality oligonucleotide probes to be used in molecular applications. Automated and computerbased methods can be very useful for performing accurately and rapidly all the requested operations, through the many steps between the original, complete, set of sequences and the final list of application-oriented probes. The problem of designing specific oligonucleotide probes for the identification of target species has already been addressed by a certain number of software (13 16). At present, there is no preferential reference strategy for designing microarrays for species identification based on 16S rrna sequences: many authors rely on academic software (17,18), others develop their own scripts (19,20). Among the currently available academic software, ARB (21) and PRIMROSE (22) are very diffused, both being tools implemented specifically on 16S rrna, structured for interacting with and retrieving sequences from specific databases and operating a probe design on the basis of the phylogenesis of the species under analysis. Also, some commercial software, like Oligo 7 (Molecular Biology Insights, Cascade, CO, USA) (23) or AlleleID (Premier Biosoft, Palo Alto, CA, USA) (24) have been applied for probe design in a pathogen characterization experiment (25). In this article ORMA was used for determining sets of LDR probe pairs in microbiological-related contexts (water safety and food safety applications, respectively). The approach was evaluated and validated using the probe pairs derived from ORMA-determined discriminating positions on a set of cyanobacteria 16S rrna sequences belonging to 18 different species; the results were compared to those of a previously published study (8). Secondly, a set of LDR probe pairs for the discrimination of 13 mastitis- or intoxication-related pathogens species in bovine milk was designed and experimentally evaluated. The tool, although here applied on 16S rrna, can be used on any set of highly correlated sequences. MATERIALS AND METHODS Algorithm ORMA scripts were developed under Matlab 6.1 (Mathworks, Natick, MA, USA) environment (Release 12.1). No additional toolboxes are required. All statistical analyses and representations were made by the same software. Probe designs and simulations were run onto a hp Workstation xw4100, with a Dual-core 3.2 GHz. Intel Processor and 2.5 GB RAM. ORMA functions and m-code are available for free upon request. Overall structure. ORMA overall structure is tree-like, with a main function that, sequentially, recalls all the side scripts needed to perform each requested operation. The software also comprises a series of scripts for retrieving oligonucleotide sequences, quality-check them and design probes for different applications, such as Ligase Detection Reaction (LDR) or Minisequencing/Primer Extension probes. The overall procedure is accomplished in four main steps (Figure 1, Supplementary Figure 1): (i) sequence importing and processing; (ii) discriminating positions finding; (iii) designing of the candidate probes, starting from the positions found and (iv) ranking (i.e. assignment of a quality score to each) and exporting of the candidates (in tabular format). (i) Sequence import and processing. The search for discriminating positions on 16S rrna starts from the import of a set of already-aligned sequences (which can be optionally used for the creation of consensus sequences, grouping them in homogeneous clusters, before being used for the discriminating position search algorithm).

16 Nucleic Acids Research, Figure 1. Block diagram representing the steps through which ORMA works. The four steps described in the main text are highlighted in gray: (I) Sequence importing and consensus creation; (II) Search of the discriminating positions by SBS algorithm; (III) Retrieval of the candidate sequences from the found positions. The actual design depends on the molecular application chosen; (IV) Quality filtering and ranking of the candidate probes. On the right, in boxes, example screenshots (probe pair design on cyanobacteria dataset) are given for each step. Steps (II) and (III) are indistinguishable in ORMA output and have been represented together. Please note that for visualization purposes only a part of the total 18 sequences are represented. Standard multiple-alignment formats (Clustal-like, Multi Sequence Files, or aligned FASTA format) can be used. A careful check of multiple alignment scores should be made, in order to avoid designs on sequence datasets of distantly related species, which can occur in base misalignments. The scripts also include a procedure for consensus determination from a set of user-defined sequences, according to four different rules: (a) majority rule, in which the consensus base is the most frequently present in the aligned sequences and no degenerated bases are used. In case of equal occurrences, N s are used in the consensus; (b) threshold rule simple, which assigns a specific base to the corresponding position in the consensus only if its frequency is above a given threshold. Different thresholds can be set for gaps and bases. Degenerated bases are not used and are substituted by N s in the consensus; (c) threshold rule complex, which comprises also degenerated bases. The algorithm is the same as point (b) option, but requires a threshold for substituting positions with multiple bases above the threshold with the corresponding IUPAC code degenerated base and (d) ARB-like algorithm, with separate thresholds for gaps and bases. All the bases above the given threshold are used to compute eventual degenerated bases. For each of these four options, consensus score accuracy is calculated, as the percentage of original sequences that carried the same base as the consensus in each position.

17 4 Nucleic Acids Research, 2009 (ii iii) Design of candidate probes. We have implemented a Single Base Seeker (SBS) algorithm, for the determination of positions able to discriminate one sequence among a set of homologous ones. The discriminating position finding procedure can be summarized as follows in four basic steps: (a) Choice of a user-defined subset of sequences of the dataset (indicated as the positive set ). The remaining sequences are used as a group of the discriminating positions must be different from; these are addressed, in the present article, as the negative set. Positive and Negative sets differ for the fact that every consensus in the positive set group will be subjected to probe design, whereas those of the negative set will not; (b) For each sequence, determination of a list of the positions of nondegenerated bases; (c) For each position on point b, calculation of a score as the sum of all the sequences carrying the same base as the considered sequence, in the same position. If the only sequence carrying the base is the tested one, the position is set as discriminating and (d) Re-calculation of the score on point c, substituting to each (eventual) degenerated base its two or three alternatives (an N automatically flags the position as nondiscriminating). ORMA, then, retrieves the sequences flanking each of the putative discriminating positions. Actual oligonucleotide design is dependent on the molecular application chosen. The maximum length and the thermodynamic model for calculation of the parameters of the probes can be specified by the user. For the LDR experiments here described, two oligonucleotide probes are designed, one upstream (Discriminating Probe, DS, comprising the discriminating position) and one downstream (Common Probe, CP) of each position. (iv) Discriminating position related filters and scores. The putative discriminating positions and related candidate probes are subjected to a series of constraints and quality filters. The software keeps track of all the designed candidates, assigning a quality score, depending on how many filters they pass. The current options of the script on the discriminant base are: (a) limiting the range of positions, in order to exclude candidates insisting on positions too close to the 5 0 -or3 0 -end of the sequences, where, usually, the majority of errors in the alignment or characterization of the sequences occur and (b) testing the presence of other species with probes insisting on the same position, thus excluding eventual interactions between a single CP and multiple DS, with subsequent non-specificity. The candidate probes can also be filtered and ranked according to their thermodynamic properties (length, melting temperature, number of degenerated bases, low complexity regions), evidencing the candidates having a certain length, a melting temperature comprised in a user-specified range, having no more than the inputted number of degenerated bases (which can be a real issue for the oligonucleotide specificity), having short homopolymeric regions and not comprising short tandem repeats. Then, ORMA calculates some specific statistics for the qualitative evaluation of the candidates designed on consensus sequences, compared to the original dataset (i.e. the subset of sequences from which every consensus is built): (a) the intra-group score, as the number of initial sequences having the same discriminating base as the consensus and (b) the inter-group score, as the number of sequences other than those used for that consensus having the same discriminating base as the candidate one. This latter score is calculated only when the consensus were created inside ORMA, starting from a single-global alignment. These scores allow the choice of probes that best discriminate between the target and the non-target sequences (i.e. having the highest intra-group and the lowest inter-group score). The software output can be exported as a comma-separated spreadsheet reporting: (a) the list of all the discriminating bases, grouped per species, with absolute (referring to the global alignment) and relative (referring to the specific consensus) positions of the discriminating base, and the base distributions of all the other consensus sequences in the same position; (b) the thermodynamic parameters of the candidate probe pairs, including the T m, the length of DS and CP probes and the number of degenerated bases in each and (c) the qualitative filtering and the specificity-related scores, including the sequence score, as the average of the consensus scores along all the bases constituting the DS and CP, with penalties for degenerated bases. Experimental data Cyanobacteria dataset experiment. The complete cyanobacteria 16S rrna data set comprised a total of 352 sequences, which were organized by phylogenetical similarity and grouped in a total of 18 clusters, as described in (8). Multiple alignments of all the sequences was performed by ClustalW (26) and the resulting file was imported into ORMA, where 18 consensus, one per cluster, were built. Consensus sequences were determined following the ARB-like algorithm (as described in Materials and Methods section and in Supplementary Methods), setting 50% as the threshold for gaps and 40% as the threshold for other bases. Melting temperature calculations followed the salt-adjusted method, with 50 mm Na + and 0% formamide. Candidate probe pairs were filtered on the basis of their length (minimum 25 nt, maximum 60 nt per probe), melting temperature (63 688C) and number of degenerated bases (maximum 4), on both DS and CP. The best probe pairs for all the species were selected, according to their best intra- and inter-group scores. We required that no less than 80% of the sequences constituting each of the 18 clusters carried the same base as the consensus in the candidate discriminating position (intra-group score). When only one candidate was designed or the intra-group score of the best candidate was below 80%, we still picked that candidate for further evaluations. On the other hand, the inter-group score was set to be below a 2% threshold, with the same exceptions as above. The Unicyano probe, which allowed the identification of any of the species in the study, was the one proposed by Castiglioni et al., with minor refinements for adjusting its melting temperature. At first, the LDR mix made by all probe pairs (250 fmol/ml each probe) was tested on specific synthetic templates (perfectly complementary to each probe pair) to assess the

18 Nucleic Acids Research, feasibility of the LDR procedure with the ORMAdesigned probe pairs. Then, a total of 14 DNA samples, corresponding to 13 cyanobacteria species (kindly provided by MIDI_CHIP project partners, (Table 1), were tested in duplicate, independent, LDR experiments, with both ORMA and Castiglioni et al. probe pairs. Milk-pathogen dataset experiment. Milk pathogensrelated 16S sequences were retrieved from RDP- Ribosomal Database Project II (release 9.51, rdp.cme.msu.edu/) (27) for a total of 738 sequences and divided into 13 subgroups, according to their phylogenetic classification. Only sequences of length >1200 bp and flagged as of good quality were retrieved. Each subgroup was aligned independently in ClustalW, since the overall number of 16S sequences was >500 (above the maximum limit of the alignment tool) and imported into ORMA. The consensus sequence for each group was calculated with the same parameters specified for the cyanobacteria data set. Then, a new multiple-alignment step was performed before proceeding to actual probe design. One probe pair for each of the main six subspecies of the Streptococcus group (Streptococcus agalactiae, S. bovis, S. equi, S. canis, S. dysgalactiae S. uberis) was designed; moreover, the Staphylococcus aureus probe pair was designed independently from all the remaining coagulase negative Staphylococci (grouped in the Staphylococcus, no aureus probe), because of its relationship with outbreaks of mastitis in dairy ruminants (28) and with major health issues, like food-related intoxications (29). In order to have the best homogeneity among the species within each group, the design was actually performed in three rounds: (a) Salmonella spp. was aligned against Escherichia coli and related spp. consensus sequence only; (b) S. canis was aligned against Streptococcus group sequences only; (c) All the remaining positions were selected considering the alignment of all other subspecies. One probe pair per species was designed, except for Campylobacter spp. for which two probe pairs were evaluated in terms of reproducibility and specificity. The thermodynamic parameters were the same described for the cyanobacteria data set, except for the melting temperature, which was required to be in the range C. The inter-group score of the candidates was required to be above a threshold of 80%, as in the cyanobacteria dataset. Probe pair specificity was checked by both RDP II database and BLAST (Basic Local Alignment Search Tool, blast/blast.cgi) (30) analysis, carefully examining the 3 0 -region of the discriminating probe, in order to exclude any interaction between probe pairs targeting different species. LDR probe pairs were mixed at a final concentration of 1 pmol/ml and tested on 13 DNAs from ATCC reference strains (LGC Promochem, Middlesex, UK) and bacterial collections (Supplementary Table 1). Genomic DNA was extracted following the protocol described by (31), PCR amplified and analyzed in duplicate, by separated LDR reactions. PCR and LDR/Universal Array approach. Complete experimental procedures concerning the amplification of 16S rrna sequences (including primers and thermal cycling), LDR mixes, Universal Arrays preparation and hybridization are reported in Supplementary Data. Data analysis. All arrays were scanned with ScanArray 5000 scanner (Perkin Elmer Life Sciences, Boston, MA, USA), at 10 mm resolution, with different acquisition parameters on both laser power and photo-multiplier gain, in order to avoid saturation. Intensities of fluorescence (IF) were quantitated by ScanArray Express 3.0 software, using the Adaptive circle option, letting diameters vary from 60 to 300 mm. No normalization procedures on the IFs were performed. To assess whether a probe pair was significantly above the background (i.e. was present or not), we performed a one-sided t-test (a = 0.01). At the same time, also the type II error was calculated and 1-b used as the estimate of the power of the statistical test. The null distribution was set as the population of Blank spots (e.g. with no oligonucleotide spotted, n = 6) IFs. Two times the standard deviation of pixel intensities of the same spots was added to obtain a conservative estimate. For each Zip-code, we considered the population of the IFs of all the replicates (n = 4) and tested it for being significantly above the nulldistribution (H 0 : m test = m null ;H 1 : m test > m null ). Signal-to-noise ratios, SNR p and SNR np were calculated, for each present and non-present probe pairs, respectively, indicating the ratio between the mean IF of each probe pair and the mean Blank IF, divided on the probe-type. RESULTS AND DISCUSSION Searching, designing and selecting oligonucleotide probes for molecular applications experiments on sets of highly similar sequences, such as the 16S rrna, is a non-trivial procedure, which involves many complex and timeconsuming steps. In this article, this procedure was accomplished by the use of ORMA, an integrated architecture of Matlab scripts. The 16S rrna, a gene sequence of more than 1500 bp, is the preferred genomic target for analyses in the microbiological field (17 20). It should be noted that 16S region is commonly used in taxonomical classifications involving in silico alignment and procedures for its two basic properties: (i) 16S presents highly conserved regions which can be used to correctly align all the sequences in the database; (ii) on the other side, 16S presents highly polymorphic regions that can be used in clusterization, phylogenetic tree construction and molecular discrimination of microbiological families even very close one to each other (32). Use of an automated method for discriminating positions determination, probe retrieval and filtering has obvious and evident advantages over the manual design, often used in previously published papers (8,33 35). These advantages become more significant with increasing dimension of the databases and of the sequences length. ORMA can perform all these operations with user-specified parameters in an automated way and calculates a series of

19 6 Nucleic Acids Research, 2009 Table 1. Cyanobacterial samples and related LDR results for Castiglioni et al. probes and ORMA-designed ones Group Sample ID Strain/Clone name Geographic origin Sequencing classification (score) c LDR results b Castiglioni et al. probes ORMA probes Calothrix 1 Calothrix sp.strain PCC 7714 Small pool, Aldabra Atoll, India Specific (2/2) Specific (2/2) Cylindrospermopsis 2 Cylindrospermopsis 1LT32S01 a Trasimeno Lake, Italy Specific (2/2) Specific (2/2) Cylindrospermum 3 Cylindrospermum stagnale PCC 7417 Soil, greenhouse, Stockholm, Sweden Aspecific (2/2) Specific (2/2) Halotolerans 4 Cyanothece sp.strain PCC 7418 Solar Lake, Israel Aspecific (1/2), Specific (2/2) Specific (1/2) Leptolyngbya 5 Leptolyngbya sp.strain 0BB 30S02 Bubano Basin, Imola, Italy Specific (2/2) Specific (2/2) Microcystis 6 Microcystis aeruginosa PCC 9354 Little Rideau Lake, Ontario, Canada Specific (1/2) Specific (2/2) No signal (1/2) Nodularia 7 Nodularia 3SD7S01 a Svalbard Islands, Norway Aspecific (2/2) Specific (2/2) Nostoc 8 Nostoc sp.strain PCC 7107 Shallow pond, Point Reyes, CA, USA Nostoc (100%) Aspecific (2/2) Non-specific (0/2) 9 Nostoc sp.strain PCC 8114 Water bloom, Lake Hepet.on, Morris Co, NJ, USA Planktothrix 10 Planktothrix sp.strain 2 Lake Markusbo lefja rden, Åland Islands, Finland Prochlorococcus+ Synechococcus Cylindrospermum (58%) Non-specific (0/2) Non-specific (0/2) Specific (2/2) Specific (2/2) 11 Prochlorococcus marinus PCC 9511 Mediterranean Sea Specific (2/2) Specific (2/2) 12 Synechococcus sp.strain Hegewald Lake Kuusja rvi, Saukkolahti, Finland Aspecific (1/2), Specific (1/2) Specific (2/2) Spirulina 13 Spirulina major PCC 6313 Brackish water, Berkeley, CA, USA Aspecific (2/2) Specific (2/2) Synechocystis 14 Synechocystis sp.strain PCC 7008 Shallow pond, Point Reyes, CA, USA Non-specific (0/2) Specific (2/2) Where sequencing has been performed, the result of the classification is also reported. Sample ID refers to the numbers used in Figure 2. a Clonal DNA from environmental sample. b Specific indicates that only the probe corresponding to the species was present; non-specific means that no probe was present (except for the universal cyanobacteria probe); aspecific means that the species-specific probe was present, but also other probes showed an IF significantly above background signal. The number of replicates is reported within brackets. c According to RDP II database, release 9.60.

20 Nucleic Acids Research, qualitative parameters which help in the choice of candidate probes that best discriminate between the sequences of the positive and those of the negative set. The general idea of these scores is to distinguish the sequences/groups which are of interest in a given experiment from those who aren t and that can potentially have a cross-contamination with the positive set, because they could be amplified by PCR, contributing to the molecular complexity of the sample. In this article, performances of ORMA were evaluated by considering the experimental evidences coming from the design of LDR probe pairs on two different 16S rrna datasets. First, a new set of cyano-specific probe pairs was designed and compared to the original one (8), generated on the same database of sequences. Then, the tool was used to setup LDR probe pairs for the identification of pathogenic species present in bovine milk. Cyanobacteria dataset Species-specific probe pairs were designed in a single round, starting from the whole dataset of 352 ClustalWaligned cyanobacteria 16S rrna sequences, imported, converted and grouped into 18 group-specific consensus sequences by ORMA. Calculated consensus sequences were highly similar, (ClustalW score = , n = 18), had a high consensus score (average score , n = 352) and a very low content of degenerated bases (average < 2%, max = 6%). ORMA identified a total of 192 candidate probe pairs for the 18 species, with an overall duration of the whole procedure of less than 5 min (Table 2). More tests on speed performances of the SBS algorithm on simulated data available as Supplementary Data and Supplementary Figure 2. One probe pair per species was chosen, according to its ranking after ORMA filtering steps. The probe pair for Anabaena + Aphanizomenon group was flagged as inadequate by ORMA, having six degenerated bases in the CP, which could negatively influence its thermodynamical properties. However, this probe pair insisted on the only discriminating position found for that cluster. The mix containing all probe pairs was tested on the corresponding synthetic templates and, as expected, all except Anabaena+Aphanizomenon gave positive results. Duplicate LDR experiments on 18 probe pairs (17 species-specific + 1 universal) were carried out on 14 16S rrna PCR products. We performed side-by-side tests of the same DNA samples by the two probe pairs datasets, ORMA and the one described in Castiglioni et al., comparing their performances and specificity. Probe pairs used in Castiglioni et al. identified correctly (P < 0.005, average beta power of the test: 0.85) 6 out of 14 analyzed DNAs (in both duplicate LDR), whereas other two completely failed. Six other DNAs somehow showed a degree of aspecificity (i.e. the correct probe pair was present, but non-specific probe pairs were also called present) (Table 1, Figure 2). Cyanobacteria universal probe pair was called as statistically over the background in all the experiments. Evaluations on ratio of signal intensities suggested that hybridizations went well and were not responsible for the aspecificity. In fact, excluding non-specific signals, SNR np had an average value of and SNR p varied between 10 and 680, with an average of about 149 (data not shown). The Anabaena + Aphanizomenon probe pair of Castiglioni et al. study resulted specific on both synthetic and environmental samples (data not shown). This probe pair, however, was designed with its DS insisting on a position which did not discriminate univocally the Anabaena + Aphanizomenon consensus from the consensuses of the other species. Thus, it would never be identified by ORMA as discriminating (because of the way the algorithm is built). Instead, the presence of some internal mismatches (especially the one on the second base before the 3 0 -end of the DS) is probably the reason for this finding. In fact, the mismatch gives instability to the 3 0 -end of the DS when annealing on the 16S rrna sequences of species other than those of Anabaena + Aphanizomenon cluster, impeding the ligase to join the two adjacent end of the DS and CP oligonucleotides. ORMA designed probe pairs have been capable of correctly identifying (P < 0.005, average beta power of the test: 0.85) 12 out of 14 analyzed cyanobacteria samples, on both replicates. Also in this experimental set, the cyanobacteria universal probe pair was called as statistically over the background in all the experiments (as expected, since this probe pair and the ones used in Castiglioni et al. coincided). Performances of the LDR procedure, in terms of signal-to-noise ratios were comparable to those obtained with the Castiglioni et al. probe set, having a SNR np of and a SNR p ranging from 7 to 387 (average 131) (data not shown), indicating a certain variability. In this case, we had no signs of aspecificity in the experiments (Figure 2), even in those cases which were critical with Castiglioni et al. probe pairs. In fact, probes were chosen in order to maximize the intra-group similarity (i.e. having the maximum number of sequences in the positive set carrying the discriminating base) and minimize the possibility of an inter-group cross-talk (i.e. having a minimum number of sequences in the negative set carrying the discriminating base) (Figure 3). The average of intra-group scores of the candidates was 95.1% 10.1% (n = 17), varying in the range %. The minimum value was that of the cluster of Gleotheceae, in which we had only five sequences, whereas 13 out of the 17 clusters were characterized by a score of 100%. Intergroup scores, on the other hand, were always very low, with an average of 0.4% 1% (n = 17). Thus, where ORMA probe pairs failed, we had a false negative call (with the cyanobacteria universal probe pair called as present ), but not a false positive. Experiments on the two Nostoc DNAs gave no results on the species-specific probe pair; anyway the presence of a cyanobacterial DNA was correctly assessed by the Universal probe pair. Sequencing of the two products revealed that one of them has been correctly classified by microbiological methods, whereas the other DNA was very uncertain and classified as cylindrospermum (58% confidence) by RDP Classifier tool, release [On 22 May 2008, RDP II database for cyanobacteria (release 9.61) underwent a major change in hierarchical classification of the species. The taxonomies here presented refer to older versions, which at present, can be found within genus GpI of

Vedere altro