We report and model a previously undescribed systematic error causing spurious excess correlations that depend on the distance between probes on Affymetrix® microarrays. The phenomenon affects pairs of features with large chip separations, up to over 100 probes apart. The effect may have a significant impact on analysis of correlations in large collections of expression data, where the systematic experimental errors are repeated in many data sets. Examples of such studies include analysis of functions and interactions in groups of genes, as well as global properties of genomes. We find that the average correlations between probes on Affymetrix microarrays are larger for smaller chip distances, which points out to a previously undescribed positional artifact. The magnitude of the artifact depends on the design of the chip, and we find it to be especially high for the yeast S98 microarray, where spurious excess correlations reach 0.1 at a distance of 50 probes. We have designed an algorithm to correct this bias and provide new data sets with the corrected expression values. This algorithm was successfully implemented to remove the positional artifact from the S98 chip data while preserving the integrity of the data.
ASJC Scopus subject areas