Question: Repetitive motifs on Illumina 450k derived sequences
Asked 3.3 years ago by Gustavo Fernández Bayón

Hi everybody,

(First of all, I should mention that I am cross-posting this question here two days after posting it at Biostars. That site seems to be quite inactive, at least with respect to the subject in question, and I thought it could be a good idea to share this with the R/Bioc community.)

As part of some recent analyses using Illumina 450k DNA methylation microarrays, I have been running the MEME suite to find significant motifs in subsets of Differentially Methylated Probes (DMPs). The problem is that I get strange results: the same motifs come out again and again.

Using the FDb.InfiniumMethylation.hg19 and BSgenome.Hsapiens.UCSC.hg19 R/Bioconductor packages, I generate 200 bp DNA sequences centered on the probes being processed and save them as FASTA files. I then feed these files to MEME and wait for the results.
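To make the windowing step concrete, here is a minimal Python sketch of the same idea. It is not the R/Bioconductor code I actually run. The function names, the 0-based half-open coordinate convention, and the choice to skip windows that run off the end of a sequence are all assumptions made for illustration.

```python
def centered_window(center, width=200):
    """Return a 0-based, half-open (start, end) interval of `width` bp
    placed around `center` (assumed convention, for illustration only)."""
    start = center - width // 2
    return start, start + width


def extract_window(chrom_seq, center, width=200):
    """Slice the window out of a chromosome string.

    Returns None when the window would extend past either end, so that
    truncated sequences are never passed on to the motif finder.
    """
    start, end = centered_window(center, width)
    if start < 0 or end > len(chrom_seq):
        return None
    return chrom_seq[start:end]


def to_fasta(records, line_width=60):
    """Format (name, sequence) pairs as FASTA text."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        for i in range(0, len(seq), line_width):
            lines.append(seq[i:i + line_width])
    return "\n".join(lines) + "\n"
```

With this convention every emitted sequence has exactly the same length, which makes it easier to compare the real DMP subsets against the random control subsets.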

Some motifs appeared for every subset we tested. Specifically, the most common motifs were runs of a single nucleotide (polyA, polyC, polyG, polyT). This raised some suspicions, so we ran the same motif-finding procedure on two control subsets of 300 and 1000 randomly chosen 450k probes. The same motifs appeared again.
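One way to check, independently of MEME, whether single-nucleotide runs really are common around the probes is to count them directly in the extracted sequences. The following is a rough Python sketch. The minimum run length of 8 is an arbitrary threshold chosen for illustration, and the function names are hypothetical.

```python
import re


def homopolymer_runs(seq, min_len=8):
    """Return (base, start, length) for every run of a single nucleotide
    that is at least `min_len` long. `min_len` is an arbitrary threshold."""
    # ([ACGT]) captures one base; \1{n,} requires at least n more copies of it.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [
        (m.group(1), m.start(), m.end() - m.start())
        for m in pattern.finditer(seq.upper())
    ]


def fraction_with_runs(seqs, min_len=8):
    """Fraction of sequences that contain at least one such run."""
    seqs = list(seqs)
    if not seqs:
        return 0.0
    hits = sum(1 for s in seqs if homopolymer_runs(s, min_len))
    return hits / len(seqs)
```

If the fraction is similar in the DMP subsets and in the random probe subsets, that would suggest the polyN motifs reflect the general sequence context of the array rather than anything specific to the differentially methylated probes.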

So it seems that those motifs are somehow present around the 450k probes. Is this a consequence of the probe design? I am also wondering whether the MEME parameters could be behind these results. I am currently running with the following options:

meme {input.fasta} -dna -nmotifs 10 -evt 0.01 -maxw 50 -maxsize 10000000

In particular, I wonder whether the distribution of nucleotides in the vicinity of 450k probes fails to meet the statistical assumptions of the MEME algorithm.
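A simple first check of this hypothesis is to look at the overall nucleotide composition of the input sequences and see how far it departs from uniform. The Python sketch below computes zero-order (single-nucleotide) frequencies only. It is an illustration under that simplifying assumption, not a reimplementation of anything MEME does internally, and the function name is hypothetical.

```python
from collections import Counter


def background_frequencies(seqs, alphabet="ACGT"):
    """Zero-order nucleotide frequencies pooled over all sequences.

    Characters outside `alphabet` (for example N) are ignored.
    """
    counts = Counter()
    for seq in seqs:
        counts.update(base for base in seq.upper() if base in alphabet)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no nucleotides from the alphabet were found")
    return {base: counts[base] / total for base in alphabet}
```

As far as I know, MEME can also be given an explicit background model through its -bfile option, and the MEME suite ships a fasta-get-markov utility for building one. If so, a background estimated from a large random set of 450k probe windows might be a fairer null model than the default.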

Has anybody here experienced a similar problem? Any help or hint would be much, much appreciated.

EDIT: I am including a capture of MEME's output to show what the motifs look like:


