Lucia Peixoto
Dear All,
I have a dataset with two conditions: 9 replicates per group for microarrays, 5 per group for RNASeq (a subset of the RNA samples used in the microarrays; I couldn't sequence all 9), and 8 per group for qPCR (an independent set of experiments).
Each n is an independent mouse, on an independent day, from an independent experiment, so that one experiment will yield n=1 for each of the groups. The correlation between control and treatment within the same day is not better than across days, however.
Theoretically they all measure the same biological phenomenon, which is gene expression changes, so I have been doing some comparisons between them to try to get at the truth of what is really being differentially expressed. In particular I have focused on the 5 samples in each of the three groups in which the only difference is whether the RNA was hybridized to a microarray or sequenced.
To my surprise, the gene list obtained from analyzing differential expression using RNASeq (using either edgeR or DESeq) is considerably smaller than the one obtained from microarray analysis (using locfdr on pairwise t-statistics) at the same FDR. The RNASeq list is included in the microarray list, but there are several differences I have validated by qPCR that the RNASeq analysis is not able to detect at a reasonable FDR. Moreover, there seems to be an unusual bias towards not being able to detect down-regulated genes. I am a little bit puzzled by this, since one of the reasons we are sequencing is that it is supposed to have a better dynamic range.
These are the same RNA samples, so this apparent lack of sensitivity has to be related to either library prep or statistical analysis. So these are my questions:
- Can the inability to distinguish down-regulated genes be related to filtering out low counts? (In order to get good separation between groups in an MDS plot I need to filter at cpm > 0.1.)
- Is it possible that I need more coverage to improve sensitivity? I am currently sequencing at 50X paired-end, which seemed enough. Is there any published study looking at RNASeq sequencing depth and sensitivity in human or mouse genomes?
- Are the multiple testing corrections applied in edgeR and DESeq too stringent, thus rendering the overall analysis less sensitive?
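
To make the first question concrete, here is a minimal sketch of a counts-per-million filter in plain Python. It is not the edgeR code path, and the gene names, counts, and the `min_samples` parameter are all made up for illustration. I am assuming the filter keeps a gene when its CPM exceeds the threshold in at least `min_samples` libraries.

```python
def cpm(counts):
    """Convert raw counts (gene -> list of per-sample counts) to counts per million."""
    n_samples = len(next(iter(counts.values())))
    # Library size = total count in each sample (column sums).
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [1e6 * c / lib_sizes[j] for j, c in enumerate(row)]
        for gene, row in counts.items()
    }


def filter_by_cpm(counts, threshold=0.1, min_samples=1):
    """Keep genes whose CPM exceeds `threshold` in at least `min_samples` samples.

    `min_samples` is a hypothetical knob for this sketch, not a parameter
    taken from the analysis described in the post.
    """
    values = cpm(counts)
    return {
        gene: counts[gene]
        for gene, row in values.items()
        if sum(v > threshold for v in row) >= min_samples
    }


# Toy data: two libraries of 20 million reads each (invented numbers).
toy_counts = {
    "geneA": [12_000_000, 11_000_000],
    "geneB": [7_999_999, 8_999_990],
    "geneC": [1, 10],   # low in sample 1, slightly higher in sample 2
    "geneD": [0, 0],    # never detected
}

kept_loose = filter_by_cpm(toy_counts, threshold=0.1, min_samples=1)
kept_strict = filter_by_cpm(toy_counts, threshold=0.1, min_samples=2)
```

In this toy example `geneC` survives when one library above the threshold is enough, but is dropped when both libraries must pass. That is the kind of gene I worry about: one that is near zero in one condition because it is down-regulated.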
For the record, my count matrices are counts per transcript, averaging counts over all exons from the same gene model for all RefSeq genes. I did this because the microarray data is per transcript. In log scale I have on average an R2 of 0.7 between microarray intensity and RPKM from the same sample.
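
For clarity about what I mean by that comparison, here is a rough sketch of the calculation in plain Python. It uses the standard RPKM definition (reads per kilobase of transcript per million mapped reads) and the squared Pearson correlation on log2 values. The function names, the pseudocount, and the assumption that intensities are on the raw (not already logged) scale are illustrative choices, not necessarily what my pipeline does.

```python
import math


def rpkm(count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return count * 1e9 / (length_bp * total_mapped_reads)


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def log_scale_r2(intensities, rpkms, pseudocount=1.0):
    """R2 between log2 microarray intensity and log2 RPKM for one sample.

    The pseudocount (an assumption of this sketch) avoids log2(0) for
    transcripts with no reads.
    """
    log_int = [math.log2(v + pseudocount) for v in intensities]
    log_rpkm = [math.log2(v + pseudocount) for v in rpkms]
    return r_squared(log_int, log_rpkm)


# Example: 1000 reads on a 2 kb transcript in a library of 10 million mapped reads.
example_rpkm = rpkm(1000, 2000, 10_000_000)
```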
Thanks for the insight!
Lucia
--
Lucia Peixoto PhD
Postdoctoral Research Fellow
Laboratory of Dr. Ted Abel
Department of Biology
School of Arts and Sciences
University of Pennsylvania
"Think boldly, don't be afraid of making mistakes, don't miss small
details, keep your eyes open, and be modest in everything except your
aims."
Albert Szent-Gyorgyi