Entering edit mode
hello BioC Users,
I know this is quite a longmail, but I hope you can find the time to
read it
and help me a bit.
I would like to ask for help in analyzing the miRNA 2.0 microarray
from
Affymetrix.
I am trying to do an analysis of four different tissues from
drosophila
melanogaster. I have three replicates for each of the four body parts.
first problem is the structure of the arrays. There are no MM probes
on it,
so I don't have any option to run the quality control measurements
given by
R.
Both to me known quality control packages (AffyQCReport and
QualityMetrics)
have failed here. Also running the simple commands didn't work (I am
getting
the error message that I have NAs).
I don't think there is a way to overcome this problem, as the probes
are not
there. This way even detecting the p.values of the PM probes is not
possible, as there are no MM probes to compare them with.
Does anyone ever worked with the affy miRNA array and can give me a
hint for
any qc packages that can work with it.
I know that Affymetrix offers the miRNA QC tool, but to be honest I
didn't
quite understand what it does. I see some graphs and tables but it
doesn't
tell me much. The manual from affy is IMHO really shortcoming and not
helpful at all.
my second problem is the analysis itself.
I was thinking about doing it with the siggenes package (SAM).
uploaded all my CEL files
raw_data <- ReadAffy(widget=T)
#setQCEnvironment(array=cleancdfname(cdfName(Jenny_data)),
path=getwd())
#pData(raw_data)
#RMA normalization
normData<-rma(raw_data)
phenoData(raw_data) <- read.AnnotatedDataFrame("phenotype.txt")
#subsetting the rma ExpressionSet for the separate four body parts
rma_1 <- get.array.subset(normData, "body_part", "A")
rma_2 <- get.array.subset(normData, "body_part", "B")
rma_3 <- get.array.subset(normData, "body_part", "D")
rma_4 get.array.subset(normData, "body_part", "T")
######sam analysis
#rma_1
Probe_names <- featureNames(rma_1)
cl <- c(0,0,0,1,1,1) # three replicate wt vs. dilp
Until here it is quite a straightforward analysis.
Here is it, where it becomes tricky.
As I am only interested in analyzing the miRNA from drosophila, I was
thinking that it is better to compare only the miRNA that are from
drosophila.
So I am extracting all genes with dme in the name (this is the
identifier
for drosophila).
dme <- grep("dme", Probe_names) # create an object with the numbers
for the
input values of dme miRNA from the ExpressionSet
rma_dme <- rma_fatbody[dme,] # create a subset of the ExpressionSet
with
only the miRNA from Drosophila (begins with dme)
I than run a sam analysis only on these miRNA.
sam.out<- sam(rma_dme,cl, var.equal=FALSE, B=100, include.zero=FALSE,
na.replace=FALSE, rand=123)
summary(sam.out) # find the right Delta value with the right
percentage of
false / called
summary(sam.out, seq(0.1,1.9,0.1))# print a table with detailed dealta
values according to "seq"
list.siggenes(sam.out, 0.6) # change according to the delta value
siggenes.res <- summary(sam.out, 1.6)
plot(sam.out, 0.6, sig.col = c(3,2), main="sig. genes for body part 1
comparisons \n delta = 0.5")
It all sounds very good, but I am getting very strange results. The
first
body part I compared have given me these results
>summary(sam.out, seq(0.1,1.9,0.1))# print a table with detailed delta
values according to "seq"
SAM Analysis for the Two-Class Unpaired Case Assuming Unequal
Variances
s0 = 0.0106 (The 0 % quantile of the s values.)
Number of permutations: 20 (complete permutation)
MEAN number of falsely called variables is computed.
Delta p0 False Called FDR cutlow cutup j2 j1
1 0.1 0.123 143.75 176 0.1009 -0.181 0.273 93 104
2 0.2 0.123 113.55 154 0.0910 -0.276 0.671 90 123
3 0.3 0.123 95.7 144 0.0821 -0.480 0.806 83 126
4 0.4 0.123 75.6 129 0.0724 -0.758 1.014 74 132
5 0.5 0.123 64.15 124 0.0639 -0.829 1.404 72 135
6 0.6 0.123 49.45 102 0.0599 -1.399 1.404 50 135
...
Is it possible that from the total 186 mir-dme Probes, 154 of them are
differentially deregulated between the two conditions with such a FDR
value
(<10%)?
As a general question - what can I regard as a good FDR value? I know
there
is not strict value, but as some people have probebly more experience
than I
do :-), it would be nice to know what they are using.
My problem here is that I am not even sure, whether this way of
analyzing
the data is statistically allowed or not. Does it make sense to first
exclude all non-drosophila from the data set and than run the sam
analysis
or must I use all miRNA probes.
(when doing so I get no sig. deregulated genes or just 1-3 miRNAs,
depends
on the body parts)
I hope that someone can share his/her experience with me and shed some
light
in this miRNA mess.
Thanks a lot
Assa
> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United
States.1252
attached base packages:
[1] tools tcltk splines stats graphics grDevices
datasets
utils methods base
other attached packages:
[1] mirna20cdf_2.8.0 tkWidgets_1.30.0 DynDoc_1.30.0
widgetTools_1.30.0 simpleaffy_2.28.0 gcrma_2.24.1
[7] genefilter_1.34.0 siggenes_1.26.0 multtest_2.8.0
affy_1.30.0 Biobase_2.12.1 rcom_2.2-3.1
[13] rscproxy_1.3-1
loaded via a namespace (and not attached):
[1] affyio_1.20.0 annotate_1.30.0 AnnotationDbi_1.14.1
Biostrings_2.20.1 DBI_0.2-5
[6] IRanges_1.10.4 MASS_7.3-12 preprocessCore_1.14.0
RSQLite_0.9-4 survival_2.36-5
[11] xtable_1.5-6
[[alternative HTML version deleted]]