Question

miRNA microarray analysis

0

Entering edit mode

Assa Yeroslaviz ★ 1.5k

@assa-yeroslaviz-1597

Last seen 8 weeks ago

Germany

hello BioC Users, I know this is quite a longmail, but I hope you can find the time to read it and help me a bit. I would like to ask for help in analyzing the miRNA 2.0 microarray from Affymetrix. I am trying to do an analysis of four different tissues from drosophila melanogaster. I have three replicates for each of the four body parts. first problem is the structure of the arrays. There are no MM probes on it, so I don't have any option to run the quality control measurements given by R. Both to me known quality control packages (AffyQCReport and QualityMetrics) have failed here. Also running the simple commands didn't work (I am getting the error message that I have NAs). I don't think there is a way to overcome this problem, as the probes are not there. This way even detecting the p.values of the PM probes is not possible, as there are no MM probes to compare them with. Does anyone ever worked with the affy miRNA array and can give me a hint for any qc packages that can work with it. I know that Affymetrix offers the miRNA QC tool, but to be honest I didn't quite understand what it does. I see some graphs and tables but it doesn't tell me much. The manual from affy is IMHO really shortcoming and not helpful at all. my second problem is the analysis itself. I was thinking about doing it with the siggenes package (SAM). uploaded all my CEL files raw_data <- ReadAffy(widget=T) #setQCEnvironment(array=cleancdfname(cdfName(Jenny_data)), path=getwd()) #pData(raw_data) #RMA normalization normData<-rma(raw_data) phenoData(raw_data) <- read.AnnotatedDataFrame("phenotype.txt") #subsetting the rma ExpressionSet for the separate four body parts rma_1 <- get.array.subset(normData, "body_part", "A") rma_2 <- get.array.subset(normData, "body_part", "B") rma_3 <- get.array.subset(normData, "body_part", "D") rma_4 get.array.subset(normData, "body_part", "T") ######sam analysis #rma_1 Probe_names <- featureNames(rma_1) cl <- c(0,0,0,1,1,1) # three replicate wt vs. dilp Until here it is quite a straightforward analysis. Here is it, where it becomes tricky. As I am only interested in analyzing the miRNA from drosophila, I was thinking that it is better to compare only the miRNA that are from drosophila. So I am extracting all genes with dme in the name (this is the identifier for drosophila). dme <- grep("dme", Probe_names) # create an object with the numbers for the input values of dme miRNA from the ExpressionSet rma_dme <- rma_fatbody[dme,] # create a subset of the ExpressionSet with only the miRNA from Drosophila (begins with dme) I than run a sam analysis only on these miRNA. sam.out<- sam(rma_dme,cl, var.equal=FALSE, B=100, include.zero=FALSE, na.replace=FALSE, rand=123) summary(sam.out) # find the right Delta value with the right percentage of false / called summary(sam.out, seq(0.1,1.9,0.1))# print a table with detailed dealta values according to "seq" list.siggenes(sam.out, 0.6) # change according to the delta value siggenes.res <- summary(sam.out, 1.6) plot(sam.out, 0.6, sig.col = c(3,2), main="sig. genes for body part 1 comparisons \n delta = 0.5") It all sounds very good, but I am getting very strange results. The first body part I compared have given me these results >summary(sam.out, seq(0.1,1.9,0.1))# print a table with detailed delta values according to "seq" SAM Analysis for the Two-Class Unpaired Case Assuming Unequal Variances s0 = 0.0106 (The 0 % quantile of the s values.) Number of permutations: 20 (complete permutation) MEAN number of falsely called variables is computed. Delta p0 False Called FDR cutlow cutup j2 j1 1 0.1 0.123 143.75 176 0.1009 -0.181 0.273 93 104 2 0.2 0.123 113.55 154 0.0910 -0.276 0.671 90 123 3 0.3 0.123 95.7 144 0.0821 -0.480 0.806 83 126 4 0.4 0.123 75.6 129 0.0724 -0.758 1.014 74 132 5 0.5 0.123 64.15 124 0.0639 -0.829 1.404 72 135 6 0.6 0.123 49.45 102 0.0599 -1.399 1.404 50 135 ... Is it possible that from the total 186 mir-dme Probes, 154 of them are differentially deregulated between the two conditions with such a FDR value (<10%)? As a general question - what can I regard as a good FDR value? I know there is not strict value, but as some people have probebly more experience than I do :-), it would be nice to know what they are using. My problem here is that I am not even sure, whether this way of analyzing the data is statistically allowed or not. Does it make sense to first exclude all non-drosophila from the data set and than run the sam analysis or must I use all miRNA probes. (when doing so I get no sig. deregulated genes or just 1-3 miRNAs, depends on the body parts) I hope that someone can share his/her experience with me and shed some light in this miRNA mess. Thanks a lot Assa > sessionInfo() R version 2.13.0 (2011-04-13) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C LC_TIME=English_United States.1252 attached base packages: [1] tools tcltk splines stats graphics grDevices datasets utils methods base other attached packages: [1] mirna20cdf_2.8.0 tkWidgets_1.30.0 DynDoc_1.30.0 widgetTools_1.30.0 simpleaffy_2.28.0 gcrma_2.24.1 [7] genefilter_1.34.0 siggenes_1.26.0 multtest_2.8.0 affy_1.30.0 Biobase_2.12.1 rcom_2.2-3.1 [13] rscproxy_1.3-1 loaded via a namespace (and not attached): [1] affyio_1.20.0 annotate_1.30.0 AnnotationDbi_1.14.1 Biostrings_2.20.1 DBI_0.2-5 [6] IRanges_1.10.4 MASS_7.3-12 preprocessCore_1.14.0 RSQLite_0.9-4 survival_2.36-5 [11] xtable_1.5-6 [[alternative HTML version deleted]]

miRNA affy siggenes miRNA affy siggenes • 1.3k views

ADD COMMENT • link 13.8 years ago Assa Yeroslaviz ★ 1.5k