"Subscript out of bounds" erorr during microarray analysis with Limma
3
0
Entering edit mode
Claudia ▴ 10
@claudia-6236
Last seen 5.4 years ago
Sweden

Hello.

I'm trying to analyze some affymetrix human gene st 1.0 arrays with oligo and limma and I receive an error when I try to generate a toptable.

To keep things simple for the moment, I would just like to see differential expression on the base of the dichotomous variable Diagnosis of my pheno file. Is it correct to generate the model matrix the way I tried?

Any help would be much appreciated.

Here are my code, the error and the output of sessionInfo.

library(oligo)

celFiles <- list.celfiles()
pheno <- read.AnnotatedDataFrame("phenodata.txt", header = TRUE, row.name="Name",sep="\t")
oligoRaw <- read.celfiles(filenames=celFiles, phenoData=pheno)
eset <- oligo::rma(oligoRaw)

library(limma)
dm<-model.matrix(~pheno$Diagnosis) fit<-lmFit(eset, dm) fitE<-eBayes(fit) toptable(fitE, coef=pheno$Diagnosis)

Error in fit$coefficients[, coef, drop = FALSE] : subscript out of bounds In addition: Warning message: In toptable(fitE, coef = pheno$Diagnosis) :
Treat is for single coefficients: only first value of coef being used

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods
[9] base

other attached packages:
[1] limma_3.24.13              pd.hugene.1.0.st.v1_3.14.1 RSQLite_1.0.0
[4] DBI_0.3.1                  oligo_1.32.0               Biostrings_2.36.1
[7] XVector_0.8.0              IRanges_2.2.5              S4Vectors_0.6.1
[10] Biobase_2.28.0             oligoClasses_1.30.0        BiocGenerics_0.14.0

loaded via a namespace (and not attached):
[1] affxparser_1.40.0     GenomicRanges_1.20.5  splines_3.2.1
[4] zlibbioc_1.14.0       bit_1.1-12            foreach_1.4.2
[7] GenomeInfoDb_1.4.1    tools_3.2.1           ff_2.2-13
[10] iterators_1.0.7       preprocessCore_1.30.0 affyio_1.36.0
[13] codetools_0.2-14      BiocInstaller_1.18.3
oligo limma bioconductor affymetrix microarrays • 5.9k views
0
Entering edit mode
@steve-lianoglou-2771
Last seen 8 minutes ago
United States

You're problem is that passing coef=pheno$Diagnosis to toptable is wrong (also, probably you meant to use topTable, instead?) library(limma) dm<-model.matrix(~pheno$Diagnosis)
fit<-lmFit(eset, dm)
fitE<-eBayes(fit)
toptable(fitE, coef=pheno\$Diagnosis)

A valid value for coef would be the name of one of the non-intercept columns of dm (ie. any of colnames(dm)[-1]), or an index of one of its columns (ie. 2 <= idx <= nrow(dm)).

0
Entering edit mode
Claudia ▴ 10
@claudia-6236
Last seen 5.4 years ago
Sweden

Thank you a lot for your suggestion. I tried what you suggested and it worked, but the p-values I obtain seem to be exagerately low to be real.

> topTable(fitE, coef=1)
logFC  AveExpr        t            P.Value       adj.P.Val         B
8112916 14.38131 14.37817 750.9670 1.371181e-47 1.532671e-43 84.93415
8174970 14.38131 14.37817 750.9670 1.371181e-47 1.532671e-43 84.93415
7946565 14.39045 14.38589 750.7118 1.380909e-47 1.532671e-43 84.93264
8109821 14.08591 14.07765 701.3604 5.679887e-47 3.828884e-43 84.61264

Do you have any idea on that might be causing this?

Thank you for your time.

1
Entering edit mode

You have fit a model with an intercept, which is estimating the mean of one of your groups. When you test this coefficient, the hypothesis you are testing is whether or not the coefficient is equal to zero. Since you are estimating the mean of a sample type, the estimates are almost always much different from zero, so you get very small p-values.

0
Entering edit mode
Claudia ▴ 10
@claudia-6236
Last seen 5.4 years ago
Sweden

Thank you for your explanation. In order to test whether or not there are differentially expressed genes the two groups and the associated p-values should I use a different model?

Should I do something like this or are there better ways to test my hypothesis?

library(limma)
design <- model.matrix(~ -1+factor(c(1,1,1,1,0,1,1,0,0,1,1,0,0,0,0,0,0,0,1,1)))
colnames(design) <- c("FR", "NR")
fit <- lmFit(eset, design)
contrast.matrix <- makeContrasts(FR-NR, levels=design)
fit2 <- contrasts.fit(fit, contrast.matrix)
fit2 <- eBayes(fit2)
topTable(fit2, coef=1, number = 100)

Thank you for all your help

0
Entering edit mode

That looks fine, assuming that a value of 1 corresponds to NR while 0 corresponds to FR.

0
Entering edit mode

That is correct. If I analyze the data this way I found no genes with a significant adjusted p-value, and very few nominal significant genes at all, and these results look a bit strange to me. Do you see something else that I may have done wrong or have any other suggestions on how I should run this analysis? Thank you for all your advice.

logFC             AveExpr         t      P.Value adj.    P.Val             B
7892939  0.3124115 11.605640  5.339525 2.786726e-05 0.1110249 2.433165
8117081 -0.4669661  9.317189 -5.225590 3.632258e-05 0.1110249 2.208148

0
Entering edit mode

Look at your MDS plot. Is there substantial separation between the FR and NR samples? Are the replicates clustered close together in each group? If not, there may be outlier samples or hidden batch effects that you need to consider. Alternatively, the biological truth may just be that there is no significant DE between FR and NR.

0
Entering edit mode

I will work on this, thanks you for all the suggestions.