Question: Confusion about Design Matrix ,Contrast Matrix and logFC values in limma package
0
gravatar for sujata
11 days ago by
sujata0
India/Chennai/REC
sujata0 wrote:

For my data set 9 control and 7 cases are there The following matrix is the design matrix generated by command design1=model.matrix(~mm) where mm is

mm=pData(eset)$treatment

and eset has been got by following command

m<-as.matrix(read.csv("Alzheimer.csv", sep=",", row.names=1))
tmp <- read.csv("pdata.txt", row.names = 1,sep=",")
pdata <- AnnotatedDataFrame(tmp)
eset <- new("ExpressionSet", exprs = m, phenoData = pdata)
mm=pData(eset)$treatment
design1
   (Intercept) mm
1            1  0
2            1  0
3            1  0
4            1  0
5            1  0
6            1  0
7            1  0
8            1  0
9            1  0
10           1  1
11           1  1
12           1  1
13           1  1
14           1  1
15           1  1
16           1  1
attr(,"assign")
[1] 0 1

I am very new to the area for microarray data analysis I am making contrast matrix by the following commands and getting some warnnings Please help me in clarifying the points. I am confused whether I am doing correctly or not.

> contrastm <- makeContrasts(mm, levels = design1)
Warning message:
In makeContrasts(mm, levels = design1) : Renaming (Intercept) to Intercept
>contrastm
           Contrasts
Levels      mm
  Intercept  0
  mm         1
> fit2 <- contrasts.fit(fit, contrastm)
Warning message:
In contrasts.fit(fit, contrastm) :
  row names of contrasts don't match col names of coefficients
> fit2b <- eBayes(fit2)
> topTable(fit2b,coef=1)
                logFC   AveExpr         t      P.Value  adj.P.Val         B
215306_at    431.0302  376.9312  8.366248 5.120111e-07 0.01140914 -4.457608
218801_at   -281.7603  491.2188 -7.091080 3.782164e-06 0.02504624 -4.467288
203894_at   -413.7508  862.6063 -7.063322 3.959712e-06 0.02504624 -4.467541
219746_at    283.7810  439.5875  6.986785 4.496026e-06 0.02504624 -4.468248
213666_at   -419.2190  702.4250 -6.299508 1.457800e-05 0.05814260 -4.475374
203461_at    222.0429  227.0438  6.255007 1.576693e-05 0.05814260 -4.475888
213929_at    151.5079  209.1625  6.171945 1.826497e-05 0.05814260 -4.476866
209129_at    218.1206  436.3500  6.015035 2.417706e-05 0.06734219 -4.478782
218456_at   -700.8302 1294.4313 -5.822691 3.425263e-05 0.08330774 -4.481258
200980_s_at -334.3381 1216.2938 -5.774792 3.738623e-05 0.08330774 -4.481897

why logFC is value is so high when thresold is only 1.5

-ve values are down regulated gene and prositive values are uprelulated genes?

limma • 92 views
ADD COMMENTlink modified 11 days ago by Gordon Smyth39k • written 11 days ago by sujata0
Answer: Confusion about Design Matrix ,Contrast Matrix and logFC values in limma package
1
gravatar for James W. MacDonald
11 days ago by
United States
James W. MacDonald52k wrote:

What do you mean by 'the threshold is only 1.5'? If you had a threshold of 1.5, and the fold change was 431, that seems OK? I mean, 431 is greater than 1.5, so it seems reasonable to me. But you don't have anything here that indicates a threshold, so I have no idea what you are getting at. In addition, you apparently are using raw expression values of some kind, and in general you would probably want to take logs.

Anyway, you are being pretty mysterious about what you are doing and what sort of data these are, so it's pretty difficult to provide any reasonable help.

ADD COMMENTlink written 11 days ago by James W. MacDonald52k
Answer: Confusion about Design Matrix ,Contrast Matrix and logFC values in limma package
0
gravatar for Gordon Smyth
11 days ago by
Gordon Smyth39k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth39k wrote:

The first problem is that you are apparently trying to analyse expression data that is unlogged and (possibly) unnormalized. limma works on log2-expression values, which yours are not.

Apart from that, your analysis might be ok but you are including quite a few unnecessary commands and steps. Why don't you follow the limma User's Guide examples? I wonder what documentation you are following, because comparing two groups is very straightforward. You don't even need a contrast matrix (or an ExpressionSet or an AnnotatedDataFrame).

If you want more help you would have to show us what your data is, for example by:

head(m)
head(tmp)

You would also need to provide code that is consistent. At the moment, you use objects design and fit without showing the code that created them.

ADD COMMENTlink modified 11 days ago • written 11 days ago by Gordon Smyth39k

The data and the code I have pasted here

GSM21215 GSM21217 GSM21218 GSM21219 GSM21220 GSM21221 GSM21226 GSM21231 GSM21232 GSM21203 GSM21206
1007_s_at   2735.0   3746.0   3317.0   5689.0   4192.1   3048.6   4723.8   4662.9   5801.2   6141.2   4533.3
1053_at       42.7     26.1    138.9     77.6     86.2    129.4     48.4    111.3     34.3    105.5    145.0
117_at       161.3    153.6    381.2    248.8    894.1     98.2    233.7     86.6    328.0     92.0   6045.3
121_at      1272.3    979.6    917.4   1291.9   1176.9   1242.9   1659.7   1398.0   1631.5   1105.5   1237.8
1255_g_at    312.7    444.6     97.5    206.4    189.8    197.5    137.1    120.2    142.0    221.4     77.4
1294_at      320.8    319.9    297.0    470.8    240.2    285.1    542.2    526.9    707.5    493.6    178.6
          GSM21207 GSM21208 GSM21212 GSM21213 GSM21229
1007_s_at   9341.1   5003.2   8013.1   7905.5   3646.4
1053_at       23.3      6.1     74.9     27.7     66.2
117_at       262.5    240.7    170.3     77.6    172.9
121_at      2538.4   2055.6   1565.0   2258.2   1287.7
1255_g_at    134.0    202.4    188.8    232.7     84.0
1294_at      450.1    419.3    588.2    532.8    303.5
 head(tmp)
         treatment
GSM21215         0
GSM21217         0
GSM21218         0
GSM21219         0
GSM21220         0
GSM21221         0

pdata.txt
id,treatment
GSM21215,0
GSM21217,0
GSM21218,0
GSM21219,0
GSM21220,0  
GSM21221,0
GSM21226,0
GSM21231,0
GSM21232,0
GSM21203,1
GSM21206,1  
GSM21207,1



code:
m<-as.matrix(read.csv("Alzheimer.csv", sep=",", row.names=1))
tmp <- read.csv("pdata.txt", row.names = 1,sep=",")
pdata <- AnnotatedDataFrame(tmp)
eset <- new("ExpressionSet", exprs = m, phenoData = pdata)
mm=pData(eset)$treatment
design1=model.matrix(~mm)
fit=lmFit(eset,design1)
contrastm <- makeContrasts(mm, levels = design1)
fit2 <- contrasts.fit(fit, contrastm)
fit2b <- eBayes(fit2)
topTable(fit2b,coef=1)

ADD REPLYlink modified 10 days ago by James W. MacDonald52k • written 10 days ago by sujata0

That's just a part of your code. And looking at that GSE I don't see where there's any treatment? There is also no need to do it the way you have. An alternative is to do

> library(GEOquery)
> eset <- getGEO("GSE1297")[[1]]
> eset
ExpressionSet (storageMode: lockedEnvironment)
assayData: 22283 features, 31 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: GSM21203 GSM21204 ... GSM21233 (31 total)
  varLabels: title geo_accession ... Sex:ch1 (56 total)
  varMetadata: labelDescription
featureData
  featureNames: 1007_s_at 1053_at ... AFFX-TrpnX-M_at (22283 total)
  fvarLabels: ID GB_ACC ... Gene Ontology Molecular Function (16 total)
  fvarMetadata: Column Description labelDescription
experimentData: use 'experimentData(object)'
  pubMedIds: 14769913 
Annotation: GPL96 
## design matrix with Alzheimer severity and sex
> design <- model.matrix(~characteristics_ch1.1 + characteristics_ch1.6, pData(eset))
> colnames(design)[-1] <- c("Incipient","Moderate","Severe","M")
> exprs(eset) <- log2(exprs(eset))
## only keep the RefSeq/GenBank ID, gene name, symbol, and Gene ID
> fData(eset) <- fData(eset)[,c(9:12)]
> fit <- lmFit(eset, design)
> fit2 <- eBayes(fit)
> topTable(fit2, 2, 2)
          Representative.Public.ID                         Gene.Title
222196_at                 AK000470 zinc finger protein 839 pseudogene
207479_at                NM_018613                                   
          Gene.Symbol ENTREZ_GENE_ID    logFC  AveExpr        t      P.Value
222196_at   LOC389906         389906 2.800012 3.879957 4.598821 7.601561e-05
207479_at                            1.788760 3.654219 4.298046 1.748021e-04
          adj.P.Val         B
222196_at 0.9068039 -3.988158
207479_at 0.9068039 -4.040426
> topTable(fit2, 3, 2)
          Representative.Public.ID
209628_at                 AK023289
202133_at                 BF674349
                                               Gene.Title Gene.Symbol
209628_at nuclear transport factor 2-like export factor 2        NXT2
202133_at  WW domain containing transcription regulator 1       WWTR1
          ENTREZ_GENE_ID    logFC  AveExpr        t      P.Value adj.P.Val
209628_at          55916 1.059065 8.579427 5.239891 1.273193e-05 0.1847719
202133_at          25937 1.816572 9.433800 5.145136 1.658411e-05 0.1847719
                 B
209628_at 2.756052
202133_at 2.545644
> topTable(fit2, 4, 5)
            Representative.Public.ID
219073_s_at                NM_017784
215320_at                   AL080179
206384_at                  NM_006539
204743_at                  NM_013259
218456_at                  NM_023925
                                                             Gene.Title
219073_s_at                           oxysterol binding protein-like 10
215320_at                                 SPATA31 subfamily C, member 2
206384_at           calcium channel, voltage-dependent, gamma subunit 3
204743_at                                                  transgelin 3
218456_at                                        caprin family member 2
            Gene.Symbol ENTREZ_GENE_ID      logFC   AveExpr         t
219073_s_at     OSBPL10         114884 -1.6670533  7.910239 -6.687176
215320_at     SPATA31C2         645961  3.1101555  2.235307  6.171726
206384_at        CACNG3          10368 -2.7021032  9.688733 -5.859596
204743_at        TAGLN3          29114 -1.1742816 11.104861 -5.728398
218456_at       CAPRIN2          65981 -1.0452887 10.278001 -5.721286
                 P.Value   adj.P.Val        B
219073_s_at 2.368482e-07 0.005277689 6.721354
215320_at   9.627984e-07 0.010727019 5.486638
206384_at   2.274559e-06 0.012618956 4.724691
204743_at   3.270739e-06 0.012618956 4.401761
218456_at   3.335861e-06 0.012618956 4.384218
> nrow(topTable(fit2, 4, Inf, p.value = 0.05))
[1] 411

So the only changes are between the severe Alzheimer disease and the controls, and there are about 411 genes with an FDR of 5%.

ADD REPLYlink written 10 days ago by James W. MacDonald52k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 141 users visited in the last hour