Question

Appropriate use of the function mroast to find KEGG pathways for differential expression

0

Entering edit mode

svlachavas ▴ 840

@svlachavas-7225

Last seen 15 months ago

Germany/Heidelberg/German Cancer Resear…

Dear All,

i have performed a limma paired analysis on an affymetrix expression dataset(13 cancer vs 13 control samples).

My initial non-specific filtered expressionSet which was used as input for the limma analysis: data.trusted.eset

and my design matrix looks like this:

(Intercept) conditionCancer pairs2 pairs3 pairs4 pairs5 pairs6 pairs7 pairs8 pairs9 pairs10 pairs11.....
1 1 0 0 0 0 0 0 0 0 0 0 0
2 1 1 0 0 0 0 0 0 0 0 0 0
3 1 0 1 0 0 0 0 0 0 0 0 0
4 1 1 1 0 0 0 0 0 0 0 0 0
5 1 0 0 1 0 0 0 0 0 0 0 0
6 1 1 0 1 0 0 0 0 0 0 0 0
7 1 0 0 0 1 0 0 0 0 0 0 0
8 1 1 0 0 1 0 0 0 0 0 0 0
9 1 0 0 0 0 1 0 0 0 0 0 0
10 1 1 0 0 0 1 0 0 0 0 0 0
11 1 0 0 0 0 0 1 0 0 0 0 0
12 1 1 0 0 0 0 1 0 0

The specific coefficient im interested is "conditionCancer"

So i used the code below:

x <- hgu133aPATH2PROBE

mapped_probes <- mappedkeys(x)

xx <- as.list(x[mapped_probes])

indices <- ids2indices(xx, rownames(data.trusted.eset))

res <- mroast(data.trusted.eset, indices, design, contrast=2)

Firstly, my main question is if the implementation of the mroast fuction is correct regarding the argument contrast=2 in which i believe refers to the coefficient="conditionCancer" ?

Secondly, should i filter my data.trusted.eset with the output of the topTable function after removing possible duplicates and probesets mapping to NAs, in order to remove possible false results ?

For instance:

filtered <- data.trusted.eset[rownames(top3),] # where top3 is the final dataframe of topTable function after removing duplicates and NAs

and then use filtered instead of the initial expressionset ??

mroast limma pathway analysis differential gene expression affymetrix • 1.9k views

ADD COMMENT • link 9.8 years ago svlachavas ▴ 840

score 1 · Answer 1 · 2015-03-30

1

Entering edit mode

Kamil Slowikowski ▴ 30

@kamil-slowikowski-6901

Last seen 8 months ago

United States

Yes, contrast=2 is correct: it will use the second column of your design matrix.

You might consider reading the documentation for mroast with:

library(limma)
?mroast

When you perform multiple tests, such as differential expression for multiple genes, you should adjust for multiple testing. Read about the different adjustment methods, and then try using the decideTests function in the limma package:

?p.adjust
?decideTests

ADD COMMENT • link 9.8 years ago Kamil Slowikowski ▴ 30

0

Entering edit mode

Dear kslowikowski, thank you for your verification about the argument contrast. Moreover, i often use the function treat() or rank the probesets(genes) based on various criteria, but i have never considered decideTests, so i will see how it works

ADD REPLY • link 9.8 years ago svlachavas ▴ 840

score 1 · Answer 2 · 2015-03-30

1

Entering edit mode

Gordon Smyth 52k

@gordon-smyth

Last seen 16 hours ago

WEHI, Melbourne, Australia

You should filter out non-expressed probes from the eset, in the same way you would do so before running lmFit() and eBayes(). However there is no need to remove duplicates or probe sets without good annotation.

ADD COMMENT • link 9.8 years ago Gordon Smyth 52k

score 0 · Answer 3 · 2015-03-31

Dear Mr. Gordon Smyth, i have already performed non-specifici intensity filtering to remove any non-expressed probesets. Thus, if i dont have to remove duplicates or probesets without good annotation then my expressionset is ready to be used. Please excuse for asking one to more question, but it is also very important regarding the downstream process:

after the function mroast, i checked also a tutorial about an easy way to see the pathways, which needs a data matrix with ENTREZ Gene ID as the rownames. In detail:

library(pathview)

x <- hgu133aENTREZID

xx<- as.list(x)

entrezid <- sapply(rownames(fit2), function(x) xx[x], USE.NAMES=FALSE) # where fit2 is the output of ebayes function (fit2 <- eBayes(fit, trend=TRUE))

So my question refers to the next row:

gene.data <- fit2$coefficients

rownames(gene.data) <- entrezid

path <- pathview(gene.data=gene.data....)

So in " gene.data <- fit2$coefficients": should i use it just like this or i have to specify the coefficient im interested in, as is in contrast above ?

> head(fit2$coefficients) ??
(Intercept) conditionCancer pairs2 pairs3 pairs4 pairs5 pairs6 pairs7
1007_s_at 10.215808 0.22343373 0.52147061 0.60510597 1.0420911 0.5031775 0.06009372 0.1640625
1438_at 5.911114 2.53652962 0.70450123 -1.01602942 -0.6130980 -1.1538897 -1.33425754 -2.3958104
1487_at 9.050908 -0.20654769 -0.06396680 0.28847187 -0.0528669 0.3469220 0.04466553 0.4973197
1598_g_at 10.279677 -1.11014791 -0.61752751 0.12946488 0.4559134 0.4962469 -0.05954733 -0.1744488
1729_at 7.941900 -0.02386093 0.69056733 0.05095071 0.7857444 0.8638121 0.16643800 0.2573498
200000_s_at 9.991276 0.04892901 0.08502522 -0.35774486 0.2756260 -0.3458185 -0.71372258 -0.1826813