How do you get averages of each group in Limma output
1
0
Entering edit mode
Matthew • 0
@f2420891
Last seen 4 months ago
United States

Hello, I recently completed running limma-trend to find differentially expressed genes influenced by the sex, tissue type, and infection status of a plant. For example, one comparison was male infected flowers to female infected flowers. According to limma, there are significant differences in many of my comparisons, but in trying to get the output data, I'm unable to obtain certain information. I'm using this code

tab<-topTable(fit, sort.by="none", adjust.method= "BH", n= Inf, coef = 12)
write.table(tab, file = "MIF_FIF.txt", sep = "\t")

to get a file looking at that specific comparison, but this approach only gives me the following output:

"logFC" "AveExpr"   "t" "P.Value"   "adj.P.Val" "B"
"MOEQ000001T1exon1" 0.860344126172371   -2.2168954728788    2.86670254017661    0.0094071156815448  0.0343743208965692  -3.28233021689611
"MOEQ000002T1exon1" 1.74657018568266    -1.89484698082333   3.99972492140815    0.000680711380052985    0.00453880630827772 -0.752598382957659
"MOEQ000002T1exon2" 2.97743301874836    -1.53494458820183   5.18827888515825    4.19384639412824e-05    0.000573395231909479    1.99452393543233

In addition to this information, I'd like to have the output contain the mean expression values for each group, so a column for mean expression in male infected flowers and a column for the mean expression in female infected flowers. Additionally, I'd like to have a column with the log2FC as well. Does anyone know how to go about getting all of this information out of limma into one result file per comparison? Thanks very much!

RNASeq limma • 365 views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 7 hours ago
WEHI, Melbourne, Australia

Hi Matthew,

You have an experiment with multiple explanatory factors. Experimental factors include sex, tissue type and infection status and your previous post suggests you also have different genotypes and pairing. It isn't meaningful to present simple averages of male and female plants because that would ignore variations in tissue type and infection status. Trying to compare two averages like that would only make sense for a very simple two-group experiment.

If your experiment was just a series of distinct groups (with design <- model.matrix(~0+group)) then limma would return the group averages as part of the fit object.

More generally, limma presents information in a way that makes sense even for complex experiments like yours. The output gives you to the average expression for each gene across all conditions and the logFC with respect to sex.

You can plot the logFC vs the average expression by plotMD(fit). Try this for example:

dt <- decideTests(fit)
plotMD(fit, status=dt)

Best wishes
Gordon

ADD COMMENT
0
Entering edit mode

Hi Gordon, thank you for the quick response. Regarding the groups, maybe I didn't explain my goal correctly, or maybe I'm just misunderstanding the situation. In comparing the male infected flowers to the female infected flowers, I'd like to see the average expression for eaach of those groups specifically, not the average expression for all males and all females. For the male infected flower condition, I have three replicates, and I'd like to see the average expression of those three replicates for the male infected flowers. I wouldn't expect that average to include things like the males outside that group like the uninfected male flowers or any males looking at other tissue types. My design looks like this:

design<-model.matrix(~0+library)
colnames(design)<- c("FIC","FIF","FUC","FUF","MIC","MIF","MUC","MUF")

where library is

library<-factor(c("FIC","FIF","FUC","FUF","MIC","MIF","MUC","MUF","FIC","FIF","FUC","FUF","MIC","MIF","MUC","MUF","MIC","MIF","MUC","MUF","FIC","FIF","FUC","FUF"))

Each component here is a characterization of one of the 24 samples, denoted by sex, then infection status, and tissue type, so 'FIC' would be female infected crown. My contrasts look like

contrast_matrix<-makeContrasts(MUF-MUC,FIF-FUF,MUF-FUF,FUF-FUC,MIF-MUF,MIC-FIC,MIF-MIC,MIC-MUC,MUC-FUC,FIC-FUC,MIF-FIF,FIF-FIC, levels = design)
fit<-contrasts.fit(fit, contrast_matrix)
fit<-eBayes(fit,trend=TRUE)
topTable(fit, sort.by = "none", coef=1)

Using the above code to look specifically at the comparison of male uninfected flowers to male uninfected crowns, is there a way to get the average expression of both the 3 replicates of male uninfected crowns and the 3 replicates of male uninfected flowers incorporated into the output? I've read somewhere that I will need this information for some of the downstream tools I want to use for functional analysis of the DE genes. Thanks again for your help, I really appreciate it!

ADD REPLY
0
Entering edit mode

You need to change the code to preserve the original fit:

fit <- lmFit(y, design)
fitc <- contrasts.fit(fit, contrast_matrix)

Then all the group means are contained in fit$coefficients while the log-fold-changes are in fitc$coefficients.

You could also consider the functions cpmByGroup or rpkmByGroup in the edgeR package.

I do not know of any functional analysis tools that require group means. All the tools that I know of use logFCs and p-values as output from topTable.

ADD REPLY
0
Entering edit mode

Thanks Gordon, that worked like a charm!

ADD REPLY

Login before adding your answer.

Traffic: 436 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6