Question: Large foldchange values while analyzing rna seq data using limma
0
gravatar for szenitha
2.7 years ago by
szenitha0
szenitha0 wrote:

Hello everyone,

I am new to data analysis and having trouble while trying to analyze rna-seq data using limma package. My RNA-seq data has 3 conditions(ifected lung,infected spleen,non infected spleen) and I am doing 2 things. Firstly I perform anova and then I define contrast and try to get the differentially expressed genes in the contrast. The problem is that I am getting very large fold change values. I am not sure what I am doing wrong. Please help.

Code:

design <- model.matrix(~0+State,data=pheno)
design
colnames(design)<-c("influng","infspleen","noninfspleen")

fit <- lmFit(data_normalized[variable_genes,],design)
fit <- eBayes(fit)

#define contrast

cont.matrix <- makeContrasts(infspleen.vs.noninfspleen=infspleen-noninfspleen,
       influng.vs.noninfspleen=influng-noninfspleen,
       influng.vs.infspleen=influng-infspleen,
            levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
#ANOVA

anova_result<-topTable(fit2,number=Inf)
top_var_genes<-topTable(fit2,number=1000,sort.by="p") #get 1000 most variable genes

head(topTable(fit2,coef = 3,adjust="fdr",number=Inf) )        #coef= 1 inf spleen vs non inf spleen
                                                                          #coef=2 inf lung vs non inf spleen
                                                                          #coef=3 inf lung vs inf spleen

##################################################################################################

output:

         logFC   AveExpr          t      P.Value    adj.P.Val         B
Ccr5  -23659.522 8989.1708 -156.95385 2.800465e-13 3.100115e-09 -4.333076
Anpep  -1974.974  688.7282 -137.83693 6.728328e-13 3.724130e-09 -4.333102
Htr7   -1058.050  393.6362 -101.96705 5.143171e-12 1.897830e-08 -4.333194
Mmp14  -4817.820 1745.7760  -72.20709 5.274585e-11 1.459741e-07 -4.333395
Socs3 -19469.569 7413.1870  -57.02656 2.589051e-10 5.231674e-07 -4.333639
Rin2   -9375.873 4753.0032  -56.26191 2.835596e-10 5.231674e-07 -4.333657

 

> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pheatmap_1.0.8             data.table_1.9.6           DESeq2_1.12.3             
 [4] rgl_0.95.1441              corrplot_0.77              RColorBrewer_1.1-2        
 [7] ffpe_1.16.0                TTR_0.23-1                 RUVSeq_1.6.2              
[10] edgeR_3.14.0               limma_3.28.14              EDASeq_2.6.2              
[13] ShortRead_1.30.0           GenomicAlignments_1.8.4    Rsamtools_1.24.0          
[16] Biostrings_2.40.2          XVector_0.12.0             BiocParallel_1.6.2        
[19] RSkittleBrewer_1.1         zebrafishRNASeq_0.106.2    BiocInstaller_1.22.3      
[22] scales_0.4.0               ggplot2_2.1.0              SummarizedExperiment_1.2.3
[25] Biobase_2.32.0             GenomicRanges_1.24.2       GenomeInfoDb_1.8.2        
[28] IRanges_2.6.1              S4Vectors_0.10.1           BiocGenerics_0.18.0       
[31] sva_3.20.0                 genefilter_1.54.2          mgcv_1.8-12               
[34] nlme_3.1-128              

 

limma logfoldchange • 369 views
ADD COMMENTlink modified 2.7 years ago by Aaron Lun23k • written 2.7 years ago by szenitha0
Answer: Large foldchange values while analyzing rna seq data using limma
2
gravatar for Aaron Lun
2.7 years ago by
Aaron Lun23k
Cambridge, United Kingdom
Aaron Lun23k wrote:

I would guess that you haven't log-transformed your expression values. limma assumes that the values are log-transformed, see ?lmFit. Also, I don't know what your variable_genes represents, but filtering to keep the most variable genes is not recommended - see Section 7 of the limma user's guide for details.

Edit: Just realized you're working with RNA-seq data; in which case, you should be using voom on your original gene-level counts to get log-transformed intensities for use in limma. If you don't have counts, then your options are limited - see Gordon's answer at A: Differential expression of RNA-seq data using limma and voom().

ADD COMMENTlink modified 2.7 years ago • written 2.7 years ago by Aaron Lun23k

I am actually normalizing the data using deseq2 and then filtering out genes which do not have a max expression of 10 in atleast one of the conditions. should I be normalizing the data only using limma?

ADD REPLYlink written 2.7 years ago by szenitha0

Why are you making things difficult for yourself? Just follow the voom-based workflow in Chapter 18 of the limma user's guide. There's no good reason to do more complicated things, especially if you're new to RNA-seq data analysis.

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by Aaron Lun23k

ok. Thank you.

ADD REPLYlink written 2.7 years ago by szenitha0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 168 users visited in the last hour