Large foldchange values while analyzing rna seq data using limma
1
0
Entering edit mode
szenitha ▴ 20
@szenitha-10863
Last seen 7.4 years ago

Hello everyone,

I am new to data analysis and having trouble while trying to analyze rna-seq data using limma package. My RNA-seq data has 3 conditions(ifected lung,infected spleen,non infected spleen) and I am doing 2 things. Firstly I perform anova and then I define contrast and try to get the differentially expressed genes in the contrast. The problem is that I am getting very large fold change values. I am not sure what I am doing wrong. Please help.

Code:

design <- model.matrix(~0+State,data=pheno)
design
colnames(design)<-c("influng","infspleen","noninfspleen")

fit <- lmFit(data_normalized[variable_genes,],design)
fit <- eBayes(fit)

#define contrast

cont.matrix <- makeContrasts(infspleen.vs.noninfspleen=infspleen-noninfspleen,
       influng.vs.noninfspleen=influng-noninfspleen,
       influng.vs.infspleen=influng-infspleen,
            levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
#ANOVA

anova_result<-topTable(fit2,number=Inf)
top_var_genes<-topTable(fit2,number=1000,sort.by="p") #get 1000 most variable genes

head(topTable(fit2,coef = 3,adjust="fdr",number=Inf) )        #coef= 1 inf spleen vs non inf spleen
                                                                          #coef=2 inf lung vs non inf spleen
                                                                          #coef=3 inf lung vs inf spleen

##################################################################################################

output:

         logFC   AveExpr          t      P.Value    adj.P.Val         B
Ccr5  -23659.522 8989.1708 -156.95385 2.800465e-13 3.100115e-09 -4.333076
Anpep  -1974.974  688.7282 -137.83693 6.728328e-13 3.724130e-09 -4.333102
Htr7   -1058.050  393.6362 -101.96705 5.143171e-12 1.897830e-08 -4.333194
Mmp14  -4817.820 1745.7760  -72.20709 5.274585e-11 1.459741e-07 -4.333395
Socs3 -19469.569 7413.1870  -57.02656 2.589051e-10 5.231674e-07 -4.333639
Rin2   -9375.873 4753.0032  -56.26191 2.835596e-10 5.231674e-07 -4.333657

 

> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pheatmap_1.0.8             data.table_1.9.6           DESeq2_1.12.3             
 [4] rgl_0.95.1441              corrplot_0.77              RColorBrewer_1.1-2        
 [7] ffpe_1.16.0                TTR_0.23-1                 RUVSeq_1.6.2              
[10] edgeR_3.14.0               limma_3.28.14              EDASeq_2.6.2              
[13] ShortRead_1.30.0           GenomicAlignments_1.8.4    Rsamtools_1.24.0          
[16] Biostrings_2.40.2          XVector_0.12.0             BiocParallel_1.6.2        
[19] RSkittleBrewer_1.1         zebrafishRNASeq_0.106.2    BiocInstaller_1.22.3      
[22] scales_0.4.0               ggplot2_2.1.0              SummarizedExperiment_1.2.3
[25] Biobase_2.32.0             GenomicRanges_1.24.2       GenomeInfoDb_1.8.2        
[28] IRanges_2.6.1              S4Vectors_0.10.1           BiocGenerics_0.18.0       
[31] sva_3.20.0                 genefilter_1.54.2          mgcv_1.8-12               
[34] nlme_3.1-128              

 

limma logfoldchange • 1.1k views
ADD COMMENT
2
Entering edit mode
Aaron Lun ★ 28k
@alun
Last seen 1 hour ago
The city by the bay

I would guess that you haven't log-transformed your expression values. limma assumes that the values are log-transformed, see ?lmFit. Also, I don't know what your variable_genes represents, but filtering to keep the most variable genes is not recommended - see Section 7 of the limma user's guide for details.

Edit: Just realized you're working with RNA-seq data; in which case, you should be using voom on your original gene-level counts to get log-transformed intensities for use in limma. If you don't have counts, then your options are limited - see Gordon's answer at A: Differential expression of RNA-seq data using limma and voom().

ADD COMMENT
0
Entering edit mode

I am actually normalizing the data using deseq2 and then filtering out genes which do not have a max expression of 10 in atleast one of the conditions. should I be normalizing the data only using limma?

ADD REPLY
0
Entering edit mode

Why are you making things difficult for yourself? Just follow the voom-based workflow in Chapter 18 of the limma user's guide. There's no good reason to do more complicated things, especially if you're new to RNA-seq data analysis.

ADD REPLY
0
Entering edit mode

ok. Thank you.

ADD REPLY

Login before adding your answer.

Traffic: 873 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6