Question: limma for getting DEGs from my own input matrix
0
gravatar for shuvadeep.maity
7 months ago by
shuvadeep.maity0 wrote:

Hi I have normalized RNAseq data for treatment and control of a particular cell line. The matrix is the log expression ratio(12hrscon/0hrs)and (12hrstrated/0hrs). Can I use limma to get the DEGs using the my matrix as direct input file?

I have 2 replicates (each with control and treatment). I designed my matrix as follows

design.mat<-cbind(c(1,1,0,0),c(0,0,1,1)) colnames(design.mat) <- c("Treated", "Con") design.mat Treated Con [1,] 1 0 [2,] 1 0 [3,] 0 1 [4,] 0 1 and contrast.mat <- matrix(c(1,-1), ncol = 1) dimnames(contrast.mat) <- list(c("Treated","Con"), "Diff") contrast.mat Diff Treated 1 Con -1

My input matrix
Gene_name Treated Treated.1 Con Con.1 1 1110004E09Rik 0.03105187 -0.14257259 0.02448855 0.1928913 2 2010300C02Rik -0.37487930 0.09699910 0.03967733 0.1089108 3 2210016F16Rik -0.57569502 -0.56298981 -0.01678530 -0.1019228 4 2410002F23Rik -0.01191708 -0.25381992 -0.05737920 -0.1057968 5 2700097O09Rik 0.04423673 -0.13028207 0.07958317 0.1220447 6 4931406P16Rik -0.00345034 0.06384793 -0.07985070 -0.1200976

Is it okay to use limma on my matrix data ? is the design is correct?

Thanks

limma • 273 views
ADD COMMENTlink modified 7 months ago by Aaron Lun24k • written 7 months ago by shuvadeep.maity0
Answer: limma for getting DEGs from my own input matrix
3
gravatar for Aaron Lun
7 months ago by
Aaron Lun24k
Cambridge, United Kingdom
Aaron Lun24k wrote:

No, you should not use limma on your matrix of log-ratios. The log-ratios discard information about the mean-variance trend that is necessary for accurate modelling. For example, a log-ratio computed from large counts should be much less variable than a log-ratio computed from small counts - but if you only provide the log-ratio, the model is not able to consider the size of the counts from which they are computed, which reduces the power and/or specificity of the DE analysis. Moreover, your log-ratios were probably computed by adding some pseudo-count to avoid undefined values, which further distorts the true log-fold changes.

To avoid all of these problems, you should get your hands on the raw counts and do your analysis using voom (or edgeR, etc.). You should have 8 columns in the count matrix - 2 time points for 2 replicates for 2 treatment conditions. I'll assume that the time points were generated as one 0/12 pair per replicate (i.e., one or your replicates corresponds to a single time course), and block appropriately:

condition <- rep(c("C", "T"), each=4)
time <- rep(c("0", "12"), 4)
block <- gl(4, 2) # paired time points per replicate

design <- model.matrix(~0 + block + time:condition)
design <- design[,!grepl("time0", colnames(design))] # get to full rank
colnames(design) <- make.names(colnames(design)) # friendlier column names

The first four terms represent the blocking factors for each replicate, and are not of scientific interest. The last two terms represent the average log-fold change of expression in time 12 over 0 in the control or treatment conditions. Setting up a contrast to test for DE between control and treatment in edgeR or limma is as easy as:

con <- makeContrasts(time12.conditionT - time12.conditionC, levels=design)

Conceptually, this set-up tests the same null hypothesis as what you described in your post, except it does it better.

ADD COMMENTlink modified 7 months ago • written 7 months ago by Aaron Lun24k

Thank you Aaron for your explanations/help. Much Appreciated!

ADD REPLYlink written 7 months ago by shuvadeep.maity0

Thank you Aaron for your explanations/help. Much Appreciated!

ADD REPLYlink written 7 months ago by shuvadeep.maity0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 241 users visited in the last hour