edgeR - inconsistent logFC sign
2
0
Entering edit mode
@sharvari-gujja-6614
Last seen 22 months ago
United States

Hello Gordon,

 

I am using edgeR to do pairwise comparison for 3 conditions. The avg values for each condition are:

 

AVG_C1 : 1596

AVG_C2: 1354

AVG_R: 1934

 

C1 vs. R gives me logFC of -0.333

and C2 vs. R gives me logFC of 0.066

 

logFC of -0.33 indicates the gene is down regulated in C1 - which is correct looking at the avg values.

However, logFC of 0.066  indicates that the gene is up-regulated in C2 - which s NOT correct looking at the avg values.

 

I get correct sign for some genes, while others don't seem right. I am not worried about the magnitude.

 

I've checked my code, but not sure why there is discrepancy in the sign for logFC:

 

counts <- read.table( file = "juanCounts.txt" , header = TRUE, row.names=1)

A <- c("C2")

B <- c("R")

n <- as.numeric(4)

group <- c(rep(A, n) , rep(B, n))

counts<- data.frame(counts[,grep(paste(c(A,B),collapse="|"),names(counts))])

cds <- DGEList( counts, group = group )

cds <- calcNormFactors( cds )

cds <- estimateCommonDisp( cds )

cds <- estimateTagwiseDisp( cds )

de.cmn <- exactTest( cds ,  pair = c( A,B ) )

resultsTbl.cmn <- topTags( de.cmn , n = nrow( de.cmn$table ) )$table

 

Could you please advice if I missed something. If it's helpful I could send you the counts matrix.

 

Thanks for all the help.

edgeR • 2.0k views
ADD COMMENT
0
Entering edit mode

I only see two groups in your code: R2 and C. Where's C1?

ADD REPLY
0
Entering edit mode

The edgeR code has only 2 groups for pairwise comparison. For C1 vs. R comparison, I modify A and B as:

A <- c("C1")

B <- c("R")

 

The point I am trying to make is that the sign for the logFC doesn't seem right. 

Another example for C1 vs. R comparison is:

Gene1:

AVG_C1 : 18274

AVG_R: 2140

logFC is -0.35

Gene2:

AVG_C1 : 18791

AVG_R: 22027

logFC is -0.19

 

For Gene1 , avg C1 > avg R, and logFC sign is negative

For Gene2, avg C1 < avg R, and logFC sign is also negative.

If the sign is based on Condition 1 then shouldn't the logFC be positive for gene1 and negative for gene2.

Could someone help with this issue.

Thanks

 

ADD REPLY
0
Entering edit mode
@steve-lianoglou-2771
Last seen 22 months ago
United States

Well, for starters: when you are estimating avg abundance counts by hand, I'm guessing you're using raw counts? Ie. counts that are not normalized by sequencing depth per sample?

ADD COMMENT
0
Entering edit mode

Yes, I am using raw counts from htseq-count.

 

 

ADD REPLY
0
Entering edit mode
@sharvari-gujja-6614
Last seen 22 months ago
United States

The point I am trying to make is that the sign for the logFC doesn't seem right. 

Another example for C1 vs. R comparison is:

Gene1:

AVG_C1 : 18274

AVG_R: 2140

logFC is -0.35

Gene2:

AVG_C1 : 18791

AVG_R: 22027

logFC is -0.19

 

For Gene1 , avg C1 > avg R, and logFC sign is negative

For Gene2, avg C1 < avg R, and logFC sign is also negative.

If the sign is based on Condition 1 then shouldn't the logFC be positive for gene1 and negative for gene2.

Could someone help with this issue.

ADD COMMENT
2
Entering edit mode

The point that Steve was trying to get you to understand is that the raw counts aren't that useful for deciding what the fold change 'should be'. You have to account for library size before computing fold changes, so the raw counts are not a sufficient statistic to infer anything.

As an example, consider two samples, for one gene sample A has 5000 counts, and sample B has 10,000 counts. So a 2-fold difference, right? But then maybe not. What if sample A has a total of 10M reads, and sample B has a total of 20M reads. All things equal, you expect about twice as many counts per gene for sample B as compared to sample A, if there are no differences in expression for any genes. So after accounting for the fact that the library size was twice as large for sample B, we would think the fold change is actually really close to 1 for that gene.

So you don't have enough information from the average counts to say what the logFC should be. You would be better off computing the CPM and then computing the average CPM per group, which is what the logFC is based off of.

ADD REPLY
0
Entering edit mode

Thank you so much for the explanation. This was really helpful!

So, a positive logFC indicates that the gene is up regulated in Condition1 compared to Condition2 and vice versa..?

 

ADD REPLY
0
Entering edit mode

I got it. For edgeR, a positive logFC indicates that the gene is up regulated in Condition2 compared to Condition1 and vice versa.

ADD REPLY

Login before adding your answer.

Traffic: 827 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6