Question

DESeq2 differential analysis - log2FC doubt

0

Entering edit mode

Tiago Chedraoui Silva ▴ 260

@tiago-chedraoui-silva-8877

Last seen 3.7 years ago

Brazil - University of São Paulo/ Los A…

Please, could you help me. I have some doubts about the DESeq2 differential analysis.

We were performing a DEA analysis (report: http://rpubs.com/tiagochst/DESeq2) between some cell lines, and the gene log2FC for the ST13P12 gene caught our attention.

In the section ST13P12 of the report you can see that group E has the higher average between all groups, while group A has 0 counts for all samples. When we perform the DEA using as reference the group "A"

Since the vignette says "estimates are of the logarithmic fold change log2(treated/untreated)" for condition treated vs untreated. I would expect E vs A to be log2(E/A) > 0 in the results group2_E_vs_A object. However, is is negative (-29.72).

Please, am I wrong ? Is my code wrong? Or there is an explanation for that change of signal ?

Thanks!

deseq2 • 1.5k views

ADD COMMENT • link updated 5.9 years ago by Michael Love 41k • written 5.9 years ago by Tiago Chedraoui Silva ▴ 260

score 0 · Answer 1 · 2018-05-30

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 5 hours ago

United States

This is a really useful way to share your code for debugging!

In your code you have set the reference level to "E".

See the DEA section, second chunk, third line.

ADD COMMENT • link 5.9 years ago Michael Love 41k

0

Entering edit mode

Yes. But:

- DEA section, second chunk, third line. I use E as reference

- DEA section, forth chunk, first line. I use A as reference

But E is more expressed than A for gene ST13P12 (section Gene ST13P12, avg.count), and setting E as reference gives me log2FC > 0. It should be the opposite no ? (DEA section, third chunk output)

ADD REPLY • link 5.9 years ago Tiago Chedraoui Silva ▴ 260

0

Entering edit mode

There is a bit of re-ordering going on with all_cell_lines_ordered and MetaData_File, and it's hard for me to keep track. Can you just do

plotCounts(dds, "ST13P12", "groups2")

ADD REPLY • link 5.9 years ago Michael Love 41k

0

Entering edit mode

So the plot is correlated with results. But is it possible that a 0 count for all samples in a group increase if normalized ? It seems the number of samples in each group are correct. http://rpubs.com/tiagochst/392569

ADD REPLY • link 5.9 years ago Tiago Chedraoui Silva ▴ 260

0

Entering edit mode

I think you are saying that the plot goes along with the results table. That's how I see it as well.

This means that there is a sample mixup in how you are preparing the data. I had a hard time following what was going on in these lines, so what I'd recommend is to use very simple R code when you provide counts and metadata to DESeqDataSet constructors, and use checks to make sure these are in the same order.

You can also check by looking at cbind(counts(dds)[gene.id,], dds$group2) to see if it confirms how you think the samples should line up with the counts.

ADD REPLY • link 5.9 years ago Michael Love 41k

0

Entering edit mode

Yes, the meatada in dds has different labels (JHOC5 was in group E in Metadata_table, but in dds it is labeled as C) http://rpubs.com/tiagochst/392569

ADD REPLY • link 5.9 years ago Tiago Chedraoui Silva ▴ 260

1

Entering edit mode

I'm scanning your code and the counts and metadata are not in the same order when you provide them to the DESeqDataSet constructor. You need to make sure these are in the same order.

ADD REPLY • link 5.9 years ago Michael Love 41k

0

Entering edit mode

Thanks! I thought it was going to do the same thing Summarized Experiment does, it checks if rownames and colnames match (even though my code was not going to work since rownames were empty)

ADD REPLY • link 5.9 years ago Tiago Chedraoui Silva ▴ 260

0

Entering edit mode

DESeq2 does check this, and will give an error with a mis-match:

> dds <- DESeqDataSetFromMatrix(
  matrix(1:16,ncol=4,
         dimnames=list(1:4,letters[1:4])),
  data.frame(x=factor(c(1,1,2,2)),
             row.names=letters[c(4,1,2,3)]),
  ~x)
Error in DESeqDataSetFromMatrix: 
  rownames of the colData:
   d,a,b,c 
  are not in the same order as the colnames of the countData:
   a,b,c,d

ADD REPLY • link 5.9 years ago Michael Love 41k

0

Entering edit mode

But that's weird because I did not get that. And the rownames and colnames were different.

> colnames(all_cell_lines_ordered[,-c(1:7)],)
 [1] "IOSE11"    "IOSE4"     "FT246"     "FT33"      "EEC16"     "EFO27"     "GTFR230"   "MCAS"      "OAW42"     "VOA1056"   "CaOV3"     "HEY"       "UWB1_289" 
[14] "Kuramochi" "ES2"       "JHOC5"     "RMGII"     "BT549"     "MCF10A"    "MCF7"     
> rownames(MetaData_File)
 [1] "1"  "2"  "3"  "4"  "5"  "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20"
> dds <- DESeqDataSetFromMatrix(all_cell_lines_ordered[,-c(1:7)],MetaData_File,~groups2)
Warning message:
In DESeqDataSet(se, design = design, ignoreRank) :
  some variables in design formula are characters, converting to factors

ADD REPLY • link 5.9 years ago Tiago Chedraoui Silva ▴ 260

1

Entering edit mode

Like SummarizedExperiment, it only works (you get a useful error) if you have rownames on the colData.

Neither DESeq2 nor SummarizedExperiment will complain if they don't exist. DESeq2 will give the SummarizedExperiment error ("assay colnames() must be NULL or equal colData rownames()") if the rownames exist and are entirely different. This is because it calls SummarizedExperiment directly.

DESeq2 will give the above extra useful error if the rownames exist and are the same values but in a wrong order.

ADD REPLY • link 5.9 years ago Michael Love 41k

0

Entering edit mode

Thank you very much for the help!

ADD REPLY • link 5.9 years ago Tiago Chedraoui Silva ▴ 260

0

Entering edit mode

I added the data here: https://drive.google.com/drive/folders/1p5CDMR2OOMCbT2WXRNtNRUrv4f3FhCNo?usp=sharing

ADD REPLY • link 5.9 years ago Tiago Chedraoui Silva ▴ 260