Question: differential expression analysis using deseq2
0
gravatar for pokharel.kisun
3.9 years ago by
pokharel.kisun0 wrote:

Hi,

I am doing DE analysis on three breed groups (FS, TX, F1) with two diet conditions (C, F, where C is the normal diet and F is the additional diet). I want to find DE genes between breeds (FS vs TX, FS vs F1 and TX vs F1) both with diet effect and without diet effect. In addition, I want to find that are differentially expressed within breed due to diet (FSc vs FSf, TXc vs TXf, F1c vs F1f). I designed the experiment like this:

sampleFiles<-list.files(directory, pattern = "*_counts")
sampleinfo<-read.csv("SampleTable_mRNA_all.csv", header=TRUE)
row.names(sampleinfo)
sample<-sampleinfo$Sample
breed<-sampleinfo$Breed
diet<-sampleinfo$Diet
id<-sampleinfo$SampleID
sampletable<-data.frame(sampleName=sample, fileName=sampleFiles,breed=breed, diet=diet, id=id)
group<-factor(paste(breed, diet, sep="."))
sampletable<-cbind(sampletable, group=group)
ddsMat<-DESeqDataSetFromHTSeqCount(sampleTable = sampletable, directory = directory, design=~group)
ddsMatcol<-collapseReplicates(ddsMat, sampletable$id)
dds<-DESeq(ddsMatcol)
res<-results(dds)

Then I made comparisions like this:

For example, within-breed diet effect on TX:
dTX_Diet<-results(dds, contrast=c("group", "TX.F", "TX.C"))
dTX_Diet_sig<-subset(dTX_Diet, padj< 0.05)

similarly, for identifying DE genes between FS and TX due to diet effect

dFS_TX<-results(dds, contrast=list(c("groupFS.F", "groupTX.C"), c("groupFS.C", "groupTX.F")))
dFS_TX_sig<-subset(dFS_TX, padj<0.05)

Could anybody please tell me if I would rather need to use LRT method?

In addition, I am interested to find DE genes in FS vs TX, FS vs F1 and TX vs F1 without including diet factor, in this case, how can I exclude the diet effect or the only way is to compare is that between FSc vs TXc and so on? 

Kisun

 

ADD COMMENTlink modified 3.9 years ago by Michael Love26k • written 3.9 years ago by pokharel.kisun0
Answer: differential expression analysis using deseq2
2
gravatar for Michael Love
3.9 years ago by
Michael Love26k
United States
Michael Love26k wrote:

hi Kisun,

The way you have set up the design is good for comparing pairs of groups, for example, comparing within breed F vs C is correct as you have it.

Also the difference of differences contrast looks correct to me, testing the diet effect across groups, (FS.F - FS.C) - (TX.F - TX.C), by rearranging to: FS.F + TX.C - (FS.C + TX.F).

In DESeq2, we use Wald tests for such contrasts, so your current approach is what I would recommend.

If you wanted to find DE genes across groups controlling for diet but not specifically testing across groups within one diet, then I would run DESeq() with a separate design:

~ diet + breed

And then test on breed, using, e.g. contrast=c("breed","X","Y").

The reason I would re-run with this design, is that typically, if I read that a model "controlled for diet" I would assume that it involves less terms than your former design. I would assume that you meant that the diet effect was modeled as a single multiplicative fold change for each gene, but that it does not interact with breed group.

ADD COMMENTlink written 3.9 years ago by Michael Love26k

Thanks a lot Mike. I am still confused with the later design... I found around 200 genes that are differentially expressed between FS and TX, can't there be some genes which were differentially expressed due to diet? One of the breed is responsive to diet while the other is not. I have 10 samples (4 normal diet + 6 flushing diet) from FS and 11 samples (5 normal diet + 6 flushing diet) from TX. Within breed diet effect showed that FS is not responsive to diet (i.e 0 de genes) whereas TX is responsive to diet (many de genes). Can I interpret results like this? Let's say,

FS vs TX (without considering diet effect) = 200

FS vs TX (taking diet as second factor) = 40

In order to find DE genes between breeds, I will simply take out those genes that were present in 40 from 200. It might not be the case that all 40 are also present in 200 but in a way it will be 200 - 40. Do you think this is a reasonable way to find DE genes between breeds that has nothing to do with diet conditions?

ADD REPLYlink modified 3.9 years ago • written 3.9 years ago by pokharel.kisun0
Ah I see now. Ignore my second design suggestion above. This sounds like you should use the first design and test FSc vs TXc.
ADD REPLYlink written 3.9 years ago by Michael Love26k

Hello Mike, I got the results but unable to figure out what the results are really trying to tell. The first comparision shows that there are very few differences between breeds due to diet differences and also the within-breed comparision in FS shows there is not diet effect in FS. However, FS.F vs TX.F shows that quite many genes are differentially expressed which included 117 genes upregulated in FS. I would be grateful if you could help me explain this:

FSvsTX(diet) = (FS.F - FS.C) - (TX.F - TX.C) = 35 (4up, 31down)

FS(diet) = FS.F - FS.C = 0

TX(diet) = TX.F - TX.C = 117 (71up, 46down)

Similarly, 

FSvsTX(Flu) = FS.F - TX.F = 621 (117 up, 504 down)

FSvsTX(Con) = FS.C - TX.C = 3 (1up, 2 down)

Thanks!

ADD REPLYlink written 3.9 years ago by pokharel.kisun0

I'd suggest you speak to someone locally for further assistance in interpreting results.

While I'm available as a software developer for supporting the software (if there are bugs, or if the documentation is not clear), and additionally to explain how to set up design formula and which contrasts can be used for various comparisons, the interpretations of results must be left to the analyst.

ADD REPLYlink written 3.9 years ago by Michael Love26k

Hi Mike, finally I have been able to understand those results. Would it be possible to make a PCA plot for the subsets such as FS.F and TX.F? I am particularly interested to see the clustering for contrasts with many DE genes.

ADD REPLYlink written 3.8 years ago by pokharel.kisun0
1

Yes just subset the dds using column indexing:

dds2 <- dds[ , dds$group == ... ]

And then continue with the PCA plot instructions in the vignette or workflow.

ADD REPLYlink written 3.8 years ago by Michael Love26k

Thanks! I slightly modified to include more than one groups.

dds_fFS_TX <- dds[ , dds$group %in% c("FS.F" ,"TX.F")]
ADD REPLYlink modified 3.8 years ago • written 3.8 years ago by pokharel.kisun0

Hi Mike, 

Has there been any change/update in DESeq2 recently because of which I am unable to set up the contrast I mentioned in my question? 

dFS_TX<-results(dds, contrast=list(c("groupFS.F", "groupTX.C"), c("groupFS.C", "groupTX.F")))

It was working fine until about a month ago and suddenly I get error 

Error in checkContrast(contrast, resNames) :
  all elements of the contrast as a list of length 2 should be elements of 'resultsNames(object) 

Do you have any idea why I am getting this error or any solution to this? Thanks a lot. Kisun

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by pokharel.kisun0
1

hi,

Yes, we made an update to the default DESeq() steps regarding the way the coefficients are estimated. Any big change like this we put into the NEWS file, which you can find under Documentation on the package landing page, and I wrote a support post about it:

New function lfcShrink() in DESeq2

The short answer is: you can reproduce your old pipeline by running DESeq(dds, betaPrior=TRUE).

The longer answer is: for this comparison: whether the F/C ratio is different across FS and TX, I would probably set up an interaction model (where group is FS or TX and condition is F or C). It's in theory equivalent to your old code, although slightly different formulation and so don't expect identical results table, but this would be the way to do it using the new steps (edit: this last line won't work with the current release...)

dds <- DESeqDataSet(se, design=~group + condition + group:condition)
dds <- DESeq(dds)
res <- results(dds)
res <- lfcShrink(dds, coef=4, res=res)

 

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by Michael Love26k
dds<-DESeqDataSetFromMatrix(countData=geneCounts, colData=sampledata, design = ~Breed+Diet+Breed:Diet)
dds<-DESeq(dds, parallel = T)
res<-results(dds)
res<-lfcShrink(dds, coef = 4, res = res)
Error in designAndArgChecker(object, betaPrior) : 
  betaPrior=FALSE should be used for designs with interactions

colData looks like this:

DataFrame with 21 rows and 4 columns
          sampleid    Breed     Diet sizeFactor
          <factor> <factor> <factor>  <numeric>
TXC_1033      1033       TX        C  1.0770786
TXC_302        302       TX        C  1.1150486

 

I am using

R version 3.4.0 (2017-04-21) and DESeq2_1.16.1  

Do you know why I am getting this error?

 

ADD REPLYlink modified 2.5 years ago • written 2.5 years ago by pokharel.kisun0

Thanks for this report. Sorry, I forgot that I hadn't implemented use of lfcShrink on interaction terms in the current release. I hope to have this in the next release by October.

So then your options are (1) to use your existing code but just add betaPrior=TRUE when you call DESeq(), or (2) to omit the lfcShrink() line in the above code, and so the tests, p-values are fine but the LFC are the maximum likelihood estimates, not the shrunken ones.

ADD REPLYlink written 2.5 years ago by Michael Love26k

I will go with first option for now. :) 

ADD REPLYlink written 2.5 years ago by pokharel.kisun0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 139 users visited in the last hour