Search
Question: RNA-seq time course experiment - DESEq2 Contrasts
0
gravatar for Maithê Barros
12 weeks ago by
Maithê Barros0 wrote:

Hello,

I am doing RNA-seq analysis of DE genes from mare's endometrial biopsies. It is a timecourse experiment with samples at 0h, 24h and 48h across three different stages (follicular, luteal and anoestrous). I am having a hard time trying to figure out the design I should use and also having trouble when using contrast to return the results. 

My metadata looks like this:

sample,timepoint,horse,phase,phaseANDtimepoint
sample_1.bam,0h,1A,follicular,follicular_0h
sample_2.bam,24h,1A,follicular,follicular_24h
sample_3.bam,48h,1A,follicular,follicular_48h
sample_4.bam,0h,1B,follicular,follicular_0h
sample_6.bam,48h,1B,follicular,follicular_48h
sample_7.bam,0h,2C,follicular,follicular_0h
sample_8.bam,24h,2C,follicular,follicular_24h
sample_9.bam,48h,2C,follicular,follicular_48h
sample_10.bam,0h,3B,follicular,follicular_0h
sample_11.bam,24h,3B,follicular,follicular_24h
sample_12.bam,48h,3B,follicular,follicular_48h
sample_13.bam,0h,4B,follicular,follicular_0h
sample_14.bam,24h,4B,follicular,follicular_24h
sample_15.bam,48h,4B,follicular,follicular_48h
sample_16.bam,0h,4C,follicular,follicular_0h
sample_17.bam,24h,4C,follicular,follicular_24h
sample_18.bam,48h,4C,follicular,follicular_48h
sample_19.bam,0h,8B,luteal,luteal_0h
sample_20.bam,24h,8B,luteal,luteal_24h
sample_21.bam,48h,8B,luteal,luteal_48h
sample_22.bam,0h,8C,luteal,luteal_0h
sample_23.bam,24h,8C,luteal,luteal_24h
sample_24.bam,48h,8C,luteal,luteal_48h
sample_25.bam,0h,9B,luteal,luteal_0h
sample_26.bam,24h,9B,luteal,luteal_24h
sample_27.bam,48h,9B,luteal,luteal_48h
sample_28.bam,0h,9C,luteal,luteal_0h
sample_29.bam,24h,9C,luteal,luteal_24h
sample_30.bam,48h,9C,luteal,luteal_48h
sample_31.bam,0h,11B,luteal,luteal_0h
sample_32.bam,24h,11B,luteal,luteal_24h
sample_33.bam,48h,11B,luteal,luteal_48h
sample_34.bam,0h,11D,luteal,luteal_0h
sample_35.bam,24h,11D,luteal,luteal_24h
sample_36.bam,48h,11D,luteal,luteal_48h
sample_37.bam,0h,12A,anoestrous,anoestrous_0h
sample_38.bam,24h,12A,anoestrous,anoestrous_24h
sample_39.bam,48h,12A,anoestrous,anoestrous_48h
sample_40.bam,0h,12B,anoestrous,anoestrous_0h
sample_41.bam,24h,12B,anoestrous,anoestrous_24h
sample_42.bam,48h,12B,anoestrous,anoestrous_48h
sample_43.bam,0h,12D,anoestrous,anoestrous_0h
sample_44.bam,24h,12D,anoestrous,anoestrous_24h
sample_45.bam,48h,12D,anoestrous,anoestrous_48h
sample_46.bam,0h,13A,anoestrous,anoestrous_0h
sample_47.bam,24h,13A,anoestrous,anoestrous_24h
sample_48.bam,48h,13A,anoestrous,anoestrous_48h
sample_49.bam,0h,13B,anoestrous,anoestrous_0h
sample_50.bam,24h,13B,anoestrous,anoestrous_24h
sample_51.bam,48h,13B,anoestrous,anoestrous_48h
sample_52.bam,0h,13C,anoestrous,anoestrous_0h
sample_53.bam,24h,13C,anoestrous,anoestrous_24h
sample_54.bam,48h,13C,anoestrous,anoestrous_48h

 

At first I did not have the last column (phaseANDtimepoint) but I had to add since I want to retrieve the following comparisons/results:

1) Follicular vs Luteal at 0h, at 24h and at 48h

2) Luteal vs Anoestrous  at 0h, at 24h and at 48h

3) Follicular vs Anoestrous at 0h, at 24h and at 48h

My code looks like: 

library(GenomicFeatures)
txdb_exp2 <- makeTxDbFromGFF("Equus_caballus.EquCab2.89.gtf", format="gtf")
ebg_exp2 <- exonsBy(txdb_exp2, by="gene")
se_exp2 <- summarizeOverlaps(features=ebg_exp2, reads=bamfiles,
                             mode="Union",
                             singleEnd=FALSE,
                             ignore.strand=TRUE,
                             fragments=TRUE )
dim(se_exp2)
assayNames(se_exp2)
colSums(assay(se_exp2))
rowRanges(se_exp2)
sampleTableExp2 <- read.csv("exp2_metadata.csv")
colData(se_exp2) <- DataFrame(sampleTableExp2)
library(DESeq2)

at first, I have done:

dds_exp2 <- DESeqDataSet(se_exp2, design = ~ horse + timepoint)
dds_exp2 <- DESeq(dds_exp2)
resultsNames(dds_exp2)

but when trying to get the results between phases at a certain time point:

> res_exp2_follicular0h_luteal0h <- results(dds_exp2, contrast=c("phaseANDtimepoint", "follicular_0h", "luteal_0h"))
Error in cleanContrast(object, contrast, expanded = isExpanded, listValues = listValues,  : 
  follicular_0h and luteal_0h should be levels of phaseANDtimepoint such that phaseANDtimepoint_follicular_0h_vs_anoestrous_0h and phaseANDtimepoint_luteal_0h_vs_anoestrous_0h are contained in 'resultsNames(object)'

then I realised I had to include in the design the 'phaseANDtimepoint' info and I tried:

dds_exp2_phaseANDtimepoint <- DESeqDataSet(se_exp2, design = ~ horse + phaseANDtimepoint)
dds_exp2_phaseANDtimepoint <- DESeq(dds_exp2_phaseANDtimepoint)
resultsNames(dds_exp2_phaseANDtimepoint)
> dds_exp2_phaseANDtimepoint <- DESeqDataSet(se_exp2, design = ~ horse + phaseANDtimepoint)
Error in checkFullRank(modelMatrix) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.
Please read the vignette section 'Model matrix not full rank': vignette('DESeq2')

I do understand that my phaseANDtimepoint column repeats the info contained in the other columns, however I cannot get rid of them - the "phase" and "timepoint" columns from the metadata are needed to retrieve the PCA graph plotting my data the way I need. 

Ps.: I also thought about using the LRT analysis

ddsLRT_exp2 <- DESeqDataSet(se_exp2, design = ~ horse + timepoint)
ddsLRT_exp2 <- estimateSizeFactors(ddsLRT_exp2)
ddsLRT_exp2 <- estimateDispersions(ddsLRT_exp2)
ddsLRT_exp2 <- nbinomLRT(ddsLRT_exp2, reduced = ~ horse)
resLRT_exp2 <- results(ddsLRT_exp2)
summary(resLRT_exp2)
write.csv(resLRT_exp2, "resLRT_exp2.csv")

OR 

ddsLRT_phaseANDtimepoint_exp2 <- DESeqDataSet(se_exp2, design = ~ horse + phaseANDtimepoint)
ddsLRT_phaseANDtimepoint_exp2 <- estimateSizeFactors(ddsLRT_phaseANDtimepoint_exp2)
ddsLRT_phaseANDtimepoint_exp2 <- estimateDispersions(ddsLRT_phaseANDtimepoint_exp2)
ddsLRT_phaseANDtimepoint_exp2 <- nbinomLRT(ddsLRT_phaseANDtimepoint_exp2, reduced = ~ horse)
resLRT_phaseANDtimepoint_exp2 <- results(ddsLRT_phaseANDtimepoint_exp2)
summary(resLRT_phaseANDtimepoint_exp2)
write.csv(resLRT_phaseANDtimepoint_exp2, "resLRT_phaseANDtimepoint_exp2.csv")

 

but it did not work either and I would have only one result. Since I want to have different results comparing the phases at the different timepoints I do not think that is the best option, is it?

Many thanks!

 

 

ADD COMMENTlink modified 12 weeks ago by Michael Love20k • written 12 weeks ago by Maithê Barros0
0
gravatar for Michael Love
12 weeks ago by
Michael Love20k
United States
Michael Love20k wrote:

The second most recent question is very related:

A: DESeq2 design nested or time course

You cannot control for individual (in this case horse) and compare across condition for the same time point.

If you want to compare across condition you have to drop horse from the design, or you can instead change your questions to look at change over time, compared to the first time point, across condition, as we discuss in the other recent thread.

 

ADD COMMENTlink modified 12 weeks ago • written 12 weeks ago by Michael Love20k

I have changed my metadata to:

sample,horse,phaseANDtimepoint
sample_1.bam,1A,follicular_0h
sample_2.bam,1A,follicular_24h
sample_3.bam,1A,follicular_48h
sample_4.bam,1B,follicular_0h
sample_6.bam,1B,follicular_48h
sample_7.bam,2C,follicular_0h
sample_8.bam,2C,follicular_24h
sample_9.bam,2C,follicular_48h
sample_10.bam,3B,follicular_0h
sample_11.bam,3B,follicular_24h
sample_12.bam,3B,follicular_48h
sample_13.bam,4B,follicular_0h
sample_14.bam,4B,follicular_24h
sample_15.bam,4B,follicular_48h
sample_16.bam,4C,follicular_0h
sample_17.bam,4C,follicular_24h
sample_18.bam,4C,follicular_48h
sample_19.bam,8B,luteal_0h
sample_20.bam,8B,luteal_24h
sample_21.bam,8B,luteal_48h
sample_22.bam,8C,luteal_0h
sample_23.bam,8C,luteal_24h
sample_24.bam,8C,luteal_48h
sample_25.bam,9B,luteal_0h
sample_26.bam,9B,luteal_24h
sample_27.bam,9B,luteal_48h
sample_28.bam,9C,luteal_0h
sample_29.bam,9C,luteal_24h
sample_30.bam,9C,luteal_48h
sample_31.bam,11B,luteal_0h
sample_32.bam,11B,luteal_24h
sample_33.bam,11B,luteal_48h
sample_34.bam,11D,luteal_0h
sample_35.bam,11D,luteal_24h
sample_36.bam,11D,luteal_48h
sample_37.bam,12A,anoestrous_0h
sample_38.bam,12A,anoestrous_24h
sample_39.bam,12A,anoestrous_48h
sample_40.bam,12B,anoestrous_0h
sample_41.bam,12B,anoestrous_24h
sample_42.bam,12B,anoestrous_48h
sample_43.bam,12D,anoestrous_0h
sample_44.bam,12D,anoestrous_24h
sample_45.bam,12D,anoestrous_48h
sample_46.bam,13A,anoestrous_0h
sample_47.bam,13A,anoestrous_24h
sample_48.bam,13A,anoestrous_48h
sample_49.bam,13B,anoestrous_0h
sample_50.bam,13B,anoestrous_24h
sample_51.bam,13B,anoestrous_48h
sample_52.bam,13C,anoestrous_0h
sample_53.bam,13C,anoestrous_24h
sample_54.bam,13C,anoestrous_48h

and was trying to do the following which is the same as you advised me to do when I posted a question the other day (https://support.bioconductor.org/p/113148/)

dds_exp2 <- DESeqDataSet(se_exp2, design = ~ horse + phaseANDtimepoint)
dds_exp2 <- DESeq(dds_exp2)
resultsNames(dds_exp2)

Then I would do contrasts like:

res_exp2_follicular0h_luteal0h <- results(dds_exp2, contrast=c("phaseANDtimepoint", "follicular_0h", "luteal_0h"))
res_exp2_follicular0h_anoestrousl0h <- results(dds_exp2, contrast=c("phaseANDtimepoint", "follicular_0h", "anoestrous_0h"))
res_exp2_luteal0h_anoestrous0h <- results(dds_exp2, contrast=c("phaseANDtimepoint", "luteal_0h", "anoestrous_0h"))

but I keep receiving the following warning:

Error in checkFullRank(modelMatrix) : 
  the model matrix is not full rank, so the model cannot be fit as specified.
  One or more variables or interaction terms in the design formula are linear
  combinations of the others and must be removed.

I was reading more about it and reading other questions here on the website and I think I spotted the problem: I had to exclude sample 5 from my analysis because something went wrong during the RNA-seq and we could not afford to rerun it!

Therefore one of my biological replicates (horse 1B) has only samples at 0h and 48h (and does not have at 24h!). Do you think that can be the problem? If so, is there something I can do to run the analysis excluding only this samples? I am not willing to exclude this biological replicate from the analysis (we only have n=6 for each phase and we do not want to go down to 5 horses). If I exclude one horse from the "follicular" group I will have to exclude one from the "luteal" and one from the "anoestrous" phase too and that would be a problem for me!

Many thanks for your reply and for your time, Michael! You are always so helpful!

 

 

 
ADD REPLYlink written 12 weeks ago by Maithê Barros0

My answer above still stands. You are trying to control for horse and compare directly across groups. These are confounded.

ADD REPLYlink written 12 weeks ago by Michael Love20k

Many thanks for your reply, Michael. I had to step back from this analysis, but now I am back to it.

If I perform the analysis without controlling for horses, would that be correct? Taking into account that each group (follicular, luteal and anoestrous) have different horses?!

If so, would the correct design be:

dds_exp2 <- DESeqDataSet(se_exp2, design = ~ timepoint + phase)
dds_exp2 <- DESeq(dds_exp2)
resultsNames(dds_exp2)

?

 

ADD REPLYlink written 8 weeks ago by Maithê Barros0

I'm not suggesting to not control for horse, I'm saying that you can't control for horse with fixed effects and compare across two groups with different horses.

There is a different approach entirely in the limma-voom framework, which models correlations among samples, called duplicateCorrelation(). You may want to try that approach instead if you need to compare across groups where the groups have entirely different subjects, and you want to control for the subject baseline.

ADD REPLYlink written 8 weeks ago by Michael Love20k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 330 users visited in the last hour