I am doing RNA-seq analysis of DE genes from mare's endometrial biopsies. It is a timecourse experiment with samples at 0h, 24h and 48h across three different stages (follicular, luteal and anoestrous). I am having a hard time trying to figure out the design I should use and also having trouble when using contrast to return the results.
My metadata looks like this:
sample,timepoint,horse,phase,phaseANDtimepoint sample_1.bam,0h,1A,follicular,follicular_0h sample_2.bam,24h,1A,follicular,follicular_24h sample_3.bam,48h,1A,follicular,follicular_48h sample_4.bam,0h,1B,follicular,follicular_0h sample_6.bam,48h,1B,follicular,follicular_48h sample_7.bam,0h,2C,follicular,follicular_0h sample_8.bam,24h,2C,follicular,follicular_24h sample_9.bam,48h,2C,follicular,follicular_48h sample_10.bam,0h,3B,follicular,follicular_0h sample_11.bam,24h,3B,follicular,follicular_24h sample_12.bam,48h,3B,follicular,follicular_48h sample_13.bam,0h,4B,follicular,follicular_0h sample_14.bam,24h,4B,follicular,follicular_24h sample_15.bam,48h,4B,follicular,follicular_48h sample_16.bam,0h,4C,follicular,follicular_0h sample_17.bam,24h,4C,follicular,follicular_24h sample_18.bam,48h,4C,follicular,follicular_48h sample_19.bam,0h,8B,luteal,luteal_0h sample_20.bam,24h,8B,luteal,luteal_24h sample_21.bam,48h,8B,luteal,luteal_48h sample_22.bam,0h,8C,luteal,luteal_0h sample_23.bam,24h,8C,luteal,luteal_24h sample_24.bam,48h,8C,luteal,luteal_48h sample_25.bam,0h,9B,luteal,luteal_0h sample_26.bam,24h,9B,luteal,luteal_24h sample_27.bam,48h,9B,luteal,luteal_48h sample_28.bam,0h,9C,luteal,luteal_0h sample_29.bam,24h,9C,luteal,luteal_24h sample_30.bam,48h,9C,luteal,luteal_48h sample_31.bam,0h,11B,luteal,luteal_0h sample_32.bam,24h,11B,luteal,luteal_24h sample_33.bam,48h,11B,luteal,luteal_48h sample_34.bam,0h,11D,luteal,luteal_0h sample_35.bam,24h,11D,luteal,luteal_24h sample_36.bam,48h,11D,luteal,luteal_48h sample_37.bam,0h,12A,anoestrous,anoestrous_0h sample_38.bam,24h,12A,anoestrous,anoestrous_24h sample_39.bam,48h,12A,anoestrous,anoestrous_48h sample_40.bam,0h,12B,anoestrous,anoestrous_0h sample_41.bam,24h,12B,anoestrous,anoestrous_24h sample_42.bam,48h,12B,anoestrous,anoestrous_48h sample_43.bam,0h,12D,anoestrous,anoestrous_0h sample_44.bam,24h,12D,anoestrous,anoestrous_24h sample_45.bam,48h,12D,anoestrous,anoestrous_48h sample_46.bam,0h,13A,anoestrous,anoestrous_0h sample_47.bam,24h,13A,anoestrous,anoestrous_24h sample_48.bam,48h,13A,anoestrous,anoestrous_48h sample_49.bam,0h,13B,anoestrous,anoestrous_0h sample_50.bam,24h,13B,anoestrous,anoestrous_24h sample_51.bam,48h,13B,anoestrous,anoestrous_48h sample_52.bam,0h,13C,anoestrous,anoestrous_0h sample_53.bam,24h,13C,anoestrous,anoestrous_24h sample_54.bam,48h,13C,anoestrous,anoestrous_48h
At first I did not have the last column (phaseANDtimepoint) but I had to add since I want to retrieve the following comparisons/results:
1) Follicular vs Luteal at 0h, at 24h and at 48h
2) Luteal vs Anoestrous at 0h, at 24h and at 48h
3) Follicular vs Anoestrous at 0h, at 24h and at 48h
My code looks like:
library(GenomicFeatures) txdb_exp2 <- makeTxDbFromGFF("Equus_caballus.EquCab2.89.gtf", format="gtf") ebg_exp2 <- exonsBy(txdb_exp2, by="gene") se_exp2 <- summarizeOverlaps(features=ebg_exp2, reads=bamfiles, mode="Union", singleEnd=FALSE, ignore.strand=TRUE, fragments=TRUE ) dim(se_exp2) assayNames(se_exp2) colSums(assay(se_exp2)) rowRanges(se_exp2) sampleTableExp2 <- read.csv("exp2_metadata.csv") colData(se_exp2) <- DataFrame(sampleTableExp2) library(DESeq2)
at first, I have done:
dds_exp2 <- DESeqDataSet(se_exp2, design = ~ horse + timepoint) dds_exp2 <- DESeq(dds_exp2) resultsNames(dds_exp2)
but when trying to get the results between phases at a certain time point:
> res_exp2_follicular0h_luteal0h <- results(dds_exp2, contrast=c("phaseANDtimepoint", "follicular_0h", "luteal_0h")) Error in cleanContrast(object, contrast, expanded = isExpanded, listValues = listValues, : follicular_0h and luteal_0h should be levels of phaseANDtimepoint such that phaseANDtimepoint_follicular_0h_vs_anoestrous_0h and phaseANDtimepoint_luteal_0h_vs_anoestrous_0h are contained in 'resultsNames(object)'
then I realised I had to include in the design the 'phaseANDtimepoint' info and I tried:
dds_exp2_phaseANDtimepoint <- DESeqDataSet(se_exp2, design = ~ horse + phaseANDtimepoint) dds_exp2_phaseANDtimepoint <- DESeq(dds_exp2_phaseANDtimepoint) resultsNames(dds_exp2_phaseANDtimepoint)
> dds_exp2_phaseANDtimepoint <- DESeqDataSet(se_exp2, design = ~ horse + phaseANDtimepoint) Error in checkFullRank(modelMatrix) : the model matrix is not full rank, so the model cannot be fit as specified. One or more variables or interaction terms in the design formula are linear combinations of the others and must be removed. Please read the vignette section 'Model matrix not full rank': vignette('DESeq2')
I do understand that my phaseANDtimepoint column repeats the info contained in the other columns, however I cannot get rid of them - the "phase" and "timepoint" columns from the metadata are needed to retrieve the PCA graph plotting my data the way I need.
Ps.: I also thought about using the LRT analysis
ddsLRT_exp2 <- DESeqDataSet(se_exp2, design = ~ horse + timepoint) ddsLRT_exp2 <- estimateSizeFactors(ddsLRT_exp2) ddsLRT_exp2 <- estimateDispersions(ddsLRT_exp2) ddsLRT_exp2 <- nbinomLRT(ddsLRT_exp2, reduced = ~ horse) resLRT_exp2 <- results(ddsLRT_exp2) summary(resLRT_exp2) write.csv(resLRT_exp2, "resLRT_exp2.csv")
ddsLRT_phaseANDtimepoint_exp2 <- DESeqDataSet(se_exp2, design = ~ horse + phaseANDtimepoint) ddsLRT_phaseANDtimepoint_exp2 <- estimateSizeFactors(ddsLRT_phaseANDtimepoint_exp2) ddsLRT_phaseANDtimepoint_exp2 <- estimateDispersions(ddsLRT_phaseANDtimepoint_exp2) ddsLRT_phaseANDtimepoint_exp2 <- nbinomLRT(ddsLRT_phaseANDtimepoint_exp2, reduced = ~ horse) resLRT_phaseANDtimepoint_exp2 <- results(ddsLRT_phaseANDtimepoint_exp2) summary(resLRT_phaseANDtimepoint_exp2) write.csv(resLRT_phaseANDtimepoint_exp2, "resLRT_phaseANDtimepoint_exp2.csv")
but it did not work either and I would have only one result. Since I want to have different results comparing the phases at the different timepoints I do not think that is the best option, is it?