I have created a multifactor model with 3 factors: patient(patients A to I, to account for biological differences), condition(levels "Other_CD4", "VbOther" and "Clone", in that order) and timepoint (T1 and T2).
The comparisons I am interested in are really "Other CD4" to "VbOther" and "VbOther" to "Clone", as each represents a different stage of cellular differentiation.
I made a dataframe with all the factors in and performed differential expression (dds). I also made a PCA which did not look like there were any differences in timepoint between samples from T1 and T2, in all conditions.
I made a model with the design ~patient + condition + timepoint (dds), and the reduced design ~patient + condition (dds_reduced).
The LRT suggested that there are no genes have a padj < 0.1 in the additive model, suggesting to me that timepoint is not significant.
An example of the LRT output is:
>results_dds
log2 fold change (MLE): timepoint T2 vs T1
LRT p-value: '~ patient + condition + timepoint' vs '~ patient + condition'
DataFrame with 18516 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue padj
<numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
A1BG 96.891126 -0.0251046 0.250933 0.00990849 0.920708 1
A1BG-AS1 125.388293 -0.1924045 0.293704 0.43214358 0.510939 1
A2M 1.614425 -0.0614295 0.995438 0.00349552 0.952854 1
A2M-AS1 37.412347 -0.4648244 0.344586 1.79961673 0.179759 1
A2ML1 0.936242 1.0015319 3.732651 0.10950344 0.740710 1
... ... ... ... ... ... ...
ZYG11A 2.02023 -0.2401266 0.9370991 0.0727242 0.787411 1
ZYG11B 858.54543 -0.0565903 0.0940639 0.3585213 0.549329 1
ZYX 4428.36880 0.0223657 0.1067015 0.0437798 0.834264 1
ZZEF1 5559.62750 -0.0890766 0.0607072 2.1439690 0.143131 1
ZZZ3 965.87893 -0.0168050 0.0704468 0.0567291 0.811742 1
I cannot however seem to extract the LFC and padj for the comparison of conditions in dds_reduced, eg between Other_CD4 and VBOther.
Having demonstrated as above that timepoint is not important, is it statistically acceptable to recreate the table, but without timepoint as a factor eg:
DESeq2Table2 <- DESeqDataSetFromHTSeqCount(sampleTable = mysampleTable, directory = "H:/R/htseq_counts_experiment_5/", design = ~patient + condition, ignoreRank = FALSE)
dds_simple <- DESeq(DESeq2Table2)
or is there some way I can extract the original LFC values from dds_reduced?
Thank you!
library(tidyverse)
library(DESeq2)
library("RColorBrewer")
library("gplots")
#this sets the working directory
setwd("H:/R/htseq_counts_experiment_5/")
#note- the files all have a 0 in the header line- this needs manually removing before the table can be made
#This uses a metadata file with sample info
metadata_5 <- read.csv("H:/R/htseq_counts_experiment_5/metadata_5.csv")
metadata_5 <-as.data.frame(metadata_5)
#column1 name comes out bit weird- this changes the name
names(metadata_5)[1] <- "SampleName"
#this adds in a column to say what the file name is in the metadata file
metadata_5 = mutate(metadata_5, countFile = paste0(metadata_5$SampleName, ".stranded-counts.txt"))
metadata_5 <- lapply(metadata_5, as.character)
#this works and makes the sample table
mysampleTable <- data.frame(sampleName = metadata_5$SampleName, fileName = metadata_5$countFile, condition = metadata_5$Condition, patient = metadata_5$Subject, timepoint = metadata_5$Timepoint)
view(mysampleTable)
#this is the design
additive.model <- as.formula(~ patient + condition + timepoint)
#this makes the DESeq data set Remember to change the directory here!
DESeq2Table <- DESeqDataSetFromHTSeqCount(sampleTable = mysampleTable, directory = "H:/R/htseq_counts_experiment_5/", design = ~patient + condition + timepoint, ignoreRank = FALSE)
#this sets the factor levels
DESeq2Table$condition <- factor(DESeq2Table$condition,
levels=c("Other_CD4", "VbOther","Clone"))
#this shows the size of the table
dim(DESeq2Table)
#this line gets rid of anything with 5 reads or less
keep <- rowSums(counts(DESeq2Table)) > 5
DESeq2Table <- DESeq2Table[keep,]
#this shows the size of the table after removal of low counts
dim(DESeq2Table)
#this is the differential expression analysis and tells you the columns and the design
dds <- DESeq(DESeq2Table)
results_dds <-results(dds)
res.Other_CD4.VbOther <- results(dds, contrast=c("condition","VbOther","Other_CD4"))
res.Other_CD4.Clone <- results(dds, contrast=c("condition","Clone","Other_CD4"))
res.VbOther.Clone <- results(dds, contrast=c("condition","Clone","VbOther"))
#this shows the numbers of genes upregulated and downregulated
summary(res.VbOther.Clone)
summary(res.Other_CD4.VbOther)
sum ( res.Other_CD4.VbOther$padj < 0.1, na.rm=TRUE)
#the output is 7418 (this is the some of the up and down regulated in the above summary)
sum (res.VbOther.Clone$padj < 0.1, na.rm=TRUE)
#the output is 2917
#to make a reduced forumla we want to make a nbinomLRT. The results table shows the difference between the additive model and the reduced model.
reduced.model <- as.formula(~patient + condition)
ddsreduced <- DESeq(dds, test="LRT", reduced= reduced.model)
res_dds_reduced <-results(ddsreduced)
res_dds_reduced
#note the p-value is the difference between the reduced model and the full model
resLRT.Other_CD4.VbOther <- results(ddsreduced, contrast=c("condition","VbOther","Other_CD4"))
resLRT.Other_CD4.Clone <- results(ddsreduced, contrast=c("condition","Clone","Other_CD4"))
resLRT.VbOther.Clone <- results(ddsreduced, contrast=c("condition","Clone","VbOther"))
#this shows the numbers of genes upregulated and downregulated
summary(resLRT.VbOther.Clone)
summary(resLRT.Other_CD4.VbOther)
#this suggests there are no differences in the groups between T1 and T2 for any genes in any combination of groups
#this makes a new data set Remember to change the directory here!
DESeq2Table2 <- DESeqDataSetFromHTSeqCount(sampleTable = mysampleTable, directory = "H:/R/htseq_counts_experiment_5/", design = ~patient + condition, ignoreRank = FALSE)
#this sets the factor levels
DESeq2Table2$condition <- factor(DESeq2Table$condition,
levels=c("Other_CD4", "VbOther","Clone"))
#this is the differential expression analysis and tells you the columns and the design
dds_simple <- DESeq(DESeq2Table2)
ressimple.Other_CD4.VbOther <- results(dds_simple, contrast=c("condition","VbOther","Other_CD4"))
ressimple.Other_CD4.Clone <- results(dds_simple, contrast=c("condition","Clone","Other_CD4"))
ressimple.VbOther.Clone <- results(dds_simple, contrast=c("condition","Clone","VbOther"))
#this shows the numbers of genes upregulated and downregulated
summary(ressimple.VbOther.Clone)
summary(ressimple.Other_CD4.VbOther)
Thank you!