Extracting the LFC from a reduced model
1
0
Entering edit mode
Sonia • 0
@2f3f6904
Last seen 3.0 years ago
United Kingdom

I have created a multifactor model with 3 factors: patient(patients A to I, to account for biological differences), condition(levels "Other_CD4", "VbOther" and "Clone", in that order) and timepoint (T1 and T2).

The comparisons I am interested in are really "Other CD4" to "VbOther" and "VbOther" to "Clone", as each represents a different stage of cellular differentiation.

I made a dataframe with all the factors in and performed differential expression (dds). I also made a PCA which did not look like there were any differences in timepoint between samples from T1 and T2, in all conditions.

I made a model with the design ~patient + condition + timepoint (dds), and the reduced design ~patient + condition (dds_reduced).

The LRT suggested that there are no genes have a padj < 0.1 in the additive model, suggesting to me that timepoint is not significant.

An example of the LRT output is:

>results_dds
log2 fold change (MLE): timepoint T2 vs T1 
LRT p-value: '~ patient + condition + timepoint' vs '~ patient + condition' 
DataFrame with 18516 rows and 6 columns
           baseMean log2FoldChange     lfcSE       stat    pvalue      padj
          <numeric>      <numeric> <numeric>  <numeric> <numeric> <numeric>
A1BG      96.891126     -0.0251046  0.250933 0.00990849  0.920708         1
A1BG-AS1 125.388293     -0.1924045  0.293704 0.43214358  0.510939         1
A2M        1.614425     -0.0614295  0.995438 0.00349552  0.952854         1
A2M-AS1   37.412347     -0.4648244  0.344586 1.79961673  0.179759         1
A2ML1      0.936242      1.0015319  3.732651 0.10950344  0.740710         1
...             ...            ...       ...        ...       ...       ...
ZYG11A      2.02023     -0.2401266 0.9370991  0.0727242  0.787411         1
ZYG11B    858.54543     -0.0565903 0.0940639  0.3585213  0.549329         1
ZYX      4428.36880      0.0223657 0.1067015  0.0437798  0.834264         1
ZZEF1    5559.62750     -0.0890766 0.0607072  2.1439690  0.143131         1
ZZZ3      965.87893     -0.0168050 0.0704468  0.0567291  0.811742         1

I cannot however seem to extract the LFC and padj for the comparison of conditions in dds_reduced, eg between Other_CD4 and VBOther.

Having demonstrated as above that timepoint is not important, is it statistically acceptable to recreate the table, but without timepoint as a factor eg:

DESeq2Table2 <- DESeqDataSetFromHTSeqCount(sampleTable = mysampleTable, directory = "H:/R/htseq_counts_experiment_5/", design = ~patient + condition, ignoreRank = FALSE)

dds_simple <- DESeq(DESeq2Table2)

or is there some way I can extract the original LFC values from dds_reduced?

Thank you!

library(tidyverse)
library(DESeq2)

library("RColorBrewer")
library("gplots")

#this sets the working directory
setwd("H:/R/htseq_counts_experiment_5/")

#note- the files all have a 0 in the header line- this needs manually removing before the table can be made

#This uses a metadata file with sample info 
metadata_5 <- read.csv("H:/R/htseq_counts_experiment_5/metadata_5.csv")
metadata_5 <-as.data.frame(metadata_5)

#column1 name comes out bit weird- this changes the name
names(metadata_5)[1] <- "SampleName"

#this adds in a column to say what the file name is in the metadata file
metadata_5 = mutate(metadata_5, countFile = paste0(metadata_5$SampleName, ".stranded-counts.txt"))
metadata_5 <- lapply(metadata_5, as.character)

#this works and makes the sample table
mysampleTable <- data.frame(sampleName = metadata_5$SampleName, fileName = metadata_5$countFile, condition = metadata_5$Condition, patient = metadata_5$Subject, timepoint = metadata_5$Timepoint)
view(mysampleTable)

#this is the design
additive.model <- as.formula(~ patient + condition + timepoint)

#this makes the DESeq data set Remember to change the directory here!
DESeq2Table <- DESeqDataSetFromHTSeqCount(sampleTable = mysampleTable, directory = "H:/R/htseq_counts_experiment_5/", design = ~patient + condition + timepoint, ignoreRank = FALSE)

#this sets the factor levels
DESeq2Table$condition <- factor(DESeq2Table$condition,
                                levels=c("Other_CD4", "VbOther","Clone"))

#this shows the size of the table
dim(DESeq2Table)

#this line gets rid of anything with 5 reads or less 
keep <- rowSums(counts(DESeq2Table)) > 5
DESeq2Table <- DESeq2Table[keep,]

#this shows the size of the table after removal of low counts
dim(DESeq2Table)

#this is the differential expression analysis and tells you the columns and the design 
dds <- DESeq(DESeq2Table)


results_dds <-results(dds)

res.Other_CD4.VbOther <- results(dds, contrast=c("condition","VbOther","Other_CD4"))
res.Other_CD4.Clone <- results(dds, contrast=c("condition","Clone","Other_CD4"))
res.VbOther.Clone <- results(dds, contrast=c("condition","Clone","VbOther"))

#this shows the numbers of genes upregulated and downregulated
summary(res.VbOther.Clone)
summary(res.Other_CD4.VbOther)

sum ( res.Other_CD4.VbOther$padj < 0.1, na.rm=TRUE) 
#the output is 7418 (this is the some of the up and down regulated in the above summary)
sum (res.VbOther.Clone$padj < 0.1, na.rm=TRUE)
#the output is 2917

#to make a reduced forumla we want to make a nbinomLRT. The results table shows the difference between the additive model and the reduced model.
reduced.model <- as.formula(~patient + condition)

ddsreduced <- DESeq(dds, test="LRT", reduced= reduced.model)
res_dds_reduced <-results(ddsreduced)
res_dds_reduced


#note the p-value is the difference between the reduced model and the full model

resLRT.Other_CD4.VbOther <- results(ddsreduced, contrast=c("condition","VbOther","Other_CD4"))
resLRT.Other_CD4.Clone <- results(ddsreduced, contrast=c("condition","Clone","Other_CD4"))
resLRT.VbOther.Clone <- results(ddsreduced, contrast=c("condition","Clone","VbOther"))

#this shows the numbers of genes upregulated and downregulated

summary(resLRT.VbOther.Clone)
summary(resLRT.Other_CD4.VbOther)

#this suggests there are no differences in the groups between T1 and T2 for any genes in any combination of groups

#this makes a new data set Remember to change the directory here!
DESeq2Table2 <- DESeqDataSetFromHTSeqCount(sampleTable = mysampleTable, directory = "H:/R/htseq_counts_experiment_5/", design = ~patient + condition, ignoreRank = FALSE)

#this sets the factor levels
DESeq2Table2$condition <- factor(DESeq2Table$condition,
                                levels=c("Other_CD4", "VbOther","Clone"))

#this is the differential expression analysis and tells you the columns and the design 
dds_simple <- DESeq(DESeq2Table2)

ressimple.Other_CD4.VbOther <- results(dds_simple, contrast=c("condition","VbOther","Other_CD4"))
ressimple.Other_CD4.Clone <- results(dds_simple, contrast=c("condition","Clone","Other_CD4"))
ressimple.VbOther.Clone <- results(dds_simple, contrast=c("condition","Clone","VbOther"))

#this shows the numbers of genes upregulated and downregulated

summary(ressimple.VbOther.Clone)
summary(ressimple.Other_CD4.VbOther)
DESeq2 • 917 views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 4 days ago
United States

extract the LFC and padj for the comparison of conditions in dds_reduced

You can run this model with that design as the full design, and specify test="Wald".

I don't really have much time to provide statistical consultation here on what are the best terms to include, how to make decision on what models to accept, etc. I have to limit to software-based questions.

ADD COMMENT
0
Entering edit mode

Thank you!

ADD REPLY

Login before adding your answer.

Traffic: 552 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6