I am setting up the following DESeq2 analysis and want to know if I should be concerned about confounders. I am evaluating the effect of drug on tissue gene expression and there are different numbers of tumors (1-4) analyzed from each subject. Would this create a bias in the output?
Sample Subject Tissue Treatment
Sample1 3 Normal Drug
Sample2 4 Normal Drug
Sample3 1 Normal Drug
Sample4 5 Normal Drug
Sample5 6 Normal Drug
Sample6 7 Normal Drug
Sample7 8 Normal Drug
Sample8 9 Normal Drug
Sample9 10 Normal Drug
Sample10 2 Normal Drug
Sample11 2 Adenoma Drug
Sample12 2 Adenoma Drug
Sample13 2 Adenoma Drug
Sample14 3 Adenoma Drug
Sample15 4 Adenoma Drug
Sample16 5 Adenoma Drug
Sample17 6 Adenoma Drug
Sample18 1 Adenoma Drug
Sample19 7 Adenoma Drug
Sample20 8 Adenoma Drug
Sample21 9 Adenoma Drug
Sample22 10 Adenoma Drug
Sample23 10 Adenoma Drug
Sample24 10 Adenoma Drug
Sample25 10 Adenoma Drug
Sample26 2 Adenoma Drug
Sample27 13 Normal Placebo
Sample28 14 Normal Placebo
Sample29 15 Normal Placebo
Sample30 16 Normal Placebo
Sample31 17 Normal Placebo
Sample32 18 Normal Placebo
Sample33 19 Normal Placebo
Sample34 20 Normal Placebo
Sample35 11 Normal Placebo
Sample36 12 Normal Placebo
Sample37 13 Adenoma Placebo
Sample38 13 Adenoma Placebo
Sample39 14 Adenoma Placebo
Sample40 15 Adenoma Placebo
Sample41 16 Adenoma Placebo
Sample42 16 Adenoma Placebo
Sample43 17 Adenoma Placebo
Sample44 17 Adenoma Placebo
Sample45 18 Adenoma Placebo
Sample46 11 Adenoma Placebo
Sample47 19 Adenoma Placebo
Sample48 11 Adenoma Placebo
Sample49 20 Adenoma Placebo
Sample50 20 Adenoma Placebo
Sample51 20 Adenoma Placebo
Sample52 12 Adenoma Placebo
You definitely should use a factor not an integer. Patient 3 is not patient 1 + patient 2.
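As a small illustration of the difference, here is a base R sketch. The toy vector `subject_int` is made up for this example and is not taken from your data. An integer column is fit as a single slope, while a factor gets one indicator column per additional subject.

```r
# Hypothetical toy subject IDs: three subjects, one of them sampled twice.
subject_int <- c(1, 2, 3, 3)

# As a number, Subject is a single linear covariate (intercept + slope).
n_cols_int <- ncol(model.matrix(~ subject_int))

# As a factor, each subject beyond the reference gets its own column.
subject_fac <- factor(subject_int)
n_cols_fac <- ncol(model.matrix(~ subject_fac))

print(c(integer = n_cols_int, factor = n_cols_fac))
```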
You'll need to provide code for me to see if you're following the example I meant to point you to, or what might be wrong.
Mike,
Below is the code we are using. Thank you for your help.
# Script to run DESeq analysis on paired samples (tumor/normal) comparing two treatment groups (drug/placebo).
# Load the analysis software.
library(DESeq2)
# Read the data.
df=read.delim("FAPEST_count_data.txt")
# Make a matrix of the counts. Any column with the word "Sample" in its
# title contains count data.
datacols=grep("Sample",names(df))
m=as.matrix(df[datacols])
# Assign row names to the matrix, using the GeneName column as the row label.
rownames(m)=paste(df$GeneName)
# Read the data frame of sample information, including sample name,
# subject, tissue type, and treatment.
sampleinfo = read.delim("sample_info.txt")
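# Convert each experiment factor to a factor in the sample table, so that
# the design formula below uses factors (Subject is read in as an integer).
sampleinfo$Subject=factor(sampleinfo$Subject)
sampleinfo$Tissue=factor(sampleinfo$Tissue)
sampleinfo$Treatment=factor(sampleinfo$Treatment)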
# Create a DESeqDataSet object. Make sure that treatment is the last
# factor in the experiment design expression.
dds=DESeqDataSetFromMatrix(countData=m,
                           colData=sampleinfo,
                           design=~Subject+Tissue+Treatment)
# Run DESeq
dds=DESeq(dds)
# Get the results.
dds_results=results(dds)
# Save the results to a file.
write.table(dds_results,"paired_analysis_results.txt",quote=FALSE,sep="\t",row.names=TRUE)
Thanks,
Don
Hi Don,
The section of the vignette with the recommendation I was referring to is named "Model matrix not full rank". The error message should also have pointed you to read over this section of the vignette.
The recommended section for your experimental design begins:
"Consider an experiment with grouped individuals, where we seek to test the group-specific effect of a treatment, while controlling for individual effects..."
This should apply to your case, as you have grouped individuals, where Normal and Adenoma are the groups.
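To see why a design of ~Subject+Tissue+Treatment runs into the full-rank problem, one can check the model matrix in base R without running DESeq2. The sketch below transcribes the sample table from the first post; the names `coldata` and `mm` are just for illustration. Because each subject belongs to exactly one treatment, the Treatment column can be written as a combination of the Subject columns.

```r
# Sample table transcribed from the first post (52 samples, 20 subjects).
subject <- c(3, 4, 1, 5, 6, 7, 8, 9, 10, 2,
             2, 2, 2, 3, 4, 5, 6, 1, 7, 8, 9, 10, 10, 10, 10, 2,
             13, 14, 15, 16, 17, 18, 19, 20, 11, 12,
             13, 13, 14, 15, 16, 16, 17, 17, 18, 11, 19, 11, 20, 20, 20, 12)
coldata <- data.frame(
  Subject   = factor(subject),
  Tissue    = factor(rep(c("Normal", "Adenoma", "Normal", "Adenoma"),
                         times = c(10, 16, 10, 16)),
                     levels = c("Normal", "Adenoma")),
  Treatment = factor(rep(c("Drug", "Placebo"), each = 26),
                     levels = c("Placebo", "Drug"))
)

# Build the model matrix for the original design and compare its
# number of columns with its rank.
mm <- model.matrix(~ Subject + Tissue + Treatment, coldata)
n_cols  <- ncol(mm)      # 22 coefficients requested
mm_rank <- qr(mm)$rank   # 21 can actually be estimated

print(c(columns = n_cols, rank = mm_rank))
```

The rank is one less than the number of columns, so one coefficient cannot be estimated; that is the situation the vignette section describes.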
Michael,
Could we also use placebo and drug as the groups and control for different tissue types? The only issue here is that we have different patients in the placebo and drug groups.
Thanks,
Don
I see, yes, you are right.
You should set it up so that the variable I refer to in the vignette as "group" is the one that differs between patients, and "condition" is the one that is measured at both levels within each patient.
So here group takes values Placebo and Drug, and condition takes values Normal and Adenoma.
And then you would have a design of ~group + group:individual.nested + group:condition
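A minimal base R sketch of that setup, using the sample table from the first post, might look like the following. The column name `ind.n` and the use of `ave()` to renumber subjects within each treatment are my choices for illustration; only `model.matrix()` is called here, not DESeq2 itself.

```r
# Sample table transcribed from the first post (52 samples, 20 subjects).
subject <- c(3, 4, 1, 5, 6, 7, 8, 9, 10, 2,
             2, 2, 2, 3, 4, 5, 6, 1, 7, 8, 9, 10, 10, 10, 10, 2,
             13, 14, 15, 16, 17, 18, 19, 20, 11, 12,
             13, 13, 14, 15, 16, 16, 17, 17, 18, 11, 19, 11, 20, 20, 20, 12)
coldata <- data.frame(
  Tissue    = factor(rep(c("Normal", "Adenoma", "Normal", "Adenoma"),
                         times = c(10, 16, 10, 16)),
                     levels = c("Normal", "Adenoma")),
  Treatment = factor(rep(c("Drug", "Placebo"), each = 26),
                     levels = c("Placebo", "Drug"))
)

# Renumber subjects 1..10 within each treatment group, so that the
# individual is nested inside the group.
coldata$ind.n <- factor(ave(subject, coldata$Treatment,
                            FUN = function(x) as.integer(factor(x))))

# group = Treatment, condition = Tissue, individual.nested = ind.n
mm <- model.matrix(~ Treatment + Treatment:ind.n + Treatment:Tissue, coldata)
n_cols  <- ncol(mm)
mm_rank <- qr(mm)$rank

# Full rank: every column corresponds to an estimable coefficient.
print(c(columns = n_cols, rank = mm_rank))
```

With this coding the matrix is full rank, and the adenoma effect gets a separate term in each treatment group. By analogy with the vignette, the drug versus placebo difference in that effect would be tested by contrasting the two interaction terms, for example something like results(dds, contrast=list("TreatmentDrug.TissueAdenoma","TreatmentPlacebo.TissueAdenoma")); check resultsNames(dds) for the exact names in your object.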
Michael,
Thank you for your help.
Don