DESeq2 multifactorial design
dondelker • 0
@dondelker-10757

I am setting up the following DESeq2 analysis and want to know whether I should be concerned about confounders. I am evaluating the effect of a drug on tissue gene expression, and a different number of tumors (1-4) was analyzed from each subject. Would this bias the output?

 Sample Subject Tissue Treatment
Sample1 3 Normal Drug
Sample2 4 Normal Drug
Sample3 1 Normal Drug
Sample4 5 Normal Drug
Sample5 6 Normal Drug
Sample6 7 Normal Drug
Sample7 8 Normal Drug
Sample8 9 Normal Drug
Sample9 10 Normal Drug
Sample10 2 Normal Drug
Sample11 2 Adenoma Drug
Sample12 2 Adenoma Drug
Sample13 2 Adenoma Drug
Sample14 3 Adenoma Drug
Sample15 4 Adenoma Drug
Sample16 5 Adenoma Drug
Sample17 6 Adenoma Drug
Sample18 1 Adenoma Drug
Sample19 7 Adenoma Drug
Sample20 8 Adenoma Drug
Sample21 9 Adenoma Drug
Sample22 10 Adenoma Drug
Sample23 10 Adenoma Drug
Sample24 10 Adenoma Drug
Sample25 10 Adenoma Drug
Sample26 2 Adenoma Drug
Sample27 13 Normal Placebo
Sample28 14 Normal Placebo
Sample29 15 Normal Placebo
Sample30 16 Normal Placebo
Sample31 17 Normal Placebo
Sample32 18 Normal Placebo
Sample33 19 Normal Placebo
Sample34 20 Normal Placebo
Sample35 11 Normal Placebo
Sample36 12 Normal Placebo
Sample37 13 Adenoma Placebo
Sample38 13 Adenoma Placebo
Sample39 14 Adenoma Placebo
Sample40 15 Adenoma Placebo
Sample41 16 Adenoma Placebo
Sample42 16 Adenoma Placebo
Sample43 17 Adenoma Placebo
Sample44 17 Adenoma Placebo
Sample45 18 Adenoma Placebo
Sample46 11 Adenoma Placebo
Sample47 19 Adenoma Placebo
Sample48 11 Adenoma Placebo
Sample49 20 Adenoma Placebo
Sample50 20 Adenoma Placebo
Sample51 20 Adenoma Placebo
Sample52 12 Adenoma Placebo

@mikelove

Unbalanced designs are not the most powerful for a given total sample size, but you can still perform the analysis. See the vignette for a suggestion on how to make Adenoma vs Normal comparisons within subject.

dondelker • 0
@dondelker-10757

Michael,

Thank you for your help. We performed this analysis with balanced numbers of samples per patient and got the message below.

The design formula contains a numeric variable with integer values, specifying a model with increasing fold change for higher values. Did you mean for this to be a factor? if so, first convert this variable to a factor using the factor() function.

Do we need to use non-integer values for Subject? If we use non-integer values we get an error that the model matrix is not full rank.

Thanks,

Don

 

 


You definitely should use a factor not an integer. Patient 3 is not patient 1 + patient 2.
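
To illustrate the difference, here is a minimal base R sketch on a made-up six-sample table (the `toy` data frame is hypothetical, not the poster's data). It only compares the model matrices that a numeric and a factor `Subject` produce. As far as I know, the design formula is evaluated against the columns of the data frame passed as `colData`, so the conversion presumably has to be done on that column rather than on a separate object in the workspace.

```r
# Hypothetical sample table: Subject is stored as integers,
# which is how read.delim() would typically import such a column.
toy <- data.frame(
  Subject = c(1L, 1L, 2L, 2L, 3L, 3L),
  Tissue  = rep(c("Normal", "Adenoma"), times = 3)
)

# Numeric Subject: a single slope column, so the model assumes
# the effect of subject 3 is three times the effect of subject 1.
mm_numeric <- model.matrix(~ Subject, data = toy)

# Factor Subject: one indicator column per subject beyond the reference level.
toy$Subject <- factor(toy$Subject)
mm_factor <- model.matrix(~ Subject, data = toy)

colnames(mm_numeric)  # "(Intercept)" "Subject"
colnames(mm_factor)   # "(Intercept)" "Subject2" "Subject3"
```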

You'll need to provide code for me to see if you're following the example I meant to point you to, or what might be wrong.


Mike,

 

Below is the code we are using. Thank you for your help.

 

# Script to run DESeq analysis on paired samples (tumor/normal) comparing two treatment groups (drug/placebo).

# Load the analysis software.
library(DESeq2)

# Read the data.
df=read.delim("FAPEST_count_data.txt")

# Make a matrix of the counts. Any column with the word "Sample" in its
# title contains count data.
datacols=grep("Sample",names(df))
m=as.matrix(df[datacols])

# Assign row names to the matrix. Combine the values in the EnsemblId and
# GeneName columns into a row label.
rownames(m)=paste(df$GeneName)

# Read the data frame of sample information, including sample name,
# subject, tissue type, and treatment.
sampleinfo = read.delim("sample_info.txt")

# Create separate factor objects for each experiment factor.
Subject=factor(sampleinfo$Subject)
Tissue=factor(sampleinfo$Tissue)
Treatment=factor(sampleinfo$Treatment)

# Create a DESeqDataSet object. Make sure that treatment is the last
# factor in the experiment design expression.
dds=DESeqDataSetFromMatrix(countData=m,
 colData=sampleinfo,
 design=~Subject+Tissue+Treatment)

# Run DESeq
dds=DESeq(dds)

# Get the results.
dds_results=results(dds)

# Save the results to a file.
write.table(dds_results,"paired_analysis_results.txt",quote=F,sep="\t",row.names=T)

 

Thanks,

Don


Hi Don,

The section of the vignette with the recommendation I was referring to is named "Model matrix not full rank". The error message should also have pointed you to read over this section of the vignette.

The recommended section for your experimental design begins:

"Consider an experiment with grouped individuals, where we seek to test the group-specific effect of a treatment, while controlling for individual effects..."

This should apply to your case, as you have grouped individuals, where Normal and Adenoma are the groups.
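
A small base R sketch of why a design like `~Subject+Tissue+Treatment` is rank deficient once `Subject` is a factor. The `toy` table below is hypothetical (four subjects, two per treatment), chosen only to mirror the structure described in the question: each subject receives exactly one treatment, so the treatment column can be written as a sum of subject columns.

```r
# Hypothetical layout: subjects 1-2 received placebo, subjects 3-4 received drug,
# and every subject contributes one Normal and one Adenoma sample.
toy <- data.frame(
  Subject   = factor(c(1, 1, 2, 2, 3, 3, 4, 4)),
  Tissue    = factor(rep(c("Normal", "Adenoma"), times = 4),
                     levels = c("Normal", "Adenoma")),
  Treatment = factor(rep(c("Placebo", "Drug"), each = 4),
                     levels = c("Placebo", "Drug"))
)

mm <- model.matrix(~ Subject + Tissue + Treatment, data = toy)

ncol(mm)     # 6 coefficients requested
qr(mm)$rank  # 5, so one coefficient cannot be estimated

# The Treatment column is exactly the sum of the columns of the drug-treated subjects.
all(mm[, "TreatmentDrug"] == mm[, "Subject3"] + mm[, "Subject4"])  # TRUE
```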


Michael,

Could we also use placebo and drug as the groups and control for different tissue types? The only issue here is that we have different patients in the placebo and drug groups.

Thanks,

Don


I see, yes, you are right.

You should set it up so that the variable I refer to in the vignette as "group" is the one whose levels contain different patients, and "condition" is the variable that has two measures for each patient.

So here group takes values Placebo and Drug, and condition takes values Normal and Adenoma.

And then you would have a design of ~group + group:individual.nested + group:condition
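
A minimal base R sketch of how the columns for this design could be built. The `sampleinfo` table here is a hypothetical four-subject stand-in for the real one, and the column names `group`, `condition` and `individual.nested` are taken from the formula above. The idea, following the vignette section, is to renumber subjects within each group so that the nested subject term is estimable.

```r
# Hypothetical stand-in for sample_info.txt: subjects 1-2 on drug, 11-12 on placebo.
sampleinfo <- data.frame(
  Subject   = c(1, 1, 2, 2, 11, 11, 12, 12),
  Tissue    = rep(c("Normal", "Adenoma"), times = 4),
  Treatment = rep(c("Drug", "Placebo"), each = 4)
)

# group varies between patients; condition varies within each patient.
sampleinfo$group     <- factor(sampleinfo$Treatment, levels = c("Placebo", "Drug"))
sampleinfo$condition <- factor(sampleinfo$Tissue, levels = c("Normal", "Adenoma"))

# Renumber subjects 1, 2, ... separately inside each group.
sampleinfo$individual.nested <- factor(
  ave(sampleinfo$Subject, sampleinfo$group,
      FUN = function(s) as.integer(factor(s)))
)

mm <- model.matrix(~ group + group:individual.nested + group:condition,
                   data = sampleinfo)

colnames(mm)
qr(mm)$rank == ncol(mm)  # TRUE: the nested design is full rank for this layout
```

With the real data, the three new columns would go into the `colData` data frame and the same formula would be passed as `design`. The drug-specific and placebo-specific Adenoma vs Normal effects are then separate coefficients; the vignette describes comparing them with a `contrast` list in `results()`. If the two groups had unequal numbers of subjects, some columns of this matrix would be all zero, and the vignette covers removing them before fitting.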


Michael,

Thank you for your help.

Don

