Question

Phyloseq to deseq2

0

Entering edit mode

Jitendra ▴ 10

@nabiyogesh-11718

Last seen 16 months ago

United Kingdom

I want to use deseq2 for differential abundance analysis for 16s amplicon data analysis.

data detail:

I do have three tissue:

summary(sample_data(physeq)$Tissue)
Leaf Root Soil

And each tissue has given 3 treatment.

summary(sample_data(physeq)$Treatment)
T1 T2 T3

Now I just want to use soil data and want to see the effect of treatment T1, T2 and T3 on microbial abundance in soil only.

for this how should I should I take only soil sample from physeq data and modify the below condition code.

diagdds = phyloseq_to_deseq2(kostic, ~ DIAGNOSIS)

I will be very thankful for your help.

Kind Regards Yogesh

deseq2 • 3.6k views

ADD COMMENT • link updated 6.7 years ago by Michael Love 43k • written 6.7 years ago by Jitendra ▴ 10

score 0 · Answer 1 · 2019-03-11

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 2 days ago

United States

Combine tissue and treatment into one variable, see the example in the beginning of the interactions section of the vignette.

ADD COMMENT • link 6.7 years ago Michael Love 43k

0

Entering edit mode

Hi Michael ,

Thanks a lot.

From the above data, I have extracted soil samples.I need your help to design the condition for deseq2:

here is the summary of data, I have soil sample collection from different villages from two different regions:

and we have given three treatment to each soil sample in each village: TREATMENT T1, T2 AND T3, now my aim is to find the differential abundance of the microbial community in T1 and T2 and T3 treatment condition:

head(sampledata(physeqsoil)) Sample Data: [6 samples by 5 sample variables]:

              X.SampleID       Treatment Tissue   Villages     Region
Soil-12.S42.L001 Soil-12.S42.L001    T1   Soil  Maidaipur   Rajshahi
Soil-13.S52.L001 Soil-13.S52.L001    T1   Soil      Gokul   Rajshahi
Soil-14.S62.L001 Soil-14.S62.L001    T1   Soil       Jiol   Rajshahi
Soil-15.S72.L001 Soil-15.S72.L001    T1   Soil Samastipur   Rajshahi
Soil-16.S2.L001   Soil-16.S2.L001    T1   Soil     Talund   Rajshahi
Soil-2.S1.L001     Soil-2.S1.L001    T1   Soil    Bogajan Mymensingh

summary(sample_data(physeq)$Treatment)

T1 T2 T3 54 54 54

should I import data to deseq2 like this: or I need to change something

psfdds<-phyloseq_to_deseq2(physeq_soil, ~ Region + Treatment)

I am very thankful for your help.

ADD REPLY • link 6.7 years ago Jitendra ▴ 10

0

Entering edit mode

You haven't followed my advice above. Have you read the section I referred to? I'm confused.

ADD REPLY • link 6.7 years ago Michael Love 43k

0

Entering edit mode

Dear Michael,

I read it, but I am much more interested to find treatment (T1, T2 and T3) effect on soil, root and shoot tissue separately so I extract all soil sample, root and shoot in separate files. and now wanted to proceed these file independently so that treatment effect can be analysed on microbial abundance. these samples also collected from different villages which located in two different regions (Mymensing and Rajshahi).

if still, you think I should get all tissue in one file and combine tissue and treatment and go forward:

then should I modify above code like this:

dds$group <- factor(paste0(dds$Tissue, dds$Treatment))
design(dds) <- ~ group

psfdds<-phyloseq_to_deseq2(physeq, ~ group)

Thanks Yogesh

ADD REPLY • link 6.7 years ago Jitendra ▴ 10

0

Entering edit mode

The analysis choices are up to you. This forum is for questions about the software. I'm saying (and the vignette says) if you want to extract treatment effects per tissue, you can combine tissue and treatment and use ~group. Then make comparisons between treatment within the same tissue using e.g.

contrasts=c("group","tissueX.treatmentA","tissueX.treatment0").

ADD REPLY • link 6.7 years ago Michael Love 43k

0

Entering edit mode

Dear Michael,

Thanks for your all help and time. I will run the analysis both way, it will help me to understand deseq2 more.

as I mention I have extracted soil samples and want to see the treatment effect T1, T2, T3.

I run deseq2 code like this but I am getting some messege:

psfdds<-phyloseq_to_deseq2(ps0,~Village + Treatment)
psfdds<-DESeq(psfdds, test="Wald", fitType="parametric")
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
2 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

could please suggest how can resolve it.

Thanks Yogesh

ADD REPLY • link 6.7 years ago Jitendra ▴ 10

0

Entering edit mode

Usually these messages can be ignored when they involve one or two genes and usually these genes have some outlier counts.

ADD REPLY • link 6.7 years ago Michael Love 43k

0

Entering edit mode

Dear Michael,

Thanks for your all help and time. I will run the analysis both way, it will help me to understand deseq2 more.

as I mention I have extracted soil samples and want to see the treatment effect T1, T2, T3.

I run deseq2 code like this but I am getting some messege:

psfdds<-phyloseq_to_deseq2(ps0,~Village + Treatment)
psfdds<-DESeq(psfdds, test="Wald", fitType="parametric")
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
2 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

could please suggest how can resolve it.

Thanks Yogesh

ADD REPLY • link 6.7 years ago Jitendra ▴ 10

0

Entering edit mode

Hi Michael,

I am able to generate plot from deseq2 analysis using below command.

But I am not sure whether it is T3vsT1 OR T2vsT1.

could you please suggest how I can improve this analysis more for publication,

the aim is to find which microbial species or genus are more abundant in Treatment T1 and Treatment T3 in comparison to T2 and how T3 microbial abundance is different from T1 Treatment.

I will be very thankful for your time and kind support.

Here is all code.

psfdds<-phyloseqtodeseq2(ps0,~Villages + Treatment) converting counts to integer mode psfdds<-DESeq(psfdds, test="Wald", fitType="parametric") estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting model and testing 2 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest resultsNames(psfdds)
res<-results(psfdds, alpha=0.1) summary(res)

out of 62301 with nonzero total read count adjusted p-value < 0.1 LFC > 0 (up) : 32, 0.051% LFC < 0 (down) : 36, 0.058% outliers [1] : 3939, 6.3% low counts [2] : 47758, 77% (mean count < 37) [1] see 'cooksCutoff' argument of ?results [2] see 'independentFiltering' argument of ?results

mcols(res, use.names=TRUE)

mcols(res)$description

[1] "mean of normalized counts for all samples" [2] "log2 fold change (MLE): Treatment T3 vs T1" [3] "standard error: Treatment T3 vs T1"
[4] "Wald statistic: Treatment T3 vs T1"
[5] "Wald test p-value: Treatment T3 vs T1"
[6] "BH adjusted p-values"

sum(res$padj<0.05,na.rm=TRUE) [1] 56

sigtab<-cbind(as(sigtab, "data.frame"), as(tax_table(ps0)[rownames(sigtab),], "matrix")) dim(sigtab) [1] 56 13 colnames(sigtab) [1] "baseMean" "log2FoldChange" "lfcSE" "stat"
[5] "pvalue" "padj" "Kingdom" "Phylum"
[9] "Class" "Order" "Family" "Genus"
[13] "Species"

saveRDS(sigtab,file= "sigtab.RDS"))

saveRDS(sigtab,file="sigtab.RDS") scalefilldiscrete<-function(palname="Set1",...) {scalefillbrewer(palette=palname,...)} x<-tapply(sigtab$log2FoldChange, sigtab$Phylum, function(x) max(x) + ) x<-sort(x,TRUE) sigtab$Phylum<-factor(as.character(sigtab$Phylum), levels=names(x)) x<-tapply(sigtab$log2FoldChange, sigtab$Genus, function(x) max(x)) x<-sort(x,TRUE) sigtab$Genus<-factor(as.character(sigtab$Genus), levels=names(x))

plot.DESeq.UPvsDOWN<-ggplot(sigtab, aes(x=Genus, y=log2FoldChange, color=Phylum)) + geompoint(size=6) +themebw() +theme(axis.text.x = elementtext(angle = -90, hjust = 0, vjust=0.5)) + geomhline(yintercept=0) +theme(axis.text.y = element_text(angle =90, hjust=0.5))

ggsave("deseqlast.pdf")

Thanks Yogesh

ADD REPLY • link 6.7 years ago Jitendra ▴ 10

0

Entering edit mode

It looks like you are not following my advice above about using contrast. Please read over the documentation about using “contrast” in the vignette and in

?results

ADD REPLY • link 6.7 years ago Michael Love 43k

0

Entering edit mode

Dear Micheal,

you suggesting me to use group for deseq analysis,

How I should import data first in deseq2 from phyloseq and then how to run deseq2

if I am just import data without design it says design argument is missing.

psfdds<-phyloseqtodeseq2(ps0) converting counts to integer mode Error in all.vars(design) : argument "design" is missing, with no default

Kind Regards Yogesh

ADD REPLY • link 6.7 years ago Jitendra ▴ 10

0

Entering edit mode

You can use ~1 as long as you change the design later to the appropriate one with

design(dds) <- ...

ADD REPLY • link 6.7 years ago Michael Love 43k

0

Entering edit mode

Dear Micheal,

thanks a lot,

psfdds<-phyloseqtodeseq2(ps0, ~1)

converting counts to integer mode

psfdds$group <- factor(paste0(psfdds$Tissue,psfdds$Treatment))

design(psfdds) <- ~ group

do I also need to mention Treatment T2 as reference level or I should run further this command or I can use later contrast to extract the result as you mention

dds <- DESeq(psfdds, test="Wald", fitType="parametric")

Kind Regards Yogesh

ADD REPLY • link 6.7 years ago Jitendra ▴ 10

0

Entering edit mode

Hi Michael,

I am getting data like this for all comparision, but it seems there was some problem:

> psfdds <- DESeq(psfdds, test="Wald", fitType="parametric")

estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 10886 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing

> resultsNames(psfdds)
[1] "Intercept"              "group_LeafT2_vs_LeafT1" "group_LeafT3_vs_LeafT1"
[4] "group_RootT1_vs_LeafT1" "group_RootT2_vs_LeafT1" "group_RootT3_vs_LeafT1"
[7] "group_SoilT1_vs_LeafT1" "group_SoilT2_vs_LeafT1" "group_SoilT3_vs_LeafT1"

> ressoil.T1_soil.T2_new <- results(psfdds, contrasts=c("group","TissueSoil.TreatmentT1","TissueSoil.TreatmentT2"))
> summary(ressoil.T1_soil.T2_new)

out of 59297 with nonzero total read count

adjusted p-value < 0.1

LFC > 0 (up)     : 58684, 99% 

LFC < 0 (down)   : 3, 0.0051% 

outliers [1]     : 0, 0% 

low counts [2]   : 3004, 5.1% 

(mean count < 0)

ADD REPLY • link 6.7 years ago Jitendra ▴ 10

0

Entering edit mode

I'm not sure if DESeq2 is an appropriate software here. Why don't you try a different method? Generally, I don't work in metagenomics and I'm skeptical of the idea of global normalization when there is no concept of "housekeeping genes". It works well in bulk RNA-seq, but I don't see why it should work across samples that share no taxa.

ADD REPLY • link 6.7 years ago Michael Love 43k