Question: Phyloseq to deseq2
0
gravatar for nabiyogesh
10 weeks ago by
nabiyogesh0
nabiyogesh0 wrote:

I want to use deseq2 for differential abundance analysis for 16s amplicon data analysis.

data detail:

I do have three tissue:

summary(sample_data(physeq)$Tissue)
Leaf Root Soil

And each tissue has given 3 treatment.

summary(sample_data(physeq)$Treatment)
T1 T2 T3

Now I just want to use soil data and want to see the effect of treatment T1, T2 and T3 on microbial abundance in soil only.

for this how should I should I take only soil sample from physeq data and modify the below condition code.

diagdds = phyloseq_to_deseq2(kostic, ~ DIAGNOSIS)

I will be very thankful for your help.

Kind Regards Yogesh

deseq2 • 165 views
ADD COMMENTlink modified 10 weeks ago by Michael Love23k • written 10 weeks ago by nabiyogesh0
Answer: Phyloseq to deseq2
0
gravatar for Michael Love
10 weeks ago by
Michael Love23k
United States
Michael Love23k wrote:

Combine tissue and treatment into one variable, see the example in the beginning of the interactions section of the vignette.

ADD COMMENTlink modified 10 weeks ago • written 10 weeks ago by Michael Love23k

Hi Michael ,

Thanks a lot.

From the above data, I have extracted soil samples.I need your help to design the condition for deseq2:

here is the summary of data, I have soil sample collection from different villages from two different regions:

and we have given three treatment to each soil sample in each village: TREATMENT T1, T2 AND T3, now my aim is to find the differential abundance of the microbial community in T1 and T2 and T3 treatment condition:

head(sampledata(physeqsoil)) Sample Data: [6 samples by 5 sample variables]:

              X.SampleID       Treatment Tissue   Villages     Region
Soil-12.S42.L001 Soil-12.S42.L001    T1   Soil  Maidaipur   Rajshahi
Soil-13.S52.L001 Soil-13.S52.L001    T1   Soil      Gokul   Rajshahi
Soil-14.S62.L001 Soil-14.S62.L001    T1   Soil       Jiol   Rajshahi
Soil-15.S72.L001 Soil-15.S72.L001    T1   Soil Samastipur   Rajshahi
Soil-16.S2.L001   Soil-16.S2.L001    T1   Soil     Talund   Rajshahi
Soil-2.S1.L001     Soil-2.S1.L001    T1   Soil    Bogajan Mymensingh

summary(sample_data(physeq)$Treatment)

T1 T2 T3 54 54 54

should I import data to deseq2 like this: or I need to change something

psfdds<-phyloseq_to_deseq2(physeq_soil, ~ Region + Treatment)

I am very thankful for your help.

ADD REPLYlink modified 10 weeks ago • written 10 weeks ago by nabiyogesh0

You haven't followed my advice above. Have you read the section I referred to? I'm confused.

ADD REPLYlink modified 10 weeks ago • written 10 weeks ago by Michael Love23k

Dear Michael,

I read it, but I am much more interested to find treatment (T1, T2 and T3) effect on soil, root and shoot tissue separately so I extract all soil sample, root and shoot in separate files. and now wanted to proceed these file independently so that treatment effect can be analysed on microbial abundance. these samples also collected from different villages which located in two different regions (Mymensing and Rajshahi).

if still, you think I should get all tissue in one file and combine tissue and treatment and go forward:

then should I modify above code like this:

dds$group <- factor(paste0(dds$Tissue, dds$Treatment))
design(dds) <- ~ group

psfdds<-phyloseq_to_deseq2(physeq, ~ group)

Thanks Yogesh

ADD REPLYlink modified 10 weeks ago • written 10 weeks ago by nabiyogesh0

The analysis choices are up to you. This forum is for questions about the software. I'm saying (and the vignette says) if you want to extract treatment effects per tissue, you can combine tissue and treatment and use ~group. Then make comparisons between treatment within the same tissue using e.g.

contrasts=c("group","tissueX.treatmentA","tissueX.treatment0").

ADD REPLYlink written 10 weeks ago by Michael Love23k

Dear Michael,

Thanks for your all help and time. I will run the analysis both way, it will help me to understand deseq2 more.

as I mention I have extracted soil samples and want to see the treatment effect T1, T2, T3.

I run deseq2 code like this but I am getting some messege:

psfdds<-phyloseq_to_deseq2(ps0,~Village + Treatment)
psfdds<-DESeq(psfdds, test="Wald", fitType="parametric")
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
2 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

could please suggest how can resolve it.

Thanks Yogesh

ADD REPLYlink written 9 weeks ago by nabiyogesh0

Usually these messages can be ignored when they involve one or two genes and usually these genes have some outlier counts.

ADD REPLYlink written 9 weeks ago by Michael Love23k

Dear Michael,

Thanks for your all help and time. I will run the analysis both way, it will help me to understand deseq2 more.

as I mention I have extracted soil samples and want to see the treatment effect T1, T2, T3.

I run deseq2 code like this but I am getting some messege:

psfdds<-phyloseq_to_deseq2(ps0,~Village + Treatment)
psfdds<-DESeq(psfdds, test="Wald", fitType="parametric")
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
2 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

could please suggest how can resolve it.

Thanks Yogesh

ADD REPLYlink written 9 weeks ago by nabiyogesh0

Hi Michael,

I am able to generate plot from deseq2 analysis using below command.

But I am not sure whether it is T3vsT1 OR T2vsT1.

could you please suggest how I can improve this analysis more for publication,

the aim is to find which microbial species or genus are more abundant in Treatment T1 and Treatment T3 in comparison to T2 and how T3 microbial abundance is different from T1 Treatment.

I will be very thankful for your time and kind support.

Here is all code.

psfdds<-phyloseqtodeseq2(ps0,~Villages + Treatment) converting counts to integer mode psfdds<-DESeq(psfdds, test="Wald", fitType="parametric") estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting model and testing 2 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest resultsNames(psfdds)
res<-results(psfdds, alpha=0.1) summary(res)

out of 62301 with nonzero total read count adjusted p-value < 0.1 LFC > 0 (up) : 32, 0.051% LFC < 0 (down) : 36, 0.058% outliers [1] : 3939, 6.3% low counts [2] : 47758, 77% (mean count < 37) [1] see 'cooksCutoff' argument of ?results [2] see 'independentFiltering' argument of ?results

mcols(res, use.names=TRUE)

mcols(res)$description

[1] "mean of normalized counts for all samples" [2] "log2 fold change (MLE): Treatment T3 vs T1" [3] "standard error: Treatment T3 vs T1"
[4] "Wald statistic: Treatment T3 vs T1"
[5] "Wald test p-value: Treatment T3 vs T1"
[6] "BH adjusted p-values"

sum(res$padj<0.05,na.rm=TRUE) [1] 56

sigtab<-cbind(as(sigtab, "data.frame"), as(tax_table(ps0)[rownames(sigtab),], "matrix")) dim(sigtab) [1] 56 13 colnames(sigtab) [1] "baseMean" "log2FoldChange" "lfcSE" "stat"
[5] "pvalue" "padj" "Kingdom" "Phylum"
[9] "Class" "Order" "Family" "Genus"
[13] "Species"

saveRDS(sigtab,file= "sigtab.RDS"))

saveRDS(sigtab,file="sigtab.RDS") scalefilldiscrete<-function(palname="Set1",...) {scalefillbrewer(palette=palname,...)} x<-tapply(sigtab$log2FoldChange, sigtab$Phylum, function(x) max(x) + ) x<-sort(x,TRUE) sigtab$Phylum<-factor(as.character(sigtab$Phylum), levels=names(x)) x<-tapply(sigtab$log2FoldChange, sigtab$Genus, function(x) max(x)) x<-sort(x,TRUE) sigtab$Genus<-factor(as.character(sigtab$Genus), levels=names(x))

plot.DESeq.UPvsDOWN<-ggplot(sigtab, aes(x=Genus, y=log2FoldChange, color=Phylum)) + geompoint(size=6) +themebw() +theme(axis.text.x = elementtext(angle = -90, hjust = 0, vjust=0.5)) + geomhline(yintercept=0) +theme(axis.text.y = element_text(angle =90, hjust=0.5))

ggsave("deseqlast.pdf")

Thanks Yogesh

ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by nabiyogesh0

It looks like you are not following my advice above about using contrast. Please read over the documentation about using “contrast” in the vignette and in

?results

ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by Michael Love23k

Dear Micheal,

you suggesting me to use group for deseq analysis,

How I should import data first in deseq2 from phyloseq and then how to run deseq2

if I am just import data without design it says design argument is missing.

psfdds<-phyloseqtodeseq2(ps0) converting counts to integer mode Error in all.vars(design) : argument "design" is missing, with no default

Kind Regards Yogesh

ADD REPLYlink written 9 weeks ago by nabiyogesh0

You can use ~1 as long as you change the design later to the appropriate one with

design(dds) <- ...

ADD REPLYlink written 9 weeks ago by Michael Love23k

Dear Micheal,

thanks a lot,

psfdds<-phyloseqtodeseq2(ps0, ~1)

converting counts to integer mode

psfdds$group <- factor(paste0(psfdds$Tissue,psfdds$Treatment))

design(psfdds) <- ~ group

do I also need to mention Treatment T2 as reference level or I should run further this command or I can use later contrast to extract the result as you mention

dds <- DESeq(psfdds, test="Wald", fitType="parametric")

Kind Regards Yogesh

ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by nabiyogesh0

Hi Michael,

I am getting data like this for all comparision, but it seems there was some problem:

> psfdds <- DESeq(psfdds, test="Wald", fitType="parametric")

estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 10886 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing

> resultsNames(psfdds)
[1] "Intercept"              "group_LeafT2_vs_LeafT1" "group_LeafT3_vs_LeafT1"
[4] "group_RootT1_vs_LeafT1" "group_RootT2_vs_LeafT1" "group_RootT3_vs_LeafT1"
[7] "group_SoilT1_vs_LeafT1" "group_SoilT2_vs_LeafT1" "group_SoilT3_vs_LeafT1"

> ressoil.T1_soil.T2_new <- results(psfdds, contrasts=c("group","TissueSoil.TreatmentT1","TissueSoil.TreatmentT2"))
> summary(ressoil.T1_soil.T2_new)

out of 59297 with nonzero total read count

adjusted p-value < 0.1

LFC > 0 (up)     : 58684, 99% 

LFC < 0 (down)   : 3, 0.0051% 

outliers [1]     : 0, 0% 

low counts [2]   : 3004, 5.1% 

(mean count < 0)
ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by nabiyogesh0

I'm not sure if DESeq2 is an appropriate software here. Why don't you try a different method? Generally, I don't work in metagenomics and I'm skeptical of the idea of global normalization when there is no concept of "housekeeping genes". It works well in bulk RNA-seq, but I don't see why it should work across samples that share no taxa.

ADD REPLYlink written 9 weeks ago by Michael Love23k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 203 users visited in the last hour