Phyloseq to deseq2
1
0
Entering edit mode
Jitendra ▴ 10
@nabiyogesh-11718
Last seen 5 months ago
United Kingdom

I want to use deseq2 for differential abundance analysis for 16s amplicon data analysis.

data detail:

I do have three tissue:

summary(sample_data(physeq)$Tissue)
Leaf Root Soil

And each tissue has given 3 treatment.

summary(sample_data(physeq)$Treatment)
T1 T2 T3

Now I just want to use soil data and want to see the effect of treatment T1, T2 and T3 on microbial abundance in soil only.

for this how should I should I take only soil sample from physeq data and modify the below condition code.

diagdds = phyloseq_to_deseq2(kostic, ~ DIAGNOSIS)

I will be very thankful for your help.

Kind Regards Yogesh

deseq2 • 3.2k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 6 days ago
United States

Combine tissue and treatment into one variable, see the example in the beginning of the interactions section of the vignette.

ADD COMMENT
0
Entering edit mode

Hi Michael ,

Thanks a lot.

From the above data, I have extracted soil samples.I need your help to design the condition for deseq2:

here is the summary of data, I have soil sample collection from different villages from two different regions:

and we have given three treatment to each soil sample in each village: TREATMENT T1, T2 AND T3, now my aim is to find the differential abundance of the microbial community in T1 and T2 and T3 treatment condition:

head(sampledata(physeqsoil)) Sample Data: [6 samples by 5 sample variables]:

              X.SampleID       Treatment Tissue   Villages     Region
Soil-12.S42.L001 Soil-12.S42.L001    T1   Soil  Maidaipur   Rajshahi
Soil-13.S52.L001 Soil-13.S52.L001    T1   Soil      Gokul   Rajshahi
Soil-14.S62.L001 Soil-14.S62.L001    T1   Soil       Jiol   Rajshahi
Soil-15.S72.L001 Soil-15.S72.L001    T1   Soil Samastipur   Rajshahi
Soil-16.S2.L001   Soil-16.S2.L001    T1   Soil     Talund   Rajshahi
Soil-2.S1.L001     Soil-2.S1.L001    T1   Soil    Bogajan Mymensingh

summary(sample_data(physeq)$Treatment)

T1 T2 T3 54 54 54

should I import data to deseq2 like this: or I need to change something

psfdds<-phyloseq_to_deseq2(physeq_soil, ~ Region + Treatment)

I am very thankful for your help.

ADD REPLY
0
Entering edit mode

You haven't followed my advice above. Have you read the section I referred to? I'm confused.

ADD REPLY
0
Entering edit mode

Dear Michael,

I read it, but I am much more interested to find treatment (T1, T2 and T3) effect on soil, root and shoot tissue separately so I extract all soil sample, root and shoot in separate files. and now wanted to proceed these file independently so that treatment effect can be analysed on microbial abundance. these samples also collected from different villages which located in two different regions (Mymensing and Rajshahi).

if still, you think I should get all tissue in one file and combine tissue and treatment and go forward:

then should I modify above code like this:

dds$group <- factor(paste0(dds$Tissue, dds$Treatment))
design(dds) <- ~ group

psfdds<-phyloseq_to_deseq2(physeq, ~ group)

Thanks Yogesh

ADD REPLY
0
Entering edit mode

The analysis choices are up to you. This forum is for questions about the software. I'm saying (and the vignette says) if you want to extract treatment effects per tissue, you can combine tissue and treatment and use ~group. Then make comparisons between treatment within the same tissue using e.g.

contrasts=c("group","tissueX.treatmentA","tissueX.treatment0").

ADD REPLY
0
Entering edit mode

Dear Michael,

Thanks for your all help and time. I will run the analysis both way, it will help me to understand deseq2 more.

as I mention I have extracted soil samples and want to see the treatment effect T1, T2, T3.

I run deseq2 code like this but I am getting some messege:

psfdds<-phyloseq_to_deseq2(ps0,~Village + Treatment)
psfdds<-DESeq(psfdds, test="Wald", fitType="parametric")
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
2 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

could please suggest how can resolve it.

Thanks Yogesh

ADD REPLY
0
Entering edit mode

Usually these messages can be ignored when they involve one or two genes and usually these genes have some outlier counts.

ADD REPLY
0
Entering edit mode

Dear Michael,

Thanks for your all help and time. I will run the analysis both way, it will help me to understand deseq2 more.

as I mention I have extracted soil samples and want to see the treatment effect T1, T2, T3.

I run deseq2 code like this but I am getting some messege:

psfdds<-phyloseq_to_deseq2(ps0,~Village + Treatment)
psfdds<-DESeq(psfdds, test="Wald", fitType="parametric")
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
2 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest

could please suggest how can resolve it.

Thanks Yogesh

ADD REPLY
0
Entering edit mode

Hi Michael,

I am able to generate plot from deseq2 analysis using below command.

But I am not sure whether it is T3vsT1 OR T2vsT1.

could you please suggest how I can improve this analysis more for publication,

the aim is to find which microbial species or genus are more abundant in Treatment T1 and Treatment T3 in comparison to T2 and how T3 microbial abundance is different from T1 Treatment.

I will be very thankful for your time and kind support.

Here is all code.

psfdds<-phyloseqtodeseq2(ps0,~Villages + Treatment) converting counts to integer mode psfdds<-DESeq(psfdds, test="Wald", fitType="parametric") estimating size factors estimating dispersions gene-wise dispersion estimates mean-dispersion relationship final dispersion estimates fitting model and testing 2 rows did not converge in beta, labelled in mcols(object)$betaConv. Use larger maxit argument with nbinomWaldTest resultsNames(psfdds)
res<-results(psfdds, alpha=0.1) summary(res)

out of 62301 with nonzero total read count adjusted p-value < 0.1 LFC > 0 (up) : 32, 0.051% LFC < 0 (down) : 36, 0.058% outliers [1] : 3939, 6.3% low counts [2] : 47758, 77% (mean count < 37) [1] see 'cooksCutoff' argument of ?results [2] see 'independentFiltering' argument of ?results

mcols(res, use.names=TRUE)

mcols(res)$description

[1] "mean of normalized counts for all samples" [2] "log2 fold change (MLE): Treatment T3 vs T1" [3] "standard error: Treatment T3 vs T1"
[4] "Wald statistic: Treatment T3 vs T1"
[5] "Wald test p-value: Treatment T3 vs T1"
[6] "BH adjusted p-values"

sum(res$padj<0.05,na.rm=TRUE) [1] 56

sigtab<-cbind(as(sigtab, "data.frame"), as(tax_table(ps0)[rownames(sigtab),], "matrix")) dim(sigtab) [1] 56 13 colnames(sigtab) [1] "baseMean" "log2FoldChange" "lfcSE" "stat"
[5] "pvalue" "padj" "Kingdom" "Phylum"
[9] "Class" "Order" "Family" "Genus"
[13] "Species"

saveRDS(sigtab,file= "sigtab.RDS"))

saveRDS(sigtab,file="sigtab.RDS") scalefilldiscrete<-function(palname="Set1",...) {scalefillbrewer(palette=palname,...)} x<-tapply(sigtab$log2FoldChange, sigtab$Phylum, function(x) max(x) + ) x<-sort(x,TRUE) sigtab$Phylum<-factor(as.character(sigtab$Phylum), levels=names(x)) x<-tapply(sigtab$log2FoldChange, sigtab$Genus, function(x) max(x)) x<-sort(x,TRUE) sigtab$Genus<-factor(as.character(sigtab$Genus), levels=names(x))

plot.DESeq.UPvsDOWN<-ggplot(sigtab, aes(x=Genus, y=log2FoldChange, color=Phylum)) + geompoint(size=6) +themebw() +theme(axis.text.x = elementtext(angle = -90, hjust = 0, vjust=0.5)) + geomhline(yintercept=0) +theme(axis.text.y = element_text(angle =90, hjust=0.5))

ggsave("deseqlast.pdf")

Thanks Yogesh

ADD REPLY
0
Entering edit mode

It looks like you are not following my advice above about using contrast. Please read over the documentation about using “contrast” in the vignette and in

?results

ADD REPLY
0
Entering edit mode

Dear Micheal,

you suggesting me to use group for deseq analysis,

How I should import data first in deseq2 from phyloseq and then how to run deseq2

if I am just import data without design it says design argument is missing.

psfdds<-phyloseqtodeseq2(ps0) converting counts to integer mode Error in all.vars(design) : argument "design" is missing, with no default

Kind Regards Yogesh

ADD REPLY
0
Entering edit mode

You can use ~1 as long as you change the design later to the appropriate one with

design(dds) <- ...

ADD REPLY
0
Entering edit mode

Dear Micheal,

thanks a lot,

psfdds<-phyloseqtodeseq2(ps0, ~1)

converting counts to integer mode

psfdds$group <- factor(paste0(psfdds$Tissue,psfdds$Treatment))

design(psfdds) <- ~ group

do I also need to mention Treatment T2 as reference level or I should run further this command or I can use later contrast to extract the result as you mention

dds <- DESeq(psfdds, test="Wald", fitType="parametric")

Kind Regards Yogesh

ADD REPLY
0
Entering edit mode

Hi Michael,

I am getting data like this for all comparision, but it seems there was some problem:

> psfdds <- DESeq(psfdds, test="Wald", fitType="parametric")

estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
-- replacing outliers and refitting for 10886 genes
-- DESeq argument 'minReplicatesForReplace' = 7 
-- original counts are preserved in counts(dds)
estimating dispersions
fitting model and testing

> resultsNames(psfdds)
[1] "Intercept"              "group_LeafT2_vs_LeafT1" "group_LeafT3_vs_LeafT1"
[4] "group_RootT1_vs_LeafT1" "group_RootT2_vs_LeafT1" "group_RootT3_vs_LeafT1"
[7] "group_SoilT1_vs_LeafT1" "group_SoilT2_vs_LeafT1" "group_SoilT3_vs_LeafT1"

> ressoil.T1_soil.T2_new <- results(psfdds, contrasts=c("group","TissueSoil.TreatmentT1","TissueSoil.TreatmentT2"))
> summary(ressoil.T1_soil.T2_new)

out of 59297 with nonzero total read count

adjusted p-value < 0.1

LFC > 0 (up)     : 58684, 99% 

LFC < 0 (down)   : 3, 0.0051% 

outliers [1]     : 0, 0% 

low counts [2]   : 3004, 5.1% 

(mean count < 0)
ADD REPLY
0
Entering edit mode

I'm not sure if DESeq2 is an appropriate software here. Why don't you try a different method? Generally, I don't work in metagenomics and I'm skeptical of the idea of global normalization when there is no concept of "housekeeping genes". It works well in bulk RNA-seq, but I don't see why it should work across samples that share no taxa.

ADD REPLY

Login before adding your answer.

Traffic: 422 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6