Question

Using DESeq2 to analyze m6A read counts from Nanopore sequencing

0

Entering edit mode

YC • 0

@67c454c1

Last seen 3.7 years ago

Taiwan

Hi, I'm trying to find the differential m6A sites between control and treatment.

deseq_data <- DESeqDataSetFromMatrix(counts,
                                     colData = sampleinfo,
                                     design = ~ Treatment)
deseq_data=DESeq(deseq_data,test = "LRT",reduced = ~1,useT = T,minmu = 1e-6)

DEseq_df <- results(deseq_data,contrast = c("Treatment","A","B"),
                        independentFiltering=T,cooksCutoff = F)

Where the counts matrix is m6A read counts of each sites at transcript level.

Is it proper to using DEseq2 to analyze such a experiment?

Any suggestion is helpful, many thanks!

DESeq2 • 1.4k views

ADD COMMENT • link 3.7 years ago YC • 0

score 0 · Answer 1 · 2022-03-08

0

Entering edit mode

Michael Love 43k

@mikelove

Last seen 23 hours ago

United States

Without knowing too much details about the typical distribution, I can't say much. You can examine the plotDispEsts() and then also plotCounts() for individual genes to see if the results make sense with respect to the scaled count data..

ADD COMMENT • link 3.7 years ago Michael Love 43k

0

Entering edit mode

Thanks for the quick reply, the suggestions are helpful.

I spent some time to know how "dispersion" works in DESeq2.

I tried all the four fitTypes, and it seemed both parametric & local methods could fit the model, properly.

# Two biological replicates for treatment_A and treament_B
# Each data point is the read counts at the site of one transcript (ex. 100 read counts at the 10th base of AT1G01010.1)
# All the 0 sites were removed (counts: rowSum=0)
# The read counts are normalized by the default setting.
deseq_data <- DESeqDataSetFromMatrix(counts,
                                         colData = sampleinfo[c("A_R1","A_R2","B_R1","B_R2"),],
                                         design = ~ Treatment)

deseq_data_LRT_pa=DESeq(deseq_data,test = "LRT",fitType = "parametric",reduced = ~1,minmu = 1e-6)
deseq_data_LRT_loc=DESeq(deseq_data,test = "LRT",fitType = "local",reduced = ~1,minmu = 1e-6)
deseq_data_LRT_mean=DESeq(deseq_data,test = "LRT",fitType = "mean",reduced = ~1,minmu = 1e-6)
deseq_data_LRT_poi=DESeq(deseq_data,test = "LRT",fitType = "glmGamPoi",reduced = ~1,minmu = 1e-6)

plotDispEsts(deseq_data_LRT_pa,main = "Parametric")
plotDispEsts(deseq_data_LRT_loc,main ="local")
plotDispEsts(deseq_data_LRT_mean,main ="mean")
plotDispEsts(deseq_data_LRT_poi,main ="glmGamPoi")

enter image description here

However, the "local" fitType obtained much more significant sites than "parametric" fitType.

I had referred to the previous question, but I couldn't see the plots (local versus parametric fitType in DESeq2).

Which fitType works better in my case? (or I misunderstood what DESeq2 exactly did, and the analysis is not suitable for such a data, please let me know)

Willing to provide more details if needed.

Any suggestion is helpful, many thanks!

ADD REPLY • link 3.7 years ago YC • 0

1

Entering edit mode

Parametric works best here. You can also remove genes with very low counts, those are not needed, e.g. at the top of your script:

keep <- rowSums(counts(dds) >= 10) >= X
dds <- dds[keep,]

Where X can be a value like the smallest group size. This will just improve speed and make plots easier to see.

Then again, look at plotCounts for top ranked genes to see what you think.

ADD REPLY • link 3.7 years ago Michael Love 43k

0

Entering edit mode

It really helps!

After I trimmed out the very low counts, the number of significant sites is much more increased in parametric fitType.

And the results of plotCounts also seemed well.

Thanks!

ADD REPLY • link 3.7 years ago YC • 0