Using DESeq2 to analyze m6A read counts from Nanopore sequencing
1
0
Entering edit mode
YC • 0
@67c454c1
Last seen 3.7 years ago
Taiwan

Hi, I'm trying to find the differential m6A sites between control and treatment.

deseq_data <- DESeqDataSetFromMatrix(counts,
                                     colData = sampleinfo,
                                     design = ~ Treatment)
deseq_data=DESeq(deseq_data,test = "LRT",reduced = ~1,useT = T,minmu = 1e-6)

DEseq_df <- results(deseq_data,contrast = c("Treatment","A","B"),
                        independentFiltering=T,cooksCutoff = F)

Where the counts matrix is m6A read counts of each sites at transcript level.

Is it proper to using DEseq2 to analyze such a experiment?

Any suggestion is helpful, many thanks!

DESeq2 • 1.4k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 23 hours ago
United States

Without knowing too much details about the typical distribution, I can't say much. You can examine the plotDispEsts() and then also plotCounts() for individual genes to see if the results make sense with respect to the scaled count data..

ADD COMMENT
0
Entering edit mode

Thanks for the quick reply, the suggestions are helpful.

I spent some time to know how "dispersion" works in DESeq2.

I tried all the four fitTypes, and it seemed both parametric & local methods could fit the model, properly.

# Two biological replicates for treatment_A and treament_B
# Each data point is the read counts at the site of one transcript (ex. 100 read counts at the 10th base of AT1G01010.1)
# All the 0 sites were removed (counts: rowSum=0)
# The read counts are normalized by the default setting.
deseq_data <- DESeqDataSetFromMatrix(counts,
                                         colData = sampleinfo[c("A_R1","A_R2","B_R1","B_R2"),],
                                         design = ~ Treatment)

deseq_data_LRT_pa=DESeq(deseq_data,test = "LRT",fitType = "parametric",reduced = ~1,minmu = 1e-6)
deseq_data_LRT_loc=DESeq(deseq_data,test = "LRT",fitType = "local",reduced = ~1,minmu = 1e-6)
deseq_data_LRT_mean=DESeq(deseq_data,test = "LRT",fitType = "mean",reduced = ~1,minmu = 1e-6)
deseq_data_LRT_poi=DESeq(deseq_data,test = "LRT",fitType = "glmGamPoi",reduced = ~1,minmu = 1e-6)

plotDispEsts(deseq_data_LRT_pa,main = "Parametric")
plotDispEsts(deseq_data_LRT_loc,main ="local")
plotDispEsts(deseq_data_LRT_mean,main ="mean")
plotDispEsts(deseq_data_LRT_poi,main ="glmGamPoi")

enter image description here enter image description here enter image description here enter image description here

However, the "local" fitType obtained much more significant sites than "parametric" fitType.

I had referred to the previous question, but I couldn't see the plots (local versus parametric fitType in DESeq2).

Which fitType works better in my case? (or I misunderstood what DESeq2 exactly did, and the analysis is not suitable for such a data, please let me know)

Willing to provide more details if needed.

Any suggestion is helpful, many thanks!

ADD REPLY
1
Entering edit mode

Parametric works best here. You can also remove genes with very low counts, those are not needed, e.g. at the top of your script:

keep <- rowSums(counts(dds) >= 10) >= X
dds <- dds[keep,]

Where X can be a value like the smallest group size. This will just improve speed and make plots easier to see.

Then again, look at plotCounts for top ranked genes to see what you think.

ADD REPLY
0
Entering edit mode

It really helps!

After I trimmed out the very low counts, the number of significant sites is much more increased in parametric fitType.

And the results of plotCounts also seemed well.

Thanks!

ADD REPLY

Login before adding your answer.

Traffic: 863 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6