Entering edit mode
akhaira
•
0
@akhaira-23607
Last seen 4.4 years ago
Hi, I am attempting to analyze 4 files with a total of 12,800,011 elements using BiSeq. Currently, this is the code I am using to obtain beta results:
rrbs.clust <- clusterSites(object = rrbs, groups = colData(rrbs)$group, perc.samples = 4/5, min.sites = 4, max.dist = 30)
ind.cov <- totalReads(rrbs.clust) > 0
quant <- quantile(totalReads(rrbs.clust)[ind.cov], 0.9)
rrbs.clust.lim <- limitCov(rrbs.clust, maxCov = quant)
predictedMeth <- predictMeth(object = rrbs.clust.lim)
betaResults <- betaRegression(formula = ~group, link = "probit", object = predictedMeth, type = "BR")`
I am not sure why, but when I go to view the betaResults
, I see a data table with a majority of 'NA's instead of actual values:
head(betaResults)
. . chr pos p.val meth.group1 meth.group2 meth.diff
1.1 17 30458 NA NA NA NA
1.2 17 30462 NA NA NA NA
1.3 17 30464 NA NA NA NA
1.4 17 30475 NA NA NA NA
1.5 17 30480 NA NA NA NA
1.6 17 30487 NA NA NA NA
estimate std.error pseudo.R.sqrt cluster.id
1.1 NA NA NA 17_1
1.2 NA NA NA 17_1
1.3 NA NA NA 17_1
1.4 NA NA NA 17_1
1.5 NA NA NA 17_1
1.6 NA NA NA 17_1
Very few of the rows actually have numbers (about 840/ 5670), but the ones that do, have very high p values. Does anyone have any idea why I'm seeing all these NAs, or what might be causing the p values to appear very high?
Sorry for joining the discussion late. It looks like you have only 4 samples, with 2 per condition? I think this could be a reason why there are so many NAs. It's possible that the equation can't be solved, not even when using type = "BR" (see betareg package). It's also possible that some samples have missing data, which leaves you with zero replicates for some CpG sites. You definitely need more than just 2 replicates per group if you want to fit a beta regression model (or really do any statistical tests).
Oh ok thank you for the reply. I have a bigger dataset that I can test on too, but I was just trying to run the smaller one first to see if it would work. I think I will try the larger one instead, then.
Check that your input data does not contain NA values. Also check the formatting / encoding of your input object(s).
Hi Kevin,
None of the input data contains NA values; they all have actual data values. What do you mean by check the formatting or encoding of the input objects?
For example, is the initial
rrbs
object aBSraw
object?; iscolData(rrbs)$group
a factor? Or, what if you check some of these sites in the original data to see what are the values, e.g.,chr17:30464
?Yes, the initial rrbs object is a BSraw class. Entering
colData(rrbs)$group
outputs[1] control control patient patient Levels: control patient
. Some of the sites in the original data have 0 values while others have non zero values (e.g. forchr17:30464
I have the values6.18556701030928, 6, 91
for methylation percentages, count methylated and count unmethylated, respectively.I see, then could you check the input and output of each command subsequent to
clusterSites
just to make sure that each is running as expected? Keep in mind that I am only replying here on behalf of the author(s), who has so far not appeared.Yes, the only step before this one is the data import, but that seems to be ok, as I can see the output of that command produces the desired BSraw object.