Reason for NA in table after beta regression?
0
0
Entering edit mode
akhaira • 0
@akhaira-23607
Last seen 11 months ago

Hi, I am attempting to analyze 4 files with a total of 12,800,011 elements using BiSeq. Currently, this is the code I am using to obtain beta results:

rrbs.clust <- clusterSites(object = rrbs, groups = colData(rrbs)$group, perc.samples = 4/5, min.sites = 4, max.dist = 30) ind.cov <- totalReads(rrbs.clust) > 0 quant <- quantile(totalReads(rrbs.clust)[ind.cov], 0.9) rrbs.clust.lim <- limitCov(rrbs.clust, maxCov = quant) predictedMeth <- predictMeth(object = rrbs.clust.lim) betaResults <- betaRegression(formula = ~group, link = "probit", object = predictedMeth, type = "BR")  I am not sure why, but when I go to view the betaResults, I see a data table with a majority of 'NA's instead of actual values: head(betaResults) . . chr pos p.val meth.group1 meth.group2 meth.diff 1.1 17 30458 NA NA NA NA 1.2 17 30462 NA NA NA NA 1.3 17 30464 NA NA NA NA 1.4 17 30475 NA NA NA NA 1.5 17 30480 NA NA NA NA 1.6 17 30487 NA NA NA NA estimate std.error pseudo.R.sqrt cluster.id 1.1 NA NA NA 17_1 1.2 NA NA NA 17_1 1.3 NA NA NA 17_1 1.4 NA NA NA 17_1 1.5 NA NA NA 17_1 1.6 NA NA NA 17_1  Very few of the rows actually have numbers (about 840/ 5670), but the ones that do, have very high p values. Does anyone have any idea why I'm seeing all these NAs, or what might be causing the p values to appear very high? biseq beta regression • 189 views ADD COMMENT 1 Entering edit mode Sorry for joining the discussion late. It looks like you have only 4 samples, with 2 per condition? I think this could be a reason why there are so many NAs. It's possible that the equation can't be solved, not even when using type = "BR" (see betareg package). It's also possible that some samples have missing data, which leaves you with zero replicates for some CpG sites. You definitely need more than just 2 replicates per group if you want to fit a beta regression model (or really do any statistical tests). ADD REPLY 0 Entering edit mode Oh ok thank you for the reply. I have a bigger dataset that I can test on too, but I was just trying to run the smaller one first to see if it would work. I think I will try the larger one instead, then. ADD REPLY 0 Entering edit mode Check that your input data does not contain NA values. Also check the formatting / encoding of your input object(s). ADD REPLY 0 Entering edit mode Hi Kevin, None of the input data contains NA values; they all have actual data values. What do you mean by check the formatting or encoding of the input objects? ADD REPLY 1 Entering edit mode For example, is the initial rrbs object a BSraw object?; is colData(rrbs)$group a factor? Or, what if you check some of these sites in the original data to see what are the values, e.g., chr17:30464 ?

0
Entering edit mode

Yes, the initial rrbs object is a BSraw class. Entering colData(rrbs)\$group outputs [1] control control patient patient Levels: control patient. Some of the sites in the original data have 0 values while others have non zero values (e.g. for chr17:30464 I have the values 6.18556701030928, 6, 91 for methylation percentages, count methylated and count unmethylated, respectively.

0
Entering edit mode

I see, then could you check the input and output of each command subsequent to clusterSites` just to make sure that each is running as expected? Keep in mind that I am only replying here on behalf of the author(s), who has so far not appeared.

0
Entering edit mode

Yes, the only step before this one is the data import, but that seems to be ok, as I can see the output of that command produces the desired BSraw object.