Question

Reason for NA in table after beta regression?

0

Entering edit mode

akhaira • 0

@akhaira-23607

Last seen 5.5 years ago

Hi, I am attempting to analyze 4 files with a total of 12,800,011 elements using BiSeq. Currently, this is the code I am using to obtain beta results:

rrbs.clust <- clusterSites(object = rrbs, groups = colData(rrbs)$group, perc.samples = 4/5, min.sites = 4, max.dist = 30)
ind.cov <- totalReads(rrbs.clust) > 0
quant <- quantile(totalReads(rrbs.clust)[ind.cov], 0.9)
rrbs.clust.lim <- limitCov(rrbs.clust, maxCov = quant)
predictedMeth <- predictMeth(object = rrbs.clust.lim)
betaResults <- betaRegression(formula = ~group, link = "probit", object = predictedMeth, type = "BR")`

I am not sure why, but when I go to view the betaResults, I see a data table with a majority of 'NA's instead of actual values:

head(betaResults)
 . . chr   pos p.val meth.group1 meth.group2 meth.diff
1.1  17 30458    NA          NA          NA        NA
1.2  17 30462    NA          NA          NA        NA
1.3  17 30464    NA          NA          NA        NA
1.4  17 30475    NA          NA          NA        NA
1.5  17 30480    NA          NA          NA        NA
1.6  17 30487    NA          NA          NA        NA
    estimate std.error pseudo.R.sqrt cluster.id
1.1       NA        NA            NA       17_1
1.2       NA        NA            NA       17_1
1.3       NA        NA            NA       17_1
1.4       NA        NA            NA       17_1
1.5       NA        NA            NA       17_1
1.6       NA        NA            NA       17_1

Very few of the rows actually have numbers (about 840/ 5670), but the ones that do, have very high p values. Does anyone have any idea why I'm seeing all these NAs, or what might be causing the p values to appear very high?

biseq beta regression • 1.9k views

ADD COMMENT • link 5.7 years ago akhaira • 0

1

Entering edit mode

Sorry for joining the discussion late. It looks like you have only 4 samples, with 2 per condition? I think this could be a reason why there are so many NAs. It's possible that the equation can't be solved, not even when using type = "BR" (see betareg package). It's also possible that some samples have missing data, which leaves you with zero replicates for some CpG sites. You definitely need more than just 2 replicates per group if you want to fit a beta regression model (or really do any statistical tests).

ADD REPLY • link 5.7 years ago Katja Hebestreit ▴ 130

0

Entering edit mode

Oh ok thank you for the reply. I have a bigger dataset that I can test on too, but I was just trying to run the smaller one first to see if it would work. I think I will try the larger one instead, then.

ADD REPLY • link 5.7 years ago akhaira • 0

0

Entering edit mode

Check that your input data does not contain NA values. Also check the formatting / encoding of your input object(s).

ADD REPLY • link 5.7 years ago Kevin Blighe ★ 4.0k

0

Entering edit mode

Hi Kevin,

None of the input data contains NA values; they all have actual data values. What do you mean by check the formatting or encoding of the input objects?

ADD REPLY • link 5.7 years ago akhaira • 0

1

Entering edit mode

For example, is the initial rrbs object a BSraw object?; is colData(rrbs)$group a factor? Or, what if you check some of these sites in the original data to see what are the values, e.g., chr17:30464 ?

ADD REPLY • link 5.7 years ago Kevin Blighe ★ 4.0k

0

Entering edit mode

Yes, the initial rrbs object is a BSraw class. Entering colData(rrbs)$group outputs [1] control control patient patient Levels: control patient. Some of the sites in the original data have 0 values while others have non zero values (e.g. for chr17:30464 I have the values 6.18556701030928, 6, 91 for methylation percentages, count methylated and count unmethylated, respectively.

ADD REPLY • link 5.7 years ago akhaira • 0

0

Entering edit mode

I see, then could you check the input and output of each command subsequent to clusterSites just to make sure that each is running as expected? Keep in mind that I am only replying here on behalf of the author(s), who has so far not appeared.

ADD REPLY • link 5.7 years ago Kevin Blighe ★ 4.0k

0

Entering edit mode

Yes, the only step before this one is the data import, but that seems to be ok, as I can see the output of that command produces the desired BSraw object.

ADD REPLY • link 5.7 years ago akhaira • 0