edgeR: "no non-missing arguments to min; returning Inf" after filterByExpr
hcnbox • 0
@hcnbox-23063

I have an RNA-seq data set generated on an Affymetrix HiSeq 2500 and am cleaning and pre-processing it, following the vignette from the edgeR package. I'm stuck at the step that removes lowly expressed genes. There are 6 conditions in my data set; the edger object has dimensions (24421, 6). When I run the sample code from the edgeR vignette that checks for row sums == 0, I find 5100 rows that meet this condition, leaving 19321 rows with sums > 0. So up to that point things appear to be fine. Then I run the line

> keep.expr <- filterByExpr(edger, group=group)

This gives the following warning:

Warning message in min(n[n > 1L]):
"no non-missing arguments to min; returning Inf"

and also creates the keep.expr output object. I expected keep.expr to contain a boolean for each row signifying whether it was kept, plus the RNA-seq values for the rows kept. It has no RNA-seq values at all. I used

> length(which(keep.expr))
> length(which(!keep.expr))

to find the total of "TRUE" and "FALSE" entries in keep.expr, and found that the booleans from filterByExpr are all "FALSE". There should have been 19321 "TRUE" entries along with their RNA-seq values, and 5100 "FALSE" entries. The next command in the vignette after filterByExpr uses keep.expr to trim the edger object. When I do that, the dimensions of the new edger object are (0, 6), consistent with there being no RNA-seq values left after running filterByExpr. I've been studying the edgeR User's Guide but cannot find what is missing that keeps filterByExpr from working properly.

Heber

filterByExpression edgeR RNAseq analysis edger • 556 views

I assume you mean you used an Illumina HiSeq 2500. Affymetrix is a microarray platform; they don't make sequencers.

@gordon-smyth
WEHI, Melbourne, Australia

Please reinstall edgeR to get the latest version. You are running into a bug that was fixed a year ago in edgeR 3.25.4. The current release of Bioconductor is 3.10 but the bug is also fixed in edgeR for Bioconductor 3.9.

The basic problem though is that your experiment has no replication. You apparently have 6 different conditions and only 6 samples.
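The warning itself can be reproduced in base R. This is only a minimal sketch: it assumes, going by the warning text, that the older filterByExpr computed its minimum group size as min(n[n > 1L]), where n holds the number of samples per group. The group labels are made up.

```r
# Hypothetical design: six samples, each in its own condition (no replicates)
group <- factor(c("A", "B", "C", "D", "E", "F"))

# Number of samples in each group: all ones
n <- tabulate(group)

# No group has more than one sample, so this subset is empty
replicated <- n[n > 1L]
length(replicated)

# min() of an empty vector returns Inf (with the warning from the question);
# the warning is suppressed here so the sketch runs quietly
min.size <- suppressWarnings(min(replicated))
is.infinite(min.size)
```

An infinite minimum group size would be consistent with every gene coming back FALSE, since no gene can be expressed in infinitely many samples.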

If you want to get almost the same result as filterByExpr would give on your data, you could simply keep genes with at least 15 reads:

keep <- rowSums(dge$counts) >= 15
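As a sketch of how that filter behaves, here it is applied to a small made-up count matrix in base R. The object counts is a hypothetical stand-in for dge$counts and the values are invented. With a real DGEList, the edgeR User's Guide subsets with dge[keep, , keep.lib.sizes=FALSE] so that library sizes are recomputed.

```r
# Toy count matrix: 4 genes x 6 samples (values invented for illustration)
counts <- matrix(
  c( 0,  0,  0,  0,  0,  0,   # gene1: never detected
     1,  2,  0,  3,  1,  2,   # gene2: total 9, below the cutoff
     5,  4,  3,  2,  1,  0,   # gene3: total 15, just meets the cutoff
    50, 60, 40, 55, 70, 65),  # gene4: clearly expressed
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("S", 1:6))
)

# Keep genes with at least 15 reads summed over all samples
keep <- rowSums(counts) >= 15
table(keep)

# Subset the rows; drop = FALSE keeps the result a matrix
filtered <- counts[keep, , drop = FALSE]
dim(filtered)
```

Note that keep is only a logical vector with one entry per gene; the counts themselves stay in the original object until it is subset.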

Thank you. You are correct that it was an Illumina, not Affymetrix. I was tired when I wrote that. And yes, the data have no replicates. I used the "keep" line and that solved the issue. I appreciate your help, which confirmed to me that I used a good workaround, and also gave me some context for a good minimum number of reads. Now I'm having problems at the removal of heteroscedasticity step. voom says I have NA values; I confirmed that there are 3 such values. I don't know how they appeared in the data file, so I'm working on figuring out where they are and why they appeared. Hopefully I will figure it out and won't have to post a new request for help. If you don't mind a question on that here: does the lack of replicates keep the voom function from working?
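For tracking down where the NA values sit, a minimal base R sketch may help. The object counts is a hypothetical stand-in for the count matrix (for a DGEList, its counts component), and the NA values are planted here purely for illustration.

```r
# Toy count matrix with three NA values planted for illustration
counts <- matrix(1:24, nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("S", 1:6)))
counts[2, 3] <- NA
counts[4, 1] <- NA
counts[4, 6] <- NA

# Total number of NA entries
n.na <- sum(is.na(counts))

# Row and column index of each NA entry
na.pos <- which(is.na(counts), arr.ind = TRUE)
na.pos

# Names of the genes (rows) that contain at least one NA
genes.with.na <- rownames(counts)[rowSums(is.na(counts)) > 0]
genes.with.na
```

Also note that rowSums() returns NA for any row containing an NA unless na.rm = TRUE is set, so it is worth checking the raw counts before applying a row-sum filter.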
