Question: metagenomeSeq sparseFeatures question
3.9 years ago, yang.xiang10300 (United States) wrote:

Hi,

I'm using metagenomeSeq to analyze my data.

After creating an MRexperiment, I had 31 features across 87 samples.

However, after filtering with sparseFeatures = which(rowSums(MRcounts(Phylum) > 0) < 10), there were 0 features left, which was weird, because none of the rows actually had a sum of counts less than 10 in the original MRexperiment.

> sparseFeatures = which(rowSums(MRcounts(Phylum) > 0) < 10)
> Phylumtrim = Phylum[-sparseFeatures, ]
> Phylumtrim
MRexperiment (storageMode: environment)
assayData: 0 features, 87 samples
element names: counts

When I used cumNormStat, I got an error about an "empty sample", even though all of my samples had at least 1 feature with counts over 10.

Phylumnorm = cumNorm(Phylumtrim,p= cumNormStat(Phylumtrim))
Error in cumNormStat(Phylumtrim) : Warning empty sample

How should I fix this problem? Or should I just skip the sparseFeatures step?

Thanks

3.9 years ago, James W. MacDonald (United States) wrote:

Does this help explain what you see?

> dat <- matrix(runif(1000), ncol = 10)
> z <- which(rowSums(dat > 0) < 10)
> z
integer(0)
> dat[-z,]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
> zz <- rowSums(dat > 0) < 10
> dim(dat[!zz,])
[1] 100  10

Sorry, I'm pretty new to both the package and R. Let me try to figure this out.

There are 10 columns (10 samples), and z holds the rows that have a sum of counts less than 10.

Does dat[-z, ] mean keeping the rows that have a sum of counts of more than 10?

And then zz is what is left from dat[-z, ]? Since zz has all the features, dat[-z, ] has 0 features?

So should I change the code, or should I skip the function?

Thanks


I guess I am trying to show you two things at once.

First, z and zz are computed the same way, except that I wrapped z in a call to which(). So z answers 'which of the rows have fewer than 10 items that are greater than zero?', and zz is a logical vector that is TRUE for rows with fewer than 10 items greater than zero and FALSE otherwise. Since no rows fulfill this criterion, zz is a vector of all FALSE, and z is integer(0) because there are no TRUE values (i.e., when you ask 'which of these rows fulfill this criterion?', you get integer(0) because none of them do).
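A minimal sketch of the two objects described above, reusing the simulated matrix from the earlier example (the seed is added only to make it reproducible):

```r
# Simulated data: 100 rows ("features") x 10 columns ("samples").
# runif() does not return exactly 0 here, so every entry is > 0.
set.seed(42)
dat <- matrix(runif(1000), ncol = 10)

# zz: logical vector, TRUE where a row has fewer than 10 entries > 0
zz <- rowSums(dat > 0) < 10

# z: integer positions of the TRUE entries in zz
z <- which(zz)

print(table(zz))  # all 100 values are FALSE
print(z)          # integer(0), since no row meets the criterion
```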

When you do something like

dat[-z,]

and z is a vector of numbers, say (1,3,4,5,7) you are really doing

dat[c(-1,-3,-4,-5,-7),]

As an example:

> z <- c(1,3,4,5,7)
> dat<- matrix(1:10, ncol=10, nrow = 10)
> dat[-z,]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    2    2    2    2    2    2    2    2    2     2
[2,]    6    6    6    6    6    6    6    6    6     6
[3,]    8    8    8    8    8    8    8    8    8     8
[4,]    9    9    9    9    9    9    9    9    9     9
[5,]   10   10   10   10   10   10   10   10   10    10
> dat[c(-1,-3,-4,-5,-7),]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    2    2    2    2    2    2    2    2    2     2
[2,]    6    6    6    6    6    6    6    6    6     6
[3,]    8    8    8    8    8    8    8    8    8     8
[4,]    9    9    9    9    9    9    9    9    9     9
[5,]   10   10   10   10   10   10   10   10   10    10

But if z is integer(0), then you get NOTHING back: the negation of integer(0) is still integer(0), and indexing with an empty index selects no rows at all. So you need to use zz, which is the logical representation of what you are trying to do (i.e., dat[!zz,]). It is usually safer to use a logical vector rather than a vector of row numbers for this reason.
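A sketch of the difference on the same kind of simulated data. `drop_rows_safely()` is a hypothetical helper added for illustration; it is not part of metagenomeSeq or base R. The sketch also shows that once every row is gone, every column sums to zero, which is presumably why cumNormStat() reported an empty sample on the trimmed object. The same logical-index idea should carry over to an MRexperiment (e.g. `Phylum[!zz, ]`), but that line is an untested assumption.

```r
set.seed(42)
dat <- matrix(runif(1000), ncol = 10)

zz <- rowSums(dat > 0) < 10  # logical: all FALSE here
z  <- which(zz)              # integer(0)

# Negative integer indexing with an empty index drops EVERY row,
# because -integer(0) is still integer(0), i.e. "select nothing".
bad <- dat[-z, , drop = FALSE]
print(dim(bad))              # 0 rows, 10 columns

# With no rows left, every column (sample) sums to 0: all samples are "empty".
print(colSums(bad))

# Logical indexing keeps rows where the criterion is FALSE,
# so it correctly keeps all rows when nothing matches.
good <- dat[!zz, , drop = FALSE]
print(dim(good))             # 100 rows, 10 columns

# Hypothetical helper: if you prefer integer positions,
# guard the empty case explicitly before negating.
drop_rows_safely <- function(m, rows) {
  if (length(rows) == 0) return(m)
  m[-rows, , drop = FALSE]
}
print(dim(drop_rows_safely(dat, z)))  # 100 rows, 10 columns
```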

Rather than belabor the point further, as this is really just basic R stuff, I highly recommend you read 'An Introduction to R', which you can find at r-project.org. If you are having problems with basic subsetting of things, then you need to get the basics down first.

The second point I am trying to make is this: you already know that your filtering criterion will not filter out any rows, but you go ahead and apply the subsetting anyway. So you are doing something that you expect will do nothing. I suppose in a pedantic sense it is useful to check that a filtering criterion that isn't supposed to filter anything really doesn't, but from a 'trying to get things done' perspective it isn't a useful exercise.
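Following that second point, here is a quick way to see how many features a criterion would remove before subsetting anything. This is a sketch on a simulated count matrix: `counts` stands in for `MRcounts(Phylum)`, and the Poisson parameter is arbitrary. Note that `rowSums(counts > 0)` counts the samples in which a feature is present, not the feature's total count.

```r
set.seed(1)
# Simulated counts: 31 features x 87 samples, mirroring the question's dimensions
counts <- matrix(rpois(31 * 87, lambda = 5), nrow = 31, ncol = 87)

# Number of samples in which each feature was observed (count > 0),
# NOT the total count for that feature
n_present <- rowSums(counts > 0)

# Logical filter: features observed in fewer than 10 samples
sparse <- n_present < 10
n_removed <- sum(sparse)
cat("Features that would be removed:", n_removed, "of", nrow(counts), "\n")

# Logical indexing is safe even when nothing is removed
trimmed <- counts[!sparse, , drop = FALSE]
```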

I got it. Thank you so much for your explanation.

Are you also familiar with metagenomeSeq? I also have a question about subsample normalization. I have 5 treatments, and under each treatment I have different matrices. If I only want to compare matrices within one treatment, should I renormalize my data for that treatment alone (take out the original data for that treatment and normalize it again), or should I just take out the normalized data for that treatment and create a new MRexperiment?

Thanks


Thanks, James, for the thorough explanation of subsetting.

Yang, you should normalize your whole sequencing experiment together and work with that.
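A minimal base-R sketch of "normalize everything together, then subset". The per-sample scaling below is a hypothetical stand-in for metagenomeSeq's cumNorm() and MRcounts(norm = TRUE); it is not cumulative sum scaling itself. The treatment labels are also made up for illustration.

```r
set.seed(7)
counts <- matrix(rpois(20 * 12, lambda = 20), nrow = 20, ncol = 12)
treatment <- rep(c("A", "B", "C"), each = 4)  # hypothetical sample labels

# Hypothetical stand-in normalization: scale each sample (column) by its
# total count, times 1000. Computed ONCE on the full experiment.
scale_factors <- colSums(counts) / 1000
norm_counts <- sweep(counts, 2, scale_factors, "/")

# Then subset the already-normalized data for one treatment
norm_A <- norm_counts[, treatment == "A", drop = FALSE]
print(dim(norm_A))  # 20 features, 4 samples
```

With an actual MRexperiment, the analogous steps would presumably be cumNorm() on the full object followed by subsetting its samples; that variant is not run here.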

Great! So for any pairwise comparison, I just need to subset my normalized data and compare the subsets.

Thank you so much.