TCGA data analysis
lily ▴ 20
@lily-11438
India

I have RSEM-normalized, log2-transformed data downloaded from Firehose, and I found that a number of values are missing and filled in as NA. However, when I checked the raw counts for the same datasets, the corresponding values were 0. So, for downstream analysis, can I convert all the NAs to 0? Please guide me.

RSEM • 452 views
@kevin
Republic of Ireland

I would check the accompanying notes to see exactly what post-processing has been performed on these by the Broad Institute. It would seem likely, based on the information that you provide, that they decided to convert values of 0 to NA to avoid producing a 'negative infinity' (log2(0) == -Inf).
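To make the suspected conversion concrete, here is a minimal Python sketch. It assumes the hypothesis above, not the Broad Institute's documented pipeline; the helper `log2_or_na` and the input values are made up for illustration, and `None` stands in for NA.

```python
import math

def log2_or_na(value):
    """Return log2(value), or None (standing in for NA) when value is 0."""
    if value == 0:
        return None  # log2(0) is -Inf, so mark the entry as missing instead
    return math.log2(value)

# Made-up normalized values on the original (non-log) scale.
normalized = [0.0, 1.0, 8.0, 1024.0]
logged = [log2_or_na(v) for v in normalized]

# Filling NA with 0 on the log2 scale is the same as assuming a value of 1
# on the original scale, because 2 ** 0 == 1.
filled = [0.0 if v is None else v for v in logged]
back_transformed = [2 ** v for v in filled]
```

Under this assumption, replacing NA with 0 makes an original value of 0 indistinguishable from an original value of 1, which is worth keeping in mind before filling the NAs.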

However, if you already have raw counts, why not use those? They can easily be used with edgeR or DESeq2.

TCGA raw HTSeq counts are also held at UCSC's Xena Browser.

Kevin


Thank you for the response. I took the normalised data so that I could proceed directly with feature selection and a machine learning approach. The problem is that there are so many missing values that I am not able to discriminate the two classes with good accuracy, sensitivity and specificity. I also tried mean imputation, and with it I got very high accuracy. So please advise: should I take the raw count data and perform the pre-processing steps myself?


So please advise: should I take the raw count data and perform the pre-processing steps myself?

You could try it, if you have time, and then come back with the answer if possible. There will likely be a difference between the RSEM values and the values produced by a standard edgeR or DESeq2 normalisation + transformation.
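As a rough idea of what "normalisation + transformation" of raw counts involves, here is a simplified Python sketch of log2 counts-per-million with a pseudocount. It is a stand-in for illustration only, not what edgeR or DESeq2 actually compute (those use TMM and median-of-ratios scaling respectively, with their own transformations); the function `log2_cpm`, the `pseudocount` parameter, and the sample and gene names are all hypothetical.

```python
import math

def log2_cpm(counts_by_sample, pseudocount=1.0):
    """Scale each sample to counts per million, then take log2(CPM + pseudocount).

    counts_by_sample maps sample name -> {gene name -> raw count}.
    Assumes every sample has a non-zero total count.
    """
    result = {}
    for sample, counts in counts_by_sample.items():
        library_size = sum(counts.values())
        result[sample] = {
            gene: math.log2(count / library_size * 1e6 + pseudocount)
            for gene, count in counts.items()
        }
    return result

# Made-up raw counts for two samples with very different sequencing depth.
raw = {
    "sample_A": {"GENE1": 0, "GENE2": 500_000, "GENE3": 500_000},
    "sample_B": {"GENE1": 10, "GENE2": 90, "GENE3": 0},
}
transformed = log2_cpm(raw)
```

Because the pseudocount is added before the log, a raw count of 0 maps to a finite value (0.0 here) rather than -Inf, so no NAs are introduced and no imputation is needed.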


Let me try.