Affycoretools package, HTA 2.0 annotation for non-coding RNA
1
0
Entering edit mode
@marco_aurelio-23595
Last seen 16 months ago
Spain

Hello everybody. I'm newbie in this field. I have recently run the HTA 2.0 annotation using the two packages affycoretools and as outlined in the post https://support.bioconductor.org/p/89308/.

files=list.celfiles(dir,full.names = TRUE)

#Normalization

#Save data
write.exprs(norm_data,file="./BaselineVsControls/Data_normalized.txt")

med<-new("ExpressionSet", exprs=as.matrix(data))

main <- getMainProbes("pd.hta.2.0")[replace(getMainProbes("pd.hta.2.0")$type==1,is.na(getMainProbes("pd.hta.2.0")$type==1),FALSE),]
#Annotate
eset.main <- annotateEset(med, pd.hta.2.0)
eset.main <- eset.main[main$transcript_cluster_id,] eset.main.med <- rowMedians(exprs(eset.main))  Everything runs perfectly, however now I want to export only those that are non-coding. Using Thermo TAC (Transcriptome Analyse Control) program, there is a "non-coding gene" column in the final list. However, I couldn't find a way to do that in Bioconductor. Do you have any idea? Thank you in advance. affy rna hta2.0 r • 514 views ADD COMMENT 2 Entering edit mode @james-w-macdonald-5106 Last seen 8 hours ago United States There isn't a function per se, but you can do it easily enough. > library(pd.hta.2.0) > load(system.file("/extdata/netaffxTranscript.rda", package = "pd.hta.2.0")) > annot <- pData("netaffxTranscript") ## should be OK, but make sure > annot <- annot[featureNames(eset.main),] > eset.noncoding <- eset.main[annot$locustype %in% "NonCoding",]


0
Entering edit mode

OR, you could just add it to the fData slot

fdat <- fData(eset.main)
fdat$LOCUSTYPE <- annot$locustype
fData(eset.main) <- fdat

0
Entering edit mode

When I run

annot <- pData("netaffxTranscript")


I got an error:

Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'pData' for signature '"character"'

Do u know how to fix this?

0
Entering edit mode

Yes. Don't do what I said, but instead what I meant.

annot <- pData(netaffxTranscript)



And for your personal edification, you could have figured that out on your own. You got an error saying some blahblahblah about the pData function. At which point you should think 'Huh, what does the help page say about that function? This guy MacDonald seems to have set me wrong.'

And ?pData would present

Retrieve information on experimental phenotypes recorded in eSet and
ExpressionSet-derived classes.

Description:

These generic functions access the phenotypic data (e.g.,
covariates) and meta-data (e.g., descriptions of covariates)
associated with an experiment.

Usage:

phenoData(object)
phenoData(object) <- value
varLabels(object)
varLabels(object) <- value
pData(object)
pData(object) <- value

Arguments:

object: Object, possibly derived from 'eSet-class' or
'AnnotatedDataFrame'.

value: Value to be assigned to corresponding object.



The critical part being where it says that the object has to be either some sort of eSet-class or an AnnotatedDataFrame. And if you then did

> class("netaffxTranscript")
[1] "character"
## and then
> class(netaffxTranscript)
[1] "AnnotatedDataFrame"
attr(,"package")
[1] "Biobase"



You would then know that you need to pass the netaffxTranscript object, rather than the character string "netaffxTranscript".

0
Entering edit mode

Thank you so much for your reply and I apologize. I didn't know about the existence of netaffxTranscript, since I didn't find any efficient guide to analyze HTA 2.0 arrays and it's my first time using R and Bioconductor.

0
Entering edit mode

Hello James. One question. I promise you that I have tried to look for it on the internet and on this forum, but I have found nothing.

When I use my code to do the annotation of the main probes, I do this:

#Read my data

#Create expressionset and annotation
med<-new("ExpressionSet", exprs=as.matrix(data_BvsP))

main <- getMainProbes("pd.hta.2.0")[replace(getMainProbes("pd.hta.2.0")$type==1,is.na(getMainProbes("pd.hta.2.0")$type==1),FALSE),]
eset.main <- annotateEset(med, pd.hta.2.0)
eset.main <- eset.main[main$transcript_cluster_id,] #Create expresion Data annotated data_BvsP <- eset.main@assayData$exprs
data_BvsP <- cbind(data_BvsP,eset.main@featureData@data)

#Remove NA from Symbol
data_BvsP<- data_BvsP[!is.na(data_BvsP$SYMBOL),] #Remove duplicated Genename data_BvsP <- data_BvsP[which(!duplicated(data_BvsP$GENENAME)),]

#Write results
write.table(data_BvsP,file="./BaselineVsProgression/CleanData.txt")


When I do this, I have a lot of "NA" in Gene Symbol column. However, when I use your code, I could see that the dataframe that creates the pData function, the mrnaAsigment column, most of the column does not contain "NA"

Do you know why? And what is your recommendation?

Thank you very much in advance.

0
Entering edit mode

My recommendation is to not do what you are doing.

Why are you writing the data to disk, and then having to recreate the ExpressionSet by hand? That makes no sense to me. You are apparently going through all the motions to generate a HTAFeatureSet, then throwing all that out by writing the results to a text file. The whole rationale for using Bioconductor tools in the first place is to have these nice data containers that hold all your data, that you can operate on as if they were simple data.frame objects.

There is nothing you can do with a text file that you couldn't have done with the original HTAFeatureSet object (well, except for reading into Excel...) and there are any number of things that you could do with the HTAFeatureSet that you can't do with the text file.

Anyway, this isn't a site intended for people to get their code checked by an expert (or god forbid, me), but instead is intended to provide help in using the existing functionality in Bioconductor. Any code that you write is your responsibility.

I personally just write the code to read in the CEL files and annotate the resulting object, and if I need to re-run, I re-run. Although you could also use BiocFileCache, or alternative do my usual bootleg version and just drop .Rdata or .RDS files for the big things that take time to generate and wrap in an if statement:

if(!file.exists("wowthattookalongtime.Rdata")) {
eset <- rma(dat)
eset <- annotateEset(eset, pd.hta.2.0)
annot <- pData("netaffxTranscript")
annot <- annot[featureNames(eset),]
fdat <- fData(eset)
fdat$LOCUSTYPE <- annot$locustype
fData(eset) <- fdat
save(eset, file = "wowthattookalongtime.Rdata")
} else {
}