Hi, BioC community:
I am relatively new to microarray analysis and so far know only a few of the important packages, such as affy. I have familiarized myself with the basic workflow of microarray analysis, such as background correction and normalization. I have a gene-level expression data matrix that was obtained using RMA, and I intend to run PCA on it to reduce the dimensionality of the features.
Specifically, I have a gene-level expression data matrix (32830 features (genes) in rows, 735 samples in columns) and profile data for the target (735 rows and 6 columns). I used data from this source.
After going through a few microarray analysis tutorials on Bioconductor, I tried a basic workflow as follows:
    # load the gene expression data matrix
    # (load() returns only the names of the restored objects; the matrix itself
    # is assumed to be restored under the name eset_HTA20, which is used below)
    HTA20_rma <- load("data/HTA20_RMA.RData")

    # load the sample annotation file (profile data of the target variable)
    pheno <- read.csv("data/anoSC1_v11_nokey.csv", stringsAsFactors = FALSE)

    # select 3 genes of interest
    threesymbs <- c("ANXA1", "IFIT1", "RPS24")

    # get gene symbols for the rows of the gene-level expression matrix
    library(org.Hs.eg.db)
    symbol <- as.vector(unlist(mget(gsub("_at", "", rownames(eset_HTA20)),
                                    envir = org.Hs.egSYMBOL, ifnotfound = NA)))

    # find the row names corresponding to the 3 genes
    mypreds <- rownames(eset_HTA20)[match(threesymbs, symbol)]
I am not sure what the correct procedure is after the workflow above, so I am seeking guidance.
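To make the question concrete, here is a minimal sketch of the kind of continuation I have in mind. It runs on simulated data rather than my real files, so everything in it is an assumption: `expr` stands in for eset_HTA20 (genes in rows, samples in columns), `target` stands in for one numeric column of the sample annotation, and the cut-off of 500 genes and the use of Spearman correlation are arbitrary choices, not an established workflow.

```r
set.seed(1)

# Simulated stand-ins (assumptions, not the real data)
n_genes   <- 2000
n_samples <- 60
expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("sample", seq_len(n_samples))))
target <- rnorm(n_samples)

# Make one gene track the target so the sketch has something to find
expr["gene1", ] <- 3 * target + rnorm(n_samples, sd = 0.1)

# 1) Unsupervised filter: keep the most variable genes
gene_var <- apply(expr, 1, var)
top_n    <- 500   # arbitrary cut-off
keep     <- order(gene_var, decreasing = TRUE)[seq_len(top_n)]
expr_top <- expr[keep, ]

# 2) PCA: prcomp() expects samples in rows, so the matrix is transposed
pca           <- prcomp(t(expr_top), center = TRUE, scale. = TRUE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
scores        <- pca$x[, 1:5]   # first 5 principal components per sample

# 3) Rank the filtered genes by absolute correlation with the target
gene_cor <- apply(expr_top, 1,
                  function(g) cor(g, target, method = "spearman"))
ranked   <- sort(abs(gene_cor), decreasing = TRUE)
head(ranked)
```

With the real data, `expr` would be replaced by the loaded matrix and `target` by a column of `pheno`, after checking that the columns of the matrix and the rows of `pheno` are in the same sample order.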
I want to find out which genes are correlated with the target profile data. How can I find the genes whose expression changes? How can I perform feature selection on the loaded gene expression data matrix? What would be a logical continuation of my attempt above? Could anyone point out how to conduct feature selection and PCA on a gene-level expression data matrix? Thanks in advance.