Question

limma for no replicate array

0

Entering edit mode

linouhao ▴ 20

@linouhao-15901

Last seen 13 months ago

United States

hi, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE2058, this is a two sample array data, can limma used for this? thanks a lot

limma • 887 views

ADD COMMENT • link updated 3.7 years ago by James W. MacDonald 65k • written 3.7 years ago by linouhao ▴ 20

Gordon Smyth · Answer 1 · 2020-09-09

1

Entering edit mode

James W. MacDonald 65k

@james-w-macdonald-5106

Last seen 8 hours ago

United States

To perform inferential comparisons? No. If there are no replicates, all you can get are log fold changes which are super unreliable. This has nothing to do with limma, of course, but is just a statistical fact.

You can use pretty much anything to get the log fold changes, including just downloading the series matrix file and using Excel.

ADD COMMENT • link 3.7 years ago James W. MacDonald 65k

0

Entering edit mode

do you mean just use B/A to get the fold change? thanks a lot

ADD REPLY • link 3.7 years ago linouhao ▴ 20

1

Entering edit mode

Well, the matrix data are already logged, so it would be subtraction, not division. And you should probably use GEOquery to make your life easier. And maybe limma makes things easier, now that I think about it:

> library(GEOquery)
> z <- getGEO("GSE2058")[[1]]
Found 1 file(s)
GSE2058_series_matrix.txt.gz
Using locally cached version: C:\Users\Public\Documents\Wondershare\CreatorTemp\RtmpA9zrp0/GSE2058_series_matrix.txt.gz
Parsed with column specification:
cols(
  ID_REF = col_double(),
  GSM36719 = col_double(),
  GSM93732 = col_double()
)
Using locally cached version of GPL1741 found here:
C:\Users\Public\Documents\Wondershare\CreatorTemp\RtmpA9zrp0/GPL1741.soft 
|=========================================================================| 100%

> head(exprs(z))
     GSM36719    GSM93732
1 -0.18302237 -0.23091036
2 -0.03665611 -0.11760428
3 -0.09913147 -0.17198248
4 -0.00415596 -0.15809532
5 -0.25742658 -0.05848743
6  0.06569681  0.10791701

> design <- matrix(c(1,1,0,1), ncol = 2)
> fit <- lmFit(z, design)
> head(fit$coef)
           x1          x2
1 -0.18302237 -0.04788799
2 -0.03665611 -0.08094817
3 -0.09913147 -0.07285101
4 -0.00415596 -0.15393936
5 -0.25742658  0.19893915
6  0.06569681  0.04222020

> o <- order(abs(fit$coef[,2]), decreasing = TRUE)
> zz <- data.frame(fit$genes[o,], logFC = fit$coef[o,2])
> head(zz, 10)
       ID    GB_ACC SPOT_ID
7972 7972  AL137648        
5603 5603 NM_001147        
7740 7740  BF033678        
7578 7578  AU121991        
599   599 NM_002371        
4884 4884    M73554        
5598 5598 NM_001935        
7608 7608  BF184283        
7843 7843  AW404507        
469   469 NM_014740        
                                                                   GENE_NAME
7972                                                   DKFZp434J1813 protein
5603                                                          angiopoietin 2
7740                       interferon induced transmembrane protein 3 (1-8U)
7578                                upregulated by 1,25-dihydroxyvitamin D-3
599                                      mal, T-cell differentiation protein
4884                           cyclin D1 (PRAD1: parathyroid adenomatosis 1)
5598 dipeptidylpeptidase IV (CD26, adenosine deaminase complexing protein 2)
7608                       interferon induced transmembrane protein 1 (9-27)
7843                                           immunoglobulin kappa constant
469                                                    KIAA0111 gene product
        Locus CloneID      logFC
7972          3088609 -1.3606571
5603   8p23.1 2174441  1.2374534
7740          2949427  1.0654002
7578        1 2888464  1.0485764
599  2cen-q13  504786  1.0248880
4884    11q13 1994739 -1.0098985
5598   2q24.3 2171743 -1.0071710
7608          2902903  0.9837436
7843     2p12 2996770 -0.9635938
469            417827  0.8816404

And the logFC is Drug sensitive - 3 chemotherapy sensitive:

> pData(z)[,8]
[1] "3 chemotherapy sensitive ovarian cancers"
[2] "Drug-sensitive ovarian cancer cell line "

ADD REPLY • link 3.7 years ago James W. MacDonald 65k

1

Entering edit mode

Beware that these are two-color arrays, and the comparisons of interest have already been made between the two channels on each individual array. The processed expression values from each array are already the logFCs of interest. My understanding of the experiment is that comparing the two arrays is not of interest because the samples on the two arrays aren't directly comparable.

ADD REPLY • link 3.7 years ago Gordon Smyth 50k

0

Entering edit mode

thanks a lot, after read your instructions, I read the GSM information carefully again. yes, one GSM36719 is about

Sample GSM36719     Query DataSets for GSM36719
Status  Public on Dec 11, 2005
Title   Chemosensitive and chemoresistant ovarian cancers RNA profiling
Sample type RNA

the other GSM93732

Sample GSM93732     Query DataSets for GSM93732
Status  Public on Jan 26, 2006
Title   Expression profiling of ovarain cancer cell lines
Sample type RNA

Channel 1
Source name Drug-sensitive ovarian cancer cell line
Organism    Homo sapiens
Characteristics A drug sensitive ovarian cancer cell line.
Extracted molecule  total RNA
Label   Cy5

Channel 2
Source name Drug-resistant ovarian cancer cell line
Organism    Homo sapiens
Characteristics A multi-drug resistant ovarian cancer cell line.
Extracted molecule  total RNA
Label   Cy3
Channel 1
Source name 3 chemotherapy sensitive ovarian cancers
Organism    Homo sapiens
Extracted molecule  total RNA

Channel 2
Source name 3 chemotherapy resistant ovarian cancers
Organism    Homo sapiens
Extracted molecule  total RNA

it is really not comparable directly.

so if I want to find some genes if senstive to the drug, for example PSM2， I need just to map the ID(which is just numer ) to gene symbol according to https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?view=data&acc=GPL1741&id=10367&db=GeoDb_blob03, due to the early year, the gene annotation is not that clear,

and I can get two independent results from the 2 GSM, am I right? you also mentioned it is two color array, so can I trust the logtc directly?

thanks alot

ADD REPLY • link updated 3.7 years ago by Gordon Smyth 50k • written 3.7 years ago by linouhao ▴ 20

0

Entering edit mode

you are reaaly a very kind and helpful people, thanks a lot. due to my error description, the geo data seems not comparable directly as Gordon Smyth describe. how do you think of the annotation if you want to find the genes of interest?

ADD REPLY • link 3.7 years ago linouhao ▴ 20

0

Entering edit mode

you are reaaly a very kind and helpful people, thanks a lot. due to my error description, the geo data seems not comparable directly as Gordon Smyth describe. how do you think of the annotation if you want to find the genes of interest?

ADD REPLY • link 3.7 years ago linouhao ▴ 20