9 FCS-GSEA analysis

9.1 FCS method introduction

ORA method is easy to conduct, but it will lose information when genes’ differences are slight. For example, if a gene is essential in some pathway but is low-expressed, it will be filtered by fold change cutoff.

Unlike ORA, FCS tools do not set a threshold for differentially expressed genes. Instead, it gives each detected gene a differential expression score and then evaluates whether the scores are more positive or negative than expected by chance for each gene set.

According to article “Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges”:

FCS approaches include GSEA, GlobalTest, sigPathway, SAFE, GSA, PADOG, PCOT2, FunCluster, SAM-GS, Category etc.

Next, we will go through the popular Gene Set Enrichment Analysis (GSEA) tool, which uses FCS permutation approaches to determine whether a gene set is significantly associated with higher or lower scores.

9.2 GSEA intruduction

To do GSEA, user must have an ordered gene rank list (e.g., genes with decreasing logFC order).

Three steps in GSEA:

  1. Calculate the enrichment score (ES): represents the amount to which the genes in the set are over-represented at either the top or bottom of the list. GSEA starts at the top of the ranked gene list to calculate the enrichment score. If a gene is a member of the candidate gene set, it adds to a running sum. Otherwise, it subtracts.

  2. Estimate the statistical significance of the ES: this calculation is done by a phenotypic-based permutation test in order to produce a null distribution for the ES.

  3. Adjust for multiple hypothesis testing for when a large number of gene sets are being analyzed at one time: the enrichment scores for each set are normalized, and a false discovery rate is calculated.

GSEA overview

Figure 9.1: GSEA overview

If you are interested in GSEA detailed procedure, please visit: https://www.pathwaycommons.org/guide/primers/data_analysis/gsea OR https://github.com/crazyhottommy/RNA-seq-analysis/blob/master/GSEA_explained.md

9.3 Basic usage

The simplest arguments are:

  • id: pre-ranked genelist with decreasing order (e.g. logFC order/correlation order). Entrez, Ensembl and Symbol are accepted
  • geneset: gene set is a two-column data frame with term id and gene id. (It’s recommended to use geneset)
  • p_cutoff: numeric of cutoff for both pvalue and adjusted pvalue, default is 0.05.
  • q_cutoff: numeric of cutoff for qvalue, default is 0.15.

9.3.1 1st step: prepare pre-ranked gene list

data(geneList, package = "genekitr")
##       948      1638    158471     10610      6947 100133941 
##  5.780170  5.633027  4.683610  3.875120  3.357670  3.322533

9.3.2 2nd step: prepare gene set

gs <- geneset::getGO(org = "human",ont = "mf")

9.3.3 3rd step: GSEA analysis

gse <- genGSEA(genelist = geneList, geneset = gs)

Now, let’s look at the result:

It is a list mainly including analysis result (gsea_df), input genelist (genelist) and input geneset (geneset).

## [1] "list"
## [1] "gsea_df"  "genelist" "geneset"  "exponent" "org"
##           ID    logfc
## 1       CD36 5.780170
## 2        DCT 5.633027
## 3     PRUNE2 4.683610
## 4 ST6GALNAC2 3.875120
## 5       TCN1 3.357670
## 6       CD24 3.322533
##           mf  gene
## 1 GO:0000009  PIGV
## 2 GO:0000009 ALG12
## 3 GO:0000010 PDSS1
## 4 GO:0000010 PDSS2
## 5 GO:0000014 ENDOG
## 6 GO:0000014 ERCC1
head(gse$gsea_df, 5)
##              Hs_MF_ID                           Description setSize
## GO:0140097 GO:0140097     catalytic activity, acting on DNA     207
## GO:0140098 GO:0140098     catalytic activity, acting on RNA     350
## GO:0008094 GO:0008094 ATP-dependent activity, acting on DNA      97
## GO:0016887 GO:0016887               ATP hydrolysis activity     422
## GO:0004386 GO:0004386                     helicase activity     147
##            enrichmentScore       NES       pvalue     p.adjust       qvalue
## GO:0140097      -0.5894504 -2.428361 7.555161e-17 8.884870e-14 7.380200e-14
## GO:0140098      -0.4747857 -2.065130 4.912469e-14 2.888532e-11 2.399353e-11
## GO:0008094      -0.6697833 -2.480446 7.069515e-13 2.771250e-10 2.301933e-10
## GO:0016887      -0.4341955 -1.929356 7.323023e-12 2.152969e-09 1.788359e-09
## GO:0004386      -0.5805801 -2.284009 1.724354e-11 3.583755e-09 2.976839e-09
##            rank                   leading_edge
## GO:0140097 1298  tags=30%, list=7%, signal=28%
## GO:0140098 2872 tags=38%, list=15%, signal=33%
## GO:0008094 1172  tags=37%, list=6%, signal=35%
## GO:0016887 2528 tags=29%, list=14%, signal=26%
## GO:0004386 1295  tags=31%, list=7%, signal=29%
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         geneID
## GO:0140097                                                                                                                                                                                                                                                                                                                                                                                                                                                                          3980/348654/1786/1105/10856/1660/7150/63922/5889/10728/57697/7156/27301/7517/10146/5932/56652/3978/54984/5810/5424/54107/8607/5422/2956/7374/11144/7153/5985/4436/8458/2237/5426/64858/5982/1763/5984/10714/1736/5111/55345/4172/84515/8091/83990/4171/4176/5983/10721/79075/55247/4174/4173/79915/4175/8438/54821/5427/641/5888/7516/9156
## GO:0140098 138428/122665/5435/60625/5976/57472/55278/10623/3396/167227/7015/91801/8731/8846/80745/80119/10557/55695/10556/79828/60528/91893/29883/55661/339175/23517/22907/80746/22894/171568/51163/11128/1653/6832/57505/55308/55794/5437/5511/79066/221078/11102/115708/85463/51010/114049/112479/5438/54512/201626/9836/25885/23016/124454/57062/55798/221830/96764/9533/117246/10799/29063/10940/79039/79731/27292/55280/661/54913/51651/9879/55687/55621/57696/56339/29960/55157/10849/56931/246243/54931/64794/57647/1660/9775/23210/54606/2193/54148/57697/81875/9704/5433/27161/56919/1662/24140/79691/10146/54888/10621/10212/114034/28960/92935/64216/10171/4839/64425/4234/56915/10436/90459/25819/80324/10785/10622/8886/1665/51728/10535/11218/10056/84172/87178/10248/23404/2237/5393/9188/51106/5557/83990/9156
## GO:0008094                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      1105/10856/1660/63922/5889/57697/7517/10146/56652/8607/2956/11144/7153/5985/4436/8458/5982/1763/5984/4172/84515/83990/4171/4176/5983/10721/79075/4174/4173/79915/4175/8438/54821/641/5888/7516
## GO:0016887                                                                                                                        22880/3327/80119/10131/730211/23347/55661/23517/3313/22907/3308/55636/3799/64222/1653/6832/5704/9126/55308/55794/391634/23400/154664/3312/81570/5700/338322/5706/4643/57062/4292/5981/6782/3320/10694/23195/79039/547/8243/3329/5702/3324/55210/9879/6891/6950/57696/10575/1105/10856/3309/7203/64794/57647/1660/908/9775/22824/63922/54606/57697/9704/56919/1662/3800/23046/2181/10146/10576/10212/2182/10051/56652/1836/10112/10592/3323/51182/490/8607/8886/1665/11218/22948/5985/81930/3835/4436/10808/80179/9188/166378/3832/11004/9928/5982/1763/9585/5984/3833/63979/146909/4172/9493/84515/83990/56992/4171/4176/5983/4174/4173/79915/4175/9319/91607/8438/29028/4998/54821/641/5888
## GO:0004386                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  9879/57696/1105/10856/64794/57647/1660/9775/63922/54606/57697/9704/56919/1662/10146/10212/56652/8607/8886/1665/11218/5985/8458/9188/5982/1763/5984/55345/4172/84515/83990/4171/4176/5983/10721/79075/4174/4173/4175/3070/91607/8438/54821/641/5888
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      geneID_symbol
## GO:0140097                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              LIG3/GEN1/DNMT1/CHD1/RUVBL2/DHX9/TOP1/CHTF18/RAD51C/PTGES3/FANCM/TOP3A/APEX2/XRCC3/G3BP1/RBBP8/TWNK/LIG1/PINX1/RAD1/POLD1/POLE3/RUVBL1/POLA1/MSH6/UNG/DMC1/TOP2A/RFC5/MSH2/TTF2/FEN1/POLE/DCLRE1B/RFC2/DNA2/RFC4/POLD3/DKC1/PCNA/ZGRF1/MCM3/MCM8/HMGA2/BRIP1/MCM2/MCM7/RFC3/POLQ/DSCC1/NEIL3/MCM5/MCM4/ATAD5/MCM6/RAD54L/ERCC6L/POLE2/BLM/RAD51/XRCC2/EXO1
## GO:0008094                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 CHD1/RUVBL2/DHX9/CHTF18/RAD51C/FANCM/XRCC3/G3BP1/TWNK/RUVBL1/MSH6/DMC1/TOP2A/RFC5/MSH2/TTF2/RFC2/DNA2/RFC4/MCM3/MCM8/BRIP1/MCM2/MCM7/RFC3/POLQ/DSCC1/MCM5/MCM4/ATAD5/MCM6/RAD54L/ERCC6L/BLM/RAD51/XRCC2
## GO:0004386                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      DDX46/DDX55/CHD1/RUVBL2/DDX31/DHX37/DHX9/EIF4A3/CHTF18/DDX56/FANCM/DHX34/DHX33/DDX10/G3BP1/DDX39A/TWNK/RUVBL1/DDX18/DHX15/DDX20/RFC5/TTF2/DDX21/RFC2/DNA2/RFC4/ZGRF1/MCM3/MCM8/BRIP1/MCM2/MCM7/RFC3/POLQ/DSCC1/MCM5/MCM4/MCM6/HELLS/SLFN11/RAD54L/ERCC6L/BLM/RAD51
##            Count
## GO:0140097    62
## GO:0140098   134
## GO:0008094    36
## GO:0016887   122
## GO:0004386    45

About the gsea_df result

  • Description: Gene set name

  • setSize: number of genes with gene-level statistic values. To be more specific, if we input a pathway gene set with 58 genes (e.g. HALLMARK_MYC_TARGETS_V2 ), while our gene list only have 54 of it, so the result will only show setSize = 54

  • enrichmentScore: also called ES, same as in Broad GSEA implementation. It reflects the degree to which a gene set is overrepresented at the top or bottom of a ranked list of genes.

  • NES: normalized enrichment score is the primary statistic for examining gene set enrichment results. By normalizing the enrichment score, GSEA accounts for differences in gene set size and in correlations between gene sets and the expression dataset. Therefore, NES can be used to compare analysis results across gene sets. A positive normalized enrichment scores (NES) will indicate that genes in set S will be mostly represented at the top of your list (logFC > 0 or up-regulated genes).

  • rank: The position in the ranked list at which the maximum enrichment score occurred. If gene sets achieve the maximum enrichment score near the top or bottom of the ranked list, the rank at max is either very small or very large.

  • leading_edge: includes three statistics
    • tags: The percentage of gene hits before (for positive ES) or after (for negative ES) the peak in the running enrichment score, which indicates the percentage of genes contributing to the enrichment score.
    • list: The percentage of genes in the ranked gene list before (for positive ES) or after (for negative ES) the peak in the running enrichment score. This gives an indication of where in the list the enrichment score is attained.
    • signal: If the gene set is entirely within the first N positions in the list, then the signal strength is maximal or 100%. If the gene set is spread throughout the list, then the signal strength decreases towards 0%.
  • geneID & geneID_symbol: If input symbols are mixed with aliases, a new column “geneID_symbol” will be added; if all symbols are official, only “geneID” column will return

9.4 Advanced usage

9.4.1 Additional arguments

Please refer to ORA part

9.4.2 Simplify GO GSEA result

Please refer to ORA part

9.5 Export GSEA result

Genekitr provides an easy way to export analysis result for editing and sharing.

Multiple data are saved as different sheets in one excel file. Besides, column names are automatically formatted with bold style.

genekitr::expoSheet(data_list = gse,
                    data_name = names(gse),
                    filename = "gsea_result.xlsx",
                    dir = "./")
Export result

Figure 9.2: Export result