mexpress

a methylation & expression visualization tool

What?

MEXPRESS is a tool designed for easily visualizing TCGA expression, DNA methylation and clinical data, as well as the relationships between them. You can find all the details in our publication.

How?

Easy! Just pick a gene and cancer type you are interested in and MEXPRESS will show you the corresponding expression, DNA methylation and clinical data, as well as the relationship between these data sets.

Getting started

This is the MEXPRESS start screen:

This is the MEXPRESS start screen.

Selecting data

On the left, you will find a text field where you can enter a gene name (both HGNC symbols and Ensembl gene IDs are recognized), as well as the list of cancer types you can choose from.

Let's say you are interested in the expression and methylation data of the GSTP1 gene in prostate cancer. Simply enter GSTP1 in the text field and select prostate adenocarcinoma (PRAD).

MEXPRESS is now ready to plot.

Creating the plot

After selecting a gene and a cancer type, all that is left to do is click the plot button to visualize the selected data. Samples are arranged from left to right, while the different data types (clinical, expression and methylation) are arranged from the top to the bottom of the plot.

Hovering over one of the methylation line plots will bring up the ID of the corresponding probe. Click on a methylation line plot to fix the probe ID on the figure. Click it again to remove the probe ID (or click another methylation line plot). You can also highlight the promoter probes by clicking the button right above the legend.

If you would like to download the figure you just created, simply click the png or svg button in the upper right corner (depending on which image format you want).

This is the final MEXPRESS plot.

If you would like to emphasize the probes located in a gene's promoter region, click the highlight promoter probes button. This turns on the highlighting of the promoter probes, as you can see in the image below.

Highlighting the promoter probes.

About the plot

At the top of the figure, you will find the HGNC gene symbol, the Ensembl gene ID and the genomic location of GSTP1. The gene symbol link opens the GeneCards page for GSTP1 in a new tab or window, whereas the genomic location link opens the UCSC Genome Browser at this location in a new tab or window.

Next is the plot legend, which adapts to the cancer type you selected. Below the legend is the actual plot: from top to bottom, you will see the clinical data, the expression values and the methylation data. Let's take a detailed look at each part.

expression data

A closer look at the expression data.

The height of the orange line represents the logarithm of the level 3 RNA-sequencing data in TCGA (normalized RNASeqV2 values per gene). The expression data form the basis of the whole plot, because the samples are ranked by their expression value for the selected gene, with the highest expression on the left and the lowest on the right.
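To make the ranking concrete, here is a minimal Python sketch. The sample IDs and values are invented, and the log2 transform with a pseudocount of 1 is an assumption; the text above only says that a logarithm is taken. Because the logarithm is monotonic, ranking on raw or log-transformed values gives the same order.

```python
import math

# Invented per-sample normalized expression values for one gene (sample ID -> value).
expression = {"S1": 120.5, "S2": 3400.0, "S3": 0.0, "S4": 875.2}

def log_expression(value, pseudocount=1.0):
    """Log-transform a normalized expression value.

    ASSUMPTION: base 2 and a pseudocount of 1 (to avoid log(0)); the exact
    transformation MEXPRESS applies may differ.
    """
    return math.log2(value + pseudocount)

def rank_by_expression(expr):
    """Return sample IDs ordered from highest to lowest expression (left to right)."""
    return sorted(expr, key=expr.get, reverse=True)

order = rank_by_expression(expression)
heights = [log_expression(expression[s]) for s in order]  # bar heights in plot order
print(order)
print([round(h, 2) for h in heights])
```

The same `order` list would then be reused for every other row of the plot, which is what keeps samples aligned vertically.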

methylation data

A closer look at the methylation data.

On the left-hand side, you see the GSTP1 gene, two CpG islands (in green) and the different GSTP1 transcripts. The arrow on the gene indicates its orientation. When the arrow points down, as in this example, the gene is located on the + strand. If it points up, the gene lies on the - strand.

On the right-hand side, you see all the Infinium 450k probes that are linked to GSTP1. The height of the blue lines indicates a probe's beta value in each sample. When no data are available for a certain probe, no line is plotted and the plot simply says "no data" instead. Gaps in a line indicate that methylation data were missing for one or more samples.
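For readers new to beta values, the sketch below shows the standard Illumina definition: the methylated signal divided by the total signal plus an offset (100 by default), with negative intensities clipped to zero. It is an illustration only. TCGA level 3 methylation files already contain beta values, so MEXPRESS presumably does not recompute them. A beta value near 0 means the CpG is largely unmethylated in that sample; a value near 1 means it is largely methylated.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Illumina-style beta value: M / (M + U + offset).

    Negative intensities are clipped to 0 before the ratio is taken;
    the offset keeps the ratio stable when both signals are very low.
    """
    m = max(methylated, 0.0)
    u = max(unmethylated, 0.0)
    return m / (m + u + offset)

# Invented intensities for a heavily methylated and an unmethylated CpG.
print(beta_value(9000, 900))
print(beta_value(-5, 300))
```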

Just as for the expression data, the samples are ranked along the x axis according to their GSTP1 expression value. Thin blue lines connect the probes to their respective genomic locations. Hovering your mouse cursor over a methylation data plot highlights the corresponding probe on the left-hand side and shows the probe's name. You can fix the highlighting of a probe by clicking on its data plot; clicking the same data plot a second time clears the highlighting.
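The sketch below illustrates how the sample order and the gaps fit together: every probe reuses the sample order derived from expression, and a sample without a methylation value splits the line into separate segments. The function names and data are hypothetical and not part of MEXPRESS.

```python
from typing import Dict, List, Optional, Tuple

def align_to_order(values: Dict[str, float], order: List[str]) -> List[Optional[float]]:
    """Return one beta value per sample in plot order; None marks a missing sample."""
    return [values.get(sample) for sample in order]

def line_segments(points: List[Optional[float]]) -> List[List[Tuple[int, float]]]:
    """Split a series at missing values; each run would be drawn as its own line."""
    runs: List[List[Tuple[int, float]]] = []
    current: List[Tuple[int, float]] = []
    for x, y in enumerate(points):
        if y is None:  # a missing sample ends the current segment (a gap)
            if current:
                runs.append(current)
                current = []
        else:
            current.append((x, y))
    if current:
        runs.append(current)
    return runs

# Sample order taken from the expression ranking (highest expression first).
order = ["S2", "S4", "S1", "S3"]
probe_a = {"S2": 0.12, "S4": 0.35, "S3": 0.81}  # S1 has no methylation value
probe_b: Dict[str, float] = {}                  # no data at all for this probe

for name, probe in [("probe_a", probe_a), ("probe_b", probe_b)]:
    segments = line_segments(align_to_order(probe, order))
    print(name, segments if segments else "no data")
```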

Highlighting a methylation probe.

Once you have fixed a probe's highlighting by clicking on the data plot, you can click on the probe's name to reveal the probe's genomic location and annotation.

Revealing a probe's annotation.

The values on the far right represent the Pearson product-moment correlation coefficient between a probe's methylation values and the expression values. For GSTP1, quite a few probes show a strong negative correlation between methylation and expression, indicating that GSTP1 expression might be controlled through DNA methylation, as has already been described (Millar et al., 1999). As the plot's legend explains, the asterisks indicate the significance of the correlations.

clinical data

A closer look at the clinical data.

For every cancer type, the most relevant clinical parameters have been extracted from TCGA. To represent all the data as bar plots, some clinical parameters have been converted to numeric values. One example is the pathologic stage (not shown in this example), for which values such as Stage IIA and Stage IV were converted to 2 and 4, respectively.
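Here is a minimal sketch of the stage conversion described above, assuming labels follow a "Stage <Roman numeral><optional letter>" pattern. Sub-stages are collapsed into their main stage, and any other label (for example "[Not Available]") becomes missing. The exact label set in TCGA and MEXPRESS's own mapping may differ.

```python
import re
from typing import Optional

ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4}

# "IV" is tried before "I{1,3}" so that "Stage IV" is not read as "Stage I".
STAGE_PATTERN = re.compile(r"\s*stage\s+(IV|I{1,3})[A-C]?\d?\s*", re.IGNORECASE)

def stage_to_number(stage: Optional[str]) -> Optional[int]:
    """Map a pathologic stage label such as 'Stage IIA' or 'Stage IV' to 2 or 4.

    Sub-stage letters are dropped; unrecognized labels return None.
    """
    if stage is None:
        return None
    match = STAGE_PATTERN.fullmatch(stage)
    if match is None:
        return None
    return ROMAN[match.group(1).upper()]

for label in ["Stage IIA", "Stage IV", "Stage IIIC", "[Not Available]"]:
    print(label, stage_to_number(label))
```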

The names of the different clinical parameters are listed on the left, and the Pearson product-moment correlation coefficients or the Wilcoxon rank-sum test p values can be found on the right. Whenever a clinical parameter contains only two levels (e.g. male or female), a p value is calculated instead of a correlation coefficient. This p value indicates whether expression differs significantly between the two groups for this parameter. For the sample type parameter, the expression is always compared between the normal and tumor samples.

The sample type annotation can be found in the legend of the plot. In this case, the dark blue bars indicate normal samples, whereas the light blue bars indicate tumor samples. Again, this is in line with the findings of Millar et al. (1999): the normal samples clearly tend to have a higher GSTP1 expression than the tumor samples.

As described earlier, the samples are sorted by their expression value by default. By clicking on the name of the annotation parameter you are interested in, you can reorder the samples by that parameter. So if, for example, you would like to compare age to the expression and methylation of a certain gene or to the other clinical parameters, just click on "age at diagnosis" and the samples will be reordered.
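Reordering by a clinical parameter amounts to sorting the samples on that parameter's values. In this hypothetical sketch, sorting is descending and samples without a value are placed at the right end; both choices are assumptions, not documented MEXPRESS behavior. Python's sort is stable, so samples with equal values keep their previous (expression-based) order.

```python
from typing import Dict, List, Optional

def order_by_parameter(samples: List[str], values: Dict[str, Optional[float]],
                       descending: bool = True) -> List[str]:
    """Sort sample IDs by a clinical parameter; samples without a value go last.

    ASSUMPTION: descending order and "missing values last" are illustrative
    choices; MEXPRESS's actual rules may differ.
    """
    present = [s for s in samples if values.get(s) is not None]
    missing = [s for s in samples if values.get(s) is None]
    present.sort(key=lambda s: values[s], reverse=descending)  # stable sort
    return present + missing

# Current order (from expression) and an invented "age at diagnosis" column.
expression_order = ["S2", "S4", "S1", "S3"]
age = {"S1": 71, "S2": 58, "S3": None, "S4": 64}
print(order_by_parameter(expression_order, age))
```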

Let's get started!