psiplot is an R package for generating plots of percent spliced-in (PSI) values of alternatively-spliced exons that were computed by vast-tools, an RNA-Seq pipeline for alternative splicing analysis. The plots are generated using ggplot2
.
See Releases for the latest stable release or get the most up-to-date development version via devtools:
psiplot takes as input the PSI and/or cRPKM results generated by vast-tools (e.g. after running vast-tools combine
or vast-tools diff
). For example,
psi <- read.table("INCLUSION_LEVELS_FULL-Mmu53.tab", header = TRUE, sep = "\t",
stringsAsFactors = FALSE)
crpkm <- read.table("cRPKM-Mmu53.tab", header = TRUE, sep = "\t",
stringsAsFactors = FALSE)
This README uses the provided sample datasets, psi
, crpkm
and crpkm_counts
, as example input data. We also provide an optional sample configuration table, config
, which is used for customizing psiplots.
Events can be plotted individually as a scatterplot (a.k.a “psiplots”) or collectively as a heatmap. A function to produce psiplots for multiple events is also provided.
The function plot_event()
generates a plot of a single event:
Alternatively, to plot an event by gene name (for example, MTA1):
Several events can be plotted together with the function plot_multievent()
. This allows users to compare the inclusion patterns of small groups of events:
In addition, cRPKM expression values generated by vast-tools can also be plotted in a similar manner using the function plot_expr()
:
The function plot_multi
generates a heatmap of multiple events using geom_tile
. Alternatively, if you have the R package pheatmap
or gplots
installed, you can set usepkgs = 'pheatmap'
or usepkgs = 'gplots'
to generate a heatmap. By default, rows and columns will be ordered by hierarchical clustering. One benefit of using pheatmap
or gplots
is the ability to include group annotations from the config file:
Other use cases:
There are two ways to customize the plots: using a configuration file or using R arguments.
config
fileIn vast-tools plot
, an optional config file can be used to customize the plots’ visual appearance. The same config file can be supplied here as well.
plot_event(psi[1,], config = "/path/to/config")
# config can also be pre-loaded into a data frame
cfg <- read.table("/path/to/config", header = TRUE, sep = "\t", stringsAsFactor = FALSE)
plot_event(psi[1,], config = cfg)
plot_multi(psi, config = cfg)
plot_expr(crpkm[1,], config = cfg)
This file is tab-delimited and must be manually created. For example:
# sample config data
config
#> # A tibble: 8 x 5
#> Order SampleName SubgroupName GroupName RColorCode
#> <int> <chr> <chr> <chr> <chr>
#> 1 8 Sample8 Subgroup5 Tissues black
#> 2 1 Sample1 Subgroup1 ESC red
#> 3 2 Sample2 Subgroup1 ESC red
#> 4 3 Sample3 Subgroup2 ESC red
#> 5 4 Sample4 Subgroup3 Neural blue
#> 6 5 Sample5 Subgroup3 Neural blue
#> 7 6 Sample6 Subgroup4 Muscle darkgreen
#> 8 7 Sample7 Subgroup4 Muscle darkgreen
The header is required, but the order of the columns is flexible. There are two required columns and three optional:
Required:
groupmean=TRUE
).Optional:
subg=TRUE
). See Subgrouping samples for more details.colors()
).Tips:
col
option.NA
, then a default ggplot2 color palette will be used for annotating sample groups. For example:config2 <- config
config2$RColorCode <- NA
head(config2)
#> # A tibble: 6 x 5
#> Order SampleName SubgroupName GroupName RColorCode
#> <int> <chr> <chr> <chr> <lgl>
#> 1 8 Sample8 Subgroup5 Tissues NA
#> 2 1 Sample1 Subgroup1 ESC NA
#> 3 2 Sample2 Subgroup1 ESC NA
#> 4 3 Sample3 Subgroup2 ESC NA
#> 5 4 Sample4 Subgroup3 Neural NA
#> 6 5 Sample5 Subgroup3 Neural NA
plot_event(psi[1,], config2)
plot_multi
, specifying a config file will disable clustering of samples and instead use the config order.subg
is set to TRUE
. Similarly, if the config file includes subgroups, but subg
is set to FALSE
, they will not be used.The colors and other graphical parameters can also be configured in R via arguments. The plot_*
family of methods provide a limited set of graphical arguments and can be used in conjunction with a config file. See the help pages for each method (e.g. ?plot_event
) and for more details on the available options.
For example, the following command uses config
to customize the colors and groupings, sets the point symbol, and restricts the y-axis to (20, 80):
Note: Samples 6, 7, 8 fall outside of this range and thus are not shown.
Because the plot_*
methods return ggplot2 objects, you can further customize the plots by appending additional aesthetics or themes (if the option is not supported within the method itself). For example, to increase the text size of the legend:
library(ggplot2)
plot_event(psi[1,], config = config) + theme(legend.text = element_text(size = 20))
Certain options may interfere or overwite those already set within the plot_*
methods, so please use at your discretion.
The qual
argument can be used to indicate the minimum value of the first vast-tools quality score that a PSI value needs to have in order to be accepted. PSI values with lower quality scores will be coerced to NA.
The possible values for this score are, from worse to better: N
, VLOW
, LOW
, OK
, and SOK
. By default, the minimum quality accepted is VLOW
. For more information on vast-tools quality scores, please see the description of vast-tools output (column 8).
The subg
argument allows pooling samples into subgroups if there is a SubgroupName column included in the supplied config
file (see Customizing plots). The PSI of a subgroup is the mean of the PSIs of its individual samples, after filtering out the samples with quality score below the threshold indicated by qual
(see Filtering samples by quality score).
Alternatively, rather than plotting the average PSI, the individual values can be plotted by setting the option subj.show
to “all” or “beeswarm” (default is “mean”). The latter uses ggbeeswarm to make beeswarm scatterplots. One limitation of beeswarm plots is that error bars cannot be plotted, at least to my knowledge. If error bars are needed, choose “all”:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
config2 <- config %>%
mutate(
SubgroupName = ifelse(Order <= 5, "Subgroup1", "Subgroup2")
)
plot_event(psi[1,], config = config2, subg = TRUE, subg.show = "all")
To illustrate a subgrouped plot using beeswarm plots, we’ll constrat a larger PSI table with 100 samples:
set.seed(416)
large_psi <- tibble(
GENE = "TEST",
EVENT = "HsaEx0000",
COORD = "chr1:100-200",
LENGTH = 100,
FullCO = "chr1:1,100-200,210",
COMPLEX = "S",
SampleName = paste0("Sample", sprintf("%0.3d", 1:100)),
PSI = c(rbeta(30, 10, 90), rbeta(30, 10, 11), rbeta(40, 9,2))*100,
Q = "OK,OK,OK,OK,S@50,50" # dummy quality score - not used
)
psi_temp <- tidyr::spread(large_psi, SampleName, PSI) %>%
select(-Q)
qual_temp <- large_psi %>%
mutate(SampleName = paste0(SampleName, "-Q")) %>%
select(-PSI) %>%
tidyr::spread(SampleName, Q)
# Put everything together
large_config <- tibble(
SampleName = large_psi$SampleName,
GroupName = c(rep("ESC", 30), rep("Muscle", 30), rep("Brain", 40)),
SubgroupName = c(rep("Subgroup1", 10),
rep("Subgroup2", 20),
rep("Subgroup3", 30),
rep("Subgroup4", 40))
)
large_psi <- inner_join(psi_temp, qual_temp)
#> Joining, by = c("GENE", "EVENT", "COORD", "LENGTH", "FullCO", "COMPLEX")
large_psi <- large_psi[, c(names(large_psi)[1:6],
sort(names(large_psi)[7:ncol(large_psi)]))]
plot_event(large_psi[1,], large_config, subg = T, subg.show = "beeswarm")