4 Visualising Results for Multiple Stages
As well as visualising the results for a single stage, we might want to visualise the results over multiple stages. The basic overall results can be retrieved from a single call to the WRC results API, but to view the stage times and rankings across multiple stages requires retrieving detailed for each stage and then combining it into a single dataframe.
4.1 Load Base Data
To get the splits data from a standing start, we can load in the current season list, select the rally we want, look up the itinerary from the rally, extract the sections and then the stages and the retrieve the stage ID for the stage we are interested in.
To begin with, load in our WRC API helper functions:
source('code/wrc-api.R')
library(tidyr)
Now let’s grab some data:
= get_active_season()
s = get_eventId_from_name(s, 'arctic')
eventId
= get_rally_entries(eventId)
entries
= get_itinerary(eventId)
itinerary = get_sections(itinerary)
sections = get_stages(sections)
stages = get_stages_lookup(stages)
stages_lookup
= get_stage_list(stages)
stage_list = stages$code
stage_codes # To generate stage codes as an ordered factor:
# factor(stages$code, levels = stages$code)
4.2 Retrieving Mutliple Stage Results
To being with, lets get the overall results at the end of each stage:
= get_multi_overall(stage_list)
multi_overall_results
%>% tail(2) multi_overall_results
## entryId stageTimeMs stageTime penaltyTimeMs penaltyTime totalTimeMs
## 530 21542 11203500 PT3H6M43.5S 0 PT0S 11203500
## 531 21569 11754200 PT3H15M54.2S 0 PT0S 11754200
## totalTime position diffFirstMs diffFirst diffPrevMs diffPrev stageId
## 530 PT3H6M43.5S 51 3773900 PT1H2M53.9S 696500 PT11M36.5S 1749
## 531 PT3H15M54.2S 52 4324600 PT1H12M4.6S 550700 PT9M10.7S 1749
4.2.1 Reshaping Overall Position Data
We can reduce the amount of data by casting the long raw result to a wide format, widening the data on a particular field of interest. For example, we can widen generate a wide dataframe describing overall positions, \({_S}o\) at the end of each stage, where a particular driver’s position is given as \({_S}o_i\):
= multi_overall_results %>%
multi_overall_wide_pos get_multi_stage_generic_wide(stage_list,
'position')
%>% head(2) multi_overall_wide_pos
## entryId 1747 1743 1750 1751 1748 1745 1744 1742 1746 1749
## 1 21530 9 9 9 8 8 7 6 22 20 20
## 2 21531 5 5 5 5 5 5 5 5 5 5
4.2.2 Reshaping Overall Rally Time Data
We can also create a wide format report of the overall times, where each column gives the overall, accumulated rally time up to and including each stage, \({_S}T\); each cell then represents the accumulated time for a particular driver, \({_S}T_i\).
The times themselves appear in units of milliseconds, so first create a column corresponding to time in seconds, then widen using those values:
= multi_overall_results %>%
multi_overall_results mutate(totalTimeS = totalTimeMs/1000)
= multi_overall_results %>%
multi_overall_wide_time get_multi_stage_generic_wide(stage_list,
'totalTimeS')
%>% head(2) multi_overall_wide_time
## entryId 1747 1743 1750 1751 1748 1745 1744 1742 1746 1749
## 1 21530 980.5 1960.5 2802.5 3351.0 4102.2 4955.9 5512.6 6876.6 7491.6 8095.8
## 2 21531 974.5 1942.7 2788.5 3335.1 4085.3 4938.9 5491.5 6275.6 6883.1 7491.1
We note that with stages presented in order, the rally time is strictly increasing across the rows.
We further note that we can derive stage times from the overall rally times by calculating the columnwise differences \({_S}t={_S}T-{_{S-1}}T: 1<S<N\) for an \(N\) stage rally.
4.2.3 Reshaping Time to First Data
Another useful time is the time to first, which is to say, the gap, \({_S}GAP_i\). Noting that the overall rally leader may change at the end of each stage, this measure is essentially a rebasing measure relative to a particular position rather than a particular driver:
= multi_overall_results %>%
multi_overall_results mutate(diffFirstS = diffFirstMs/1000)
= multi_overall_results %>%
multi_overall_wide_gap get_multi_stage_generic_wide(stage_list,
'diffFirstS')
%>% head(2) multi_overall_wide_gap
## entryId 1747 1743 1750 1751 1748 1745 1744 1742 1746 1749
## 1 21530 22.7 49.8 57.3 53.1 54.2 70.6 76.8 664.5 666.1 666.2
## 2 21531 16.7 32.0 43.3 37.2 37.3 53.6 55.7 63.5 57.6 61.5
However, we could also calculate the gap to leader from the overall times, first by identifying the minimum accumulated time at each stage (that is, the minimum time, excluding null values, in each overall time column) and then by subtracting those values from each row in the overall times dataframe, which is to say:
\[ \textrm{GAP}_{i} = {_S}t_i - \textrm{min}({_S}T) \]
If we try to subtract a list of values (for example, \(\forall S \in [1 \le S \le N]: \textrm{min}({_S}T)\)) from an R dataframe, we need to tell R how we want that subtraction performed. Internally, the dataframe is represented as a long list of values made up from values in the first column, then the second, and so on. If we subtract a list of N values from the dataframe, the values are selected from the first N items in this long serialised version of the dataframe, then the next N values and so on.
So to subtract a “dummy” row of values from the dataframe, we need another approach. The purrr::map2df()
function allows us to apply a function, in this case the subtraction -
function, with a set of specified values we want to subtract, from each row in the dataframe.
So let’s create a set of values representing the minimum overall time in each stage. The matrixStats::colMins()
function will find the minimum values by row from a matrix, so cast the stage time columns from the wide dataframe to an appropriately sized matrix and then find the minimum in each column, ignoring null values:
= as.matrix(multi_overall_wide_time[,as.character(stage_list)],
overall_m ncols=length(stage_list))
= matrixStats::colMins(overall_m, na.rm=TRUE)
mins_overall
mins_overall
## [1] 957.8 1910.7 2745.2 3297.9 4048.0 4885.3 5435.8 6212.1 6825.5 7429.6
We can now subtract this “dummy row” of values from each row in the dataframe to find the gap to leader for each row on each stage:
::map2_df(multi_overall_wide_time[,as.character(stage_list)],
purrr`-`) %>% head(2) mins_overall,
## # A tibble: 2 x 10
## `1747` `1743` `1750` `1751` `1748` `1745` `1744` `1742` `1746`
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 22.7 49.80000 57.3 53.10000 54.20000 70.60000 76.8 664.5 666.1
## 2 16.7 32 43.3 37.20000 37.3 53.60000 55.70000 63.5 57.6
## # … with 1 more variable: 1749 <dbl>
Comparison with the “diffToFirst” times should show them to be the same.
4.2.4 Mapping Stage and Driver Identifiers to Meaningful Labels
To improve the look of the table, we might use stage codes and driver codes to label the columns and identify the rows.
To start with, we can map the column names that correspond to stage codes via a lookup list of stage ID to stage code values:
= function(df, stage_list) {
map_stage_codes # Get stage codes lookup id->code
= get_stages_lookup(stages, 'stageId', 'code')
stages_lookup_code
#https://stackoverflow.com/a/34299333/454773
::rename(df, replace = stages_lookup_code,
plyrwarn_missing = FALSE)
}
= multi_overall_wide_time %>%
multi_overall_wide_time map_stage_codes(stage_list)
%>% head(2) multi_overall_wide_time
## entryId SS1 SS2 SS3 SS4 SS5 SS6 SS7 SS8 SS9 SS10
## 1 21530 980.5 1960.5 2802.5 3351.0 4102.2 4955.9 5512.6 6876.6 7491.6 8095.8
## 2 21531 974.5 1942.7 2788.5 3335.1 4085.3 4938.9 5491.5 6275.6 6883.1 7491.1
We can also create a function to replace the entry ID with the driver code. Note also the select statement at the end that puts the columns into a sensible order:
= get_car_data(entries)
cars
= function(df, cars){
map_driver_names %>%
df merge(cars[,c('entryId','code')],
by='entryId') %>%
# Limit columns and set column order
select(-'entryId') %>%
# Move last column to first
select('code', everything())
}
= multi_overall_wide_time %>%
multi_overall_wide_time map_driver_names(cars)
%>% head(2) multi_overall_wide_time
## code SS1 SS2 SS3 SS4 SS5 SS6 SS7 SS8 SS9 SS10
## 1 OGI 980.5 1960.5 2802.5 3351.0 4102.2 4955.9 5512.6 6876.6 7491.6 8095.8
## 2 EVA 974.5 1942.7 2788.5 3335.1 4085.3 4938.9 5491.5 6275.6 6883.1 7491.1
4.2.5 Rebasing Overall Times
We can rebase the overall times with respect to a particular driver in the normal way, calculating the difference between each row and the row corresponding to a specified driver:
= multi_overall_wide_time[2,]$code
example_driver
= rebase(multi_overall_wide_time,
overall_wide_time_rebased
example_driver, stage_codes,id_col='code')
%>% head(3) overall_wide_time_rebased
## code SS1 SS2 SS3 SS4 SS5 SS6 SS7 SS8 SS9 SS10
## 1 OGI 6.0 17.8 14.0 15.9 16.9 17.0 21.1 601.0 608.5 604.7
## 2 EVA 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
## 3 NEU -4.3 -2.2 -12.5 -8.0 -2.6 -18.7 -17.5 -37.6 -36.5 -41.7
4.2.6 Finding Changes in Rebased Gaps Across Stages
The rebasing operation essentially allows us to select a row of times for one particular driver and then subtract that row from every other row to give us a direct comparison of the gap between a specified driver and every other driver.
But we can also perform a consecutive column-wise differencing operation on the rebased times that allows to see how much time was gained or relative to a particular driver in going from one stage to the next (observant readers may note that this results in the rebased stage time for each stage…).
To subtract one column from the next, create two offset dataframes, one containing all but the first stage (first stage column) and one containing all but the last stage (final stage column). If we subtract one dataframe from the other, it gives us our column differences. Inserting the original first column back in its rightful place gives us the columnwise differences table:
#https://stackoverflow.com/a/50411529/454773
= overall_wide_time_rebased
df
# [-1] drops the first column, [-ncol()] drops the last
= df[,stage_codes][-1] - df[,stage_codes][-ncol(df[,stage_codes])]
df_
# The split time to the first split is simply the first split time
1]] = df[stage_codes[1]]
df_[stage_codes[# Return the dataframe in a sensible column order
%>% select(stage_codes) %>% head(3) df_
## Note: Using an external vector in selections is ambiguous.
## ℹ Use `all_of(stage_codes)` instead of `stage_codes` to silence this message.
## ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
## SS1 SS2 SS3 SS4 SS5 SS6 SS7 SS8 SS9 SS10
## 1 6.0 11.8 -3.8 1.9 1.0 0.1 4.1 579.9 7.5 -3.8
## 2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
## 3 -4.3 2.1 -10.3 4.5 5.4 -16.1 1.2 -20.1 1.1 -5.2
(A similar technique could be used to recreate stage times from the overall times.)
4.3 Visualising Overall Results
The overall stage results provides information regarding the overall times and positions at the end of each stage; the rebased overall times provide us with gap information from a specified driver to every other driver.
So how might we use exploratory data visualisation techniques to support a conversation with that data or highlight potential stories hidden within it?
4.3.1 Visualising First Position
One way of enriching the wide position table might be to highlight the driver in first position at the end of each stage. We can do this using the formattable::formattable()
function.
First, let’s tidy up the overall position table:
= multi_overall_wide_pos %>%
multi_overall_wide_pos map_stage_codes(stage_codes) %>%
map_driver_names(cars)
%>% head(2) multi_overall_wide_pos
## code SS1 SS2 SS3 SS4 SS5 SS6 SS7 SS8 SS9 SS10
## 1 OGI 9 9 9 8 8 7 6 22 20 20
## 2 EVA 5 5 5 5 5 5 5 5 5 5
We can also reorder the table by the final stage position, as described by the last stage code in the stage_codes
list. However, because we want to sort on a column name as provided by a variable, we need to use the !!
operator to force the evaluation of the single variable value to the column name symbol (as.symbol()
):
= multi_overall_wide_pos %>%
multi_overall_wide_pos ::arrange(!!as.symbol(stage_codes[length(stage_codes)]))
dplyr
%>% head(2) multi_overall_wide_pos
## code SS1 SS2 SS3 SS4 SS5 SS6 SS7 SS8 SS9 SS10
## 1 TÄN 1 1 1 1 1 1 1 1 1 1
## 2 ROV 3 3 3 2 2 2 2 2 2 2
Create a function to highlight the first position car:
library(formattable)
= function (...)
highlight_first
{formatter("span",
style = function(x) ifelse(x==1,
style(display = "block",
padding = "0 4px",
`color` = "black",
`column-width`='4em',
`border-radius` = "4px",
`background-color` = 'lightgrey'),
style()))
}
And then use that function to help style the table:
%>%
multi_overall_wide_pos head(3) %>%
formattable(# Align values in the center of each column
align='c',
list(area(col = stage_codes) ~ highlight_first()))
code | SS1 | SS2 | SS3 | SS4 | SS5 | SS6 | SS7 | SS8 | SS9 | SS10 |
---|---|---|---|---|---|---|---|---|---|---|
TÄN | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
ROV | 3 | 3 | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
NEU | 4 | 4 | 4 | 4 | 4 | 3 | 3 | 3 | 3 | 3 |
4.3.2 Visualising Position Changes Over a Rally
Another useful way of summarising the positions is to chart showing the evolution of position changes.
The chart is constructed most straightforwardly from tidy (long format) data:
<- multi_overall_wide_pos %>%
overall_pos_long_top10 head(10) %>%
#gather(key ="Stage",
# value ="Pos",
# stage_codes)
# pivot longer replaces gather
pivot_longer(c(stage_codes),
names_to ="Stage",
values_to ="Pos")
%>% head(3) overall_pos_long_top10
## # A tibble: 3 x 3
## code Stage Pos
## <chr> <chr> <int>
## 1 TÄN SS1 1
## 2 TÄN SS2 1
## 3 TÄN SS3 1
It will be convenient for the stage codes to be represented as ordered factors:
= overall_pos_long_top10 %>%
overall_pos_long_top10 mutate(Stage = factor(Stage,
levels = stage_codes))
$Stage[1] overall_pos_long_top10
## [1] SS1
## Levels: SS1 SS2 SS3 SS4 SS5 SS6 SS7 SS8 SS9 SS10
We can then create a bump chart style plot showing how each driver’s positioned changed across stages:
library(ggplot2)
= 1:max(overall_pos_long_top10$Pos)
pos_range = ggplot(overall_pos_long_top10, aes(x=Stage, y=Pos)) +
g_pos geom_line(aes(group = code)) +
# Invert scale and relabel y-axis
# https://stackoverflow.com/a/28392170/454773
scale_y_continuous(trans = "reverse",
breaks = pos_range) +
theme_classic()
g_pos
We can producing a cleaner chart by adding driver labels to the start and end of each line using the directlabels:geom_dl()
function, as well as dropping the axes:
library(directlabels)
+
g_pos geom_dl(aes(label = paste0(' ',code)), # Add space before label
# Add label at end of line
method = list('last.bumpup',
# cex is text label size
cex = 0.5)) +
geom_dl(aes(label = paste0(code, ' ')), # Add space before label
# Add label at start of line
method = list('first.points', cex = 0.5)) +
theme_void()
In the above chart, you may notice a gap at first stage position 6 where Ogier was originally placed. A more robust way to prepare the data for this sort of charts is to filter the data by class, for example limiting the data to cars in the WRC group/class, and then reranking the position by group/class. A finished position chart can then position the drivers by class ranking and use labels to overplot actual overall rally positions on stages where the overall stage position differs from the class rank.
4.3.3 Visualising Position Gains/Losses
To visualise position changes for a driver from one driver to the next, we can create a table of position differences. Let’s abstract out the code we used to find differences between columns in to a function:
= function(df, cols, dropfirst=FALSE, firstcol=NULL){
coldiffs = as.character(cols)
cols # [-1] drops the first column, [-ncol()] drops the last
= df[,cols][-1] - df[,cols][-ncol(df[,cols])]
df_
# The split time to the first split is simply the first split time
1]] = df[cols[1]]
df_[cols[# Return the dataframe in a sensible column order
= df_ %>% select(cols)
df_
if (!is.null(firstcol))
1]] = firstcol
df_[, cols[
if (dropfirst)
-1]
df_[,cols][else
df_ }
Let’s put that function through its paces. First, we can drop the first column:
coldiffs(multi_overall_wide_pos, stage_codes, dropfirst=TRUE) %>% head(2)
## Note: Using an external vector in selections is ambiguous.
## ℹ Use `all_of(cols)` instead of `cols` to silence this message.
## ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
## SS2 SS3 SS4 SS5 SS6 SS7 SS8 SS9 SS10
## 1 0 0 0 0 0 0 0 0 0
## 2 0 0 -1 0 0 0 0 0 0
Then we can retain the first column and replace it with a specified value:
coldiffs(multi_overall_wide_pos, stage_codes, firstcol=999) %>% head(2)
## SS1 SS2 SS3 SS4 SS5 SS6 SS7 SS8 SS9 SS10
## 1 999 0 0 0 0 0 0 0 0 0
## 2 999 0 0 -1 0 0 0 0 0 0
We can now fund the position changes from stage to stage as well as tidying up the identifiers:
= multi_overall_wide_pos %>% coldiffs(stage_codes,
pos_diffs firstcol=0)
$code = multi_overall_wide_pos$code
pos_diffs
# Reorder the columns by moving last column to first
= pos_diffs %>% select('code', everything())
pos_diffs
%>% head(3) pos_diffs
## code SS1 SS2 SS3 SS4 SS5 SS6 SS7 SS8 SS9 SS10
## 1 TÄN 0 0 0 0 0 0 0 0 0 0
## 2 ROV 0 0 0 -1 0 0 0 0 0 0
## 3 NEU 0 0 0 0 0 -1 0 0 0 0
Note that we need to be careful with the sense of how we read this table: a negative position change means the driver has improved their position. It might be more meaningful to have position gain/loss, rather than strict position difference columns, where a positive value denotes an improved position:
= pos_diffs
pos_gains = -pos_gains[,stage_codes[-1]]
pos_gains[,stage_codes]
%>% head(3) pos_gains
## code SS1 SS2 SS3 SS4 SS5 SS6 SS7 SS8 SS9 SS10
## 1 TÄN 0 0 0 0 0 0 0 0 0 0
## 2 ROV 0 0 1 0 0 0 0 0 0 0
## 3 NEU 0 0 0 0 1 0 0 0 0 0
One way of highlighting position changes is to use coloured up/down arrows:
= function(...){
updown formatter("span",
style = function(x) style(color = ifelse(x>0,
"green",
ifelse(x<0,
"red",
"lightgrey"))),
function(x) icontext(ifelse (x >0,
# i.e. gained position
"arrow-up",
ifelse (x < 0,
# Lost position
"arrow-down" ,
# No position change
"resize-horizontal"))))
}
Let’s see how that works (note that we need to cast the table as.htmlwidget()
in order to render the arrows appropriately :
%>%
pos_gains head(3) %>%
formattable(list(
area(col = stage_codes) ~ updown())) %>%
as.htmlwidget()
We can extend the formatter to also display the number of positions gained or lost:
= function(...){
updown2 formatter("span",
style = function(x) style(color = ifelse(x>0,
"green",
ifelse(x<0,
"red",
"lightgrey"))),
function(x) icontext(ifelse (x >0,
# i.e. gained position
"arrow-up",
ifelse (x < 0,
"arrow-down" ,
"resize-horizontal")),
# Add in the pos change value
ifelse (x!=0, paste0('(',abs(x),')'),'')))
}
Let’s see how it looks:
%>%
pos_gains head(3) %>%
formattable(list(
area(col = stage_codes) ~ updown2())) %>%
as.htmlwidget()
Another way of visualising position changes is to create a simple summarising sparkline using the sparkline::spk_chr()
function.
This requires us first to cast the data into a long format:
library(sparkline)
<- pos_gains %>%
pos_gain_long_top10 head(10) %>%
gather(key ="Stage",
value ="PosChange",
stage_codes)
%>% head(3) pos_gain_long_top10
## code Stage PosChange
## 1 TÄN SS1 0
## 2 ROV SS1 0
## 3 NEU SS1 0
We can then generate sparklines showing position changes:
<- pos_gain_long_top10 %>%
pos_gain_sparkline_top10 group_by(code) %>%
summarize(spk_ = spk_chr(PosChange,
type ="bar"))
# We need to create an htmlwidget form of the table
= as.htmlwidget(formattable(pos_gain_sparkline_top10))
out
# The table also has a requirement on the sparkline package
$dependencies = c(out$dependencies,
out:::widget_dependencies("sparkline",
htmlwidgets"sparkline"))
out
Note that to render the sparkline, we need to cast the formatted table to an htmlwidget
and also ensure that the required sparkline
Javascript package is loaded into the widget.
One issue with the sparkline bars is that the scales may differ. For example, across different drivers, a position change of +1 for one driver may have the same height as a position change of +2 for another driver.
4.3.4 Per Driver Position Charts
As well as generating summary chapters over a set of drivers, we can also generate charts on a per driver basis, cf. sparkline charts.
For example, we can create a simple chart that captures a single driver’s position over several stages, optionally using the gt::ggplot_image()
function to create an HTML embeddable image tag with the chart encoded as a data URI:
= get_person_id(cars, 'rov', ret='code')
rovCode
= function(df_long, code, embed=FALSE,
get_pos_chart height=30, aspect_ratio=1, size=5) {
# Get the data for the specified driver
= df_long[df_long['code']==code,]
subdf
= max(10.6, max(subdf$Pos)+0.1)
ymax
= ggplot(subdf,
g aes(x=as.integer(Stage), y=Pos, group=code)) +
geom_step(direction='mid', color='blue', size=size) +
geom_hline(yintercept=0.8, linetype='dotted',
size=size, color='black') +
geom_hline(yintercept=3.35, linetype='dotted',
size=size, color='black') +
geom_hline(yintercept=10.5, color='darkgrey',
size=size) +
#scale_y_continuous(trans = "reverse") +
scale_y_reverse( lim=c(ymax, 0.8)) +
theme_void() + scale_x_continuous(expand=c(0,0)) #+
#theme(aspect.ratio=0.1)
if (embed)
::ggplot_image(g, height = height, aspect_ratio=aspect_ratio)
gtelse
g
}
get_pos_chart(overall_pos_long_top10, rovCode)
If we generate an image for each driver, we can then create a column of images to showing the change in position over the rally for each one on a row by row basis:
= multi_overall_wide_pos %>% head(5)
overall_wide_pos_top5
= overall_wide_pos_top5$code
top5codes
= list()
gt_pos_plots # Iterate through each driver in the top 5
for (c in 1:length(top5codes)){
# Add each plot to the plot list
# The split is generated for the top 5
length(gt_pos_plots) + 1]] <-
gt_pos_plots[[get_pos_chart(overall_pos_long_top10, top5codes[c],
embed=T, aspect_ratio=3, size=5)
}
We can then add the charts to the wide timing results dataframe as an extra column:
$poschart = gt_pos_plots
overall_wide_pos_top5
formattable(overall_wide_pos_top5)
code | SS1 | SS2 | SS3 | SS4 | SS5 | SS6 | SS7 | SS8 | SS9 | SS10 | poschart |
---|---|---|---|---|---|---|---|---|---|---|---|
TÄN | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |
ROV | 3 | 3 | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | |
NEU | 4 | 4 | 4 | 4 | 4 | 3 | 3 | 3 | 3 | 3 | |
BRE | 2 | 2 | 2 | 3 | 3 | 4 | 4 | 4 | 4 | 4 | |
EVA | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
# How do we suppress stripes in formattable tables?
4.3.5 Visualising Time to First
A convenient way of visualising the gap to leader across stages is to create a sparkline using the sparkline::spk_chr()
function. This function requires data in a tidy (long) format which is then grouped for each driver.
Let’s remind ourselves of what the data looks like:
= multi_overall_wide_gap %>%
multi_overall_wide_gap_top10 map_stage_codes(stage_list) %>%
map_driver_names(cars) %>%
::arrange(!!as.symbol(stage_codes[length(stage_codes)])) %>%
dplyrhead(10)
multi_overall_wide_gap_top10
## code SS1 SS2 SS3 SS4 SS5 SS6 SS7 SS8 SS9 SS10
## 1 TÄN 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
## 2 ROV 10.6 20.4 26.1 21.2 23.6 25.0 23.3 24.1 19.2 17.5
## 3 NEU 12.4 29.8 30.8 29.2 34.7 34.9 38.2 25.9 21.1 19.8
## 4 BRE 3.6 16.2 23.6 22.4 28.9 42.8 50.6 53.4 54.0 52.6
## 5 EVA 16.7 32.0 43.3 37.2 37.3 53.6 55.7 63.5 57.6 61.5
## 6 KAT 19.9 38.8 47.5 46.6 52.4 74.5 83.8 94.4 93.2 97.8
## 7 SOL 31.1 45.9 50.5 49.4 51.3 62.7 80.4 86.8 84.2 99.0
## 8 SUN 18.8 34.5 46.9 54.1 70.9 82.1 98.2 109.3 117.3 129.0
## 9 GRE 41.8 65.8 84.7 94.6 112.1 134.3 156.5 181.8 200.5 219.4
## 10 LAP 42.8 88.2 123.4 146.9 186.2 215.8 239.3 288.8 327.1 367.0
We can create the required long form data from the wide gap table as follows, retrieving just the top 10 drivers based on the gap on the final stage, for convenience:
<- multi_overall_wide_gap_top10 %>%
overall_long_gap_top10 gather(key ="Stage",
value ="Gap", stage_codes)
%>% head(3) overall_long_gap_top10
## code Stage Gap
## 1 TÄN SS1 0.0
## 2 ROV SS1 10.6
## 3 NEU SS1 12.4
With the data in the appropriate form, we can create the sparkline, using a bar chart format:
<- overall_long_gap_top10 %>%
overall_long_gap_top10 group_by(code) %>%
summarize(spk_ = spk_chr(-Gap,
type ="bar"))
= function(df){
spark_df # We need to create an htmlwidget form of the table
= as.htmlwidget(formattable(df))
out
# The table also has a requirement on the sparkline package
$dependencies = c(out$dependencies,
out:::widget_dependencies("sparkline",
htmlwidgets"sparkline"))
out
}
spark_df(overall_long_gap_top10)
4.3.6 Visualising Rebased Gaps
As well as using the sparkline bar chart to visualise the gap to leader, we can use a similar approach to visualise the gap to other drivers following a rebasing step.
= rebase(multi_overall_wide_gap_top10,
overall_wide_gap_rebased
example_driver, stage_codes,id_col='code')
overall_wide_gap_rebased
## code SS1 SS2 SS3 SS4 SS5 SS6 SS7 SS8 SS9 SS10
## 1 TÄN -16.7 -32.0 -43.3 -37.2 -37.3 -53.6 -55.7 -63.5 -57.6 -61.5
## 2 ROV -6.1 -11.6 -17.2 -16.0 -13.7 -28.6 -32.4 -39.4 -38.4 -44.0
## 3 NEU -4.3 -2.2 -12.5 -8.0 -2.6 -18.7 -17.5 -37.6 -36.5 -41.7
## 4 BRE -13.1 -15.8 -19.7 -14.8 -8.4 -10.8 -5.1 -10.1 -3.6 -8.9
## 5 EVA 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
## 6 KAT 3.2 6.8 4.2 9.4 15.1 20.9 28.1 30.9 35.6 36.3
## 7 SOL 14.4 13.9 7.2 12.2 14.0 9.1 24.7 23.3 26.6 37.5
## 8 SUN 2.1 2.5 3.6 16.9 33.6 28.5 42.5 45.8 59.7 67.5
## 9 GRE 25.1 33.8 41.4 57.4 74.8 80.7 100.8 118.3 142.9 157.9
## 10 LAP 26.1 56.2 80.1 109.7 148.9 162.2 183.6 225.3 269.5 305.5
And now generate the sparklines:
<- overall_wide_gap_rebased %>%
overall_spark_gap_rebased gather(key ="Stage",
value ="Gap", stage_codes) %>%
group_by(code) %>%
summarize(spk_ = spk_chr(Gap, type ="bar"))
spark_df(overall_spark_gap_rebased)
4.4 Retrieving Stage Times and Results
This far, we have focused on exploring the overall results data. If required, we can also retrieve detailed results for multiple stages by requesting stage results for a specified list of stages:
= get_multi_stage_times(stage_list)
multi_stage_times
%>% tail(2) multi_stage_times
## stageTimeId stageId entryId elapsedDurationMs elapsedDuration status
## 539 96810 1749 21571 1301693 00:21:41.6930000 Completed
## 540 96793 1749 21541 NA <NA> DNS
## source position diffFirstMs diffFirst diffPrevMs diffPrev
## 539 Default 52 699224 00:11:39.2240000 153590 00:02:33.5900000
## 540 Default NA NA <NA> NA <NA>
We can then cast the data to a wide format and relabel the resulting dataframe.
One recipe for doing this is pretty well proven now, so let’s make it more convenient:
= function(df, stage_list, cars) {
relabel_times_df %>%
df map_stage_codes(stage_list) %>%
map_driver_names(cars)
}
And let’s create another function to relabel a dataframe and grab the top 10:
= function(df) {
clean_top10 %>% relabel_times_df(stage_list, cars) %>%
df ::arrange(!!as.symbol(stage_codes[length(stage_codes)])) %>%
dplyrhead(10)
}
We can now get a wide format dataframe containing individual stage times cleaned and reduced to the top 10:
= multi_stage_times %>%
multi_stage_times_wide get_multi_stage_times_wide(stage_list) %>%
clean_top10()
multi_stage_times_wide
## code SS1 SS2 SS3 SS4 SS5 SS6 SS7 SS8 SS9 SS10
## 1 ROV 968.4 962.7 840.2 547.8 752.5 838.7 548.8 777.1 608.5 602.469
## 2 BRE 961.4 965.5 841.9 551.5 756.6 851.2 558.3 779.1 614.0 602.731
## 3 NEU 970.2 970.3 835.5 551.1 755.6 837.5 553.8 764.0 608.6 602.834
## 4 TÄN 957.8 952.9 834.5 552.7 750.1 837.3 550.5 776.3 613.4 604.114
## 5 OGI 980.5 980.0 842.0 548.5 751.2 853.7 556.7 1364.0 615.0 604.272
## 6 EVA 974.5 968.2 845.8 546.6 750.2 853.6 552.6 784.1 607.5 608.028
## 7 KAT 977.7 971.8 843.2 551.8 755.9 859.4 559.8 786.9 612.2 608.774
## 8 SUN 976.6 968.6 846.9 559.9 766.9 848.5 566.6 787.4 621.4 615.819
## 9 SOL 988.9 967.7 839.1 551.6 752.0 848.7 568.2 782.7 610.8 618.976
## 10 GRE 999.6 976.9 853.4 562.6 767.6 859.5 572.7 801.6 632.1 623.075
Or we can get the stage rankings for each stage:
= multi_stage_times %>%
multi_stage_wide_pos get_multi_stage_generic_wide(stage_list,
'position') %>%
clean_top10()
%>% head(2) multi_stage_wide_pos
## code SS1 SS2 SS3 SS4 SS5 SS6 SS7 SS8 SS9 SS10
## 1 ROV 3 2 4 2 5 3 1 3 2 1
## 2 BRE 2 3 5 5 8 6 6 4 7 2
Or the gap to stage winner:
= multi_stage_times %>%
multi_stage_wide_gap mutate(diffFirstS = diffFirstMs/1000) %>%
get_multi_stage_generic_wide(stage_list,
'diffFirstS') %>%
clean_top10()
%>% head(2) multi_stage_wide_gap
## code SS1 SS2 SS3 SS4 SS5 SS6 SS7 SS8 SS9 SS10
## 1 ROV 10.6 9.8 5.7 1.2 2.4 1.4 0.0 13.1 1.0 0.000
## 2 BRE 3.6 12.6 7.4 4.9 6.5 13.9 9.5 15.1 6.5 0.262
4.5 Visualising Stage Times
Two of the most useful chart types, at least in terms of glanceable displays, that we can generate from the detailed stage times over the course of the rally are at the individual level: sparklines showing the individual stage positions and the gap to stage winner.
So let’s start off with those before looking at grouped and individual stage position charts.
4.5.1 Stage Position Sparklines
Our pattern is now pretty well proven so we can routinise our sparkline production:
= function(df, col, typ='Gap'){
generate_spark_bar %>% gather(key ="Stage",
df value =!!typ, stage_codes) %>%
group_by(code) %>%
summarize(spk_ = spk_chr(-!!as.symbol(typ), type ="bar"))
}
Let’s see how it works:
= generate_spark_bar(multi_stage_wide_pos)
multi_stage_long_pos spark_df(multi_stage_long_pos)
4.5.2 Gap to Stage Winner Sparklines
For the gap to stage winner,
= generate_spark_bar(multi_stage_wide_gap)
multi_stage_long_gap
spark_df(multi_stage_long_gap)
4.5.3 Stage Position Charts
Let’s reuse the approach we used before for generating the position chart:
= multi_stage_wide_pos %>%
multi_stage_long_pos pivot_longer(c(stage_codes),
names_to ="Stage",
values_to ="Pos") %>%
mutate(Stage = factor(Stage,
levels = stage_codes))
= 1:max(multi_stage_long_pos$Pos)
pos_range
ggplot(multi_stage_long_pos, aes(x=Stage, y=Pos)) +
geom_line(aes(group = code)) +
# Invert scale and relabel y-axis
# https://stackoverflow.com/a/28392170/454773
scale_y_continuous(trans = "reverse",
breaks = pos_range) +
geom_dl(aes(label = paste0(' ',code)), # Add space before label
# Add label at end of line
method = list('last.bumpup',
# cex is text label size
cex = 0.5)) +
geom_dl(aes(label = paste0(code, ' ')), # Add space before label
# Add label at start of line
method = list('first.points', cex = 0.5)) +
theme_void()
In this case we see the very obvious problem presented by drivers falling outside the typical position range. We really do need to organise this chart by group rank!
4.5.4 Individual Stage position Charts
At the individual level, we can also reuse the approach we developed for charting overall stage positions at an individual level:
get_pos_chart(multi_stage_long_pos, rovCode)
= multi_stage_wide_pos$code
top10_codes
= list()
gt_stage_pos_plots
# Iterate through each driver in the top 5
for (c in 1:length(top10_codes)){
# Add each plot to the plot list
length(gt_stage_pos_plots) + 1]] <-
gt_stage_pos_plots[[get_pos_chart(multi_stage_long_pos, top10_codes[c],
embed=T, aspect_ratio=3, size=5)
}
$poschart = gt_stage_pos_plots
multi_stage_wide_pos
formattable(multi_stage_wide_pos)
code | SS1 | SS2 | SS3 | SS4 | SS5 | SS6 | SS7 | SS8 | SS9 | SS10 | poschart |
---|---|---|---|---|---|---|---|---|---|---|---|
ROV | 3 | 2 | 4 | 2 | 5 | 3 | 1 | 3 | 2 | 1 | |
BRE | 2 | 3 | 5 | 5 | 8 | 6 | 6 | 4 | 7 | 2 | |
NEU | 4 | 7 | 2 | 4 | 6 | 2 | 4 | 1 | 3 | 3 | |
TÄN | 1 | 1 | 1 | 8 | 1 | 1 | 2 | 2 | 6 | 4 | |
OGI | 9 | 10 | 6 | 3 | 3 | 8 | 5 | 47 | 8 | 5 | |
EVA | 5 | 5 | 9 | 1 | 2 | 7 | 3 | 6 | 1 | 6 | |
KAT | 8 | 8 | 7 | 7 | 7 | 9 | 7 | 7 | 5 | 7 | |
SUN | 7 | 6 | 10 | 9 | 10 | 4 | 8 | 8 | 10 | 8 | |
SOL | 10 | 4 | 3 | 6 | 4 | 5 | 9 | 5 | 4 | 9 | |
GRE | 11 | 9 | 11 | 11 | 11 | 10 | 10 | 9 | 11 | 10 |