7 Working With Split Times
In this chapter we’ll explore some of the ways in which we might start to work with split times data from a particular stage. On the one hand, we can consider split times purely on a time basis; on the other, we can explore the split times in terms of pace differences calculated by normalising the split times over split distances.
As with stage times, we can rebase split times to compare a specified driver directly with other drivers, not just the stage winner. We can also calculate how much time was spent on each split section by finding the difference between consecutive split times for a particular driver.
From the split section times, we can also calculate various derived measures, such as the ultimate possible stage time, based on the sum of fastest times to complete each split section.
Having access to split times also sets up the potential for rating the performance of a driver on each split against various stage and stage route metrics, such as road order or route metrics such as the “wiggliness” of the split section, although these will not be covered here.
7.1 Load Base Data
To get the splits data from a standing start, we can load in the current season list, select the rally we want, look up the itinerary from the rally, extract the sections and then the stages and the retrieve the stage ID for the stage we are interested in.
Let’s start by loading in key helper libraries:
source('code/wrc-api.R')
source('code/wrc-wrangling.R')
source('code/wrc-charts.R')
And getting some initial data:
= get_active_season()
s = get_eventId_from_name(s, 'arctic')
eventId
= get_itinerary(eventId)
itinerary = get_sections(itinerary)
sections = get_stages(sections)
stages = get_stages_lookup(stages)
stages_lookup
# For driver details
= get_rally_entries(eventId)
entries = get_car_data(entries) cars
As a working example, let’s define an example stage by its code:
= 'SS3' stage_code
Get a sample stage ID:
= stages_lookup[[stage_code]] stageId
7.2 Get Splits Data
The split times represented the accumulated time going the the stage at each split point.
The split times do not include the overall stage time, so we need to be mindful that if we want to report on a stage the split times in and of themselves do not contain any information about the final section of the stage between the final split point and the stage finish line.
We can load the splits data if we know the event and stage ID:
= get_splits(eventId, stageId) splits
The splits data actually comprises two dataframes in columns splitPoints
and entrySplitPointTimes
.
7.2.1 Split Locations
The splitPoints
dataframe contains information about the splits locations:
= splits$splitPoints
splits_locations %>% arrange(number) %>% head(2) splits_locations
## splitPointId stageId number distance
## 1 3615 1750 1 4.83
## 2 3601 1750 2 9.02
We can also generate a list of the split IDs:
= splits_locations$splitPointId
splits_list splits_list
## [1] 3593 3601 3615 3617 3621
We can retrieve the split codes ordered by the distance into the stage of each split point from a lookup on the split points:
= function(splits){
get_split_cols = as.character(arrange(splits$splitPoints,
split_cols $splitPointId)
distance)
split_cols
}
= get_split_cols(splits)
split_cols split_cols
## [1] "3615" "3601" "3621" "3617" "3593"
Since the the split points dataframe does not including the final timing point location (i.e. the stage finish), we can get the full stage distance from the stages
dataframe:
'code']==stage_code, 'distance'] stages[stages[
## [1] 24.43
7.2.2 Mapping Split Codes to Split Numbers
To provide a human readable version of the split identifiers, let’s map them onto a more meaningful label:
= function(x){
get_split_label paste0('split_', splits_locations[splits_locations$splitPointId==x,
'number'])
}
$splitname = sapply(splits_locations$splitPointId,
splits_locations
get_split_label)
%>% head(3) splits_locations
## splitPointId stageId number distance splitname
## 1 3593 1750 5 23.21 split_5
## 2 3601 1750 2 9.02 split_2
## 3 3615 1750 1 4.83 split_1
We can generate a lookup list of split point names and IDs as:
get_stages_lookup(splits_locations, 'splitPointId', 'splitname')
## 3593 3601 3615 3617 3621
## "split_5" "split_2" "split_1" "split_4" "split_3"
We can use this in a function that provides an annotated and ordered form of the split locations dataframe:
= function(splits){
get_split_locations = splits$splitPoints
splits_locations $splitname = sapply(splits_locations$splitPointId,
splits_locations
get_split_label)%>%
splits_locations arrange(number)
}
= get_split_locations(splits)
splits_locations splits_locations
## splitPointId stageId number distance splitname
## 1 3615 1750 1 4.83 split_1
## 2 3601 1750 2 9.02 split_2
## 3 3621 1750 3 14.87 split_3
## 4 3617 1750 4 20.63 split_4
## 5 3593 1750 5 23.21 split_5
Create a convenience list of splits names:
# We could create these as ordered factors?
= splits_locations$splitname
split_names
split_names
## [1] "split_1" "split_2" "split_3" "split_4" "split_5"
7.2.3 Split Times
The second dataframe returned from the splits API call contains the splits times, accessed via the get_driver_splits()
function defined previously and imported from the wrc-api.R file. The data is returned in a long format, with each row describing a single split time for a particular driver on the single stage the split times were retrieved for.
= get_driver_splits(splits)
driver_splits %>% head(2) driver_splits
## splitPointTimeId splitPointId entryId elapsedDuration splitDateTime
## 1 123482 3615 21540 PT2M41.4S 2021-02-27T07:10:41.4
## 2 123483 3601 21540 PT4M32.5S 2021-02-27T07:12:32.5
## splitDateTimeLocal elapsedDurationS
## 1 2021-02-27T09:10:41.4+02:00 161.4
## 2 2021-02-27T09:12:32.5+02:00 272.5
The get_multi_split_times(stage_list)
function can also provide us the long form data for multiple stages given one or more stage IDs.
7.2.4 Wide Driver Split Times
We can cast the the driver split points into a wide format using the split point codes, ordered by split distance into the stage, as the widened column names:
= function(splits){
get_splits_wide = get_driver_splits(splits)
driver_splits
= get_split_cols(splits)
split_cols = c('entryId', 'splitPointId', 'elapsedDurationS')
splits_cols
= driver_splits %>%
splits_wide group_by(entryId) %>%
select(all_of(splits_cols)) %>%
::spread(key = splitPointId,
tidyrvalue = elapsedDurationS) %>%
select(all_of(c('entryId', split_cols))) %>%
# If we don't cast, it's a
# non-rankable rowwise df
as.data.frame()
splits_wide }
Rather than retrieve the split times into a long format, with one row per driver split, we can now retrieve the data into a wide form with one row per driver and a column for each split on the stage:
= get_splits_wide(splits)
splits_wide
%>% head(2) splits_wide
## entryId 3615 3601 3621 3617 3593
## 1 21530 161.7 272.3 471.3 690.4 789.2
## 2 21531 162.3 273.7 472.3 692.1 792.4
The following function, which closely resembles a similar function function for relabeling stage codes, allows us to rename the split points with more meaningful splits names:
= function(df, splits_list) {
map_split_codes # Get stage codes lookup id->code
= get_stages_lookup(splits_locations,
splits_lookup_code 'splitPointId', 'splitname')
#https://stackoverflow.com/a/34299333/454773
::rename(df, replace = splits_lookup_code,
plyrwarn_missing = FALSE)
}
For example:
= get_splits_wide(splits) %>%
splits_wide map_split_codes(splits_list) %>%
map_driver_names(cars)
%>% head(10) splits_wide
## code split_1 split_2 split_3 split_4 split_5
## 1 OGI 161.7 272.3 471.3 690.4 789.2
## 2 EVA 162.3 273.7 472.3 692.1 792.4
## 3 NEU 159.1 268.6 465.8 682.7 782.7
## 4 ROV 159.6 270.6 468.3 686.6 786.9
## 5 KAT 161.4 270.3 469.8 689.5 789.9
## 6 GRE 165.8 278.3 479.3 700.4 800.4
## 7 TÄN 157.5 267.3 463.8 681.2 781.6
## 8 SUN 161.8 274.5 474.3 694.2 794.2
## 9 BRE 158.4 270.3 471.2 688.4 788.5
## 10 SOL 163.6 275.1 471.3 688.3 786.7
We can also update our helper function to relabel stages with a more general function:
= function(df, s_list, cars, typ='stage') {
relabel_times_df2 if (typ=='split')
= df %>% map_split_codes(s_list)
df else
= df %>% map_stage_codes(s_list)
df
%>%
df map_driver_names(cars)
}
7.3 Rebasing Split Times
The split times describe the split times recorded for each driver, but in many situations we may be interested in knowing the difference in split times for a specific driver relative to every other driver.
More formally, for drivers \(j\) on stage \(S\) and split \(s\), we find the rebased stage times relative to driver \(j\) as:
\[ {_{S,s}}t_{i}^{j} = {_{S,s}}t_{i} - {_{S,s}}t_{j} \] although we may want to negate that value depending on the sense of whether we want to focus on times from the selected driver’s perspective, or from the perspective of the field of drivers they are being rebased against. . To calculate the rebased times, we note that the wide dataframe format gives rows containing the split times for each driver, which is to say \({_{S,*}}t_i\).
To calculate the rebased times, we can simply subtract the row corresponding to the driver we want to rebase relative to, from the other driver rows.
Recall the heart of the rebase function we have previously defined:
#https://stackoverflow.com/a/32267785/454773
= function(df, id, rebase_cols,
rebase_essence id_col='entryId') {
= df
df_
# The rebase values are the ones
# we want to subtract from each row
= c(df[df[[id_col]]==id, rebase_cols])
rebase_vals
# Do the rebasing
= df[,rebase_cols] - rebase_vals
df_[,rebase_cols]
df_ }
Let’s try the original function with an example driver:
= get_person_id(cars, 'ogier', ret='code')
ogierEntryId
= rebase(splits_wide, ogierEntryId,
ogier_rebased id_col='code')
split_names,
%>% head(10) ogier_rebased
## code split_1 split_2 split_3 split_4 split_5
## 1 OGI 0.0 0.0 0.0 0.0 0.0
## 2 EVA 0.6 1.4 1.0 1.7 3.2
## 3 NEU -2.6 -3.7 -5.5 -7.7 -6.5
## 4 ROV -2.1 -1.7 -3.0 -3.8 -2.3
## 5 KAT -0.3 -2.0 -1.5 -0.9 0.7
## 6 GRE 4.1 6.0 8.0 10.0 11.2
## 7 TÄN -4.2 -5.0 -7.5 -9.2 -7.6
## 8 SUN 0.1 2.2 3.0 3.8 5.0
## 9 BRE -3.3 -2.0 -0.1 -2.0 -0.7
## 10 SOL 1.9 2.8 0.0 -2.1 -2.5
7.4 Visualising Rebased Split Times Using Sparklines
If we cast the data back to a tidy long form data, we can easily generate a graphical summary from the long form data:
library(tidyr)
<- ogier_rebased %>%
ogier_rebased_long gather(key ="Split",
value ="TimeInS",
all_of(split_names))
%>% head(10) ogier_rebased_long
## code Split TimeInS
## 1 OGI split_1 0.0
## 2 EVA split_1 0.6
## 3 NEU split_1 -2.6
## 4 ROV split_1 -2.1
## 5 KAT split_1 -0.3
## 6 GRE split_1 4.1
## 7 TÄN split_1 -4.2
## 8 SUN split_1 0.1
## 9 BRE split_1 -3.3
## 10 SOL split_1 1.9
For example, we can use the sparkline::spk_chr()
function to generate an HTML sparkline widget that we can embed in a formattable::formattable()
generated table:
library(formattable)
library(sparkline)
<- ogier_rebased_long %>%
ogier_sparkline group_by(code) %>%
summarize(spk_ = spk_chr(TimeInS, type ="bar"))
# We need to create an htmlwidget form of the table
= as.htmlwidget(formattable(head(ogier_sparkline, 5)))
out
# The table also has a requirement on the sparkline package
$dependencies = c(out$dependencies,
out:::widget_dependencies("sparkline",
htmlwidgets"sparkline"))
out
7.5 Finding the Rank Position at Each Split Point
It can often be tricky to work out the rank at each split by eye, so let’s create a simple function to display the rank at each split for us:
= function(df, split_cols){
get_split_rank # We need to drop any list names
= as.character(split_names)
split_names
%>% mutate(across( split_cols, dense_rank ))
df
}
get_split_rank(splits_wide, split_names) %>% head(5)
## Note: Using an external vector in selections is ambiguous.
## ℹ Use `all_of(split_cols)` instead of `split_cols` to silence this message.
## ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
## code split_1 split_2 split_3 split_4 split_5
## 1 OGI 6 5 6 7 6
## 2 EVA 8 7 8 8 9
## 3 NEU 3 2 2 2 2
## 4 ROV 4 4 3 3 4
## 5 KAT 5 3 4 6 7
7.6 Finding Split Section Durations
Inspection of the split times data show the split times to be a strictly increasing function over the ordered in-stage split locations. That is, the times represent the time in stage up to that split point, rather than the fractional time taken to get from one split timing point to another.
In many cases, it will be convenient to know how much time a driver took to get from one split point to the next, not least because this allows us to identify sections of the stage where a driver may have particularly gained or lost time, or to identify where a driver may be making or losing time consistently across different parts of the stage.
In abstract terms, then, what we want to calculate is the time taken for a driver \(i\) on stage \(S\) to go between two split points, \(m\) and \(n\), where \(m=0\) is the start, \(m={_S}s_{max}+1\) is the stage end, and \({_S}s_{max}\) is the number of split points on stage \(S\):
\[ {_{S,m,n}}t_{i} = {_{S,n}}t_{i} - {_{S,m}}t_{i}: 0{\le}m<n{\le}{_S}s_{max} \]
For a specific, known stage, we might write the simpler:
\[ S={stagenum}; {_{m,n}}t_{i} = {_{n}}t_{i} - {_{m}}t_{i}: 0{\le}m<n{\le}s_{max} \]
For a driver, i, we note that the accumulated stage time on stage \(S\) as given by the split times is:
\[ S={stagenum};t_{i}=\sum_{s=0}^{s{_{max}}}{_{s,s+1}}t_{i}: \]
To get the duration between two split points, we can create two temporary dataframes, one representing the original split times without the first split \({_{{s+1},s_{max}}}t\), one representing the split times without the last split, \({_{s,s_{max}-1}}t\). Subtracting one dataframe from the other this finds the difference across all consecutive columns:
\[ {_{{s+1},s_{max}}}t - {_{s,s_{max}-1}}t \]
Let’s see how that works in practice:
#https://stackoverflow.com/a/50411529/454773
= function(df, split_cols,
get_split_duration retId=TRUE, id_col='entryId') {
# Drop names if they are set
= as.character(split_cols)
split_cols
# [-1] drops the first column, [-ncol()] drops the last
= df[,split_cols][-1] - df[,split_cols][-ncol(df[,split_cols])]
df_
# The split time to the first split is simply the first split time
1]] = df[split_cols[1]]
df_[split_cols[
if (retId) {
# Add in the entryId column
= df[[id_col]]
df_[[id_col]]
# Return the dataframe in a sensible column order
%>% select(c(id_col, all_of(split_cols)))
df_ else {
}
df_
}
}
Let’s see how that works:
= get_split_duration(splits_wide,
split_durations_wide id_col='code') split_names,
## Note: Using an external vector in selections is ambiguous.
## ℹ Use `all_of(id_col)` instead of `id_col` to silence this message.
## ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
%>% head(5) split_durations_wide
## code split_1 split_2 split_3 split_4 split_5
## 1 OGI 161.7 110.6 199.0 219.1 98.8
## 2 EVA 162.3 111.4 198.6 219.8 100.3
## 3 NEU 159.1 109.5 197.2 216.9 100.0
## 4 ROV 159.6 111.0 197.7 218.3 100.3
## 5 KAT 161.4 108.9 199.5 219.7 100.4
7.6.1 Finding Split Section Ranks
To find the rank in terms of which driver completed each stage section in the quickest time, we can simply pass the split_durations_wide
dataframe rather than the split_durations
dataframe to the get_split_rank()
function:
get_split_rank(split_durations_wide, split_names) %>% head(5)
## code split_1 split_2 split_3 split_4 split_5
## 1 OGI 6 4 6 6 3
## 2 EVA 8 7 5 8 7
## 3 NEU 3 2 3 1 4
## 4 ROV 4 5 4 5 7
## 5 KAT 5 1 8 7 8
7.7 Adding Overall Stage Time to the Split Times
It is important to note that the split times data does not contain all the timing data for the stage, just the times relating to split points along the stage route. For a complete summary of stage timing data, we also need to add in the overall stage time from the stage results table:
= get_stage_times(eventId, stageId)
stage_times
%>% head(2) stage_times
## stageTimeId stageId entryId elapsedDurationMs elapsedDuration status
## 1 96580 1750 21536 834500 00:13:54.5000000 Completed
## 2 96474 1750 21532 835500 00:13:55.5000000 Completed
## source position diffFirstMs diffFirst diffPrevMs diffPrev
## 1 Default 1 0 00:00:00 0 00:00:00
## 2 Default 2 1000 00:00:01 1000 00:00:01
Recall that if required we can also retrieve stage time for multiple stages using the get_multi_stage_times(stagelist)
function. The get_stage_list(stages)
function will return a list of all stage IDs.
If we merge each driver’s stage times as an extra, final column to the wide split times dataframe, we can calculate the split section durations over the whole stage, including the time taken to get from the final split to the stage end.
Recalling that driver codes may not be unique, we should use the unique entryId values to create the extended dataframe:
= function(splits_wide, stage_times,
widen_splits_stage_times id_col='entryId'){
= c('elapsedDurationMs', id_col, 'diffFirstMs', 'position')
results_cols
= splits_wide %>%
splits_wide merge(stage_times[,results_cols],
by = 'entryId') %>%
mutate(split_N = elapsedDurationMs/1000)
splits_wide
}
= get_splits_wide(splits) %>%
full_splits_wide widen_splits_stage_times(stage_times) %>%
map_split_codes(splits_list) %>%
map_driver_names(cars)
%>% head(2) full_splits_wide
## code split_1 split_2 split_3 split_4 split_5 elapsedDurationMs diffFirstMs
## 1 OGI 161.7 272.3 471.3 690.4 789.2 842000 7500
## 2 EVA 162.3 273.7 472.3 692.1 792.4 845800 11300
## position split_N
## 1 6 842.0
## 2 9 845.8
To make further processing easier, we add the overall stage time to the list of split time column names. The “final split” is now the completed stage time:
= c(split_names, 'split_N')
split_names split_names
## [1] "split_1" "split_2" "split_3" "split_4" "split_5" "split_N"
7.8 Calculating the Ultimate Stage from Ultimate Split Times
The ultimate stage time for a stage is the sum of the fastest sectional split times on the stage as recorded by any driver. Rebasing the stage winner’s time against the ultimate stage time shows whether the driver potentially left time on the stage. (Of course, it might be that a very fast sectional time recorded by one driver may have wiped out their tyres and led to a relatively poor overall stage time, or risk taking that ended their stage prematurely…)
So how can we calculate the ultimate splits? For split section times \({_{S,s}}t_{i}\), the ultimate section time \({_{S,s}}u\) is given as:
\[ {_{S,s}}u = \forall i: min({_{S,s}}t_i) \]
We can calculate the times by case the wide split section duration dataframe to a long form, grouping by the splitPointId and then summarising the minimum time in each group.
Here’s how we can create the long form dataset:
%>%
full_splits_wide #gather() is deprecated / retired...
#gather(splitPointId, sectionDurationS,
# as.character(split_names))
select(all_of(as.character(split_names)), code) %>%
pivot_longer(as.character(split_names),
names_to = "splitname",
values_to = "sectionDurationS") %>%
head(3)
## # A tibble: 3 x 3
## code splitname sectionDurationS
## <chr> <chr> <dbl>
## 1 OGI split_1 161.700
## 2 OGI split_2 272.3
## 3 OGI split_3 471.3
We can also get the duration of each section:
= get_split_duration(full_splits_wide,
full_durations_wide id_col='code')
split_names,
%>% head() full_durations_wide
## code split_1 split_2 split_3 split_4 split_5 split_N
## 1 OGI 161.7 110.6 199.0 219.1 98.8 52.8
## 2 EVA 162.3 111.4 198.6 219.8 100.3 53.4
## 3 NEU 159.1 109.5 197.2 216.9 100.0 52.8
## 4 ROV 159.6 111.0 197.7 218.3 100.3 53.3
## 5 KAT 161.4 108.9 199.5 219.7 100.4 53.3
## 6 GRE 165.8 112.5 201.0 221.1 100.0 53.0
To rebase on on an ultimate time basis, it helps to think of an ultimate driver whom we define as having a time equivalent to the fastest split duration time between two split points.
If we group by splitPointId, we can summarise on sectionDurationS to find the minimum duration at each split; we can also take the opportunity to add an accumulated stage time column at each split point as well:
= full_durations_wide %>%
ultimate_splits_long pivot_longer(all_of(split_names),
names_to = "splitname",
values_to = "sectionDurationS") %>%
select(splitname, sectionDurationS) %>%
# Defensive measure
filter(!is.na(sectionDurationS) & sectionDurationS>0) %>%
group_by(splitname) %>%
summarise(ultimate = min(sectionDurationS,
na.rm = TRUE)) %>%
mutate(ultimateElapsed = cumsum(ultimate))
ultimate_splits_long
## # A tibble: 6 x 3
## splitname ultimate ultimateElapsed
## <chr> <dbl> <dbl>
## 1 split_1 157.5 157.5
## 2 split_2 108.9 266.400
## 3 split_3 196.2 462.6
## 4 split_4 216.9 679.5
## 5 split_5 98.4 777.9
## 6 split_N 52.40000 830.3
Rebasing arbitrary drivers against the ultimate stage (and the distribution of times recorded by other drivers) may give an idea of which drivers were pushing on what parts of a stage and where they were being more cautious.
Note that ultimate split times can be used to create an ultimate stage time time that can itself contribute to the ultimate ultimate rally time (sic).
7.8.1 Rebasing to Ultimate Split Times
We can rebase to the ultimate split times in three senses:
- on an ideal, ultimate per split basis;
- on an ideal elapsed time basis (the cumulative sum of ideal ultimate split durations);
- on an actual best elapsed (stage) time basis at each split, rebasing relative to the minimum actual recorded elapsed time at each split.
7.8.2 Per Split Ultimate Rebasing
To rebase on a per split basis, we can simply rebase durations relative to the minimum split duration at each split.
Let’s reshape the ultimate driver durations to a wide format:
= ultimate_splits_long %>%
ultimate_wide_durations select(splitname, ultimate) %>%
pivot_wider(names_from = splitname,
values_from = ultimate) %>%
mutate(code='ultimate')
ultimate_wide_durations
## # A tibble: 1 x 7
## split_1 split_2 split_3 split_4 split_5 split_N code
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 157.5 108.9 196.2 216.9 98.4 52.40000 ultimate
We can now add the ultimate driver split durations to the full_durations_wide data and rebase against this dummy driver:
= full_durations_wide %>%
ultimate_between_split select(code, all_of(split_names)) %>%
bind_rows(ultimate_wide_durations) %>%
rebase('ultimate', split_names,
id_col='code') %>%
head(10)
ultimate_between_split
## code split_1 split_2 split_3 split_4 split_5 split_N
## 1 OGI 4.2 1.7 2.8 2.2 0.4 0.4
## 2 EVA 4.8 2.5 2.4 2.9 1.9 1.0
## 3 NEU 1.6 0.6 1.0 0.0 1.6 0.4
## 4 ROV 2.1 2.1 1.5 1.4 1.9 0.9
## 5 KAT 3.9 0.0 3.3 2.8 2.0 0.9
## 6 GRE 8.3 3.6 4.8 4.2 1.6 0.6
## 7 TÄN 0.0 0.9 0.3 0.5 2.0 0.5
## 8 SUN 4.3 3.8 3.6 3.0 1.6 0.3
## 9 BRE 0.9 3.0 4.7 0.3 1.7 1.0
## 10 SOL 6.1 2.6 0.0 0.1 0.0 0.0
In this case we note there is at least one driver in each column with a zero gap to the ultimate driver, specifically, the driver(s) who made it between two consecutive split points in the fastest time, and no driver with a negative gap.
7.8.3 Ultimate Stage Rebasing
We can add an “ultimate” driver to the wide splits dataframe and then rebase as normal. The following function routinises the widening recipe we used above:
= function(df, col, valname){
ultimate_widen %>% select(splitname, all_of(col)) %>%
df pivot_wider(names_from = splitname,
values_from = col) %>%
mutate(code=valname)
}
We can get the wide form as:
= ultimate_widen(ultimate_splits_long,
ultimate_wide_elapsed 'ultimateElapsed', 'ultimate')
## Note: Using an external vector in selections is ambiguous.
## ℹ Use `all_of(col)` instead of `col` to silence this message.
## ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
ultimate_wide_elapsed
## # A tibble: 1 x 7
## split_1 split_2 split_3 split_4 split_5 split_N code
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 157.5 266.400 462.6 679.5 777.9 830.3 ultimate
We can now add that to our original dataframe and rebase against the ultimate stage to show how far off the ultimate stage pace each driver was.
Again, let’s make a routine of the process:
= function(df, ultimate_df, split_names,
ultimate_rebaser
ultimate_col ){%>%
df select(code, all_of(split_names)) %>%
bind_rows(ultimate_df) %>%
rebase(ultimate_col, split_names, id_col='code')
}
And let’s use that routine:
= full_splits_wide %>%
ultimate_accumulated ultimate_rebaser(ultimate_wide_elapsed,
'ultimate') %>%
split_names, head(10)
ultimate_accumulated
## code split_1 split_2 split_3 split_4 split_5 split_N
## 1 OGI 4.2 5.9 8.7 10.9 11.3 11.7
## 2 EVA 4.8 7.3 9.7 12.6 14.5 15.5
## 3 NEU 1.6 2.2 3.2 3.2 4.8 5.2
## 4 ROV 2.1 4.2 5.7 7.1 9.0 9.9
## 5 KAT 3.9 3.9 7.2 10.0 12.0 12.9
## 6 GRE 8.3 11.9 16.7 20.9 22.5 23.1
## 7 TÄN 0.0 0.9 1.2 1.7 3.7 4.2
## 8 SUN 4.3 8.1 11.7 14.7 16.3 16.6
## 9 BRE 0.9 3.9 8.6 8.9 10.6 11.6
## 10 SOL 6.1 8.7 8.7 8.8 8.8 8.8
We note there there is only one guaranteed difference of 0.0, from the driver with the fastest time at the first split, and that no times will be less than zero.
7.8.4 Actual Best Elapsed Time Rebasing
The full_splits_wide dataframe contains the elapsed times for each driver at each split so we can summarise the long form of that data to get the best actual recorded elapsed times:
= full_splits_wide %>%
actual_ultimate select(code, all_of(split_names)) %>%
# Make long
select(all_of(as.character(split_names)),
%>%
code) pivot_longer(as.character(split_names),
names_to = "splitname",
values_to = "sectionDurationS") %>%
group_by(splitname) %>%
summarise(actualUltimate = min(sectionDurationS,
na.rm=TRUE))
actual_ultimate
## # A tibble: 6 x 2
## splitname actualUltimate
## <chr> <dbl>
## 1 split_1 157.5
## 2 split_2 267.3
## 3 split_3 463.8
## 4 split_4 681.2
## 5 split_5 781.6
## 6 split_N 834.5
We can add this time to out ultimate times dataframe to provide an immediate point of reference between the actual best accumulated split times and the ultimate accumulated ideal split times:
$actual = actual_ultimate$actualUltimate
ultimate_splits_long ultimate_splits_long
## # A tibble: 6 x 4
## splitname ultimate ultimateElapsed actual
## <chr> <dbl> <dbl> <dbl>
## 1 split_1 157.5 157.5 157.5
## 2 split_2 108.9 266.400 267.3
## 3 split_3 196.2 462.6 463.8
## 4 split_4 216.9 679.5 681.2
## 5 split_5 98.4 777.9 781.6
## 6 split_N 52.40000 830.3 834.5
Let’s cast the data to a long format in readiness for rebasing it:
= ultimate_widen(ultimate_splits_long,
ultimate_wide_actual 'actual',
'ultimate')
ultimate_wide_actual
## # A tibble: 1 x 7
## split_1 split_2 split_3 split_4 split_5 split_N code
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 157.5 267.3 463.8 681.2 781.6 834.5 ultimate
And then rebase:
= full_splits_wide %>%
ultimate_actual ultimate_rebaser(ultimate_wide_actual,
'ultimate') %>%
split_names, head(10)
ultimate_actual
## code split_1 split_2 split_3 split_4 split_5 split_N
## 1 OGI 4.2 5.0 7.5 9.2 7.6 7.5
## 2 EVA 4.8 6.4 8.5 10.9 10.8 11.3
## 3 NEU 1.6 1.3 2.0 1.5 1.1 1.0
## 4 ROV 2.1 3.3 4.5 5.4 5.3 5.7
## 5 KAT 3.9 3.0 6.0 8.3 8.3 8.7
## 6 GRE 8.3 11.0 15.5 19.2 18.8 18.9
## 7 TÄN 0.0 0.0 0.0 0.0 0.0 0.0
## 8 SUN 4.3 7.2 10.5 13.0 12.6 12.4
## 9 BRE 0.9 3.0 7.4 7.2 6.9 7.4
## 10 SOL 6.1 7.8 7.5 7.1 5.1 4.6
In this case, there is at least one zero value per split corresponding to the drive(s) who recorded the fastest elapsed time up to each split point.
7.9 Visualising Rebased Times
There are various quick techniques we can use to help visualise the rebased split times and try to highlight significant patterns or peculiarities. For example, we can use coloured backgrounds to highlight each cell in a table, or sparklines to summarise each row.
7.9.1 Context Sensitive Cell Colouring
As a quick example, let’s first look at the split duration rebasing where we compare each driver’s time in getting from one split point to the next against the fastest completion of that distance:
Recall the divergent color tile formatter we met previously, reused here with a different colour sense:
= function(x){
xnormalize # Normalise to the full range of values about 0
# O will map to 0.5 in the normalised range
= c(x, -max(abs(x)), max(abs(x)))
x normalize(x)[1:(length(x)-2)]
}<- function (...) {
color_tile2 formatter("span", style = function(x) {
style(display = "block",
'text-align' = 'center',
padding = "0 4px",
`border-radius` = "4px",
`font.weight` = ifelse(abs(x)> 0.3*max(x), "bold", "normal"),
color = ifelse(abs(x)> 0.3*max(x),'white',
ifelse(x==0,'lightgrey','black')),
`background-color` = csscolor(matrix(as.integer(colorRamp(...)(xnormalize(as.numeric(x)))),
byrow=TRUE,
dimnames=list(c("green","red","blue"), NULL),
nrow=3)))
})}
We can use that to colour cells as a block (although we note that for ultimately rebased values we can never go below a zero value…). Pale colors are better (closer to the ultimate):
formattable(ultimate_between_split, align='c',
list(area(col = 2:7) ~ color_tile2(c("red",'white',
"forestgreen")),
entryId=FALSE))
code | split_1 | split_2 | split_3 | split_4 | split_5 | split_N |
---|---|---|---|---|---|---|
OGI | 4.2 | 1.7 | 2.8 | 2.2 | 0.4 | 0.4 |
EVA | 4.8 | 2.5 | 2.4 | 2.9 | 1.9 | 1.0 |
NEU | 1.6 | 0.6 | 1.0 | 0.0 | 1.6 | 0.4 |
ROV | 2.1 | 2.1 | 1.5 | 1.4 | 1.9 | 0.9 |
KAT | 3.9 | 0.0 | 3.3 | 2.8 | 2.0 | 0.9 |
GRE | 8.3 | 3.6 | 4.8 | 4.2 | 1.6 | 0.6 |
TÄN | 0.0 | 0.9 | 0.3 | 0.5 | 2.0 | 0.5 |
SUN | 4.3 | 3.8 | 3.6 | 3.0 | 1.6 | 0.3 |
BRE | 0.9 | 3.0 | 4.7 | 0.3 | 1.7 | 1.0 |
SOL | 6.1 | 2.6 | 0.0 | 0.1 | 0.0 | 0.0 |
We see that SOL was flying from the second split onwards, getting from one split to another in pretty much the fastest time after a relatively poor start.
The variation in columns may also have something interesting to say. SOL somehow made time against pretty much every between split 4 and 5, but in the other sections (apart from the short last section to finish), there is quite a lot o variability. Checking this view against a split sectioned route map might help us understand whether there were particular features of the route that might explain these differences.
Let’s compare that chart with how the accumulated stage time of each driver compares with the accumulated ultimate section times:
formattable(ultimate_accumulated, align='c',
list(area(col = 2:7) ~ color_tile2(c("red",'white',
"forestgreen")),
entryId=FALSE))
code | split_1 | split_2 | split_3 | split_4 | split_5 | split_N |
---|---|---|---|---|---|---|
OGI | 4.2 | 5.9 | 8.7 | 10.9 | 11.3 | 11.7 |
EVA | 4.8 | 7.3 | 9.7 | 12.6 | 14.5 | 15.5 |
NEU | 1.6 | 2.2 | 3.2 | 3.2 | 4.8 | 5.2 |
ROV | 2.1 | 4.2 | 5.7 | 7.1 | 9.0 | 9.9 |
KAT | 3.9 | 3.9 | 7.2 | 10.0 | 12.0 | 12.9 |
GRE | 8.3 | 11.9 | 16.7 | 20.9 | 22.5 | 23.1 |
TÄN | 0.0 | 0.9 | 1.2 | 1.7 | 3.7 | 4.2 |
SUN | 4.3 | 8.1 | 11.7 | 14.7 | 16.3 | 16.6 |
BRE | 0.9 | 3.9 | 8.6 | 8.9 | 10.6 | 11.6 |
SOL | 6.1 | 8.7 | 8.7 | 8.8 | 8.8 | 8.8 |
Here, we see that TAN was recording the best time compared the ultimate time as calculated against the sum of best split section times, but was still of the ultimate pace: it was his first split that made the difference.
Finally, let’s see how the driver’s actual split times varied against the best recorded split time at each split:
formattable(ultimate_actual, align='c',
list(area(col = 2:7) ~ color_tile2(c("red",'white',
"forestgreen")),
entryId=FALSE))
code | split_1 | split_2 | split_3 | split_4 | split_5 | split_N |
---|---|---|---|---|---|---|
OGI | 4.2 | 5.0 | 7.5 | 9.2 | 7.6 | 7.5 |
EVA | 4.8 | 6.4 | 8.5 | 10.9 | 10.8 | 11.3 |
NEU | 1.6 | 1.3 | 2.0 | 1.5 | 1.1 | 1.0 |
ROV | 2.1 | 3.3 | 4.5 | 5.4 | 5.3 | 5.7 |
KAT | 3.9 | 3.0 | 6.0 | 8.3 | 8.3 | 8.7 |
GRE | 8.3 | 11.0 | 15.5 | 19.2 | 18.8 | 18.9 |
TÄN | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
SUN | 4.3 | 7.2 | 10.5 | 13.0 | 12.6 | 12.4 |
BRE | 0.9 | 3.0 | 7.4 | 7.2 | 6.9 | 7.4 |
SOL | 6.1 | 7.8 | 7.5 | 7.1 | 5.1 | 4.6 |
Here, we see that TAN led the stage at each split point based on actual accumulated time.
7.9.2 Using Sparklines to Summarise Rebased Deltas
A quick way of summarising rebased times in a more space efficient way is to us a sparkline. As we have seen previously, these can be used added as an extra column alongside a row of data, or could be used to as a quick visual indicative summary of a row of values.
Let’s create a sparkline summarising each of the above tables.
First, the ultimate between split rebase:
= ultimate_between_split %>%
ultimate_between_split_spk gather(key ="Stage",
value ="Gap", split_names) %>%
group_by(code) %>%
summarize(ultimate_section = spk_chr(-Gap,
type ="bar"))
## Note: Using an external vector in selections is ambiguous.
## ℹ Use `all_of(split_names)` instead of `split_names` to silence this message.
## ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
spark_df(ultimate_between_split_spk)
How about for the accumulated ultimate split durations?
= ultimate_accumulated %>%
ultimate_accumulated_spk gather(key ="Stage",
value ="Gap", split_names) %>%
group_by(code) %>%
summarize(ultimate_acc = spk_chr(-Gap,
type ="bar"))
spark_df(ultimate_accumulated_spk)
And finally, compared to actual recorded best split time:
= ultimate_actual %>%
ultimate_actual_spk gather(key ="Stage",
value ="Gap", split_names) %>%
group_by(code) %>%
summarize(ultimate_actual = spk_chr(-Gap,
type ="bar"))
spark_df(ultimate_actual_spk)