7 Working With Split Times

In this chapter we’ll explore some of the ways in which we might start to work with split times data from a particular stage. On the one hand, we can consider split times purely on a time basis; on the other, we can explore the split times in terms of pace differences calculated by normalising the split times over split distances.

As with stage times, we can rebase split times to compare a specified driver directly with other drivers, not just the stage winner. We can also calculate how much time was spent on each split section by finding the difference between consecutive split times for a particular driver.

From the split section times, we can also calculate various derived measures, such as the ultimate possible stage time, based on the sum of fastest times to complete each split section.

Having access to split times also sets up the potential for rating the performance of a driver on each split against various stage and stage route metrics, such as road order or route metrics such as the “wiggliness” of the split section, although these will not be covered here.

7.1 Load Base Data

To get the splits data from a standing start, we can load in the current season list, select the rally we want, look up the itinerary from the rally, extract the sections and then the stages and the retrieve the stage ID for the stage we are interested in.

Let’s start by loading in key helper libraries:

source('code/wrc-api.R')
source('code/wrc-wrangling.R')
source('code/wrc-charts.R')

And getting some initial data:

s = get_active_season()
eventId = get_eventId_from_name(s, 'arctic')

itinerary = get_itinerary(eventId)
sections = get_sections(itinerary)
stages = get_stages(sections)
stages_lookup = get_stages_lookup(stages)

# For driver details
entries = get_rally_entries(eventId)
cars = get_car_data(entries)

As a working example, let’s define an example stage by its code:

stage_code = 'SS3'

Get a sample stage ID:

stageId = stages_lookup[[stage_code]]

7.2 Get Splits Data

The split times represented the accumulated time going the the stage at each split point.

The split times do not include the overall stage time, so we need to be mindful that if we want to report on a stage the split times in and of themselves do not contain any information about the final section of the stage between the final split point and the stage finish line.

We can load the splits data if we know the event and stage ID:

splits = get_splits(eventId, stageId)

The splits data actually comprises two dataframes in columns splitPoints and entrySplitPointTimes.

7.2.1 Split Locations

The splitPoints dataframe contains information about the splits locations:

splits_locations = splits$splitPoints
splits_locations %>% arrange(number) %>% head(2)
##   splitPointId stageId number distance
## 1         3615    1750      1     4.83
## 2         3601    1750      2     9.02

We can also generate a list of the split IDs:

splits_list = splits_locations$splitPointId
splits_list
## [1] 3593 3601 3615 3617 3621

We can retrieve the split codes ordered by the distance into the stage of each split point from a lookup on the split points:

get_split_cols = function(splits){
  split_cols =  as.character(arrange(splits$splitPoints,
                                     distance)$splitPointId)
  split_cols
}

split_cols = get_split_cols(splits)
split_cols
## [1] "3615" "3601" "3621" "3617" "3593"

Since the the split points dataframe does not including the final timing point location (i.e. the stage finish), we can get the full stage distance from the stages dataframe:

stages[stages['code']==stage_code, 'distance']
## [1] 24.43

7.2.2 Mapping Split Codes to Split Numbers

To provide a human readable version of the split identifiers, let’s map them onto a more meaningful label:

get_split_label = function(x){
  paste0('split_', splits_locations[splits_locations$splitPointId==x,
                                    'number'])
}

splits_locations$splitname = sapply(splits_locations$splitPointId,
                                 get_split_label)

splits_locations %>% head(3)
##   splitPointId stageId number distance splitname
## 1         3593    1750      5    23.21   split_5
## 2         3601    1750      2     9.02   split_2
## 3         3615    1750      1     4.83   split_1

We can generate a lookup list of split point names and IDs as:

get_stages_lookup(splits_locations, 'splitPointId', 'splitname')
##      3593      3601      3615      3617      3621 
## "split_5" "split_2" "split_1" "split_4" "split_3"

We can use this in a function that provides an annotated and ordered form of the split locations dataframe:

get_split_locations = function(splits){
  splits_locations = splits$splitPoints
  splits_locations$splitname = sapply(splits_locations$splitPointId,
                                      get_split_label)
  splits_locations %>%
    arrange(number)
}

splits_locations = get_split_locations(splits)
splits_locations
##   splitPointId stageId number distance splitname
## 1         3615    1750      1     4.83   split_1
## 2         3601    1750      2     9.02   split_2
## 3         3621    1750      3    14.87   split_3
## 4         3617    1750      4    20.63   split_4
## 5         3593    1750      5    23.21   split_5

Create a convenience list of splits names:

# We could create these as ordered factors?
split_names = splits_locations$splitname

split_names
## [1] "split_1" "split_2" "split_3" "split_4" "split_5"

7.2.3 Split Times

The second dataframe returned from the splits API call contains the splits times, accessed via the get_driver_splits() function defined previously and imported from the wrc-api.R file. The data is returned in a long format, with each row describing a single split time for a particular driver on the single stage the split times were retrieved for.

driver_splits = get_driver_splits(splits)
driver_splits %>% head(2)
##   splitPointTimeId splitPointId entryId elapsedDuration         splitDateTime
## 1           123482         3615   21540       PT2M41.4S 2021-02-27T07:10:41.4
## 2           123483         3601   21540       PT4M32.5S 2021-02-27T07:12:32.5
##            splitDateTimeLocal elapsedDurationS
## 1 2021-02-27T09:10:41.4+02:00            161.4
## 2 2021-02-27T09:12:32.5+02:00            272.5

The get_multi_split_times(stage_list) function can also provide us the long form data for multiple stages given one or more stage IDs.

7.2.4 Wide Driver Split Times

We can cast the the driver split points into a wide format using the split point codes, ordered by split distance into the stage, as the widened column names:

get_splits_wide = function(splits){
  driver_splits = get_driver_splits(splits)
  
  split_cols =  get_split_cols(splits)
  splits_cols = c('entryId', 'splitPointId', 'elapsedDurationS')
  
  splits_wide = driver_splits %>% 
    group_by(entryId) %>%
    select(all_of(splits_cols)) %>%
    tidyr::spread(key = splitPointId,
                  value = elapsedDurationS) %>%
    select(all_of(c('entryId', split_cols))) %>%
    # If we don't cast, it's a
    # non-rankable rowwise df
    as.data.frame()
  
  splits_wide
}

Rather than retrieve the split times into a long format, with one row per driver split, we can now retrieve the data into a wide form with one row per driver and a column for each split on the stage:

splits_wide = get_splits_wide(splits)

splits_wide %>% head(2)
##   entryId  3615  3601  3621  3617  3593
## 1   21530 161.7 272.3 471.3 690.4 789.2
## 2   21531 162.3 273.7 472.3 692.1 792.4

The following function, which closely resembles a similar function function for relabeling stage codes, allows us to rename the split points with more meaningful splits names:

map_split_codes = function(df, splits_list) {
  # Get stage codes lookup id->code
  splits_lookup_code = get_stages_lookup(splits_locations,
                                         'splitPointId', 'splitname')
  
  #https://stackoverflow.com/a/34299333/454773
  plyr::rename(df, replace = splits_lookup_code,
               warn_missing = FALSE)
}

For example:

splits_wide = get_splits_wide(splits) %>%
                map_split_codes(splits_list) %>% 
                map_driver_names(cars)

splits_wide %>% head(10)
##    code split_1 split_2 split_3 split_4 split_5
## 1   OGI   161.7   272.3   471.3   690.4   789.2
## 2   EVA   162.3   273.7   472.3   692.1   792.4
## 3   NEU   159.1   268.6   465.8   682.7   782.7
## 4   ROV   159.6   270.6   468.3   686.6   786.9
## 5   KAT   161.4   270.3   469.8   689.5   789.9
## 6   GRE   165.8   278.3   479.3   700.4   800.4
## 7   TÄN   157.5   267.3   463.8   681.2   781.6
## 8   SUN   161.8   274.5   474.3   694.2   794.2
## 9   BRE   158.4   270.3   471.2   688.4   788.5
## 10  SOL   163.6   275.1   471.3   688.3   786.7

We can also update our helper function to relabel stages with a more general function:

relabel_times_df2 = function(df, s_list, cars, typ='stage') {
  if (typ=='split')
    df = df %>% map_split_codes(s_list)
  else
    df = df %>% map_stage_codes(s_list)
  
  df %>%
    map_driver_names(cars)
}

7.3 Rebasing Split Times

The split times describe the split times recorded for each driver, but in many situations we may be interested in knowing the difference in split times for a specific driver relative to every other driver.

More formally, for drivers \(j\) on stage \(S\) and split \(s\), we find the rebased stage times relative to driver \(j\) as:

\[ {_{S,s}}t_{i}^{j} = {_{S,s}}t_{i} - {_{S,s}}t_{j} \] although we may want to negate that value depending on the sense of whether we want to focus on times from the selected driver’s perspective, or from the perspective of the field of drivers they are being rebased against. . To calculate the rebased times, we note that the wide dataframe format gives rows containing the split times for each driver, which is to say \({_{S,*}}t_i\).

To calculate the rebased times, we can simply subtract the row corresponding to the driver we want to rebase relative to, from the other driver rows.

Recall the heart of the rebase function we have previously defined:

#https://stackoverflow.com/a/32267785/454773
rebase_essence = function(df, id, rebase_cols,
                          id_col='entryId') {
  
  df_ =  df
  
  # The rebase values are the ones
  # we want to subtract from each row
  rebase_vals = c(df[df[[id_col]]==id, rebase_cols])
  
  # Do the rebasing
  df_[,rebase_cols] =  df[,rebase_cols] - rebase_vals
    
  df_
}

Let’s try the original function with an example driver:

ogierEntryId = get_person_id(cars, 'ogier', ret='code')

ogier_rebased = rebase(splits_wide, ogierEntryId,
                       split_names, id_col='code')

ogier_rebased %>% head(10)
##    code split_1 split_2 split_3 split_4 split_5
## 1   OGI     0.0     0.0     0.0     0.0     0.0
## 2   EVA     0.6     1.4     1.0     1.7     3.2
## 3   NEU    -2.6    -3.7    -5.5    -7.7    -6.5
## 4   ROV    -2.1    -1.7    -3.0    -3.8    -2.3
## 5   KAT    -0.3    -2.0    -1.5    -0.9     0.7
## 6   GRE     4.1     6.0     8.0    10.0    11.2
## 7   TÄN    -4.2    -5.0    -7.5    -9.2    -7.6
## 8   SUN     0.1     2.2     3.0     3.8     5.0
## 9   BRE    -3.3    -2.0    -0.1    -2.0    -0.7
## 10  SOL     1.9     2.8     0.0    -2.1    -2.5

7.4 Visualising Rebased Split Times Using Sparklines

If we cast the data back to a tidy long form data, we can easily generate a graphical summary from the long form data:

library(tidyr)

ogier_rebased_long <- ogier_rebased %>%
                        gather(key ="Split",
                               value ="TimeInS",
                               all_of(split_names))

ogier_rebased_long %>% head(10)
##    code   Split TimeInS
## 1   OGI split_1     0.0
## 2   EVA split_1     0.6
## 3   NEU split_1    -2.6
## 4   ROV split_1    -2.1
## 5   KAT split_1    -0.3
## 6   GRE split_1     4.1
## 7   TÄN split_1    -4.2
## 8   SUN split_1     0.1
## 9   BRE split_1    -3.3
## 10  SOL split_1     1.9

For example, we can use the sparkline::spk_chr() function to generate an HTML sparkline widget that we can embed in a formattable::formattable() generated table:

library(formattable)
library(sparkline)

ogier_sparkline <- ogier_rebased_long %>%
                      group_by(code) %>%
                      summarize(spk_ = spk_chr(TimeInS, type ="bar"))

# We need to create an htmlwidget form of the table
out = as.htmlwidget(formattable(head(ogier_sparkline, 5)))

# The table also has a requirement on the sparkline package
out$dependencies = c(out$dependencies,
                     htmlwidgets:::widget_dependencies("sparkline",
                                                       "sparkline"))
out

7.5 Finding the Rank Position at Each Split Point

It can often be tricky to work out the rank at each split by eye, so let’s create a simple function to display the rank at each split for us:

get_split_rank = function(df, split_cols){
  # We need to drop any list names
  split_names = as.character(split_names)
  
  df %>% mutate(across( split_cols, dense_rank ))
}


get_split_rank(splits_wide, split_names) %>% head(5)
## Note: Using an external vector in selections is ambiguous.
## ℹ Use `all_of(split_cols)` instead of `split_cols` to silence this message.
## ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
##   code split_1 split_2 split_3 split_4 split_5
## 1  OGI       6       5       6       7       6
## 2  EVA       8       7       8       8       9
## 3  NEU       3       2       2       2       2
## 4  ROV       4       4       3       3       4
## 5  KAT       5       3       4       6       7

7.6 Finding Split Section Durations

Inspection of the split times data show the split times to be a strictly increasing function over the ordered in-stage split locations. That is, the times represent the time in stage up to that split point, rather than the fractional time taken to get from one split timing point to another.

In many cases, it will be convenient to know how much time a driver took to get from one split point to the next, not least because this allows us to identify sections of the stage where a driver may have particularly gained or lost time, or to identify where a driver may be making or losing time consistently across different parts of the stage.

In abstract terms, then, what we want to calculate is the time taken for a driver \(i\) on stage \(S\) to go between two split points, \(m\) and \(n\), where \(m=0\) is the start, \(m={_S}s_{max}+1\) is the stage end, and \({_S}s_{max}\) is the number of split points on stage \(S\):

\[ {_{S,m,n}}t_{i} = {_{S,n}}t_{i} - {_{S,m}}t_{i}: 0{\le}m<n{\le}{_S}s_{max} \]

For a specific, known stage, we might write the simpler:

\[ S={stagenum}; {_{m,n}}t_{i} = {_{n}}t_{i} - {_{m}}t_{i}: 0{\le}m<n{\le}s_{max} \]

For a driver, i, we note that the accumulated stage time on stage \(S\) as given by the split times is:

\[ S={stagenum};t_{i}=\sum_{s=0}^{s{_{max}}}{_{s,s+1}}t_{i}: \]

To get the duration between two split points, we can create two temporary dataframes, one representing the original split times without the first split \({_{{s+1},s_{max}}}t\), one representing the split times without the last split, \({_{s,s_{max}-1}}t\). Subtracting one dataframe from the other this finds the difference across all consecutive columns:

\[ {_{{s+1},s_{max}}}t - {_{s,s_{max}-1}}t \]

Let’s see how that works in practice:

#https://stackoverflow.com/a/50411529/454773

get_split_duration = function(df, split_cols,
                              retId=TRUE, id_col='entryId') {
  
  # Drop names if they are set
  split_cols = as.character(split_cols)
  
  # [-1] drops the first column, [-ncol()] drops the last
  df_ = df[,split_cols][-1] - df[,split_cols][-ncol(df[,split_cols])]
  
  # The split time to the first split is simply the first split time
  df_[split_cols[1]] = df[split_cols[1]]
  
  if (retId) {
    # Add in the entryId column
    df_[[id_col]] = df[[id_col]]
  
    # Return the dataframe in a sensible column order
    df_ %>% select(c(id_col, all_of(split_cols)))
  } else {
    df_
  }
  
}

Let’s see how that works:

split_durations_wide = get_split_duration(splits_wide,
                                          split_names, id_col='code')
## Note: Using an external vector in selections is ambiguous.
## ℹ Use `all_of(id_col)` instead of `id_col` to silence this message.
## ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
split_durations_wide %>% head(5)
##   code split_1 split_2 split_3 split_4 split_5
## 1  OGI   161.7   110.6   199.0   219.1    98.8
## 2  EVA   162.3   111.4   198.6   219.8   100.3
## 3  NEU   159.1   109.5   197.2   216.9   100.0
## 4  ROV   159.6   111.0   197.7   218.3   100.3
## 5  KAT   161.4   108.9   199.5   219.7   100.4

7.6.1 Finding Split Section Ranks

To find the rank in terms of which driver completed each stage section in the quickest time, we can simply pass the split_durations_wide dataframe rather than the split_durations dataframe to the get_split_rank() function:

get_split_rank(split_durations_wide, split_names) %>% head(5)
##   code split_1 split_2 split_3 split_4 split_5
## 1  OGI       6       4       6       6       3
## 2  EVA       8       7       5       8       7
## 3  NEU       3       2       3       1       4
## 4  ROV       4       5       4       5       7
## 5  KAT       5       1       8       7       8

7.7 Adding Overall Stage Time to the Split Times

It is important to note that the split times data does not contain all the timing data for the stage, just the times relating to split points along the stage route. For a complete summary of stage timing data, we also need to add in the overall stage time from the stage results table:

stage_times = get_stage_times(eventId, stageId)

stage_times %>% head(2)
##   stageTimeId stageId entryId elapsedDurationMs  elapsedDuration    status
## 1       96580    1750   21536            834500 00:13:54.5000000 Completed
## 2       96474    1750   21532            835500 00:13:55.5000000 Completed
##    source position diffFirstMs diffFirst diffPrevMs diffPrev
## 1 Default        1           0  00:00:00          0 00:00:00
## 2 Default        2        1000  00:00:01       1000 00:00:01

Recall that if required we can also retrieve stage time for multiple stages using the get_multi_stage_times(stagelist) function. The get_stage_list(stages) function will return a list of all stage IDs.

If we merge each driver’s stage times as an extra, final column to the wide split times dataframe, we can calculate the split section durations over the whole stage, including the time taken to get from the final split to the stage end.

Recalling that driver codes may not be unique, we should use the unique entryId values to create the extended dataframe:

widen_splits_stage_times = function(splits_wide, stage_times,
                                    id_col='entryId'){
  
  results_cols = c('elapsedDurationMs', id_col,  'diffFirstMs', 'position')

  splits_wide = splits_wide %>%
                    merge(stage_times[,results_cols],
                          by = 'entryId') %>%
                    mutate(split_N = elapsedDurationMs/1000)

  splits_wide
}

full_splits_wide = get_splits_wide(splits) %>%
                      widen_splits_stage_times(stage_times) %>%
                      map_split_codes(splits_list) %>% 
                      map_driver_names(cars)

full_splits_wide %>% head(2)
##   code split_1 split_2 split_3 split_4 split_5 elapsedDurationMs diffFirstMs
## 1  OGI   161.7   272.3   471.3   690.4   789.2            842000        7500
## 2  EVA   162.3   273.7   472.3   692.1   792.4            845800       11300
##   position split_N
## 1        6   842.0
## 2        9   845.8

To make further processing easier, we add the overall stage time to the list of split time column names. The “final split” is now the completed stage time:

split_names = c(split_names, 'split_N')
split_names
## [1] "split_1" "split_2" "split_3" "split_4" "split_5" "split_N"

7.8 Calculating the Ultimate Stage from Ultimate Split Times

The ultimate stage time for a stage is the sum of the fastest sectional split times on the stage as recorded by any driver. Rebasing the stage winner’s time against the ultimate stage time shows whether the driver potentially left time on the stage. (Of course, it might be that a very fast sectional time recorded by one driver may have wiped out their tyres and led to a relatively poor overall stage time, or risk taking that ended their stage prematurely…)

So how can we calculate the ultimate splits? For split section times \({_{S,s}}t_{i}\), the ultimate section time \({_{S,s}}u\) is given as:

\[ {_{S,s}}u = \forall i: min({_{S,s}}t_i) \]

We can calculate the times by case the wide split section duration dataframe to a long form, grouping by the splitPointId and then summarising the minimum time in each group.

Here’s how we can create the long form dataset:

full_splits_wide %>%
      #gather() is deprecated / retired...
      #gather(splitPointId, sectionDurationS,
      #                 as.character(split_names))
      select(all_of(as.character(split_names)), code) %>%
      pivot_longer(as.character(split_names),
                   names_to = "splitname",
                   values_to = "sectionDurationS") %>%
      head(3)
## # A tibble: 3 x 3
##   code  splitname sectionDurationS
##   <chr> <chr>                <dbl>
## 1 OGI   split_1            161.700
## 2 OGI   split_2            272.3  
## 3 OGI   split_3            471.3

We can also get the duration of each section:

full_durations_wide = get_split_duration(full_splits_wide,
                                          split_names, id_col='code')

full_durations_wide %>% head()
##   code split_1 split_2 split_3 split_4 split_5 split_N
## 1  OGI   161.7   110.6   199.0   219.1    98.8    52.8
## 2  EVA   162.3   111.4   198.6   219.8   100.3    53.4
## 3  NEU   159.1   109.5   197.2   216.9   100.0    52.8
## 4  ROV   159.6   111.0   197.7   218.3   100.3    53.3
## 5  KAT   161.4   108.9   199.5   219.7   100.4    53.3
## 6  GRE   165.8   112.5   201.0   221.1   100.0    53.0

To rebase on on an ultimate time basis, it helps to think of an ultimate driver whom we define as having a time equivalent to the fastest split duration time between two split points.

If we group by splitPointId, we can summarise on sectionDurationS to find the minimum duration at each split; we can also take the opportunity to add an accumulated stage time column at each split point as well:

ultimate_splits_long = full_durations_wide %>%
                     pivot_longer(all_of(split_names),
                                  names_to = "splitname",
                                  values_to = "sectionDurationS") %>%
                     select(splitname, sectionDurationS) %>%
                     # Defensive measure
                     filter(!is.na(sectionDurationS) & sectionDurationS>0) %>%
                     group_by(splitname) %>% 
                     summarise(ultimate = min(sectionDurationS,
                                              na.rm = TRUE)) %>%
                     mutate(ultimateElapsed = cumsum(ultimate))

ultimate_splits_long
## # A tibble: 6 x 3
##   splitname  ultimate ultimateElapsed
##   <chr>         <dbl>           <dbl>
## 1 split_1   157.5             157.5  
## 2 split_2   108.9             266.400
## 3 split_3   196.2             462.6  
## 4 split_4   216.9             679.5  
## 5 split_5    98.4             777.9  
## 6 split_N    52.40000         830.3

Rebasing arbitrary drivers against the ultimate stage (and the distribution of times recorded by other drivers) may give an idea of which drivers were pushing on what parts of a stage and where they were being more cautious.

Note that ultimate split times can be used to create an ultimate stage time time that can itself contribute to the ultimate ultimate rally time (sic).

7.8.1 Rebasing to Ultimate Split Times

We can rebase to the ultimate split times in three senses:

  • on an ideal, ultimate per split basis;
  • on an ideal elapsed time basis (the cumulative sum of ideal ultimate split durations);
  • on an actual best elapsed (stage) time basis at each split, rebasing relative to the minimum actual recorded elapsed time at each split.

7.8.2 Per Split Ultimate Rebasing

To rebase on a per split basis, we can simply rebase durations relative to the minimum split duration at each split.

Let’s reshape the ultimate driver durations to a wide format:

ultimate_wide_durations = ultimate_splits_long %>% 
                  select(splitname, ultimate) %>%
                  pivot_wider(names_from = splitname,
                              values_from = ultimate) %>%
                  mutate(code='ultimate')

ultimate_wide_durations
## # A tibble: 1 x 7
##   split_1 split_2 split_3 split_4 split_5  split_N code    
##     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>    <dbl> <chr>   
## 1   157.5   108.9   196.2   216.9    98.4 52.40000 ultimate

We can now add the ultimate driver split durations to the full_durations_wide data and rebase against this dummy driver:

ultimate_between_split = full_durations_wide %>%
                            select(code, all_of(split_names)) %>% 
                            bind_rows(ultimate_wide_durations) %>%
                            rebase('ultimate', split_names,
                                   id_col='code') %>%
                            head(10)

ultimate_between_split
##    code split_1 split_2 split_3 split_4 split_5 split_N
## 1   OGI     4.2     1.7     2.8     2.2     0.4     0.4
## 2   EVA     4.8     2.5     2.4     2.9     1.9     1.0
## 3   NEU     1.6     0.6     1.0     0.0     1.6     0.4
## 4   ROV     2.1     2.1     1.5     1.4     1.9     0.9
## 5   KAT     3.9     0.0     3.3     2.8     2.0     0.9
## 6   GRE     8.3     3.6     4.8     4.2     1.6     0.6
## 7   TÄN     0.0     0.9     0.3     0.5     2.0     0.5
## 8   SUN     4.3     3.8     3.6     3.0     1.6     0.3
## 9   BRE     0.9     3.0     4.7     0.3     1.7     1.0
## 10  SOL     6.1     2.6     0.0     0.1     0.0     0.0

In this case we note there is at least one driver in each column with a zero gap to the ultimate driver, specifically, the driver(s) who made it between two consecutive split points in the fastest time, and no driver with a negative gap.

7.8.3 Ultimate Stage Rebasing

We can add an “ultimate” driver to the wide splits dataframe and then rebase as normal. The following function routinises the widening recipe we used above:

ultimate_widen = function(df, col, valname){
  df %>% select(splitname, all_of(col)) %>%
              pivot_wider(names_from = splitname,
                          values_from = col) %>%
              mutate(code=valname)
}

We can get the wide form as:

ultimate_wide_elapsed = ultimate_widen(ultimate_splits_long,
                                       'ultimateElapsed', 'ultimate')
## Note: Using an external vector in selections is ambiguous.
## ℹ Use `all_of(col)` instead of `col` to silence this message.
## ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
ultimate_wide_elapsed
## # A tibble: 1 x 7
##   split_1 split_2 split_3 split_4 split_5 split_N code    
##     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <chr>   
## 1   157.5 266.400   462.6   679.5   777.9   830.3 ultimate

We can now add that to our original dataframe and rebase against the ultimate stage to show how far off the ultimate stage pace each driver was.

Again, let’s make a routine of the process:

ultimate_rebaser = function(df, ultimate_df, split_names,
                            ultimate_col ){
  df %>%
    select(code, all_of(split_names)) %>% 
  bind_rows(ultimate_df) %>%
  rebase(ultimate_col, split_names, id_col='code')
}

And let’s use that routine:

ultimate_accumulated = full_splits_wide %>% 
                          ultimate_rebaser(ultimate_wide_elapsed,
                                         split_names, 'ultimate') %>%
                          head(10)

ultimate_accumulated
##    code split_1 split_2 split_3 split_4 split_5 split_N
## 1   OGI     4.2     5.9     8.7    10.9    11.3    11.7
## 2   EVA     4.8     7.3     9.7    12.6    14.5    15.5
## 3   NEU     1.6     2.2     3.2     3.2     4.8     5.2
## 4   ROV     2.1     4.2     5.7     7.1     9.0     9.9
## 5   KAT     3.9     3.9     7.2    10.0    12.0    12.9
## 6   GRE     8.3    11.9    16.7    20.9    22.5    23.1
## 7   TÄN     0.0     0.9     1.2     1.7     3.7     4.2
## 8   SUN     4.3     8.1    11.7    14.7    16.3    16.6
## 9   BRE     0.9     3.9     8.6     8.9    10.6    11.6
## 10  SOL     6.1     8.7     8.7     8.8     8.8     8.8

We note there there is only one guaranteed difference of 0.0, from the driver with the fastest time at the first split, and that no times will be less than zero.

7.8.4 Actual Best Elapsed Time Rebasing

The full_splits_wide dataframe contains the elapsed times for each driver at each split so we can summarise the long form of that data to get the best actual recorded elapsed times:

actual_ultimate = full_splits_wide %>% 
                      select(code, all_of(split_names)) %>% 
                      # Make long
                      select(all_of(as.character(split_names)),
                             code) %>%
                      pivot_longer(as.character(split_names),
                                   names_to = "splitname",
                                   values_to = "sectionDurationS") %>%
                      group_by(splitname) %>%
                      summarise(actualUltimate = min(sectionDurationS,
                                                     na.rm=TRUE))

actual_ultimate
## # A tibble: 6 x 2
##   splitname actualUltimate
##   <chr>              <dbl>
## 1 split_1            157.5
## 2 split_2            267.3
## 3 split_3            463.8
## 4 split_4            681.2
## 5 split_5            781.6
## 6 split_N            834.5

We can add this time to out ultimate times dataframe to provide an immediate point of reference between the actual best accumulated split times and the ultimate accumulated ideal split times:

ultimate_splits_long$actual = actual_ultimate$actualUltimate
ultimate_splits_long
## # A tibble: 6 x 4
##   splitname  ultimate ultimateElapsed actual
##   <chr>         <dbl>           <dbl>  <dbl>
## 1 split_1   157.5             157.5    157.5
## 2 split_2   108.9             266.400  267.3
## 3 split_3   196.2             462.6    463.8
## 4 split_4   216.9             679.5    681.2
## 5 split_5    98.4             777.9    781.6
## 6 split_N    52.40000         830.3    834.5

Let’s cast the data to a long format in readiness for rebasing it:

ultimate_wide_actual = ultimate_widen(ultimate_splits_long,
                                      'actual',
                                      'ultimate')

ultimate_wide_actual
## # A tibble: 1 x 7
##   split_1 split_2 split_3 split_4 split_5 split_N code    
##     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl> <chr>   
## 1   157.5   267.3   463.8   681.2   781.6   834.5 ultimate

And then rebase:

ultimate_actual = full_splits_wide %>% 
                      ultimate_rebaser(ultimate_wide_actual,
                                       split_names, 'ultimate') %>%
                      head(10)

ultimate_actual
##    code split_1 split_2 split_3 split_4 split_5 split_N
## 1   OGI     4.2     5.0     7.5     9.2     7.6     7.5
## 2   EVA     4.8     6.4     8.5    10.9    10.8    11.3
## 3   NEU     1.6     1.3     2.0     1.5     1.1     1.0
## 4   ROV     2.1     3.3     4.5     5.4     5.3     5.7
## 5   KAT     3.9     3.0     6.0     8.3     8.3     8.7
## 6   GRE     8.3    11.0    15.5    19.2    18.8    18.9
## 7   TÄN     0.0     0.0     0.0     0.0     0.0     0.0
## 8   SUN     4.3     7.2    10.5    13.0    12.6    12.4
## 9   BRE     0.9     3.0     7.4     7.2     6.9     7.4
## 10  SOL     6.1     7.8     7.5     7.1     5.1     4.6

In this case, there is at least one zero value per split corresponding to the drive(s) who recorded the fastest elapsed time up to each split point.

7.9 Visualising Rebased Times

There are various quick techniques we can use to help visualise the rebased split times and try to highlight significant patterns or peculiarities. For example, we can use coloured backgrounds to highlight each cell in a table, or sparklines to summarise each row.

7.9.1 Context Sensitive Cell Colouring

As a quick example, let’s first look at the split duration rebasing where we compare each driver’s time in getting from one split point to the next against the fastest completion of that distance:

Recall the divergent color tile formatter we met previously, reused here with a different colour sense:

xnormalize = function(x){
  # Normalise to the full range of values about 0
  # O will map to 0.5 in the normalised range
  x = c(x, -max(abs(x)), max(abs(x)))
  normalize(x)[1:(length(x)-2)]
}
color_tile2 <- function (...) {
  formatter("span", style = function(x) {
    style(display = "block",
          'text-align' = 'center',
          padding = "0 4px", 
          `border-radius` = "4px",
          `font.weight` = ifelse(abs(x)> 0.3*max(x), "bold", "normal"),
          color = ifelse(abs(x)> 0.3*max(x),'white',
                         ifelse(x==0,'lightgrey','black')),
          `background-color` = csscolor(matrix(as.integer(colorRamp(...)(xnormalize(as.numeric(x)))), 
                byrow=TRUE, 
                dimnames=list(c("green","red","blue"), NULL),
                nrow=3)))
  })}

We can use that to colour cells as a block (although we note that for ultimately rebased values we can never go below a zero value…). Pale colors are better (closer to the ultimate):

formattable(ultimate_between_split, align='c',
            list(area(col = 2:7) ~ color_tile2(c("red",'white',
                                                 "forestgreen")),
                 entryId=FALSE))
code split_1 split_2 split_3 split_4 split_5 split_N
OGI 4.2 1.7 2.8 2.2 0.4 0.4
EVA 4.8 2.5 2.4 2.9 1.9 1.0
NEU 1.6 0.6 1.0 0.0 1.6 0.4
ROV 2.1 2.1 1.5 1.4 1.9 0.9
KAT 3.9 0.0 3.3 2.8 2.0 0.9
GRE 8.3 3.6 4.8 4.2 1.6 0.6
TÄN 0.0 0.9 0.3 0.5 2.0 0.5
SUN 4.3 3.8 3.6 3.0 1.6 0.3
BRE 0.9 3.0 4.7 0.3 1.7 1.0
SOL 6.1 2.6 0.0 0.1 0.0 0.0

We see that SOL was flying from the second split onwards, getting from one split to another in pretty much the fastest time after a relatively poor start.

The variation in columns may also have something interesting to say. SOL somehow made time against pretty much every between split 4 and 5, but in the other sections (apart from the short last section to finish), there is quite a lot o variability. Checking this view against a split sectioned route map might help us understand whether there were particular features of the route that might explain these differences.

Let’s compare that chart with how the accumulated stage time of each driver compares with the accumulated ultimate section times:

formattable(ultimate_accumulated, align='c',
            list(area(col = 2:7) ~ color_tile2(c("red",'white',
                                                 "forestgreen")),
                 entryId=FALSE))
code split_1 split_2 split_3 split_4 split_5 split_N
OGI 4.2 5.9 8.7 10.9 11.3 11.7
EVA 4.8 7.3 9.7 12.6 14.5 15.5
NEU 1.6 2.2 3.2 3.2 4.8 5.2
ROV 2.1 4.2 5.7 7.1 9.0 9.9
KAT 3.9 3.9 7.2 10.0 12.0 12.9
GRE 8.3 11.9 16.7 20.9 22.5 23.1
TÄN 0.0 0.9 1.2 1.7 3.7 4.2
SUN 4.3 8.1 11.7 14.7 16.3 16.6
BRE 0.9 3.9 8.6 8.9 10.6 11.6
SOL 6.1 8.7 8.7 8.8 8.8 8.8

Here, we see that TAN was recording the best time compared the ultimate time as calculated against the sum of best split section times, but was still of the ultimate pace: it was his first split that made the difference.

Finally, let’s see how the driver’s actual split times varied against the best recorded split time at each split:

formattable(ultimate_actual, align='c',
            list(area(col = 2:7) ~ color_tile2(c("red",'white',
                                                 "forestgreen")),
                 entryId=FALSE))
code split_1 split_2 split_3 split_4 split_5 split_N
OGI 4.2 5.0 7.5 9.2 7.6 7.5
EVA 4.8 6.4 8.5 10.9 10.8 11.3
NEU 1.6 1.3 2.0 1.5 1.1 1.0
ROV 2.1 3.3 4.5 5.4 5.3 5.7
KAT 3.9 3.0 6.0 8.3 8.3 8.7
GRE 8.3 11.0 15.5 19.2 18.8 18.9
TÄN 0.0 0.0 0.0 0.0 0.0 0.0
SUN 4.3 7.2 10.5 13.0 12.6 12.4
BRE 0.9 3.0 7.4 7.2 6.9 7.4
SOL 6.1 7.8 7.5 7.1 5.1 4.6

Here, we see that TAN led the stage at each split point based on actual accumulated time.

7.9.2 Using Sparklines to Summarise Rebased Deltas

A quick way of summarising rebased times in a more space efficient way is to us a sparkline. As we have seen previously, these can be used added as an extra column alongside a row of data, or could be used to as a quick visual indicative summary of a row of values.

Let’s create a sparkline summarising each of the above tables.

First, the ultimate between split rebase:

ultimate_between_split_spk = ultimate_between_split %>%
                                gather(key ="Stage",
                                value ="Gap", split_names) %>%
                                group_by(code) %>%
                                summarize(ultimate_section = spk_chr(-Gap,
                                                         type ="bar"))
## Note: Using an external vector in selections is ambiguous.
## ℹ Use `all_of(split_names)` instead of `split_names` to silence this message.
## ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
spark_df(ultimate_between_split_spk)

How about for the accumulated ultimate split durations?

ultimate_accumulated_spk = ultimate_accumulated %>%
                              gather(key ="Stage",
                              value ="Gap", split_names) %>%
                              group_by(code) %>%
                              summarize(ultimate_acc = spk_chr(-Gap,
                                                               type ="bar"))

spark_df(ultimate_accumulated_spk)

And finally, compared to actual recorded best split time:

ultimate_actual_spk = ultimate_actual %>%
                          gather(key ="Stage",
                                 value ="Gap", split_names) %>%
                          group_by(code) %>%
                          summarize(ultimate_actual = spk_chr(-Gap,
                                                              type ="bar"))

spark_df(ultimate_actual_spk)