allClazz Data

Building on what we learned in the previous chapter, we can quickly make a start on previewing the allClazz data.

Reviewing the available data feeds in the browser web tools when viewing pages on the Dakar live results site, we see that the URLs for the class data use path elements of the form allClazz-2025-A, allClazz-2025-M and so on. From inspecting the category data in the previous chapter, we know the available category codes are A (auto/car), M (moto/bike), K (classic), F (Future Mission).

So let’s make a start by reviewing the data feed for the auto/car category.

# Load in the required packages
import pandas as pd
from jupyterlite_simple_cors_proxy import furl, xurl

dakar_api_template = "https://www.dakar.live.worldrallyraidchampionship.com/api/{path}"

# Define the year
YEAR = 2025
# Define the category
CATEGORY = "A"

# Define the API path to the car clazz resource
# Use a Python f-string to instantiate variable values directly
clazz_path = f"allClazz-{YEAR}-{CATEGORY}"

# Define the URL
clazz_url = dakar_api_template.format(path=clazz_path)

# Preview the path and the URL
clazz_path, clazz_url
('allClazz-2025-A',
 'https://www.dakar.live.worldrallyraidchampionship.com/api/allClazz-2025-A')
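As an aside, the same template gives us the feed URLs for all the categories in one go. The sketch below just constructs the URLs (it makes no requests), using the category codes we identified in the previous chapter:

```python
# Build the allClazz feed URL for every category code
# identified in the previous chapter (no requests made here)
dakar_api_template = "https://www.dakar.live.worldrallyraidchampionship.com/api/{path}"
YEAR = 2025
CATEGORIES = ["A", "M", "K", "F"]

clazz_urls = {c: dakar_api_template.format(path=f"allClazz-{YEAR}-{c}")
              for c in CATEGORIES}

# Preview one of the generated URLs
clazz_urls["M"]
```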

Assuming that the data feed returns JSON, let’s try to load it directly into a dataframe:

# Load the data
# Use furl() to handle CORS issues in Jupyterlite
clazz_df = pd.read_json(furl(clazz_url))

# Preview the data
clazz_df.head()
liveDisplay refueling promotionalDisplay reference categoryClazzLangs label updatedAt position shortLabel _bind _origin _id _key _updatedAt _parent $group _gets color tinyLabel categoryGroupLangs
0 False 0 True 2025-A-T3-U [{'text': 'T3.U: "Ultimate" Lightweight Protot... U 2025-01-05T20:25:31+01:00 1 cat.name.A_T3_U allClazz-2025-A categoryClazz-2025-A-T3 18af44f476a4dc9363554ccfe1a9b9fe _id 1737386138270 categoryGroup-2025-A:15f329900afa29e3e6b099ae6... categoryGroup-2025-A:15f329900afa29e3e6b099ae6... {'group': '$group'} NaN NaN NaN
1 False 0 True 2025-A-T3-1 [{'variable': 'cat.name.A_T3_1', 'locale': 'en... 1 2025-01-05T20:25:31+01:00 0 cat.name.A_T3_1 allClazz-2025-A categoryClazz-2025-A-T3 a0a6386a4b9a61b73b036a50966345c0 _id 1737386138270 categoryGroup-2025-A:15f329900afa29e3e6b099ae6... categoryGroup-2025-A:15f329900afa29e3e6b099ae6... {'group': '$group'} NaN NaN NaN
2 False 0 False 2025-A-T4-T4 [{'locale': 'en', 'text': 'T4: Modified Produc... T4 2025-01-05T20:25:31+01:00 3 cat.name.A_T4_T4 allClazz-2025-A categoryClazz-2025-A-T4 058d77cc7db191813c30a902a8d5ba7c _id 1737386138221 categoryGroup-2025-A:423ea731fdcba5cda62c83349... categoryGroup-2025-A:423ea731fdcba5cda62c83349... {'group': '$group'} NaN NaN NaN
3 False 0 False 2025-A-T4-NO [{'text': 'T4: Modified Production SSV', 'vari... NO 2025-01-05T20:25:31+01:00 0 cat.name.A_T4_NO allClazz-2025-A categoryClazz-2025-A-T4 0ec1b5373f8c1fb5ff70ea0590e16c50 _id 1737386138221 categoryGroup-2025-A:423ea731fdcba5cda62c83349... categoryGroup-2025-A:423ea731fdcba5cda62c83349... {'group': '$group'} NaN NaN NaN
4 False 0 False 2025-A-T4-SSV2 [{'text': 'SSV2', 'locale': 'en', 'variable': ... SSV2 2025-01-05T20:25:31+01:00 2 cat.name.A_T4_SSV2 allClazz-2025-A categoryClazz-2025-A-T4 23ae09bc22535129a9af1e6b3071bc2c _id 1737386138221 categoryGroup-2025-A:423ea731fdcba5cda62c83349... categoryGroup-2025-A:423ea731fdcba5cda62c83349... {'group': '$group'} NaN NaN NaN

Let’s simplify the _origin value to give us an identifier that more closely matches the form of the reference identifier (it looks like _origin identifies a category one level of abstraction up).

clazz_df['categoryClazz'] = clazz_df["_origin"].str.replace("categoryClazz-", "")

A quick preview of the data suggests once again we have a multiplicity of language labels. It’s going to be a bit of a faff if we have to keep unpacking these into a long form, reshaping them to a wide form, and then merging them back into the original dataframe.

But code is for nothing if not for automating away repeated tasks, so let’s create a way of doing that.

As with many other programming languages, Python allows you to define your own named functions or procedures. We have already seen how the pandas package contains routines for loading data and working with dataframes, but it doesn’t seem to offer an off-the-shelf one-liner that addresses our immediate concern.

So let’s fix that.

We already know what we want to do, and have identified the steps for doing it in the previous chapter. So let’s wrap those steps into a single function that we can apply straightforwardly.

# We define a function by a using the `def` statement.
# The function signature identifies required and optional parameters.
# In this case, we require a dataframe and the name of the column
# we want to reshape. We also (optionally) identify the column
# that we want to merge against. By default, this is "shortLabel".
def mergeInLangLabels(df, col, key="shortLabel"):
    # Unpack the lists of labels into their own rows
    # to give a long dataframe.
    longLabels = pd.json_normalize(df[col].explode())

    # This is the only new bit
    # If there are no labels, we may get empty rows
    # or rows filled with null / NA values in the long dataframe.
    # So we can pre-emptively drop such rows if they appear.
    longLabels.dropna(axis="index", how="all", inplace=True)
    # If we don't drop the empty rows, we may get issues
    # in the pivot stage.

    # Reshape the long dataframe to a wide dataframe by pivoting
    # the locale to column names using text values, and using
    # the category (variable) as the row index.
    wideLabels = longLabels.pivot(
        index='variable',
        columns='locale',
        values='text',
    ).reset_index()

    # Merge the data back in to the original dataframe
    _df = pd.merge(df, wideLabels,
                   left_on=key, right_on='variable')

    # Tidy up the dataframe by dropping the now redundant columns
    _df.drop("variable", axis=1, inplace=True)
    # If the original column was itself named "variable", it has
    # already been dropped, so ignore any error from trying to
    # drop it again...
    _df.drop(col, axis=1, inplace=True, errors="ignore")

    return _df

We can now generate our expanded, labelled dataframe from a single line:

# Update the dataframe by using our new function to
# merge in the exploded and widened language labels
clazz_df = mergeInLangLabels(clazz_df, "categoryClazzLangs")

# Preview the dataframe, limited to a few illustrative columns
clazz_df[["shortLabel", "en", "fr"]].head()
shortLabel en fr
0 cat.name.A_T3_U T3.U: "Ultimate" Lightweight Prototype Cross-C... T3.U: Véhicules Tout-terrain Prototype léger "...
1 cat.name.A_T3_1 T3.1: Lightweight Prototype Cross-Country T3.1: Véhicules Tout-terrain Prototype léger
2 cat.name.A_T4_T4 T4: Modified Production SSV T4 SSV de série modifié
3 cat.name.A_T4_NO T4: Modified Production SSV T4 SSV de série modifié
4 cat.name.A_T4_SSV2 SSV2 SSV2
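If you want to sanity check the recipe offline, you can run the same explode / pivot / merge steps over a tiny hand-made dataframe. The rows below mimic the list-of-label-dicts structure of the categoryClazzLangs column, but the class labels themselves are invented for illustration:

```python
import pandas as pd

# Two made-up rows with the same structure as the language
# labels column (labels invented for illustration only)
df = pd.DataFrame({
    "shortLabel": ["cat.name.X_1", "cat.name.X_2"],
    "langs": [
        [{"variable": "cat.name.X_1", "locale": "en", "text": "Class 1"},
         {"variable": "cat.name.X_1", "locale": "fr", "text": "Classe 1"}],
        [{"variable": "cat.name.X_2", "locale": "en", "text": "Class 2"},
         {"variable": "cat.name.X_2", "locale": "fr", "text": "Classe 2"}],
    ],
})

# The same explode / pivot / merge recipe used in mergeInLangLabels()
long_labels = pd.json_normalize(df["langs"].explode()).dropna(how="all")
wide_labels = long_labels.pivot(index="variable", columns="locale",
                                values="text").reset_index()
out = pd.merge(df, wide_labels, left_on="shortLabel",
               right_on="variable").drop(["variable", "langs"], axis=1)

out
```

The result is a two-row dataframe with shortLabel, en and fr columns, just as with the live data.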

To make things even more reusable, we can save our function to a file so that we can load it into other notebooks.

# The inspect package allows us to inspect Python objects
import inspect

# For example, we can get the source code of our function
source_code = inspect.getsource(mergeInLangLabels)

# We also need to add in any package imports...
imports = """
import pandas as pd
"""

# And now we can write the source code to a file
with open("dakar_utils_2025.py", "w") as file:
    file.write(imports + "\n" + source_code)

# We can then use that file as a simple package
# and import our function from it:
# from dakar_utils_2025 import mergeInLangLabels
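As a quick check that this round trip works, here is a minimal sketch using a throwaway function and module name (double and demo_utils are made up for this example):

```python
import importlib.util
import inspect

# A throwaway function to save and reload
def double(x):
    return 2 * x

# Write its source code out to a module file
with open("demo_utils.py", "w") as f:
    f.write(inspect.getsource(double))

# Load the freshly written module back in
spec = importlib.util.spec_from_file_location("demo_utils", "demo_utils.py")
demo_utils = importlib.util.module_from_spec(spec)
spec.loader.exec_module(demo_utils)

# And call the reloaded copy of the function
demo_utils.double(21)
```

The same pattern applies to dakar_utils_2025.py, although a simple `import` statement is all we need once the file is on the notebook's Python path.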