Learning to Code for Coaches: Part 5 -

Using Machine Learning to individualize the training load by predicting HRV

Alan Couzens, M.Sc. (Sports Science)

In my last couple of posts on learning to code for coaches (full series with code here), we went through how we can use Python to obtain data from various sources. In parts 1 and 2 we looked at how you can pull workout and HRV data straight from your Garmin's fit files. In part 3 we looked at how you can pull data from the web, using Strava's API as an example. In part 4 we looked at how we can pull data from csv exports, using the Pandas library to inspect and clean data from your Training Peaks account. While very important, I have to admit that this process of finding and handling data isn't as sexy as actually using that data to build Machine Learning models! That's what we're going to do in this post. But, before we dive in, some quick definitions. What do I mean when I talk about a "model"? & what do I mean by "Machine Learning"?

These 2 concepts of modeling and machine learning are intertwined...

A model is a simplified, abstract representation of the relationship between real world entities.

The key word here is relationship, i.e. a model helps us understand how one thing relates to another. Through this understanding, it also enables us to predict "What will happen if...", e.g. if we change one entity but leave the others the same, how will it affect another entity?

Weather is a good example. If I know the time of year, the temperature, the barometer, & the humidity, I have the makings of a model that can predict the likelihood of rain. If it is a good model, I'll be able to manipulate one of those factors to see how it changes the prediction of rain, e.g. if the humidity drops by 10%, how does it affect the chance of rain? Through this modeling we get a deep understanding of the forces involved in a given phenomenon.

So, how do we build these models?

Well, one way would be to come in with a mathematical construct of each part. In our world, a good example of this would be modeling how many watts it will take to climb a 4% grade at 25km/h. This is a basic physics equation. If we know the gradient, the weight of the rider and the bike, the wind, the aerodynamic drag, the rolling resistance, we can calculate a physics based model to answer our question. But there is another way of doing it...

We could also get a bunch of different cyclists of different weights and bikes and positions and have them ride up this hill at 25km/h and take note of their average power. Then we could plug the power it took plus all of the other variables that we know - rider weight, bike weight, wind, bike position etc into a computer and we could ask the computer to calculate these relationships for us. Presumably, it would come up with a very similar model to our physics model and thus, in a simplistic sense, we could say that the machine has 'learned' the physics involved in predicting power from these other variables.

This is the essence of machine learning. Rather than feeding the rules to the computer, we feed it data and ask it to figure out the rules.
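To make the hill-climb example concrete, here is a minimal sketch of both routes. It assumes a simplified physics model (gravity, rolling resistance and aerodynamic drag only, no wind or drivetrain losses) and simulated riders; the constants, the rider ranges and the `physics_power` helper are all illustrative assumptions, not measured data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

# Assumed constants for the illustration
G = 9.81                  # gravity, m/s^2
RHO = 1.225               # air density, kg/m^3
CRR = 0.005               # rolling resistance coefficient
SPEED = 25 / 3.6          # 25 km/h expressed in m/s
THETA = np.arctan(0.04)   # 4% grade expressed as an angle


def physics_power(total_mass_kg, cda_m2):
    """Power (W) to hold SPEED on the climb: gravity + rolling + aero drag."""
    gravity_and_rolling = total_mass_kg * G * (np.sin(THETA) + CRR * np.cos(THETA)) * SPEED
    aero = 0.5 * RHO * cda_m2 * SPEED ** 3
    return gravity_and_rolling + aero


# "Experiment": 200 simulated riders with different masses and positions
n_riders = 200
mass = rng.uniform(60, 95, n_riders)       # rider + bike, kg
cda = rng.uniform(0.25, 0.45, n_riders)    # drag area, m^2
measured_power = physics_power(mass, cda) + rng.normal(0, 2, n_riders)  # noisy power meter

# Machine learning route: hand over the data and let the computer find the rule
X = np.column_stack([mass, cda])
learned = LinearRegression().fit(X, measured_power)

# The rule-based route gives us the "true" coefficients to compare against
true_mass_coef = G * (np.sin(THETA) + CRR * np.cos(THETA)) * SPEED
true_cda_coef = 0.5 * RHO * SPEED ** 3
print(f"Watts per kg      - physics: {true_mass_coef:.2f}, learned: {learned.coef_[0]:.2f}")
print(f"Watts per m^2 CdA - physics: {true_cda_coef:.1f}, learned: {learned.coef_[1]:.1f}")
```

At a fixed speed this particular physics model happens to be linear in mass and drag area, which is why a plain linear regression can recover it. Physiology is rarely that cooperative.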

Of course, the big advantage to this is that we don't need to know all of the rules! Even in the simple example above, knowing all of the factors at play to build a physics equation that accurately predicts power output takes significant domain knowledge. And, at the risk of offending my physicist friends, physics is relatively simple when compared to physiology. In physics, the environment that we're dealing with is assumed to be relatively stable. A more realistic example in physiological terms would be to run the same experiment but with each athlete riding their bike uphill on a different planet, where there is a different gravity being applied to each athlete! In this case, developing the domain knowledge where you can be confident in your predictions is challenging, to say the least, with the traditional approach.

The difficulty in figuring out "the rules" when it comes to exercise physiology should be self-evident in the range of "rules" of training physiology that different coaches have! One rule that is of particular practical interest to the coach is "how aggressively should we ramp the training load?" There are all kinds of simple heuristics that coaches fall back on when it comes to this question - the old "Increase mileage no more than 10% per week" or, the modern version - "'Only' ramp CTL at 10 TSS/wk". Unfortunately, much of coaching is still pervaded by these blanket rules that coaches bring to their programs. Rules that, while they might work for fifty or sixty percent of the athletes that come to the program, leave a large percentage either undertrained or, more commonly, overtrained or injured.

As coaches, rather than come in with some pre-determined "way" that we try to fit our athletes to, a more globally effective strategy is to try to figure out the "way" that works for each athlete & Machine Learning is a great tool for this! If we have a measure of overtraining & we have training load & wellness data for a given athlete, we can feed these variables into a computer & have the computer figure out "the rules" for that individual athlete.

Predicting overtraining in athletes

One of the best indicators that our ramp is too aggressive for a given athlete is when we see a marked drop in heart rate variability. Numerous studies have shown that overtrained athletes exhibit a marked decrease in parasympathetic activity - indicated by significantly lower than normal heart rate variability numbers (e.g. https://onlinelibrary.wiley.com/doi/abs/10.1046/j.1475-0961.2003.00523.x, https://journals.lww.com/cjsportsmed/Abstract/2006/09000/Heart_Rate_Variability,_Blood_Pressure.7.aspx, https://link.springer.com/article/10.1007/s40279-013-0071-8 )

For example, a diagram from the study in the third link shows the decline in HRV corresponding with declining performance during a bout of overtraining in an elite rower (B).

While it is helpful to have this visibility into the athlete's physiological state to identify when we've exceeded productive load, wouldn't it be great to be able to look ahead into the future to predict what the HRV will be, given the athlete's current state and a given training load, so that we can prevent HRV from dropping BEFORE it actually does?

This "Minority Report" task will be our job here :-) We'll apply some simple Machine Learning to figure out the correct loading "rule" for one athlete, based on long term historical data for that individual athlete, and use it to adjust our training prescription to minimize the risk of seeing low HRV numbers.

At its core, this is a math problem....

If we know the athlete's current physiological state - HRV, sleep metrics, subjective wellness indicators etc, & we know the athlete's individual load:response relationship, how much load should we give them today to ensure that we don't drive this athlete into overtraining?

Or, in pseudo-math terms...

Today's HRV + Sleep Quality + Mood + x Training Load = acceptable HRV tomorrow (and the day after, and the day after that...)

Now, solve for x.

Importantly, while I'm using HRV as an example of something that you might want to predict, you can build a predictive model for anything you'd like to see more of - watts for a given course, energy, mood, sleep quality etc or, similarly, anything that you want to avoid - fatigue, low energy, injury, illness etc.

Whatever model you are building, the steps are essentially the same..

Step 1: Get the data

To begin with, we need a good amount of individual athlete data. Fortunately, we have plenty of it from the last post (https://www.alancouzens.com/blog/data_wrangling_with_pandas.html). We'll use the same dataframe that we built there to build our model in this post. Let's start by firing that up.

In [ ]:
import pandas as pd

data = pd.read_csv("new_improved_metrics.csv")

print(data)
     Unnamed: 0     TSS   HRV  ...        ctl        atl        tsb
0      1/1/2020   75.00   8.7  ...  50.588208  35.990495  14.597713
1      1/2/2020   42.20  10.1  ...  50.390847  36.817117  13.573731
2      1/3/2020   73.98  10.0  ...  50.945860  41.764318   9.181542
3      1/4/2020  197.00  10.5  ...  54.382268  62.429618  -8.047350
4      1/5/2020   74.73   9.6  ...  54.861016  64.067071  -9.206055
..          ...     ...   ...  ...        ...        ...        ...
361  12/28/2020    0.00   9.7  ...  94.753707  55.313371  39.440336
362  12/29/2020   71.20   9.4  ...  94.199528  57.428233  36.771296
363  12/30/2020   94.44  10.0  ...  94.205186  62.355317  31.849869
364  12/31/2020  123.76   NaN  ...  94.900561  70.529637  24.370924
365   8/31/2020     NaN   9.2  ...        NaN        NaN        NaN

[366 rows x 12 columns]

In the above code, we imported that pandas library that we played with in the last blog and then used its read_csv function to read the csv that we generated from our Training Peaks data in the last blog.

This time around, we're going to be using today's metrics to predict how "beat up" we're going to be tomorrow. So, the first thing we need to do is add a separate column to our dataframe for tomorrow's HRV. Let's do that by using the pandas shift function that we introduced in the last post...

In [ ]:
data['tomorrows_hrv'] = data['HRV'].shift(-1)
print(data)
     Unnamed: 0     TSS   HRV  ...        atl        tsb  tomorrows_hrv
0      1/1/2020   75.00   8.7  ...  35.990495  14.597713           10.1
1      1/2/2020   42.20  10.1  ...  36.817117  13.573731           10.0
2      1/3/2020   73.98  10.0  ...  41.764318   9.181542           10.5
3      1/4/2020  197.00  10.5  ...  62.429618  -8.047350            9.6
4      1/5/2020   74.73   9.6  ...  64.067071  -9.206055            9.5
..          ...     ...   ...  ...        ...        ...            ...
361  12/28/2020    0.00   9.7  ...  55.313371  39.440336            9.4
362  12/29/2020   71.20   9.4  ...  57.428233  36.771296           10.0
363  12/30/2020   94.44  10.0  ...  62.355317  31.849869            NaN
364  12/31/2020  123.76   NaN  ...  70.529637  24.370924            9.2
365   8/31/2020     NaN   9.2  ...        NaN        NaN            NaN

[366 rows x 13 columns]

Perfect. Now we have a new column that gives tomorrow's HRV number in each row. This will be the target variable that we train our model on and teach it to predict.
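One thing worth checking before trusting the shifted column: shift(-1) works on row order, not on dates. In the dataframe printed above, the final row (8/31/2020) sits after 12/31/2020, so the 12/31 row is handed the HRV from 8/31 as its "tomorrow". A minimal sketch of one way to guard against this, using made-up toy values and a hypothetical `date` column (in our dataframe the dates live in 'Unnamed: 0'), is to parse and sort the dates before shifting:

```python
import pandas as pd

# Toy data (made-up values); the last row is deliberately out of date order
toy = pd.DataFrame({
    "date": ["1/1/2020", "1/2/2020", "1/4/2020", "1/3/2020"],
    "HRV": [8.7, 10.1, 10.5, 10.0],
})

# Parse the dates and sort so that "next row" really means "next day"
toy["date"] = pd.to_datetime(toy["date"], format="%m/%d/%Y")
toy = toy.sort_values("date").reset_index(drop=True)

# shift(-1) pulls each value up one row: row i receives row i+1's HRV
toy["tomorrows_hrv"] = toy["HRV"].shift(-1)
print(toy)
```

The last row has no "tomorrow", so it ends up as NaN, which is exactly what we want the cleaning step to catch.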

Step 2: Clean the Data

But first, you've probably noticed those weird 'NaN' values above. These 'Not a Number' values represent missing data in our dataframe. Most machine learning algorithms will require "clean" data where all rows have valid values. We have a few options in dealing with these. We could fill these values with the mean or median of the column to create a valid value or we can drop any row that has null values entirely. Let's see how much data we'd be losing to decide which way to go...

In [ ]:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 366 entries, 0 to 365
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Unnamed: 0      366 non-null    object 
 1   TSS             365 non-null    float64
 2   HRV             365 non-null    float64
 3   Pulse           365 non-null    float64
 4   Stress          365 non-null    float64
 5   Sleep Qualilty  362 non-null    float64
 6   Sleep Hours     363 non-null    float64
 7   Mood            362 non-null    float64
 8   yesterday_TSS   365 non-null    float64
 9   ctl             365 non-null    float64
 10  atl             365 non-null    float64
 11  tsb             365 non-null    float64
 12  tomorrows_hrv   364 non-null    float64
dtypes: float64(12), object(1)
memory usage: 37.3+ KB

So, using the pandas info function that we learned about in the last blog, we can see that we have 366 total rows and the features are only missing 4 values at most. In other words, this athlete does a better job than most in being consistent with his data collection! :) So, in this case, the easiest route and the route guaranteed not to pollute our data is to simply drop the rows with missing values. Let's do that...

In [ ]:
data = data.dropna()
data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 356 entries, 1 to 362
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Unnamed: 0      356 non-null    object 
 1   TSS             356 non-null    float64
 2   HRV             356 non-null    float64
 3   Pulse           356 non-null    float64
 4   Stress          356 non-null    float64
 5   Sleep Qualilty  356 non-null    float64
 6   Sleep Hours     356 non-null    float64
 7   Mood            356 non-null    float64
 8   yesterday_TSS   356 non-null    float64
 9   ctl             356 non-null    float64
 10  atl             356 non-null    float64
 11  tsb             356 non-null    float64
 12  tomorrows_hrv   356 non-null    float64
dtypes: float64(12), object(1)
memory usage: 38.9+ KB

To drop the rows with NaN values from our dataframe, we just use the dropna function.

After doing that, if we run info() again, you can see that we now have 356 nice, clean, complete rows of data that we can build our model around.
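For comparison, here is a small sketch of the two options mentioned earlier, dropping rows versus filling gaps with the column median, on a made-up toy dataframe. The values are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy data with gaps (made-up values)
toy = pd.DataFrame({
    "HRV": [9.6, np.nan, 9.8, 10.1, 9.5],
    "Mood": [4.0, 3.0, np.nan, 5.0, 4.0],
})

# Option 1: drop any row that has at least one missing value
dropped = toy.dropna()

# Option 2: keep every row, filling gaps with each column's median
filled = toy.fillna(toy.median())

print(f"Rows before: {len(toy)}, after dropna: {len(dropped)}, after fillna: {len(filled)}")
print(filled)
```

Filling keeps more rows but invents values, so it tends to make more sense when a lot of data would otherwise be lost. With only a handful of gaps, as here, dropping is the cleaner choice.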

Step 3: Set aside a "test set" to test the error in your model

The next step is an important one. When we're done building our model, we're going to want to test how well it can predict HRV values that it hasn't yet seen. The easiest way to do this is to set aside a portion of our data as a 'test set' that we can use later. So, let's do that by introducing the main python machine learning library that we are going to use in this post - scikit-learn.

In addition to having a ton of different modeling algorithms that we can play with, scikit-learn has a handy function that will automatically split your data into a training set (that we use to train the model) and a test set (that we use to test the model)...

In [ ]:
from sklearn.model_selection import train_test_split

data_train, data_test = train_test_split(data, test_size=0.2, random_state=42)
print(f"Training Data: {data_train}")
print(f"Test Data: {data_test}")
Training Data:      Unnamed: 0     TSS   HRV  ...         atl        tsb  tomorrows_hrv
352  12/19/2020  150.00  10.4  ...   82.151887  23.525709            9.8
339   12/6/2020   50.00   9.5  ...   88.195062  27.284293           10.0
185    7/4/2020  269.00   9.6  ...  183.917741   2.185778           10.2
76    3/17/2020  131.16  10.4  ...  166.688421 -19.238621            9.9
343  12/10/2020   97.07  10.0  ...   87.725173  24.988325           10.4
..          ...     ...   ...  ...         ...        ...            ...
72    3/13/2020  159.14   9.5  ...  171.173584 -25.375597           10.1
108   4/18/2020  314.24   9.7  ...  191.527766 -27.379070            9.6
273   10/1/2020   82.93   9.8  ...  110.837899  31.516224           10.3
355  12/22/2020   75.03   9.4  ...   72.915314  29.241835            9.6
104   4/14/2020  138.30   9.6  ...  172.400473 -13.050223           10.1

[284 rows x 13 columns]
Test Data:     Unnamed: 0     TSS   HRV  ...         atl        tsb  tomorrows_hrv
229  8/17/2020   31.00   8.9  ...  133.463826  38.905257            9.7
43   2/13/2020  222.21   9.7  ...  170.379395 -48.329266            9.6
259  9/17/2020  110.00   9.2  ...  128.591720  26.791642            9.8
184   7/3/2020   90.46  10.2  ...  170.852084  13.254025            9.6
57   2/27/2020  104.42   9.2  ...  144.496191 -12.399438            9.7
..         ...     ...   ...  ...         ...        ...            ...
198  7/17/2020  114.87   9.5  ...  178.727358   4.881678            9.7
246   9/4/2020  114.62   9.5  ...  138.508856  24.819093            9.5
95    4/5/2020  157.81   9.7  ...  175.275027 -18.563169            9.6
258  9/16/2020   73.60   9.8  ...  131.446758  25.030127            9.2
197  7/16/2020  218.30  10.2  ...  188.533613  -3.268294            9.5

[72 rows x 13 columns]

In the above code, we applied scikit-learn's train_test_split function to split our dataset in two, setting 20% of our data aside for later testing. We also applied a 'seed' for the random state so that we can replicate that exact same split if needed in the future. This seed can be any number you like but, as a matter of tradition, 42 is selected because 42 is the answer to "the ultimate question of life, the universe and everything" in Hitchhiker's Guide to the Galaxy and, let's face it, most coders are sci-fi geeks :-)
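If you'd like to see what the seed actually buys you, here is a quick sketch on a stand-in dataframe with the same number of rows as our cleaned data. The `day` column is just a placeholder.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in dataframe with the same number of rows as our cleaned data (356)
toy = pd.DataFrame({"day": np.arange(356)})

# Same seed twice, then a different seed
train_a, test_a = train_test_split(toy, test_size=0.2, random_state=42)
train_b, test_b = train_test_split(toy, test_size=0.2, random_state=42)
train_c, test_c = train_test_split(toy, test_size=0.2, random_state=7)

print(len(train_a), len(test_a))            # sizes of the training and test sets
print(test_a.index.equals(test_b.index))    # does the same seed give the same split?
print(test_a.index.equals(test_c.index))    # does a different seed give the same split?

# No row should ever appear in both sets
overlap = set(train_a.index) & set(test_a.index)
print(f"Rows in both sets: {len(overlap)}")
```

Note that train_test_split shuffles the rows by default, so the test set is a random sample of days from across the year rather than the final stretch of the season.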

Step 4: Select your model's features

In the last post, we saw how important it is to look at the data prior to modeling. One of the things that it really helps with is determining what features we want to include in our model. Simply looking at how well correlated all of the features are to our target can be really helpful...

In [ ]:
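# Correlate only the numeric columns; recent pandas versions raise an error
# if the text date column ('Unnamed: 0') is passed to corr()
corr_matrix = data.select_dtypes(include='number').corr()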
corr_matrix['tomorrows_hrv'].sort_values(ascending=False)
Out[ ]:
tomorrows_hrv     1.000000
Sleep Qualilty    0.113452
tsb               0.035643
Mood              0.035068
Pulse             0.011324
HRV              -0.002168
yesterday_TSS    -0.026527
Sleep Hours      -0.042640
TSS              -0.043356
Stress           -0.058491
ctl              -0.074770
atl              -0.088967
Name: tomorrows_hrv, dtype: float64

For this athlete, the features most correlated with tomorrow's HRV appear to be how well they slept, the training load that they are currently under (atl and ctl, both negatively correlated: higher load = lower next-day HRV) and their self-reported life stress. While the acute load (atl, the recent average load) seems to be a better predictor than a single day's TSS, we want our model to be something that we can easily use day-to-day to determine load, so let's start with daily TSS.

So, let's start by separating our dataset into those features that we think might be good predictors (X) and the variable that we're looking to predict, in this case - tomorrow's HRV (y)

An important note: Don't be too put off by the relatively low correlation values above (and the even lower R^2 values that they imply). When dealing with machine learning, we are often using much larger data sets than is typical in inferential statistics. This leads to lower R^2 values when we look at things on a feature-by-feature basis. What we look to do in machine learning is to make up for these low R^2 values by a) combining features into one model & b) by not restricting our model to the linear demand of parametric statistics or even, in the case of many ML algorithms, to any known form.
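As a rough illustration of point a), the sketch below builds three made-up, independent features that each have only a weak relationship with a target, then compares the R^2 of each feature on its own with the R^2 of one model that uses all three. The data and effect sizes are invented for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 2000

# Three made-up, independent features that each nudge the target a little
features = rng.normal(size=(n, 3))
target = features @ np.array([0.3, -0.3, 0.3]) + rng.normal(size=n)

# R^2 of a model built on each feature alone
single_scores = []
for i in range(3):
    one_feature = features[:, [i]]
    score = LinearRegression().fit(one_feature, target).score(one_feature, target)
    single_scores.append(score)
    print(f"Feature {i} alone: R^2 = {score:.3f}")

# R^2 of one model that uses all three features together
combined_score = LinearRegression().fit(features, target).score(features, target)
print(f"All three combined: R^2 = {combined_score:.3f}")
```

Each feature explains only a sliver of the variation by itself, but the slivers add up when they carry different information. Highly correlated features (like atl and ctl) will add less than this idealized example suggests.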

In [ ]:
X_train = data_train[['Sleep Quality', 'TSS', 'Stress']]
y_train = data_train['tomorrows_hrv']
X_test = data_test[['Sleep Quality', 'TSS', 'Stress']]
y_test = data_test['tomorrows_hrv']

print(X_train)
print(y_train)

Oops. Houston we have a problem. 'Sleep Quality' is not being found in our list. Weird. If we look a little closer you can see that there is a typo in the key for 'Sleep Quality'. It is misspelled as 'Sleep Qualilty' in our dataframe. Let's fix that by renaming the column...

In [ ]:
print(data_train.columns)
# Reassign rather than using inplace=True: data_train and data_test are slices of
# `data`, and in-place edits on them trigger pandas' SettingWithCopyWarning
data_train = data_train.rename(columns={'Sleep Qualilty': 'Sleep Quality'})
data_test = data_test.rename(columns={'Sleep Qualilty': 'Sleep Quality'})
print(data_train.columns)
Index(['Unnamed: 0', 'TSS', 'HRV', 'Pulse', 'Stress', 'Sleep Qualilty',
       'Sleep Hours', 'Mood', 'yesterday_TSS', 'ctl', 'atl', 'tsb',
       'tomorrows_hrv'],
      dtype='object')
Index(['Unnamed: 0', 'TSS', 'HRV', 'Pulse', 'Stress', 'Sleep Quality',
       'Sleep Hours', 'Mood', 'yesterday_TSS', 'ctl', 'atl', 'tsb',
       'tomorrows_hrv'],
      dtype='object')

As you can see, it's super easy to rename a column in your dataset by using pandas' built-in 'rename' function. OK, now that we've done that, let's try splitting our data into our feature variables and the variable we're trying to predict again...

In [ ]:
X_train = data_train[['Sleep Quality','TSS', 'Stress']]
y_train = data_train['tomorrows_hrv']
X_test = data_test[['Sleep Quality','TSS', 'Stress']]
y_test = data_test['tomorrows_hrv']
print(X_train)
print(y_train)
     Sleep Quality     TSS  Stress
352            5.0  150.00     3.0
339            5.0   50.00     3.0
185            5.0  269.00     3.0
76             5.0  131.16     3.0
343            5.0   97.07     3.0
..             ...     ...     ...
72             5.0  159.14     3.0
108            5.0  314.24     3.0
273            4.0   82.93     3.0
355            5.0   75.03     4.0
104            5.0  138.30     3.0

[284 rows x 3 columns]
352     9.8
339    10.0
185    10.2
76      9.9
343    10.4
       ... 
72     10.1
108     9.6
273    10.3
355     9.6
104    10.1
Name: tomorrows_hrv, Length: 284, dtype: float64

Excellent! We now have our dataframe split into training data (80%) & testing data (20%). And we have each of these sets further split into the predictive features that we want to test (X) and the variable we want to predict (y). We can obviously come back to this step at a later point and try other features to see if they improve our model.

Step 5: Select the right model for the job

Now we're ready for the fun part - building a model. Once you've done all of the preceding steps to get some nice clean data to work with, you'll be surprised by just how easy scikit-learn makes this part of the process!

Scikit-learn offers a number of different machine learning algorithms to choose from depending on the type of machine learning we are doing...

Classification algorithms - predict what class or category a given input fits into.

Regression algorithms - predict a continuous variable, i.e. a number.

Clustering algorithms - take a bunch of inputs and group them together according to similarity of features

So which one are we using here?

We're predicting HRV - a continuous numeric variable, so we're dealing with a regression problem. Now, even within the subset of regression algorithms, scikit-learn has a number to choose from. It's always good policy to try the simplest model first to at least get a good baseline. For regression, one of the simplest models is linear regression. This is super easy to implement with scikit-learn...

In [ ]:
from sklearn.linear_model import LinearRegression

linear_model = LinearRegression()
linear_model.fit(X_train, y_train)
Out[ ]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
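If you're curious what the fitted model has actually "learned", a linear regression stores it in two attributes: `coef_` (one weight per feature) and `intercept_`. The sketch below shows how to read them, using a synthetic stand-in for X_train and y_train; the numbers are made up and are not this athlete's coefficients.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 300

# Synthetic stand-in for X_train / y_train (made-up numbers, not the athlete's data)
X_demo = pd.DataFrame({
    "Sleep Quality": rng.integers(1, 6, n).astype(float),   # scores of 1 to 5
    "TSS": rng.uniform(0, 300, n),
    "Stress": rng.integers(1, 6, n).astype(float),          # scores of 1 to 5
})
y_demo = (9.5 + 0.1 * X_demo["Sleep Quality"] - 0.0005 * X_demo["TSS"]
          - 0.05 * X_demo["Stress"] + rng.normal(0, 0.05, n))

demo_model = LinearRegression().fit(X_demo, y_demo)

# A linear model is just: prediction = intercept + sum(coefficient * feature)
print(f"Intercept: {demo_model.intercept_:.3f}")
for name, coef in zip(X_demo.columns, demo_model.coef_):
    print(f"{name}: {coef:+.5f} HRV units per unit of {name}")

# Rebuild one prediction by hand to confirm that this is all the model does
row = X_demo.iloc[[0]]
by_hand = demo_model.intercept_ + float(np.dot(demo_model.coef_, row.iloc[0].to_numpy()))
print(f"By hand: {by_hand:.4f}, model.predict: {demo_model.predict(row)[0]:.4f}")
```

Running the same loop on `linear_model.coef_` with `X_train.columns` would show the weights learned from the real data.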

Step 6: Take your model for a test-drive

That's it! We now have a working model that will predict what an athlete's HRV will be the next day if we give it Sleep Quality, TSS & Stress.

Let's take it for a spin...

In [ ]:
prediction = linear_model.predict([[5, 100, 1]])
print(f"Predicted HRV: {prediction}")
Predicted HRV: [10.00027664]

So, if we have the really positive scores of a sleep quality of 5/5, a training load of 100 TSS and a reported stress level of 1/5, our predicted HRV for the next day is 10.0. Given the really good sleep quality, low stress and pretty mild training load, we'd expect that an HRV score of 10.0 is probably pretty good for this athlete. Let's see how it ranks compared to their long term data...

In [ ]:
data['HRV'].describe()
Out[ ]:
count    356.000000
mean       9.783427
std        0.402430
min        7.500000
25%        9.500000
50%        9.800000
75%       10.100000
max       10.800000
Name: HRV, dtype: float64

By running the describe() function that we introduced in the last blog, we get some nice summary statistics of a given feature. Comparing our athlete's predicted 10.0 HRV with their mean value of 9.78, we can see it is very good! In fact, it's almost at the 75th percentile (10.1), i.e. close to the top 25% of all values! Looking at this, our athlete could probably handle some more load! Let's put 300 TSS into our model and see what it does to the system...

In [ ]:
prediction = linear_model.predict([[5, 300, 1]])
print(f"Predicted HRV: {prediction}")
Predicted HRV: [9.91566808]

A slight hit but, with those positive morning metrics - great sleep quality and a very low stress day, I predict he'll handle it well. Now, what if those morning metrics aren't so good?

In [ ]:
prediction = linear_model.predict([[1, 300, 4]])
print(f"Predicted HRV: {prediction}")
Predicted HRV: [9.49144252]

Now, with poor sleep quality of 1/5 and a high stress level of 4/5, our heavy training load of 300 TSS is predicted to push this athlete's HRV (9.49) just below the 25th percentile (9.5), i.e. into the bottom 25% of all values. That's not great. Maybe, under these conditions of poor sleep and high stress, we should give him a rest day of zero TSS...

In [ ]:
prediction = linear_model.predict([[1, 0, 4]])
print(f"Predicted HRV: {prediction}")
Predicted HRV: [9.61835535]

Ah, much better! Still on the low side of normal but getting closer to the athlete's normal HRV. Let's roll with that rest day today.
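Rather than guessing at loads one at a time, we can also go back to the "solve for x" idea from earlier. Because the model is linear, it can be rearranged to give the largest TSS that keeps predicted HRV at or above a chosen floor. The sketch below does that on a synthetic stand-in model; `max_tss_for_target` is a hypothetical helper, the data is made up, and the 9.8 floor is just an example threshold.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n = 300

# Synthetic stand-in for the training data (made-up numbers)
X_demo = pd.DataFrame({
    "Sleep Quality": rng.integers(1, 6, n).astype(float),
    "TSS": rng.uniform(0, 300, n),
    "Stress": rng.integers(1, 6, n).astype(float),
})
y_demo = (9.5 + 0.1 * X_demo["Sleep Quality"] - 0.001 * X_demo["TSS"]
          - 0.05 * X_demo["Stress"] + rng.normal(0, 0.05, n))
demo_model = LinearRegression().fit(X_demo, y_demo)


def max_tss_for_target(model, sleep_quality, stress, target_hrv):
    """Hypothetical helper: largest TSS whose predicted next-day HRV is >= target_hrv.

    Assumes the model was fit on the columns [Sleep Quality, TSS, Stress], in that
    order, and that the TSS coefficient is negative (more load, lower HRV).
    """
    b_sleep, b_tss, b_stress = model.coef_
    if b_tss >= 0:
        raise ValueError("TSS coefficient is not negative; load has no upper limit here")
    fixed_part = model.intercept_ + b_sleep * sleep_quality + b_stress * stress
    # target = fixed_part + b_tss * tss   ->   tss = (target - fixed_part) / b_tss
    return max(0.0, (target_hrv - fixed_part) / b_tss)


tss_limit = max_tss_for_target(demo_model, sleep_quality=5, stress=1, target_hrv=9.8)

# Feed the answer back through the model as a sanity check
check = pd.DataFrame({"Sleep Quality": [5.0], "TSS": [tss_limit], "Stress": [1.0]})
predicted_at_limit = demo_model.predict(check)[0]
print(f"TSS limit: {tss_limit:.0f}, predicted HRV at that load: {predicted_at_limit:.2f}")
```

A result of zero means that even a rest day is not predicted to reach the floor. This shortcut only works because the model is linear; with a non-linear model you would search over candidate loads instead.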

Step 7: Assess your model's error & ways to improve it

Now, before we place too much trust in our model, it pays to sit it down at the proverbial school desk and test it. You'll remember that we set aside part of our data, that our model hasn't seen yet, as a test set for this very purpose. So, let's see how it does in predicting data that it hasn't seen....

In [ ]:
import numpy as np
from sklearn.metrics import mean_squared_error
predictions = linear_model.predict(X_test)
linear_model_mse = mean_squared_error(y_test, predictions)
linear_model_rmse = np.sqrt(linear_model_mse)
print(f"Error: {linear_model_rmse}")
Error: 0.40920419592650187

To get the error of our model, scikit-learn comes to the rescue again with their mean_squared_error function. The root mean square error is a common metric used to assess the error of regression models. It provides an intuitive way to look at the error as it tells us an absolute error range (on the same scale as the metrics that we're predicting) that a majority of samples will fall between.
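To demystify the metric, here is a small sketch that computes RMSE by hand and then compares it with a naive baseline that always predicts the training-set mean. The arrays are made-up stand-ins for y_train, y_test and the model's predictions; scikit-learn's DummyRegressor is used for the baseline.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error

# Made-up HRV values standing in for y_train, y_test and the model's predictions
y_train_demo = np.array([9.8, 10.0, 10.2, 9.9, 10.4, 10.1, 9.6, 10.3])
y_test_demo = np.array([9.7, 9.6, 9.8, 9.6, 9.7])
model_preds = np.array([9.9, 9.8, 9.7, 10.0, 9.6])

# RMSE by hand: square the errors, average them, take the square root
rmse_by_hand = np.sqrt(np.mean((y_test_demo - model_preds) ** 2))
rmse_sklearn = np.sqrt(mean_squared_error(y_test_demo, model_preds))

# Baseline: always predict the training-set mean, whatever the inputs are.
# DummyRegressor ignores its features, so a column of zeros is enough.
baseline = DummyRegressor(strategy="mean")
baseline.fit(np.zeros((len(y_train_demo), 1)), y_train_demo)
baseline_preds = baseline.predict(np.zeros((len(y_test_demo), 1)))
baseline_rmse = np.sqrt(mean_squared_error(y_test_demo, baseline_preds))

print(f"Model RMSE: {rmse_by_hand:.3f} (sklearn: {rmse_sklearn:.3f})")
print(f"Baseline RMSE: {baseline_rmse:.3f}")
```

Running the same comparison on the real X_train/y_train and X_test/y_test is a useful reality check: the describe() output earlier shows an HRV standard deviation of about 0.40, so a model only earns its keep if its RMSE comes in under what the "always guess the mean" baseline achieves.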

A root mean square error of 0.41, while not horrible, is clearly not great either. This means that if our model predicts 9.6, the 'true' value will likely fall between about 9.2 and 10.0. If we look at the athlete's data, that's a pretty big range: from below the 25th percentile (9.5) to somewhere between the median (9.8) and the 75th percentile (10.1). Or, from a really bad HRV number to a decent one. But fear not, we're just getting started. There are a number of ways that we can improve the predictive power of our model:

  1. We can add more features from our dataset. To keep things simple, we only started with 3 features - sleep quality, TSS and stress. In the "real world" these models can have hundreds of features. If you add more features above (CTL, ATL, mood etc), you will see the error come down. If you're playing around with the Colab notebook, you can change the code above and play around with adding more features to see the impact that it has on the model error.

  2. We can add better features to our dataset. You'll remember from the correlation analysis that longer term training load metrics (ATL, CTL) were better correlated with HRV than pure daily load. Additionally, different types of training are going to have a more pronounced effect on HRV than any measure of pure load. We could amend our data set to include time in zone and different sports to give more fidelity than a single load number can offer. I talk more about some of the limits of using TSS based models here (https://alancouzens.com/blog/Banister_v_Neural_Network.html)

  3. We could apply a more complex, non-linear, algorithm like a decision tree, a random forest or even a neural network to our problem. The non-linear bit will significantly improve our model. As I explain in the link above, physiological processes tend to be non-linear, i.e. the HRV doesn't tend to decrease linearly with load but, instead, hits a breakpoint at a certain point. This non-linearity can't be reflected in the linear model that we're using above.
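To give a feel for the third point, here is a minimal sketch on made-up data with an exaggerated "breakpoint": HRV holds steady until load passes 200 TSS and then drops. It compares a linear model with a shallow decision tree on held-out data. The shape of the response, the size of the drop and the tree depth are all assumptions chosen for illustration, not findings from this athlete's data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
n = 1000

# Made-up "breakpoint" response: HRV is steady until load passes 200 TSS, then drops
load = rng.uniform(0, 300, n)
hrv = 10.0 - 1.0 * (load > 200) + rng.normal(0, 0.1, n)

X = load.reshape(-1, 1)
X_tr, X_te, y_tr, y_te = train_test_split(X, hrv, test_size=0.2, random_state=42)


def held_out_rmse(model):
    """Fit on the training split and return the RMSE on the held-out split."""
    model.fit(X_tr, y_tr)
    return np.sqrt(mean_squared_error(y_te, model.predict(X_te)))


linear_rmse = held_out_rmse(LinearRegression())
tree_rmse = held_out_rmse(DecisionTreeRegressor(max_depth=3, random_state=42))

print(f"Linear regression RMSE: {linear_rmse:.3f}")
print(f"Decision tree RMSE:     {tree_rmse:.3f}")
```

The straight line has to smear the drop across the whole load range, while the tree can place a split at the breakpoint. How much a non-linear model helps on real data depends on how pronounced that breakpoint really is.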

I'll look at some of these more complex algorithms in my next post.

Until then...

Train smart,

AC