Just how useful is Heart Rate Variability?

Alan Couzens, M.Sc. (Sports Science)

After some interesting Twitter discussion this weekend around the topic of Heart Rate Variability, I figured it might be good to put together a short post on the topic above - i.e. just how useful and how accurate is heart rate variability? And the inevitable follow-up question - is it really worth the 'hassle' of donning a strap when I wake up every morning to take the measure? TL;DR version - Yes! Non-TL;DR version - read on...

As you can see from the Twitter threads...

...it turns out that HRV is a surprisingly polarizing topic! Feedback from folks with experience taking HRV measures ranged from "This didn't tell me anything I didn't already know" to "HRV didn't line up with how I was feeling at all" to "HRV provided an additional objective measure that helped me learn more about myself as an athlete." So, which is right?

If you've read my blog for any length of time, you will know by now that I am a fan of using heart rate variability measures to guide training. So, I figured in this post I would dive a little more into *HOW* I use those measures and what I have observed after ~15 years(!) of HRV analysis.

1. Is HRV sufficiently 'stable' to provide a valid measure of the current state of our Autonomic Nervous System?

First of all, it is worth pointing out that HRV is a stochastic measure, i.e. it is very responsive to even the smallest change, and this can make it appear 'noisy'. Very similar to a cycling power file, if you look at raw RR data you will see a good amount of beat-to-beat variation... e.g. my beat-to-beat file from this morning's test below...

While my resting heart rate stayed pretty constant at 48bpm, the time between beats varied all the way from ~980ms to ~1390ms (a difference of ~40%). It should be noted that my HRV was low this morning, so it is often even more 'wiggly'! In some ways it looks a bit like a bike power file, doesn't it? Quite a bit of spikiness. It is in this 'spikiness' that HRV shows its strength over pure resting heart rate data: in a similar way that a bike power file shows more second-to-second sensitivity than a corresponding heart rate file, this sensitivity is the real strength of HRV. However, also like a power file, this variability comes with its challenges and makes interpretation more difficult than interpretation of pure resting heart rate numbers.
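As an aside, the time-domain numbers referenced throughout this post (rMSSD, SDNN) are just simple statistics computed over this raw beat-to-beat data. A minimal sketch - the RR values below are made up for illustration, not my actual file:

```python
import numpy as np

def rmssd(rr_ms):
    """Root mean square of successive differences between RR intervals (ms)."""
    diffs = np.diff(np.asarray(rr_ms, dtype=float))
    return float(np.sqrt(np.mean(diffs ** 2)))

def sdnn(rr_ms):
    """Standard deviation of all RR intervals (ms)."""
    return float(np.std(np.asarray(rr_ms, dtype=float)))

# Hypothetical snippet of beat-to-beat data (ms) - illustrative only
rr = [1250, 1180, 1320, 990, 1390, 1100, 1240, 1010]
print(round(rmssd(rr), 1), round(sdnn(rr), 1))
```

Note that rMSSD is driven by beat-to-beat 'spikiness' (successive differences), while SDNN reflects overall spread - which is why the two can diverge on any given morning.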

2. Does HRV show sufficient test-retest reliability to be useful?

If I were to prescribe a time trial for you and took 2 separate one-minute random samples from within the time trial power file, chances are that the power numbers would be significantly different from each other, while the heart rate samples might be quite similar. In this case, we know that the power numbers, while more responsive, are also more 'true' to the output at that moment in time. HRV is very similar. If you take two consecutive tests, chances are that while heart rate may be very similar between the two, HRV may be somewhat different. I did just that this morning and my own results are below...

          RHR    rMSSD    SDNN    LF      HF
Test 1    47     56       91      5013    1908
Test 2    48     66       77      3304    2791

While resting heart rate was very similar between the two, the HRV numbers were somewhat different for the 2 consecutive tests. The rMSSD numbers were 10ms (~18%) different, despite no apparent difference in the conditions of the test. Incidentally, the LF and HF numbers were up to 50% different for the 2 tests - again speaking to the importance of longer tests if using frequency domain measures. These differences between the 2 tests could very well come down to some acute difference in my state during the first test - was I thinking about what I had to do today? Was I a little colder due to the blankets not covering me completely? All of these acute sources of stress & variability, while valid, are not the reason that I'm taking the test. What I really want to know is, how does my *chronic* state compare to recent values, i.e. is my system ready to do, and ready to respond to, some good training load? And the related question... is heart rate variability sufficiently discriminative to help me answer those questions OR does the acute 'noise' mask the chronic 'signal' to the extent that it is of no practical use?
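One common way to pull the chronic 'signal' out of the acute 'noise' - a standard approach in the HRV literature, not necessarily my exact implementation - is to compare a rolling weekly average of ln(rMSSD) against a 'normal range' built from a longer baseline:

```python
import numpy as np

def chronic_state(ln_rmssd_history, window=7, baseline=60, k=0.5):
    """Compare a rolling weekly mean of ln(rMSSD) to a 'normal range'
    (baseline mean +/- k * SD). Returns -1 (suppressed), 0 (normal), +1 (elevated).
    The window/baseline/k values here are illustrative defaults."""
    x = np.asarray(ln_rmssd_history, dtype=float)
    base = x[-baseline:]                  # longer-term baseline
    lo = base.mean() - k * base.std()
    hi = base.mean() + k * base.std()
    weekly = x[-window:].mean()           # smooths out day-to-day noise
    if weekly < lo:
        return -1   # chronically suppressed: consider backing off
    if weekly > hi:
        return +1   # chronically elevated
    return 0        # within normal range: train as planned
```

The key idea is that a single morning's number is never acted on by itself - only the weekly average relative to the athlete's own normal range.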

3. Is the 'signal:noise ratio' sufficient for HRV to provide predictive and actionable value in session planning?

To answer this, we need to go a step further and investigate to what extent (if any) heart rate variability explains the variance in self-reported fatigue and training response over time. If you record how you're feeling during training sessions (e.g. via the TrainingPeaks 'smiley' scale) and you record your heart rate variability, you can perform a simple linear regression to see just how much of the variance in how you're feeling is explained by your HRV. In Python this is as easy as...

from sklearn import linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Note: my_HRV_score must be 2-D (n_samples x 1) for scikit-learn
model = linear_model.LinearRegression()
model.fit(my_HRV_score, my_subjective_feel)
feel_predictions = model.predict(my_HRV_score)

# R^2: proportion of variance in 'feel' explained by HRV
print(r2_score(my_subjective_feel, feel_predictions))
# RMSE: typical prediction error, in points on the 'feel' scale
print(mean_squared_error(my_subjective_feel, feel_predictions) ** 0.5)

When I run the above on my own data, I get an explained variance (R²) of 18% for rMSSD and a typical error (RMSE) of +/-0.6. In practice, this means that if I am predicting how I am going to feel in a session based purely on HRV, most of the time I will be within +/-0.6 on a 5 point scale from...

  • 1. horrible
  • 2. bad
  • 3. OK
  • 4. good
  • 5. excellent
In other words, on some days, I could feel OK (3) and the model would predict that I would feel bad (2.4). On others, I could feel bad (2) and the model would predict that I should be OK (2.6). On rare days (~1 in every 25 days) my HRV model will be a whole 2 points off, i.e. based on my HRV alone, the model will predict that I feel OK (2.8) when I really feel horrible (1) and vice versa!

A long way of saying that I completely understand those people who say that their impression after experimenting with HRV is that it didn't line up with how they were feeling. Our brain does an especially good job of weighing more heavily those days where reality doesn't match our prediction & those relatively rare days where my HRV says that I am going to feel horrible and I actually feel good are going to leave an impression!

In fact, they might leave enough of an impression to have you asking, "Why do you even want to predict how I'm going to feel? Why don't I just tell you how I'm actually feeling?" And the reason is simple - I want to head off those situations that would have you feeling like crap *before* you actually feel like crap :-) This is true whether we're looking at the session level, i.e. I want to avoid 'failed' sessions as much as possible, or the long term level, i.e. I want to catch the situations that lead to an athlete feeling poorly for an extended period before we over-reach and get to that point.

So, depending on your perspective, +/-0.6 - i.e. correctly predicting how the athlete is going to feel for a given session about half the time - might be 'good enough' to be useful. However, we can do better, a lot better...

By including both subjective and objective measures in our model we can *significantly* improve our predictions of both how a session 'will go' and how an athlete will respond to that session. By including additional measures such as training load, mood, soreness, motivation, & life stress in our model, we can explain ~85% of the variance in day to day scores and improve the accuracy of the prediction to ~+/-0.2, i.e. given sufficient data, on ~24 of 25 days it will correctly predict how the athlete is going to feel! I can honestly say that, with the amount of data that I have now accumulated, it is very rare that the model gets it wrong &, in fact, it is more common that the model knows me better than I know myself!
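For the curious, the multi-measure model can be sketched the same way as the single-measure one - just with more columns. Everything below (the column choices, the synthetic numbers, the coefficients) is illustrative only, not my actual data or model:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Fabricated daily log, purely to make the sketch runnable
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.normal(60, 10, n),    # rMSSD (ms)
    rng.normal(80, 20, n),    # yesterday's training load (e.g. TSS)
    rng.integers(1, 6, n),    # mood (1-5)
    rng.integers(1, 6, n),    # soreness (1-5)
    rng.integers(1, 6, n),    # life stress (1-5)
])
# Synthetic 'feel' score with a known relationship to the inputs
y = 0.02 * X[:, 0] - 0.01 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(0, 0.2, n)

model = LinearRegression().fit(X, y)
r2 = r2_score(y, model.predict(X))
print(r2)
```

With several weakly correlated inputs, each column soaks up a different slice of the day-to-day variance - which is the intuition behind the jump from ~18% explained variance (HRV alone) to much higher figures for the combined model.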

########

So, going back to the original question of "how useful is HRV?": in my experience, provided a sufficient amount of data is accrued, as a stand-alone measure it is still useful enough to steer an athlete away from most of those sessions that are likely to be, at best, non-productive &, at worst, injurious. However, with the simple addition of some other measures, its usefulness increases dramatically, to the point that the model knows the athlete well enough that it almost always prescribes the right session - a session that will have the athlete feeling good and, just as importantly, a session that will productively add fitness. This second consideration - that the athlete not only feels good doing the session but also gets a good fitness response from it - is equally important, and the two don't always go together, i.e. you can feel amped and ready to go but not be in a good position to recover from and respond to a hard session, and vice versa. HRV can be especially helpful when these two considerations are at odds. I'll go into that in some more detail in a future blog.

Until then...

Train smart,

AC