About a year ago, I got into cycling. Naturally, since I’m a bit of a data nerd, I started tracking my rides so I could follow my progress. For the unaware, cycling can be a pretty expensive hobby, so starting out, I chose to use my phone for GPS tracking with Strava and RideWithGPS. I honestly had no complaints, until I wanted to add in my heart rate….
Trying to keep with what I already own, I dusted off my cracked Fitbit Versa, attempted to seal the crack, strapped it on, and tried it out. While it will record cycling activities, the resulting data only contained tracking points every 5 seconds. That’s nowhere near the 1-second resolution I’m able to get using the phone apps. While many would stop here and just choose to live with lower resolution or no heart rate, I chose to press on and make it work.
Data Flow
In order for this to work, I knew I needed both GPS location from an app on my phone plus heart rate from my Fitbit. Because neither app lets you modify the data of an existing ride, I needed to choose one app to record and another to visualize. This choice was rather easy, because I feel that RideWithGPS is better for riding and Strava is better for the social side.
My plan was to pull data from the RideWithGPS and Fitbit APIs, combine it, then format it so it could be read by Strava (TCX).
I chose to write this as a Python script. Thanks to some previous work I’d done to analyze my stats, it ended up as a management script for an existing Django project, but I’ll try to focus on the more unique aspects of this task.
Location Data from RideWithGPS
For me, GPS logging is the most critical piece of information. I really enjoy tracking and conquering my personal bests on Strava Segments, so this is my primary data source. I did not want to alter it in any way, just merge in the heart rate information where possible.
Before I even got started, I recorded a test ride with RideWithGPS, and found that it only recorded every other second, but luckily, that’s easily configured in the settings.
Their API is simple and easy to navigate. It’s possible to get a list of a user’s rides, then pull the details of an individual ride. Nested a couple of layers deep in the ride details, we’re given elevation in meters (e), time as a Unix timestamp (t), longitude (x), and latitude (y):
{
"e": 236.4815364,
"t": 1598482303,
"x": -94.4359419,
"y": 39.2367653
},
{
"e": 236.4665218,
"t": 1598482304,
"x": -94.435941,
"y": 39.2367631
},
In my script, I used requests to pull the data, resulting in a dictionary-like object. That’s fine, but Pandas is better….
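Before the DataFrame, though, here’s roughly what the pull looks like. This is a sketch: the trips endpoint path, the auth parameters, and the RWGPS_API_KEY / RWGPS_AUTH_TOKEN names are my assumptions, so check the RideWithGPS API docs for the exact scheme your account uses.
import requests

# illustrative sketch: fetch one ride's details from RideWithGPS
# (endpoint and auth parameters are assumptions, not gospel)
resp = requests.get(
    'https://ridewithgps.com/trips/123456789.json',
    params={'apikey': RWGPS_API_KEY, 'auth_token': RWGPS_AUTH_TOKEN},
)
resp.raise_for_status()
ride_detail = resp.json()  # dict with the track points nested inside
Now, on to Pandas: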
import pandas as pd

activity_df = pd.DataFrame(
    ride_detail.get('trip').get('track_points')
)
What we get is something like this:
e x y t
0 236.481536 -94.435942 39.236765 1598482303
1 236.466522 -94.435941 39.236763 1598482304
2 236.344946 -94.435940 39.236762 1598482305
3 236.278519 -94.435938 39.236762 1598482306
4 236.260699 -94.435933 39.236764 1598482307
... ... ... ... ...
4798 234.847334 -94.435919 39.236706 1598488500
4799 234.841117 -94.435915 39.236708 1598488501
4800 234.942241 -94.435913 39.236708 1598488502
4801 234.961215 -94.435914 39.236708 1598488503
4802 234.858422 -94.435909 39.236709 1598488504
So, let’s turn those integer timestamps into real timestamps with time zones. While we’re at it, let’s rename some columns to match the TCX spec:
import datetime

import pytz
from django.utils.timezone import make_aware

# get a UTC timezone object
utcz = pytz.timezone('utc')


def get_dt(time_stamp: int):
    """ convert a unix timestamp to a timezone-aware datetime """
    return make_aware(
        datetime.datetime.utcfromtimestamp(time_stamp),
        timezone=utcz,
    )


# create a new column, 'Time', and populate it
# with timezone-aware datetimes
activity_df['Time'] = activity_df['t'].apply(get_dt)

# rename some columns to match the TCX spec, and drop 't',
# which is no longer needed.
activity_df.rename(
    columns={
        "y": "LatitudeDegrees",
        "x": "LongitudeDegrees",
        "e": "AltitudeMeters",
    },
    inplace=True,
)
activity_df.drop(columns=['t'], inplace=True)
This gives us something a little more human readable:
AltitudeMeters LongitudeDegrees LatitudeDegrees Time
0 236.481536 -94.435942 39.236765 2020-08-26 22:51:43+00:00
1 236.466522 -94.435941 39.236763 2020-08-26 22:51:44+00:00
2 236.344946 -94.435940 39.236762 2020-08-26 22:51:45+00:00
3 236.278519 -94.435938 39.236762 2020-08-26 22:51:46+00:00
4 236.260699 -94.435933 39.236764 2020-08-26 22:51:47+00:00
... ... ... ... ...
4798 234.847334 -94.435919 39.236706 2020-08-27 00:35:00+00:00
4799 234.841117 -94.435915 39.236708 2020-08-27 00:35:01+00:00
4800 234.942241 -94.435913 39.236708 2020-08-27 00:35:02+00:00
4801 234.961215 -94.435914 39.236708 2020-08-27 00:35:03+00:00
4802 234.858422 -94.435909 39.236709 2020-08-27 00:35:04+00:00
Retrieving Heart Rate Data
In order to pull intraday, second-by-second data from the Fitbit API, I had to register my app as a “Personal” application, which only allows me access to my own fitness data. The data itself comes from the Heart Rate Intraday Time Series endpoint.
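The request itself is just a bearer-token GET. As a rough sketch (the variable names are mine, and OAuth token handling is omitted):
import requests

# rough sketch: pull 1-second intraday heart rate for a single day
date_str = '2020-08-26'
resp = requests.get(
    f'https://api.fitbit.com/1/user/-/activities/heart/date/{date_str}/1d/1sec.json',
    headers={'Authorization': f'Bearer {FITBIT_ACCESS_TOKEN}'},
)
hr_data = resp.json()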
The data was returned with time only, missing date and timezone information. It actually looks something like this, yikes!
"activities-heart-intraday": {
"dataset": [
{
"time": "00:00:00",
"value": 64
},
{
"time": "00:00:10",
"value": 63
}
],
"datasetInterval": 1,
"datasetType": "second"
}
Luckily, the timezone can be retrieved through the user’s profile endpoint, which returns the timezone name. In my case, ‘America/Chicago’.
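Grabbing that is one more bearer-token request, roughly like this sketch:
# sketch: fetch the profile to read the user's timezone name
resp = requests.get(
    'https://api.fitbit.com/1/user/-/profile.json',
    headers={'Authorization': f'Bearer {FITBIT_ACCESS_TOKEN}'},
)
fitbit_profile = resp.json()
# fitbit_profile['user']['timezone'] -> 'America/Chicago'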
Before going any further, let’s add a column to our first DataFrame displaying our time in the same zone as Fitbit:
# fitbit_profile is a dictionary object representing the response
# from the user's profile API endpoint;
# the timezone name here is 'America/Chicago'
fitbit_tz = pytz.timezone(
    fitbit_profile.get('user').get('timezone')
)

# add a new column, 'FitbitTZ', alongside the existing ones:
activity_df['FitbitTZ'] = activity_df['Time'].apply(
    lambda x: x.tz_convert(fitbit_tz)
)
Here, you can see the new column. It’s the same timestamp, just displayed in a different timezone. To keep everything on one line, I’ve removed the coordinate and altitude columns to showcase the time zones:
Time FitbitTZ
0 2020-08-26 22:51:43+00:00 2020-08-26 17:51:43-05:00
1 2020-08-26 22:51:44+00:00 2020-08-26 17:51:44-05:00
2 2020-08-26 22:51:45+00:00 2020-08-26 17:51:45-05:00
3 2020-08-26 22:51:46+00:00 2020-08-26 17:51:46-05:00
4 2020-08-26 22:51:47+00:00 2020-08-26 17:51:47-05:00
... ... ...
4798 2020-08-27 00:35:00+00:00 2020-08-26 19:35:00-05:00
4799 2020-08-27 00:35:01+00:00 2020-08-26 19:35:01-05:00
4800 2020-08-27 00:35:02+00:00 2020-08-26 19:35:02-05:00
4801 2020-08-27 00:35:03+00:00 2020-08-26 19:35:03-05:00
4802 2020-08-27 00:35:04+00:00 2020-08-26 19:35:04-05:00
In my testing, I found that the Fitbit Heart Rate Intraday Series wouldn’t return more than 24 hours of data, and to avoid complex logic, I just did full day data pulls for each day found in the activity.
Pandas makes it easy to get a unique list of the dates:
fitbit_dates = activity_df['FitbitTZ'].apply(
    lambda x: x.date()
).unique().tolist()
Then, for each date, we retrieve a dictionary object from the API and turn each time string into a timezone-naive datetime paired with its heart rate value. In the snippet below, hr_data is the dictionary object from the API and day is a datetime.date object:
hr_series = list(map(
    lambda x: {
        "time": datetime.datetime.combine(
            day,
            datetime.datetime.strptime(x.get('time'), '%H:%M:%S').time(),
        ),
        "value": x.get('value'),
    },
    hr_data.get('activities-heart-intraday').get('dataset'),
))
That data is then converted into yet another Pandas DataFrame, where we use django.utils.timezone.make_aware again to add the correct timezone onto the data. I’ll name it hr_df, for short.
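That step isn’t shown in the snippets above, but it amounts to roughly the following. The column renames are inferred from the output that follows, so treat this as a sketch:
# sketch: build hr_df from hr_series and localize it to the Fitbit timezone
hr_df = pd.DataFrame(hr_series).rename(
    columns={'value': 'hr', 'time': 'hr_time'}
)
hr_df['hr_time'] = hr_df['hr_time'].apply(
    lambda x: make_aware(x, timezone=fitbit_tz)
)
The resulting hr_df looks like this: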
hr hr_time
0 61 2020-08-26 00:00:04-05:00
1 60 2020-08-26 00:00:14-05:00
2 59 2020-08-26 00:00:24-05:00
3 58 2020-08-26 00:00:29-05:00
4 57 2020-08-26 00:00:34-05:00
... .. ...
5894 90 2020-08-26 20:09:26-05:00
5895 91 2020-08-26 20:09:41-05:00
5896 91 2020-08-26 20:09:56-05:00
5897 89 2020-08-26 20:10:01-05:00
5898 88 2020-08-26 20:10:06-05:00
Merging the Data
Now, we have two different data sets, both stored in Pandas DataFrames. Each contains timeseries data, but the times don’t match. Thankfully, this is easily solved with Pandas’s merge_asof method. This is like a database left join, except it will match on the nearest key rather than equal keys. Perfect!
As previously mentioned, our GPS information is the critical piece, so we’re going to use that as our left table, and merge in the heart rate data as a best fit:
activity_df = pd.merge_asof(
    activity_df.sort_values("FitbitTZ"),
    hr_df.sort_values("hr_time"),
    left_on="FitbitTZ",
    right_on="hr_time",
).drop(columns=["FitbitTZ", "hr_time"]) \
    .sort_values('Time').fillna(method='ffill')
This gives us a table with all the information we need to create the entire activity. Shown with abbreviated column names and date for spacing:
Alt Lng Lat Time hr
0 236.481536 -94.435942 39.236765 22:51:43+00:00 98
1 236.466522 -94.435941 39.236763 22:51:44+00:00 98
2 236.344946 -94.435940 39.236762 22:51:45+00:00 98
3 236.278519 -94.435938 39.236762 22:51:46+00:00 98
4 236.260699 -94.435933 39.236764 22:51:47+00:00 98
... ... ... ... ... ...
4798 234.847334 -94.435919 39.236706 00:35:00+00:00 141
4799 234.841117 -94.435915 39.236708 00:35:01+00:00 141
4800 234.942241 -94.435913 39.236708 00:35:02+00:00 141
4801 234.961215 -94.435914 39.236708 00:35:03+00:00 141
4802 234.858422 -94.435909 39.236709 00:35:04+00:00 141
Formatting For Upload
This is already a long enough article, so I’m going to keep this brief. Of the file formats Strava accepts, I chose Training Center XML (TCX), since it’s a fairly simple XML schema. Once I had the data, it was a matter of tracking down the TCX format documentation and following it.
I accomplished this using ElementTree, wrote it to a BytesIO object, then uploaded to Strava using requests.
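For reference, a stripped-down version of that last step might look like the sketch below. The TCX element names follow the published schema, but the build_tcx helper, the STRAVA_ACCESS_TOKEN name, and the single, bare-bones Lap are my simplifications (a strictly valid TCX Lap needs more fields, though Strava is fairly lenient), and the upload assumes you already have an OAuth token with write access:
import io
import xml.etree.ElementTree as ET

import requests

TCX_NS = 'http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2'


def build_tcx(activity_df) -> bytes:
    """ build a minimal single-lap TCX document from the merged DataFrame """
    ET.register_namespace('', TCX_NS)
    root = ET.Element(f'{{{TCX_NS}}}TrainingCenterDatabase')
    activity = ET.SubElement(
        ET.SubElement(root, f'{{{TCX_NS}}}Activities'),
        f'{{{TCX_NS}}}Activity',
        attrib={'Sport': 'Biking'},
    )
    start = activity_df['Time'].iloc[0].isoformat()
    ET.SubElement(activity, f'{{{TCX_NS}}}Id').text = start
    lap = ET.SubElement(activity, f'{{{TCX_NS}}}Lap', attrib={'StartTime': start})
    ET.SubElement(lap, f'{{{TCX_NS}}}TotalTimeSeconds').text = str(len(activity_df))
    track = ET.SubElement(lap, f'{{{TCX_NS}}}Track')
    for row in activity_df.itertuples():
        point = ET.SubElement(track, f'{{{TCX_NS}}}Trackpoint')
        ET.SubElement(point, f'{{{TCX_NS}}}Time').text = row.Time.isoformat()
        position = ET.SubElement(point, f'{{{TCX_NS}}}Position')
        ET.SubElement(position, f'{{{TCX_NS}}}LatitudeDegrees').text = str(row.LatitudeDegrees)
        ET.SubElement(position, f'{{{TCX_NS}}}LongitudeDegrees').text = str(row.LongitudeDegrees)
        ET.SubElement(point, f'{{{TCX_NS}}}AltitudeMeters').text = str(row.AltitudeMeters)
        hr_bpm = ET.SubElement(point, f'{{{TCX_NS}}}HeartRateBpm')
        ET.SubElement(hr_bpm, f'{{{TCX_NS}}}Value').text = str(int(row.hr))
    buffer = io.BytesIO()
    ET.ElementTree(root).write(buffer, encoding='utf-8', xml_declaration=True)
    return buffer.getvalue()


# upload the file to Strava's v3 uploads endpoint
requests.post(
    'https://www.strava.com/api/v3/uploads',
    headers={'Authorization': f'Bearer {STRAVA_ACCESS_TOKEN}'},
    data={'data_type': 'tcx', 'name': 'Merged ride'},
    files={'file': ('ride.tcx', build_tcx(activity_df))},
)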
Success!
A ride with heart rate data!
Takeaways
In general, most fitness apps function like walled gardens. Depending on how your activities are tracked, it may be very difficult to get a complete picture of everything. As a data nerd, I find this incredibly frustrating.
This was a fun problem to solve, as I got to punch a hole through some of these walls. I’ve been using this script for about a week without any issues. Unfortunately, I think it’s a bit of a temporary situation. There are a lot of moving pieces, and many different things could affect the data availability.
As for Fitbit, I honestly can’t recommend them for anything other than casual step tracking. In 2018, I had the opportunity to pick up a Fitbit Versa through a former employer’s health plan, but found its smartwatch capabilities lackluster and the hardware fragile. Disappointingly, the screen cracked in a minor tumble about a week after I got it. I get nothing out of their health dashboard either, so I use it like a dumb watch that also records my heart rate.
I actually plan on buying into a device ecosystem that already has everything I need all in one place. So, essentially, I’ll be throwing money at the problem, but I think it’ll give me a much better experience. In the long run, it’ll be money well spent for my new favorite hobby.