Fall has greeted us. Winter will soon bless us. Before you know it, New Year’s resolutions will be crafted and put to the test. Before that happens, though, the hour of reflection will dawn on us.

As we get ready to close down for the year, I thought I’d do what most of us do – reflect on the events of 2019. Since I spent quite a bit of time hiking this year, I thought it would be a fun little exercise to look at all my past hikes. However, instead of sticking to 2019, I decided to go back as far as I had data: August of 2017.

Getting the Data

Although I started hiking in 2016, I only had data starting in late 2017 – this is when I got my Garmin Fenix 3 HR watch, which I have used to provide hiking stats and GPS data in the past. Ever since then, I have been recording all of my hikes, so the first step here was to gather all of my hiking data into one location. Luckily, Garmin provides simple CSV export functionality online through Garmin Connect. I simply logged into my account to download a flat-file of all my hikes.

Analyzing the Data

Once I had the data in CSV format, I used pandas to read the data so that I could start to get a sense of my stats.
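A minimal sketch of this loading step might look like the following. The column names, the "--" placeholder for missing values, and the sample rows are all assumptions made for illustration (an inline string stands in for the exported file), so adjust them to match the real export.

```python
import io

import pandas as pd

# Hypothetical stand-in for the exported CSV; every value here is made up.
SAMPLE_CSV = """Activity Type,Date,Title,Distance,Calories,Time,Elev Gain
Hiking,2017-08-12 09:15:00,Hike A,5.20,"1,100",03:05:10,"1,250"
Hiking,2017-09-02 08:40:00,Hike B,7.85,"1,640",04:30:45,"2,010"
Hiking,2017-09-23 10:05:00,Hike C,3.10,--,01:50:00,640
"""


def load_hikes(source):
    """Read an activity export (file path or file-like object) into a DataFrame."""
    return pd.read_csv(
        source,
        parse_dates=["Date"],  # parse the timestamp column into datetimes
        thousands=",",         # read "1,100" as the number 1100
        na_values=["--"],      # assumed placeholder for missing values
    )


hikes = load_hikes(io.StringIO(SAMPLE_CSV))
print(hikes.shape)
print(hikes.dtypes)
```

With a real export, the same helper could be called with a file path instead of the in-memory string.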

I wanted to get a feel for how the data really looked. There were 98 hikes total, which comes to an average of about 3.8 hikes per month. I was interested in things like total distance hiked, total moving time spent on the trail, total vertical ascension, and total calories burned.
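As a rough sketch, headline numbers like these could be computed along the following lines. The column names and the four sample rows are hypothetical, and counting calendar months inclusively is just one reasonable way to get a hikes-per-month figure.

```python
import pandas as pd

# Made-up sample rows for illustration, not the real export.
hikes = pd.DataFrame({
    "date": pd.to_datetime(["2017-08-12", "2017-09-02", "2017-11-18", "2018-01-06"]),
    "distance_mi": [5.2, 7.9, 3.1, 6.4],
    "time": ["03:05:10", "04:30:45", "01:50:00", "03:40:20"],
    "elev_gain_ft": [1250, 2010, 640, 1480],
    "calories": [1100, 1640, 720, 1330],
})

# Convert "HH:MM:SS" strings into fractional hours so they can be summed.
hikes["hours"] = pd.to_timedelta(hikes["time"]).dt.total_seconds() / 3600

n_hikes = len(hikes)

# Count the calendar months spanned by the data, first and last included.
first, last = hikes["date"].min(), hikes["date"].max()
months = (last.year - first.year) * 12 + (last.month - first.month) + 1

totals = hikes[["distance_mi", "hours", "elev_gain_ft", "calories"]].sum()

print(f"{n_hikes} hikes over {months} months: {n_hikes / months:.1f} per month")
print(totals.round(1))
```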

In this first block of code, I’m renaming some of my columns from the Garmin default so that the data is easier to understand. I then drop blank values so that they don’t skew the metrics I will later calculate. Finally, I aggregate the data based on my hiking location. From here, we can quickly see that I’ve spent the most time hiking in Virginia.
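The steps described above (rename, drop blanks, aggregate by location) can be sketched roughly as follows. This is not the original code: the "Location" column, the renamed column names, the "Elsewhere" label, and all of the sample values are assumptions for illustration.

```python
import pandas as pd

# Made-up rows; "Location" is assumed to be a column added to the export by hand.
raw = pd.DataFrame({
    "Location": ["Virginia", "Virginia", "Elsewhere", "Virginia", "Elsewhere"],
    "Distance": [5.2, 7.9, 3.1, None, 4.0],
    "Elev Gain": [1250, 2010, 640, 900, 800],
    "Calories": [1100, 1640, 720, 950, 860],
})

# Hypothetical mapping from export headers to friendlier names.
RENAMES = {
    "Location": "location",
    "Distance": "distance_mi",
    "Elev Gain": "elev_gain_ft",
    "Calories": "calories",
}

hikes = (
    raw.rename(columns=RENAMES)
    .dropna()  # drop rows with blank values so they do not skew later metrics
)

# Total each metric per location, largest total distance first.
by_location = (
    hikes.groupby("location")[["distance_mi", "elev_gain_ft", "calories"]]
    .sum()
    .sort_values("distance_mi", ascending=False)
)
print(by_location)
```

Dropping rows before aggregating means a hike with one missing field is excluded from every total, which keeps the per-location sums consistent with each other.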

This chart shows total hike metrics, grouped by location of the hike

Visualizing Variable Correlations

The chart already hints at how these metrics move together, but we can use matplotlib to visualize the correlations between the variables directly.
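One way to sketch such a plot is with `pandas.plotting.scatter_matrix`, a pandas helper that draws a grid of pairwise scatter plots using matplotlib. Choosing that helper is an assumption, as are the column names and sample values below.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so this runs without a display; omit in a notebook

import pandas as pd
from pandas.plotting import scatter_matrix

# Made-up sample rows for illustration only.
hikes = pd.DataFrame({
    "distance_mi": [5.2, 7.9, 3.1, 6.4, 9.8, 4.5],
    "hours": [3.1, 4.5, 1.8, 3.7, 6.0, 2.6],
    "calories": [1100, 1640, 720, 1330, 2150, 940],
    "elev_gain_ft": [1250, 2010, 640, 1480, 2600, 700],
})

# Pairwise Pearson correlation coefficients as a table.
corr = hikes.corr()
print(corr.round(2))

# Grid of pairwise scatter plots, with a histogram of each variable on the diagonal.
axes = scatter_matrix(hikes, figsize=(8, 8), diagonal="hist")
fig = axes[0, 0].get_figure()
fig.suptitle("Pairwise relationships between hike metrics")
fig.savefig("hike_correlations.png")
```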

This scatter plot shows the correlations between hiking distance, time, calories, and elevation gain.

Visualizing Dispersion Within the Dataset

Sums and correlations are fine, but what about dispersion within our dataset? We can use matplotlib for this as well.
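A minimal sketch of paired boxplots with matplotlib might look like this. The column names and sample values are hypothetical, and the metrics are grouped two per panel only so that variables on similar scales share an axis.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so this runs without a display; omit in a notebook

import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample rows for illustration only.
hikes = pd.DataFrame({
    "distance_mi": [5.2, 7.9, 3.1, 6.4, 9.8, 4.5],
    "hours": [3.1, 4.5, 1.8, 3.7, 6.0, 2.6],
    "calories": [1100, 1640, 720, 1330, 2150, 940],
    "elev_gain_ft": [1250, 2010, 640, 1480, 2600, 700],
})

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: the two metrics measured in the thousands.
left = ax_left.boxplot(
    [hikes["calories"].to_numpy(), hikes["elev_gain_ft"].to_numpy()]
)
ax_left.set_xticklabels(["Calories", "Elevation gain (ft)"])

# Right panel: the two metrics measured in single digits.
right = ax_right.boxplot(
    [hikes["distance_mi"].to_numpy(), hikes["hours"].to_numpy()]
)
ax_right.set_xticklabels(["Distance (mi)", "Time (hours)"])

fig.savefig("hike_boxplots.png")

# The line inside each box is the median, which can also be read off directly.
print(hikes.median())
```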

Boxplot of hiking calories and elevation gain
Boxplot of hiking distance and time

This quickly shows us that, between 2017 and 2019, my hikes typically exceeded the following marks:

  • 1,256 calories
  • 1,371 feet of elevation gain
  • 5.75 miles
  • 3.43 hours

There are other ways to show this data, but I really like the boxplot function because it makes the mins, maxes, and outliers within the dataset easy to see at a quick glance. This really helps me get a baseline for the quality of the data. For instance, if the data showed that one of my hikes was over 50 miles, I would know that I must have simply forgotten to stop recording after getting back into my car and driving off from the trailhead. Remember, multi-day hikes and backpacking trips are broken up into “tracks” of typically one track per day. These tracks are recorded as individual hikes in this dataset.
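The same sanity check can also be done without a chart. The sketch below flags any hike over the 50-mile mark mentioned above; the column names and sample rows are made up, and the threshold is only a rule of thumb.

```python
import pandas as pd

MAX_PLAUSIBLE_MILES = 50  # rule-of-thumb threshold from the example above

# Made-up sample rows; "Hike C" plays the part of a recording left running in the car.
hikes = pd.DataFrame({
    "title": ["Hike A", "Hike B", "Hike C", "Hike D"],
    "distance_mi": [5.2, 7.9, 63.4, 6.4],
})

# Rows whose distance is implausibly long for a single day's track.
suspect = hikes[hikes["distance_mi"] > MAX_PLAUSIBLE_MILES]

# Everything else is kept for analysis.
clean = hikes.drop(suspect.index)

print(suspect)
print(f"{len(clean)} of {len(hikes)} hikes kept")
```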

Visualizing Combinations of Data

The inspiration for this final data visualization method is straightforward. I wanted to look at all of the objective hiking data in one chart so that I could answer the questions that have been bugging me since I started this project:

  • How many hikes have I done?
  • Where have I hiked, and with what frequency?
  • How long were my hikes, and how often were my hikes long or short?
  • How much did I climb, and how often were my hikes steep?

I used seaborn to plot a FacetGrid, which allows us to map variables within a common group. This accomplished my goal with elegance.
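A rough sketch of a FacetGrid with one panel per location might look like the following. The column names, the "Elsewhere" label, and the sample values are assumptions for illustration, and the layout options shown are only one possible choice.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so this runs without a display; omit in a notebook

import pandas as pd
import seaborn as sns

# Made-up sample rows for illustration only.
hikes = pd.DataFrame({
    "location": ["Virginia", "Virginia", "Virginia", "Elsewhere", "Elsewhere"],
    "distance_mi": [5.2, 7.9, 9.8, 3.1, 4.5],
    "elev_gain_ft": [1250, 2010, 2600, 640, 700],
})

# One panel per location; each point is a single hike.
g = sns.FacetGrid(hikes, col="location", col_wrap=2, height=3)
g.map_dataframe(sns.scatterplot, x="distance_mi", y="elev_gain_ft")
g.set_axis_labels("Distance (mi)", "Elevation gain (ft)")
g.set_titles(col_template="{col_name}")
g.savefig("hikes_by_location.png")
```

Because every panel shares the same axes by default, the number of points shows how often each location was hiked, while their position shows how long and how steep those hikes were.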

This scatter plot shows total hiking elevation gain and distance, per hike, across locations

This is essentially a hike-by-hike visualization of the data shown in the very first chart of this blog, ignoring calories burned and time spent hiking (since the latter two are subjective and will vary from hiker to hiker). Nevertheless, we can reconstruct the same visualization using these two metrics as well:

This scatter plot shows total hiking duration and calories, per hike, across locations

For the next project, I plan to show some other visualization techniques using Python. Stay tuned!

