I’ve had a Strava API key for a while, which I’ve used sporadically out of curiosity. While the summary data the API returns for public information is great, the best stuff requires a user to authenticate (even though the same information is displayed on the web interface for all to see).
Having moved up the Caerphilly valleys, I wondered which routes cyclists preferred to get from Cardiff to Caerphilly. I’ve ridden there plenty of times, but only because there was a hill, road or trail on the way and never specifically to get to Caerphilly.
Which routes do Strava members use?
Essentially we’re looking to collect activities which pass through Cardiff and then Caerphilly. The activity doesn’t necessarily have to start in Cardiff, and doesn’t necessarily have to end in Caerphilly, just as long as it passes through both places. However, we need something specific to start from, to build our list.
So I chose one of the most popular north-heading segments in Cardiff centre, along the A470 near the Royal Welsh College of Music and Drama. This segment has the added bonus that cyclists here use both the main road and the adjacent cycle path, so it is likely to appear in the activities of riders using either. It also potentially captures different types of rider: those who are more comfortable on the road, and those who are more comfortable off it.
To date, Strava has recorded over 3,000 different riders making over 30,000 ‘efforts’ on this segment. Each effort is part of an activity, so from each effort we can obtain an activity id. As one would probably expect, the API won’t let us download all 30,000 in one go, only in batches of 200.
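A minimal sketch of that batching in Python, not the exact script used here. `fetch_page` is a hypothetical callable standing in for whatever makes the HTTP request for one page of segment efforts, and the sketch assumes each effort comes back as a dict holding its parent activity's id under `effort["activity"]["id"]`.

```python
def collect_activity_ids(fetch_page, per_page=200, max_pages=10):
    """Collect unique activity ids from paged segment efforts.

    fetch_page(page, per_page) is a hypothetical callable returning a list
    of effort dicts for that page (an empty or short list on the last page).
    """
    seen = set()
    ordered = []
    for page in range(1, max_pages + 1):
        efforts = fetch_page(page, per_page)
        for effort in efforts:
            # Assumed response shape: {"activity": {"id": ...}, ...}
            activity_id = effort["activity"]["id"]
            # One rider can log several efforts in a single activity,
            # so only keep the first sighting of each activity id.
            if activity_id not in seen:
                seen.add(activity_id)
                ordered.append(activity_id)
        if len(efforts) < per_page:
            break  # a short page means there is nothing left to fetch
    return ordered
```

Passing the fetcher in as an argument keeps the paging logic separate from authentication and rate limiting, which are worth handling in one place.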
To begin with I downloaded around 10 of these batches of 200, covering 2010 to the present. Now that we have over 2,000 activities which we know pass through Cardiff heading north, we need to cut them down to just those which pass through Caerphilly town after they have passed through Cardiff.
Authenticated users can obtain a wealth of information about their own activities, including detailed raw data of lat/long co-ordinates, time, speed, gradient, heart rate and so on. But as I am only obtaining public data for other users’ activities, I am left with summary data, which includes a summary polyline string. These summary polylines are deliberately simple and contain far fewer points than the raw ride data. Still, they’re a decent start for discarding rides where no points fall within a bounding box drawn over our destination area, Caerphilly.
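The bounding box check can be sketched roughly as follows, assuming the summary polyline uses Google's encoded polyline format at five decimal places. The box co-ordinates for Caerphilly are rough, hypothetical values chosen for illustration, not the ones used for the map.

```python
def decode_polyline(encoded, precision=5):
    """Decode an encoded polyline string into a list of (lat, lon) tuples."""
    points = []
    index = lat = lon = 0
    factor = 10 ** precision
    while index < len(encoded):
        deltas = []
        for _ in range(2):  # each point stores a latitude delta, then a longitude delta
            shift = result = 0
            while True:
                byte = ord(encoded[index]) - 63
                index += 1
                result |= (byte & 0x1F) << shift
                shift += 5
                if byte < 0x20:  # no continuation bit: this value is complete
                    break
            # The lowest bit carries the sign; the remaining bits the magnitude.
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lon += deltas[1]
        points.append((lat / factor, lon / factor))
    return points


def in_box(point, box):
    """True if (lat, lon) lies inside box = (south, west, north, east)."""
    lat, lon = point
    south, west, north, east = box
    return south <= lat <= north and west <= lon <= east


def passes_through(points, box):
    """True if at least one decoded point falls inside the box."""
    return any(in_box(p, box) for p in points)


# Hypothetical rough box around Caerphilly town: (south, west, north, east).
CAERPHILLY_BOX = (51.56, -3.25, 51.60, -3.19)
```

Because summary polylines are sparse, a ride can cross a small box without leaving a point inside it, so the box is better drawn generously.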
After some more cleaning up (removing activities which ran in the opposite direction, Caerphilly to Cardiff, and creating new polylines which discard ride data before Cardiff and after Caerphilly), we’re left with a bunch of polylines we can stick on a Google map.
The above image is a screenshot; as I can’t embed Google Maps on this WordPress account, I’ve hosted it here.
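The direction filter and trimming described above could be done along these lines. This is one possible approach under stated assumptions, not necessarily how the map was built: both boxes are hypothetical rough values, and a ride counts only if a point inside the Cardiff box is followed later by a point inside the Caerphilly box.

```python
# Hypothetical rough boxes, each (south, west, north, east).
CARDIFF_BOX = (51.47, -3.20, 51.50, -3.16)
CAERPHILLY_BOX = (51.56, -3.25, 51.60, -3.19)


def in_box(point, box):
    """True if (lat, lon) lies inside box = (south, west, north, east)."""
    lat, lon = point
    south, west, north, east = box
    return south <= lat <= north and west <= lon <= east


def trim_between(points, start_box, end_box):
    """Return the part of a ride from start_box to end_box, or None.

    Keeps the points from the first one inside start_box up to the first
    later one inside end_box. Rides that never reach end_box after
    start_box (for example, rides in the opposite direction) give None.
    """
    start = next((i for i, p in enumerate(points) if in_box(p, start_box)), None)
    if start is None:
        return None
    for end in range(start + 1, len(points)):
        if in_box(points[end], end_box):
            return points[start:end + 1]
    return None
```

A ride that loops from Caerphilly to Cardiff and back again would still be kept, with only its northbound leg surviving the trim.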
No surprise that some of the most popular routes include the Taff Trail and the various roads over Caerphilly mountain. Each route varies, however: from the busy (but quick) main roads out of Cardiff, to the mixed-surface Taff Trail, to the quieter but steeper roads above Rhiwbina. Which types of rider may favour which routes? Novice riders, competitive road cyclists, mountain bikers, commuters?
This data would probably be better displayed as a heatmap, similar to those Strava creates for individual users, or for all rides. My map would differ, however, in only showing rides specifically from one area to another, though I’m sure this is the sort of thing others have tried before. Other things that could be done might include comparing routes by average speed, gradient, surface, time of day, whether the commute flag is set, type of ride and so on. Perhaps Strava Metro data could help here; the Urban Big Data Centre have a pretty nifty ‘Glasgow in Motion’ site which uses Strava Metro data amongst other things.