Hello! Has it really been 8 months since I last posted? My intent to keep the blog updated certainly didn’t fan out over the summer! Though now the cooler weather is here, I have more time on my hands so I intend on changing my habbits!

Firstly, I apologise for the new look! I’m still getting the hand of WordPress themes. I’m a fan of the minimalist look, though this “Twenty fourteen” certainly looks a little too spartan! I’ve never been great at the “Making it pretty” part of web design, so stick with me an hopefully things will improve!

The thing I am better at is back end coding. I particularly enjoy problem solving, so when the team behind the cycling club came to me with a suggestion that one of the site functions could do with some improvements in speed, I jumped at the chance!

The function in question relates to validating GPS trails – the majority of the rides organised are attended my masses of riders who start at a given point, visit certain “Checkpoints” to get a receipt, signature, stamp or some other proof of visit, and then (though not always) return to the start. These are the types of ride I attend, though another way of gaining points is to ride a “DIY Permanent” whereby you track your progress using a GPS device and then submit the trail, which is checked against known records of the trail.

Using libraries of functions, this part of the task is fairly straightforward and is carried out by visiting each point of the expected ride, finding the closes point from the riders trail and comparing that to within a certain level tolerance. The problem comes when there are several of these GPS trails to be checked and the riders have recorded a huge number of points along the way! One example I used was a 200km trail that had a suggested GPS trail of just over 1000 points.

The user had submitted 67,000 points along their journey! Although running locally, the software managed the impressive task within 40 seconds, I’m told that wasn’t the biggest that has been seen!

So how could the times be improved? The task is currently pretty much sequential, so only one processing core is being used so one option would be to break up the task into smaller chunks to allow for some parallel processing of the trail. Though given the cost of servers with multiple accessible cores over our current offering, I didn’t give this much consideration.

So how to simplify the task? Researching “Line simplification” may lead you to Ramer, Douglas & Peucker’s algorithm. Specifically wikipedia’s entry: https://en.wikipedia.org/wiki/Ramer-Douglas-Peucker_algorithm.

So how does it work? (For people not wanting to look just yet…) You take the start and finish points, draw a line between, and then find the point that is furthest from this line… If this point is outside of the tolerance required, mark this point as a mid point and create two new lines meeting here. Repeat this process for the new lines created, et voila! You eventually get lines representing the original route, with slightly decreased resolution but potentially a huge compression rate!

And the results of carrying out the filtering? The trail was reduced from 67,000 points to an impressed 1,200! Running the same path validation algorithm over this new track resulted in an execution speed of just 10 seconds! a 400% improvement!

Unfortunately it wasn’t all good news. As you can imagine, comparing all these points is still a computing intensive activity and took a full 20 seconds to carry out. Still, 30 seconds total compared to 37 was a 24% improvement.

So, the simplest option that was considered, of simply “missing out points” from the control trail currently looks the best option in terms of implementation ease and improvement (Using 1 in 4 points on the control trail and comparing against the full path of 67,00 user points took 17 seconds, incidentally) though we are trying one last ditch attempt to get our riders to tone down the data sampling!

And just in case anyone is interested in the Java implementation of the algorithm (you’ll need your own implementation of lineToPoint()):

private List <Coordinate> trim(List <Coordinate> coords, int epsilon) { double dmax = 0; int index = 0; int end = coords.size() - 1; // Find furthest point from line start to end List<Coordinate> theLine = new ArrayList(); theLine.add(coords.get(0)); theLine.add(coords.get(end)); for (int i = 1; i < end - 1; i++) { double dist = lineToPoint(theLine, coords.get(i)); if (dist > dmax) { dmax = dist; index = i; } } List <Coordinate> resultList = new ArrayList(); // If max distance is greater than epsilon, recursively simplify if ( dmax > epsilon ) { List <Coordinate> recResults1 = trim(coords.subList(0, index), epsilon); List <Coordinate> recResults2 = trim(coords.subList(index, end), epsilon); // Build the result list resultList.addAll(recResults1); resultList.remove(resultList.size()-1); resultList.addAll(recResults2); } else { resultList.add(coords.get(0)); resultList.add(coords.get(coords.size()-1)); } return resultList; }