Category Archives: Ubuntu

An update! And interesting algorithms…

Hello! Has it really been 8 months since I last posted? My intent to keep the blog updated certainly didn’t pan out over the summer! Though now the cooler weather is here, I have more time on my hands, so I intend to change my habits!

Firstly, I apologise for the new look! I’m still getting the hang of WordPress themes. I’m a fan of the minimalist look, though this “Twenty Fourteen” theme certainly looks a little too spartan! I’ve never been great at the “making it pretty” part of web design, so stick with me and hopefully things will improve!

The thing I’m better at is back-end coding. I particularly enjoy problem solving, so when the team behind the cycling club came to me with a suggestion that one of the site functions could do with some improvement in speed, I jumped at the chance!

The function in question relates to validating GPS trails. The majority of the rides organised are attended by masses of riders who start at a given point, visit certain “checkpoints” to collect a receipt, signature, stamp or some other proof of visit, and then (though not always) return to the start. These are the types of ride I attend, though another way of gaining points is to ride a “DIY Permanent”, whereby you track your progress using a GPS device and then submit the trail, which is checked against known records of the route.

Using existing libraries of functions, this part of the task is fairly straightforward: visit each point of the expected ride, find the closest point on the rider’s trail, and check that the two are within a certain tolerance. The problem comes when there are several of these GPS trails to be checked and the riders have recorded a huge number of points along the way! One example I used was a 200km route with a suggested GPS trail of just over 1,000 points.

The rider had submitted 67,000 points along their journey! Running locally, the software managed this task in around 40 seconds, and I’m told that wasn’t the biggest trail that has been seen!
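To give a feel for the cost, the check boils down to something like this rough sketch (validate, Coordinate and distanceTo() are illustrative stand-ins, not the site’s actual code):

// Rough sketch of the validation described above: every point on the
// expected route must have a recorded rider point within tolerance.
private boolean validate(List<Coordinate> control, List<Coordinate> ride, double tolerance) {
    for (Coordinate expected : control) {
        double closest = Double.MAX_VALUE;
        for (Coordinate recorded : ride) {
            closest = Math.min(closest, expected.distanceTo(recorded));
        }
        if (closest > tolerance) {
            return false; // The rider strayed too far from this part of the route
        }
    }
    return true;
}

With just over 1,000 control points checked against 67,000 recorded points, that inner loop runs in the region of 67 million times, which goes a long way towards explaining the run time.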

So how could the times be improved? The task is currently pretty much sequential, so only one processing core is being used. One option would be to break the task up into smaller chunks to allow some parallel processing of the trail, though given the cost of servers with multiple accessible cores over our current offering, I didn’t give this much consideration.

So how to simplify the task? Researching “line simplification” may lead you to the Ramer-Douglas-Peucker algorithm. Specifically, Wikipedia’s entry: https://en.wikipedia.org/wiki/Ramer-Douglas-Peucker_algorithm.

So how does it work? (For people not wanting to look just yet…) You take the start and finish points, draw a line between them, and then find the point that is furthest from this line. If this point is outside of the required tolerance, mark it as a midpoint and create two new lines meeting there. Repeat the process for each new line, et voilà! You eventually get a set of lines representing the original route, with slightly decreased resolution but a potentially huge compression rate!

And the results of carrying out the filtering? The trail was reduced from 67,000 points to an impressive 1,200! Running the same path validation algorithm over this new track took just 10 seconds: four times faster!

Unfortunately it wasn’t all good news. As you can imagine, comparing all these points during the simplification is itself a compute-intensive activity, and it took a full 20 seconds to carry out. Still, 30 seconds total compared to the original 37 was roughly a 19% improvement.

So the simplest option considered, simply “missing out points” from the control trail, currently looks the best in terms of ease of implementation and improvement (using 1 in 4 points on the control trail and comparing against the full path of 67,000 user points took 17 seconds, incidentally), though we are making one last-ditch attempt to get our riders to tone down their data sampling! The decimation itself amounts to something like the sketch below.
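Again just a sketch, reusing the hypothetical names from above:

// Keep every fourth point of the control trail, always retaining the finish
List<Coordinate> decimated = new ArrayList<>();
for (int i = 0; i < control.size(); i += 4) {
    decimated.add(control.get(i));
}
if ((control.size() - 1) % 4 != 0) {
    decimated.add(control.get(control.size() - 1)); // don't lose the end point
}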


And just in case anyone is interested, here’s the Java implementation of the algorithm (you’ll need your own implementation of lineToPoint(), which returns the perpendicular distance from a point to a line):


import java.util.ArrayList;
import java.util.List;

private List<Coordinate> trim(List<Coordinate> coords, double epsilon) {
    double dmax = 0;
    int index = 0;
    int end = coords.size() - 1;

    // Find the point furthest from the line joining the start and end points
    List<Coordinate> theLine = new ArrayList<>();
    theLine.add(coords.get(0));
    theLine.add(coords.get(end));
    for (int i = 1; i < end; i++) {
        double dist = lineToPoint(theLine, coords.get(i));
        if (dist > dmax) {
            dmax = dist;
            index = i;
        }
    }

    List<Coordinate> resultList = new ArrayList<>();
    // If the max distance is greater than epsilon, recursively simplify
    if (dmax > epsilon) {
        // subList's upper bound is exclusive, so include the split point in both halves
        List<Coordinate> recResults1 = trim(coords.subList(0, index + 1), epsilon);
        List<Coordinate> recResults2 = trim(coords.subList(index, coords.size()), epsilon);

        // Build the result list, dropping the duplicated split point
        resultList.addAll(recResults1);
        resultList.remove(resultList.size() - 1);
        resultList.addAll(recResults2);
    } else {
        // Everything lies within tolerance: keep just the two end points
        resultList.add(coords.get(0));
        resultList.add(coords.get(coords.size() - 1));
    }
    return resultList;
}
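Typical usage might look something like this (the tolerance of 50 is purely illustrative, and shares units with whatever lineToPoint() returns):

// Hypothetical call: simplify the rider's trail with a tolerance of 50 units
List<Coordinate> simplified = trim(riderTrail, 50.0);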

Look before you leap, think before you adjust the Apache settings!

I’m sure not everyone will have the same problem: several different URLs pointing at one server, all served by one Apache instance? Maybe you’ve bought a few similar domains? Say tezk.co.uk, tezk.uk and tezk.home? Then, through slightly sloppy web page creation, some of the links direct you to one of the other sites… Not a problem! Until you realise the cookies you’ve created on one domain won’t necessarily work on another, resulting in logged-in users being forced to log in several times! A good argument for never specifying the host in links, only the relative path.

Anyway, how to force people to use just one domain? Here’s where Apache’s redirect feature comes into play! Whereas previously you might have had the following in your site’s configuration file:

<VirtualHost *:80>
           ServerName www.tezk.co.uk
           ServerAlias tezk.home
           DocumentRoot /var/www/html/tezk
</VirtualHost>

With that, the server responds to requests for any of those addresses. To funnel everyone onto one domain, simply use the redirect feature!

<VirtualHost *:80>
           ServerName tezk.home
           Redirect "/" "http://www.tezk.co.uk/"
</VirtualHost>

<VirtualHost *:80>
           ServerName www.tezk.co.uk
           DocumentRoot /var/www/html/tezk
</VirtualHost>

Simple! Now, any request to tezk.home results in a 3xx redirect response! The browser then requests the www.tezk.co.uk page and the user is none the wiser, other than no longer having cookie-related login issues!

Though there is one problem this setup does create… What happens when we request the website using the IP address? Or use a script or curl to pull data from the site? Because of the ordering here, if the host name isn’t specified, the request matches the first vhost listed, and so it gets redirected! Unless the script is configured to follow redirects (and how many are?), it will fail and you’ll be left scratching your head!
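You can see the behaviour with curl (the IP address here is purely illustrative):

# Requesting by IP matches the first-listed vhost, so we get the redirect:
curl -I http://203.0.113.10/
# HTTP/1.1 302 Found
# Location: http://www.tezk.co.uk/

# Supplying the Host header explicitly matches the main site instead:
curl -I -H "Host: www.tezk.co.uk" http://203.0.113.10/
# HTTP/1.1 200 OK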

So how to resolve it? Listing the redirect vhosts after the main site definition makes the main site the “default”, which does fix the script issue, but it also means anyone requesting another site from us (via the “Host” HTTP header) will be served our main site. Then again, maybe that isn’t such a problem? The other way would be to ensure the “Host” HTTP header is set correctly, since that is what Apache looks at when matching virtual hosts!

It seems a related server and an app in the Google Play store had been broken since the migration due to them not specifying the “Host” header properly…

AWS, a newcomer’s views

As an IT professional, I find it a daunting task trying to keep up with new technologies. A lot of job specifications I’ve seen have asked for Amazon or other cloud experience, so I decided it was time I had a look!

Previously I’ve administered several virtual servers run by hosting companies, all of which can be classed as cloud computing, given that the machine doesn’t physically exist and all of the processing is done “in the cloud”. So what are my initial thoughts, having delved into AWS for the first time?

Mainly how easy it is to get started! The interface certainly looks daunting, though I managed to get a virtual server up and running on the EC2 (Elastic Compute Cloud) service in less than 10 minutes! And what’s better, they provide a modest machine known as a t2.micro absolutely free of charge for the first year!

Delving a little deeper, following Ryan Kroonenberg’s “AWS Certified Developer 2016” course on Udemy (https://www.udemy.com/aws-certified-developer-associate/), I was surprised by how easy it all was! I even managed to get a load balancer set up, linked to an EC2 server, and managed to link to the Tomcat server seamlessly from NetBeans! And it was still free! (So long as you pick the right machine…) Admittedly the choice of course was biased towards getting certified rather than actually learning AWS, but it provides an excellent starting point!

Having developed an idea for a future Android app, the environment looks like the perfect place to host the back-end processing, given its scalable nature and high availability. Plus it will earn me some brownie points for the CV! Watch this space for details of the app in the future!

What a week!

What a week it has been! We returned to Kent from visiting family in Derbyshire, unaware of the events in store…

It all started Monday morning with an ominous email reading “The server has disappeared!”. Never something you want to see, least of all on a Monday! The usual ping test confirmed the host was down, though the hosting company’s routers reported the host itself as unreachable, so at least the DNS was not to blame.

Checking the provider’s news feed revealed they’d had a server outage the previous day: confirmation something had gone awry at their end. After finding someone with the password for the dashboard, they raised a support ticket and got the server running again! Unfortunately, that was only the start of the problems!

As the server had unexpectedly gone offline, the file system had a few errors and so had been mounted read-only until we could run fsck over it to correct any problems. Being a hosted server, this had to be requested from the ever-helpful support team. That done, we managed approximately half an hour before the server switched the file system back to read-only mode! After several attempts to stabilise things, the decision was made to request a new server image and perform an emergency migration… Not something to be taken lightly, given the server was still running Ubuntu 10.04!

With the new server freshly running 14.04 and updated, the long, slow process of installing the LAMP stack commenced, followed by the reinstatement of the databases and website pages. By Tuesday evening, everything appeared to be running. Or so it seemed! It turned out the website was relying on features disabled in the newly installed version of PHP! After trying a few code kludges to work around the matter, time was pressing on, so a downgrade looked the best option. But how to carry it out? A good old compile from source! Which always takes forever in a server environment, given servers aren’t geared up for development: package after package had to be added before PHP was finally compiled, installed and back up and running!

So, by Wednesday evening, the majority of the functionality had been reinstated… A problem reading generated Excel spreadsheets was traced to a bug in a no-longer-supported PHP module (http://pear.php.net/bugs/bug.php?id=19284&edit=3), and usability issues relating to cookie usage across domains were rectified (rather than responding to several addresses, the site now forwards requests to one domain), and so we’re up and ready for the weekend!

So what did I learn from the week?

  • Always keep up-to-date backups of the /etc/ directory! Luckily I had a copy I’d made last year available, but it would have been much easier if it had been kept up to date!
  • PHP is evil…
  • And PHP code from last decade isn’t going to be happy on a fresh server.
  • Ask for the passwords for the control panel of the server! I hadn’t asked, as I didn’t foresee a need for them…

So here’s to the weekend, and the 10 mile run on Sunday! (http://canterbury10.co.uk/)

With thanks to:

The support staff at Hosting UK (https://hostinguk.net/), who did all they could to ensure we were back up and running as quickly as possible.

My colleagues at AUKWEB (http://www.aukweb.net/), all volunteers, who set up the old server, coded the site and assisted where they could in the migration.

Here’s to another decade of uptime!