Proof of Negative for database design : Put simply – Try to keep data in one easily searchable place!

It's always satisfying when you are able to conceptualise a problem that then seems to appear widely. Invariably when I do this it is something that others reasoned about long before; nonetheless it is still satisfying to arrive at the table, albeit late.

I have done a lot of work in the public sector, where legal requirements constantly change and are not always logical. This can lead to contradictory requirements which software designers, and particularly vendors, can't or won't keep up with. Inevitably we end up with satellite procedures to catch edge cases. This causes quite a lot of friction because of something I have started thinking of as Proof of Negative.

To prove a negative you have to check all the information that has ever been recorded, whereas to prove a positive you often just need to find one instance and then you can stop looking. It is therefore, on average, harder to prove a negative than it is to prove a positive.

It was highlighted to me the other day when looking at a map. We were looking for a planning application and we were certain it existed; we had no reason to presume it didn't exist, yet we were unable to find it. I eventually went to the map and looked at every planning application in the area. None existed, so the conclusion was that it had never existed. Time consuming, but it highlighted to me the importance of having all the information in one format that is easily comparable. Quite often when switching between systems there are reconciliation issues – either gaps or overlaps – or the search options are wildly different, which leaves you needing to reconcile between searches: additional difficulty and additional time.

So, something to keep in mind when moving data about (try and keep it in one easily searchable place). Still nice to discover, or should I say re-discover, a fundamental truth. It is sometimes referred to as the philosophic burden of proof and has been debated and thought about extensively.

Wikipedia Proof of Negative

New System Assessment : A Simple Set of Guidelines for judging material differences between Software

Here's a short guide I put together for myself to help me assess the value of enterprise systems that I may be looking for or designing myself. It is not hard and fast, and some points may not be relevant to every application, but it should form a good basis from which to start. Generally a higher number is better than a lower number, but within each number the items are cumulative and of roughly equal value.

1) Paper based = everything from single sheets to formal Books of Accounts
2) Simple Digital = spreadsheet and other file based storage
3) Cloud based Simple Digital = spreadsheet and other file based storage – this can include file based blob storage
4.1) LAN relational database – normalised but otherwise fairly locked down
4.2) LAN relational database + easy import and export of data other than spatial
4.3) LAN relational database + report writer definable by users
4.4) LAN relational database + spatially enabled
5.1) Internet available relational database – normalised but otherwise fairly locked down
5.2) Internet available relational database + easy import and export of data other than spatial
5.3) Internet available relational database + report writer definable by users
5.4) Internet available relational database + spatially enabled (noting that not all applications require a spatial aspect)
5.5) Internet available relational database + separate site available for public access, view only
5.6) Internet available relational database + separate access for the public to edit and add information

Requirements
+ Uptime / reliability
+ Speed – after reliability, very important
+ Good master-detail forms (sounds easy but Bootstrap is not great for this)
+ Good search – the larger the system, the more important the search options within it. Look out for things like automated objective index creation (case sensitive search is more of a hassle than a benefit a lot of the time; odd wildcards, or searches that are not wild by default, are problem areas)

Important Points
+ Portability – can you up sticks and move it to a different cloud provider?
+ Non-proprietary (can be a gotcha if it's really obscure or has weird security; often linked to portability)
+ Two factor authentication
+ Responsive design (not necessarily as important as you think in some applications – see GitHub / OpenLayers, which arguably don't need it)
+ Dynamic saving – I really miss the fast dynamic saving environment of MS Access; the save button isn't quite the same
+ Built in CRM – I generally like them, especially if data can go straight into the correct files. Increasingly I am designing systems with inbuilt CRM. I know this might not be to every organisation's taste, but it is jarring to go between systems, normalisation between systems is usually very sub-optimal, and you frequently come across proof of negative problems when data required for a specific task is not held appropriately.
+ Satellite application for customers to enter information (why not let your customers maintain their own information? It can be great and can empower customers – fintech anyone?)

= All the bells and whistles
A spatially enabled, internet available database with two factor authentication, a user-definable report writer and easy import and export of spatial and attribute information, with public access for customers to edit and add information (possibly via a satellite application).

QGIS and PostGIS : Identifying direction of a vector

If using the Dijkstra function with direction turned on, it is important to identify the order in which the nodes of a vector line have been digitised. This is called the direction; Dijkstra can use this, together with a reverse_cost attribute, to handicap movement the wrong way along lines to such an extent that the correct path can be calculated around things like roundabouts.
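If the routing is being done with pgRouting, the call might look something like the following minimal sketch – the edges table, its one_way flag and the node ids are hypothetical, with reverse_cost set punitively high against the digitised direction of one way links:

-- route between two nodes, respecting the digitised direction of one way links
SELECT * FROM pgr_dijkstra(
    'SELECT id, source, target,
            cost,
            CASE WHEN one_way THEN 100000 ELSE cost END AS reverse_cost
     FROM   edges',
    1234,               -- start node id
    5678,               -- end node id
    directed := true);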

Here is an example of the roundabout at Straiton in Edinburgh, just north of the A720 bypass. While some of the lines have the correct anti-clockwise orientation, clearly some have been incorrectly digitised.

First we can see this by displaying the network in QGIS, using styling to put an arrow on each line showing its direction.

If you can't resort to buying a corrected dataset, the function that can be used to reverse such lines is ST_Reverse.
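Something along these lines fixes it in PostGIS – the roads table, its geom column and the id list are purely illustrative and would need to match your own network:

-- flip the digitised direction of the incorrectly digitised links
UPDATE roads
SET    geom = ST_Reverse(geom)
WHERE  id IN (101, 102, 103);  -- ids identified from the arrow styling in QGIS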

Extraction Transformation and Load (ETL) – some thoughts on a large IT transfer project

In 2017 I was involved in an important work project to transfer all the records from a legacy system that was being deprecated by the vendor (IDOX Group) into another maintained system. We were in some ways fortunate because both systems had been designed by the same company (IDOX), and they were encouraging us to transfer. We had already delayed the transfer for several years but were aware that we now had to move. The vendor did have some tools in place, had staff dedicated to such transfers and was offering favourable consultancy rates. The amount of data was not horrendous in computing terms, but it was far, far beyond anything that could be corrected manually, and the system was an absolute core system upon which several departments completely depended. These are systems that all departments are in from the moment they start the work day to the end. Generally it is unusual if they are down for more than 5 minutes in a month, all work pretty much stops when they stop, and under no circumstances could they be down for more than a day without special dispensation and coordination to manage customer expectations.

The whole project was a success, although it was challenging. Here is an outline of the steps we took. As ever, the order of most of the steps is important. I had written something on this before, but consider this to be a more accurate rundown – see here.

Step 1
Inform managers of all involved sections and ensure they are on board – identify and ring fence budget

Step 2
Appoint project managers on the vendor and client side and
draw together the team to perform the transformation.

Step 3
Draft a timetable of how long it will take, putting in place planning for tutorials on the systems and for consultancy.

Step 4
Request managers to put forward staff on all sides willing to be involved

Step 5
Identify any omissions in knowledge and start to work out how these can be remedied. Kick off and complete the acquisition of said staff.

Step 6
Meet with lead staff to confirm buy in. Request buy in from staff, including ring fencing of holidays etc., to ensure key staff are available at required times.

Step 7
Set up test systems that all individuals have access to and ensure that the old and new systems can be viewed simultaneously. Ensure that the domain specialists can identify the processes that will be mirrored from the old system to the new system.

Step 8
Give the DBAs, or those who will be doing the data transfer, access to the source databases so that they can start thinking about how they will pull out the information.

Step 9
Training for all individuals concerned in new systems.

Step 10
In the new system, start tasking individuals with how they are going to do the simple processes – e.g. register a record, approve a record, alter a record and get reports out. If possible allow the new champions to start to define things like reports.

Step 11
Start making up any new lookup fields compared with the old lookups, and also start tasking individuals with the creation of the reports and letters that will need to be done.

Step 12
Start mapping the data from the old system to the new system – Excel spreadsheets can be used for this, showing the data coming from the old system and which fields it is going to go into in the new system. Divide this task up between domain users – this step needs to be done after the old and new systems are on domain users' machines. As part of this, the applications in question should, if possible, expose the table and field names of the source and target fields. With the systems we were involved in, this was possible for both the old and new systems.

For each form on the two systems try to identify the below

Source table.field Target table.field

Also get them to map the lookup table values if a direct transfer is not possible or if alias ids are used in these lookups.

Source table.field.value=Equivalent.Target table.field.value
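To give a feel for how those two mapping documents end up being used, here is a minimal sketch of a transfer query driven by them – all the table, field and lookup names are hypothetical and purely for illustration:

-- field mapping: old_application.refval -> new_application.reference etc.
-- lookup mapping: old status codes translated via a status_mapping table
INSERT INTO new_application (reference, received_date, status_id)
SELECT o.refval,
       o.date_received,
       m.new_status_id
FROM   old_application o
JOIN   status_mapping m ON m.old_status_id = o.status;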

Step 13
Give both mapping documents to the ETL people to allow them to start writing the queries. It is unlikely that there will be a straight transfer across from table to table. While it would be expected that field and table names will be completely different, it should also be expected that the table structure will in certain places be different; in this respect it is good to have a really nice schema diagram of both source and target.

Step 14
Allow the data individuals to write scripts that can be run against the present live system – if necessary this doesn't need to be truly live: you can copy the database every night and then run against a one-day-old backend, which is what we did. This means work can go on in the old system and the transfer can then happen at the touch of a button.

Encourage the DBAs to run these scripts every day to ensure that running them for go live is absolutely no issue. Our scripts only took about half an hour to run, so this wasn't a problem. I was personally involved in writing the SQL for those, and I had systems in place to cross tab the number of records coming into each new table so I could see new records and information from the old system trickling in and then being transferred.
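A minimal sketch of the sort of cross tab / reconciliation check I mean – in practice there was a line per migrated table, and the target table names here are hypothetical:

-- daily sanity check: how many rows have landed in each target table
SELECT 'new_application' AS target_table, COUNT(*) AS rows_loaded FROM new_application
UNION ALL
SELECT 'new_contact',  COUNT(*) FROM new_contact
UNION ALL
SELECT 'new_document', COUNT(*) FROM new_document;

Comparing these counts day on day, and against the equivalent counts in the source system, makes it obvious when a script has silently dropped or duplicated records.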

Step 15
Test data input into new system

Step 16
Check test data input into new system with reference to domain users.

Step 17
Confirm the go live date and ensure staff are available for issues.

Step 18
Go live to production and start all new procedures. Ensure key technical and domain staff are on hand to find flexible solutions to any issues.

Step 19
Project review, ongoing maintenance and improvement of the new system.

Step 20
After a suitable time, turn off the old system if possible.

Towards the best table design for Addresses

Floor Reference or Flat No
Building Name or House Name
House/Building/Unit Number
Street Name
Through Street
Locality
Town or City
Country
Postcode

A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away.

Antoine de Saint-Exupery

OK, so I didn't know who he was either until I did a search on simplicity, but I wanted to record my best field structure for the storage of addresses – which, although important, is on a par with watching paint dry in terms of excitement. Despite this, addresses are central to a heck of a lot of applications, so it is important to get them right. This structure copes with pretty much all addresses. I mainly use Locality in larger towns or cities to denote a sub-area, specifically for users, as you can guarantee they will be unfamiliar with the majority of specific street names; particularly for users doing business in the same city they are located in, it is very useful to have a quick understanding of where things are in relation to the assets of the operation involved. Such "context" can be added by using maps, but it will come for free with intelligent staff if you give them the opportunity. I long ago dropped counties and districts, but I can see that in a larger country such as Russia or America it could be useful to note the state / oblast etc. Additionally, in the UK, Town is really postal town – which is often not obvious at all – and you can organise addresses so that postal town is referenced from the town automatically. Probably overkill.
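For what it is worth, here is that structure as a table definition – a PostgreSQL flavoured sketch where the names and text lengths are just my own choices rather than anything prescriptive:

CREATE TABLE address (
    address_id     serial PRIMARY KEY,
    flat_no        varchar(50),   -- floor reference or flat number
    building_name  varchar(100),  -- building or house name
    building_no    varchar(20),   -- house / building / unit number
    street_name    varchar(100),
    through_street varchar(100),
    locality       varchar(100),  -- sub-area of a larger town or city
    town           varchar(100),  -- town or city (really postal town in the UK)
    country        varchar(100),
    postcode       varchar(12)
);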

Information on BS7666
National Land and Property Gazetteer

Building Timing Software for Athletic Races

A while back I got the opportunity to work with a friend who times athletic races. After a bit of questioning he showed me some of his equipment and how it worked. I realised I could probably put together some home made timing software and so I set about doing it. The following is an overview of the stages that I went through to create the software.

Firstly, it's important to understand how the hardware works. To my mind this is very much a pattern and, irrespective of the software you are using, it is going to have to perform the same tasks in the same order.

Pre race test and hook up

Each competitor is given an RFID chip that has a unique number registered to it (numbers can be changed but this requires specialist equipment).
A timing box is used both to power the mats (aka antennae) and to act as a clock. When someone goes over the mat the chip emits its number and sends it to the box. The box then places a time stamp against the read and puts it in a file. A file or stream of all these strings is then pushed out of the timing box to the awaiting computer.
The computer is listening, takes this information, and the software does the rest.

In terms of the information that is produced by the box, this is remarkably simple – a raw hex string for each read.

My solution has three main tables
T1 – Raw times
T2 – Competitors
T3 – Chip Tag numbers.

The hard coded chip numbers are hex and each competitor is allocated a chip – they are actually being allocated this hex key, but the number equates, so on the chip itself will be a readable number, e.g. 14001, 14002, while the actual raw code from the box is often something like 45ab32c.
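As a rough idea, the three tables could look something like the following – a simplified, PostgreSQL flavoured sketch with my own naming; the real raw string from the box also carries the timestamp:

-- T1: one row per read pushed out by the timing box
CREATE TABLE raw_times (
    raw_read  varchar(64) PRIMARY KEY,  -- full hex string; the primary key stops duplicate imports (step 1)
    read_time timestamp,
    chip_hex  varchar(16)
);

-- T2: competitors entered in the race
CREATE TABLE competitors (
    competitor_id serial PRIMARY KEY,
    name          varchar(100),
    category      varchar(50)           -- e.g. gender or age category
);

-- T3: chip tag numbers linking the hex code to the readable chip number and its competitor
CREATE TABLE chip_tags (
    chip_hex      varchar(16) PRIMARY KEY,
    chip_no       integer,              -- the readable number, e.g. 14001
    competitor_id integer REFERENCES competitors (competitor_id)
);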

What the software then does.

Step 1
Ensure exact duplicate reads are not entered; these generally arise from multiple imports of the same file. This is done by setting the full hex key as a primary key, disallowing the same value going into the Raw Times table multiple times.

Step 2
Hex to Decimal translation

Step 3
Drop times before race start time.
Simple select query.

Step 4
Match readings to competitors – a simple join, after the hex to decimal translation, to the information from table 3.
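A minimal sketch of steps 2 to 4 together, assuming PostgreSQL and the hypothetical tables above – the commented cast shows the usual hex to decimal trick, and the WHERE clause is step 3 with a purely illustrative start time:

-- step 2 (hex to decimal), e.g. SELECT ('x' || lpad('45ab32c', 8, '0'))::bit(32)::int;  -- 73052972
-- steps 3 and 4: drop reads before the gun and join each read to its competitor
SELECT c.name,
       t.chip_no,
       r.read_time
FROM   raw_times r
JOIN   chip_tags t   ON t.chip_hex = r.chip_hex
JOIN   competitors c ON c.competitor_id = t.competitor_id
WHERE  r.read_time >= timestamp '2014-05-11 10:00:00';  -- hypothetical race start time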

Step 5
Sort times within competitor in ascending order
Data sort on dual columns

Step 6
Gate times – competitors get multiple reads over the mat. I have set up something called a gate (no idea whether this is a standard pattern or not, but it's what I call it). A gate period is subjectively decided, let's say 30 seconds (this can be altered for lapped races where you know a racer can't complete a lap within a certain time). A query is then set up to look at the first time for a competitor and add the gate period to it. The software continues looking down the times and deletes any times greater than the first gate time but less than the first gate time plus the gate period.
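A minimal sketch of that gate logic for the first crossing, again using the hypothetical tables above and a 30 second gate; later gates are handled the same way against the next surviving read:

-- collapse multiple reads at the first gate into a single time per competitor
DELETE FROM raw_times r
USING  chip_tags t,
       (SELECT t2.competitor_id, MIN(r2.read_time) AS first_read
        FROM   raw_times r2
        JOIN   chip_tags t2 ON t2.chip_hex = r2.chip_hex
        GROUP  BY t2.competitor_id) f
WHERE  t.chip_hex = r.chip_hex
  AND  f.competitor_id = t.competitor_id
  AND  r.read_time > f.first_read
  AND  r.read_time < f.first_read + interval '30 seconds';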

Step 7
Count timing points
In some races there are laps and some are point to point – a lapped race can be thought of as a recursive non-lap race, in that it is a series of non-lap races where the end of one is the start of another. All times for each competitor are sorted in increasing order and an additional field is added with incrementing numbers 1, 2, 3, 4 etc. Typically there will be 2 – a start and a finish. There can be just one where a gun has gone off and competitors are not running across a start mat; in that instance everyone is considered to have the same start time – the time of the starting pistol – so the recorded read is given point number 2 and point 1 is assumed to be a pre-set time.

Step 8
Pivot the times
Place point number as column header – name of competitor as row heading and time value as the value of the pivot.

Step 9
Calculate the lap times
Simple n+1 time minus n time
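Pulling steps 7 to 9 together, here is a minimal sketch using window functions and conditional aggregation on the same hypothetical tables, assuming just two timing points (a start and a finish):

WITH numbered AS (
    SELECT t.competitor_id,
           r.read_time,
           ROW_NUMBER() OVER (PARTITION BY t.competitor_id
                              ORDER BY r.read_time) AS point_no   -- step 7: count timing points
    FROM   raw_times r
    JOIN   chip_tags t ON t.chip_hex = r.chip_hex
)
SELECT c.name,
       MIN(read_time) FILTER (WHERE point_no = 1) AS start_time,  -- step 8: pivot on point number
       MIN(read_time) FILTER (WHERE point_no = 2) AS finish_time,
       MIN(read_time) FILTER (WHERE point_no = 2)
     - MIN(read_time) FILTER (WHERE point_no = 1) AS race_time    -- step 9: lap / race time
FROM   numbered n
JOIN   competitors c ON c.competitor_id = n.competitor_id
GROUP  BY c.name
ORDER  BY race_time;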

Step 10
Add up the individual times to get the total race time, sort by any additional category (e.g. gender, age etc.), sort into time order and allocate prizes appropriately.

Step 11
Pick up whatever pretty report writer you can get your hands on – Excel will do at a push – print out and hand to the race organisers.

Step 12
Make pretty forms so you can easily change competitors, add in extra times for missed reads, and allow for edge cases like DNFs, DQs, penalties etc.

On Sunday the 11th of May I had the opportunity to test the software out by acting as lead timer at the Castle of Mey 10k, the most northerly 10k in the country. Glad to say it worked perfectly.

Below is a video of the 2015 race where we were in the McNicol Van

Nice when a plan comes together.