013 Postgres command line : psql : Using ST_Within function to build junction tables to compare 2 separate polygon tables

First off let us create a new database to hold our examples in.

CREATE DATABASE stwithindb;

Now add the postgis extension.

Lets create two tables one called fields and one called plots

CREATE TABLE
t00001fields
(
pkid serial primary key,
fieldname varchar(50),
geom geometry(polygon,27700)
)
;
CREATE TABLE
t00002Plots
(
pkid serial primary key,
plotname varchar(50),
geom geometry(polygon,27700)
)
;

Now lets go to QGIS connect to the PostGIS instance add the tables and create some test data manually.

Here I have added fields in green with bold number labels and plots in brown with smaller number labelling. The numbers represent the pkid fields.

Now here I can quickly run a query to identify the plots that are in fields

SELECT t00002plots.pkid
FROM
t00002plots,
t00001fields
WHERE 
ST_WITHIN(PUBLIC.T00002PLOTS.GEOM, PUBLIC.T00001FIELDS.GEOM);

And it correctly identifies that plot 1 is within the fields layer.

But what would be great in an application is to have some kind of junction table that individual master records could display their children on. For this we need a junction table that links between the field and plots table showing the pkids from each.

SELECT t00002plots.pkid as Plotspkid,t00001fields.pkid as Fieldspkid
FROM
t00002plots,
t00001fields
WHERE 
ST_WITHIN(PUBLIC.T00002PLOTS.GEOM, PUBLIC.T00001FIELDS.GEOM);

Now I will move plot 2 into field 3 and rerun the above.

The layer now looks like

and running the former query we get.

Now its possible to either create a junction table to hold this information..

eg

CREATE TABLE t00010fieldplotjunction AS 
SELECT t00002plots.pkid as Plotspkid,t00001fields.pkid as Fieldspkid
FROM
t00002plots,
t00001fields
WHERE 
ST_WITHIN(PUBLIC.T00002PLOTS.GEOM, PUBLIC.T00001FIELDS.GEOM);

or we can create a view that will constantly calculate this everytime it is seen

CREATE VIEW v001FieldPlotJunction AS
SELECT t00002plots.pkid as Plotspkid,t00001fields.pkid as Fieldspkid
FROM
t00002plots,
t00001fields
WHERE 
ST_WITHIN(PUBLIC.T00002PLOTS.GEOM, PUBLIC.T00001FIELDS.GEOM);

Now if I add a few more plots and fields and then pull up the view we shall see that everything has been adjusted

and running the view we now get

In some circumstances this calculation may be expensive so we may wish to run and create a junction table overnight other times we may be happy to do it fully dynamically. Of course in a front end you could query and filter such that only one record was compared against the fields plot at anytime. Very useful nonetheless.

Extraction Transformation and Load (ETL) – some thoughts on a large IT transfer project

In 2017 I was involved in an important work project to transfer all the records in a legacy system that was being deprecated by the vendor (IDOX Group) into another maintained system. We were in some ways fortunate because both systems had been designed by a single company (IDOX) and they were encouraging us to transfer. We had delayed transfer for several years already but were aware that we now had to move. The vendor did have some tools in place , had staff dedicated to such transfers and were offering favourable consultancy rates. The amount of data was not horrendous in computing terms but they were far far beyond the remit of the ability to cope with any sort of manual data correction and the system was an absolute core system upon which several departments completely depended. These were systems that all departments are in from the moment they start the work day to the end. Generally its unusual if they are down for more than 5 minutes in a month, all work pretty much stops when they stop and in no circumstances could they be down for more than a day without special dispensation and coordination to indicate to manage customer expectations.

The whole project was a success although it was challenging. Here is an outline of the steps we took. As ever order here is important in most of the steps. I had written something on this before but consider this to be a more accurate rundown see here

Step 1
Inform managers of all involved sections and ensure they are on board – identify and ring fence budget

Step 2
Appoint project manager on vendor and client side
draw together team to perform transformation.

Step 3
Draft time table creation of how long it will take putting in place planning for tutorials on systems and consultancy.

Step 4
Request managers to put forward staff on all sides willing to be involved

Step 5
Identify any omissions in knowledge and start to identify how this can be remedied. Kick off and complete acquisition of said staff.

Step 6
Meeting with lead staff to confirm buy in. Request buy in from staff including ring fencing of holidays etc.. to ensure key staff are available at required times.

Step 7
Set up test systems that all individuals have access to and ensure that the old and new systems can be viewed simultaneously by individuals. Ensure that the domain specialists can identify processes that will be mirrored from the old system to the new system

Step 8
Give DBAs or those that will be doing data transfer access to databases of source so that they can start thinking of how they can pull out information.

Step 9
Training for all individuals concerned in new systems.

Step 10
In new system start tasking individuals with how they are going to do the simple processes – eg register a record approve a record alter a record and get reports out. If possible allow new champions to start to define things like reports.

Step 11
Start making up any new lookup fields compared with old lookups and also start tasking individuals with creation of reports and letter that will need to be done.

Step 12
Start mapping the data from old system to new system – excel spreadsheets can be used for this that show the data going from the old system and what fields they are going to go into in the new system. Divide this task up between domain users – this step needs to be done after old and new systems are on domain users machines. As part of this the applications in question should expose if possible the table and field names of the source and target fields. With the systems we were involved in this was possible both for the old and new systems.

For each form on the two systems try to identify the below

Source table.field Target table.field

Also get them to map the lookup table values if direct transfer is not possible or if alias id are used in these lookups.

Source table.field.value=Equivalent.Target table.field.value

Step 13
Give both mapping documents to the ETL people to allow them to start writing the queries. It is unlikely that there will be a straight transfer across from table to table. While it would be expected that field and table names will be completely different it will be expected that table structure will in certain places be different in this respect it would be good to have a really nice schema diagram of both source and target.

Step 14
Allow data individuals to write scripts that can be run live against present initial system – if necessary doesn’t need to be live live could copy every night and then perform on 1 day old database backend – which is what we did. This means work can go on in old system and then at a touch of a button.

Encourage DBAs to be able to run these scripts every day to ensure that running them for go live is absolutely no issue. Our scripts only took about half an hour to run so this wasn’t an issue. I was personally involved in writing the SQL for those and I had systems in place to cross tab the amount coming into each new table so I could see new records and information from the old system trickling manually into the system and then being transferred.

Step 15
Test data input into new system

Step 16
Check test data input into new system with reference to domain users.

Step 17
Confirm go live date ensure staff available for issues

Step 18
Go live to production and start all new procedures ensure staff technical and domain key players on hand to make flexible solutions to things

Step 19
Project review on going maintenance and improvement of new system

Step 20
After suitable time turn off of old system if possible.

012 Postgres command line : psql : Run sql from a table

I wanted to be able to run thousands of queries or hundreds from Postgres like I can in MS Access this didn’t turn out to be too difficult.
Here’s something that works firstly lets create a new database

CREATE DATABASE sqlloopdb;

You will then need to connect to the database.

Next I will create 2 tables with; One table to be called t001sqltarget – this is the table we shall change with queries. One table called t002sqlrun – this will contain queries that we will run.

Please note the field names are important as well but I will let you study them in the code.

I then have 4 inserts that place valid SQL strings into the field sqltorun.

CREATE TABLE t001sqltarget (pkid serial primary key, fieldforupdate varchar(1));
CREATE TABLE t002sqlrun (pkid serial primary key, sqltorun varchar(1000));
INSERT INTO t002sqlrun(sqltorun) values ('INSERT INTO t001sqltarget(fieldforupdate) values (1);');
INSERT INTO t002sqlrun(sqltorun) values ('INSERT INTO t001sqltarget(fieldforupdate) values (2);');
INSERT INTO t002sqlrun(sqltorun) values ('INSERT INTO t001sqltarget(fieldforupdate) values (3);');
INSERT INTO t002sqlrun(sqltorun) values ('INSERT INTO t001sqltarget(fieldforupdate) values (4);');

First lets run the above and see what we have. Below you can see that I create the database then connect to it before opening the editor from which I run the above code I then take a look at the tables in the database and run a select to return all the records within the t001sqltarget table of which there are none.

Now lets run the following code and then take a look at t001sqltarget.

DO
LANGUAGE plpgsql
$$
DECLARE
stmt text;
BEGIN
FOR stmt IN
SELECT sqltorun FROM t002sqlrun
LOOP
EXECUTE stmt;
END LOOP;
END;
$$;

And after running there are 4 lines in the table

Every time I run the Do code four more records will be added to this table. Any SQL could be included in t002sqlrun and this is a nice demonstration of what I had previously able to do in MS Access and is massively powerful. It could be used for instance to calculate multiple measurements.

011 : Postgres amalgamate consecutive lines into a single line in a table

Here we take much of the work covered in post 010 and take the parts and user st_union to merge into a single record and place it in a table created by transforming a view into a table

Firstly go to your psql line and ensure that you are logged in with a username that you wish to be the owner of the table. In my case general

logging into edinburgh routing database

Now same measurement as before but this time we shall make a view out of the measurements then load that into a new table before deleting the view leaving us with the table with a combined measurement.

CREATE VIEW v001firstmeasurement AS SELECT seq,  id1 AS node, id2 AS edge, cost, geom, agg
  FROM pgr_dijkstra( 'SELECT id, source, target, st_length(geom) as cost FROM public.t01roadnetwork', 15883, 10967, false, false  ) as di
  JOIN public.t01roadnetwork pt ON di.id2 = pt.id ;

CREATE TABLE t003 as select sum(cost), st_union(geom) from v001firstmeasurement;

DROP VIEW v001firstmeasurement;

It is important in notepad to remove the blank spaces in the editor this looks as follows.

We then should then get some kind of confirmation that the view and table are created before the view is then dropped again. There might be a more efficient way of doing this but this was my first experiment. And we can go back to QGIS 3.4 and display the now single line in our project. Complete with now accurate measurement. It should be noted that if you were wanting to do multiple line measurements you would need to step out of the create statement and use an insert statement for all subsequent insertions as follows.
insert into t003(sum,st_union) select sum(cost),st_union(geom) from v001firstmeasurement;
This would allow you to do multiple measurments. I haven’t added up the measurement but it looks about right.