So let’s have a look at the data! All property sales are stored as a row in a PostGIS enabled PostgreSQL database table. For every sale, there is information about e.g. the property type (villa, apartment, plot of land etc), the number of rooms, the sales date and price. And of course, since this will eventually be about geospatial analysis, the location is available as a PostGIS geometry datatype.

An example of a table entry of a house sale is shown below:

Column SQL table column name Value
Address address Stoppgatan 3
District district Tollarp
Municipality kommun Kristianstads kommun
Property type property_type Villa
Sales date sold_date 2020-04-27
Broker broker Länsförsäkringar Fastighetsförmedling Kristianstad
Sales price final_price 1520000
Ask price ask_price 1395000
Price per area price_per_area 11692
Number of rooms num_rooms 5
Main living area main_area 130
Misc. area misc_area 0
Plot area plot_area 263
Build year build_year 1976
Maintentance cost maintenance 32692
Latitude lat 55.9313614
Longitude lon 13.99752036
Geometry way 0101000020110F0000F777698C00BD374166BF31F1EEC75C41

In this part, we’ll get started by doing some basic traditional trend analysis using SQL. We want to answer the question:

What is the monthly price trend for apartments in the largest cities of Sweden during the last 12 months?

I’ll start with posting the desired end result, and then go into the details of the SQL queries to generate it.

Desired result

month Stockholm Göteborg Malmö Uppsala Solna
2019-05-01 100.0 100.0 100.0 100.0 100.0
2019-06-01 102.4 97.1 98.2 95.7 97.9
2019-07-01 98.1 101.8 99.1 112.1 101.6
2019-08-01 102.7 105.1 104.1 112.4 103.3
2019-09-01 106.9 102.6 104.6 104.6 99.3
2019-10-01 105.9 103.1 101.0 102.0 100.3
2019-11-01 105.9 102.3 100.0 99.5 102.0
2019-12-01 108.4 100.7 93.9 108.3 103.8
2020-01-01 109.5 105.1 103.6 107.9 106.1
2020-02-01 110.5 103.5 109.5 105.7 105.8
2020-03-01 108.5 102.3 97.2 99.6 100.0
2020-04-01 102.2 103.3 100.1 98.7 100.0

This table shows that there is a considerable drop in the median price per area unit during March and/or April in Stockholm, Malmö, Uppsala and Solna. Interestingly, the same effect is not showing in Göteborg. However, this is based on the very blunt tool provided by the median. Examining the price distribution more carefully might reveal another story. (This may be the topic for another future post, so let’s leave it the way it is for now.)

The SQL query to achieve this table will be built up in three steps below. First, the median price for the big municipalities are calculated. Second, the normalization to the first month is done. And lastly, the resulting table is pivoted to create the desired output.

Query 1 - Calculate the median price per month for selected cities

SELECT date_trunc('month', sold_date)::date AS month,
       kommun,
       percentile_disc(0.5) WITHIN GROUP (ORDER BY price_per_area) AS median
FROM sold_properties
WHERE property_type LIKE 'Bostadsr%' AND
      sold_date BETWEEN '2019-05-01' AND '2020-04-30' AND
      kommun IN (SELECT kommun
                 FROM sold_properties
                 WHERE property_type LIKE 'Bostadsr%' AND
                       sold_date BETWEEN '2019-05-01' AND '2020-04-30'
                 GROUP BY kommun
                 HAVING count(*) > 2000
                 )
GROUP BY 1,2
;

Results

month kommun median
2019-05-01 Göteborgs kommun 47561
2019-05-01 Malmö kommun 28947
2019-05-01 Solna kommun 55705
2019-05-01 Stockholms kommun 68478
2019-05-01 Uppsala kommun 35294
2019-06-01 Göteborgs kommun 46190
2019-06-01 Malmö kommun 28429
2019-06-01 Solna kommun 54509
2019-06-01 Stockholms kommun 70089
2019-06-01 Uppsala kommun 33769
2019-07-01 Göteborgs kommun 48438
2019-07-01 Malmö kommun 28687

Now, let’s normalize the median prices to the first value of the time period (May 2019) to represent the percentage change. This is done using the above query as a sub-query and scaling by the first value of a sorted list of price values for all municipalities.

Query 2 - Normalize to the first month in the period

SELECT month,
       kommun,
       100*median/first_value(median) OVER (PARTITION BY kommun
                                            ORDER BY month) AS norm_price
FROM (SELECT date_trunc('month', sold_date)::date AS month,
             kommun,
             percentile_disc(0.5) WITHIN GROUP (ORDER BY price_per_area) AS median
      FROM sold_properties
      WHERE property_type LIKE 'Bostadsr%' AND
            sold_date BETWEEN '2019-05-01' AND '2020-04-30' AND
            kommun IN (SELECT kommun
                       FROM sold_properties
                       WHERE property_type LIKE 'Bostadsr%' AND
                             sold_date BETWEEN '2019-05-01' AND '2020-04-30'
                       GROUP BY kommun
                       HAVING count(*) > 2000
                       )
      GROUP BY 1,2
     )
     AS monthly_price
;

Results

month kommun norm_price
2019-05-01 Göteborgs kommun 100
2019-06-01 Göteborgs kommun 97.11738609364815
2019-07-01 Göteborgs kommun 101.84394777233447
2019-08-01 Göteborgs kommun 105.12815121633271
2019-09-01 Göteborgs kommun 102.56302432665419
2019-10-01 Göteborgs kommun 103.0928702087845
2019-11-01 Göteborgs kommun 102.27497319232144
2019-12-01 Göteborgs kommun 100.74851243666029
2020-01-01 Göteborgs kommun 105.12815121633271
2020-02-01 Göteborgs kommun 103.5344084438931
2020-03-01 Göteborgs kommun 102.25184499905384
2020-04-01 Göteborgs kommun 103.28420344399824
2019-05-01 Malmö kommun 100
2019-06-01 Malmö kommun 98.21052267937955
2019-07-01 Malmö kommun 99.10180675026773
2019-08-01 Malmö kommun 104.13859812761254

To get to the final result, we want to distribute the municipalities (kommuns) to separate columns, i.e. pivot on the kommun column. This functionality is available in the table_func extension of PostgreSQL by means of the crosstab function. The full query above (Query 2) needs to be passed as a string to this function, with the addition of the ORDER keyword to ensure that the rows are ordered correctly.

Query 3 - Make a pivot table

SELECT *                                                                             
FROM crosstab('SELECT month,
               kommun,
               100*median/first_value(median) OVER (PARTITION BY kommun
                                                    ORDER BY month) AS norm_price
               FROM (SELECT date_trunc(''month'', sold_date)::date AS month,
                            kommun,
                            percentile_disc(0.5) WITHIN GROUP (ORDER BY price_per_area) AS median
                     FROM sold_properties
                     WHERE property_type LIKE ''Bostadsr%'' AND
                           sold_date BETWEEN ''2019-05-01'' AND ''2020-04-30'' AND
                           kommun IN (SELECT kommun
                                      FROM sold_properties
                                      WHERE property_type LIKE ''Bostadsr%'' AND
                                            sold_date BETWEEN ''2019-05-01'' AND ''2020-04-30''
                                      GROUP BY kommun
                                      HAVING count(*) > 2000
                                      )
                     GROUP BY 1,2
                    )
                    AS monthly_price
               ORDER BY 1,2',
               $$VALUES('Stockholms kommun'::text),
                       ('Göteborgs kommun'::text),
                       ('Malmö kommun'::text),
                       ('Uppsala kommun'::text),
                       ('Solna kommun'::text)
               $$                     
              )                       
AS ct("month" date,
      "Stockholm" numeric(5,1),
      "Göteborg" numeric(5,1),
      "Malmö" numeric(5,1),
      "Uppsala" numeric(5,1),
      "Solna" numeric(5,1)
    )
;

Results

month Stockholm Göteborg Malmö Uppsala Solna
2019-05-01 100.0 100.0 100.0 100.0 100.0
2019-06-01 102.4 97.1 98.2 95.7 97.9
2019-07-01 98.1 101.8 99.1 112.1 101.6
2019-08-01 102.7 105.1 104.1 112.4 103.3
2019-09-01 106.9 102.6 104.6 104.6 99.3
2019-10-01 105.9 103.1 101.0 102.0 100.3
2019-11-01 105.9 102.3 100.0 99.5 102.0
2019-12-01 108.4 100.7 93.9 108.3 103.8
2020-01-01 109.5 105.1 103.6 107.9 106.1
2020-02-01 110.5 103.5 109.5 105.7 105.8
2020-03-01 108.5 102.3 97.2 99.6 100.0
2020-04-01 102.2 103.3 100.1 98.7 100.0

And voilà, we have reached the desired result.

In the next post we’ll probably dive more into the geo-spatial information available and ways to visualize the housing price trends in QGIS.