Visualizing housing price trends with QGIS, part 1 - Data overview and some queries
So let’s have a look at the data! All property sales are stored as a row in a PostGIS enabled PostgreSQL database table. For every sale, there is information about e.g. the property type (villa, apartment, plot of land etc), the number of rooms, the sales date and price. And of course, since this will eventually be about geospatial analysis, the location is available as a PostGIS geometry datatype.
An example of a table entry of a house sale is shown below:
| Column | SQL table column name | Value |
|---|---|---|
| Address | address | Stoppgatan 3 |
| District | district | Tollarp |
| Municipality | kommun | Kristianstads kommun |
| Property type | property_type | Villa |
| Sales date | sold_date | 2020-04-27 |
| Broker | broker | Länsförsäkringar Fastighetsförmedling Kristianstad |
| Sales price | final_price | 1520000 |
| Ask price | ask_price | 1395000 |
| Price per area | price_per_area | 11692 |
| Number of rooms | num_rooms | 5 |
| Main living area | main_area | 130 |
| Misc. area | misc_area | 0 |
| Plot area | plot_area | 263 |
| Build year | build_year | 1976 |
| Maintentance cost | maintenance | 32692 |
| Latitude | lat | 55.9313614 |
| Longitude | lon | 13.99752036 |
| Geometry | way | 0101000020110F0000F777698C00BD374166BF31F1EEC75C41 |
In this part, we’ll get started by doing some basic traditional trend analysis using SQL. We want to answer the question:
What is the monthly price trend for apartments in the largest cities of Sweden during the last 12 months?
I’ll start with posting the desired end result, and then go into the details of the SQL queries to generate it.
Desired result
| month | Stockholm | Göteborg | Malmö | Uppsala | Solna |
|---|---|---|---|---|---|
| 2019-05-01 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| 2019-06-01 | 102.4 | 97.1 | 98.2 | 95.7 | 97.9 |
| 2019-07-01 | 98.1 | 101.8 | 99.1 | 112.1 | 101.6 |
| 2019-08-01 | 102.7 | 105.1 | 104.1 | 112.4 | 103.3 |
| 2019-09-01 | 106.9 | 102.6 | 104.6 | 104.6 | 99.3 |
| 2019-10-01 | 105.9 | 103.1 | 101.0 | 102.0 | 100.3 |
| 2019-11-01 | 105.9 | 102.3 | 100.0 | 99.5 | 102.0 |
| 2019-12-01 | 108.4 | 100.7 | 93.9 | 108.3 | 103.8 |
| 2020-01-01 | 109.5 | 105.1 | 103.6 | 107.9 | 106.1 |
| 2020-02-01 | 110.5 | 103.5 | 109.5 | 105.7 | 105.8 |
| 2020-03-01 | 108.5 | 102.3 | 97.2 | 99.6 | 100.0 |
| 2020-04-01 | 102.2 | 103.3 | 100.1 | 98.7 | 100.0 |
This table shows that there is a considerable drop in the median price per area unit during March and/or April in Stockholm, Malmö, Uppsala and Solna. Interestingly, the same effect is not showing in Göteborg. However, this is based on the very blunt tool provided by the median. Examining the price distribution more carefully might reveal another story. (This may be the topic for another future post, so let’s leave it the way it is for now.)
The SQL query to achieve this table will be built up in three steps below. First, the median price for the big municipalities are calculated. Second, the normalization to the first month is done. And lastly, the resulting table is pivoted to create the desired output.
Query 1 - Calculate the median price per month for selected cities
SELECT date_trunc('month', sold_date)::date AS month,
kommun,
percentile_disc(0.5) WITHIN GROUP (ORDER BY price_per_area) AS median
FROM sold_properties
WHERE property_type LIKE 'Bostadsr%' AND
sold_date BETWEEN '2019-05-01' AND '2020-04-30' AND
kommun IN (SELECT kommun
FROM sold_properties
WHERE property_type LIKE 'Bostadsr%' AND
sold_date BETWEEN '2019-05-01' AND '2020-04-30'
GROUP BY kommun
HAVING count(*) > 2000
)
GROUP BY 1,2
;Results
| month | kommun | median |
|---|---|---|
| 2019-05-01 | Göteborgs kommun | 47561 |
| 2019-05-01 | Malmö kommun | 28947 |
| 2019-05-01 | Solna kommun | 55705 |
| 2019-05-01 | Stockholms kommun | 68478 |
| 2019-05-01 | Uppsala kommun | 35294 |
| 2019-06-01 | Göteborgs kommun | 46190 |
| 2019-06-01 | Malmö kommun | 28429 |
| 2019-06-01 | Solna kommun | 54509 |
| 2019-06-01 | Stockholms kommun | 70089 |
| 2019-06-01 | Uppsala kommun | 33769 |
| 2019-07-01 | Göteborgs kommun | 48438 |
| 2019-07-01 | Malmö kommun | 28687 |
| … | … | … |
Now, let’s normalize the median prices to the first value of the time period (May 2019) to represent the percentage change. This is done using the above query as a sub-query and scaling by the first value of a sorted list of price values for all municipalities.
Query 2 - Normalize to the first month in the period
SELECT month,
kommun,
100*median/first_value(median) OVER (PARTITION BY kommun
ORDER BY month) AS norm_price
FROM (SELECT date_trunc('month', sold_date)::date AS month,
kommun,
percentile_disc(0.5) WITHIN GROUP (ORDER BY price_per_area) AS median
FROM sold_properties
WHERE property_type LIKE 'Bostadsr%' AND
sold_date BETWEEN '2019-05-01' AND '2020-04-30' AND
kommun IN (SELECT kommun
FROM sold_properties
WHERE property_type LIKE 'Bostadsr%' AND
sold_date BETWEEN '2019-05-01' AND '2020-04-30'
GROUP BY kommun
HAVING count(*) > 2000
)
GROUP BY 1,2
)
AS monthly_price
;Results
| month | kommun | norm_price |
|---|---|---|
| 2019-05-01 | Göteborgs kommun | 100 |
| 2019-06-01 | Göteborgs kommun | 97.11738609364815 |
| 2019-07-01 | Göteborgs kommun | 101.84394777233447 |
| 2019-08-01 | Göteborgs kommun | 105.12815121633271 |
| 2019-09-01 | Göteborgs kommun | 102.56302432665419 |
| 2019-10-01 | Göteborgs kommun | 103.0928702087845 |
| 2019-11-01 | Göteborgs kommun | 102.27497319232144 |
| 2019-12-01 | Göteborgs kommun | 100.74851243666029 |
| 2020-01-01 | Göteborgs kommun | 105.12815121633271 |
| 2020-02-01 | Göteborgs kommun | 103.5344084438931 |
| 2020-03-01 | Göteborgs kommun | 102.25184499905384 |
| 2020-04-01 | Göteborgs kommun | 103.28420344399824 |
| 2019-05-01 | Malmö kommun | 100 |
| 2019-06-01 | Malmö kommun | 98.21052267937955 |
| 2019-07-01 | Malmö kommun | 99.10180675026773 |
| 2019-08-01 | Malmö kommun | 104.13859812761254 |
| … | … | … |
To get to the final result, we want to distribute the municipalities (kommuns) to separate columns, i.e. pivot on the kommun column. This functionality is available in the table_func extension of PostgreSQL by means of the crosstab function. The full query above (Query 2) needs to be passed as a string to this function, with the addition of the ORDER keyword to ensure that the rows are ordered correctly.
Query 3 - Make a pivot table
SELECT *
FROM crosstab('SELECT month,
kommun,
100*median/first_value(median) OVER (PARTITION BY kommun
ORDER BY month) AS norm_price
FROM (SELECT date_trunc(''month'', sold_date)::date AS month,
kommun,
percentile_disc(0.5) WITHIN GROUP (ORDER BY price_per_area) AS median
FROM sold_properties
WHERE property_type LIKE ''Bostadsr%'' AND
sold_date BETWEEN ''2019-05-01'' AND ''2020-04-30'' AND
kommun IN (SELECT kommun
FROM sold_properties
WHERE property_type LIKE ''Bostadsr%'' AND
sold_date BETWEEN ''2019-05-01'' AND ''2020-04-30''
GROUP BY kommun
HAVING count(*) > 2000
)
GROUP BY 1,2
)
AS monthly_price
ORDER BY 1,2',
$$VALUES('Stockholms kommun'::text),
('Göteborgs kommun'::text),
('Malmö kommun'::text),
('Uppsala kommun'::text),
('Solna kommun'::text)
$$
)
AS ct("month" date,
"Stockholm" numeric(5,1),
"Göteborg" numeric(5,1),
"Malmö" numeric(5,1),
"Uppsala" numeric(5,1),
"Solna" numeric(5,1)
)
;Results
| month | Stockholm | Göteborg | Malmö | Uppsala | Solna |
|---|---|---|---|---|---|
| 2019-05-01 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| 2019-06-01 | 102.4 | 97.1 | 98.2 | 95.7 | 97.9 |
| 2019-07-01 | 98.1 | 101.8 | 99.1 | 112.1 | 101.6 |
| 2019-08-01 | 102.7 | 105.1 | 104.1 | 112.4 | 103.3 |
| 2019-09-01 | 106.9 | 102.6 | 104.6 | 104.6 | 99.3 |
| 2019-10-01 | 105.9 | 103.1 | 101.0 | 102.0 | 100.3 |
| 2019-11-01 | 105.9 | 102.3 | 100.0 | 99.5 | 102.0 |
| 2019-12-01 | 108.4 | 100.7 | 93.9 | 108.3 | 103.8 |
| 2020-01-01 | 109.5 | 105.1 | 103.6 | 107.9 | 106.1 |
| 2020-02-01 | 110.5 | 103.5 | 109.5 | 105.7 | 105.8 |
| 2020-03-01 | 108.5 | 102.3 | 97.2 | 99.6 | 100.0 |
| 2020-04-01 | 102.2 | 103.3 | 100.1 | 98.7 | 100.0 |
And voilà, we have reached the desired result.
In the next post we’ll probably dive more into the geo-spatial information available and ways to visualize the housing price trends in QGIS.