tags |
---|
umap, datasette, opendata |
Mapping the concentration of not-for-profit organizations
Following a discussion with a friend, I realized the number of not-for-profit organizations could be a good indicator of activity in a city, and might correlate with well-being.
I wanted to create a choropleth map, so that different cities appear in different colors on the map, depending on their respective number of organisations.
Getting the data
The first thing to do was to retrieve the data. I needed two distinct datasets:
- the cities and their shapes, to display them.
- the number of organisations per city.
The first one was easy: thanks to France GeoJSON, I was able to download the geometries of the cities of my department.
For the number of organisations, I found some data on data.gouv.fr, but the comments made me explore the official journal dataset.
Turns out it's possible to issue requests on an API, without having to download everything, so I went with this pseudo SQL statement:
SELECT count(*) as nb_asso
WHERE departement_libelle=="Ille-et-Vilaine"
GROUP BY "codepostal_actuel"
ORDER BY -nb_asso
Which translates to this URL
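To make the intent of that pseudo-SQL concrete, here is a minimal Python sketch of the same aggregation done locally: keep the records of one department, count them per postal code, and sort by descending count. The `records` list is made-up sample data and `count_by_postal_code` is a hypothetical helper; only the two field names come from the query above, and this is not meant to describe how the API computes its answer.

```python
from collections import Counter

# Made-up sample records; only the field names come from the query above.
records = [
    {"departement_libelle": "Ille-et-Vilaine", "codepostal_actuel": "35000"},
    {"departement_libelle": "Ille-et-Vilaine", "codepostal_actuel": "35000"},
    {"departement_libelle": "Ille-et-Vilaine", "codepostal_actuel": "35400"},
    {"departement_libelle": "Finistère", "codepostal_actuel": "29200"},
]


def count_by_postal_code(records, department):
    """Return (postal code, count) pairs for one department, largest first."""
    counts = Counter(
        r["codepostal_actuel"]
        for r in records
        if r["departement_libelle"] == department  # the WHERE clause
    )
    # most_common() sorts by descending count, like ORDER BY -nb_asso.
    return counts.most_common()


result = count_by_postal_code(records, "Ille-et-Vilaine")
print(result)
```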
Merging the data with jq
I had all the interesting bits, but in two unrelated JSON files.
jq, a tool to manipulate JSON data, allowed me to merge these files together using the --slurpfile argument:
jq --slurpfile orgs organizations.json '.features |= map(
.properties |= (. as $props |
($orgs[0].results[] | select(.codepostal_actuel == $props.code) | . + $props)
)
)' cities.geojson > enriched-cities.geojson
I still find it a bit hard to read, but basically what this does is:
- use the cities.geojson file as an input.
- reference organizations.json as a second input, naming it orgs.
- for each of the GeoJSON properties, merge them with the data coming from orgs, matching on the postal code.
It's using map() and the |= syntax from jq to do this.
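For readers who find the jq hard to follow, the same idea can be sketched in Python. This is only an illustration under assumptions: `merge_on_code` is a hypothetical helper, the sample data is made up, and features without a match are left unchanged here, which may differ from what jq does when `select` yields nothing.

```python
def merge_on_code(cities, orgs):
    """Merge each matching organisation record into the city properties."""
    # Index the organisation records by postal code for direct lookup.
    by_code = {r["codepostal_actuel"]: r for r in orgs["results"]}
    for feature in cities["features"]:
        props = feature["properties"]
        match = by_code.get(props["code"])
        if match is not None:
            # Same precedence as `. + $props` in jq: city properties win.
            feature["properties"] = {**match, **props}
    return cities


# Made-up sample inputs shaped like the two files.
cities = {"features": [
    {"properties": {"code": "35000", "nom": "Ville A"}},
    {"properties": {"code": "99999", "nom": "Ville B"}},
]}
orgs = {"results": [{"codepostal_actuel": "35000", "nb_asso": 12}]}

merged = merge_on_code(cities, orgs)
print([f["properties"] for f in merged["features"]])
```

Building the `by_code` index first avoids rescanning the whole organisation list for every city, which is what the jq version does.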
So, it works, and produces an enhanced version of the .geojson file.
But it turns out that the data I got wasn't good enough.
Second take
This simple version returns no results for the biggest city around (Rennes). Something is fishy.
Let's look at the data inside this .geojson file in more detail:
```bash
jq '.features[].properties | select(.nom == "Rennes")' cities.geojson
{
  "code": "35238",
  "nom": "Rennes"
}
```
It turns out this code is not the postal code but the INSEE code, which the other dataset doesn't use:
```bash
jq '.results[] | select(.codepostal_actuel == "35238")' organizations.json
# Returns nothing
```
The other dataset I had access to does expose these INSEE codes, so I downloaded all the files, imported them into datasette and ran an SQL query on it:
SELECT COUNT(*) AS count, adrs_codeinsee, libcom
FROM base_rna
WHERE adrs_codepostal LIKE '35%'
GROUP BY adrs_codeinsee, libcom;
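To try this kind of query without a datasette instance, here is a small sketch using Python's built-in `sqlite3` module on an in-memory table. The rows are made up for illustration and the table only has the three columns the query touches. It groups on `adrs_codeinsee`, which gives a single row per commune even when the commune spans several postal codes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE base_rna (adrs_codeinsee TEXT, adrs_codepostal TEXT, libcom TEXT)"
)
# Made-up rows: one commune spread over three postal codes,
# one small commune, and one record outside the department.
conn.executemany(
    "INSERT INTO base_rna VALUES (?, ?, ?)",
    [
        ("35238", "35000", "RENNES"),
        ("35238", "35200", "RENNES"),
        ("35238", "35700", "RENNES"),
        ("35116", "35111", "FRESNAIS"),
        ("29019", "29200", "BREST"),
    ],
)
rows = conn.execute(
    """
    SELECT COUNT(*) AS count, adrs_codeinsee, libcom
    FROM base_rna
    WHERE adrs_codepostal LIKE '35%'
    GROUP BY adrs_codeinsee, libcom
    ORDER BY COUNT(*) DESC
    """
).fetchall()
print(rows)
```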
The produced data looked like a typical datasette JSON result:
{
// …
"rows": [
[
2746,
"35238",
"RENNES"
],
[
14,
"35116",
"FRESNAIS"
],
// other records here…
],
// …
}
Here is the updated jq query, defaulting to 0 for the org count when nothing is found:
jq --slurpfile orgs organizations.json '
.features |= map(
.properties |= (
. as $props |
((($orgs[0].rows[] | select(.[1] == $props.code))[0]) // 0) as $orgCount |
. + { "orgCount": $orgCount }
)
)
' cities.geojson > enriched-cities.geojson
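The same merge can be sketched in Python, which makes the default-to-zero step explicit. This is an illustration under assumptions rather than a replacement for the jq command: `add_org_counts` is a hypothetical helper, and the second city in the sample is invented to show the fallback.

```python
def add_org_counts(cities, datasette_result):
    """Attach an orgCount property to every feature, defaulting to 0."""
    # Each row is [count, INSEE code, commune name], as in the JSON above.
    counts = {}
    for count, insee, _name in datasette_result["rows"]:
        counts.setdefault(insee, count)  # keep the first row for a given code
    for feature in cities["features"]:
        props = feature["properties"]
        props["orgCount"] = counts.get(props["code"], 0)
    return cities


cities = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "properties": {"code": "35238", "nom": "Rennes"}, "geometry": None},
    # Invented entry with no matching row, to exercise the default.
    {"type": "Feature", "properties": {"code": "00000", "nom": "Hypothetical"}, "geometry": None},
]}
result = {"rows": [[2746, "35238", "RENNES"], [14, "35116", "FRESNAIS"]]}

enriched = add_org_counts(cities, result)
print([f["properties"] for f in enriched["features"]])
```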
Visualisation
I imported this as a choropleth layer in uMap, and it looks like this: