almet/blog.notmyidea.org

mirror of https://github.com/almet/notmyidea.git synced 2025-04-28 19:42:37 +02:00

2024-04-23 23:05:38 +02:00

tags
umap, datasette, opendata

Mapping the concentration of not-for-profit organizations

Following a discussion with a friend, I realized that the number of not-for-profit organizations could be a good indicator of how active a city is, and that it might correlate with well-being.

I wanted to create a choropleth map, so that different cities appear in different colors on the map, depending on their respective number of organisations.

Getting the data

The first thing to do was to retrieve the data. I needed two distinct datasets:

the cities and their shapes, to display them.
the number of organisations per city.

The first one was easy: thanks to France GeoJSON, I was able to download the geometries of the cities in my department.

For the number of organisations, I found some data on data.gouv.fr, but the comments made me explore the official journal dataset.

It turns out the data can be queried through an API, without having to download everything, so I went with this pseudo-SQL statement:

SELECT count(*) as nb_asso
WHERE departement_libelle=="Ille-et-Vilaine"
GROUP BY "codepostal_actuel"
ORDER BY -nb_asso

Which translates to this URL
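As a rough illustration of how such a statement maps onto a query string, here is a minimal Python sketch. The endpoint `https://example.org/api/records` is a placeholder, and the parameter names (`select`, `where`, `group_by`, `order_by`) are assumptions that simply mirror the SQL clauses; the actual API may name them differently.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint: an assumption, not the URL used in this post.
BASE_URL = "https://example.org/api/records"


def build_query_url(department: str) -> str:
    """Encode each pseudo-SQL clause as one query-string parameter."""
    params = {
        # Parameter names are assumptions mirroring the SQL clauses.
        "select": "count(*) as nb_asso",
        "where": f'departement_libelle="{department}"',
        "group_by": "codepostal_actuel",
        "order_by": "-nb_asso",
    }
    # urlencode percent-escapes spaces, quotes and parentheses.
    return f"{BASE_URL}?{urlencode(params)}"


url = build_query_url("Ille-et-Vilaine")
# Decoding the query string gives back the original clause.
print(parse_qs(urlsplit(url).query)["where"][0])
```

The point is only that every clause becomes one URL-encoded parameter, so the query stays readable once decoded.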

Merging the data with jq

I had all the interesting bits, but in two unrelated JSON files.

jq, a tool to manipulate JSON data, allowed me to merge these files together, using the --slurpfile argument:


jq --slurpfile orgs organizations.json '.features |= map(
   .properties |= (. as $props |
     ($orgs[0].results[] | select(.codepostal_actuel == $props.code) | . + $props)
   )
 )' cities.geojson > enriched-cities.geojson

I still find it a bit hard to read, but basically what this does is:

use the cities.geojson file as the input
reference organizations.json as a second input, naming it orgs
for each GeoJSON feature, merge its properties with the matching record from orgs, matching on the postal code

It's using map() and the |= syntax from jq to do this.
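For readers who also find the jq filter dense, the same merge can be sketched in Python. This is an illustration rather than what was actually run: the sample records are made up, and leaving unmatched cities untouched is a choice made for this sketch.

```python
def merge_by_postal_code(cities: dict, orgs: dict) -> dict:
    """Merge each city's properties with its matching organization record."""
    # Index the organization records by postal code for direct lookup.
    by_code = {rec["codepostal_actuel"]: rec for rec in orgs["results"]}
    for feature in cities["features"]:
        props = feature["properties"]
        record = by_code.get(props["code"])
        if record is not None:
            # Same precedence as `. + $props` in jq: city properties win.
            feature["properties"] = {**record, **props}
        # Unmatched cities are left as they are (a choice made here).
    return cities


# Made-up sample data, only there to exercise the function.
cities = {"features": [
    {"properties": {"code": "35000", "nom": "Example"}},
    {"properties": {"code": "99999", "nom": "Unmatched"}},
]}
orgs = {"results": [{"codepostal_actuel": "35000", "nb_asso": 12}]}

merged = merge_by_postal_code(cities, orgs)
print(merged["features"][0]["properties"])
```

Building the `by_code` index first avoids rescanning the whole organizations list for every city.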

So, it works, and produces an enhanced version of the .geojson.

But it turns out that the data I got wasn't good enough.

Second take

This simple version returns no results for the biggest city around (Rennes). Something is fishy.
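One way to investigate this kind of gap is to list the cities whose code matches no record at all. The sketch below is a hypothetical helper, assuming the same field names as the two files; the organization record in the sample is made up.

```python
def unmatched_cities(cities: dict, orgs: dict) -> list:
    """Return the names of cities whose code appears in no organization record."""
    known_codes = {rec["codepostal_actuel"] for rec in orgs["results"]}
    return [
        feature["properties"]["nom"]
        for feature in cities["features"]
        if feature["properties"]["code"] not in known_codes
    ]


# The Rennes entry mirrors the GeoJSON; the organization record is made up.
cities = {"features": [{"properties": {"code": "35238", "nom": "Rennes"}}]}
orgs = {"results": [{"codepostal_actuel": "35000", "nb_asso": 1}]}

missing = unmatched_cities(cities, orgs)
print(missing)
```

A long list of unmatched names is a strong hint that the two files are not keyed on the same kind of code.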

Let's look at the data inside this .geojson file in more detail:


```bash
jq '.features[].properties | select(.nom == "Rennes")' cities.geojson
{
  "code": "35238",
  "nom": "Rennes"
}
```

This code is not the postal code but the INSEE code, which the other dataset doesn't use:

```bash
jq '.results[] | select(.codepostal_actuel == "35238")' organizations.json
# Returns nothing
```

The other dataset I had access to does expose these codes, so I downloaded all the files, imported them into Datasette and ran an SQL query on them:

SELECT COUNT(*) AS count, adrs_codeinsee, libcom
FROM base_rna
WHERE adrs_codepostal LIKE '35%'
GROUP BY adrs_codeinsee, libcom;
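To see what this aggregation produces, here is a small self-contained sketch using Python's sqlite3 module with an in-memory table. The table and column names come from the query; the rows are made-up samples. The sketch groups on adrs_codeinsee so that a commune with several postal codes yields a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE base_rna (adrs_codeinsee TEXT, adrs_codepostal TEXT, libcom TEXT)"
)
# Made-up sample rows: one commune with three postal codes, one with a
# single postal code, and one outside the department.
conn.executemany(
    "INSERT INTO base_rna VALUES (?, ?, ?)",
    [
        ("35238", "35000", "RENNES"),
        ("35238", "35200", "RENNES"),
        ("35238", "35700", "RENNES"),
        ("35116", "35120", "FRESNAIS"),
        ("75056", "75001", "PARIS"),
    ],
)

rows = conn.execute(
    """
    SELECT COUNT(*) AS count, adrs_codeinsee, libcom
    FROM base_rna
    WHERE adrs_codepostal LIKE '35%'
    GROUP BY adrs_codeinsee, libcom
    """
).fetchall()

for count, insee, name in rows:
    print(count, insee, name)
```

With these samples the three RENNES rows collapse into one group, and the row outside the department is filtered out by the LIKE clause.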

The result looked like a typical Datasette JSON response:

{
  // …
  "rows": [
    [
      2746,
      "35238",
      "RENNES"
    ],
    [
      14,
      "35116",
      "FRESNAIS"
    ],
    // other records here…
  ],
  // …
}

Here is the updated jq query, defaulting to 0 for the org count when nothing is found:


jq --slurpfile orgs organizations.json '
 .features |= map(
   .properties |= (
     . as $props |
     ((($orgs[0].rows[] | select(.[1] == $props.code))[0]) // 0) as $orgCount |
     . + { "orgCount": $orgCount }
   )
 )
' cities.geojson > enriched-cities.geojson
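The same enrichment, including the fallback to 0, can be sketched in Python. It assumes the row layout shown earlier (count first, INSEE code second); the second city in the sample is made up to exercise the default.

```python
def add_org_count(cities: dict, result: dict) -> dict:
    """Add an orgCount property to every feature, defaulting to 0."""
    # Each row is [count, insee_code, name]; index the counts by INSEE code.
    counts = {row[1]: row[0] for row in result["rows"]}
    for feature in cities["features"]:
        props = feature["properties"]
        # dict.get plays the role of jq's `// 0` alternative operator.
        props["orgCount"] = counts.get(props["code"], 0)
    return cities


result = {"rows": [[2746, "35238", "RENNES"], [14, "35116", "FRESNAIS"]]}
cities = {"features": [
    {"properties": {"code": "35238", "nom": "Rennes"}},
    # Made-up city with no matching row, to exercise the default.
    {"properties": {"code": "00000", "nom": "Nowhere"}},
]}

enriched = add_org_count(cities, result)
print([f["properties"]["orgCount"] for f in enriched["features"]])
```

Defaulting to 0 keeps every feature in the output, so cities without any recorded organization still get a value to color on the map.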

Visualisation

I've imported this as a choropleth layer in uMap. It looks like this:

View full screen
