# Argos

Argos is an HTTP monitoring service. It lets you define a list of websites to monitor and a list of checks to run on them. It then runs these checks periodically and alerts you if something goes wrong.

Todo:

- [x] Use PostgreSQL as a database
- [x] Expose a simple read-only website
- [ ] Use background tasks for alerting
- [ ] Add a command to generate new authentication tokens
- [ ] Add a task for database cleanup (to run periodically)
- [ ] Handle multiple alerting backends (email, SMS, Gotify)
- [ ] Add a way to specify the severity of the alerts in the config
- [ ] Do not send "expected" and "got" values when check-status and body-contains succeeded

Features:

- [x] Uses `.yaml` files for configuration
- [x] Reads the configuration file and converts it to tasks
- [x] Stores tasks in a database
- [x] Multiple paths per website can be tested
- [x] Handles job failures on the clients
- [x] Exposes an HTTP API that can be consumed by other systems
- [x] Checks can be distributed on the network thanks to a job queue

Implemented checks:

- [x] Returned status code matches what you expect
- [x] Returned body matches what you expect
- [x] SSL certificate expires in more than X days

## How to run?

To install it, create a virtualenv and install the dependencies:

```bash
python3 -m venv venv
source venv/bin/activate
pip install -e .
```

Once all the dependencies are in place, here is how to run the server:

```bash
argos server
```

The server reads a `config.yaml` file at startup and populates the tasks specified in it. See the configuration section below for more information on how to configure the checks you want to run.

And here is how to run the agent:

```bash
argos agent http://localhost:8000 ""
```

## Configuration

Here is a simple configuration file:

```yaml
general:
  frequency: "1m" # Run checks every minute.
  alerts:
    error:
      - local
    warning:
      - local
    alert:
      - local
service:
  secrets:
    # Secrets can be generated using `openssl rand -base64 32`.
    - "O4kt8Max9/k0EmHaEJ0CGGYbBNFmK8kOZNIoUk3Kjwc"
    - "x1T1VZR51pxrv5pQUyzooMG4pMUvHNMhA5y/3cUsYVs="
ssl:
  thresholds:
    - "1d": critical
    - "5d": warning

# It's also possible to define the checks in another file
# with the include syntax:
#
# websites: !include websites.yaml
#
websites:
  - domain: "https://mypads.framapad.org"
    paths:
      - path: "/mypads/"
        checks:
          - status-is: "200"
          - body-contains: '
            '
          - ssl-certificate-expiration: "on-check"
      - path: "/admin/"
        checks:
          - status-is: "401"
```

## Development notes

### On service start

1. Read the job definitions file and populate the database.
2. From the job definitions, create a list of tasks to execute.
3. From time to time (?), clean the database.

### On configuration changes

- Find and tombstone the JobDefinitions that are no longer useful.
- Cascade delete the child tasks that are planned. Tombstone them as well.

### On worker demand

- Find the tasks for which:
  - last_check is not defined
  - OR last_check + max_timedelta < datetime.now()
  - AND selected_by is not defined.
- Mark these tasks as selected by the current worker, on the current date.

### From time to time (cleanup)

- Check for stalled tasks: (datetime.now() - selected_at) > MAX_WORKER_TIME. Remove the lock.

### On the worker side

1. Hey, I'm XX, give me some work.
2. OK, this is done, here are the results for Task: response.
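
The selection and cleanup rules from the notes above can be sketched in plain Python. This is only an illustration, not the actual Argos implementation: the `Task` dataclass, the function names, and the two duration values are hypothetical, and the tasks live in a list rather than in the database. It assumes a task is due once `last_check + max_timedelta` lies in the past, and it reads the selection rule as "(never checked OR due) AND not selected".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Hypothetical values: the notes name these durations but give no numbers.
MAX_TIMEDELTA = timedelta(minutes=1)
MAX_WORKER_TIME = timedelta(minutes=5)


@dataclass
class Task:
    """Illustrative in-memory stand-in for a task row (not the real model)."""
    url: str
    last_check: Optional[datetime] = None
    selected_by: Optional[str] = None
    selected_at: Optional[datetime] = None


def is_due(task: Task, now: datetime) -> bool:
    # Due if never checked, or if the last check is older than MAX_TIMEDELTA.
    return task.last_check is None or task.last_check + MAX_TIMEDELTA < now


def select_tasks(tasks: List[Task], worker_id: str, now: datetime) -> List[Task]:
    """Lock every due, unselected task for the given worker and return them."""
    selected = []
    for task in tasks:
        if task.selected_by is None and is_due(task, now):
            task.selected_by = worker_id
            task.selected_at = now
            selected.append(task)
    return selected


def release_stalled(tasks: List[Task], now: datetime) -> List[Task]:
    """Remove the lock from tasks held longer than MAX_WORKER_TIME."""
    released = []
    for task in tasks:
        if task.selected_at is not None and now - task.selected_at > MAX_WORKER_TIME:
            task.selected_by = None
            task.selected_at = None
            released.append(task)
    return released
```

With a real database, the selection step would need to be a single transaction (or use row locking) so that two workers cannot pick the same task; the loop above ignores concurrency entirely.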