# Argos

Argos is an HTTP monitoring service. It lets you define a list of websites to monitor and a list of checks to run on them. It then runs these checks periodically and alerts you if something goes wrong.

Todo:

- [ ] Cleandb should keep a maximum number of results per task
- [ ] Do not return an empty list on `/` when there are no results from agents
- [ ] Last seen agents
- [ ] Give a quick overview of the monitoring status
- [ ] Rename "error" to "unexpected error"
- [ ] Use background tasks for alerting
- [ ] Delete outdated tasks from config
- [ ] Implement alerting tasks
- [ ] Handle multiple alerting backends (email, sms, gotify)
- [ ] A configuration flag automatically adds a job that checks the 301 redirect from the HTTP version to HTTPS
- [ ] Add an "unknown" severity for check errors
- [ ] Add a way to specify the severity of the alerts in the config
- [ ] Add a command to generate a new authentication token

Implemented checks:

- [x] Returned status code matches what you expect;
- [x] Returned body matches what you expect;
- [x] SSL certificate expires in more than X days;

## Development notes

### On service start

1. Read the job definitions file and populate the database.
2. From the job definition, create a list of tasks to execute.
3. From time to time (?) clean the db.

### On configuration changes

- Find and tombstone the JobDefinitions that are not useful anymore.
- Cascade delete the child tasks that are planned. Tombstone them as well.

### On worker demand

- Find the tasks for which:
  - `last_check` is not defined
  - OR `last_check + max_timedelta < datetime.now()` (the task is due again)
  - AND `selected_by` is not defined.
- Mark these tasks as selected by the current worker, on the current date.

### From time to time (cleanup)

- Check for stalled tasks, i.e. tasks for which `(datetime.now() - selected_at) > MAX_WORKER_TIME`, and remove their lock.

### On the worker side

1. Hey, I'm XX, give me some work.
2. OK, this is done, here are the results for Task: response.
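
The selection rule from "On worker demand" and the stalled-task cleanup can be sketched in plain Python. This is a minimal in-memory illustration, not the actual Argos implementation: the `Task` dataclass, the helper names `is_due`, `select_tasks` and `release_stalled`, the `limit` parameter and the value of `MAX_WORKER_TIME` are all assumptions made for the example. Only the field names `last_check`, `max_timedelta`, `selected_by` and `selected_at` come from the notes. The sketch treats a task as due when it was never checked or when `last_check + max_timedelta` lies in the past.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Assumed value: the notes name the constant but do not give it.
MAX_WORKER_TIME = timedelta(minutes=5)


@dataclass
class Task:
    """Hypothetical in-memory stand-in for a task row."""
    url: str
    max_timedelta: timedelta
    last_check: Optional[datetime] = None
    selected_by: Optional[str] = None
    selected_at: Optional[datetime] = None


def is_due(task: Task, now: datetime) -> bool:
    """A task is due if it was never checked or its last check is too old."""
    return task.last_check is None or task.last_check + task.max_timedelta < now


def select_tasks(tasks: List[Task], worker_id: str, now: datetime,
                 limit: int = 10) -> List[Task]:
    """Lock up to `limit` due, unselected tasks for `worker_id`."""
    selected = []
    for task in tasks:
        if len(selected) >= limit:
            break
        if task.selected_by is None and is_due(task, now):
            # Mark the task as selected by this worker, on the current date.
            task.selected_by = worker_id
            task.selected_at = now
            selected.append(task)
    return selected


def release_stalled(tasks: List[Task], now: datetime) -> int:
    """Remove the lock on tasks held longer than MAX_WORKER_TIME."""
    released = 0
    for task in tasks:
        if task.selected_at is not None and now - task.selected_at > MAX_WORKER_TIME:
            task.selected_by = None
            task.selected_at = None
            released += 1
    return released
```

In a real deployment the same two conditions would more likely be expressed as database queries, so that selecting and locking happen atomically when several workers ask for work at once. The loop above only shows the logic.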