Monitoring tool for Framaspace. [Online documentation](https://argos-monitoring.framasoft.org/)
Alexis Métaireau 8ac5cdb529 Updated models.py and queries.py
- Removed the `Definition` class and added the `Task` class. It contains all information needed to run the jobs on the workers.
- Added the `Result` class. It stores the results returned by workers.
- In queries.py, updated the `update_from_config` function. Now it checks for the existence of tasks with the same URL, check, and expected result before adding new ones.
2023-10-02 13:03:06 +02:00

Argos

🚧 This is mainly a work in progress for now. It doesn't work yet, so don't try to install it! 🚧

Argos is an HTTP monitoring service. It's meant to be simple to configure and simple to use.

Features:

  • Uses .yaml files for configuration;
  • Reads the configuration file and converts it to tasks;
  • Stores tasks in a database;
  • Checks can be distributed on the network thanks to a job queue;
  • Multiple paths per website can be tested;
  • Handles multiple alerting backends (email, SMS, Gotify);
  • Exposes an HTTP API that can be consumed by other systems;
  • Exposes a simple read-only website.

Implemented checks:

  • Returned status code matches what you expect;
  • Returned body matches what you expect;
  • SSL certificate expires in more than X days.
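
As a rough illustration, the three checks above could be written as pure functions. This is only a sketch: the function names and signatures are hypothetical, and "matches" for the body is assumed here to mean "contains the expected text".

```python
from datetime import datetime, timedelta, timezone


def check_status(actual: int, expected: int) -> bool:
    """Status check: the returned status code equals the expected one."""
    return actual == expected


def check_body(body: str, expected: str) -> bool:
    """Body check: assumed to mean the body contains the expected text."""
    return expected in body


def check_ssl_expiry(not_after: datetime, min_days: int, now: datetime) -> bool:
    """Certificate check: the certificate expires in more than `min_days` days."""
    return not_after - now > timedelta(days=min_days)


# Usage with values a worker might have collected from a response.
now = datetime(2023, 10, 2, tzinfo=timezone.utc)
print(check_status(200, 200))
print(check_body("<h1>Hello</h1>", "Hello"))
print(check_ssl_expiry(now + timedelta(days=30), min_days=10, now=now))
```

Keeping the checks free of network code would make them easy to test; fetching the response and the certificate would happen elsewhere.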

Development notes

On service start:

  1. Read the job definitions file and populate the database.
  2. From the job definitions, create a list of tasks to execute.
  3. From time to time (?), clean the database.
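
Steps 1 and 2 could look roughly like the sketch below. The configuration layout (`websites`, `domain`, `paths`, `checks`), the check names, and the `tasks_from_config` helper are all assumptions made for illustration; only the idea of skipping tasks that already exist with the same URL, check, and expected result comes from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical fields; the real `Task` model may differ.
    url: str
    check: str
    expected: str


def tasks_from_config(config: dict, existing: set) -> list:
    """Return the tasks described by `config` that are not stored yet."""
    new_tasks = []
    for website in config.get("websites", []):
        for path in website.get("paths", []):
            url = website["domain"] + path["path"]
            for check in path.get("checks", []):
                # Each check is assumed to be a one-item mapping,
                # e.g. {"status-is": 200}.
                for name, expected in check.items():
                    task = Task(url=url, check=name, expected=str(expected))
                    if task not in existing and task not in new_tasks:
                        new_tasks.append(task)
    return new_tasks


config = {
    "websites": [
        {
            "domain": "https://example.org",
            "paths": [
                {"path": "/", "checks": [{"status-is": 200}, {"body-contains": "Hello"}]}
            ],
        }
    ]
}
already_stored = {Task("https://example.org/", "status-is", "200")}
print(tasks_from_config(config, already_stored))
```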

On configuration changes:

  • Find and tombstone the JobDefinitions that are not useful anymore.
  • Cascade delete the child tasks that are planned. Tombstone them as well.
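
A minimal sketch of the tombstoning idea, assuming a tombstone is a `deleted_at` timestamp and that a definition is identified by its URL. Both are assumptions, as are the class and function names below.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class PlannedTask:
    deleted_at: Optional[datetime] = None


@dataclass
class JobDefinition:
    id: int
    url: str
    deleted_at: Optional[datetime] = None
    tasks: list = field(default_factory=list)


def tombstone_stale(definitions, configured_urls, now):
    """Tombstone definitions missing from the config, and their planned tasks."""
    stale = [
        d for d in definitions
        if d.url not in configured_urls and d.deleted_at is None
    ]
    for definition in stale:
        definition.deleted_at = now
        for task in definition.tasks:
            if task.deleted_at is None:
                task.deleted_at = now
    return stale


now = datetime(2023, 10, 2, 13, 0)
kept = JobDefinition(1, "https://example.org/")
gone = JobDefinition(2, "https://old.example.org/", tasks=[PlannedTask()])
print([d.id for d in tombstone_stale([kept, gone], {"https://example.org/"}, now)])
```

Marking rows instead of deleting them keeps past results readable; a later cleanup pass could remove tombstoned rows for good.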

On worker demand:

  • Find the tasks for which:
    • last_check is not defined OR last_check + max_timedelta < datetime.now() (the task has never been checked, or its next check is due)
    • AND selected_by is not defined.
  • Mark these tasks as selected by the current worker, on the current date.
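
The selection could be sketched as follows, working on in-memory objects rather than a database. It assumes a task is due once `last_check + max_timedelta` has passed, and that the lock is a `selected_by` / `selected_at` pair. The `Task` fields and the `select_tasks` name are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Task:
    url: str
    last_check: Optional[datetime] = None
    selected_by: Optional[str] = None
    selected_at: Optional[datetime] = None


def select_tasks(tasks, worker_id, now, max_timedelta):
    """Lock and return the tasks that are due and not yet taken by a worker."""
    selected = []
    for task in tasks:
        never_checked = task.last_check is None
        due = not never_checked and task.last_check + max_timedelta <= now
        if (never_checked or due) and task.selected_by is None:
            # Mark the task as selected by this worker, at the current time.
            task.selected_by = worker_id
            task.selected_at = now
            selected.append(task)
    return selected


now = datetime(2023, 10, 2, 13, 0)
tasks = [
    Task("https://example.org/"),                                       # never checked
    Task("https://example.org/a", last_check=now - timedelta(hours=1)),  # due
    Task("https://example.org/b", last_check=now),                       # checked just now
]
print([t.url for t in select_tasks(tasks, "worker-1", now, timedelta(minutes=5))])
```

With a real database, the select and the update would need to happen in one transaction so that two workers cannot pick the same task.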

From time to time:

  • Check for stalled tasks, i.e. tasks for which (datetime.now() - selected_at) > MAX_WORKER_TIME, and remove their lock.
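
A small sketch of that cleanup, assuming the lock is stored as `selected_by` / `selected_at` on the task. The `Task` fields, the `release_stalled` name, and the 10-minute value are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

MAX_WORKER_TIME = timedelta(minutes=10)  # Placeholder value.


@dataclass
class Task:
    url: str
    selected_by: Optional[str] = None
    selected_at: Optional[datetime] = None


def release_stalled(tasks, now, max_worker_time=MAX_WORKER_TIME):
    """Remove the lock on tasks held for longer than `max_worker_time`."""
    released = []
    for task in tasks:
        if task.selected_at is not None and now - task.selected_at > max_worker_time:
            task.selected_by = None
            task.selected_at = None
            released.append(task)
    return released


now = datetime(2023, 10, 2, 13, 0)
stalled = Task("https://example.org/", "worker-1", now - timedelta(hours=1))
running = Task("https://example.org/a", "worker-2", now - timedelta(minutes=1))
print([t.url for t in release_stalled([stalled, running], now)])
```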

On the worker side

  • "Hey, I'm XX, give me some work."
  • "OK, this is done, here are the results for Task: response."
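
The exchange could be sketched as a small loop. `FakeServer` is a hypothetical in-memory stand-in for the HTTP API, and `get_work`, `post_result`, and `run_worker` are made-up names, not the actual endpoints.

```python
from collections import deque


class FakeServer:
    """Hypothetical in-memory stand-in for the Argos HTTP API."""

    def __init__(self, tasks):
        self.queue = deque(tasks)
        self.results = []

    def get_work(self, worker_id):
        # "Hey, I'm XX, give me some work."
        return self.queue.popleft() if self.queue else None

    def post_result(self, worker_id, task, response):
        # "OK, this is done, here are the results for Task: response."
        self.results.append((worker_id, task, response))


def run_worker(worker_id, server, perform):
    """Ask for tasks until none are left, reporting each result back."""
    while (task := server.get_work(worker_id)) is not None:
        server.post_result(worker_id, task, perform(task))


server = FakeServer(["https://example.org/"])
run_worker("worker-1", server, lambda task: "200 OK")
print(server.results)
```

A real worker would perform the HTTP check inside `perform` and talk to the server over the network instead of calling methods directly.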