# Argos
Argos is an HTTP monitoring service. It allows you to define a list of websites to monitor, and a list of checks to run on these websites. It will then run these checks periodically, and alert you if something goes wrong.
Todo:
- [ ] Retrying: attempt 1413 ended with: `<Future at 0x104f39390 state=finished raised RuntimeError>` Cannot reopen a client instance, once it has been closed.
- [ ] Cleandb should keep a maximum number of results per task
- [ ] Do not return an empty list on `/` when there are no results from agents
- [ ] Last seen agents
- [ ] Give a quick overview of the monitoring status
- [ ] Rename "error" to "unexpected error"
- [ ] Use background tasks for alerting
- [ ] Delete outdated tasks from config
- [ ] Implement alerting tasks
- [ ] Handle multiple alerting backends (email, SMS, Gotify)
- [ ] A configuration flag automatically adds a job that checks for the 301 redirect from the HTTP version to HTTPS
- [ ] Add an "unknown" severity for check errors
- [ ] Add a way to specify the severity of the alerts in the config
- [ ] Add a command to generate a new authentication token
Implemented checks:
- [x] Returned status code matches what you expect
- [x] Returned body matches what you expect
- [x] SSL certificate expires in more than X days
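
As an illustration only, the three checks could be expressed as small pure functions like the ones below. The function names are hypothetical, the body check is assumed to be a substring match, and the certificate check assumes the `notAfter` string format returned by Python's `ssl` module. This is a sketch, not necessarily how Argos implements them.

```python
import ssl
from datetime import datetime, timedelta, timezone


def check_status(actual: int, expected: int) -> bool:
    """Status check: the returned status code equals the expected one."""
    return actual == expected


def check_body(body: str, expected: str) -> bool:
    """Body check, assumed here to mean 'the expected text appears in the body'."""
    return expected in body


def check_certificate(not_after: str, min_days: int, now: datetime) -> bool:
    """SSL check: the certificate expires in more than `min_days` days.

    `not_after` uses the format found in SSLSocket.getpeercert()["notAfter"],
    e.g. "Jan  5 09:34:43 2018 GMT". `now` must be timezone-aware.
    """
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return expires - now > timedelta(days=min_days)
```

Passing `now` explicitly keeps the certificate check deterministic and easy to test.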
## Development notes
### On service start
1. Read the job definitions file and populate the database.
2. From the job definition, create a list of tasks to execute.
3. From time to time (?) clean the db.
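
Step 2 above can be sketched as follows. The shape of a job definition (a `url` plus a `checks` mapping) and the `Task` fields are assumptions made for this example; the real job definitions file may look different.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Task:
    # Hypothetical task record: one check to run against one URL.
    url: str
    check: str
    expected: str
    last_check: Optional[datetime] = None
    selected_by: Optional[str] = None
    selected_at: Optional[datetime] = None


def tasks_from_definitions(definitions: list) -> list:
    """Expand each job definition into one task per configured check."""
    tasks = []
    for website in definitions:
        for check, expected in website["checks"].items():
            tasks.append(
                Task(url=website["url"], check=check, expected=str(expected))
            )
    return tasks
```

For example, a definition with two checks on one website yields two tasks, both with `last_check` unset so that they are picked up on the first worker request.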
### On configuration changes
- Find and tombstone the JobDefinitions that are not useful anymore.
- Cascade delete the child tasks that are planned. Tombstone them as well.
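
A minimal in-memory sketch of this step, assuming job definitions are matched against the new configuration by URL and that "tombstone" means setting a flag rather than deleting the row. The `JobDefinition` and `Task` fields shown here are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class JobDefinition:
    id: int
    url: str
    tombstoned: bool = False


@dataclass
class Task:
    job_id: int
    tombstoned: bool = False


def apply_config_change(jobs: list, tasks: list, configured_urls: set) -> set:
    """Tombstone jobs missing from the new config, then their child tasks.

    Returns the ids of the job definitions that were tombstoned by this call.
    """
    stale_ids = set()
    for job in jobs:
        if job.url not in configured_urls and not job.tombstoned:
            job.tombstoned = True
            stale_ids.add(job.id)
    # Cascade: every task whose parent was just tombstoned is tombstoned too.
    for task in tasks:
        if task.job_id in stale_ids:
            task.tombstoned = True
    return stale_ids
```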
### On worker demand
- Find the tasks for which:
  - `last_check` is not defined, OR `last_check + max_timedelta < datetime.now()` (the task is due again),
  - AND `selected_by` is not defined.
- Mark these tasks as selected by the current worker, on the current date.
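
The selection rule can be sketched in plain Python as below. It assumes a task is due once `last_check + max_timedelta` lies in the past, and it uses a hypothetical `Task` record with a `selected_at` field for the lock date. A real implementation would presumably do this in a single database query or transaction so that two workers cannot grab the same task.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Task:
    id: int
    last_check: Optional[datetime] = None
    selected_by: Optional[str] = None
    selected_at: Optional[datetime] = None


def select_tasks(tasks: list, worker_id: str, now: datetime,
                 max_timedelta: timedelta) -> list:
    """Lock and return the tasks that are due and not held by any worker."""
    selected = []
    for task in tasks:
        # Due: never checked, or the last check is older than max_timedelta.
        due = task.last_check is None or task.last_check + max_timedelta < now
        if due and task.selected_by is None:
            task.selected_by = worker_id
            task.selected_at = now
            selected.append(task)
    return selected
```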
### From time to time (cleanup)
- Check for stalled tasks, i.e. those where `(datetime.now() - selected_at) > MAX_WORKER_TIME`, and remove their lock.
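
A possible shape for this cleanup, again on a hypothetical in-memory `Task` record. "Removing the lock" is assumed to mean clearing both `selected_by` and `selected_at`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Task:
    id: int
    selected_by: Optional[str] = None
    selected_at: Optional[datetime] = None


def release_stalled(tasks: list, now: datetime,
                    max_worker_time: timedelta) -> int:
    """Unlock tasks held longer than max_worker_time; return how many."""
    released = 0
    for task in tasks:
        if task.selected_at is not None and now - task.selected_at > max_worker_time:
            task.selected_by = None
            task.selected_at = None
            released += 1
    return released
```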
### On the worker side
1. Worker: "Hey, I'm XX, give me some work."
2. (The service answers.) Worker: "OK, this is done, here are the results for `Task<id>`: response."
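

The exchange above might look like this with an in-memory stand-in for the service. The `Service` class, its `get_work` and `post_result` methods, and `run_worker_once` are all hypothetical names for this sketch; in practice the two calls would be HTTP requests.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Task:
    id: int
    url: str
    selected_by: Optional[str] = None


@dataclass
class Service:
    """In-memory stand-in for the service side (hypothetical interface)."""
    tasks: list
    results: dict = field(default_factory=dict)

    def get_work(self, worker_id: str) -> list:
        # "Hey, I'm XX, give me some work."
        work = [t for t in self.tasks if t.selected_by is None]
        for task in work:
            task.selected_by = worker_id
        return work

    def post_result(self, task_id: int, response: str) -> None:
        # "Here are the results for Task<id>: response."
        self.results[task_id] = response


def run_worker_once(service: Service, worker_id: str,
                    run_check: Callable[[Task], str]) -> int:
    """Ask for work, run each check, report each result; return the count."""
    work = service.get_work(worker_id)
    for task in work:
        service.post_result(task.id, run_check(task))
    return len(work)
```

The check itself is injected as `run_check`, so the loop can be exercised without any network access.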