mirror of
https://framagit.org/framasoft/framaspace/argos.git
synced 2025-04-28 18:02:41 +02:00
# Argos

🚧 This is mainly a work in progress for now. It isn't working yet, so don't try to install it! 🚧

Argos is an HTTP monitoring service. It's meant to be simple to configure and simple to use.

Todo:

- [ ] Use PostgreSQL as a database
- [ ] Use background tasks for alerting
- [ ] Add a command to generate new authentication tokens
- [ ] Task for database cleanup (to run periodically)
- [ ] Handle multiple alerting backends (email, SMS, Gotify)
- [ ] Expose a simple read-only website
- [ ] Add a way to specify the severity of the alerts in the config
- [ ] Do not send "expected" and "got" values when check-status and body-contains succeeded

Features:

- [x] Uses `.yaml` files for configuration
- [x] Reads the configuration file and converts it to tasks
- [x] Stores tasks in a database
- [x] Multiple paths per website can be tested
- [x] Handles job failures on the clients
- [x] Exposes an HTTP API that can be consumed by other systems
- [x] Checks can be distributed on the network thanks to a job queue

Implemented checks:

- [x] Returned status code matches what you expect
- [x] Returned body matches what you expect
- [x] SSL certificate expires in more than X days

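The three checks above boil down to simple comparisons. Here is a minimal sketch of that logic, not the project's actual code: the function names are hypothetical, and the body check is assumed to be a substring match because the sample configuration calls it `body-contains`.

```python
from datetime import datetime, timedelta, timezone


def status_is(expected: int, got: int) -> bool:
    """Status check: the returned status code equals the expected one."""
    return got == expected


def body_contains(expected: str, body: str) -> bool:
    """Body check: the expected text appears somewhere in the returned body."""
    return expected in body


def certificate_ok(not_after: datetime, now: datetime, min_days: int) -> bool:
    """SSL check: the certificate expires in more than `min_days` days."""
    return not_after - now > timedelta(days=min_days)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(status_is(200, 200))
    print(body_contains("Alexis", "Written by Alexis"))
    print(certificate_ok(now + timedelta(days=30), now, min_days=10))
```

Keeping the checks as pure functions, separate from the HTTP request itself, would make them easy to test without network access.
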
## How to run?

We're using [pipenv](https://pipenv.pypa.io/) to manage the virtual environment and the dependencies.
You can install it with [pipx](https://pypa.github.io/pipx/):

```bash
pipx install pipenv
```

Then check out this repository and sync its pipenv environment:

```bash
pipenv sync
```

Once all the dependencies are in place, here is how to run the server:

```bash
pipenv run argos server
```

The server reads a `config.yaml` file at startup and populates the tasks specified in it. See the configuration section below for more information on how to configure the checks you want to run.

And here is how to run the agent:

```bash
pipenv run argos agent --server http://localhost:8000 --auth "<auth-token>"
```

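The agent command needs an authentication token. The sample configuration in this README notes that secrets can be generated with `openssl rand -base64 32`. Assuming the agent's token is one of those secrets (the README does not say so explicitly), a rough Python equivalent could look like this; `generate_token` is a hypothetical helper, not an Argos command:

```python
import base64
import secrets


def generate_token(num_bytes: int = 32) -> str:
    """Return `num_bytes` random bytes encoded as base64 text."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


if __name__ == "__main__":
    # Prints a new random token on every run.
    print(generate_token())
```
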
## Configuration

Here is a simple configuration file:

```yaml
general:
  frequency: 4h # Run checks every 4 hours.
  alerts:
    error:
      - local
    warning:
      - local
    alert:
      - local
service:
  port: 8888
  # Can be generated using `openssl rand -base64 32`.
  secrets:
    - "O4kt8Max9/k0EmHaEJ0CGGYbBNFmK8kOZNIoUk3Kjwc"
    - "x1T1VZR51pxrv5pQUyzooMG4pMUvHNMhA5y/3cUsYVs="

ssl:
  thresholds:
    critical: "1d"
    warning: "10d"

websites:
  - domain: "https://blog.notmyidea.org"
    paths:
      - path: "/"
        checks:
          - status-is: 200
          - body-contains: "Alexis"
          - ssl-certificate-expiration: "on-check"
      - path: "/foo"
        checks:
          - status-is: 400
```

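To illustrate how such a configuration could be converted to tasks, here is a minimal sketch. It assumes the file has already been parsed into a dictionary, that one task is created per (URL, check) pair, and that durations only use the `h` and `d` suffixes seen in the sample. `parse_duration`, `expand_tasks` and the task fields are hypothetical names, not Argos's actual code.

```python
import re
from datetime import timedelta

# Hypothetical parsed form of the sample configuration (only the keys used here).
CONFIG = {
    "general": {"frequency": "4h"},
    "websites": [
        {
            "domain": "https://blog.notmyidea.org",
            "paths": [
                {
                    "path": "/",
                    "checks": [
                        {"status-is": 200},
                        {"body-contains": "Alexis"},
                        {"ssl-certificate-expiration": "on-check"},
                    ],
                },
                {"path": "/foo", "checks": [{"status-is": 400}]},
            ],
        }
    ],
}

UNITS = {"h": "hours", "d": "days"}


def parse_duration(text: str) -> timedelta:
    """Turn strings such as "4h" or "10d" into a timedelta."""
    match = re.fullmatch(r"(\d+)([hd])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**{UNITS[match.group(2)]: int(match.group(1))})


def expand_tasks(config: dict) -> list:
    """Create one task per (URL, check) pair found in the configuration."""
    frequency = parse_duration(config["general"]["frequency"])
    tasks = []
    for website in config["websites"]:
        for entry in website["paths"]:
            url = website["domain"] + entry["path"]
            for check in entry["checks"]:
                for name, expected in check.items():
                    tasks.append({
                        "url": url,
                        "check": name,
                        "expected": expected,
                        "frequency": frequency,
                    })
    return tasks


if __name__ == "__main__":
    for task in expand_tasks(CONFIG):
        print(task["url"], task["check"], task["expected"])
```

With the sample above, this yields four tasks: three for `/` and one for `/foo`.
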
## Development notes

### On service start

1. Read the job definitions file and populate the database.
2. From the job definitions, create a list of tasks to execute.
3. From time to time (?), clean the database.

### On configuration changes:

- Find and tombstone the JobDefinitions that are not useful anymore.
- Cascade delete the child tasks that are planned. Tombstone them as well.

### On worker demand:

- Find the tasks for which:
  - `last_check` is not defined
  - OR `last_check + max_timedelta < datetime.now()`
  - AND `selected_by` is not defined.
- Mark these tasks as selected by the current worker, on the current date.

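The selection rule above can be sketched as follows. This is only an illustration with an in-memory list instead of a database. It reads the conditions as "(never checked OR due again) AND not selected", where "due again" means `last_check + max_timedelta` is no longer in the future. The `Task` class and `select_tasks` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Task:
    id: int
    last_check: Optional[datetime] = None
    selected_by: Optional[str] = None
    selected_at: Optional[datetime] = None


def is_due(task: Task, now: datetime, max_timedelta: timedelta) -> bool:
    """Never checked, or last checked at least `max_timedelta` ago."""
    return task.last_check is None or task.last_check + max_timedelta <= now


def select_tasks(tasks, worker: str, now: datetime, max_timedelta: timedelta):
    """Lock every due, unselected task for `worker` and return them."""
    selected = []
    for task in tasks:
        if task.selected_by is None and is_due(task, now, max_timedelta):
            task.selected_by = worker
            task.selected_at = now
            selected.append(task)
    return selected


if __name__ == "__main__":
    now = datetime.now()
    tasks = [Task(1), Task(2, last_check=now - timedelta(hours=1))]
    picked = select_tasks(tasks, "worker-1", now, timedelta(hours=4))
    print([task.id for task in picked])
```

In a real database, the select and the update would need to happen atomically so that two workers cannot pick the same task.
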
### From time to time (cleanup):

- Check for stalled tasks, i.e. those where `(datetime.now() - selected_at) > MAX_WORKER_TIME`, and remove their lock.

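As a rough sketch of that cleanup step, again with an in-memory list standing in for the database: the `Task` class, the `release_stalled` function and the ten-minute value for `MAX_WORKER_TIME` are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

MAX_WORKER_TIME = timedelta(minutes=10)  # assumed value, not from the notes


@dataclass
class Task:
    id: int
    selected_by: Optional[str] = None
    selected_at: Optional[datetime] = None


def release_stalled(tasks, now: datetime, max_worker_time: timedelta = MAX_WORKER_TIME):
    """Unlock tasks held longer than `max_worker_time`; return their ids."""
    released = []
    for task in tasks:
        if task.selected_at is not None and now - task.selected_at > max_worker_time:
            task.selected_by = None
            task.selected_at = None
            released.append(task.id)
    return released


if __name__ == "__main__":
    now = datetime.now()
    tasks = [Task(1, "worker-1", now - timedelta(minutes=30)), Task(2)]
    print(release_stalled(tasks, now))
```

Once the lock is removed, the task becomes eligible again the next time a worker asks for work.
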
### On the worker side

1. Hey, I'm XX, give me some work.
2. (Service answers) OK, this is done, here are the results for `Task<id>`: response.