Compare commits

67 commits
0.5.0 ... main

Author SHA1 Message Date
Luc Didry
9389e3a005
🏷 — Bump version (0.9.0) 2025-02-18 17:05:55 +01:00
Luc Didry
159a6e2427
🔀 Merge remote-tracking branch 'origin/develop' 2025-02-18 17:05:25 +01:00
Luc Didry
211ac32028
🐛 — Fix worker timeout for old results cleaning in recurring tasks (fix #84)
💥 Old results are now removed by their age, not based on their number.

💥 Warning: `max_results` setting has been replaced by `max_results_age`, which is a duration.
Use `argos server generate-config > /etc/argos/config.yaml-dist` to generate
a new example configuration file.
2025-02-18 17:04:26 +01:00
Luc Didry
32f2518294
🏷 — Bump version (0.8.2) 2025-02-18 14:58:35 +01:00
Luc Didry
38cc06e972
🐛 — Fix recurring tasks with gunicorn 2025-02-18 14:57:49 +01:00
Luc Didry
4b78919937
🏷 — Bump version (0.8.1) 2025-02-18 14:22:41 +01:00
Luc Didry
d8f30ebccd
🐛 — Fix todo enum in jobs table 2025-02-18 14:22:12 +01:00
Luc Didry
09674f73ef
🏷 — Bump version (0.8.0) 2025-02-18 13:50:36 +01:00
Luc Didry
c63093bb2f
🔀 Merge remote-tracking branch 'origin/develop' 2025-02-18 13:48:47 +01:00
Luc Didry
657624ed35
📝 — Add enum doc for developers 2025-02-18 13:47:27 +01:00
Luc Didry
471c1eae91
📜 — Add breaking changes to CHANGELOG 2025-02-18 13:43:05 +01:00
Luc Didry
c3708af32a
🐛 — Better httpx.RequestError handling (fix #83) 2025-02-18 13:36:40 +01:00
Luc Didry
23fea9fffa
🐛 — Automatically reconnect to LDAP if unreachable (fix #81) 2025-02-18 11:28:05 +01:00
Luc Didry
a48c7b74e6
✨ — Reload configuration asynchronously (fix #79) 2025-02-17 17:26:56 +01:00
Luc Didry
8d82f7f9d6
✨ — No need cron tasks for agents watching (fix #76) 2025-02-17 15:35:13 +01:00
Luc Didry
fd0c68cd4c
— Add missing dependency for fastapi-utils 2025-02-17 11:03:01 +01:00
Luc Didry
c98cd9c017
✨ — No need cron tasks for DB cleaning anymore (fix #74 and #75) 2025-02-17 10:46:01 +01:00
Luc Didry
73e7a8f414
📝 — Document how to add data to requests (fix #77) 2025-02-12 16:25:10 +01:00
Luc Didry
db54dd2cdd
✨ — Allow to customize agent User-Agent header (fix #78) 2025-02-12 16:10:09 +01:00
Luc Didry
1b484da27a
🏷 — Bump version (0.7.4) 2025-02-12 15:33:36 +01:00
Luc Didry
07f87a0f7d
🔀 Merge remote-tracking branch 'origin/develop' 2025-02-12 15:33:04 +01:00
Luc Didry
60f3079140
🩹 — Add missing enum removal 2025-02-12 15:32:24 +01:00
Luc Didry
ca709dca62
🔀 Merge remote-tracking branch 'dryusdan/dev/fix-method-enum' into develop 2025-02-12 15:02:19 +01:00
Dryusdan
0f099b9df4
Fix method enum in tasks table 2025-01-29 11:37:09 +01:00
Luc Didry
5abdd8414d
🏷 — Bump version (0.7.3) 2025-01-26 07:59:10 +01:00
Luc Didry
06868cdd74
🔀 Merge remote-tracking branch 'origin/develop' 2025-01-26 07:58:34 +01:00
Luc Didry
2b82f7c8f2
🐛 — Fix bug in retry_before_notification logic when success 2025-01-26 07:54:31 +01:00
Luc Didry
797a60a85c
🏷 — Bump version (0.7.2) 2025-01-24 14:07:50 +01:00
Luc Didry
4c4d3b69b2
🔀 Merge remote-tracking branch 'origin/develop' 2025-01-24 14:07:16 +01:00
Luc Didry
c922894567
🐛 — Fix bug in retry_before_notification logic 2025-01-24 14:06:36 +01:00
Luc Didry
8652539086
🏷 — Bump version (0.7.1) 2025-01-15 16:18:16 +01:00
Luc Didry
4f3dfd994b
🔀 Merge remote-tracking branch 'origin/develop' 2025-01-15 16:16:14 +01:00
Luc Didry
28ec85fed3
📝 — Improve release documentation 2025-01-15 16:15:16 +01:00
Luc Didry
586660c02a
🩹 — Check before adding/removing ip_version_enum 2025-01-15 09:14:55 +01:00
Luc Didry
64f8241e74
🩹 — Avoid warning from MySQL only alembic instructions 2025-01-14 17:06:59 +01:00
Luc Didry
3d209fed22
🏷 — Bump version (0.7.0) 2025-01-14 16:41:09 +01:00
Luc Didry
acd90133bd
🔀 Merge remote-tracking branch 'origin/develop' 2025-01-14 16:39:53 +01:00
Luc Didry
be90aa095a
🐛 — Fix strange and buggy behavior 2025-01-14 16:38:43 +01:00
Luc Didry
06f8310505
🗃 — Use bigint type for results id column in PostgreSQL (fix #73) 2025-01-14 16:38:43 +01:00
Luc Didry
fe89d62e88
🐛🗃 — Fix enum migration on PostgreSQL 2025-01-14 16:38:43 +01:00
Luc Didry
1e7672abca
🚸 — Add a long expiration date on auto-refresh cookies 2025-01-14 16:38:43 +01:00
Luc Didry
2ef999fa63
✨ — Allow to specify form data and headers for checks (fix #70) 2025-01-14 16:38:43 +01:00
Luc Didry
9c8be94c20
🐛 — Fix bug when changing IP version not removing tasks (fix #72) 2025-01-14 16:38:38 +01:00
Luc Didry
311d86d130
✨ — Ability to delay notification after X failures (fix #71) 2024-12-09 14:08:55 +01:00
Luc Didry
e0edb50e12
⚡ — Mutualize check requests (fix #68) 2024-12-04 15:04:06 +01:00
Luc Didry
ea23ea7c1f
✨ — IPv4/IPv6 choice for checks, and choice for a dual-stack check (fix #69) 2024-12-02 15:24:54 +01:00
Luc Didry
a1600cb08e
🏷 — Bump version (0.6.1) 2024-11-28 16:59:58 +01:00
Luc Didry
0da1f4986e
🔀 Merge remote-tracking branch 'origin/develop' 2024-11-28 16:59:10 +01:00
Luc Didry
1853b4fead
💚 — Fix tests in CI 2024-11-28 16:51:28 +01:00
Luc Didry
bb4db3ca84
🐛 - Fix domain status selector’s bug on page refresh 2024-11-28 16:16:53 +01:00
Luc Didry
7d21d8d271
🐛 - Fix database migrations without default values 2024-11-28 16:13:30 +01:00
Luc Didry
868e91b866
🔨 — Update hatch 2024-11-28 15:51:32 +01:00
Luc Didry
ffd24173e5
🏷 — Bump version (0.6.0) 2024-11-28 15:42:39 +01:00
Luc Didry
594fbd6881
🔀 Merge remote-tracking branch 'origin/develop' 2024-11-28 15:41:37 +01:00
Luc Didry
04e33a8d24
🛂 — Allow to use a LDAP server for authentication (fix #64) 2024-11-28 15:37:07 +01:00
Luc Didry
da221b856b
🛂 — Allow partial or total anonymous access to web interface (fix #63) 2024-11-28 11:48:08 +01:00
Luc Didry
841f8638de
📝 — Fix doc headings 2024-11-28 11:39:09 +01:00
Luc Didry
5b999184d0
✨ — Add a setting to set a reschedule delay if check failed (fix #67)
BREAKING CHANGE: `mo` is no longer accepted for declaring a duration in month in the configuration
You need to use `M`, `month` or `months`

Bonus:  - Allow to choose a frequency smaller than a minute
2024-11-27 16:26:56 +01:00
Luc Didry
0563cf185a
✨ — Add "Remember me" checkbox on login (#65) 2024-11-27 11:00:40 +01:00
Luc Didry
91a9b27106
💄 — Filter form on domains list (fix #66) 2024-11-27 09:55:24 +01:00
Luc Didry
4117f9f628
♻ — Refactor some agent code 2024-11-26 16:52:20 +01:00
Luc Didry
8ac2519398
✨ — The HTTP method used by checks is now configurable 2024-11-26 15:59:19 +01:00
Luc Didry
d3766a79c6
✨ — Retry check right after a httpx.ReadError 2024-11-26 14:32:35 +01:00
Luc Didry
759fa05417
📝 — Avoid scrolling on a documented command 2024-11-25 17:02:09 +01:00
Luc Didry
a31c12e037
♿️ — Fix not-OK domains display if javascript is disabled 2024-11-14 09:41:59 +01:00
Luc Didry
04bbe21a66
💄 — Show only not-OK domains by default in domains list, to reduce the load on browser 2024-11-14 08:54:19 +01:00
Luc Didry
fdc219ba5c
🩹 — Fix CHANGELOG typo 2024-11-14 08:40:53 +01:00
48 changed files with 1897 additions and 615 deletions

@ -1,3 +1,4 @@
---
image: python:3.11
stages:
@ -18,6 +19,9 @@ default:
install:
stage: install
before_script:
- apt-get update
- apt-get install -y build-essential libldap-dev libsasl2-dev
script:
- make venv
- make develop
@ -64,7 +68,7 @@ release_job:
- if: $CI_COMMIT_TAG
script:
- sed -n '/^## '$CI_COMMIT_TAG'/,/^#/p' CHANGELOG.md | sed -e '/^\(#\|$\|Date\)/d' > release.md
release: # See https://docs.gitlab.com/ee/ci/yaml/#release for available properties
release: # See https://docs.gitlab.com/ee/ci/yaml/#release for available properties
tag_name: '$CI_COMMIT_TAG'
description: './release.md'
assets:

@ -2,6 +2,115 @@
## [Unreleased]
## 0.9.0
Date: 2025-02-18
- 🐛 — Fix worker timeout for old results cleaning in recurring tasks (#84)
💥 Old results are now removed by their age, not based on their number.
💥 Warning: `max_results` setting has been replaced by `max_results_age`, which is a duration.
Use `argos server generate-config > /etc/argos/config.yaml-dist` to generate
a new example configuration file.
## 0.8.2
Date: 2025-02-18
- 🐛 — Fix recurring tasks with gunicorn
## 0.8.1
Date: 2025-02-18
- 🐛 — Fix todo enum in jobs table
## 0.8.0
Date: 2025-02-18
- ✨ — Allow to customize agent User-Agent header (#78)
- 📝 — Document how to add data to requests (#77)
- ✨ — No need cron tasks for DB cleaning anymore (#74 and #75)
- ✨ — No need cron tasks for agents watching (#76)
- ✨ — Reload configuration asynchronously (#79)
- 🐛 — Automatically reconnect to LDAP if unreachable (#81)
- 🐛 — Better httpx.RequestError handling (#83)
💥 Warning: there are new settings to add to your configuration file.
Use `argos server generate-config > /etc/argos/config.yaml-dist` to generate
a new example configuration file.
💥 You don’t need cron tasks anymore!
Remove your old cron tasks as they will now do nothing but generate errors.
NB: You may want to add `--enqueue` to the `reload-config` command in your systemd file.
## 0.7.4
Date: 2025-02-12
- 🐛 — Fix method enum in tasks table (thx to Dryusdan)
## 0.7.3
Date: 2025-01-26
- 🐛 — Fix bug in retry_before_notification logic when success
## 0.7.2
Date: 2025-01-24
- 🐛 — Fix bug in retry_before_notification logic
## 0.7.1
Date: 2025-01-15
- 🩹 — Avoid warning from MySQL only alembic instructions
- 🩹 — Check before adding/removing ip_version_enum
- 📝 — Improve release documentation
## 0.7.0
Date: 2025-01-14
- ✨ — IPv4/IPv6 choice for checks, and choice for a dual-stack check (#69)
- ⚡ — Mutualize check requests (#68)
- ✨ — Ability to delay notification after X failures (#71)
- 🐛 — Fix bug when changing IP version not removing tasks (#72)
- ✨ — Allow to specify form data and headers for checks (#70)
- 🚸 — Add a long expiration date on auto-refresh cookies
- 🗃️ — Use bigint type for results id column in PostgreSQL (#73)
## 0.6.1
Date: 2024-11-28
- 🐛 - Fix database migrations without default values
- 🐛 - Fix domain status selector’s bug on page refresh
## 0.6.0
Date: 2024-11-28
- 💄 — Show only not-OK domains by default in domains list, to reduce the load on browser
- ♿️ — Fix not-OK domains display if javascript is disabled
- ✨ — Retry check right after a httpx.ReadError
- ✨ — The HTTP method used by checks is now configurable
- ♻️ — Refactor some agent code
- 💄 — Filter form on domains list (#66)
- ✨ — Add "Remember me" checkbox on login (#65)
- ✨ — Add a setting to set a reschedule delay if check failed (#67)
BREAKING CHANGE: `mo` is no longer accepted for declaring a duration in month in the configuration
You need to use `M`, `month` or `months`
- ✨ - Allow to choose a frequency smaller than a minute
- ✨🛂 — Allow partial or total anonymous access to web interface (#63)
- ✨🛂 — Allow to use a LDAP server for authentication (#64)
## 0.5.0
Date: 2024-09-26
@ -68,7 +177,7 @@ Date: 2024-06-24
- 💄📯 — Improve notifications and result(s) pages
- 🔊 — Add level of log before the log message
— 🔊 — Add a warning messages in the logs if there is no tasks in database. (fix #41)
- 🔊 — Add a warning message in the logs if there is no tasks in database. (fix #41)
- ✨ — Add command to generate example configuration (fix #38)
- 📝 — Improve documentation
- ✨ — Add command to warn if it’s been long since last viewing an agent (fix #49)

@ -10,7 +10,7 @@ NC=\033[0m # No Color
venv: ## Create the venv
python3 -m venv venv
develop: venv ## Install the dev dependencies
venv/bin/pip install -e ".[dev,docs]"
venv/bin/pip install -e ".[dev,docs,ldap]"
docs: cog ## Build the docs
venv/bin/sphinx-build docs public
if [ ! -e "public/mermaid.min.js" ]; then curl -sL $$(grep mermaid.min.js public/search.html | cut -f 2 -d '"') --output public/mermaid.min.js; fi

@ -1 +1 @@
VERSION = "0.5.0"
VERSION = "0.9.0"

@ -6,6 +6,8 @@ import asyncio
import json
import logging
import socket
from hashlib import md5
from time import sleep
from typing import List
import httpx
@ -32,26 +34,47 @@ def log_failure(retry_state):
)
class ArgosAgent:
class ArgosAgent: # pylint: disable-msg=too-many-instance-attributes
"""The Argos agent is responsible for running the checks and reporting the results."""
def __init__(self, server: str, auth: str, max_tasks: int, wait_time: int):
def __init__( # pylint: disable-msg=too-many-positional-arguments
self, server: str, auth: str, max_tasks: int, wait_time: int, user_agent: str
):
self.server = server
self.max_tasks = max_tasks
self.wait_time = wait_time
self.auth = auth
self._http_client = None
if user_agent == "":
self.ua = user_agent
else:
self.ua = f" - {user_agent}"
self._http_client: httpx.AsyncClient | None = None
self._http_client_v4: httpx.AsyncClient | None = None
self._http_client_v6: httpx.AsyncClient | None = None
self._res_cache: dict[str, httpx.Response] = {}
self.agent_id = socket.gethostname()
@retry(after=log_failure, wait=wait_random(min=1, max=2))
async def run(self):
headers = {
auth_header = {
"Authorization": f"Bearer {self.auth}",
"User-Agent": f"Argos Panoptes {VERSION} "
"(about: https://argos-monitoring.framasoft.org/)",
"User-Agent": f"Argos Panoptes agent {VERSION}{self.ua}",
}
self._http_client = httpx.AsyncClient(headers=headers)
self._http_client = httpx.AsyncClient(headers=auth_header)
ua_header = {
"User-Agent": f"Argos Panoptes {VERSION} "
f"(about: https://argos-monitoring.framasoft.org/){self.ua}",
}
self._http_client_v4 = httpx.AsyncClient(
headers=ua_header,
transport=httpx.AsyncHTTPTransport(local_address="0.0.0.0"),
)
self._http_client_v6 = httpx.AsyncClient(
headers=ua_header, transport=httpx.AsyncHTTPTransport(local_address="::")
)
logger.info("Running agent against %s", self.server)
async with self._http_client:
while "forever":
@ -60,20 +83,90 @@ class ArgosAgent:
logger.info("Waiting %i seconds before next retry", self.wait_time)
await asyncio.sleep(self.wait_time)
async def _do_request(self, group: str, details: dict):
logger.debug("_do_request for group %s", group)
headers = {}
if details["request_data"] is not None:
request_data = json.loads(details["request_data"])
if request_data["headers"] is not None:
headers = request_data["headers"]
if details["ip_version"] == "4":
http_client = self._http_client_v4
else:
http_client = self._http_client_v6
try:
if details["request_data"] is None or request_data["data"] is None:
response = await http_client.request( # type: ignore[union-attr]
method=details["method"],
url=details["url"],
headers=headers,
timeout=60,
)
elif request_data["json"]:
response = await http_client.request( # type: ignore[union-attr]
method=details["method"],
url=details["url"],
headers=headers,
json=request_data["data"],
timeout=60,
)
else:
response = await http_client.request( # type: ignore[union-attr]
method=details["method"],
url=details["url"],
headers=headers,
data=request_data["data"],
timeout=60,
)
except httpx.ReadError:
sleep(1)
logger.warning("httpx.ReadError for group %s, re-emit request", group)
if details["request_data"] is None or request_data["data"] is None:
response = await http_client.request( # type: ignore[union-attr]
method=details["method"], url=details["url"], timeout=60
)
elif request_data["json"]:
response = await http_client.request( # type: ignore[union-attr]
method=details["method"],
url=details["url"],
json=request_data["data"],
timeout=60,
)
else:
response = await http_client.request( # type: ignore[union-attr]
method=details["method"],
url=details["url"],
data=request_data["data"],
timeout=60,
)
except httpx.RequestError as err:
logger.warning("httpx.RequestError for group %s", group)
response = err
self._res_cache[group] = response
async def _complete_task(self, _task: dict) -> AgentResult:
try:
task = Task(**_task)
check_class = get_registered_check(task.check)
check = check_class(self._http_client, task)
result = await check.run()
status = result.status
context = result.context
check_class = get_registered_check(task.check)
check = check_class(task)
response = self._res_cache[task.task_group]
if isinstance(response, httpx.Response):
result = await check.run(response)
status = result.status
context = result.context
else:
status = "failure"
context = SerializableException.from_exception(response)
except Exception as err: # pylint: disable=broad-except
status = "error"
context = SerializableException.from_exception(err)
msg = f"An exception occurred when running {_task}. {err.__class__.__name__} : {err}"
logger.error(msg)
return AgentResult(task_id=task.id, status=status, context=context)
async def _get_and_complete_tasks(self):
@ -84,12 +177,45 @@ class ArgosAgent:
)
if response.status_code == httpx.codes.OK:
# XXX Maybe we want to group the tests by URL ? (to issue one request per URL)
data = response.json()
logger.info("Received %i tasks from the server", len(data))
req_groups = {}
_tasks = []
for _task in data:
task = Task(**_task)
url = task.url
group = task.task_group
if task.check == "http-to-https":
data = task.request_data
if data is None:
data = ""
url = str(httpx.URL(task.url).copy_with(scheme="http"))
group = (
f"{task.method}-{task.ip_version}-{url}-"
f"{md5(data.encode()).hexdigest()}"
)
_task["task_group"] = group
req_groups[group] = {
"url": url,
"ip_version": task.ip_version,
"method": task.method,
"request_data": task.request_data,
}
_tasks.append(_task)
requests = []
for group, details in req_groups.items():
requests.append(self._do_request(group, details))
if requests:
await asyncio.gather(*requests)
tasks = []
for task in data:
for task in _tasks:
tasks.append(self._complete_task(task))
if tasks:
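The request grouping introduced in this hunk can be illustrated in isolation. This is a minimal sketch, not the project's code: `fake_fetch`, `run_grouped` and the task dictionaries are hypothetical stand-ins for the real HTTP call and for `Task` objects. Only the key format (method, IP version, URL and an MD5 digest of the request data) mirrors the key built for `http-to-https` tasks above.

```python
import asyncio
from hashlib import md5
from typing import Optional


def group_key(method: str, ip_version: str, url: str, request_data: Optional[str]) -> str:
    """Build a key so that tasks needing the same request share one group."""
    data = request_data or ""
    return f"{method}-{ip_version}-{url}-{md5(data.encode()).hexdigest()}"


async def fake_fetch(details: dict) -> str:
    """Hypothetical stand-in for the real HTTP request."""
    await asyncio.sleep(0)
    return f"response for {details['method']} {details['url']}"


async def run_grouped(tasks: list) -> tuple:
    """Issue one request per group; return (responses by group, request count)."""
    groups: dict = {}
    for task in tasks:
        key = group_key(
            task["method"], task["ip_version"], task["url"], task.get("request_data")
        )
        task["task_group"] = key
        # Several tasks with the same key only register one request.
        groups.setdefault(key, {"method": task["method"], "url": task["url"]})
    responses = await asyncio.gather(*(fake_fetch(d) for d in groups.values()))
    return dict(zip(groups.keys(), responses)), len(groups)
```

With two checks on the same URL over IPv4 and one over IPv6, only two requests are issued, and each task later looks up its shared response through its `task_group` key.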

@ -3,7 +3,6 @@
from dataclasses import dataclass
from typing import Type
import httpx
from pydantic import BaseModel
from argos.schemas.models import Task
@ -92,8 +91,7 @@ class BaseCheck:
raise CheckNotFound(name)
return check
def __init__(self, http_client: httpx.AsyncClient, task: Task):
self.http_client = http_client
def __init__(self, task: Task):
self.task = task
@property

@ -4,7 +4,7 @@ import json
import re
from datetime import datetime
from httpx import URL
from httpx import Response
from jsonpointer import resolve_pointer, JsonPointerException
from argos.checks.base import (
@ -22,13 +22,7 @@ class HTTPStatus(BaseCheck):
config = "status-is"
expected_cls = ExpectedIntValue
async def run(self) -> dict:
# XXX Get the method from the task
task = self.task
response = await self.http_client.request(
method="get", url=task.url, timeout=60
)
async def run(self, response: Response) -> dict:
return self.response(
status=response.status_code == self.expected,
expected=self.expected,
@ -42,13 +36,7 @@ class HTTPStatusIn(BaseCheck):
config = "status-in"
expected_cls = ExpectedStringValue
async def run(self) -> dict:
# XXX Get the method from the task
task = self.task
response = await self.http_client.request(
method="get", url=task.url, timeout=60
)
async def run(self, response: Response) -> dict:
return self.response(
status=response.status_code in json.loads(self.expected),
expected=self.expected,
@ -62,11 +50,7 @@ class HTTPToHTTPS(BaseCheck):
config = "http-to-https"
expected_cls = ExpectedStringValue
async def run(self) -> dict:
task = self.task
url = URL(task.url).copy_with(scheme="http")
response = await self.http_client.request(method="get", url=url, timeout=60)
async def run(self, response: Response) -> dict:
expected_dict = json.loads(self.expected)
expected = range(300, 400)
if "range" in expected_dict:
@ -90,13 +74,7 @@ class HTTPHeadersContain(BaseCheck):
config = "headers-contain"
expected_cls = ExpectedStringValue
async def run(self) -> dict:
# XXX Get the method from the task
task = self.task
response = await self.http_client.request(
method="get", url=task.url, timeout=60
)
async def run(self, response: Response) -> dict:
status = True
for header in json.loads(self.expected):
if header not in response.headers:
@ -116,13 +94,7 @@ class HTTPHeadersHave(BaseCheck):
config = "headers-have"
expected_cls = ExpectedStringValue
async def run(self) -> dict:
# XXX Get the method from the task
task = self.task
response = await self.http_client.request(
method="get", url=task.url, timeout=60
)
async def run(self, response: Response) -> dict:
status = True
for header, value in json.loads(self.expected).items():
if header not in response.headers:
@ -146,13 +118,7 @@ class HTTPHeadersLike(BaseCheck):
config = "headers-like"
expected_cls = ExpectedStringValue
async def run(self) -> dict:
# XXX Get the method from the task
task = self.task
response = await self.http_client.request(
method="get", url=task.url, timeout=60
)
async def run(self, response: Response) -> dict:
status = True
for header, value in json.loads(self.expected).items():
if header not in response.headers:
@ -175,10 +141,7 @@ class HTTPBodyContains(BaseCheck):
config = "body-contains"
expected_cls = ExpectedStringValue
async def run(self) -> dict:
response = await self.http_client.request(
method="get", url=self.task.url, timeout=60
)
async def run(self, response: Response) -> dict:
return self.response(status=self.expected in response.text)
@ -188,10 +151,7 @@ class HTTPBodyLike(BaseCheck):
config = "body-like"
expected_cls = ExpectedStringValue
async def run(self) -> dict:
response = await self.http_client.request(
method="get", url=self.task.url, timeout=60
)
async def run(self, response: Response) -> dict:
if re.search(rf"{self.expected}", response.text):
return self.response(status=True)
@ -205,13 +165,7 @@ class HTTPJsonContains(BaseCheck):
config = "json-contains"
expected_cls = ExpectedStringValue
async def run(self) -> dict:
# XXX Get the method from the task
task = self.task
response = await self.http_client.request(
method="get", url=task.url, timeout=60
)
async def run(self, response: Response) -> dict:
obj = response.json()
status = True
@ -235,13 +189,7 @@ class HTTPJsonHas(BaseCheck):
config = "json-has"
expected_cls = ExpectedStringValue
async def run(self) -> dict:
# XXX Get the method from the task
task = self.task
response = await self.http_client.request(
method="get", url=task.url, timeout=60
)
async def run(self, response: Response) -> dict:
obj = response.json()
status = True
@ -269,13 +217,7 @@ class HTTPJsonLike(BaseCheck):
config = "json-like"
expected_cls = ExpectedStringValue
async def run(self) -> dict:
# XXX Get the method from the task
task = self.task
response = await self.http_client.request(
method="get", url=task.url, timeout=60
)
async def run(self, response: Response) -> dict:
obj = response.json()
status = True
@ -302,13 +244,7 @@ class HTTPJsonIs(BaseCheck):
config = "json-is"
expected_cls = ExpectedStringValue
async def run(self) -> dict:
# XXX Get the method from the task
task = self.task
response = await self.http_client.request(
method="get", url=task.url, timeout=60
)
async def run(self, response: Response) -> dict:
obj = response.json()
status = response.json() == json.loads(self.expected)
@ -326,10 +262,8 @@ class SSLCertificateExpiration(BaseCheck):
config = "ssl-certificate-expiration"
expected_cls = ExpectedStringValue
async def run(self):
async def run(self, response: Response) -> dict:
"""Returns the number of days in which the certificate will expire."""
response = await self.http_client.get(self.task.url, timeout=60)
network_stream = response.extensions["network_stream"]
ssl_obj = network_stream.get_extra_info("ssl_object")
cert = ssl_obj.getpeercert()
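The change of `run(self)` to `run(self, response)` means a check no longer performs I/O: it only inspects a response that was fetched once for its group. The sketch below shows that idea with plain functions. `FakeResponse` is a hypothetical stand-in for `httpx.Response` exposing only the attributes used here, and the function names are illustrative, not the project's check classes.

```python
from dataclasses import dataclass, field


@dataclass
class FakeResponse:
    """Hypothetical stand-in for httpx.Response (plain dict headers, case-sensitive)."""

    status_code: int
    text: str = ""
    headers: dict = field(default_factory=dict)


def status_is(response: FakeResponse, expected: int) -> bool:
    """Pass when the status code equals the expected value."""
    return response.status_code == expected


def body_contains(response: FakeResponse, expected: str) -> bool:
    """Pass when the expected string appears in the body."""
    return expected in response.text


def headers_contain(response: FakeResponse, expected: list) -> bool:
    """Pass when every expected header name is present."""
    return all(name in response.headers for name in expected)


# One response, fetched once, is evaluated by several checks.
shared = FakeResponse(200, "hello world", {"content-type": "text/html"})
results = {
    "status-is": status_is(shared, 200),
    "body-contains": body_contains(shared, "world"),
    "headers-contain": headers_contain(shared, ["x-missing"]),
}
```

Because the checks are pure functions of the response, they can also be unit-tested without any network access.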

@ -92,7 +92,12 @@ def version():
default="INFO",
type=click.Choice(logging.LOG_LEVELS, case_sensitive=False),
)
def agent(server_url, auth, max_tasks, wait_time, log_level):
@click.option(
"--user-agent",
default="",
help="A custom string to append to the User-Agent header",
)
def agent(server_url, auth, max_tasks, wait_time, log_level, user_agent): # pylint: disable-msg=too-many-positional-arguments
"""Get and run tasks for the provided server. Will wait for new tasks.
Usage: argos agent https://argos.example.org "auth-token-here"
@ -108,7 +113,7 @@ def agent(server_url, auth, max_tasks, wait_time, log_level):
from argos.logging import logger
logger.setLevel(log_level)
agent_ = ArgosAgent(server_url, auth, max_tasks, wait_time)
agent_ = ArgosAgent(server_url, auth, max_tasks, wait_time, user_agent)
asyncio.run(agent_.run())
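To make the effect of `--user-agent` concrete, here is a minimal sketch of how the header sent with checks is assembled. The helper names `ua_suffix` and `check_user_agent` are hypothetical. The string format and the ` - ` separator follow the agent code in this diff, and `VERSION` is hard-coded to the value of the version bump.

```python
VERSION = "0.9.0"  # value taken from the version bump shown in this diff


def ua_suffix(user_agent: str) -> str:
    """Return an empty suffix, or the custom string prefixed with ' - '."""
    return "" if user_agent == "" else f" - {user_agent}"


def check_user_agent(user_agent: str = "") -> str:
    """Assemble the User-Agent value used for check requests."""
    return (
        f"Argos Panoptes {VERSION} "
        f"(about: https://argos-monitoring.framasoft.org/){ua_suffix(user_agent)}"
    )
```

An empty option leaves the default header untouched, so existing deployments keep the same User-Agent unless they opt in.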
@ -135,101 +140,6 @@ def start(host, port, config, reload):
uvicorn.run("argos.server:app", host=host, port=port, reload=reload)
def validate_max_lock_seconds(ctx, param, value):
if value <= 60:
raise click.BadParameter("Should be strictly higher than 60")
return value
def validate_max_results(ctx, param, value):
if value <= 0:
raise click.BadParameter("Should be a positive integer")
return value
@server.command()
@click.option(
"--max-results",
default=100,
help="Number of results per task to keep",
callback=validate_max_results,
)
@click.option(
"--max-lock-seconds",
default=100,
help=(
"The number of seconds after which a lock is "
"considered stale, must be higher than 60 "
"(the checks have a timeout value of 60 seconds)"
),
callback=validate_max_lock_seconds,
)
@click.option(
"--config",
default="argos-config.yaml",
help="Path of the configuration file. "
"If ARGOS_YAML_FILE environment variable is set, its value will be used instead. "
"Default value: argos-config.yaml and /etc/argos/config.yaml as fallback.",
envvar="ARGOS_YAML_FILE",
callback=validate_config_access,
)
@coroutine
async def cleandb(max_results, max_lock_seconds, config):
"""Clean the database (to run routinely)
\b
- Removes old results from the database.
- Removes locks from tasks that have been locked for too long.
"""
# It’s mandatory to do it before the imports
os.environ["ARGOS_YAML_FILE"] = config
# The imports are made here otherwise the agent will need server configuration files.
from argos.server import queries
db = await get_db()
removed = await queries.remove_old_results(db, max_results)
updated = await queries.release_old_locks(db, max_lock_seconds)
click.echo(f"{removed} results removed")
click.echo(f"{updated} locks released")
@server.command()
@click.option(
"--time-without-agent",
default=5,
help="Time without seeing an agent after which a warning will be issued, in minutes. "
"Default is 5 minutes.",
callback=validate_max_results,
)
@click.option(
"--config",
default="argos-config.yaml",
help="Path of the configuration file. "
"If ARGOS_YAML_FILE environment variable is set, its value will be used instead.",
envvar="ARGOS_YAML_FILE",
callback=validate_config_access,
)
@coroutine
async def watch_agents(time_without_agent, config):
"""Watch agents (to run routinely)
Issues a warning if no agent has been seen by the server for a given time.
"""
# It’s mandatory to do it before the imports
os.environ["ARGOS_YAML_FILE"] = config
# The imports are made here otherwise the agent will need server configuration files.
from argos.server import queries
db = await get_db()
agents = await queries.get_recent_agents_count(db, time_without_agent)
if agents == 0:
click.echo(f"No agent has been seen in the last {time_without_agent} minutes.")
sysexit(1)
@server.command(short_help="Load or reload tasks configuration")
@click.option(
"--config",
@ -240,23 +150,40 @@ async def watch_agents(time_without_agent, config):
envvar="ARGOS_YAML_FILE",
callback=validate_config_access,
)
@click.option(
"--enqueue/--no-enqueue",
default=False,
help="Let Argos main recurring tasks handle configurations loading. "
"It may delay the application of the new configuration up to 2 minutes. "
"Default is --no-enqueue",
)
@coroutine
async def reload_config(config):
async def reload_config(config, enqueue):
"""Read tasks configuration and add/delete tasks in database if needed"""
# It’s mandatory to do it before the imports
os.environ["ARGOS_YAML_FILE"] = config
# The imports are made here otherwise the agent will need server configuration files.
from argos.server import queries
from argos.server.main import read_config
from argos.server.settings import read_config
_config = read_config(config)
db = await get_db()
changed = await queries.update_from_config(db, _config)
click.echo(f"{changed['added']} tasks added")
click.echo(f"{changed['vanished']} tasks deleted")
config_changed = await queries.has_config_changed(db, _config)
if not config_changed:
click.echo("Config has not changed")
else:
if enqueue:
msg = await queries.update_from_config_later(db, config_file=config)
click.echo(msg)
else:
changed = await queries.update_from_config(db, _config)
click.echo(f"{changed['added']} task(s) added")
click.echo(f"{changed['vanished']} task(s) deleted")
@server.command()
@ -570,8 +497,8 @@ async def test_mail(config, domain, severity):
from argos.logging import set_log_level
from argos.server.alerting import notify_by_mail
from argos.server.main import read_config
from argos.server.models import Result, Task
from argos.server.settings import read_config
conf = read_config(config)
@ -586,6 +513,7 @@ async def test_mail(config, domain, severity):
check="body-contains",
expected="foo",
frequency=1,
ip_version=4,
selected_by="test",
selected_at=now,
)
@ -634,8 +562,8 @@ async def test_gotify(config, domain, severity):
from argos.logging import set_log_level
from argos.server.alerting import notify_with_gotify
from argos.server.main import read_config
from argos.server.models import Result, Task
from argos.server.settings import read_config
conf = read_config(config)
@ -650,6 +578,7 @@ async def test_gotify(config, domain, severity):
check="body-contains",
expected="foo",
frequency=1,
ip_version=4,
selected_by="test",
selected_at=now,
)
@ -701,8 +630,8 @@ async def test_apprise(config, domain, severity, apprise_group):
from argos.logging import set_log_level
from argos.server.alerting import notify_with_apprise
from argos.server.main import read_config
from argos.server.models import Result, Task
from argos.server.settings import read_config
conf = read_config(config)
@ -717,6 +646,7 @@ async def test_apprise(config, domain, severity, apprise_group):
check="body-contains",
expected="foo",
frequency=1,
ip_version=4,
selected_by="test",
selected_at=now,
)

@ -1,5 +1,7 @@
---
general:
# Except for frequency and recheck_delay settings, changes in general
# section of the configuration will need a restart of argos server.
db:
# The database URL, as defined in SQLAlchemy docs :
# https://docs.sqlalchemy.org/en/20/core/engines.html#database-urls
@ -14,13 +16,77 @@ general:
# Can be "production", "dev", "test".
# If not present, default value is "production"
env: "production"
# to get a good string for cookie_secret, run:
# To get a good string for cookie_secret, run:
# openssl rand -hex 32
cookie_secret: "foo_bar_baz"
# Session duration
# Use m for minutes, h for hours, d for days
# w for weeks, M for months, y for years
# See https://github.com/timwedde/durations_nlp#scales-reference for details
# If not present, default value is "7d"
session_duration: "7d"
# Session opened with "Remember me" checked
# If not present, the "Remember me" feature is not available
# remember_me_duration: "1M"
# Unauthenticated access
# You can grant unauthenticated access to the dashboard or to all pages
# To do so, choose either "dashboard", or "all"
# If not present, all pages need authentication
# unauthenticated_access: "all"
# LDAP authentication
# Instead of relying on Argos users, use a LDAP server to authenticate users.
# If not present, Argos native user system is used.
# ldap:
# # Server URI
# uri: "ldaps://ldap.example.org"
# # Search base DN
# user_tree: "ou=users,dc=example,dc=org"
# # Search bind DN
# bind_dn: "uid=ldap_user,ou=users,dc=example,dc=org"
# # Search bind password
# bind_pwd: "secr3t"
# # User attribute (uid, mail, sAMAccountName, etc.)
# user_attr: "uid"
# # User filter (to exclude some users, etc.)
# user_filter: "(!(uid=ldap_user))"
# Default delay for checks.
# Can be superseded in domain configuration.
# For ex., to run checks every minute:
frequency: "1m"
# For ex., to run checks every 5 minutes:
frequency: "5m"
# Default re-check delay if a check has failed.
# Can be superseded in domain configuration.
# If not present, failed checks won’t be re-checked (they will be
# run again as if they had succeeded)
# For ex., to re-try a check one minute after a failure:
# recheck_delay: "1m"
# Default setting for notifications delay.
# Say you want to be warned right after a failure on a check: set it to 0
# Say you want a second failure on the check before being warned,
# to avoid network hiccups: set it to 1
# Can be superseded in domain configuration
# If not present, default is 0
# retry_before_notification: 0
# Defaults settings for IPv4/IPv6
# Can be superseded in domain configuration.
# By default, Argos will check both IPv4 and IPv6 addresses of a domain
# (i.e. by default, both `ipv4` and `ipv6` are set to true).
# To disable the IPv4 check of domains:
# ipv4: false
# To disable the IPv6 check of domains:
# ipv6: false
# Argos root path
# If not present, default value is ""
# Set it to /foo if you want to use argos at /foo/ instead of /
# on your web server
# root_path: "/foo"
# Which way do you want to be warned when a check goes to that severity?
# "local" emits a message in the server log
# You'll need to configure mail, gotify or apprise below to be able to use
@ -36,11 +102,10 @@ general:
- local
unknown:
- local
# Argos root path
# If not present, default value is ""
# Set it to /foo if you want to use argos at /foo/ instead of /
# on your web server
# root_path: "/foo"
# This alert is triggered when no Argos agent has been seen in a while
# See recurring_tasks.time_without_agent below
no_agent:
- local
# Mail configuration is quite straight-forward
# mail:
# mailfrom: no-reply@example.org
@ -84,6 +149,22 @@ ssl:
- "1d": critical
- "5d": warning
# Argos will execute some tasks in the background for you
# every 2 minutes and needs some configuration for that
recurring_tasks:
# Maximum age of results
# Use m for minutes, h for hours, d for days
# w for weeks, M for months, y for years
# See https://github.com/timwedde/durations_nlp#scales-reference for details
max_results_age: "1d"
# Max number of seconds a task can be locked
# Minimum value is 61, default is 100
max_lock_seconds: 100
# Max number of minutes without seeing an agent
# before sending an alert
# Minimum value is 1, default is 5
time_without_agent: 5
# It's also possible to define the checks in another file
# with the include syntax:
#
@ -91,8 +172,15 @@ ssl:
#
websites:
- domain: "https://mypads.example.org"
# Wait for a second failure before sending notification
retry_before_notification: 1
paths:
- path: "/mypads/"
# Specify the method of the HTTP request
# Valid values are "GET", "HEAD", "POST", "OPTIONS",
# "CONNECT", "TRACE", "PUT", "PATCH" and "DELETE"
# default is "GET" if omitted
method: "GET"
checks:
# Check that the returned HTTP status is 200
- status-is: 200
@ -123,6 +211,17 @@ websites:
- 302
- 307
- path: "/admin/"
method: "POST"
# Send form data in the request
request_data:
data:
login: "admin"
password: "my-password"
# To send data as JSON (optional, default is false):
is_json: true
# To send additional headers
headers:
Authorization: "Bearer foo-bar-baz"
checks:
# Check that the return HTTP status is one of those
# Similar to status-is, verify that you didn't mistype it!
@ -164,6 +263,9 @@ websites:
- json-is: '{"foo": "bar", "baz": 42}'
- domain: "https://munin.example.org"
frequency: "20m"
recheck_delay: "5m"
# Let's say it's an IPv6-only website
ipv4: false
paths:
- path: "/"
checks:

View file

@ -14,9 +14,10 @@ logger = logging.getLogger(__name__)
# XXX Does not work ?
def set_log_level(log_level):
def set_log_level(log_level: str, quiet: bool = False):
level = getattr(logging, log_level.upper(), None)
if not isinstance(level, int):
raise ValueError(f"Invalid log level: {log_level}")
logger.setLevel(level=level)
logger.info("Log level set to %s", log_level)
if not quiet:
logger.info("Log level set to %s", log_level)
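The new `quiet` flag lets a caller (the recurring task, for instance) set the level without emitting the confirmation line. A minimal standalone sketch of that behaviour, assuming a hypothetical logger named `argos-sketch` in place of the one defined in `argos.logging`:

```python
import logging

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("argos-sketch")


def set_log_level(log_level: str, quiet: bool = False) -> None:
    """Set the logger level from its name, optionally without announcing it."""
    level = getattr(logging, log_level.upper(), None)
    # Names such as "verbose" do not map to an integer level in the logging module.
    if not isinstance(level, int):
        raise ValueError(f"Invalid log level: {log_level}")
    logger.setLevel(level)
    if not quiet:
        logger.info("Log level set to %s", log_level)
```

Calling `set_log_level("info", quiet=True)` changes the level silently, while an unknown name still raises `ValueError`.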

View file

@ -5,8 +5,9 @@ For database models, see argos.server.models.
import json
from typing import Dict, List, Literal, Optional, Tuple
from typing import Any, Dict, List, Literal, Tuple
from durations_nlp import Duration
from pydantic import (
BaseModel,
ConfigDict,
@ -17,15 +18,16 @@ from pydantic import (
PositiveInt,
field_validator,
)
from pydantic.functional_validators import BeforeValidator
from pydantic.functional_validators import AfterValidator, BeforeValidator
from pydantic.networks import UrlConstraints
from pydantic_core import Url
from typing_extensions import Annotated
from argos.schemas.utils import string_to_duration
from argos.schemas.utils import Method
Severity = Literal["warning", "error", "critical", "unknown"]
Environment = Literal["dev", "test", "production"]
Unauthenticated = Literal["dashboard", "all"]
SQLiteDsn = Annotated[
Url,
UrlConstraints(
@ -37,7 +39,7 @@ SQLiteDsn = Annotated[
def parse_threshold(value):
"""Parse duration threshold for SSL certificate validity"""
for duration_str, severity in value.items():
days = string_to_duration(duration_str, "days")
days = Duration(duration_str).to_days()
# Return here because it's one-item dicts.
return (days, severity)
@ -46,6 +48,33 @@ class SSL(BaseModel):
thresholds: List[Annotated[Tuple[int, Severity], BeforeValidator(parse_threshold)]]
class RecurringTasks(BaseModel):
max_results_age: float
max_lock_seconds: int
time_without_agent: int
@field_validator("max_results_age", mode="before")
def parse_max_results_age(cls, value):
"""Convert the configured maximum results age to seconds"""
return Duration(value).to_seconds()
@field_validator("max_lock_seconds", mode="before")
def parse_max_lock_seconds(cls, value):
"""Ensure that max_lock_seconds is higher than the agents' request timeout (60), else fall back to 100"""
if value > 60:
return value
return 100
@field_validator("time_without_agent", mode="before")
def parse_time_without_agent(cls, value):
"""Ensure that time_without_agent is at least one minute"""
if value >= 1:
return value
return 5
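Note that both validators above fall back to the default instead of rejecting an out-of-range value. A plain-Python sketch of that behaviour, without Pydantic; the function names and the `AGENT_REQUEST_TIMEOUT` constant are hypothetical and only mirror the comparisons shown in the diff:

```python
# Timeout of agent requests in seconds, as stated in the validator docstring.
AGENT_REQUEST_TIMEOUT = 60


def effective_max_lock_seconds(value: int) -> int:
    """Keep the value if it exceeds the agents' timeout, else use the default (100)."""
    return value if value > AGENT_REQUEST_TIMEOUT else 100


def effective_time_without_agent(value: int) -> int:
    """Keep the value if it is at least one minute, else use the default (5)."""
    return value if value >= 1 else 5
```

With this logic, a configured `max_lock_seconds: 60` silently becomes 100, which matches the "Minimum value is 61" note in the example configuration.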
class WebsiteCheck(BaseModel):
key: str
value: str | List[str] | Dict[str, str]
@ -102,8 +131,26 @@ def parse_checks(value):
return (name, expected)
def parse_request_data(value):
"""Turn form or JSON data into JSON string"""
return json.dumps(
{"data": value.data, "json": value.is_json, "headers": value.headers}
)
class RequestData(BaseModel):
data: Any = None
is_json: bool = False
headers: Dict[str, str] | None = None
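To make the stored format concrete, here is a small sketch of what `parse_request_data` produces. It assumes a standard-library dataclass as a stand-in for the Pydantic `RequestData` model, so it can run without Pydantic installed:

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class RequestData:
    """Stand-in for the Pydantic model of the same name (illustration only)."""

    data: Any = None
    is_json: bool = False
    headers: Optional[Dict[str, str]] = None


def parse_request_data(value: RequestData) -> str:
    """Serialize the request data, JSON flag and headers into one JSON string."""
    return json.dumps(
        {"data": value.data, "json": value.is_json, "headers": value.headers}
    )


serialized = parse_request_data(RequestData(data={"login": "admin"}, is_json=True))
```

The resulting string is what ends up in the `request_data` column of the tasks table, which is declared as a nullable string in the migration.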
class WebsitePath(BaseModel):
path: str
method: Method = "GET"
request_data: Annotated[
RequestData, AfterValidator(parse_request_data)
] | None = None
checks: List[
Annotated[
Tuple[str, str],
@ -114,14 +161,26 @@ class WebsitePath(BaseModel):
class Website(BaseModel):
domain: HttpUrl
frequency: Optional[int] = None
ipv4: bool | None = None
ipv6: bool | None = None
frequency: float | None = None
recheck_delay: float | None = None
retry_before_notification: int | None = None
paths: List[WebsitePath]
@field_validator("frequency", mode="before")
def parse_frequency(cls, value):
"""Convert the configured frequency to minutes"""
if value:
return string_to_duration(value, "minutes")
return Duration(value).to_minutes()
return None
@field_validator("recheck_delay", mode="before")
def parse_recheck_delay(cls, value):
"""Convert the configured recheck delay to minutes"""
if value:
return Duration(value).to_minutes()
return None
@ -147,7 +206,7 @@ class Mail(BaseModel):
port: PositiveInt = 25
ssl: StrictBool = False
starttls: StrictBool = False
auth: Optional[MailAuth] = None
auth: MailAuth | None = None
addresses: List[EmailStr]
@ -158,6 +217,7 @@ class Alert(BaseModel):
warning: List[str]
critical: List[str]
unknown: List[str]
no_agent: List[str]
class GotifyUrl(BaseModel):
@ -171,27 +231,66 @@ class DbSettings(BaseModel):
max_overflow: int = 20
class LdapSettings(BaseModel):
uri: str
user_tree: str
bind_dn: str | None = None
bind_pwd: str | None = None
user_attr: str
user_filter: str | None = None
class General(BaseModel):
"""Frequency for the checks and alerts"""
cookie_secret: str
frequency: int
db: DbSettings
env: Environment = "production"
cookie_secret: str
session_duration: int = 10080 # 7 days
remember_me_duration: int | None = None
unauthenticated_access: Unauthenticated | None = None
ldap: LdapSettings | None = None
frequency: float
recheck_delay: float | None = None
retry_before_notification: int = 0
ipv4: bool = True
ipv6: bool = True
root_path: str = ""
alerts: Alert
mail: Optional[Mail] = None
gotify: Optional[List[GotifyUrl]] = None
apprise: Optional[Dict[str, List[str]]] = None
mail: Mail | None = None
gotify: List[GotifyUrl] | None = None
apprise: Dict[str, List[str]] | None = None
@field_validator("session_duration", mode="before")
def parse_session_duration(cls, value):
"""Convert the configured session duration to minutes"""
return Duration(value).to_minutes()
@field_validator("remember_me_duration", mode="before")
def parse_remember_me_duration(cls, value):
"""Convert the configured session duration with remember me feature to minutes"""
if value:
return int(Duration(value).to_minutes())
return None
@field_validator("frequency", mode="before")
def parse_frequency(cls, value):
"""Convert the configured frequency to minutes"""
return string_to_duration(value, "minutes")
return Duration(value).to_minutes()
@field_validator("recheck_delay", mode="before")
def parse_recheck_delay(cls, value):
"""Convert the configured recheck delay to minutes"""
if value:
return Duration(value).to_minutes()
return None
class Config(BaseModel):
general: General
service: Service
ssl: SSL
recurring_tasks: RecurringTasks
websites: List[Website]

View file

@ -8,17 +8,39 @@ from typing import Literal
from pydantic import BaseModel, ConfigDict
from argos.schemas.utils import IPVersion, Method, Todo
# XXX Refactor using SQLModel to avoid duplication of model data
class Job(BaseModel):
"""Tasks needing to be executed in recurring tasks processing.
It's quite like a job queue."""
id: int
todo: Todo
args: str
current: bool
added_at: datetime
def __str__(self):
return f"Job ({self.id}): {self.todo}"
class Task(BaseModel):
"""A task corresponds to a check to execute"""
id: int
url: str
domain: str
ip_version: IPVersion
check: str
method: Method
request_data: str | None
expected: str
task_group: str
retry_before_notification: int
contiguous_failures: int
selected_at: datetime | None
selected_by: str | None
@ -28,7 +50,8 @@ class Task(BaseModel):
task_id = self.id
url = self.url
check = self.check
return f"Task ({task_id}): {url} - {check}"
ip_version = self.ip_version
return f"Task ({task_id}): {url} (IPv{ip_version}) - {check}"
class SerializableException(BaseModel):

View file

@ -1,42 +1,10 @@
from typing import Literal
def string_to_duration(
value: str, target: Literal["days", "hours", "minutes"]
) -> int | float:
"""Convert a string to a number of hours, days or minutes"""
num = int("".join(filter(str.isdigit, value)))
IPVersion = Literal["4", "6"]
# It's not possible to convert from a smaller unit to a greater one:
# - hours and minutes cannot be converted to days
# - minutes cannot be converted to hours
if (target == "days" and ("h" in value or "m" in value.replace("mo", ""))) or (
target == "hours" and "m" in value.replace("mo", "")
):
msg = (
"Durations cannot be converted from a smaller to a greater unit. "
f"(trying to convert '{value}' to {target})"
)
raise ValueError(msg, value)
Method = Literal[
"GET", "HEAD", "POST", "OPTIONS", "CONNECT", "TRACE", "PUT", "PATCH", "DELETE"
]
# Consider we're converting to minutes, do the eventual multiplication at the end.
if "h" in value:
num = num * 60
elif "d" in value:
num = num * 60 * 24
elif "w" in value:
num = num * 60 * 24 * 7
elif "mo" in value:
num = num * 60 * 24 * 30 # considers 30d in a month
elif "y" in value:
num = num * 60 * 24 * 365 # considers 365d in a year
elif "m" not in value:
raise ValueError("Invalid duration value", value)
if target == "hours":
return num / 60
if target == "days":
return num / 60 / 24
# target == "minutes"
return num
Todo = Literal["RELOAD_CONFIG"]
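The removed `string_to_duration` helper is replaced throughout this diff by `Duration` from `durations_nlp`. As a rough illustration of the unit suffixes listed in the example configuration, here is a standard-library sketch covering only minutes, hours, days and weeks; `to_minutes` is a hypothetical helper, not the library's API, and it deliberately rejects `M` and `y` rather than guess their lengths:

```python
import re

# Minutes per supported unit; months ("M") and years ("y") are left out on purpose.
MINUTES_PER_UNIT = {"m": 1, "h": 60, "d": 60 * 24, "w": 60 * 24 * 7}


def to_minutes(value: str) -> float:
    """Convert strings such as "5m" or "7d" to a number of minutes."""
    # Case matters: a lowercase "m" means minutes, an uppercase "M" means months.
    match = re.fullmatch(r"(\d+)([mhdw])", value)
    if match is None:
        raise ValueError(f"Unsupported duration in this sketch: {value}")
    number, unit = match.groups()
    return float(int(number) * MINUTES_PER_UNIT[unit])
```

Under these assumptions `to_minutes("7d")` gives 10080, the same number used as the default `session_duration` in the `General` model.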

View file

@ -11,6 +11,55 @@ import httpx
from argos.checks.base import Severity
from argos.logging import logger
from argos.schemas.config import Config, Mail, GotifyUrl
from argos.server.models import Task
def need_alert(
last_severity: str, last_severity_update, severity: str, status: str, task: Task
) -> bool:
## Create alert… or not!
send_notif = False
# Severity has changed, and no retry before notification
if last_severity != severity and task.retry_before_notification == 0:
send_notif = True
# Seems to be a first check: create a notification
elif last_severity != severity and last_severity_update is None:
send_notif = True
# As we created a notification, avoid resending it on a
# future failure
if status != "success":
task.contiguous_failures = task.retry_before_notification
# We need retry before notification, so the severity may not have changed
# since last check
elif task.retry_before_notification != 0:
# If we got a success, and we already have created a notification:
# create notification of success immediately
if (
status == "success"
and task.contiguous_failures >= task.retry_before_notification + 1
):
send_notif = True
task.contiguous_failures = 0
# The status is not a success
elif status != "success":
# This is a new failure
task.contiguous_failures += 1
# Severity has changed, but not to success, that's odd:
# create a notification
if (
last_severity not in ("ok", severity)
and last_severity_update is not None
):
send_notif = True
# As we created a notification, avoid resending it on a
# future failure
task.contiguous_failures = task.retry_before_notification
# Severity has not changed, but there have been enough failures
# to create a notification
elif task.contiguous_failures == task.retry_before_notification + 1:
send_notif = True
return send_notif
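The counting part of `need_alert` can be hard to follow from the diff alone. The sketch below is a simplified model, not the function itself: it ignores severities, keeps only the contiguous-failure counter, and assumes the counter is reset on every success. The `notifications` helper and its event labels are hypothetical.

```python
from typing import List, Tuple


def notifications(
    statuses: List[str], retry_before_notification: int
) -> List[Tuple[int, str]]:
    """Return (index, event) pairs for the checks that would trigger a notification."""
    contiguous_failures = 0
    events = []
    for index, status in enumerate(statuses):
        if status == "success":
            # Only announce a recovery if a failure notification was sent before.
            if contiguous_failures >= retry_before_notification + 1:
                events.append((index, "recovered"))
            contiguous_failures = 0  # assumption: reset on every success
        else:
            contiguous_failures += 1
            # Notify once, when the allowed number of retries has just been exceeded.
            if contiguous_failures == retry_before_notification + 1:
                events.append((index, "failing"))
    return events


history = ["failure", "success", "failure", "failure", "failure", "success"]
events = notifications(history, retry_before_notification=1)
```

With `retry_before_notification` set to 1, the isolated failure at index 0 stays silent; the second consecutive failure at index 3 triggers a notification, and the success at index 5 reports the recovery.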
def get_icon_from_severity(severity: str) -> str:
@ -25,7 +74,92 @@ def get_icon_from_severity(severity: str) -> str:
return icon
def handle_alert(config: Config, result, task, severity, old_severity, request):
def send_mail(mail: EmailMessage, config: Mail):
"""Send message by mail"""
if config.ssl:
logger.debug("Mail notification: SSL")
context = ssl.create_default_context()
smtp = smtplib.SMTP_SSL(host=config.host, port=config.port, context=context)
else:
smtp = smtplib.SMTP(
host=config.host, # type: ignore
port=config.port,
)
if config.starttls:
logger.debug("Mail notification: STARTTLS")
context = ssl.create_default_context()
smtp.starttls(context=context)
if config.auth is not None:
logger.debug("Mail notification: authentication")
smtp.login(config.auth.login, config.auth.password)
for address in config.addresses:
logger.debug("Sending mail to %s", address)
logger.debug(mail.get_body())
smtp.send_message(mail, to_addrs=address)
def send_gotify_msg(config, payload):
"""Send message with gotify"""
headers = {"accept": "application/json", "content-type": "application/json"}
for url in config:
logger.debug("Sending gotify message(s) to %s", url.url)
for token in url.tokens:
try:
res = httpx.post(
f"{url.url}message",
params={"token": token},
headers=headers,
json=payload,
)
res.raise_for_status()
except httpx.RequestError as err:
logger.error(
"An error occurred while sending a message to %s with token %s",
err.request.url,
token,
)
def no_agent_alert(config: Config):
"""Send an alert when no agent has been seen recently"""
msg = "You should check what's going on with your Argos agents."
twa = config.recurring_tasks.time_without_agent
if twa > 1:
subject = f"No agent has been seen within the last {twa} minutes"
else:
subject = "No agent has been seen within the last minute"
if "local" in config.general.alerts.no_agent:
logger.error(subject)
if config.general.mail is not None and "mail" in config.general.alerts.no_agent:
mail = EmailMessage()
mail["Subject"] = f"[Argos] {subject}"
mail["From"] = config.general.mail.mailfrom
mail.set_content(msg)
send_mail(mail, config.general.mail)
if config.general.gotify is not None and "gotify" in config.general.alerts.no_agent:
priority = 9
payload = {"title": subject, "message": msg, "priority": priority}
send_gotify_msg(config.general.gotify, payload)
if config.general.apprise is not None:
for notif_way in config.general.alerts.no_agent:
if notif_way.startswith("apprise:"):
group = notif_way[8:]
apobj = apprise.Apprise()
for channel in config.general.apprise[group]:
apobj.add(channel)
apobj.notify(title=subject, body=msg)
def handle_alert(config: Config, result, task, severity, old_severity, request): # pylint: disable-msg=too-many-positional-arguments
"""Dispatch alert through configured alert channels"""
if "local" in getattr(config.general.alerts, severity):
@ -64,7 +198,7 @@ def handle_alert(config: Config, result, task, severity, old_severity, request):
)
def notify_with_apprise(
def notify_with_apprise( # pylint: disable-msg=too-many-positional-arguments
result, task, severity: str, old_severity: str, group: List[str], request
) -> None:
logger.debug("Will send apprise notification")
@ -74,9 +208,9 @@ def notify_with_apprise(
apobj.add(channel)
icon = get_icon_from_severity(severity)
title = f"[Argos] {icon} {urlparse(task.url).netloc}: status {severity}"
title = f"[Argos] {icon} {urlparse(task.url).netloc} (IPv{task.ip_version}): status {severity}"
msg = f"""\
URL: {task.url}
URL: {task.url} (IPv{task.ip_version})
Check: {task.check}
Status: {severity}
Time: {result.submitted_at}
@ -90,14 +224,14 @@ See results of task on {request.url_for('get_task_results_view', task_id=task.id
apobj.notify(title=title, body=msg)
def notify_by_mail(
def notify_by_mail( # pylint: disable-msg=too-many-positional-arguments
result, task, severity: str, old_severity: str, config: Mail, request
) -> None:
logger.debug("Will send mail notification")
icon = get_icon_from_severity(severity)
msg = f"""\
URL: {task.url}
URL: {task.url} (IPv{task.ip_version})
Check: {task.check}
Status: {severity}
Time: {result.submitted_at}
@ -109,39 +243,18 @@ See results of task on {request.url_for('get_task_results_view', task_id=task.id
"""
mail = EmailMessage()
mail["Subject"] = f"[Argos] {icon} {urlparse(task.url).netloc}: status {severity}"
mail[
"Subject"
] = f"[Argos] {icon} {urlparse(task.url).netloc} (IPv{task.ip_version}): status {severity}"
mail["From"] = config.mailfrom
mail.set_content(msg)
if config.ssl:
logger.debug("Mail notification: SSL")
context = ssl.create_default_context()
smtp = smtplib.SMTP_SSL(host=config.host, port=config.port, context=context)
else:
smtp = smtplib.SMTP(
host=config.host, # type: ignore
port=config.port,
)
if config.starttls:
logger.debug("Mail notification: STARTTLS")
context = ssl.create_default_context()
smtp.starttls(context=context)
if config.auth is not None:
logger.debug("Mail notification: authentication")
smtp.login(config.auth.login, config.auth.password)
for address in config.addresses:
logger.debug("Sending mail to %s", address)
logger.debug(msg)
smtp.send_message(mail, to_addrs=address)
send_mail(mail, config)
def notify_with_gotify(
def notify_with_gotify( # pylint: disable-msg=too-many-positional-arguments
result, task, severity: str, old_severity: str, config: List[GotifyUrl], request
) -> None:
logger.debug("Will send gotify notification")
headers = {"accept": "application/json", "content-type": "application/json"}
icon = get_icon_from_severity(severity)
priority = 9
@ -152,9 +265,11 @@ def notify_with_gotify(
elif severity == Severity.UNKNOWN:
priority = 5
subject = f"{icon} {urlparse(task.url).netloc}: status {severity}"
subject = (
f"{icon} {urlparse(task.url).netloc} (IPv{task.ip_version}): status {severity}"
)
msg = f"""\
URL:    <{task.url}>\\
URL:    <{task.url}> (IPv{task.ip_version})\\
Check:  {task.check}\\
Status: {severity}\\
Time:   {result.submitted_at}\\
@ -175,20 +290,4 @@ See results of task on <{request.url_for('get_task_results_view', task_id=task.i
payload = {"title": subject, "message": msg, "priority": priority, "extras": extras}
for url in config:
logger.debug("Sending gotify message(s) to %s", url.url)
for token in url.tokens:
try:
res = httpx.post(
f"{url.url}message",
params={"token": token},
headers=headers,
json=payload,
)
res.raise_for_status()
except httpx.RequestError as err:
logger.error(
"An error occurred while sending a message to %s with token %s",
err.request.url,
token,
)
send_gotify_msg(config, payload)

View file

@ -1,19 +1,20 @@
import os
import sys
from contextlib import asynccontextmanager
from pathlib import Path
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles
from fastapi_login import LoginManager
from pydantic import ValidationError
from fastapi_utils.tasks import repeat_every
from psutil import Process
from sqlalchemy import create_engine, event
from sqlalchemy.orm import sessionmaker
from argos.logging import logger
from argos.logging import logger, set_log_level
from argos.server import models, routes, queries
from argos.server.alerting import no_agent_alert
from argos.server.exceptions import NotAuthenticatedException, auth_exception_handler
from argos.server.settings import read_yaml_config
from argos.server.settings import read_config
def get_application() -> FastAPI:
@ -36,13 +37,23 @@ def get_application() -> FastAPI:
appli.add_exception_handler(NotAuthenticatedException, auth_exception_handler)
appli.state.manager = create_manager(config.general.cookie_secret)
if config.general.ldap is not None:
import ldap
appli.state.ldap = ldap.initialize(config.general.ldap.uri)
@appli.state.manager.user_loader()
async def query_user(user: str) -> None | models.User:
async def query_user(user: str) -> None | str | models.User:
"""
Get a user from the db
Get a user from the db or LDAP
:param user: name of the user
:return: None or the user object
"""
if appli.state.config.general.ldap is not None:
from argos.server.routes.dependencies import find_ldap_user
return await find_ldap_user(appli.state.config, appli.state.ldap, user)
return await queries.get_user(appli.state.db, user)
appli.include_router(routes.api, prefix="/api")
@ -59,17 +70,6 @@ async def connect_to_db(appli):
return appli.state.db
def read_config(yaml_file):
try:
config = read_yaml_config(yaml_file)
return config
except ValidationError as err:
logger.error("Errors were found while reading configuration:")
for error in err.errors():
logger.error("%s is %s", error["loc"], error["type"])
sys.exit(1)
def setup_database(appli):
config = appli.state.config
db_url = str(config.general.db.url)
@ -100,7 +100,7 @@ def setup_database(appli):
models.Base.metadata.create_all(bind=engine)
def create_manager(cookie_secret):
def create_manager(cookie_secret: str) -> LoginManager:
if cookie_secret == "foo_bar_baz":
logger.warning(
"You should change the cookie_secret secret in your configuration file."
@ -114,8 +114,47 @@ def create_manager(cookie_secret):
)
@repeat_every(seconds=120, logger=logger)
async def recurring_tasks() -> None:
"""Recurring DB cleanup and watch-agents tasks"""
# If we are using gunicorn
if not hasattr(app.state, "SessionLocal"):
parent_process = Process(os.getppid())
children = parent_process.children(recursive=True)
# Start the task only once, not for every worker
if children[0].pid == os.getpid():
# and we need to setup database engine
setup_database(app)
else:
return None
set_log_level("info", quiet=True)
logger.info("Start background recurring tasks")
with app.state.SessionLocal() as db:
config = app.state.config.recurring_tasks
agents = await queries.get_recent_agents_count(db, config.time_without_agent)
if agents == 0:
no_agent_alert(app.state.config)
logger.info("Agent presence checked")
removed = await queries.remove_old_results(db, config.max_results_age)
logger.info("%i result(s) removed", removed)
updated = await queries.release_old_locks(db, config.max_lock_seconds)
logger.info("%i lock(s) released", updated)
processed_jobs = await queries.process_jobs(db)
logger.info("%i job(s) processed", processed_jobs)
logger.info("Background recurring tasks ended")
return None
@asynccontextmanager
async def lifespan(appli):
async def lifespan(appli: FastAPI):
"""Server start and stop actions
Setup database connection then close it at shutdown.
@ -130,6 +169,7 @@ async def lifespan(appli):
"There are no tasks in the database. "
'Please launch the command "argos server reload-config"'
)
await recurring_tasks()
yield

View file

@ -0,0 +1,37 @@
"""Add recheck delay
Revision ID: 127d74c770bb
Revises: dcf73fa19fce
Create Date: 2024-11-27 16:04:58.138768
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = "127d74c770bb"
down_revision: Union[str, None] = "dcf73fa19fce"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
with op.batch_alter_table("tasks", schema=None) as batch_op:
batch_op.add_column(sa.Column("recheck_delay", sa.Float(), nullable=True))
batch_op.add_column(
sa.Column(
"already_retried",
sa.Boolean(),
nullable=False,
server_default=sa.sql.false(),
)
)
def downgrade() -> None:
with op.batch_alter_table("tasks", schema=None) as batch_op:
batch_op.drop_column("already_retried")
batch_op.drop_column("recheck_delay")

View file

@ -0,0 +1,28 @@
"""Add request data to tasks
Revision ID: 31255a412d63
Revises: 80a29f64f91c
Create Date: 2024-12-09 16:40:20.926138
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = "31255a412d63"
down_revision: Union[str, None] = "80a29f64f91c"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
with op.batch_alter_table("tasks", schema=None) as batch_op:
batch_op.add_column(sa.Column("request_data", sa.String(), nullable=True))
def downgrade() -> None:
with op.batch_alter_table("tasks", schema=None) as batch_op:
batch_op.drop_column("request_data")

View file

@ -0,0 +1,36 @@
"""Add job queue
Revision ID: 5f6cb30db996
Revises: bd4b4962696a
Create Date: 2025-02-17 16:56:36.673511
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = "5f6cb30db996"
down_revision: Union[str, None] = "bd4b4962696a"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
op.create_table(
"jobs",
sa.Column("id", sa.Integer(), nullable=False),
sa.Column("todo", sa.Enum("RELOAD_CONFIG", name="todo_enum"), nullable=False),
sa.Column("args", sa.String(), nullable=False),
sa.Column(
"current", sa.Boolean(), server_default=sa.sql.false(), nullable=False
),
sa.Column("added_at", sa.DateTime(), nullable=False),
sa.PrimaryKeyConstraint("id"),
)
def downgrade() -> None:
op.drop_table("jobs")

View file

@ -0,0 +1,34 @@
"""Add IP version to checks
Revision ID: 64f73a79b7d8
Revises: a1e98cf72a5c
Create Date: 2024-12-02 14:12:40.558033
"""
from typing import Sequence, Union
from alembic import op
from sqlalchemy.dialects.postgresql import ENUM
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = "64f73a79b7d8"
down_revision: Union[str, None] = "a1e98cf72a5c"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
enum = ENUM("4", "6", name="ip_version_enum", create_type=False)
enum.create(op.get_bind(), checkfirst=True)
with op.batch_alter_table("tasks", schema=None) as batch_op:
batch_op.add_column(
sa.Column("ip_version", enum, server_default="4", nullable=False)
)
def downgrade() -> None:
with op.batch_alter_table("tasks", schema=None) as batch_op:
batch_op.drop_column("ip_version")
ENUM(name="ip_version_enum").drop(op.get_bind(), checkfirst=True)

View file

@ -0,0 +1,41 @@
"""Add retries before notification feature
Revision ID: 80a29f64f91c
Revises: 8b58ced14d6e
Create Date: 2024-12-04 17:03:35.104368
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = "80a29f64f91c"
down_revision: Union[str, None] = "8b58ced14d6e"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
with op.batch_alter_table("tasks", schema=None) as batch_op:
batch_op.add_column(
sa.Column(
"retry_before_notification",
sa.Integer(),
server_default="0",
nullable=False,
)
)
batch_op.add_column(
sa.Column(
"contiguous_failures", sa.Integer(), server_default="0", nullable=False
)
)
def downgrade() -> None:
with op.batch_alter_table("tasks", schema=None) as batch_op:
batch_op.drop_column("contiguous_failures")
batch_op.drop_column("retry_before_notification")

View file

@ -0,0 +1,35 @@
"""Add task index
Revision ID: 8b58ced14d6e
Revises: 64f73a79b7d8
Create Date: 2024-12-03 16:41:44.842213
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = "8b58ced14d6e"
down_revision: Union[str, None] = "64f73a79b7d8"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
with op.batch_alter_table("tasks", schema=None) as batch_op:
batch_op.add_column(sa.Column("task_group", sa.String(), nullable=True))
with op.batch_alter_table("tasks", schema=None) as batch_op:
batch_op.execute(
"UPDATE tasks SET task_group = method || '-' || ip_version || '-' || url"
)
batch_op.alter_column("task_group", nullable=False)
batch_op.create_index("similar_tasks", ["task_group"], unique=False)
def downgrade() -> None:
with op.batch_alter_table("tasks", schema=None) as batch_op:
batch_op.drop_index("similar_tasks")
batch_op.drop_column("task_group")

View file

@ -0,0 +1,38 @@
"""Make frequency a float
Revision ID: a1e98cf72a5c
Revises: 127d74c770bb
Create Date: 2024-11-27 16:10:13.000705
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = "a1e98cf72a5c"
down_revision: Union[str, None] = "127d74c770bb"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
with op.batch_alter_table("tasks", schema=None) as batch_op:
batch_op.alter_column(
"frequency",
existing_type=sa.INTEGER(),
type_=sa.Float(),
existing_nullable=False,
)
def downgrade() -> None:
with op.batch_alter_table("tasks", schema=None) as batch_op:
batch_op.alter_column(
"frequency",
existing_type=sa.Float(),
type_=sa.INTEGER(),
existing_nullable=False,
)

View file

@ -0,0 +1,42 @@
"""Use bigint for results id field
Revision ID: bd4b4962696a
Revises: 31255a412d63
Create Date: 2025-01-06 11:44:37.552965
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
# revision identifiers, used by Alembic.
revision: str = "bd4b4962696a"
down_revision: Union[str, None] = "31255a412d63"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
bind = op.get_bind()
if bind.engine.name != "sqlite":
with op.batch_alter_table("results", schema=None) as batch_op:
batch_op.alter_column(
"id",
existing_type=sa.INTEGER(),
type_=sa.BigInteger(),
existing_nullable=False,
)
def downgrade() -> None:
bind = op.get_bind()
if bind.engine.name != "sqlite":
with op.batch_alter_table("results", schema=None) as batch_op:
batch_op.alter_column(
"id",
existing_type=sa.BigInteger(),
type_=sa.INTEGER(),
existing_nullable=False,
)

View file

@ -0,0 +1,51 @
"""Specify check method

Revision ID: dcf73fa19fce
Revises: c780864dc407
Create Date: 2024-11-26 14:40:27.510587

"""
from typing import Sequence, Union

from alembic import op
import sqlalchemy as sa

# revision identifiers, used by Alembic.
revision: str = "dcf73fa19fce"
down_revision: Union[str, None] = "c780864dc407"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None


def upgrade() -> None:
    enum = sa.Enum(
        "GET",
        "HEAD",
        "POST",
        "OPTIONS",
        "CONNECT",
        "TRACE",
        "PUT",
        "PATCH",
        "DELETE",
        name="method",
        create_type=False,
    )
    enum.create(op.get_bind(), checkfirst=True)
    with op.batch_alter_table("tasks", schema=None) as batch_op:
        batch_op.add_column(
            sa.Column(
                "method",
                enum,
                nullable=False,
                server_default="GET",
            )
        )


def downgrade() -> None:
    with op.batch_alter_table("tasks", schema=None) as batch_op:
        batch_op.drop_column("method")
    sa.Enum(name="method").drop(op.get_bind(), checkfirst=True)


@ -1,6 +1,7 @@
"""Database models"""
from datetime import datetime, timedelta
from hashlib import md5
from typing import List, Literal
from sqlalchemy import (
@ -9,15 +10,42 @@ from sqlalchemy import (
ForeignKey,
)
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship
from sqlalchemy.schema import Index
from argos.checks import BaseCheck, get_registered_check
from argos.schemas import WebsiteCheck
from argos.schemas.utils import IPVersion, Method, Todo
def compute_task_group(context) -> str:
data = context.current_parameters["request_data"]
if data is None:
data = ""
return (
f"{context.current_parameters['method']}-"
f"{context.current_parameters['ip_version']}-"
f"{context.current_parameters['url']}-"
f"{md5(data.encode()).hexdigest()}"
)
class Base(DeclarativeBase):
type_annotation_map = {List[WebsiteCheck]: JSON, dict: JSON}
class Job(Base):
"""
Job queue emulation
"""
__tablename__ = "jobs"
id: Mapped[int] = mapped_column(primary_key=True)
todo: Mapped[Todo] = mapped_column(Enum("RELOAD_CONFIG", name="todo_enum"))
args: Mapped[str] = mapped_column()
current: Mapped[bool] = mapped_column(insert_default=False)
added_at: Mapped[datetime] = mapped_column()
class Task(Base):
"""
There is one task per check.
@ -32,15 +60,39 @@ class Task(Base):
# Info needed to run the task
url: Mapped[str] = mapped_column()
domain: Mapped[str] = mapped_column()
ip_version: Mapped[IPVersion] = mapped_column(
Enum("4", "6", name="ip_version_enum"),
)
check: Mapped[str] = mapped_column()
expected: Mapped[str] = mapped_column()
frequency: Mapped[int] = mapped_column()
frequency: Mapped[float] = mapped_column()
recheck_delay: Mapped[float] = mapped_column(nullable=True)
already_retried: Mapped[bool] = mapped_column(insert_default=False)
retry_before_notification: Mapped[int] = mapped_column(insert_default=0)
contiguous_failures: Mapped[int] = mapped_column(insert_default=0)
method: Mapped[Method] = mapped_column(
Enum(
"GET",
"HEAD",
"POST",
"OPTIONS",
"CONNECT",
"TRACE",
"PUT",
"PATCH",
"DELETE",
name="method",
),
insert_default="GET",
)
request_data: Mapped[str] = mapped_column(nullable=True)
# Orchestration-related
selected_by: Mapped[str] = mapped_column(nullable=True)
selected_at: Mapped[datetime] = mapped_column(nullable=True)
completed_at: Mapped[datetime] = mapped_column(nullable=True)
next_run: Mapped[datetime] = mapped_column(nullable=True)
task_group: Mapped[str] = mapped_column(insert_default=compute_task_group)
severity: Mapped[Literal["ok", "warning", "critical", "unknown"]] = mapped_column(
Enum("ok", "warning", "critical", "unknown", name="severity"),
@ -54,8 +106,8 @@ class Task(Base):
passive_deletes=True,
)
def __str__(self):
return f"DB Task {self.url} - {self.check} - {self.expected}"
def __str__(self) -> str:
return f"DB Task {self.url} (IPv{self.ip_version}) - {self.check} - {self.expected}"
def get_check(self) -> BaseCheck:
"""Returns a check instance for this specific task"""
@ -70,7 +122,16 @@ class Task(Base):
now = datetime.now()
self.completed_at = now
self.next_run = now + timedelta(minutes=self.frequency)
if (
self.recheck_delay is not None
and severity != "ok"
and not self.already_retried
):
self.next_run = now + timedelta(minutes=self.recheck_delay)
self.already_retried = True
else:
self.next_run = now + timedelta(minutes=self.frequency)
self.already_retried = False
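
The rescheduling branch above can be read in isolation. The following is a minimal sketch, not the project's code: `next_run` is a hypothetical helper whose parameters stand in for the `Task` columns of the same names, and only the branching mirrors the diff. Durations are in minutes, as in the `timedelta(minutes=...)` calls above.

```python
from datetime import datetime, timedelta


def next_run(now, severity, frequency, recheck_delay, already_retried):
    """Return (next_run, already_retried) following the branch in the diff.

    Hypothetical helper: the arguments stand in for Task columns.
    """
    if recheck_delay is not None and severity != "ok" and not already_retried:
        # First non-ok result: retry sooner and remember that we did
        return now + timedelta(minutes=recheck_delay), True
    # Otherwise use the regular frequency and reset the flag
    return now + timedelta(minutes=frequency), False


now = datetime(2025, 2, 18, 12, 0)
first = next_run(now, "critical", frequency=10, recheck_delay=1, already_retried=False)
second = next_run(now, "critical", frequency=10, recheck_delay=1, already_retried=first[1])
print(first)
print(second)
```

With these illustrative values, a failing check is retried once after `recheck_delay`, then goes back to the normal `frequency` even if it is still failing.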
@property
def last_result(self):
@ -87,6 +148,9 @@ class Task(Base):
return self.last_result.status
Index("similar_tasks", Task.task_group)
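
The `similar_tasks` index covers the `task_group` value produced by `compute_task_group` earlier in this file. As a minimal sketch of that key, assuming plain arguments instead of the SQLAlchemy insert context (the `task_group` function name and the example URL are hypothetical):

```python
from hashlib import md5


def task_group(method, ip_version, url, request_data):
    """Build the grouping key from the same four parts as the diff."""
    data = request_data if request_data is not None else ""
    return f"{method}-{ip_version}-{url}-{md5(data.encode()).hexdigest()}"


# Same method, IP version, URL and data give the same key;
# a different IP version gives another one.
a = task_group("GET", "4", "https://example.org/", None)
b = task_group("GET", "4", "https://example.org/", None)
c = task_group("GET", "6", "https://example.org/", None)
print(a == b, a == c)
```

Tasks that differ only by their check or expected value therefore share a key, which is what lets `list_tasks` select them together by group.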
class Result(Base):
"""There are multiple results per task.


@ -4,25 +4,27 @@ from hashlib import sha256
from typing import List
from urllib.parse import urljoin
from sqlalchemy import asc, desc, func
from sqlalchemy import asc, func, Select
from sqlalchemy.orm import Session
from argos import schemas
from argos.logging import logger
from argos.server.models import Result, Task, ConfigCache, User
from argos.server.models import ConfigCache, Job, Result, Task, User
from argos.server.settings import read_config
async def list_tasks(db: Session, agent_id: str, limit: int = 100):
"""List tasks and mark them as selected"""
tasks = (
db.query(Task)
subquery = (
db.query(func.distinct(Task.task_group))
.filter(
Task.selected_by == None, # noqa: E711
((Task.next_run <= datetime.now()) | (Task.next_run == None)), # noqa: E711
)
.limit(limit)
.all()
.subquery()
)
tasks = db.query(Task).filter(Task.task_group.in_(Select(subquery))).all()
now = datetime.now()
for task in tasks:
@ -82,13 +84,22 @@ async def count_results(db: Session):
return db.query(Result).count()
async def has_config_changed(db: Session, config: schemas.Config) -> bool:
async def has_config_changed(db: Session, config: schemas.Config) -> bool: # pylint: disable-msg=too-many-statements
"""Check if websites config has changed by using a hashsum and a config cache"""
websites_hash = sha256(str(config.websites).encode()).hexdigest()
conf_caches = db.query(ConfigCache).all()
same_config = True
keys = [
"websites_hash",
"general_frequency",
"general_recheck_delay",
"general_retry_before_notification",
"general_ipv4",
"general_ipv6",
]
if conf_caches:
for conf in conf_caches:
keys.remove(conf.name)
match conf.name:
case "websites_hash":
if conf.val != websites_hash:
@ -100,9 +111,72 @@ async def has_config_changed(db: Session, config: schemas.Config) -> bool:
same_config = False
conf.val = str(config.general.frequency)
conf.updated_at = datetime.now()
case "general_recheck_delay":
if conf.val != str(config.general.recheck_delay):
same_config = False
conf.val = str(config.general.recheck_delay)
conf.updated_at = datetime.now()
case "general_retry_before_notification":
if conf.val != str(config.general.retry_before_notification):
same_config = False
conf.val = str(config.general.retry_before_notification)
conf.updated_at = datetime.now()
case "general_ipv4":
if conf.val != str(config.general.ipv4):
same_config = False
conf.val = str(config.general.ipv4)
conf.updated_at = datetime.now()
case "general_ipv6":
if conf.val != str(config.general.ipv6):
same_config = False
conf.val = str(config.general.ipv6)
conf.updated_at = datetime.now()
for i in keys:
match i:
case "websites_hash":
c = ConfigCache(
name="websites_hash",
val=websites_hash,
updated_at=datetime.now(),
)
case "general_frequency":
c = ConfigCache(
name="general_frequency",
val=str(config.general.frequency),
updated_at=datetime.now(),
)
case "general_recheck_delay":
c = ConfigCache(
name="general_recheck_delay",
val=str(config.general.recheck_delay),
updated_at=datetime.now(),
)
case "general_retry_before_notification":
c = ConfigCache(
name="general_retry_before_notification",
val=str(config.general.retry_before_notification),
updated_at=datetime.now(),
)
case "general_ipv4":
c = ConfigCache(
name="general_ipv4",
val=str(config.general.ipv4),
updated_at=datetime.now(),
)
case "general_ipv6":
c = ConfigCache(
name="general_ipv6",
val=str(config.general.ipv6),
updated_at=datetime.now(),
)
db.add(c)
db.commit()
if keys:
return True
if same_config:
return False
@ -115,70 +189,182 @@ async def has_config_changed(db: Session, config: schemas.Config) -> bool:
val=str(config.general.frequency),
updated_at=datetime.now(),
)
gen_recheck = ConfigCache(
name="general_recheck_delay",
val=str(config.general.recheck_delay),
updated_at=datetime.now(),
)
gen_retry_before_notif = ConfigCache(
name="general_retry_before_notification",
val=str(config.general.retry_before_notification),
updated_at=datetime.now(),
)
gen_ipv4 = ConfigCache(
name="general_ipv4",
val=str(config.general.ipv4),
updated_at=datetime.now(),
)
gen_ipv6 = ConfigCache(
name="general_ipv6",
val=str(config.general.ipv6),
updated_at=datetime.now(),
)
db.add(web_hash)
db.add(gen_freq)
db.add(gen_recheck)
db.add(gen_retry_before_notif)
db.add(gen_ipv4)
db.add(gen_ipv6)
db.commit()
return True
async def update_from_config(db: Session, config: schemas.Config):
"""Update tasks from config file"""
config_changed = await has_config_changed(db, config)
if not config_changed:
return {"added": 0, "vanished": 0}
async def update_from_config_later(db: Session, config_file):
"""Ask Argos to reload configuration in a recurring task"""
jobs = (
db.query(Job)
.filter(
Job.todo == "RELOAD_CONFIG",
Job.args == config_file,
Job.current == False,
)
.all()
)
if jobs:
return "There is already a config reloading job in the job queue, for the same file"
job = Job(todo="RELOAD_CONFIG", args=config_file, added_at=datetime.now())
db.add(job)
db.commit()
return "Config reloading has been added in the job queue"
async def process_jobs(db: Session) -> int:
"""Process job queue"""
jobs = db.query(Job).filter(Job.current == False).all()
if jobs:
for job in jobs:
job.current = True
db.commit()
if job.todo == "RELOAD_CONFIG":
logger.info("Processing job %i: %s %s", job.id, job.todo, job.args)
_config = read_config(job.args)
changed = await update_from_config(db, _config)
logger.info("%i task(s) added", changed["added"])
logger.info("%i task(s) deleted", changed["vanished"])
db.delete(job)
db.commit()
return len(jobs)
return 0
async def update_from_config(db: Session, config: schemas.Config): # pylint: disable-msg=too-many-branches
"""Update tasks from config file"""
max_task_id = (
db.query(func.max(Task.id).label("max_id")).all() # pylint: disable-msg=not-callable
)[0].max_id
tasks = []
unique_properties = []
seen_tasks: List[int] = []
for website in config.websites:
for website in config.websites: # pylint: disable-msg=too-many-nested-blocks
domain = str(website.domain)
frequency = website.frequency or config.general.frequency
recheck_delay = website.recheck_delay or config.general.recheck_delay
retry_before_notification = (
website.retry_before_notification
if website.retry_before_notification is not None
else config.general.retry_before_notification
)
ipv4 = website.ipv4 if website.ipv4 is not None else config.general.ipv4
ipv6 = website.ipv6 if website.ipv6 is not None else config.general.ipv6
if ipv4 is False and ipv6 is False:
logger.warning("IPv4 AND IPv6 are disabled on website %s!", domain)
continue
for p in website.paths:
url = urljoin(domain, str(p.path))
for check_key, expected in p.checks:
# Check the db for already existing tasks.
existing_tasks = (
db.query(Task)
.filter(
Task.url == url,
Task.check == check_key,
Task.expected == expected,
)
.all()
)
if existing_tasks:
existing_task = existing_tasks[0]
seen_tasks.append(existing_task.id)
if frequency != existing_task.frequency:
existing_task.frequency = frequency
logger.debug(
"Skipping db task creation for url=%s, "
"check_key=%s, expected=%s, frequency=%s.",
url,
check_key,
expected,
frequency,
)
else:
properties = (url, check_key, expected)
if properties not in unique_properties:
unique_properties.append(properties)
task = Task(
domain=domain,
url=url,
check=check_key,
expected=expected,
frequency=frequency,
for ip_version in ["4", "6"]:
for p in website.paths:
url = urljoin(domain, str(p.path))
for check_key, expected in p.checks:
# Check the db for already existing tasks.
existing_tasks = (
db.query(Task)
.filter(
Task.url == url,
Task.method == p.method,
Task.request_data == p.request_data,
Task.check == check_key,
Task.expected == expected,
Task.ip_version == ip_version,
)
logger.debug("Adding a new task in the db: %s", task)
tasks.append(task)
.all()
)
if (ip_version == "4" and ipv4 is False) or (
ip_version == "6" and ipv6 is False
):
continue
if existing_tasks:
existing_task = existing_tasks[0]
seen_tasks.append(existing_task.id)
if frequency != existing_task.frequency:
existing_task.frequency = frequency
if recheck_delay != existing_task.recheck_delay:
existing_task.recheck_delay = recheck_delay # type: ignore[assignment]
if (
retry_before_notification
!= existing_task.retry_before_notification
):
existing_task.retry_before_notification = (
retry_before_notification
)
logger.debug(
"Skipping db task creation for url=%s, "
"method=%s, check_key=%s, expected=%s, "
"frequency=%s, recheck_delay=%s, "
"retry_before_notification=%s, ip_version=%s.",
url,
p.method,
check_key,
expected,
frequency,
recheck_delay,
retry_before_notification,
ip_version,
)
else:
properties = (
url,
p.method,
check_key,
expected,
ip_version,
p.request_data,
)
if properties not in unique_properties:
unique_properties.append(properties)
task = Task(
domain=domain,
url=url,
ip_version=ip_version,
method=p.method,
request_data=p.request_data,
check=check_key,
expected=expected,
frequency=frequency,
recheck_delay=recheck_delay,
retry_before_notification=retry_before_notification,
already_retried=False,
)
logger.debug("Adding a new task in the db: %s", task)
tasks.append(task)
db.add_all(tasks)
db.commit()
@ -192,7 +378,8 @@ async def update_from_config(db: Session, config: schemas.Config):
)
db.commit()
logger.info(
"%i tasks has been removed since not in config file anymore", vanished_tasks
"%i task(s) have been removed since they are no longer in the config file",
vanished_tasks,
)
return {"added": len(tasks), "vanished": vanished_tasks}
@ -222,28 +409,13 @@ async def reschedule_all(db: Session):
db.commit()
async def remove_old_results(db: Session, max_results: int):
tasks = db.query(Task).all()
deleted = 0
for task in tasks:
# Get the id of the oldest result to keep
subquery = (
db.query(Result.id)
.filter(Result.task_id == task.id)
.order_by(desc(Result.id))
.limit(max_results)
.subquery()
)
min_id = db.query(func.min(subquery.c.id)).scalar() # pylint: disable-msg=not-callable
# Delete all the results older than min_id
if min_id:
deleted += (
db.query(Result)
.where(Result.id < min_id, Result.task_id == task.id)
.delete()
)
db.commit()
async def remove_old_results(db: Session, max_results_age: float):
"""Remove old results, based on age"""
max_acceptable_time = datetime.now() - timedelta(seconds=max_results_age)
deleted = (
db.query(Result).filter(Result.submitted_at < max_acceptable_time).delete()
)
db.commit()
return deleted
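
The age-based cleanup above replaces a per-task loop with a single delete against a cutoff time. A minimal, self-contained sketch of the same idea, assuming an in-memory SQLite table with ISO-formatted timestamps instead of the project's SQLAlchemy models (the table layout and the extra `now` parameter are illustrative assumptions):

```python
import sqlite3
from datetime import datetime, timedelta


def remove_old_results(conn, max_results_age, now):
    """Delete rows older than now - max_results_age (seconds); return the count."""
    cutoff = now - timedelta(seconds=max_results_age)
    cur = conn.execute(
        "DELETE FROM results WHERE submitted_at < ?", (cutoff.isoformat(),)
    )
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (id INTEGER PRIMARY KEY, submitted_at TEXT)")
now = datetime(2025, 2, 18, 17, 0, 0)
ages_in_seconds = [10, 3600, 86400 * 2]  # 10 seconds, 1 hour, 2 days
conn.executemany(
    "INSERT INTO results (submitted_at) VALUES (?)",
    [((now - timedelta(seconds=age)).isoformat(),) for age in ages_in_seconds],
)
deleted = remove_old_results(conn, max_results_age=86400, now=now)
remaining = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
print(deleted, remaining)
```

Passing `now` explicitly keeps the sketch deterministic; the function in the diff calls `datetime.now()` itself.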


@ -7,7 +7,7 @@ from sqlalchemy.orm import Session
from argos.logging import logger
from argos.schemas import AgentResult, Config, Task
from argos.server import queries
from argos.server.alerting import handle_alert
from argos.server.alerting import handle_alert, need_alert
from argos.server.routes.dependencies import get_config, get_db, verify_token
route = APIRouter()
@ -30,7 +30,7 @@ async def read_tasks(
@route.post("/results", status_code=201, dependencies=[Depends(verify_token)])
async def create_results(
async def create_results( # pylint: disable-msg=too-many-positional-arguments
request: Request,
results: List[AgentResult],
background_tasks: BackgroundTasks,
@ -58,16 +58,26 @@ async def create_results(
logger.error("Unable to find task %i", agent_result.task_id)
else:
last_severity = task.severity
last_severity_update = task.last_severity_update
result = await queries.create_result(db, agent_result, agent_id)
check = task.get_check()
status, severity = await check.finalize(config, result, **result.context)
result.set_status(status, severity)
task.set_times_severity_and_deselect(severity, result.submitted_at)
# Don't create an alert if the severity has not changed
if last_severity != severity:
send_notif = need_alert(
last_severity, last_severity_update, severity, status, task
)
if send_notif:
background_tasks.add_task(
handle_alert, config, result, task, severity, last_severity, request
handle_alert,
config,
result,
task,
severity,
last_severity,
request,
)
db_results.append(result)


@ -2,6 +2,8 @@ from fastapi import Depends, HTTPException, Request
from fastapi.security import HTTPAuthorizationCredentials, HTTPBearer
from fastapi_login import LoginManager
from argos.logging import logger
auth_scheme = HTTPBearer()
@ -18,6 +20,9 @@ def get_config(request: Request):
async def get_manager(request: Request) -> LoginManager:
if request.app.state.config.general.unauthenticated_access is not None:
return await request.app.state.manager.optional(request)
return await request.app.state.manager(request)
@ -28,3 +33,35 @@ async def verify_token(
if token.credentials not in request.app.state.config.service.secrets:
raise HTTPException(status_code=401, detail="Unauthorized")
return token
async def find_ldap_user(config, ldapobj, user: str) -> str | None:
"""Do a LDAP search for user and return its dn"""
import ldap
import ldap.filter as ldap_filter
from ldapurl import LDAP_SCOPE_SUBTREE
try:
ldapobj.simple_bind_s(config.general.ldap.bind_dn, config.general.ldap.bind_pwd)
except ldap.LDAPError as err: # pylint: disable-msg=no-member
logger.error("LDAP error: %s", err)
return None
result = ldapobj.search_s(
config.general.ldap.user_tree,
LDAP_SCOPE_SUBTREE,
filterstr=ldap_filter.filter_format(
f"(&(%s=%s){config.general.ldap.user_filter})",
[
config.general.ldap.user_attr,
user,
],
),
attrlist=[config.general.ldap.user_attr],
)
# If there is a result, there should, logically, be only one entry
if len(result) > 0:
return result[0][0]
return None


@ -17,6 +17,7 @@ from sqlalchemy.orm import Session
from argos.checks.base import Status
from argos.schemas import Config
from argos.server import queries
from argos.server.exceptions import NotAuthenticatedException
from argos.server.models import Result, Task, User
from argos.server.routes.dependencies import get_config, get_db, get_manager
@ -28,7 +29,17 @@ SEVERITY_LEVELS = {"ok": 1, "warning": 2, "critical": 3, "unknown": 4}
@route.get("/login")
async def login_view(request: Request, msg: str | None = None):
async def login_view(
request: Request,
msg: str | None = None,
config: Config = Depends(get_config),
):
if config.general.unauthenticated_access == "all":
return RedirectResponse(
request.url_for("get_severity_counts_view"),
status_code=status.HTTP_303_SEE_OTHER,
)
token = request.cookies.get("access-token")
if token is not None and token != "":
manager = request.app.state.manager
@ -44,7 +55,14 @@ async def login_view(request: Request, msg: str | None = None):
else:
msg = None
return templates.TemplateResponse("login.html", {"request": request, "msg": msg})
return templates.TemplateResponse(
"login.html",
{
"request": request,
"msg": msg,
"remember": config.general.remember_me_duration,
},
)
@route.post("/login")
@ -52,37 +70,86 @@ async def post_login(
request: Request,
db: Session = Depends(get_db),
data: OAuth2PasswordRequestForm = Depends(),
rememberme: Annotated[str | None, Form()] = None,
config: Config = Depends(get_config),
):
if config.general.unauthenticated_access == "all":
return RedirectResponse(
request.url_for("get_severity_counts_view"),
status_code=status.HTTP_303_SEE_OTHER,
)
username = data.username
user = await queries.get_user(db, username)
invalid_credentials = templates.TemplateResponse(
"login.html",
{"request": request, "msg": "Sorry, invalid username or bad password."},
)
if user is None:
return invalid_credentials
pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")
if not pwd_context.verify(data.password, user.password):
return invalid_credentials
if config.general.ldap is not None:
from ldap import INVALID_CREDENTIALS # pylint: disable-msg=no-name-in-module
from argos.server.routes.dependencies import find_ldap_user
user.last_login_at = datetime.now()
db.commit()
invalid_credentials = templates.TemplateResponse(
"login.html",
{
"request": request,
"msg": "Sorry, invalid username or bad password. "
"Or the LDAP server is unreachable (see logs to verify).",
},
)
ldap_dn = await find_ldap_user(config, request.app.state.ldap, username)
if ldap_dn is None:
return invalid_credentials
try:
request.app.state.ldap.simple_bind_s(ldap_dn, data.password)
except INVALID_CREDENTIALS:
return invalid_credentials
else:
user = await queries.get_user(db, username)
if user is None:
return invalid_credentials
pwd_context = CryptContext(schemes=["bcrypt"], deprecated="auto")
if not pwd_context.verify(data.password, user.password):
return invalid_credentials
user.last_login_at = datetime.now()
db.commit()
manager = request.app.state.manager
token = manager.create_access_token(
data={"sub": username}, expires=timedelta(days=7)
)
session_duration = config.general.session_duration
if config.general.remember_me_duration is not None and rememberme == "on":
session_duration = config.general.remember_me_duration
delta = timedelta(minutes=session_duration)
token = manager.create_access_token(data={"sub": username}, expires=delta)
response = RedirectResponse(
request.url_for("get_severity_counts_view"),
status_code=status.HTTP_303_SEE_OTHER,
)
manager.set_cookie(response, token)
response.set_cookie(
key=manager.cookie_name,
value=token,
httponly=True,
samesite="strict",
expires=int(delta.total_seconds()),
)
return response
@route.get("/logout")
async def logout_view(request: Request, user: User | None = Depends(get_manager)):
async def logout_view(
request: Request,
config: Config = Depends(get_config),
user: User | None = Depends(get_manager),
):
if config.general.unauthenticated_access == "all":
return RedirectResponse(
request.url_for("get_severity_counts_view"),
status_code=status.HTTP_303_SEE_OTHER,
)
response = RedirectResponse(
request.url_for("login_view").include_query_params(msg="logout"),
status_code=status.HTTP_303_SEE_OTHER,
@ -112,6 +179,7 @@ async def get_severity_counts_view(
"agents": agents,
"auto_refresh_enabled": auto_refresh_enabled,
"auto_refresh_seconds": auto_refresh_seconds,
"user": user,
},
)
@ -120,9 +188,14 @@ async def get_severity_counts_view(
async def get_domains_view(
request: Request,
user: User | None = Depends(get_manager),
config: Config = Depends(get_config),
db: Session = Depends(get_db),
):
"""Show all tasks and their current state"""
if config.general.unauthenticated_access == "dashboard":
if user is None:
raise NotAuthenticatedException
tasks = db.query(Task).all()
domains_severities = defaultdict(list)
@ -163,6 +236,7 @@ async def get_domains_view(
"last_checks": domains_last_checks,
"total_task_count": len(tasks),
"agents": agents,
"user": user,
},
)
@ -172,12 +246,23 @@ async def get_domain_tasks_view(
request: Request,
domain: str,
user: User | None = Depends(get_manager),
config: Config = Depends(get_config),
db: Session = Depends(get_db),
):
"""Show all tasks attached to a domain"""
if config.general.unauthenticated_access == "dashboard":
if user is None:
raise NotAuthenticatedException
tasks = db.query(Task).filter(Task.domain.contains(f"//{domain}")).all()
return templates.TemplateResponse(
"domain.html", {"request": request, "domain": domain, "tasks": tasks}
"domain.html",
{
"request": request,
"domain": domain,
"tasks": tasks,
"user": user,
},
)
@ -186,12 +271,23 @@ async def get_result_view(
request: Request,
result_id: int,
user: User | None = Depends(get_manager),
config: Config = Depends(get_config),
db: Session = Depends(get_db),
):
"""Show the details of a result"""
if config.general.unauthenticated_access == "dashboard":
if user is None:
raise NotAuthenticatedException
result = db.query(Result).get(result_id)
return templates.TemplateResponse(
"result.html", {"request": request, "result": result, "error": Status.ERROR}
"result.html",
{
"request": request,
"result": result,
"error": Status.ERROR,
"user": user,
},
)
@ -204,6 +300,10 @@ async def get_task_results_view(
config: Config = Depends(get_config),
):
"""Show history of a task's results"""
if config.general.unauthenticated_access == "dashboard":
if user is None:
raise NotAuthenticatedException
results = (
db.query(Result)
.filter(Result.task_id == task_id)
@ -222,6 +322,7 @@ async def get_task_results_view(
"task": task,
"description": description,
"error": Status.ERROR,
"user": user,
},
)
@ -230,9 +331,14 @@ async def get_task_results_view(
async def get_agents_view(
request: Request,
user: User | None = Depends(get_manager),
config: Config = Depends(get_config),
db: Session = Depends(get_db),
):
"""Show argos agents and the last time the server saw them"""
if config.general.unauthenticated_access == "dashboard":
if user is None:
raise NotAuthenticatedException
last_seen = (
db.query(Result.agent_id, func.max(Result.submitted_at).label("submitted_at"))
.group_by(Result.agent_id)
@ -240,7 +346,12 @@ async def get_agents_view(
)
return templates.TemplateResponse(
"agents.html", {"request": request, "last_seen": last_seen}
"agents.html",
{
"request": request,
"last_seen": last_seen,
"user": user,
},
)
@ -255,8 +366,21 @@ async def set_refresh_cookies_view(
request.url_for("get_severity_counts_view"),
status_code=status.HTTP_303_SEE_OTHER,
)
response.set_cookie(key="auto_refresh_enabled", value=str(auto_refresh_enabled))
# Cookie's age in Chrome can't be more than 400 days
# https://developer.chrome.com/blog/cookie-max-age-expires
delta = int(timedelta(days=400).total_seconds())
response.set_cookie(
key="auto_refresh_seconds", value=str(max(5, int(auto_refresh_seconds)))
key="auto_refresh_enabled",
value=str(auto_refresh_enabled),
httponly=True,
samesite="strict",
expires=delta,
)
response.set_cookie(
key="auto_refresh_seconds",
value=str(max(5, int(auto_refresh_seconds))),
httponly=True,
samesite="strict",
expires=delta,
)
return response
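
The session lifetime chosen in `post_login` earlier in this file can be summarised as a small pure function. This is a sketch under stated assumptions: `session_lifetime` is a hypothetical helper, the numeric values are examples only, and the durations are taken to be minutes because the diff passes them to `timedelta(minutes=...)`.

```python
from datetime import timedelta


def session_lifetime(session_duration, remember_me_duration, rememberme):
    """Pick the token lifetime the way the login handler in the diff does."""
    if remember_me_duration is not None and rememberme == "on":
        # The "Remember me" box was ticked and the feature is configured
        session_duration = remember_me_duration
    return timedelta(minutes=session_duration)


# Example values only: a 7-day session and a 30-day "remember me" duration
default = session_lifetime(10080, 43200, None)
remembered = session_lifetime(10080, 43200, "on")
disabled = session_lifetime(10080, None, "on")
print(int(default.total_seconds()), int(remembered.total_seconds()))
```

The resulting duration is used twice in the diff: as the token expiry and, converted to seconds, as the cookie's `expires` value.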


@ -1,12 +1,26 @@
"""Pydantic schemas for server"""
import sys
from pathlib import Path
import yaml
from yamlinclude import YamlIncludeConstructor
from pydantic import ValidationError
from argos.logging import logger
from argos.schemas.config import Config
def read_config(yaml_file):
try:
config = read_yaml_config(yaml_file)
return config
except ValidationError as err:
logger.error("Errors were found while reading configuration:")
for error in err.errors():
logger.error("%s is %s", error["loc"], error["type"])
sys.exit(1)
def read_yaml_config(filename: str) -> Config:
parsed = _load_yaml(filename)
return Config(**parsed)


@ -63,6 +63,8 @@
Agents
</a>
</li>
{% set unauthenticated_access = request.app.state.config.general.unauthenticated_access %}
{% if (user is defined and user is not none) or unauthenticated_access == "all" %}
<li>
<a href="#"
id="reschedule-all"
@ -72,13 +74,24 @@
Reschedule non-ok checks
</a>
</li>
{% endif %}
{% if user is defined and user is not none %}
<li>
<a href="{{ url_for('logout_view') }}"
class="outline {{ 'contrast' if request.url == url_for('get_agents_view') }}"
class="outline"
role="button">
Logout
</a>
</li>
{% elif unauthenticated_access != "all" %}
<li>
<a href="{{ url_for('login_view') }}"
class="outline"
role="button">
Login
</a>
</li>
{% endif %}
</ul>
</details>
</li>


@ -16,7 +16,7 @@
<tbody id="domains-body">
{% for task in tasks %}
<tr scope="row">
<td>{{ task.url }}</td>
<td>{{ task.url }} (IPv{{ task.ip_version }})</td>
<td>{{ task.check }}</td>
<td class="status highlight">
{% if task.status %}


@ -12,15 +12,25 @@
</a>
</li>
</ul>
<ul>
{# djlint:off H021 #}
<ul id="js-only" style="display: none; ">{# djlint:on #}
<li>
<input id="domain-search"
type="search"
spellcheck="false"
placeholder="Filter domains list"
aria-label="Filter domains list"
/>
</li>
<li>
<label for="select-status">Show domains with status:</label>
<select id="select-status">
<option value="all">All</option>
<option value="not-ok" selected>Not OK</option>
<option value="ok">✅ OK</option>
<option value="warning">⚠️ Warning</option>
<option value="critical">❌ Critical</option>
<option value="unknown">❔ Unknown</option>
<option value="all">All</option>
</select>
</li>
</ul>
@ -36,7 +46,8 @@
<tbody id="domains-body">
{% for (domain, status) in domains %}
<tr data-status={{ status }}>
<tr data-status="{{ status }}"
data-domain="{{ domain }}">
<td>
<a href="{{ url_for('get_domain_tasks_view', domain=domain) }}">
{{ domain }}
@ -60,20 +71,47 @@
</table>
</div>
<script>
document.getElementById('select-status').addEventListener('change', (e) => {
if (e.currentTarget.value === 'all') {
function filterDomains() {
let status = document.getElementById('select-status');
let filter = document.getElementById('domain-search').value;
console.log(filter)
if (status.value === 'all') {
document.querySelectorAll('[data-status]').forEach((item) => {
item.style.display = null;
if (filter && item.dataset.domain.indexOf(filter) == -1) {
item.style.display = 'none';
} else {
item.style.display = null;
}
})
} else if (status.value === 'not-ok') {
document.querySelectorAll('[data-status]').forEach((item) => {
if (item.dataset.status !== 'ok') {
if (filter && item.dataset.domain.indexOf(filter) == -1) {
item.style.display = 'none';
} else {
item.style.display = null;
}
} else {
item.style.display = 'none';
}
})
} else {
document.querySelectorAll('[data-status]').forEach((item) => {
if (item.dataset.status === e.currentTarget.value) {
item.style.display = null;
if (item.dataset.status === status.value) {
if (filter && item.dataset.domain.indexOf(filter) == -1) {
item.style.display = 'none';
} else {
item.style.display = null;
}
} else {
item.style.display = 'none';
}
})
}
});
}
document.getElementById('select-status').addEventListener('change', filterDomains);
document.getElementById('domain-search').addEventListener('input', filterDomains);
filterDomains()
document.getElementById('js-only').style.display = null;
</script>
{% endblock content %}


@ -16,6 +16,14 @@
name="password"
type="password"
form="login">
{% if remember is not none %}
<label>
<input type="checkbox"
name="rememberme"
form="login">
Remember me
</label>
{% endif %}
<form id="login"
method="post"
action="{{ url_for('post_login') }}">


@ -82,6 +82,48 @@ caption: argos-config.yaml
- json-is: '{"foo": "bar", "baz": 42}'
```
## Add data to requests

If you want to specify query parameters, just put them in the path:

```{code-block} yaml
websites:
  - domain: "https://contact.example.org"
    paths:
      - path: "/index.php?action=show_messages"
        method: "GET"
```

If you want, for example, to test a form and send some data to it:

```{code-block} yaml
websites:
  - domain: "https://contact.example.org"
    paths:
      - path: "/"
        method: "POST"
        request_data:
          # These are the data sent to the server: title and msg
          data:
            title: "Hello my friend"
            msg: "How are you today?"
          # To send data as JSON (optional, default is false):
          is_json: true
```

If you need to send some headers in the request:

```{code-block} yaml
websites:
  - domain: "https://contact.example.org"
    paths:
      - path: "/api/mail"
        method: "PUT"
        request_data:
          headers:
            Authorization: "Bearer foo-bar-baz"
```
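
To make the three `request_data` keys concrete, here is a minimal sketch of how such a mapping could be turned into keyword arguments for an HTTP client. It is an illustration only, not Argos's implementation: `request_kwargs` is a hypothetical helper, and it assumes a client that accepts `data`, `json` and `headers` keyword arguments, as common Python HTTP clients do.

```python
def request_kwargs(request_data):
    """Translate a request_data mapping into HTTP client keyword arguments.

    Hypothetical helper: assumes the client takes data/json/headers kwargs.
    """
    kwargs = {}
    if request_data is None:
        return kwargs
    payload = request_data.get("data")
    if payload is not None:
        # is_json defaults to false: form-encoded unless asked otherwise
        key = "json" if request_data.get("is_json", False) else "data"
        kwargs[key] = payload
    headers = request_data.get("headers")
    if headers:
        kwargs["headers"] = headers
    return kwargs


form = request_kwargs({"data": {"title": "Hello my friend"}, "is_json": False})
as_json = request_kwargs({"data": {"title": "Hello my friend"}, "is_json": True})
with_headers = request_kwargs({"headers": {"Authorization": "Bearer foo-bar-baz"}})
print(form, as_json, with_headers)
```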
## SSL certificate expiration
Checks that the SSL certificate will not expire soon. You need to define the thresholds in the configuration, and set the `on-check` option to enable the check.


@ -60,7 +60,9 @@ Options:
--max-tasks INTEGER Number of concurrent tasks this agent can run
--wait-time INTEGER Waiting time between two polls on the server
(seconds)
--log-level [DEBUG|INFO|WARNING|ERROR|CRITICAL]
--log-level [debug|info|warning|error|critical]
--user-agent TEXT A custom string to append to the User-Agent
header
--help Show this message and exit.
```
@ -82,7 +84,6 @@ Options:
--help Show this message and exit.
Commands:
cleandb Clean the database (to run routinely)
generate-config Output a self-documented example config file.
generate-token Generate a token for agents
migrate Run database migrations
@ -93,7 +94,6 @@ Commands:
test-gotify Send a test gotify notification
test-mail Send a test email
user User management
watch-agents Watch agents (to run routinely)
```
<!--[[[end]]]
@ -150,65 +150,6 @@ Options:
-->
### Server cleandb
<!--
.. [[[cog
help(["server", "cleandb", "--help"])
.. ]]] -->
```man
Usage: argos server cleandb [OPTIONS]
Clean the database (to run routinely)
- Removes old results from the database.
- Removes locks from tasks that have been locked for too long.
Options:
--max-results INTEGER Number of results per task to keep
--max-lock-seconds INTEGER The number of seconds after which a lock is
considered stale, must be higher than 60 (the
checks have a timeout value of 60 seconds)
--config TEXT Path of the configuration file. If ARGOS_YAML_FILE
environment variable is set, its value will be
used instead. Default value: argos-config.yaml and
/etc/argos/config.yaml as fallback.
--help Show this message and exit.
```
<!--[[[end]]]
-->
### Server watch-agents
<!--
.. [[[cog
help(["server", "cleandb", "--help"])
.. ]]] -->
```man
Usage: argos server cleandb [OPTIONS]
Clean the database (to run routinely)
- Removes old results from the database.
- Removes locks from tasks that have been locked for too long.
Options:
--max-results INTEGER Number of results per task to keep
--max-lock-seconds INTEGER The number of seconds after which a lock is
considered stale, must be higher than 60 (the
checks have a timeout value of 60 seconds)
--config TEXT Path of the configuration file. If ARGOS_YAML_FILE
environment variable is set, its value will be
used instead. Default value: argos-config.yaml and
/etc/argos/config.yaml as fallback.
--help Show this message and exit.
```
<!--[[[end]]]
-->
### Server reload-config
<!--
@ -222,10 +163,15 @@ Usage: argos server reload-config [OPTIONS]
Read tasks configuration and add/delete tasks in database if needed
Options:
--config TEXT Path of the configuration file. If ARGOS_YAML_FILE environment
variable is set, its value will be used instead. Default value:
argos-config.yaml and /etc/argos/config.yaml as fallback.
--help Show this message and exit.
--config TEXT Path of the configuration file. If ARGOS_YAML_FILE
environment variable is set, its value will be used
instead. Default value: argos-config.yaml and
/etc/argos/config.yaml as fallback.
--enqueue / --no-enqueue Let Argos main recurring tasks handle
configurations loading. It may delay the
application of the new configuration up to 2
minutes. Default is --no-enqueue
--help Show this message and exit.
```
<!--[[[end]]]
@ -276,9 +222,15 @@ Options:
### Server user management
To access Argos web interface, you need to create at least one user.
You can choose to protect Argos web interface with a user system, in which case you'll need to create at least one user.
You can manage users only through CLI.
See [`unauthenticated_access` in the configuration file](configuration.md) to allow partial or total unauthenticated access to Argos.
See [`ldap` in the configuration file](configuration.md) to authenticate users against a LDAP server instead of Argos database.
You can manage Argos users only through CLI.
NB: you can't manage the LDAP users with Argos.
<!--
.. [[[cog
@ -473,7 +425,7 @@ Options:
<!--[[[end]]]
-->
#### Use as a nagios probe
### Use as a nagios probe
You can directly use Argos to get an output and an exit code usable with Nagios.
@ -497,7 +449,7 @@ Options:
<!--[[[end]]]
-->
#### Test the email settings
### Test the email settings
You can verify that your mail settings are ok by sending a test email.
@ -522,7 +474,7 @@ Options:
<!--[[[end]]]
-->
#### Test the Gotify settings
### Test the Gotify settings
You can verify that your Gotify settings are ok by sending a test notification.
@ -547,7 +499,7 @@ Options:
<!--[[[end]]]
-->
#### Test the Apprise settings
### Test the Apprise settings
You can verify that your Apprise settings are ok by sending a test notification.

View file

@ -14,7 +14,9 @@ description: Many thanks to their developers!
- [Alembic](https://alembic.sqlalchemy.org) is used for DB migrations;
- [Tenacity](https://github.com/jd/tenacity) is a small utility to retry a function in case an error occurred;
- [Uvicorn](https://www.uvicorn.org/) is the tool used to run our server;
- [Gunicorn](https://gunicorn.org/) is the recommended WSGI HTTP server for production.
- [Gunicorn](https://gunicorn.org/) is the recommended WSGI HTTP server for production;
- [Apprise](https://github.com/caronc/apprise/wiki) allows Argos to send notifications through a lot of channels;
- [FastAPI Utilities](https://fastapiutils.github.io/fastapi-utils/) is in charge of recurring tasks.
## CSS framework

View file

@ -10,7 +10,12 @@ First, do your changes in the code, change the model, add new tables, etc. Once
you're done, you can create a new migration.
```bash
venv/bin/alembic -c argos/server/migrations/alembic.ini revision --autogenerate -m "migration reason"
venv/bin/alembic -c argos/server/migrations/alembic.ini revision \
--autogenerate -m "migration reason"
```
Edit the created file to remove comments and adapt it to make sure the migration is complete (Alembic is not powerful enough to cover all the corner cases).
In case you want to add an `Enum` type and use it in an existing table, please have a look at [`argos/server/migrations/versions/dcf73fa19fce_specify_check_method.py`](https://framagit.org/framasoft/framaspace/argos/-/blob/main/argos/server/migrations/versions/dcf73fa19fce_specify_check_method.py).
If you want to add an `Enum` type in a new table, you can follow the example in [`argos/server/migrations/versions/7d480e6f1112_initial_migrations.py`](https://framagit.org/framasoft/framaspace/argos/-/blob/main/argos/server/migrations/versions/7d480e6f1112_initial_migrations.py).

View file

@ -41,7 +41,8 @@ git add argos/__init__.py CHANGELOG.md
git commit -m "🏷 — Bump version ($(hatch version))"
# Create a tag on the git repository and push it
git tag "$(hatch version)" && git push
git tag "$(hatch version)" -m "$(hatch version)" &&
git push --follow-tags
# Build the project
hatch build --clean

View file

@ -10,6 +10,14 @@ NB: if you want a quick-installation guide, we [got you covered](tl-dr.md).
- Python 3.11+
- PostgreSQL 13+ (for production)
### Optional dependencies
If you want to use LDAP authentication, you will need to install some packages (here for a Debian-based system):
```bash
apt-get install build-essential python3-dev libldap-dev libsasl2-dev
```
## Recommendation
Create a dedicated user for argos:
@ -45,6 +53,18 @@ For production, we recommend the use of [Gunicorn](https://gunicorn.org/), which
pip install "argos-monitoring[gunicorn]"
```
If you want to use LDAP authentication, you'll need to install Argos this way:
```bash
pip install "argos-monitoring[ldap]"
```
And for an installation with Gunicorn and LDAP authentication:
```bash
pip install "argos-monitoring[gunicorn,ldap]"
```
## Install from sources
Once you got the source locally, create a virtualenv and install the dependencies:
@ -171,18 +191,6 @@ The only requirement is that the agent can reach the server through HTTP or HTTP
argos agent http://localhost:8000 "auth-token"
```
## Cleaning the database
You have to run the cleaning task periodically. `argos server cleandb --help` will give you more information on how to do that.
Here is a crontab example, which will clean the db each hour:
```bash
# Run the cleaning tasks every hour (at minute 7)
# Keeps 10 results per task, and remove tasks locks older than 1 hour
7 * * * * argos server cleandb --max-results 10 --max-lock-seconds 3600
```
## Watch the agents
In order to be sure that agents are up and communicate with the server, you can periodically run the `argos server watch-agents` command.

View file

@ -90,13 +90,13 @@ User=argos
WorkingDirectory=/opt/argos/
EnvironmentFile=/etc/default/argos-server
ExecStartPre=/opt/argos/venv/bin/argos server migrate
ExecStartPre=/opt/argos/venv/bin/argos server reload-config
ExecStartPre=/opt/argos/venv/bin/argos server reload-config --enqueue
ExecStart=/opt/argos/venv/bin/gunicorn "argos.server.main:get_application()" \\
--workers \$ARGOS_SERVER_WORKERS \\
--worker-class uvicorn.workers.UvicornWorker \\
--bind \$ARGOS_SERVER_SOCKET \\
--forwarded-allow-ips \$ARGOS_SERVER_FORWARDED_ALLOW_IPS
ExecReload=/opt/argos/venv/bin/argos server reload-config
ExecReload=/opt/argos/venv/bin/argos server reload-config --enqueue
SyslogIdentifier=argos-server
[Install]
@ -153,8 +153,7 @@ If all works well, you have to put some cron tasks in `argos` crontab:
```bash
cat <<EOF | crontab -u argos -
*/10 * * * * /opt/argos/venv/bin/argos server cleandb --max-lock-seconds 120 --max-results 1200
*/10 * * * * /opt/argos/venv/bin/argos server watch-agents --time-without-agent 10
*/10 * * * * /opt/argos/venv/bin/argos server watch-agents --time-without-agent 10:
EOF
```

View file

@ -25,12 +25,15 @@ dependencies = [
"apprise>=1.9.0,<2",
"bcrypt>=4.1.3,<5",
"click>=8.1,<9",
"durations-nlp>=1.0.1,<2",
"fastapi>=0.103,<0.104",
"fastapi-login>=1.10.0,<2",
"httpx>=0.27.2,<1",
"fastapi-utils>=0.8.0,<0.9",
"httpx>=0.27.2,<0.28.0",
"Jinja2>=3.0,<4",
"jsonpointer>=3.0,<4",
"passlib>=1.7.4,<2",
"psutil>=5.9.8,<6",
"psycopg2-binary>=2.9,<3",
"pydantic[email]>=2.4,<3",
"pydantic-settings>=2.0,<3",
@ -40,6 +43,7 @@ dependencies = [
"sqlalchemy[asyncio]>=2.0,<3",
"sqlalchemy-utils>=0.41,<1",
"tenacity>=8.2,<9",
"typing_inspect>=0.9.0,<1",
"uvicorn>=0.23,<1",
]
@ -47,12 +51,12 @@ dependencies = [
dev = [
"black==23.3.0",
"djlint>=1.34.0",
"hatch==1.9.4",
"hatch==1.13.0",
"ipdb>=0.13,<0.14",
"ipython>=8.16,<9",
"isort==5.11.5",
"mypy>=1.10.0,<2",
"pylint>=3.0.2",
"pylint>=3.2.5",
"pytest-asyncio>=0.21,<1",
"pytest>=6.2.5",
"respx>=0.20,<1",
@ -71,6 +75,9 @@ docs = [
gunicorn = [
"gunicorn>=21.2,<22",
]
ldap = [
"python-ldap>=3.4.4,<4",
]
[project.urls]
homepage = "https://argos-monitoring.framasoft.org/"

View file

@ -1,9 +1,21 @@
---
general:
# Except for frequency and recheck_delay settings, changes in general
# section of the configuration will need a restart of argos server.
db:
# The database URL, as defined in SQLAlchemy docs : https://docs.sqlalchemy.org/en/20/core/engines.html#database-urls
# The database URL, as defined in SQLAlchemy docs:
# https://docs.sqlalchemy.org/en/20/core/engines.html#database-urls
url: "sqlite:////tmp/test-argos.db"
# Can be "production", "dev", "test".
# If not present, default value is "production"
env: test
# To get a good string for cookie_secret, run:
# openssl rand -hex 32
cookie_secret: "foo-bar-baz"
# Default delay for checks.
# Can be superseded in domain configuration.
# For ex., to run checks every minute:
frequency: "1m"
alerts:
ok:
@ -14,12 +26,37 @@ general:
- local
unknown:
- local
no_agent:
- local
service:
secrets:
# Secrets can be generated using `argos server generate-token`.
# You need at least one. Write them as a list, like:
# - secret_token
- "O4kt8Max9/k0EmHaEJ0CGGYbBNFmK8kOZNIoUk3Kjwc"
- "x1T1VZR51pxrv5pQUyzooMG4pMUvHNMhA5y/3cUsYVs="
ssl:
thresholds:
- "1d": critical
"5d": warning
- "5d": warning
# Argos will execute some tasks in the background for you
# every 2 minutes and needs some configuration for that
recurring_tasks:
# Maximum age of results
# Use m for minutes, h for hours, d for days
# w for weeks, M for months, y for years
# See https://github.com/timwedde/durations_nlp#scales-reference for details
max_results_age: "1d"
# Max number of seconds a task can be locked
# Minimum value is 61, default is 100
max_lock_seconds: 100
# Max number of seconds without seeing an agent
# before sending an alert
# Minimum value is 61, default is 300
time_without_agent: 300
# It's also possible to define the checks in another file
# with the include syntax:
#
websites: !include websites.yaml
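
The constraints stated in the `recurring_tasks` comments (a duration string for `max_results_age`, a minimum of 61 for the two second-based settings) can be sketched as a small validation routine. This is a simplified stand-in, not Argos's code nor the `durations_nlp` API: the names `age_to_seconds` and `check_recurring_tasks` are hypothetical, and only the `s`, `m`, `h`, `d` and `w` suffixes are handled here.

```python
import re

# Seconds per unit for the subset of suffixes this sketch supports
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def age_to_seconds(value: str) -> int:
    """Convert a string such as "1d" or "6s" to seconds (simplified parser)."""
    match = re.fullmatch(r"(\d+)([smhdw])", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def check_recurring_tasks(cfg: dict) -> dict:
    """Validate the settings and add the maximum age expressed in seconds."""
    for key in ("max_lock_seconds", "time_without_agent"):
        if cfg[key] < 61:
            raise ValueError(f"{key} must be at least 61")
    return {**cfg, "max_results_age_seconds": age_to_seconds(cfg["max_results_age"])}


checked = check_recurring_tasks(
    {"max_results_age": "1d", "max_lock_seconds": 100, "time_without_agent": 300}
)
```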

View file

@ -21,7 +21,7 @@ def test_tasks_retrieval_and_results(authorized_client, app):
assert response.status_code == 200
tasks = response.json()
assert len(tasks) == 2
assert len(tasks) == 4
results = []
for task in tasks:
@ -33,7 +33,7 @@ def test_tasks_retrieval_and_results(authorized_client, app):
response = client.post("/api/results", json=data)
assert response.status_code == 201
assert app.state.db.query(models.Result).count() == 2
assert app.state.db.query(models.Result).count() == 4
# The list of tasks should be empty now
response = client.get("/api/tasks")
@ -60,6 +60,8 @@ def ssl_task(db):
task = models.Task(
url="https://exemple.com/",
domain="https://exemple.com/",
ip_version="6",
method="GET",
check="ssl-certificate-expiration",
expected="on-check",
frequency=1,

View file

@ -35,7 +35,13 @@ def ssl_task(now):
id=1,
url="https://example.org",
domain="https://example.org",
ip_version="6",
method="GET",
request_data=None,
task_group="GET-6-https://example.org",
check="ssl-certificate-expiration",
retry_before_notification=0,
contiguous_failures=0,
expected="on-check",
selected_at=now,
selected_by="pytest",
@ -51,6 +57,9 @@ async def test_ssl_check_accepts_statuts(
return_value=httpx.Response(http_status, extensions=httpx_extensions_ssl),
)
async with httpx.AsyncClient() as client:
check = SSLCertificateExpiration(client, ssl_task)
check_response = await check.run()
check = SSLCertificateExpiration(ssl_task)
response = await client.request(
method=ssl_task.method, url=ssl_task.url, timeout=60
)
check_response = await check.run(response)
assert check_response.status == "on-check"

View file

@ -10,9 +10,9 @@ from argos.server.models import Result, Task, User
@pytest.mark.asyncio
async def test_remove_old_results(db, ten_tasks): # pylint: disable-msg=redefined-outer-name
for _task in ten_tasks:
for _ in range(5):
for iterator in range(5):
result = Result(
submitted_at=datetime.now(),
submitted_at=datetime.now() - timedelta(seconds=iterator * 2),
status="success",
context={"foo": "bar"},
task=_task,
@ -24,12 +24,12 @@ async def test_remove_old_results(db, ten_tasks): # pylint: disable-msg=redefi
# So we have 5 results per task
assert db.query(Result).count() == 50
# Keep only 2
deleted = await queries.remove_old_results(db, 2)
assert deleted == 30
assert db.query(Result).count() == 20
# Keep only those newer than 6 seconds ago
deleted = await queries.remove_old_results(db, 6)
assert deleted == 20
assert db.query(Result).count() == 30
for _task in ten_tasks:
assert db.query(Result).filter(Result.task == _task).count() == 2
assert db.query(Result).filter(Result.task == _task).count() == 3
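
The arithmetic behind the updated assertions can be checked in isolation. The sketch below is not the `queries.remove_old_results` implementation. It assumes results are kept when strictly newer than a cutoff of `max_age_seconds` before "now", and the name `keep_recent` is hypothetical. With five results submitted 0, 2, 4, 6 and 8 seconds ago and a maximum age of 6 seconds, three results survive per task, hence 20 deletions and 30 remaining rows across ten tasks.

```python
from datetime import datetime, timedelta


def keep_recent(submitted_at: list[datetime], now: datetime, max_age_seconds: int) -> list[datetime]:
    """Keep only the timestamps strictly newer than now - max_age_seconds."""
    cutoff = now - timedelta(seconds=max_age_seconds)
    return [stamp for stamp in submitted_at if stamp > cutoff]


now = datetime(2025, 2, 18, 17, 0, 0)
# Same spacing as the test: one result every 2 seconds, going back in time
stamps = [now - timedelta(seconds=i * 2) for i in range(5)]
kept_per_task = len(keep_recent(stamps, now, 6))
deleted_total = 10 * (len(stamps) - kept_per_task)
```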
@pytest.mark.asyncio
@ -70,7 +70,7 @@ async def test_update_from_config_with_duplicate_tasks(db, empty_config): # py
await queries.update_from_config(db, empty_config)
# Only one path has been saved in the database
assert db.query(Task).count() == 1
assert db.query(Task).count() == 2
# Calling again with the same data works, and will not result in more tasks being
# created.
@ -87,6 +87,7 @@ async def test_update_from_config_db_can_remove_duplicates_and_old_tasks(
same_task = Task(
url=task.url,
domain=task.domain,
ip_version="6",
check=task.check,
expected=task.expected,
frequency=task.frequency,
@ -108,7 +109,7 @@ async def test_update_from_config_db_can_remove_duplicates_and_old_tasks(
empty_config.websites = [website]
await queries.update_from_config(db, empty_config)
assert db.query(Task).count() == 2
assert db.query(Task).count() == 4
website = schemas.config.Website(
domain=task.domain,
@ -122,7 +123,7 @@ async def test_update_from_config_db_can_remove_duplicates_and_old_tasks(
empty_config.websites = [website]
await queries.update_from_config(db, empty_config)
assert db.query(Task).count() == 1
assert db.query(Task).count() == 2
@pytest.mark.asyncio
@ -136,7 +137,7 @@ async def test_update_from_config_db_updates_existing_tasks(db, empty_config, ta
empty_config.websites = [website]
await queries.update_from_config(db, empty_config)
assert db.query(Task).count() == 1
assert db.query(Task).count() == 2
@pytest.mark.asyncio
@ -212,6 +213,7 @@ def task(db):
_task = Task(
url="https://www.example.com",
domain="https://www.example.com",
ip_version="6",
check="body-contains",
expected="foo",
frequency=1,
@ -233,6 +235,7 @@ def empty_config():
warning=["", ""],
critical=["", ""],
unknown=["", ""],
no_agent=["", ""],
),
),
service=schemas.config.Service(
@ -241,6 +244,11 @@ def empty_config():
]
),
ssl=schemas.config.SSL(thresholds=[]),
recurring_tasks=schemas.config.RecurringTasks(
max_results_age="6s",
max_lock_seconds=120,
time_without_agent=300,
),
websites=[],
)
@ -271,6 +279,7 @@ def ten_locked_tasks(db):
_task = Task(
url="https://www.example.com",
domain="example.com",
ip_version="6",
check="body-contains",
expected="foo",
frequency=1,
@ -291,6 +300,7 @@ def ten_tasks(db):
_task = Task(
url="https://www.example.com",
domain="example.com",
ip_version="6",
check="body-contains",
expected="foo",
frequency=1,
@ -311,6 +321,7 @@ def ten_warning_tasks(db):
_task = Task(
url="https://www.example.com",
domain="example.com",
ip_version="6",
check="body-contains",
expected="foo",
frequency=1,
@ -331,6 +342,7 @@ def ten_critical_tasks(db):
_task = Task(
url="https://www.example.com",
domain="example.com",
ip_version="6",
check="body-contains",
expected="foo",
frequency=1,
@ -351,6 +363,7 @@ def ten_ok_tasks(db):
_task = Task(
url="https://www.example.com",
domain="example.com",
ip_version="6",
check="body-contains",
expected="foo",
frequency=1,

View file

@ -1,51 +0,0 @@
import pytest
from argos.schemas.utils import string_to_duration
def test_string_to_duration_days():
assert string_to_duration("1d", target="days") == 1
assert string_to_duration("1w", target="days") == 7
assert string_to_duration("3w", target="days") == 21
assert string_to_duration("3mo", target="days") == 90
assert string_to_duration("1y", target="days") == 365
with pytest.raises(ValueError):
string_to_duration("3h", target="days")
with pytest.raises(ValueError):
string_to_duration("1", target="days")
def test_string_to_duration_hours():
assert string_to_duration("1h", target="hours") == 1
assert string_to_duration("1d", target="hours") == 24
assert string_to_duration("1w", target="hours") == 7 * 24
assert string_to_duration("3w", target="hours") == 21 * 24
assert string_to_duration("3mo", target="hours") == 3 * 30 * 24
with pytest.raises(ValueError):
string_to_duration("1", target="hours")
def test_string_to_duration_minutes():
assert string_to_duration("1m", target="minutes") == 1
assert string_to_duration("1h", target="minutes") == 60
assert string_to_duration("1d", target="minutes") == 60 * 24
assert string_to_duration("3mo", target="minutes") == 60 * 24 * 30 * 3
with pytest.raises(ValueError):
string_to_duration("1", target="minutes")
def test_conversion_to_greater_units_throws():
# hours and minutes cannot be converted to days
with pytest.raises(ValueError):
string_to_duration("1h", target="days")
with pytest.raises(ValueError):
string_to_duration("1m", target="days")
# minutes cannot be converted to hours
with pytest.raises(ValueError):
string_to_duration("1m", target="hours")

View file

@ -1,6 +1,7 @@
---
- domain: "https://mypads.framapad.org"
paths:
- path: "/mypads/"
checks:
- status-is: 200
- body-contains: '<div id= "mypads"></div>'
- body-contains: '<div id= "mypads"></div>'