Mirror of https://github.com/freedomofpress/dangerzone.git
Synced 2025-05-17 18:51:50 +02:00

Compare commits: 20 commits, e9fae8e7d6 ... 2b45c5cfa0
Commits (SHA1, newest first):

2b45c5cfa0, cbb7ed902f, eacf1eb2fa, 505db39ca0, 0f0fa49923,
8911b72529, c407e2ff84, 725ce3b9c7, afc5e8e636, 5bb37ef48f,
c70d1970dd, ec616be2c0, acbc433717, 685cf431a3, 2b71e615a8,
7f418118e6, 02602b072a, acf20ef700, 3499010d8e, 2423fc18c5
19 changed files with 176 additions and 139 deletions
.github/workflows/check_repos.yml (vendored, 20 changes)

```diff
@@ -46,16 +46,30 @@ jobs:
           apt update
           apt-get install python-all -y

-      - name: Add GPG key for the packages.freedom.press
+      - name: Add packages.freedom.press PGP key (gpg)
+        if: matrix.version != 'trixie'
         run: |
           apt-get update && apt-get install -y gnupg2 ca-certificates
           dirmngr # NOTE: This is a command that's necessary only in containers
           # The key needs to be in the GPG keybox database format so the
           # signing subkey is detected by apt-secure.
           gpg --keyserver hkps://keys.openpgp.org \
               --no-default-keyring --keyring ./fpf-apt-tools-archive-keyring.gpg \
               --recv-keys "DE28 AB24 1FA4 8260 FAC9 B8BA A7C9 B385 2260 4281"
           mkdir -p /etc/apt/keyrings/
-          mv fpf-apt-tools-archive-keyring.gpg /etc/apt/keyrings
+          mv ./fpf-apt-tools-archive-keyring.gpg /etc/apt/keyrings/.
+
+      - name: Add packages.freedom.press PGP key (sq)
+        if: matrix.version == 'trixie'
+        run: |
+          apt-get update && apt-get install -y ca-certificates sq
+          mkdir -p /etc/apt/keyrings/
+          # On debian trixie, apt-secure uses `sqv` to verify the signatures
+          # so we need to retrieve PGP keys and store them using the base64 format.
+          sq network keyserver \
+              --server hkps://keys.openpgp.org \
+              search "DE28 AB24 1FA4 8260 FAC9 B8BA A7C9 B385 2260 4281" \
+              --output /etc/apt/keyrings/fpf-apt-tools-archive-keyring.gpg
+
       - name: Add packages.freedom.press to our APT sources
         run: |
           . /etc/os-release
@@ -75,8 +89,6 @@ jobs:
   strategy:
     matrix:
       include:
-        - distro: fedora
-          version: 39
        - distro: fedora
          version: 40
        - distro: fedora
```
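The workflow now fetches the same signing key two ways, keyed on the Debian release. A minimal Python sketch of that branch logic, built only from the commands shown above (the `fetch_repo_key` helper name and the direct-to-keyring write are ours, not the workflow's):

```python
import subprocess

FINGERPRINT = "DE28 AB24 1FA4 8260 FAC9 B8BA A7C9 B385 2260 4281"
KEYRING = "/etc/apt/keyrings/fpf-apt-tools-archive-keyring.gpg"


def fetch_repo_key(codename: str) -> None:
    """Retrieve the packages.freedom.press key, mirroring the workflow split."""
    if codename == "trixie":
        # apt-secure on trixie verifies signatures with `sqv`, so the key is
        # retrieved and stored with sq.
        subprocess.run(
            ["sq", "network", "keyserver",
             "--server", "hkps://keys.openpgp.org",
             "search", FINGERPRINT,
             "--output", KEYRING],
            check=True,
        )
    else:
        # Older suites need the GPG keybox database format so the signing
        # subkey is detected by apt-secure.
        subprocess.run(
            ["gpg", "--keyserver", "hkps://keys.openpgp.org",
             "--no-default-keyring", "--keyring", KEYRING,
             "--recv-keys", FINGERPRINT],
            check=True,
        )
```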
.github/workflows/ci.yml (vendored, 7 changes)

```diff
@@ -473,8 +473,6 @@ jobs:
           bash -c 'cd dangerzone; poetry run make test'

   check-reproducibility:
-    needs:
-      - build-container-image
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v4
@@ -489,8 +487,9 @@ jobs:

       - name: Verify that the Dockerfile matches the commited template and params
         run: |-
-          poetry run jinja2 Dockerfile.in Dockerfile.env > out
-          diff Dockerfile out
+          cp Dockerfile Dockerfile.orig
+          make Dockerfile
+          diff Dockerfile.orig Dockerfile

       - name: Build Dangerzone container image
         run: |
```
.github/workflows/scan.yml (vendored, 2 changes)

```diff
@@ -26,7 +26,7 @@ jobs:
         run: |
           date=$(date "+%Y%m%d")
           sed -i "s/DEBIAN_ARCHIVE_DATE=[0-9]\+/DEBIAN_ARCHIVE_DATE=${date}/" Dockerfile.env
-          poetry run jinja2 Dockerfile.in Dockerfile.env > Dockerfile
+          make Dockerfile
       - name: Build container image
         run: python3 ./install/common/build-image.py --runtime docker --no-save
       - name: Get image tag
```
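The `sed` call above pins `DEBIAN_ARCHIVE_DATE` to today's date before the Dockerfile is regenerated. The same bump as a small Python sketch (function name ours):

```python
import datetime
import re
from pathlib import Path


def bump_debian_archive_date(env_file: str = "Dockerfile.env") -> None:
    """Equivalent of the workflow's sed call: pin the snapshot date to today."""
    today = datetime.date.today().strftime("%Y%m%d")
    path = Path(env_file)
    text = re.sub(
        r"DEBIAN_ARCHIVE_DATE=[0-9]+",
        f"DEBIAN_ARCHIVE_DATE={today}",
        path.read_text(),
    )
    path.write_text(text)
```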
.grype.yaml (38 changes)

```diff
@@ -2,10 +2,38 @@
 # latest release of Dangerzone, and offer our analysis.

 ignore:
-  # CVE-2024-11053
+  # CVE-2023-45853
   # ==============
   #
-  # NVD Entry: https://nvd.nist.gov/vuln/detail/CVE-2024-11053
-  # Verdict: Dangerzone is not affected because libcurl is an HTTP client, and
-  # the Dangerzone container does not make any network calls.
-  - vulnerability: CVE-2024-11053
+  # Debian tracker: https://security-tracker.debian.org/tracker/CVE-2023-45853
+  # Verdict: Dangerzone is not affected because the zlib library in Debian is
+  # built in a way that is not vulnerable.
+  - vulnerability: CVE-2023-45853
+  # CVE-2024-38428
+  # ==============
+  #
+  # Debian tracker: https://security-tracker.debian.org/tracker/CVE-2024-38428
+  # Verdict: Dangerzone is not affected because it doesn't use wget in the
+  # container image (which also has no network connectivity).
+  - vulnerability: CVE-2024-38428
+  # CVE-2024-57823
+  # ==============
+  #
+  # Debian tracker: https://security-tracker.debian.org/tracker/CVE-2024-57823
+  # Verdict: Dangerzone is not affected. First things first, LibreOffice is
+  # using this library for parsing RDF metadata in a document [1], and has
+  # issued a fix for the vendored raptor2 package they have for other distros
+  # [2].
+  #
+  # On the other hand, the Debian security team has stated that this is a minor
+  # issue [3], and there's no fix from the developers yet. It seems that the
+  # Debian package is not affected somehow by this CVE, probably due to the way
+  # it's packaged.
+  #
+  # [1] https://wiki.documentfoundation.org/Documentation/DevGuide/Office_Development#RDF_metadata
+  # [2] https://cgit.freedesktop.org/libreoffice/core/commit/?id=2b50dc0e4482ac0ad27d69147b4175e05af4fba4
+  # [3] From https://security-tracker.debian.org/tracker/CVE-2024-57823:
+  #
+  #     [bookworm] - raptor2 <postponed> (Minor issue, revisit when fixed upstream)
+  #
+  - vulnerability: CVE-2024-57823
```
Dockerfile

```diff
@@ -6,8 +6,8 @@ ARG DEBIAN_IMAGE_DATE=20250113

 FROM debian:bookworm-${DEBIAN_IMAGE_DATE}-slim

-ARG GVISOR_ARCHIVE_DATE=20250106
-ARG DEBIAN_ARCHIVE_DATE=20250114
+ARG GVISOR_ARCHIVE_DATE=20250113
+ARG DEBIAN_ARCHIVE_DATE=20250120
 ARG H2ORESTART_CHECKSUM=7760dc2963332c50d15eee285933ec4b48d6a1de9e0c0f6082946f93090bd132
 ARG H2ORESTART_VERSION=v0.7.0

@@ -62,7 +62,7 @@ RUN mkdir -p /opt/dangerzone/dangerzone
 RUN touch /opt/dangerzone/dangerzone/__init__.py

 # Copy only the Python code, and not any produced .pyc files.
-COPY conversion/*.py /opt/dangerzone/dangerzone/conversion
+COPY conversion/*.py /opt/dangerzone/dangerzone/conversion/

 # Let the entrypoint script write the OCI config for the inner container under
 # /config.json.
@@ -76,6 +76,6 @@ USER dangerzone
 # store the state of its containers.
 RUN mkdir /home/dangerzone/.containers

-COPY container/entrypoint.py /
+COPY container_helpers/entrypoint.py /

 ENTRYPOINT ["/entrypoint.py"]
```
Dockerfile.env

```diff
@@ -1,9 +1,9 @@
 # Can be bumped to the latest date in https://hub.docker.com/_/debian/tags?name=bookworm-
 DEBIAN_IMAGE_DATE=20250113
 # Can be bumped to today's date
-DEBIAN_ARCHIVE_DATE=20250114
+DEBIAN_ARCHIVE_DATE=20250120
 # Can be bumped to the latest date in https://github.com/google/gvisor/tags
-GVISOR_ARCHIVE_DATE=20250106
+GVISOR_ARCHIVE_DATE=20250113
 # Can be bumped to the latest version and checksum from https://github.com/ebandal/H2Orestart/releases
 H2ORESTART_CHECKSUM=7760dc2963332c50d15eee285933ec4b48d6a1de9e0c0f6082946f93090bd132
 H2ORESTART_VERSION=v0.7.0
```
Dockerfile.in

```diff
@@ -62,7 +62,7 @@ RUN mkdir -p /opt/dangerzone/dangerzone
 RUN touch /opt/dangerzone/dangerzone/__init__.py

 # Copy only the Python code, and not any produced .pyc files.
-COPY conversion/*.py /opt/dangerzone/dangerzone/conversion
+COPY conversion/*.py /opt/dangerzone/dangerzone/conversion/

 # Let the entrypoint script write the OCI config for the inner container under
 # /config.json.
@@ -76,6 +76,6 @@ USER dangerzone
 # store the state of its containers.
 RUN mkdir /home/dangerzone/.containers

-COPY container/entrypoint.py /
+COPY container_helpers/entrypoint.py /

 ENTRYPOINT ["/entrypoint.py"]
```
INSTALL.md (21 changes)

````diff
@@ -84,9 +84,20 @@ Dangerzone is available for:
   </tr>
 </table>

-Add our repository following these instructions:
+First, retrieve the PGP keys.

-Download the GPG key for the repo:
+Starting with Trixie, follow these instructions to download the PGP keys:
+
+```bash
+sudo apt-get update && sudo apt-get install sq -y
+mkdir -p /etc/apt/keyrings/
+sq network keyserver \
+    --server hkps://keys.openpgp.org \
+    search "DE28 AB24 1FA4 8260 FAC9 B8BA A7C9 B385 2260 4281" \
+    --output /etc/apt/keyrings/fpf-apt-tools-archive-keyring.gpg
+```
+
+On other Debian-derivatives:

 ```sh
 sudo apt-get update && sudo apt-get install gnupg2 ca-certificates -y
@@ -94,10 +105,12 @@ gpg --keyserver hkps://keys.openpgp.org \
     --no-default-keyring --keyring ./fpf-apt-tools-archive-keyring.gpg \
     --recv-keys "DE28 AB24 1FA4 8260 FAC9 B8BA A7C9 B385 2260 4281"
 sudo mkdir -p /etc/apt/keyrings/
-sudo mv fpf-apt-tools-archive-keyring.gpg /etc/apt/keyrings
+sudo gpg --no-default-keyring --keyring ./fpf-apt-tools-archive-keyring.gpg \
+    --armor --export "DE28 AB24 1FA4 8260 FAC9 B8BA A7C9 B385 2260 4281" \
+    > /etc/apt/keyrings/fpf-apt-tools-archive-keyring.gpg
 ```

-Add the URL of the repo in your APT sources:
+Then, on all distributions, add the URL of the repo in your APT sources:

 ```sh
 . /etc/os-release
````
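Whichever path you take, the end state is the same keyring file. An optional sanity check, not part of INSTALL.md, that the downloaded keyring really carries the expected fingerprint (a sketch, assuming `gpg` is installed):

```python
import subprocess

EXPECTED_FPR = "DE28AB241FA48260FAC9B8BAA7C9B38522604281"


def keyring_has_expected_key(
    path: str = "/etc/apt/keyrings/fpf-apt-tools-archive-keyring.gpg",
) -> bool:
    # `gpg --show-keys` inspects a key file without importing it; the
    # machine-readable output lists fingerprints on "fpr" lines.
    out = subprocess.check_output(
        ["gpg", "--show-keys", "--with-colons", path], text=True
    )
    return any(
        line.startswith("fpr") and EXPECTED_FPR in line
        for line in out.splitlines()
    )
```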
Makefile (3 changes)

```diff
@@ -47,6 +47,9 @@ test-large: test-large-init ## Run large test set
 	python -m pytest --tb=no tests/test_large_set.py::TestLargeSet -v $(JUNIT_FLAGS) --junitxml=$(TEST_LARGE_RESULTS)
 	python $(TEST_LARGE_RESULTS)/report.py $(TEST_LARGE_RESULTS)

+Dockerfile: Dockerfile.env Dockerfile.in
+	poetry run jinja2 Dockerfile.in Dockerfile.env > Dockerfile
+
 .PHONY: build-clean
 build-clean:
 	doit clean
```
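The new `Dockerfile` target makes the rendering step declarative: `Dockerfile` is rebuilt whenever `Dockerfile.env` or `Dockerfile.in` changes. Roughly what the jinja2-cli invocation does, sketched in Python (assuming `Dockerfile.env` holds plain KEY=VALUE pairs; the helper name is ours):

```python
from pathlib import Path

from jinja2 import Template


def render_dockerfile(
    template: str = "Dockerfile.in",
    env_file: str = "Dockerfile.env",
    out: str = "Dockerfile",
) -> None:
    # Parse the KEY=VALUE pairs from Dockerfile.env, skipping comments.
    pairs = {}
    for line in Path(env_file).read_text().splitlines():
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            pairs[key] = value
    # Render the Jinja2 template with those values, like jinja2-cli does.
    Path(out).write_text(Template(Path(template).read_text()).render(**pairs))
```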
dangerzone/cli.py

```diff
@@ -42,6 +42,12 @@ def print_header(s: str) -> None:
     type=click.UNPROCESSED,
     callback=args.validate_input_filenames,
 )
+@click.option(
+    "--debug",
+    "debug",
+    flag_value=True,
+    help="Run Dangerzone in debug mode, to get logs from gVisor.",
+)
 @click.version_option(version=get_version(), message="%(version)s")
 @errors.handle_document_errors
 def cli_main(
@@ -50,6 +56,7 @@ def cli_main(
     filenames: List[str],
     archive: bool,
     dummy_conversion: bool,
+    debug: bool,
 ) -> None:
     setup_logging()

@@ -58,7 +65,7 @@ def cli_main(
     elif is_qubes_native_conversion():
         dangerzone = DangerzoneCore(Qubes())
     else:
-        dangerzone = DangerzoneCore(Container())
+        dangerzone = DangerzoneCore(Container(debug=debug))

     display_banner()
     if len(filenames) == 1 and output_filename:
```
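For reference, `flag_value=True` turns `--debug` into a boolean switch. A self-contained sketch of the same Click pattern (a toy command, not Dangerzone's):

```python
import click


@click.command()
@click.option(
    "--debug",
    "debug",
    flag_value=True,
    default=False,
    help="Run in debug mode.",
)
def main(debug: bool) -> None:
    # Passing --debug binds True to the `debug` parameter; omitting it
    # leaves the default (False).
    click.echo(f"debug={debug}")


if __name__ == "__main__":
    main()
```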
dangerzone/conversion/doc_to_pixels.py

```diff
@@ -129,7 +129,9 @@ class DocumentToPixels(DangerzoneConverter):
         # At least .odt, .docx, .odg, .odp, .ods, and .pptx
         "application/zip": {
             "type": "libreoffice",
-            # NOTE: Older `file` command cannot detect hwpx files properly.
+            # NOTE: `file` command < 5.45 cannot detect hwpx files properly, so we
+            # enable the extension in any case. See also:
+            # https://github.com/freedomofpress/dangerzone/pull/460#issuecomment-1654166465
             "libreoffice_ext": "h2orestart.oxt",
         },
         # At least .doc, .docx, .odg, .odp, .odt, .pdf, .ppt, .pptx, .xls, and .xlsx
```
dangerzone/isolation_provider/base.py

```diff
@@ -5,7 +5,9 @@ import platform
 import signal
 import subprocess
 import sys
+import threading
 from abc import ABC, abstractmethod
+from io import BytesIO
 from typing import IO, Callable, Iterator, Optional

 import fitz
@@ -18,10 +20,6 @@ from ..util import get_tessdata_dir, replace_control_chars

 log = logging.getLogger(__name__)

-MAX_CONVERSION_LOG_CHARS = 150 * 50  # up to ~150 lines of 50 characters
-DOC_TO_PIXELS_LOG_START = "----- DOC TO PIXELS LOG START -----"
-DOC_TO_PIXELS_LOG_END = "----- DOC TO PIXELS LOG END -----"
-
 TIMEOUT_EXCEPTION = 15
 TIMEOUT_GRACE = 15
 TIMEOUT_FORCE = 5
@@ -75,9 +73,9 @@ def read_int(f: IO[bytes]) -> int:
     return int.from_bytes(untrusted_int, "big", signed=False)


-def read_debug_text(f: IO[bytes], size: int) -> str:
-    """Read arbitrarily long text (for debug purposes), and sanitize it."""
-    untrusted_text = f.read(size).decode("ascii", errors="replace")
+def sanitize_debug_text(text: bytes) -> str:
+    """Read all the buffer and return a sanitized version"""
+    untrusted_text = text.decode("ascii", errors="replace")
     return replace_control_chars(untrusted_text, keep_newlines=True)


@@ -86,12 +84,16 @@ class IsolationProvider(ABC):
     Abstracts an isolation provider
     """

-    def __init__(self) -> None:
-        if getattr(sys, "dangerzone_dev", False) is True:
+    def __init__(self, debug: bool = False) -> None:
+        self.debug = debug
+        if self.should_capture_stderr():
             self.proc_stderr = subprocess.PIPE
         else:
             self.proc_stderr = subprocess.DEVNULL

+    def should_capture_stderr(self) -> bool:
+        return self.debug or getattr(sys, "dangerzone_dev", False)
+
     @abstractmethod
     def install(self) -> bool:
         pass
@@ -327,7 +329,11 @@ class IsolationProvider(ABC):
         timeout_force: int = TIMEOUT_FORCE,
     ) -> Iterator[subprocess.Popen]:
         """Start a conversion process, pass it to the caller, and then clean it up."""
+        # Store the proc stderr in memory
+        stderr = BytesIO()
         p = self.start_doc_to_pixels_proc(document)
+        stderr_thread = self.start_stderr_thread(p, stderr)
+
         if platform.system() != "Windows":
             assert os.getpgid(p.pid) != os.getpgid(
                 os.getpid()
@@ -343,15 +349,40 @@ class IsolationProvider(ABC):
                 document, p, timeout_grace=timeout_grace, timeout_force=timeout_force
             )

-            # Read the stderr of the process only if:
-            # * Dev mode is enabled.
-            # * The process has exited (else we risk hanging).
-            if getattr(sys, "dangerzone_dev", False) and p.poll() is not None:
-                assert p.stderr
-                debug_log = read_debug_text(p.stderr, MAX_CONVERSION_LOG_CHARS)
+            if stderr_thread:
+                # Wait for the thread to complete. If it's still alive, mention it in the debug log.
+                stderr_thread.join(timeout=1)
+
+                debug_bytes = stderr.getvalue()
+                debug_log = sanitize_debug_text(debug_bytes)
+
+                incomplete = "(incomplete) " if stderr_thread.is_alive() else ""
+
                 log.info(
                     "Conversion output (doc to pixels)\n"
-                    f"{DOC_TO_PIXELS_LOG_START}\n"
+                    f"----- DOC TO PIXELS LOG START {incomplete}-----\n"
                     f"{debug_log}"  # no need for an extra newline here
-                    f"{DOC_TO_PIXELS_LOG_END}"
+                    "----- DOC TO PIXELS LOG END -----"
                 )

+    def start_stderr_thread(
+        self, process: subprocess.Popen, stderr: IO[bytes]
+    ) -> Optional[threading.Thread]:
+        """Start a thread to read stderr from the process"""
+
+        def _stream_stderr(process_stderr: IO[bytes]) -> None:
+            try:
+                for line in process_stderr:
+                    stderr.write(line)
+            except (ValueError, IOError) as e:
+                log.debug(f"Stderr stream closed: {e}")
+
+        if process.stderr:
+            stderr_thread = threading.Thread(
+                target=_stream_stderr,
+                args=(process.stderr,),
+                daemon=True,
+            )
+            stderr_thread.start()
+            return stderr_thread
+        return None
```
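The old code read stderr only after the process had exited, and only up to a fixed character budget, to avoid hanging on a pipe that never closes. The new code drains stderr continuously on a daemon thread into an in-memory buffer, so a chatty child can no longer fill the pipe and stall, and `join(timeout=1)` bounds how long we wait for the log. The pattern in isolation, as a standalone sketch (names ours):

```python
import subprocess
import threading
from io import BytesIO
from typing import List


def drain_stderr(cmd: List[str]) -> bytes:
    """Capture a child's stderr on a daemon thread so the parent never
    blocks on a full pipe."""
    proc = subprocess.Popen(cmd, stderr=subprocess.PIPE)
    buf = BytesIO()

    def _pump(stream) -> None:
        for line in stream:  # returns when the child closes stderr
            buf.write(line)

    t = threading.Thread(target=_pump, args=(proc.stderr,), daemon=True)
    t.start()
    proc.wait()
    t.join(timeout=1)  # bounded wait, as in the diff; the log may be incomplete
    return buf.getvalue()


# Example: print(drain_stderr(["ls", "--nonexistent-flag"]))
```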
dangerzone/isolation_provider/container.py

```diff
@@ -168,6 +168,10 @@ class Container(IsolationProvider):
     ) -> subprocess.Popen:
         container_runtime = container_utils.get_runtime()
         security_args = self.get_runtime_security_args()
+        debug_args = []
+        if self.debug:
+            debug_args += ["-e", "RUNSC_DEBUG=1"]
+
         enable_stdin = ["-i"]
         set_name = ["--name", name]
         prevent_leakage_args = ["--rm"]
@@ -177,14 +181,14 @@ class Container(IsolationProvider):
         args = (
             ["run"]
             + security_args
+            + debug_args
             + prevent_leakage_args
             + enable_stdin
             + set_name
             + image_name
             + command
         )
-        args = [container_runtime] + args
-        return self.exec(args)
+        return self.exec([container_runtime] + args)

     def kill_container(self, name: str) -> None:
         """Terminate a spawned container.
```
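A compact sketch of the resulting argument assembly (function and container name are illustrative, not the repo's):

```python
from typing import List


def build_run_args(
    runtime: str,
    image: List[str],
    command: List[str],
    security_args: List[str],
    debug: bool,
) -> List[str]:
    # The optional gVisor debug env var slots in right after the security
    # args, before the leak-prevention and stdin flags.
    debug_args = ["-e", "RUNSC_DEBUG=1"] if debug else []
    return (
        [runtime, "run"]
        + security_args
        + debug_args
        + ["--rm", "-i", "--name", "dangerzone-conversion"]  # illustrative name
        + image
        + command
    )
```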
dangerzone/logic.py

```diff
@@ -71,6 +71,7 @@ class DangerzoneCore(object):
                     ocr_lang,
                     stdout_callback,
                 )

             except Exception:
                 log.exception(
                     f"Unexpected error occurred while converting '{document}'"
```
debian/rules (vendored, 2 changes)

```diff
@@ -9,5 +9,5 @@ export DH_VERBOSE=1
 	dh $@ --with python3 --buildsystem=pybuild

 override_dh_builddeb:
-	./install/linux/vendor-pymupdf.py --dest debian/dangerzone/usr/lib/python3/dist-packages/dangerzone/vendor/
+	./install/linux/debian-vendor-pymupdf.py --dest debian/dangerzone/usr/lib/python3/dist-packages/dangerzone/vendor/
 	dh_builddeb $@
```
dev_scripts/reproduce-image.py

```diff
@@ -42,16 +42,21 @@ def git_verify(commit, source):
 def diffoci_hash_matches(diffoci):
     """Check if the hash of the downloaded diffoci bin matches the expected one."""
     m = hashlib.sha256()
-    m.update(DIFFOCI_PATH.open().read())
+    m.update(diffoci)
     diffoci_checksum = m.hexdigest()
     return diffoci_checksum == DIFFOCI_CHECKSUM


-def diffoci_exists():
-    """Check if the diffoci helper exists, and if the hash matches."""
+def diffoci_is_installed():
+    """Determine if diffoci has been installed.
+
+    Determine if diffoci has been installed, by checking if the binary exists, and if
+    its hash is the expected one. If the binary exists but the hash is different, then
+    this is a sign that we need to update the local diffoci binary.
+    """
     if not DIFFOCI_PATH.exists():
         return False
-    return diffoci_hash_matches(DIFFOCI_PATH.open().read())
+    return diffoci_hash_matches(DIFFOCI_PATH.open("rb").read())


 def diffoci_download():
@@ -79,8 +84,7 @@ def diffoci_diff(source, local_target):
         "diff",
         source,
         target,
-        "--ignore-timestamps",
-        "--ignore-image-name",
+        "--semantic",
         "--verbose",
     )
 except subprocess.CalledProcessError as e:
@@ -134,7 +138,7 @@ def main():
     commit = git_commit_get()
     git_verify(commit, args.source)

-    if diffoci_exists():
+    if not diffoci_is_installed():
         logger.info(f"Downloading diffoci helper from {DIFFOCI_URL}")
         diffoci_download()
```
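Two fixes hide in this hunk: the binary is now read in binary mode (`open("rb")`, since `hashlib` needs bytes, not text), and the inverted condition (downloading when the helper already existed) becomes `if not diffoci_is_installed()`. The fixed helper, condensed into a standalone sketch (checksum placeholder ours):

```python
import hashlib
from pathlib import Path

DIFFOCI_PATH = Path("diffoci")
DIFFOCI_CHECKSUM = "<expected sha256 hex digest>"  # pinned in the real script


def diffoci_is_installed() -> bool:
    # The binary must exist and hash to the pinned checksum; a mismatch means
    # the pinned version changed and the helper must be re-downloaded.
    if not DIFFOCI_PATH.exists():
        return False
    digest = hashlib.sha256(DIFFOCI_PATH.read_bytes()).hexdigest()
    return digest == DIFFOCI_CHECKSUM
```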
docs/developer/reproducibility.md

````diff
@@ -11,105 +11,38 @@ Our build artifacts consist of:
 * Fedora packages (for regular Fedora distros and Qubes)
 * Debian packages (for Debian and Ubuntu)

-As of writing this, none of the above artifacts are reproducible. For this
-reason, we purposefully build them in machines owned by FPF, since we can't
-trust third-party servers. A security hole in GitHub, or
-in our CI pipeline (check out the
-[Ultralytics cryptominer saga](https://github.com/ultralytics/ultralytics/issues/18027)),
-may allow attackers to plant a malicious artifact with no detection.
-
-Still, building our artifacts in private is not ideal. Third parties cannot
-easily audit if our artifacts have been built correctly or if they have been
-tampered with. For instance, our Apple Silicon container image builds PyMuPDF
-from source, and while the PyPI source package is hashed, the produced output
-does not have a known hash. So, it's not easy to verify it's been built
-correctly (read also the seminal
-["Reflections on Trusting Trust"](https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_ReflectionsonTrustingTrust.pdf)
-lecture by Ken Thompson on that subject).
+As of writing this, only the following artifacts are reproducible:
+* Container images (see [#1047](https://github.com/freedomofpress/dangerzone/issues/1047))

-In order to make our builds auditable and allow building artifacts in
-third-party servers safely, we want to make each artifact build reproducible. In
-the following sections, we'll lay down the plan to do so for each artifact type.
+In the following sections, we'll mention some specifics about enforcing
+reproducibility for each artifact type.

 ## Container image

-### Current limitations
-
-Our container image is currently not reproducible for the following main
-reasons:
-
-* We build PyMuPDF from source, since it's not available in Alpine Linux. The
-  result of this build is not reproducible. Note that PyMuPDF wheels are
-  available from PyPI, but there are no ARM wheels for the musl libc platforms.
-* Alpine Linux does not have a way to pin packages and their dependencies, and
-  does not retain old packages. There's a
-  [workaround](https://github.com/reproducible-containers/repro-pkg-cache)
-  to download the required packages and store them elsewhere, but then the
-  cached package downloads cannot be easily audited.
-
-## Proposed implementation
-
-We can take advantage of the
-[Debian snapshot archives](https://snapshot.debian.org/)
-and pin our packages by specifying a date. There's already
-[prior art](https://github.com/reproducible-containers/repro-sources-list.sh/)
-for that, thanks to the incredible work of @AkihiroSuda on
-[reproducible containers](https://github.com/reproducible-containers).
-As for PyMuPDF, it is available from the Debian repos, so we won't have to build
-it from source.
-
-Here are a few other obstacles that we need to overcome:
-* We currently download the
-  [latest gVisor version](https://gvisor.dev/docs/user_guide/install/#latest-release)
-  from a GCS bucket. Now that we have switched to Debian, we can take advantage
-  of their
-  [timestamped APT repos](https://gvisor.dev/docs/user_guide/install/#specific-release)
-  and download specific releases from those. An extra benefit is that such
-  releases are signed with their APT key.
-* We can no longer update the packages in the container image by rebuilding it.
-  We have to bump the dates in the Dockerfile first, which is a minor hassle,
-  but much more declarative.
-* The `repro-source-list-.sh` script uses the release date of the container
-  image. However, the Debian image is not updated daily (see
-  [newest tags](https://hub.docker.com/_/debian/tags)
-  in DockerHub). So, if we want to ship an emergency release, we have to
-  circumvent this limitation. A simple way is to trick the script by bumping the
-  date of the `/etc/apt/sources.list.d/debian.sources` and
-  `/etc/apt/sources.list` files.
-* While we talk about image reproducibility, we can't actually achieve the exact
-  same SHA-256 hash for two different image builds. That's because the file
-  timestamps in the image layers will differ, depending on when the build took
-  place. The rest of the image though (file contents, permissions, manifest)
-  should be byte-for-byte the same. A simple way to check this is with the
-  [`diffoci`](https://github.com/reproducible-containers/diffoci) tool, and
-  specifically this invocation:
-
-  ```
-  ./diffoci diff podman://<new_image_tag> podman://<old_image_tag> \
-      --ignore-timestamps --ignore-image-name --verbose
-  ```
-
 ### Updating the image

 The fact that our image is reproducible also means that it's frozen in time.
-This means that rebuilding the image without updating our Dockerfile will **not**
-receive security updates.
+This means that rebuilding the image without updating our Dockerfile will
+**not** receive security updates.

-We list the necessary variables that make up our image in the `Dockerfile.env`
-file. These are:
+Here are the necessary variables that make up our image in the `Dockerfile.env`
+file:
 * `DEBIAN_IMAGE_DATE`: The date that the Debian container image was released
 * `DEBIAN_ARCHIVE_DATE`: The Debian snapshot repo that we want to use
 * `GVISOR_ARCHIVE_DATE`: The gVisor APT repo that we want to use
 * `H2ORESTART_CHECKSUM`: The SHA-256 checksum of the H2ORestart plugin
 * `H2ORESTART_VERSION`: The version of the H2ORestart plugin

-If you update these values in `Dockerfile.env`, you can create a new Dockerfile
-with:
+If you update these values in `Dockerfile.env`, you must also create a new
+Dockerfile with:

 ```
-poetry run jinja2 Dockerfile.in Dockerfile.env > Dockerfile
+make Dockerfile
 ```

+Updating `Dockerfile` without bumping `Dockerfile.in` is detected and should
+trigger a CI error.
+
 ### Reproducing the image

 For a simple way to reproduce a Dangerzone container image, either local or
````
install/common/build-image.py

```diff
@@ -27,7 +27,7 @@ def str2bool(v):
     raise argparse.ArgumentTypeError("Boolean value expected.")


-def determine_tag():
+def determine_git_tag():
     # Designate a unique tag for this image, depending on the Git commit it was created
     # from:
     # 1. If created from a Git tag (e.g., 0.8.0), the image tag will be `0.8.0`.
@@ -90,7 +90,7 @@ def main():

     print(f"Building for architecture '{ARCH}'")

-    tag = args.tag or determine_tag()
+    tag = args.tag or determine_git_tag()
     image_name_tagged = IMAGE_NAME + ":" + tag

     print(f"Will tag the container image as '{image_name_tagged}'")
```
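A hypothetical sketch of what a `determine_git_tag` helper can look like; the actual logic in `install/common/build-image.py` follows the numbered rules in the comment above and may differ:

```python
import subprocess


def determine_git_tag() -> str:
    # On a tagged commit `git describe` prints the tag itself (e.g. 0.8.0);
    # past a tag it prints something like 0.8.0-14-g1234abc, which still
    # uniquely identifies the commit the image was built from.
    return subprocess.check_output(["git", "describe"], text=True).strip()
```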