Lorry Logo

Introduction

Lorry is a high-performance software asset mirroring system that duplicates Git repositories and other build artifacts into "downstream" software repositories. Lorry is capable of mirroring tens of thousands of software repositories and raw files.

Why Use Lorry

If you only need to mirror a few Git repositories periodically, Lorry might not be the right choice, since you can use the tools provided by your Git host. However, if you need to mirror a large volume of code so that your software can still be built when upstream source code is lost or modified, Lorry may be a good choice.

Upstream Code Can Disappear

Upstream software repositories can disappear for various reasons, which can disrupt the supply chain. Lorry maintains mirrors of software so that if an upstream becomes unavailable, due to an outage or otherwise, the code is still available to you.

Mirroring Through Firewalls

Some software development environments forbid direct access to the internet. Lorry can be placed in a DMZ between the isolated environment and the public internet.

Auditable Configuration

When Lorry uses a confgit configuration source, its configuration is stored in a Git repository. This allows auditing and change control practices to be applied to the adding and removing of software dependencies.

Separation of Concerns

It is often desirable to separate the roles and responsibilities of operations teams from those of software developers. Using Lorry in combination with the security policies of GitLab allows developers with appropriate repository access to manage the creation of software mirrors independently of operations teams.

Avoiding Re-written History

Although it is a discouraged practice, some software repositories "force push" changes to their main branches, disrupting the verifiable integrity of their codebase. In the worst cases a nefarious actor can "force push" malicious code into a Git repository, and software that depends on it will pull it down by default. By default, Lorry refuses to mirror Git references that contain re-written history, which provides a safeguard against bad changes.

Regulatory Compliance

Some software licenses require that consumed source code is made available on public servers.

Architecture Diagram

A simple diagram illustrating the various components of a Lorry installation.

Architecture Diagram

Installation

System Requirements

Lorry is designed to run on a Linux distribution. The host should have at least two cores, 512 MB of memory, and enough disk space to hold the contents of each configured mirror and raw-file remote.

Runtime Dependencies

Note that only the Git binary needs to be installed on your system; the other components are embedded in Lorry's crate dependencies.

Dependency   Description
git          Used for non-libgit2 cloning operations
libgit2      Used for various repository management
sqlite3      Stateful repository metadata

From Helm

A Helm chart for running Lorry on Kubernetes is available here

From your Package Manager

Arch Linux

A distribution package of Lorry is available on the AUR.

Other Operating Systems

If your Linux distribution doesn't have a system package, please consider creating one; merge requests are most welcome in this regard. Please open an issue to track your distribution's progress here.

From Source

To install Lorry from source, clone its Git repository and compile it with the standard Rust toolchain.

Build Dependencies

Ensure you have sqlx available on your PATH. It can typically be installed either through your system's package manager or by running cargo install sqlx-cli.

Dependency   Description
sqlx         Used for compile-time verification of SQL queries

git clone https://gitlab.com/CodethinkLabs/lorry/lorry2.git
cd lorry2
# Initialize Lorry's database in the repository
scripts/init_db_if_missing.sh
# Compile the Lorry binary
cargo build --release
# Test the compiled binary 
target/release/lorry --help

From this point you can modify the lorry.example.toml file to your liking, pointing it at your GitLab instance.

Server Configuration

Server configuration is read from a TOML file on the file system. The path to that file is passed via lorry --config <PATH>.

Private Token

The private token used for basic authentication in Git pushes can be specified either in the configuration file as gitlab-private-token = gplat-ABC or as an environment variable by setting LORRY_GITLAB_PRIVATE_TOKEN=gplat-ABC. If both the configuration file and the environment variable are set, the environment variable takes precedence.
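The precedence rule can be sketched in a few lines of Python. This is only an illustration of the lookup order described above, not Lorry's actual implementation; the function name resolve_private_token is hypothetical, and treating an empty environment variable as unset is an assumption.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "LORRY_GITLAB_PRIVATE_TOKEN"
CONFIG_KEY = "gitlab-private-token"


def resolve_private_token(
    config: Mapping[str, str],
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the token from the environment if set, else from the config.

    Hypothetical helper: an empty environment variable is treated as unset,
    which is an assumption of this sketch.
    """
    # The environment variable wins when both sources are present.
    return environ.get(ENV_VAR) or config.get(CONFIG_KEY)


if __name__ == "__main__":
    file_config = {CONFIG_KEY: "token-from-file"}
    print(resolve_private_token(file_config, {}))
    print(resolve_private_token(file_config, {ENV_VAR: "token-from-env"}))
```

Keeping the token out of the configuration file and supplying it through the environment is convenient when the file itself is kept under version control.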

Notes on Threading

Lorry is a multi-threaded server process that maps blocking git clone operations onto each thread the server has been configured with. By default the server runs as many concurrent clone operations as it has cores available, which is a reasonable default. Note that the Git binary, when used for fetch and push operations, has its own concept of concurrency. If 8 threads are available on a Lorry host, it is recommended to configure the Git binary to use only a single thread.

For example:

n-threads = 8
[clone]
engine = "GitBinary"
n-threads = 1
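The reasoning behind this recommendation is simple arithmetic, sketched below. It assumes the worst case in which every Lorry thread runs a Git process that uses its full thread allowance at the same time; both function names are hypothetical.

```python
def peak_git_threads(n_threads: int, clone_n_threads: int) -> int:
    """Worst case: every Lorry thread runs a Git process at full width."""
    return n_threads * clone_n_threads


def is_oversubscribed(n_threads: int, clone_n_threads: int, cores: int) -> bool:
    """True when the worst-case thread count exceeds the available cores."""
    return peak_git_threads(n_threads, clone_n_threads) > cores


if __name__ == "__main__":
    # 8 Lorry threads, each Git process limited to 1 thread, on an 8-core host.
    print(peak_git_threads(8, 1), is_oversubscribed(8, 1, cores=8))
    # The same host with Git also allowed 8 threads per process.
    print(peak_git_threads(8, 8), is_oversubscribed(8, 8, cores=8))
```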

Annotated Example

An annotated file is provided below.

# Path to a SQLite database that is used to schedule mirroring operations and
# store historical information.
statedb = "./db/lorries.sqlite"
# An optional username that is used as part of basic authentication on the
# downstream Git mirror.
username = "oauth2"
# Hostname of the downstream mirror and optional port number. NOTE that
# scheme e.g. https:// should not be included here.
hostname = "127.0.0.1:9999"
# Path to where Git repositories and raw-file assets will be stored on disk.
# NOTE that this directory is safe to delete however doing so will require that
# the mirrors be re-created causing extra CPU and network utilization.
working-area = "./workd"
# The maximum number of redirections Lorry should follow when resolving 
# raw-file mirrors.
maximum-redirects = 1
# Optional URL from which a Lorry configuration can be cloned. The clone is
# updated periodically.
# confgit-url = "https://my-git-repository.example.org/lorry-config"
# An optional branch to use as part of the remote Git configuration.
# If specified, Lorry will read its configuration from this directory and
# reload it periodically.
configuration-directory = "./examples/controller"
# Logging level 
log-level = "INFO"
# The port that Lorry should listen to for incoming network connections on.
port = 3000
# If calls to the Gitlab API can be done over insecure HTTP. Not recommended
# for production settings.
gitlab-insecure-http = true
# An optional path to a program which will return authentication credentials
# for Git push operations.
askpass-program = "./contrib/lorry-askpass"

# If sha256sums are required for all raw file mirrors.
sha256sums_required = true

# Username which should be configured for automated commits of Lorry raw-file
# mirrors. This name will show up in the Git log.
username = "lorry@example.org"

# An optional path to the gitconfig configuration file to use during commands
# where Lorry shells out to the git binary such as fetch operations when 
# engine = GitBinary. Lorry will dynamically modify this file changing settings
# that are required for its normal operation but will not clobber it.
# git-config-path = "/etc/lorry/.gitconfig"

# The number of threads that Lorry will spawn to mirror individual 
# repositories. It will default to the number of cores available on the 
# currently running system.
# n-threads = 8

# Git clone related configuration
[clone]
# The engine to use for cloning (and fetching) operations. This setting can
# select either the Git binary or libgit2. Note that using the Git binary for
# cloning large repositories has considerably better performance than using
# the libgit2 bindings.
engine = "GitBinary"
# The number of threads to use for cloning remote repositories. Note that this
# setting only affects cloning when the Git binary is in use. libgit2 does not
# support multi-threaded operations. Because Lorry is itself multi-threaded,
# "1" is typically a good setting here: several concurrent multi-threaded
# clone operations can degrade performance considerably.
n-threads = 1

Mirror Configuration

The mirror configuration is where the link between upstream repositories and downstream mirrors is defined. There are some basic requirements for this:

  • Must be a git repository
  • Must have a JSON file called lorry-controller.conf in the root of the repository.

The repository is specified in the confgit-url and confgit-branch settings for the controller, and is cloned or updated when the Read Configuration endpoint is accessed.

Lorry Controller Configuration

The main lorry-controller.conf file consists of a list of objects, containing the following required keys and values:

  • type - String that should be set to lorries.
  • interval - String in ISO8601 duration format. For example PT3H corresponds to a 3-hour duration. Specifies the interval for mirroring the various lorry configs in this group.
  • timeout - String in ISO8601 duration format, see interval. If mirroring one of the lorry configs takes longer than the timeout, it will be cancelled.
  • prefix - String specifying the downstream group prefix. This is prefixed to the individual lorry names in this group on the downstream repository.
  • globs - Array of Strings specifying the file globs containing the individual lorry configurations. For example, folder/*.lorry matches all .lorry files in the folder directory (relative to the lorry-controller.conf file).

An example configuration file would look like this:

[
  {
    "type": "lorries",
    "interval": "PT1M",
    "timeout": "PT1M",
    "prefix": "lorry-mirrors/github",
    "globs": [
      "github.lorry"
    ]
  }
]

This would mirror any repositories specified in a github.lorry file every minute.
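The required keys and duration format can be checked before a configuration is committed. The following is a minimal sketch, not part of Lorry: validate_controller_config and parse_duration are hypothetical helpers, and the duration parser only handles the day and time designators (such as PT3H or PT1M), not years, months or weeks.

```python
import json
import re
from datetime import timedelta

REQUIRED_KEYS = {"type", "interval", "timeout", "prefix", "globs"}

# Handles only day/time designators (e.g. PT3H, PT1M, P1DT12H); years,
# months and weeks are deliberately not supported in this sketch.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text: str) -> timedelta:
    """Convert a simple ISO 8601 duration string into a timedelta."""
    match = _DURATION.match(text)
    if match is None or text == "P" or text.endswith("T"):
        raise ValueError(f"unsupported ISO 8601 duration: {text!r}")
    parts = {k: int(v) for k, v in match.groupdict().items() if v}
    return timedelta(**parts)


def validate_controller_config(text: str) -> list:
    """Check required keys, the type value and both durations."""
    groups = json.loads(text)
    if not isinstance(groups, list):
        raise ValueError("top level must be a list of objects")
    for index, group in enumerate(groups):
        missing = REQUIRED_KEYS - group.keys()
        if missing:
            raise ValueError(f"entry {index} is missing keys: {sorted(missing)}")
        if group["type"] != "lorries":
            raise ValueError(f"entry {index} has unexpected type {group['type']!r}")
        parse_duration(group["interval"])
        parse_duration(group["timeout"])
    return groups


if __name__ == "__main__":
    example = """[{"type": "lorries", "interval": "PT1M", "timeout": "PT1M",
                   "prefix": "lorry-mirrors/github", "globs": ["github.lorry"]}]"""
    for group in validate_controller_config(example):
        print(group["prefix"], parse_duration(group["interval"]))
```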

The individual lorry mirror configurations are YAML files. These are in the form:

mirror-name:
  type: mirror type
  # further mirror config

The mirror type currently can be either git or raw-file. This determines the extra mirror configuration required.

When the mirror type is git, the extra configuration options are:

  • url Required - String of the git URL for the repository.
  • check-certificates Default: true - Boolean that controls whether the SSL/TLS certificate for this repository is checked. This option can only disable certificate checking for the current mirror: if the worker level check-certificates option is set to false, setting it to true here will not turn checking back on.
  • ref-patterns Optional - List of glob patterns that define which git references to mirror.
  • ignore-patterns Optional - List of glob patterns to exclude from mirrors. NOTE that these take precedence over ref-patterns.
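The interaction between ref-patterns and ignore-patterns can be sketched as follows. This is an illustration only: it uses Python's fnmatch module, whose glob semantics may differ from those Lorry applies, and it assumes that omitting ref-patterns means every reference is a candidate. The function name select_refs is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


def select_refs(
    refs: Iterable[str],
    ref_patterns: Optional[List[str]] = None,
    ignore_patterns: Optional[List[str]] = None,
) -> List[str]:
    """Keep refs matched by ref_patterns unless an ignore pattern matches."""
    selected = []
    for ref in refs:
        # Assumption: with no ref-patterns, every ref is a candidate.
        included = ref_patterns is None or any(
            fnmatchcase(ref, pattern) for pattern in ref_patterns
        )
        # ignore-patterns take precedence over ref-patterns.
        ignored = any(fnmatchcase(ref, p) for p in (ignore_patterns or []))
        if included and not ignored:
            selected.append(ref)
    return selected


if __name__ == "__main__":
    refs = ["refs/heads/main", "refs/heads/wip/foo", "refs/tags/v1.0"]
    print(select_refs(refs, ["refs/heads/*"], ["refs/heads/wip/*"]))
```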

When the mirror type is raw-file, the extra configuration options are:

  • urls Required - List of URL mappings, with the following keys:
    • url Required - String of the file URL to download.
    • destination Required - String of the directory to store the downloaded file in.
    • sha256sum Optional - The expected sha256sum of the raw file.
  • check-certificates Default: true - Boolean that controls whether the SSL/TLS certificate for the files is checked. This option can only disable certificate checking for the current mirror: if the worker level check-certificates option is set to false, setting it to true here will not turn checking back on.

With the above, an example lorry mirror configuration could look like the following:

octocat/hello-world:
  type: git
  url: https://github.com/octocat/Hello-World.git

raw-files:
  type: raw-file
  urls:
    - destination: target-directory
      url: https://my-file-host.tld/directory/more-directory/file.tar
      sha256sum: 3a1a7d59eb62f8710a46d86faea9ab9600f948660aed33acd4846658def0ef83
    - destination: another-target-directory
      url: https://my-file-host.tld/directory/another_file.tar

If this was used with the controller configuration above, then the two repositories created in your GitLab group would be at:

  • lorry-mirrors/github/octocat/hello-world
  • lorry-mirrors/github/raw-files
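The mapping from controller prefix and mirror name to downstream path can be written as a one-line join. The sketch below only reproduces the two paths listed above; the function name downstream_path is hypothetical and the handling of stray slashes is an assumption.

```python
def downstream_path(prefix: str, mirror_name: str) -> str:
    """Join the group prefix and the mirror name with a single slash."""
    return f"{prefix.rstrip('/')}/{mirror_name.lstrip('/')}"


if __name__ == "__main__":
    for name in ("octocat/hello-world", "raw-files"):
        print(downstream_path("lorry-mirrors/github", name))
```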

HTTP Interface

Lorry has a web interface as well as a REST API for configuration and management.

Once launched, Lorry is available at http://localhost:3000 by default.

Web Interface


The main page lists the mirror status of each job. A debug page shows which active jobs are running on which thread, and the config page shows each currently configured lorry.

Endpoints

The following endpoints are available for users to interact with the controller.

Health Check

GET /1.0/health-check

List Jobs

GET /1.0/list-jobs

Lists all the jobs that have been given to workers.

Example output:

[
  {
    "id": 1,
    "path": "lorry-mirrors/github/octocat/hello-world",
    "exit_status": {
      "Finished": {
        "exit_code": 0,
        "disk_usage": 311296
      }
    },
    "host": "worker-0"
  }
]

List Lorries

GET /1.0/list-lorries

Lists all the lorries configured on the controller.

Example Output:

[
  {
    "path": "lorry-mirrors/github/octocat/hello-world",
    "name": "lorry-mirrors/github/octocat/hello-world",
    "spec": {
      "type": "git",
      "url": "https://github.com/octocat/Hello-World.git",
      "check-certificates": true,
      "refspecs": null
    },
    "running_job": null,
    "last_run": 1696523383,
    "interval": "PT60S",
    "lorry_timeout": "PT60S",
    "last_run_results": {
      "Finished": {
        "exit_code": 0,
        "disk_usage": 311296
      }
    },
    "last_run_output": "...",
    "purge_before": 1696512967
  }
]
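A response in this shape is easy to post-process. The sketch below picks out lorries whose last run is not a Finished result with exit code 0. It works on a JSON string rather than calling the endpoint, the function name not_successful is hypothetical, and treating every other shape of last_run_results (including a missing result) as "not successful" is an assumption, since only the Finished variant is shown above.

```python
import json
from typing import List


def not_successful(payload: str) -> List[str]:
    """Return the paths of lorries whose last run did not finish with 0."""
    paths = []
    for lorry in json.loads(payload):
        results = lorry.get("last_run_results")
        # Only the "Finished" variant is documented; anything else is
        # treated as not successful in this sketch.
        finished = results.get("Finished") if isinstance(results, dict) else None
        if finished is None or finished.get("exit_code") != 0:
            paths.append(lorry["path"])
    return paths


if __name__ == "__main__":
    sample = json.dumps([
        {"path": "lorry-mirrors/github/octocat/hello-world",
         "last_run_results": {"Finished": {"exit_code": 0, "disk_usage": 311296}}},
        {"path": "lorry-mirrors/github/raw-files",
         "last_run_results": {"Finished": {"exit_code": 1, "disk_usage": 0}}},
    ])
    print(not_successful(sample))
```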

Metrics

GET /1.0/metrics

Returns information about jobs and lorries, exported as metrics for Prometheus.

Example Output:

# HELP lorry2_total_lorries The total amount of lorries.
# TYPE lorry2_total_lorries gauge
lorry2_total_lorries 2
# HELP lorry2_total_lorries_degraded The total amount of lorries partially failed.
# TYPE lorry2_total_lorries_degraded gauge
lorry2_total_lorries_degraded 0
# HELP lorry2_total_lorries_errors The total amount of lorries in a failed state.
# TYPE lorry2_total_lorries_errors gauge
lorry2_total_lorries_errors 0
# HELP lorry2_total_lorries_successful The total amount of successful mirrors.
# TYPE lorry2_total_lorries_successful gauge
lorry2_total_lorries_successful 2
# HELP lorry2_total_lorries_degraded_namespaced The total amount of lorries partially failed.
# TYPE lorry2_total_lorries_degraded_namespaced gauge
# HELP lorry2_total_lorries_errors_namespaced The total amount of lorries in a failed state.
# TYPE lorry2_total_lorries_errors_namespaced gauge
# HELP lorry2_total_lorries_successful_namespaced The total amount of successful mirrors.
# TYPE lorry2_total_lorries_successful_namespaced gauge
lorry2_total_lorries_successful_namespaced{namespace="lorry-mirrors/lorry"} 1
lorry2_total_lorries_successful_namespaced{namespace="raw-assets/lorry-assets"} 1
# EOF
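For quick checks without a Prometheus server, output in this form can be read with a few lines of Python. This is a deliberately small sketch: it assumes one sample per line in the form "name value" with no trailing timestamp, as in the example above, and it keeps any label set as part of the key. The function name parse_metrics is hypothetical.

```python
from typing import Dict


def parse_metrics(text: str) -> Dict[str, float]:
    """Map each sample (name plus any labels) to its numeric value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments such as HELP, TYPE and EOF.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


if __name__ == "__main__":
    text = """# TYPE lorry2_total_lorries gauge
lorry2_total_lorries 2
lorry2_total_lorries_errors 0
lorry2_total_lorries_successful_namespaced{namespace="lorry-mirrors/lorry"} 1
# EOF"""
    for name, value in parse_metrics(text).items():
        print(name, value)
```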

Alerting & Metric Namespacing

Lorry configures mirrors with a concatenated path name such as sources/nvidia or sources/github/google. Lorry can expose the status of mirrors, such as the number of errors or warnings, based on these names. With this approach it is possible to use Prometheus's Alertmanager to send notifications to the groups of users who are responsible for maintaining a particular set of mirrors from an organization.

It's possible to configure the "depth", i.e. the number of subdirectories that are exposed by Lorry. The default number of mirror namespaces that will be exposed is 4.
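One plausible reading of the depth setting is sketched below: the namespace is the mirror's path without its final component, truncated to at most depth components. This interpretation is an assumption made for illustration, as is the function name metric_namespace; consult the Lorry source for the exact rule.

```python
def metric_namespace(mirror_path: str, depth: int = 4) -> str:
    """Return the leading directories of a mirror path, up to `depth`."""
    parts = [part for part in mirror_path.split("/") if part]
    # Assumption: the last component is the mirror itself, not a namespace.
    directories = parts[:-1]
    return "/".join(directories[:depth])


if __name__ == "__main__":
    print(metric_namespace("sources/github/google/project"))
    print(metric_namespace("sources/github/google/project", depth=1))
```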

Development

Developing Lorry

The Lorry repository contains everything you need to start hacking on the codebase right away. First ensure that you've followed the instructions for installing Lorry locally.

Although not strictly required, installing cargo-watch is suggested since it makes the development workflow easier. Install it either through your package manager or via cargo install cargo-watch. Additionally, ensure that you have podman installed, which is the only supported container platform for running Lorry.

This example uses Gitlab which is the primary software forge supported by Lorry.

# In a separate terminal pane you can launch Gitlab. This will take a few minutes.
scripts/run_gitlab.sh
# Request an authentication token from Gitlab. Note that you need to wait about
# five minutes before running this since Gitlab will not immediately be ready.
source scripts/request_gitlab_token.sh
# Launch the watch script which will restart Lorry on code changes.
scripts/watch.sh

Writing Documentation

Documentation is managed with mdbook.

Ensure you have that installed and then run scripts/docs.sh to launch the server. Documentation content is located under docs/content/.

API Documentation

Lorry is not yet available on crates.io; however, its bleeding edge API documentation can be found here.

Troubleshooting

List-lorries Endpoint Analysis Tool

scripts/lorries_analysis.py is a script for gathering the status data of the mirrored repositories, to assist with debugging systematic failures.

See ./scripts/lorries_analysis.py --help for usage.

Migration from Lorry 1

The biggest differences between Lorry 1 and Lorry 2 are the configuration files and the confgit repo. Thankfully, as the controller configuration for Lorry 2 includes a branch, this can be used to make migration easier.

Confgit Migration

The first step is to migrate the confgit configuration. If your individual lorries are already in YAML format, they do not need to be converted. The biggest change is to the root lorry-controller.conf, where several options have changed. You may also need to change or drop certain mirror files themselves.

  1. Convert any JSON based lorry configs to YAML.
  2. Convert the lorry-controller.conf file:
     • type stays the same (the value lorries)
     • interval convert to ISO8601 Duration format
     • lorry-timeout convert to ISO8601 Duration format and rename the key to timeout. This is now a required key.
     • prefix stays the same
     • globs stays the same
     • any other keys should be discarded (but will be ignored if left)
  3. Convert mirror files to be Lorry 2 compatible:
     • The git type is supported with the same usage.
     • The raw-file type is supported with the same usage.
     • The tarball type is no longer supported, so repos with type: tarball need to be dropped.
     • Mercurial upstream is no longer supported, so you may need to drop mirrors with type: hg or find a Git host for them.
     • Bazaar upstream is no longer supported, so you may need to drop mirrors with type: bzr or find a Git host for them.
     • Subversion upstream is no longer supported, so you may need to drop mirrors with type: svn or find a Git host for them.
     • CVS upstream is no longer supported, so you may need to drop mirrors with type: cvs or find a Git host for them.
     • ZIP upstream is no longer supported, so you may need to drop mirrors with type: zip.
     • GZIP upstream is no longer supported, so you may need to drop mirrors with type: gzip.

For type: tarball, type: zip and type: gzip, it might be possible to use type: raw-file instead. Bear in mind, however, that type: tarball, type: zip and type: gzip used to expand the compressed file first and then commit the extracted content, whereas raw-file simply pushes the compressed file.

To automate step 3 we internally use a script convert_to_lorry2.py. This may be useful for your own conversion. Its usage is:

python convert_to_lorry2.py mirror_config_repo

mirror_config_repo should be the root directory of the mirroring-config repo. This script will automatically drop all repos with the unsupported types as listed above.
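The dropping step can be illustrated on an already-parsed mirror file. The sketch below is not convert_to_lorry2.py itself; it only shows the filtering rule on a plain dictionary, using the list of unsupported types given above. The function name drop_unsupported is hypothetical.

```python
from typing import Dict

# Mirror types that Lorry 2 no longer supports, as listed in step 3.
UNSUPPORTED_TYPES = {"tarball", "hg", "bzr", "svn", "cvs", "zip", "gzip"}


def drop_unsupported(mirrors: Dict[str, dict]) -> Dict[str, dict]:
    """Return only the mirrors whose type Lorry 2 still supports."""
    return {
        name: spec
        for name, spec in mirrors.items()
        if spec.get("type") not in UNSUPPORTED_TYPES
    }


if __name__ == "__main__":
    mirrors = {
        "octocat/hello-world": {
            "type": "git",
            "url": "https://github.com/octocat/Hello-World.git",
        },
        "legacy/project": {"type": "svn", "url": "https://svn.example.org/project"},
    }
    print(sorted(drop_unsupported(mirrors)))
```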

This can all be done in a separate branch to your main configuration, allowing for quick rollback in case of an issue.

Lorry is licensed under the Apache Licence 2.0.

See the full license text in the Lorry repository.

Copyright 2024 Codethink Ltd.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.