Introduction
Lorry is a high-performance software asset mirroring system that duplicates Git repositories and other build artifacts into "downstream" software repositories. Lorry is capable of mirroring tens of thousands of software repositories and raw files.
Why Use Lorry
If you only need to mirror a few Git repositories periodically, Lorry might not be the right choice, as you can use tools provided by your Git host. However, if you require a large volume of mirrored code so that your software can still be built when upstream source code is lost or modified, Lorry may be a good choice.
Upstream Code Can Disappear
Upstream software repositories can disappear for various reasons, which can disrupt the supply chain. Lorry maintains mirrors of software so that if an upstream becomes unavailable due to an outage or other means, the code is still available to you.
Mirroring Through Firewalls
Some software development environments forbid direct access to the internet. Lorry can be configured so that it is placed in a DMZ between the isolated environment and the generally available internet.
Auditable Configuration
Using Lorry with a confgit configuration source (a Lorry configuration stored within a Git repository) allows auditing and the implementation of change control practices around adding and removing software dependencies.
Separation of Concerns
It is often desirable to separate the roles and responsibilities of operations teams from those of software developers. Using Lorry in combination with the security policies of Gitlab allows developers with appropriate repository access to manage the creation of software mirrors independently of operations teams.
Avoiding Re-written History
Although it is a discouraged practice, some software repositories "force push" changes to their main branches, disrupting the verifiable integrity of their codebase. In worse cases a nefarious actor can even "force push" malicious code into a Git repository, and software that depends on it will pull it down by default. By default Lorry refuses to mirror Git references that contain re-written history, which provides a safeguard against bad changes.
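To make the idea of re-written history concrete: an update to a reference is a "fast-forward" only when the old tip is an ancestor of the new tip, and a force push breaks that property. The sketch below checks this on a toy commit graph. The graph, the commit names, and the function names are illustrative assumptions; this is not Lorry's implementation.

```python
from collections import deque


def is_ancestor(parents, ancestor, descendant):
    """Return True if `ancestor` is reachable from `descendant` via parent links."""
    seen = set()
    queue = deque([descendant])
    while queue:
        commit = queue.popleft()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        queue.extend(parents.get(commit, []))
    return False


def is_fast_forward(parents, old_tip, new_tip):
    """A ref update is a fast-forward when the old tip is an ancestor of the new tip."""
    return is_ancestor(parents, old_tip, new_tip)


# Toy history (hypothetical): a <- b <- c is the original branch,
# while x was created from a and force pushed over c.
PARENTS = {"a": [], "b": ["a"], "c": ["b"], "x": ["a"]}

if __name__ == "__main__":
    print(is_fast_forward(PARENTS, "b", "c"))  # normal update
    print(is_fast_forward(PARENTS, "c", "x"))  # re-written history
```

On a real repository the same question can be asked with git merge-base --is-ancestor.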
Regulatory Compliance
Some software licenses require that consumed source code is made available on public servers.
Architecture Diagram
A simple diagram illustrating the various components of a Lorry installation.
Installation
System Requirements
Lorry is designed to run on a Linux distribution. The host should have at least two cores, 512 MB of memory, and adequate disk space to hold the contents of each configured mirror and raw file remote.
Runtime Dependencies
Note that only the Git binary needs to be installed on your system; the other components are embedded in Lorry's crate dependencies.
Dependency | Description |
---|---|
git | Used for non libgit2 cloning operations |
libgit2 | Used for various repository management |
sqlite3 | Stateful repository metadata |
From Helm
A Helm chart for running Lorry on Kubernetes is available here
From your Package Manager
Arch Linux
A distribution package of Lorry is available on the AUR.
Other Operating Systems
If your Linux distribution doesn't have a system package, please consider creating one; merge requests are most welcome in this regard. Please open an issue to track your distribution's progress here.
From Source
To install Lorry from source you simply need to clone its Git repository and compile it with the standard Rust tool chain.
Build Dependencies
Ensure you have sqlx available on your PATH. Typically it can be installed either by your system's package manager or by running cargo install sqlx-cli.
Dependency | Description |
---|---|
sqlx | Used for compile-time verification of SQL queries |
git clone https://gitlab.com/CodethinkLabs/lorry/lorry2.git
cd lorry2
# Initialize Lorry's database in the repository
scripts/init_db_if_missing.sh
# Compile the Lorry binary
cargo build --release
# Test the compiled binary
target/release/lorry --help
From this point you can modify the lorry.example.toml file to your liking, pointing it at your Gitlab instance.
Server Configuration
Server configuration is specified by a TOML file on the file system. Its path can be given via lorry --config <PATH>.
Private Token
The private token used for basic authentication in Git pushes can be specified either in the configuration file as gitlab-private-token = gplat-ABC or as an environment variable by setting LORRY_GITLAB_PRIVATE_TOKEN=gplat-ABC. If both the configuration file and the environment variable are set, the environment variable takes precedence.
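The precedence rule can be illustrated with a short sketch. This is not Lorry's code (Lorry is written in Rust); the function name is hypothetical, and treating an empty environment variable as unset is an assumption made here for illustration.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "LORRY_GITLAB_PRIVATE_TOKEN"
CONFIG_KEY = "gitlab-private-token"


def resolve_private_token(
    config: Mapping[str, str],
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the token, preferring the environment over the configuration file."""
    token = environ.get(ENV_VAR)
    if token:  # assumption: an empty variable counts as unset
        return token
    return config.get(CONFIG_KEY)


if __name__ == "__main__":
    config = {CONFIG_KEY: "gplat-FROM-FILE"}
    print(resolve_private_token(config, {ENV_VAR: "gplat-FROM-ENV"}))
    print(resolve_private_token(config, {}))
```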
Notes on Threading
Lorry is a multi-threaded server process that maps blocking git clone operations onto each thread the server has been configured with. By default the server will spawn as many clone operations as it has cores available, which is a reasonable default. Note that when the Git binary is used for fetch and push operations, it has its own concept of concurrency. If 8 threads are available on a Lorry host, it is recommended to configure the Git binary to use only a single thread.
For example:
n-threads = 8
[clone]
engine = "GitBinary"
n-threads = 1
Annotated Example
An annotated file is provided below.
# Path to a SQLite database that is used to schedule mirroring operations and
# store historical information.
statedb = "./db/lorries.sqlite"
# An optional username that is used as part of basic authentication on the
# downstream Git mirror.
username = "oauth2"
# Hostname of the downstream mirror and optional port number. NOTE that
# scheme e.g. https:// should not be included here.
hostname = "127.0.0.1:9999"
# Path to where Git repositories and raw-file assets will be stored on disk.
# NOTE that this directory is safe to delete however doing so will require that
# the mirrors be re-created causing extra CPU and network utilization.
working-area = "./workd"
# The maximum number of redirections Lorry should follow when resolving
# raw-file mirrors.
maximum-redirects = 1
# Optional URL from which a Lorry configuration can be cloned; the clone will
# be updated periodically.
# confgit-url = "https://my-git-repository.example.org/lorry-config"
# an optional branch to use as part of the remote git configuration
# If specified Lorry will read its configuration from this directory and
# reload it periodically.
configuration-directory = "./examples/controller"
# Logging level
log-level = "INFO"
# The port that Lorry should listen to for incoming network connections on.
port = 3000
# If calls to the Gitlab API can be done over insecure HTTP. Not recommended
# for production settings.
gitlab-insecure-http = true
# An optional path to a program which will return authentication credentials
# for Git push operations.
askpass-program = "./contrib/lorry-askpass"
# If sha256sums are required for all raw file mirrors.
sha256sums_required = true
# Username which should be configured for automated commits of Lorry raw-file
# mirrors. This name will show up in the Git log.
username = "lorry@example.org"
# An optional path to the gitconfig configuration file to use during commands
# where Lorry shells out to the git binary such as fetch operations when
# engine = GitBinary. Lorry will dynamically modify this file changing settings
# that are required for its normal operation but will not clobber it.
# git-config-path = "/etc/lorry/.gitconfig"
# The number of threads that Lorry will spawn to mirror individual
# repositories. It will default to the number of cores available on the
# currently running system.
# n-threads = 8
# Git clone related configuration
[clone]
# The engine to use for cloning (and fetching) operations. This setting can
# either be gitbinary or libgit2. Note that using the Git binary for cloning
# large repositories has considerably better performance than using the
# libgit2 bindings.
engine = "gitbinary"
# The number of threads to use for cloning remote repositories. Note that this
# setting only affects cloning when the Git binary is in use. libgit2 does not
# support multi-threaded operations. Because Lorry itself is multi-threaded, "1"
# is typically a good setting here since several concurrent clone operations
# can degrade performance considerably.
n-threads = 1
Mirror Configuration
The mirror configuration is where the link between upstream repositories and downstream mirrors is defined. There are some basic requirements for this:
- Must be a git repository
- Must have a JSON file called lorry-controller.conf in the root of the repository.
The repository is specified in the confgit-url and confgit-branch settings for the controller, and is cloned or updated when the Read Configuration endpoint is accessed.
Lorry Controller Configuration
The main lorry-controller.conf file consists of a list of objects, containing the following required keys and values:
- type - String that should be set to lorries.
- interval - String in ISO8601 duration format. For example PT3H corresponds to a 3-hour duration. Specifies the interval for mirroring the various lorry configs in this group.
- timeout - String in ISO8601 duration format, see interval. If mirroring one of the lorry configs takes longer than the timeout, it will be cancelled.
- prefix - String specifying the downstream group prefix. This is prefixed to the individual lorry names in this group on the downstream repository.
- globs - Array of strings specifying the file globs containing the individual lorry configurations. For example, folder/*.lorry will look for all .lorry files in the folder directory (relative to the lorry-controller.conf file).
An example configuration file would look like this:
[
  {
    "type": "lorries",
    "interval": "PT1M",
    "timeout": "PT1M",
    "prefix": "lorry-mirrors/github",
    "globs": [
      "github.lorry"
    ]
  }
]
This would mirror any repositories specified in a github.lorry file every minute.
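Before committing a controller configuration, it can be handy to check it locally. The following is a minimal sketch, assuming only the required keys listed above and a simplified ISO8601 duration grammar (days, hours, minutes, and seconds). The function names are hypothetical and Lorry's own parser may accept more forms.

```python
import json
import re
from datetime import timedelta

REQUIRED_KEYS = ("type", "interval", "timeout", "prefix", "globs")

# Simplified ISO8601 duration: PnDTnHnMnS with every component optional.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text):
    """Parse a duration such as PT3H into a timedelta, or raise ValueError."""
    match = _DURATION.match(text)
    parts = {k: int(v) for k, v in match.groupdict().items() if v} if match else {}
    if not parts:
        raise ValueError(f"not a supported ISO8601 duration: {text!r}")
    return timedelta(**parts)


def validate_controller_conf(text):
    """Check required keys and duration formats; return the parsed entries."""
    entries = json.loads(text)
    if not isinstance(entries, list):
        raise ValueError("top level must be a list of objects")
    for index, entry in enumerate(entries):
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            raise ValueError(f"entry {index} is missing keys: {missing}")
        if entry["type"] != "lorries":
            raise ValueError(f"entry {index}: type must be 'lorries'")
        parse_duration(entry["interval"])
        parse_duration(entry["timeout"])
    return entries


EXAMPLE = """
[{"type": "lorries", "interval": "PT1M", "timeout": "PT1M",
  "prefix": "lorry-mirrors/github", "globs": ["github.lorry"]}]
"""

if __name__ == "__main__":
    for entry in validate_controller_conf(EXAMPLE):
        print(entry["prefix"], parse_duration(entry["interval"]))
```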
The individual lorry mirror configurations are YAML files. These are in the form:
mirror-name:
  type: mirror type
  # further mirror config
The mirror type can currently be either git or raw-file. This determines the extra mirror configuration required.
When the mirror type is git, the extra configuration options are:
- url - Required. String of the Git URL for the repository.
- check-certificates - Default: true. Boolean controlling whether the SSL/TLS certificate for the specific repository should be checked. If the worker-level check-certificates option is set to false, this will not turn the checking of certificates back on; it can only disable the checking of certificates for the current mirror.
- ref-patterns - Optional. List of glob patterns that define which Git references to mirror.
- ignore-patterns - Optional. List of glob patterns to exclude from mirrors. NOTE that these take precedence over ref-patterns.
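The interaction between ref-patterns and ignore-patterns can be sketched as follows. This uses Python's fnmatch module, whose glob semantics may differ from Lorry's. Matching against full reference names and mirroring everything when ref-patterns is absent are both assumptions made for illustration, and the function name is hypothetical.

```python
from fnmatch import fnmatchcase


def should_mirror(ref, ref_patterns=None, ignore_patterns=None):
    """Decide whether a Git reference would be mirrored.

    ignore_patterns are checked first, so they win over ref_patterns.
    """
    if any(fnmatchcase(ref, pattern) for pattern in ignore_patterns or []):
        return False
    if not ref_patterns:
        # Assumption: with no ref-patterns configured, every reference is mirrored.
        return True
    return any(fnmatchcase(ref, pattern) for pattern in ref_patterns)


if __name__ == "__main__":
    include = ["refs/heads/*"]
    exclude = ["refs/heads/wip/*"]
    for ref in ("refs/heads/main", "refs/heads/wip/experiment", "refs/tags/v1"):
        print(ref, should_mirror(ref, include, exclude))
```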
When the mirror type is raw-file, the extra configuration options are:
- urls - Required. List of URL mappings, with the following keys:
  - url - Required. String of the file URL to download.
  - destination - Required. String of the directory to store the downloaded file in.
- check-certificates - Default: true. Boolean controlling whether the SSL/TLS certificate for the files should be checked. If the worker-level check-certificates option is set to false, this will not turn the checking of certificates back on; it can only disable the checking of certificates for the current mirror.
- sha256sum - Optional. The expected sha256sum of the raw file.
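To obtain or double-check a sha256sum value before adding it to a mirror configuration, a file can be hashed locally. This is a minimal sketch using Python's standard hashlib module; the function names are hypothetical and it is independent of how Lorry performs its own verification.

```python
import hashlib


def sha256sum(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected):
    """Compare a file's digest with an expected value, ignoring case and whitespace."""
    return sha256sum(path) == expected.strip().lower()


if __name__ == "__main__":
    import sys

    for filename in sys.argv[1:]:
        print(sha256sum(filename), filename)
```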
With the above, an example lorry mirror configuration could look like the following:
octocat/hello-world:
  type: git
  url: https://github.com/octocat/Hello-World.git
raw-files:
  type: raw-file
  urls:
    - destination: target-directory
      url: https://my-file-host.tld/directory/more-directory/file.tar
      sha256sum: 3a1a7d59eb62f8710a46d86faea9ab9600f948660aed33acd4846658def0ef83
    - destination: another-target-directory
      url: https://my-file-host.tld/directory/another_file.tar
If this was used with the controller configuration above, then the two repositories created in your GitLab group would be at:
lorry-mirrors/github/octocat/hello-world
lorry-mirrors/github/raw-files
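The downstream paths above follow from joining the group prefix with each mirror name. A small sketch of that rule, with a hypothetical function name and under the assumption that stray slashes are simply trimmed:

```python
def downstream_path(prefix, mirror_name):
    """Join a controller prefix and a mirror name into a downstream path."""
    parts = (part.strip("/") for part in (prefix, mirror_name))
    return "/".join(part for part in parts if part)


if __name__ == "__main__":
    for name in ("octocat/hello-world", "raw-files"):
        print(downstream_path("lorry-mirrors/github", name))
```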
HTTP Interface
Lorry has a web interface as well as a REST API for configuration and management.
Once Lorry has been launched, it will by default be available at http://localhost:3000.
Web Interface
The main page will list mirror status of each job and there is also a debug page which can display which active jobs are running on which thread. The config page will show each currently configured Lorry.
Endpoints
The following endpoints are available for users to interact with the controller.
Health Check
GET /1.0/health-check
List Jobs
GET /1.0/list-jobs
Lists all the jobs that have been given to workers.
Example output:
[
  {
    "id": 1,
    "path": "lorry-mirrors/github/octocat/hello-world",
    "exit_status": {
      "Finished": {
        "exit_code": 0,
        "disk_usage": 311296
      }
    },
    "host": "worker-0"
  }
]
List Lorries
GET /1.0/list-lorries
Lists all the lorries configured on the controller.
Example Output:
[
  {
    "path": "lorry-mirrors/github/octocat/hello-world",
    "name": "lorry-mirrors/github/octocat/hello-world",
    "spec": {
      "type": "git",
      "url": "https://github.com/octocat/Hello-World.git",
      "check-certificates": true,
      "refspecs": null
    },
    "running_job": null,
    "last_run": 1696523383,
    "interval": "PT60S",
    "lorry_timeout": "PT60S",
    "last_run_results": {
      "Finished": {
        "exit_code": 0,
        "disk_usage": 311296
      }
    },
    "last_run_output": "...",
    "purge_before": 1696512967
  }
]
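The list-lorries output lends itself to simple scripting. The sketch below fetches the endpoint and reports mirrors whose last run did not finish with exit code 0. Only the Finished variant appears in the example output above, so treating any other result as a failure is an assumption, as are the function names and the sample paths.

```python
import json
from urllib.request import urlopen


def fetch_lorries(base_url="http://localhost:3000"):
    """Fetch the list-lorries endpoint and decode its JSON body."""
    with urlopen(f"{base_url}/1.0/list-lorries") as response:
        return json.load(response)


def failing_lorries(lorries):
    """Return paths whose last run did not finish with exit code 0."""
    failing = []
    for lorry in lorries:
        results = lorry.get("last_run_results")
        if not results:
            continue  # assumption: no results means the mirror has not run yet
        finished = results.get("Finished")
        if finished is None or finished.get("exit_code") != 0:
            failing.append(lorry["path"])
    return failing


# Hypothetical sample shaped like the example output above.
SAMPLE = [
    {"path": "lorry-mirrors/github/octocat/hello-world",
     "last_run_results": {"Finished": {"exit_code": 0, "disk_usage": 311296}}},
    {"path": "lorry-mirrors/github/broken",
     "last_run_results": {"Finished": {"exit_code": 1, "disk_usage": 0}}},
    {"path": "lorry-mirrors/github/never-run", "last_run_results": None},
]

if __name__ == "__main__":
    print(failing_lorries(SAMPLE))
```

Against a running instance, failing_lorries(fetch_lorries()) applies the same check to live data.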
Metrics
GET /1.0/metrics
Returns information about jobs and lorries, exported as metrics for Prometheus.
Example Output:
# HELP lorry2_total_lorries The total amount of lorries.
# TYPE lorry2_total_lorries gauge
lorry2_total_lorries 2
# HELP lorry2_total_lorries_degraded The total amount of lorries partially failed.
# TYPE lorry2_total_lorries_degraded gauge
lorry2_total_lorries_degraded 0
# HELP lorry2_total_lorries_errors The total amount of lorries in a failed state.
# TYPE lorry2_total_lorries_errors gauge
lorry2_total_lorries_errors 0
# HELP lorry2_total_lorries_successful The total amount of successful mirrors.
# TYPE lorry2_total_lorries_successful gauge
lorry2_total_lorries_successful 2
# HELP lorry2_total_lorries_degraded_namespaced The total amount of lorries partially failed.
# TYPE lorry2_total_lorries_degraded_namespaced gauge
# HELP lorry2_total_lorries_errors_namespaced The total amount of lorries in a failed state.
# TYPE lorry2_total_lorries_errors_namespaced gauge
# HELP lorry2_total_lorries_successful_namespaced The total amount of successful mirrors.
# TYPE lorry2_total_lorries_successful_namespaced gauge
lorry2_total_lorries_successful_namespaced{namespace="lorry-mirrors/lorry"} 1
lorry2_total_lorries_successful_namespaced{namespace="raw-assets/lorry-assets"} 1
# EOF
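For quick checks without a Prometheus server, the metrics text can be read with a few lines of Python. This sketch handles only the simple "name value" lines shown in the example output and does not cover the full exposition format (for instance, it does not handle timestamps). The function name is hypothetical.

```python
def parse_metrics(text):
    """Map each sample line (metric name plus labels) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP, TYPE, and EOF comment lines
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


SAMPLE = """\
# TYPE lorry2_total_lorries gauge
lorry2_total_lorries 2
# TYPE lorry2_total_lorries_errors gauge
lorry2_total_lorries_errors 0
lorry2_total_lorries_successful_namespaced{namespace="lorry-mirrors/lorry"} 1
# EOF
"""

if __name__ == "__main__":
    for name, value in parse_metrics(SAMPLE).items():
        print(name, value)
```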
Alerting & Metric Namespacing
Lorry configures mirrors with a concatenated path name such as sources/nvidia or sources/github/google. Lorry can expose the status of mirrors, such as the number of errors or warnings, based on these names. With this approach it is possible to use Prometheus's alertmanager to send notifications to groups of users who are responsible for maintaining a particular set of mirrors from an organization.
It's possible to configure the "depth", i.e. the number of subdirectories that are exposed by Lorry. The default number of mirror namespaces that will be exposed is 4.
Development
Developing Lorry
The Lorry repository contains everything you need to start hacking on the codebase right away. First ensure that you've followed the instructions for installing Lorry locally.
Although not strictly required, installing cargo-watch is suggested since it makes the development workflow easier. Install it either with your package manager or via cargo install cargo-watch. Additionally, ensure that you have podman installed, which is the only supported container platform for running Lorry.
This example uses Gitlab which is the primary software forge supported by Lorry.
# In a separate terminal pane you can launch Gitlab. This will take a few minutes.
scripts/run_gitlab.sh
# Request an authentication token from Gitlab. Note that you need to wait about
# five minutes before running this since Gitlab will not immediately be ready.
source scripts/request_gitlab_token.sh
# Launch the watch script which will restart Lorry on code changes.
scripts/watch.sh
Writing Documentation
Documentation is managed with mdbook. Ensure you have it installed and then run scripts/docs.sh to launch the server. Documentation content is located under docs/content/.
API Documentation
Lorry is not yet available on crates.io; however, its bleeding edge API documentation can be found here.
Development Containers
Lorry uses several containers during its CI process. To update, for example, the version of Rust that is used to compile Lorry, modify the appropriate variable in scripts/setup_build_containers.sh and then execute that script (assuming you have appropriate permissions to access Lorry's Gitlab container registry).
Changing the version of Rust in CI / Containers
To change the version of Rust, bump the version number defined in scripts/setup_build_containers.sh and then run the script to mirror the new version. Additionally, set the version defined in .gitlab-ci.yml to match the container / tag you mirrored in the previous step.
Troubleshooting
Diagnostic Steps for Troubleshooting Failed Mirrors
Step 1
Check the UI in Lorry for the particular mirror and look for any logs or error messages.
Step 2
Access the logs of the running Lorry instance and select Schedule Run from the user interface of the mirror that is failing. Monitor the log output and look for any suspicious error messages.
Step 3
Inspect Lorry's working directory.
find workd -maxdepth 3
workd
workd/raw-assets_lorry-assets <-- A raw file mirror
workd/raw-assets_lorry-assets/raw-files <--- Raw file mirrors store data here
workd/raw-assets_lorry-assets/raw-files/lorry-tiny.png
workd/raw-assets_lorry-assets/raw-files/.gitattributes
workd/raw-assets_lorry-assets/git-repository <--- Git repository is stored here
workd/raw-assets_lorry-assets/git-repository/logs
workd/raw-assets_lorry-assets/git-repository/hooks
workd/raw-assets_lorry-assets/git-repository/lfs
workd/raw-assets_lorry-assets/git-repository/info
workd/raw-assets_lorry-assets/git-repository/index
workd/raw-assets_lorry-assets/git-repository/config
workd/raw-assets_lorry-assets/git-repository/description
workd/raw-assets_lorry-assets/git-repository/COMMIT_EDITMSG
workd/raw-assets_lorry-assets/git-repository/refs
workd/raw-assets_lorry-assets/git-repository/objects
workd/raw-assets_lorry-assets/git-repository/FETCH_HEAD
workd/raw-assets_lorry-assets/git-repository/HEAD
workd/lorry_lorry <-- A normal mirror without raw files
workd/lorry_lorry/raw-files
workd/lorry_lorry/git-repository
workd/lorry_lorry/git-repository/hooks
workd/lorry_lorry/git-repository/info
workd/lorry_lorry/git-repository/config
workd/lorry_lorry/git-repository/description
workd/lorry_lorry/git-repository/refs
workd/lorry_lorry/git-repository/objects
workd/lorry_lorry/git-repository/FETCH_HEAD
workd/lorry_lorry/git-repository/HEAD
Step 4
Verify the integrity of the mirror repository
cd workd/lorry_lorry/git-repository
git fsck --full
Checking object directories: 100% (256/256), done.
Checking objects: 100% (4011/4011), done.
Check the status of a raw-file mirror
cd workd/raw-assets_lorry-assets/git-repository
git --work-tree=../raw-files status
Step 5
Delete the mirror repository on disk and allow Lorry to initialize it again.
rm -rf workd/lorry_lorry
You can select Schedule Run in the web interface to start the mirroring operation again. NOTE that Lorry will first clone from your downstream repository prior to fetching new updates, even if that repository is corrupt or otherwise in an invalid state.
Step 6
Stop the Lorry process and delete the mirrored repository from your downstream (likely Gitlab) and then delete the working directory as described in Step 5.
Start Lorry again and monitor its logs to confirm the subsequent clone and initialization was successful.
List-lorries Endpoint Analysis Tool
scripts/lorries_analysis.py is a script for gathering the status data of the mirrored repositories, to assist with debugging systematic failures. See ./scripts/lorries_analysis.py --help for usage.
Migration from Lorry 1
The biggest differences between Lorry 1 and Lorry 2 are in the configuration files and the confgit repo. Thankfully, because the controller configuration for Lorry 2 includes a branch, this can be used to make migration easier.
Confgit Migration
The first step is to migrate the confgit configuration. If your individual lorries are already in YAML format, they do not need to be changed. The biggest change is to the root lorry-controller.conf, where several options have changed. You may also need to change or drop certain mirror files themselves.
- Convert any JSON based lorry configs to YAML.
- Convert the lorry-controller.conf file:
  - type stays the same (the value lorries)
  - interval: convert to ISO8601 Duration format
  - lorry-timeout: convert to ISO8601 Duration format and rename the key to timeout. This is now a required key.
  - prefix stays the same
  - globs stays the same
  - any other keys should be discarded (but will be ignored if left)
- Convert mirror files to be lorry2 compatible by applying the following rules:
  - The git type is supported with the same usage.
  - The raw-file type is supported with the same usage.
  - The tarball type is no longer supported, so repos with type: tarball need to be dropped.
  - Mercurial upstream is no longer supported, so you may need to drop mirrors with type: hg or find a git host for them.
  - Bazaar upstream is no longer supported, so you may need to drop mirrors with type: bzr or find a git host for them.
  - Subversion upstream is no longer supported, so you may need to drop mirrors with type: svn or find a git host for them.
  - CVS upstream is no longer supported, so you may need to drop mirrors with type: cvs or find a git host for them.
  - ZIP upstream is no longer supported, so you may need to drop mirrors with type: zip.
  - GZIP upstream is no longer supported, so you may need to drop mirrors with type: gzip.
For type: tarball, type: zip and type: gzip, it might be possible to use type: raw-file instead. Bear in mind, however, that type: tarball, type: zip and type: gzip used to first expand the compressed file and then commit the extracted content, while raw-file simply pushes the compressed file.
To automate step 3 (converting the mirror files), we internally use a script, convert_to_lorry2.py. This may be useful for your own conversion. Its usage is:
python convert_to_lorry2.py mirror_config_repo
mirror_config_repo should be the root directory of the mirroring-config repo. This script will automatically drop all repos with the unsupported types listed above.
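The rule for dropping unsupported types can be sketched as below. This is not convert_to_lorry2.py, whose implementation is not shown here. It operates on an already parsed mirror file (a mapping of mirror names to specs), and the function and mirror names are hypothetical.

```python
# Types Lorry 2 still supports, per the list above.
SUPPORTED_TYPES = {"git", "raw-file"}


def split_supported(mirrors):
    """Split parsed mirror specs into (kept, dropped) by their type."""
    kept, dropped = {}, {}
    for name, spec in mirrors.items():
        target = kept if spec.get("type") in SUPPORTED_TYPES else dropped
        target[name] = spec
    return kept, dropped


# Hypothetical Lorry 1 mirror file after YAML parsing.
LORRY1_MIRRORS = {
    "octocat/hello-world": {"type": "git", "url": "https://github.com/octocat/Hello-World.git"},
    "legacy/hg-project": {"type": "hg", "url": "https://hg.example.org/project"},
    "legacy/release": {"type": "tarball", "url": "https://example.org/release.tar.gz"},
}

if __name__ == "__main__":
    kept, dropped = split_supported(LORRY1_MIRRORS)
    print("kept:", sorted(kept))
    print("dropped:", sorted(dropped))
```

Reviewing the dropped set before deleting anything makes it easier to spot tarball, zip, or gzip entries that could be re-expressed as raw-file mirrors.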
This can all be done in a separate branch to your main configuration, allowing for quick rollback in case of an issue.
Lorry is licensed under the Apache Licence 2.0.
See the full license text in the Lorry repository.
Copyright 2024 Codethink Ltd.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.