HPC environments, where hundreds or even thousands of servers can be used for complex computations or data analysis, are expensive to establish and complex to maintain. HPC infrastructure provides access to vast compute resources, and cost reduction is often part of the motivation for establishing it. However, the large capital costs involved put HPC infrastructure out of reach for many teams with tight budgets. Another challenge with purpose-built infrastructure such as HPC is that high utilization of the hardware is needed for the investment to be cost effective; on the other hand, if utilization is too high, users have to wait too long for their computations to finish.
Cloud computing solves the problems of capital expenditure and utilization by offering on-demand services. However, cloud computing can quickly become expensive for HPC workloads, as most cloud pricing models scale linearly with core usage, and the price per core has to cover both under-utilized hardware and a complex software stack. Cloud services do have the benefit of scale, and some services, such as object storage, can be very cost-effective when used correctly.
Between on-premise HPC hardware and cloud computing sits a number of vendors that specialize in bare-metal hosting services. These vendors provide low-cost, HPC-capable hardware, but without the software layer needed to run advanced data analysis. Since the hardware is hosted in a remote professional data center and managed by on-site staff, no in-house staff is needed for hardware maintenance. However, the lack of the advanced virtualization layers and management interfaces provided by most cloud services makes it challenging to utilize the hardware efficiently.
On-premise, hosted and cloud infrastructure all have advantages and disadvantages, and teams often have to choose one based on the competencies available in the team and the size of their budget. ByteRouter was motivated by the need to use any cost-efficient hardware to solve HPC problems without onboarding a team of infrastructure experts. Another key goal for ByteRouter is to minimize maintenance costs, so that savings on infrastructure are not offset by increased maintenance costs.
ByteRouter is not a direct replacement for HPC systems and does not try to compete with the performance of HPC clusters or the on-demand scalability of cloud services. In fact, integration of HPC clusters and cloud resources is one of the design goals, allowing ByteRouter to leverage the strengths of such infrastructure. A key goal of ByteRouter is to enable the effective use of low-cost bare-metal servers with minimal infrastructure in place. If a server can be rented at 20% of the cost of a comparable cloud instance then, in many cases, it does not matter if the rented server only reaches 50% utilization: the effective cost is still just 40% of the on-demand cloud instance.
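The arithmetic behind this claim can be made explicit: the rented server's price is divided by its utilization to get the effective cost per unit of useful work. A minimal sketch (the 20% price and 50% utilization figures are the ones from the example above):

```python
def effective_cost_ratio(price_ratio: float, utilization: float) -> float:
    """Cost per unit of useful work relative to an on-demand cloud
    instance, which is only paid for while it is actually in use."""
    return price_ratio / utilization

# Bare-metal server rented at 20% of the cloud price, utilized 50% of the time:
ratio = effective_cost_ratio(price_ratio=0.20, utilization=0.50)
print(f"effective cost: {ratio:.0%} of the cloud price per unit of work")  # 40%
```

The same formula also shows the break-even point: at a 20% price ratio, the rented server only becomes more expensive than on-demand cloud once utilization drops below 20%.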
ByteRouter is a software layer between the end user and any connected hardware infrastructure. The system exposes connected resources as a virtual cluster and uses a built-in scheduler to execute jobs whose dependencies are either distributed in OCI-compatible containers or installed on the connected nodes. Data management is an integrated part of the system: it provides access to storage systems and automatically transfers data between nodes as needed. A user-friendly web interface is provided for both system administration and job execution, and a REST API is available to more advanced users for automation and CLI access.
Compute Nodes
ByteRouter is designed to have a minimum of dependencies and can establish a connection to nodes using only SSH and a set of credentials. Any Linux server with SSH access can be connected to ByteRouter and used as a compute node. Where the network supports it, ByteRouter may automatically deploy alternative protocols to improve communication latency and reduce network load. ByteRouter also depends on Apptainer, an OCI-compatible container technology similar to Docker but with a strong focus on HPC. Apptainer is used by ByteRouter to simplify provisioning of, e.g., job monitoring, data management and distribution of user software.
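An SSH-only transport of this kind typically boils down to non-interactive `ssh` invocations. The sketch below shows what such an invocation might look like; the helper function, hostnames and key path are illustrative assumptions, not ByteRouter's actual internals:

```python
import shlex

def ssh_command(user: str, host: str, key_path: str, remote_cmd: str) -> list[str]:
    """Build the argv for a non-interactive SSH call, the only transport
    strictly required to reach a compute node. Hypothetical sketch."""
    return [
        "ssh",
        "-i", key_path,
        "-o", "BatchMode=yes",                    # fail instead of prompting
        "-o", "StrictHostKeyChecking=accept-new", # pin the host key on first use
        f"{user}@{host}",
        remote_cmd,
    ]

# Example: check that Apptainer is installed on a (hypothetical) node.
cmd = ssh_command("worker", "node01.example.com", "~/.ssh/id_ed25519",
                  "apptainer --version")
print(shlex.join(cmd))
```

Running commands this way requires nothing on the node beyond a Linux install with an SSH daemon, which is exactly the minimal-dependency property described above.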
Future releases will also support automatic provisioning of cloud instances (e.g. AWS EC2) and the use of cloud batch services (e.g. AWS Batch). Additionally, we plan to release a connector for HPC clusters (e.g. Slurm) to support the use of cheap HPC resources when available. Note that cloud instances can already be connected today using SSH.
Configuration of a compute node: installation of dependencies and disabling of maintenance mode.
Storage Services
Storage services are any storage systems that can be reached through supported protocols such as SFTP and S3. The data manager in ByteRouter connects storage services and provides a uniform interface for transferring data to and from jobs running on compute nodes. Data transfer between nodes and storage is encrypted, and ByteRouter attempts to optimize transfer speed based on the network and the storage service type. Compute nodes communicate directly with storage services, and ByteRouter takes care of key management to ensure secure communication.
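A "uniform interface" over heterogeneous backends usually means a small abstract contract that each protocol implements. The sketch below illustrates the idea with an in-memory stand-in backend; the class and method names are assumptions for illustration, not ByteRouter's real data-manager API:

```python
from abc import ABC, abstractmethod

class StorageService(ABC):
    """Uniform read/write contract over different backends (SFTP, S3, ...).
    Hypothetical sketch of the pattern, not ByteRouter's actual interface."""
    @abstractmethod
    def get(self, path: str) -> bytes: ...
    @abstractmethod
    def put(self, path: str, data: bytes) -> None: ...

class InMemoryStorage(StorageService):
    """Stand-in backend so the contract can be exercised locally;
    a real SFTP or S3 backend would implement the same two methods."""
    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}
    def get(self, path: str) -> bytes:
        return self._blobs[path]
    def put(self, path: str, data: bytes) -> None:
        self._blobs[path] = data

store = InMemoryStorage()
store.put("results/run1.csv", b"a,b\n1,2\n")
print(store.get("results/run1.csv").decode())
```

Because jobs only see the contract, the same job script can read from a NAS over SFTP and write to an S3 bucket without protocol-specific code.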
Configuration of an SFTP-based storage service. A username and password are used once to create SSH keys. Storage services are accessed directly by jobs and can be used to share data between jobs.
Job Execution
Jobs are defined as bash scripts in which special tags provide easy access to managed storage services and containers. ByteRouter is designed to execute containerized applications to support flexible and scalable pipelines, but any application installed on the nodes can also be started by jobs. Each job is submitted to the built-in scheduler, which handles execution on available nodes. The scheduling system is heavily inspired by HPC schedulers such as Slurm, but it has been designed to support robust execution of jobs across a distributed infrastructure, not only across nodes in the same data center.
Jobs can be submitted through both the web interface and a REST API. The web interface provides a user-friendly way to define jobs, and any job created there can be used as a template for the API. The API can be used both for CLI development and for automating job submission from any language. The built-in queue and scheduling system supports more than 100,000 jobs with minimal resource usage.
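Automated submission through a REST API generally amounts to posting a JSON job description. The sketch below only assembles such a payload; the field names (`name`, `script`, `resources`, `depends_on`) are illustrative assumptions, so the real schema should be taken from the ByteRouter API documentation:

```python
import json

def build_job_payload(name: str, script: str, cpus: int = 1,
                      depends_on: tuple = ()) -> str:
    """Assemble a JSON job submission body. Field names are hypothetical
    placeholders for whatever schema the REST API actually defines."""
    return json.dumps({
        "name": name,
        "script": script,                  # the bash job script
        "resources": {"cpus": cpus},
        "depends_on": list(depends_on),    # job IDs that must finish first
    })

payload = build_job_payload("concat-files", "cat a.txt b.txt > out.txt", cpus=2)
print(payload)
```

In practice this payload would be POSTed with any HTTP client; because it is plain JSON, the same automation works from the shell, Python, or any other language.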
Workflows can be established using job dependencies, which prevent jobs from being executed until all of their dependencies have been resolved. To support complex workflows containing hundreds of jobs, ByteRouter implements robust mechanisms for handling, e.g., bad network connections and node failures. Support for common workflow systems, such as Nextflow and Snakemake, is planned, allowing their users to integrate seamlessly with ByteRouter. Direct support for Slurm jobs is also planned, as an interesting use case for ByteRouter is to connect to a Slurm cluster and then set up additional compute nodes outside the cluster to handle, e.g., peak load or special hardware capabilities such as GPUs for AI training.
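The dependency rule described above, where a job becomes runnable only once all of its dependencies have resolved, is a topological ordering of a job graph. A minimal sketch using Python's standard-library `graphlib` (the four-job workflow is a made-up example):

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the jobs it depends on.
workflow = {
    "download": [],
    "align":    ["download"],
    "qc":       ["download"],
    "report":   ["align", "qc"],   # runs only after both branches finish
}

ts = TopologicalSorter(workflow)
order = list(ts.static_order())
print(order)
```

A real scheduler would use the incremental `get_ready()`/`done()` interface instead of `static_order()`, so that independent jobs such as `align` and `qc` can run in parallel on different nodes.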
Definition of a job based on a modified job template. Storage tags are used to download two files from a local NAS. The two files are concatenated and the result is uploaded to an S3 bucket at Backblaze.
Finally, the job log is inspected.
Software Distribution and Containerization
ByteRouter was built to leverage modern containerization of applications. Container technologies such as Docker and Apptainer make it easy to deploy applications with all their dependencies on nodes at scale, and to manage pipeline versioning. However, containerization is optional, and ByteRouter can also be used with nodes that are provisioned manually or with tools such as Ansible or Terraform. To simplify the use of containers, ByteRouter implements a versioned container image repository that can automatically distribute images to nodes as needed, without the need to worry about credentials and pull limits for public container registries.
ByteRouter implements tags for referring to containers in the container image store. When a container tag is used in a job script, the container is automatically downloaded as part of the job initialization. A container image cache is automatically set up on each node to minimize network traffic and speed up execution.
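A per-node image cache of this kind is essentially a pull-on-miss lookup keyed by the image tag. The sketch below illustrates the pattern with a fake fetch function; the cache layout, file naming and `.sif` extension are assumptions for illustration, not ByteRouter's actual on-disk format:

```python
import hashlib
import tempfile
from pathlib import Path

def cached_image(cache_dir: Path, tag: str, fetch) -> Path:
    """Return a locally cached image for `tag`, pulling only on a cache
    miss. Illustrative sketch of a pull-on-miss cache, keyed by a digest."""
    digest = hashlib.sha256(tag.encode()).hexdigest()
    path = cache_dir / f"{digest}.sif"
    if not path.exists():                 # cache miss: pull exactly once
        path.write_bytes(fetch(tag))
    return path

pulls = []
def fake_fetch(tag: str) -> bytes:
    pulls.append(tag)                     # stands in for a registry pull
    return b"IMAGE-BYTES"

with tempfile.TemporaryDirectory() as d:
    cache = Path(d)
    cached_image(cache, "myapp:1.0", fake_fetch)
    cached_image(cache, "myapp:1.0", fake_fetch)   # second call hits the cache
    print(f"registry pulls: {len(pulls)}")          # pulled only once
```

The second lookup never touches the registry, which is what keeps network traffic low when many jobs on the same node reference the same image.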
Since ByteRouter only uses containers to distribute the software needed by jobs, its container orchestration mechanism is much simpler than that of, e.g., K8s or Nomad, which were designed for the far more complex task of providing high-availability web services.
Example of a six-step workflow using containerized applications on both CPU and GPU nodes. The workflow is submitted through the REST API using a simple Python script.
The job monitoring view shows the status of dependencies.
User Management
ByteRouter provides an integrated role-based user-management module that is fully compatible with the standard Linux user-management system. Connected resources (nodes and storage) can be assigned to users in a fine-grained manner as needed. The user-management system also includes a secure key store that manages, e.g., SSH keys and API secrets, ensuring that authenticated users can access connected resources using key-based authentication without having to worry about key management.
Example of user management in ByteRouter: connecting a storage service to a user.
Deployment
ByteRouter can be deployed using containers that store data in a file-backed database. This simple deployment model ensures that the system can be easily deployed on any hardware. More advanced deployment models are planned, such as a high-availability mode with fast failover and a cluster deployment where multiple ByteRouter instances, possibly in different locations, can be accessed through a unified interface.
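The appeal of a file-backed database is that all state lives in a single file that survives container restarts and can be backed up by copying it. The sketch below illustrates the idea with SQLite as a stand-in; the actual database engine and schema used by ByteRouter are not specified here, so both are assumptions:

```python
import os
import sqlite3
import tempfile

# A single file holds all state; SQLite stands in for whatever embedded
# database the deployment actually uses (hypothetical schema).
path = os.path.join(tempfile.mkdtemp(), "byterouter.db")

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, name TEXT, state TEXT)")
conn.execute("INSERT INTO jobs (name, state) VALUES (?, ?)", ("demo", "queued"))
conn.commit()
conn.close()

# Re-opening the file simulates a container restart: the queued job is
# still there, because the state lives on disk rather than in memory.
conn = sqlite3.connect(path)
rows = conn.execute("SELECT name, state FROM jobs").fetchall()
print(rows)
```

Mounting the database file on a host volume is then all that is needed to make such a single-container deployment durable.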