BioHPC & CAC

Cornell University offers two distinct high-performance computing (HPC) resources available to the entire university: the Center for Advanced Computing (CAC) and the Bioinformatics Facility’s BioHPC. Both provide on-demand computing and storage, hosting of faculty- or department-owned servers, and consulting and technical support. However, the user experience differs considerably between them. The Bioinformatics Facility covers HPC for the life sciences at Cornell and specializes in the specific needs of bioinformatics and computational biology, while CAC supports general HPC at Cornell. BioHPC was designed to provide preconfigured computing tools and is easier for novice users to get started with. CAC supports a wide range of general HPC applications and offers more flexibility in computing environments, but it assumes some familiarity with system administration and software installation. The Bioinformatics Facility and CAC work together to deliver solutions that cover the diverse needs of researchers in practically any field. Below, the services offered by both facilities are described side by side; a decision guide comparing BioHPC and CAC may also be helpful.

Bioinformatics Facility

Center for Advanced Computing

HPC Environment

All servers at BioHPC run the same operating system, currently Rocky Linux 9. Over 1,200 life-science software titles are installed and available on all servers. Each server has shared, network-mounted storage for home directories as well as fast local storage. BioHPC is designed as an affordable on-site computing solution.

Users can access machines interactively via the command line (SSH), a graphical environment (VNC/X11), or third-party tools such as RStudio or JupyterLab. It is also possible to start an instance of the Slurm scheduler.

Users do not have root access, but they can run Docker or Apptainer containers and have root access within a container. BioHPC provides several tools to help PIs manage file permissions and server access, and staff will install software on request.

CAC provides professional support for a general computing environment, including both scheduled and interactive computing, in which users choose the operating system (Windows or Linux) and software configuration and have administrator access. The environment is similar to Amazon EC2 but includes CAC’s cost and security guardrails. Access methods include SSH, VNC/remote desktop, web, and API.

For users who need managed systems, CAC system administrators can handle updates and configuration of the operating system and applications for an additional charge.

On-demand computing

BioHPC has over 90 Linux servers that can be reserved using a calendar-based online scheduler or a command-line tool. All servers share the same BioHPC OS and software configuration. The servers span a wide range of computing capacities, with 12 to 256 cores, 48 GB to 3.0 TB of RAM, and up to 16 TB of local hard-drive storage. Nine GPU servers are available to rent, and many servers have fast NVMe storage. A user has exclusive access to the entire machine for the duration of the reservation, though they may share this access with other users or groups. It is a cost-effective solution that protects users from overcharging.

Red Cloud is a subscription-based Infrastructure as a Service (IaaS) cloud platform that provides on-demand virtual servers and storage with root access. Several Linux distributions and Windows operating systems are available, and users can create custom images with the application configurations they require. It offers a cost-effective alternative to public cloud services, with no oversubscription of computing resources.

Choose from a variety of configurations to fit the needs of your application, with up to 128 CPU cores and 1 TB of RAM, including instances with NVIDIA T4 and V100 GPUs. Most instances have 8 GB of RAM per CPU core.
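As a rough worked example of the 8 GB-per-core ratio, the sketch below computes the typical RAM for a given core count and, conversely, the minimum core count whose typical RAM covers a memory requirement. It assumes the ratio holds exactly, which the text says is true of most, not all, instances; the function names are illustrative only.

```python
import math

# Ratio stated for most Red Cloud instances; some instance types may differ.
GB_PER_CORE = 8


def typical_ram_gb(cores: int) -> int:
    """RAM (GB) an instance with `cores` CPU cores typically has at 8 GB/core."""
    return cores * GB_PER_CORE


def min_cores_for_ram(ram_gb: float) -> int:
    """Smallest core count whose typical RAM is at least `ram_gb` GB."""
    return math.ceil(ram_gb / GB_PER_CORE)


if __name__ == "__main__":
    # Largest configuration: 128 cores x 8 GB = 1024 GB, i.e. about 1 TB.
    print(typical_ram_gb(128))
    # A job needing 100 GB of RAM needs at least 13 cores at this ratio.
    print(min_cores_for_ram(100))
```

Actual instance sizes come in fixed steps, so in practice the result would be rounded up to the next available configuration.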

Hosting services

Hosted servers are owned by a research group and managed by BioHPC. Bioinformatics Facility staff will help to choose appropriate hardware, obtain a quote, and purchase a server. Hosted servers are configured and managed just like rental servers, with the same access to software and network storage, but with a permanent reservation for group members. Group membership can be managed through our webpage. BioHPC currently manages 199 hosted servers.

CAC works with research groups to architect performant HPC or GPU clusters for their needs. CAC’s professional systems staff provide full systems management and maintenance (software updates, server and network upkeep, power, cooling, and more) so you can focus on your research, not infrastructure. Clusters are deployed with OpenHPC and Rocky Linux 9 and use the Slurm scheduler. Research groups retain administrator access and control users and groups through the CAC portal.
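On a Slurm cluster, work is usually described in a batch script and submitted with `sbatch`. The sketch below generates such a script from Python using only standard `#SBATCH` options; the job name, resource values, and the `my_aligner` command are hypothetical, and a real cluster may also require site-specific settings such as a partition or account, which are deliberately omitted here.

```python
def make_sbatch_script(job_name: str, cpus: int, mem_gb: int,
                       time_limit: str, command: str) -> str:
    """Return the text of a single-task Slurm batch script.

    Only standard sbatch options are used; partitions, accounts, and
    module loading are site-specific and left out of this sketch.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        "#SBATCH --ntasks=1",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --time={time_limit}",  # e.g. HH:MM:SS
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job: an aligner using 8 threads for up to 2 hours.
    print(make_sbatch_script("align", cpus=8, mem_gb=32,
                             time_limit="02:00:00",
                             command="my_aligner --threads 8 input.fq"))
```

Saving the printed text as, say, `job.sh` on the cluster and running `sbatch job.sh` would queue it; keeping `--cpus-per-task` equal to the program's thread count avoids requesting cores the job never uses.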

Storage

2.7 PB of networked Lustre storage is available, divided into a “fast” pool (suitable for some direct computing) and a “safe” pool designed for longer-term storage.

Individual servers have high-speed local storage suitable for intensive workloads. Local storage for hosted servers is configured to your needs; some hosted servers have over 500 TB of attached storage.

BioHPC offers user-configurable, automated backups of storage, hosted in a separate building at Cornell.

Storage can be accessed from outside of the BioHPC environment using NFS, SMB, SSHFS, and Globus.

CAC provides a 1.9 PB Ceph cluster, which offers:

·        Volumes – persistent disk storage that can be attached/detached from instances.

·        Object Storage – fully Amazon S3-compatible; access via any S3 client/SDK or through the Red Cloud Object Storage collection on Globus.

·        Distributed File System – mount CephFS on Linux instances for reliable, high-performance storage.
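Because the object store is S3-compatible, a standard S3 SDK can be pointed at it by overriding the endpoint URL. The sketch below assumes the AWS `boto3` library and credentials supplied through the usual `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables; the `RED_CLOUD_S3_ENDPOINT` variable and its placeholder default are hypothetical, so the real endpoint and keys should be taken from CAC’s Red Cloud documentation.

```python
import os

# Hypothetical: set RED_CLOUD_S3_ENDPOINT to the endpoint URL from CAC's docs.
DEFAULT_ENDPOINT = os.environ.get("RED_CLOUD_S3_ENDPOINT",
                                  "https://s3.example.invalid")


def s3_client_kwargs(endpoint_url: str = DEFAULT_ENDPOINT) -> dict:
    """Keyword arguments for boto3.client() targeting an S3-compatible endpoint."""
    return {
        "service_name": "s3",
        "endpoint_url": endpoint_url,
        # Credentials are read from the environment rather than hard-coded.
        "aws_access_key_id": os.environ.get("AWS_ACCESS_KEY_ID"),
        "aws_secret_access_key": os.environ.get("AWS_SECRET_ACCESS_KEY"),
    }


def upload(local_path: str, bucket: str, key: str) -> None:
    """Upload one file to the object store (requires `pip install boto3`)."""
    import boto3  # third-party; imported here so s3_client_kwargs works without it

    s3 = boto3.client(**s3_client_kwargs())
    s3.upload_file(local_path, bucket, key)
```

For example, `upload("results.tsv", "my-lab-bucket", "project1/results.tsv")` would copy a local file into a (hypothetical) bucket; the only difference from talking to Amazon S3 itself is the `endpoint_url` argument.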

In addition to Ceph storage, CAC provides Globus-accessible archive storage for long-term data retention.

Consulting and Research Support

The Bioinformatics Facility team includes PhD-level researchers who divide their time between BioHPC maintenance and development, user support, and research collaborations.

Basic consulting and support, delivered via e-mail, Zoom office hours, and workshops, are included in BioHPC rental and hosting fees. Topics range from basic system and software usage questions to data management, pipeline optimization, and help with the design of research projects and grant applications.

We also provide long-term, in-depth support for Life Sciences research programs through hourly consulting or by committing time to grants.

CAC provides a wide range of consulting and research software engineering services:

·        Proposal and Project Development

·        AI and Machine Learning research support

·        Cloud consulting and applications for Red Cloud, AWS, Azure, Google Cloud, and more

·        Help getting started on NSF ACCESS-CI and NAIRR resources

·        Software Design and Development, from specialized software components to web application portals

·        Code improvement and performance optimizations

·        Database design and implementation

·        Data visualization

·        Workflow, data management, and automation

·        Instrument control

·        Workshops and Training


