What’s the best way to run Jenkins on AWS? As Jenkins is still a popular automation server used for continuous integration and deployment, consulting clients have engaged me to design and implement a cloud architecture for Jenkins several times in recent years. This blog post summarizes my learnings and presents a modern architecture for Jenkins on AWS.
As shown in the following figure, our architecture consists of the following building blocks:
- An Auto Scaling Group (ASG) to ensure an EC2 instance running the Jenkins controller is up and running 24/7.
- The Jenkins plugin ec2 launches and terminates EC2 instances acting as Jenkins agents automatically.
- An Application Load Balancer (ALB) acts as the entry point for users and forwards the incoming request to the Jenkins controller. On top of that, the ALB terminates HTTPS and handles authentication.
- The Cognito User Pool provides a secure way to authenticate with Jenkins. By the way, replacing the Cognito User Pool with any OpenID Connect compatible identity provider is possible.
- An Elastic File System (EFS) ensures that no data is lost when the Jenkins controller is replaced because of a failure or a rolling update.
- AWS Backup takes snapshots of EFS so that data can be restored from a point-in-time snapshot, for example, to recover from human error.
- All EC2 instances send metrics and logs to CloudWatch. Alarms notify operators in case of problems. Also, operators have access to centralized logs to analyze issues.
- The Simple Email Service (SES) allows Jenkins to send notifications to engineers, for example, to notify them about failed jobs.
- Engineers access EC2 instances securely using the Systems Manager Session Manager. The Session Manager replaces SSH and, therefore, removes the need for publicly accessible inbound ports.
I do not recommend running Jenkins on a single EC2 instance. Next, you will learn how I set up Jenkins more professionally, taking high availability, scalability, security, and operations into account.
High Availability
The impact of a Jenkins downtime is huge for many organizations. Engineers get blocked, and deployments become impossible. Therefore, ensuring high availability is crucial. Unfortunately, Jenkins cannot be run as a cluster; only one Jenkins controller should run at a time.
I recommend using an Auto Scaling Group (ASG) to ensure a single Jenkins controller is running 24/7. If the EC2 instance fails, the ASG will launch a new EC2 instance, if necessary in another availability zone. As all data is stored on EFS, which distributes the data among multiple availability zones out-of-the-box, Jenkins will recover within minutes.
Also, the ASG allows rolling out new versions of the Jenkins controller with short downtime. In this case, the running EC2 instance is terminated and replaced by a new instance.
It is worth noting that the Application Load Balancer will make sure that the requests from the engineers get routed to the active Jenkins controller. No need to update DNS names or IP addresses.
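The single-controller setup described above could be sketched as follows. This is a minimal sketch, not the exact configuration used in my projects: it only builds the keyword arguments for the boto3 call create_auto_scaling_group, and all names, IDs, and ARNs are hypothetical placeholders.

```python
def controller_asg_params(name, launch_template_id, subnet_ids, target_group_arn):
    """Build keyword arguments for boto3's autoscaling create_auto_scaling_group."""
    if len(subnet_ids) < 2:
        # Assumption: each subnet lives in a different availability zone.
        raise ValueError("pass at least two subnets so a replacement can start elsewhere")
    return {
        "AutoScalingGroupName": name,
        "LaunchTemplate": {"LaunchTemplateId": launch_template_id, "Version": "$Latest"},
        # Exactly one Jenkins controller at any time.
        "MinSize": 1,
        "MaxSize": 1,
        "DesiredCapacity": 1,
        # Comma-separated list of subnet IDs the ASG may launch into.
        "VPCZoneIdentifier": ",".join(subnet_ids),
        # Register the controller with the ALB target group.
        "TargetGroupARNs": [target_group_arn],
        # Replace the instance when the ALB health check fails, not only on EC2 failures.
        "HealthCheckType": "ELB",
        "HealthCheckGracePeriod": 300,
    }


# Hypothetical example values.
params = controller_asg_params(
    name="jenkins-controller",
    launch_template_id="lt-0123456789abcdef0",
    subnet_ids=["subnet-aaa", "subnet-bbb"],
    target_group_arn="arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/jenkins/0123456789abcdef",
)
# To apply: boto3.client("autoscaling").create_auto_scaling_group(**params)
```

Setting the health check type to ELB is a design choice: the ASG then also replaces a controller whose EC2 instance is still running but no longer answers HTTP requests.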
Scalability
I remember a consulting gig where the client ran a CI/CD infrastructure in-house. Every Friday afternoon, I wanted to deploy my changes before leaving the office to catch the train back home. But on Fridays, the CI/CD infrastructure was heavily utilized. I was not the only one preparing for the weekend. Sometimes, I had to wait more than 30 minutes for my build job to start.
Your automation server should not slow down engineers because of limited resources. Instead, the number of Jenkins agents should scale automatically to balance peak loads.
Therefore, I recommend the Jenkins plugin ec2, which launches and terminates EC2 instances acting as Jenkins agents as needed. The plugin even allows you to utilize spot instances for cost efficiency.
Security
In my observation, many organizations underestimate the potential damage caused by a security breach in their Jenkins infrastructure. An attacker gaining access to Jenkins could most likely deploy changes to production or delete the whole production infrastructure, including all databases and backups.
First, a Jenkins server available via the Internet enables a smooth experience for remote workers. However, I am skeptical about relying on the authentication layer built into Jenkins. I prefer delegating authentication to the Application Load Balancer (ALB). Luckily, the ALB supports a widespread standard called OpenID Connect. If your organization does not provide a user directory with OpenID Connect support yet, the Cognito User Pool offered by AWS is a good fit.
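To illustrate how the ALB can handle authentication before a request reaches Jenkins, here is a minimal sketch of the listener actions in the shape boto3's elbv2 create_listener expects. The ARNs, client ID, and domain are hypothetical placeholders.

```python
def https_listener_actions(user_pool_arn, client_id, user_pool_domain, target_group_arn):
    """Build DefaultActions for an HTTPS listener: authenticate first, then forward."""
    return [
        {
            "Type": "authenticate-cognito",
            "Order": 1,
            "AuthenticateCognitoConfig": {
                "UserPoolArn": user_pool_arn,
                "UserPoolClientId": client_id,
                "UserPoolDomain": user_pool_domain,
                # Redirect unauthenticated users to the login page.
                "OnUnauthenticatedRequest": "authenticate",
            },
        },
        # Only authenticated requests reach the Jenkins controller.
        {"Type": "forward", "Order": 2, "TargetGroupArn": target_group_arn},
    ]


# Hypothetical example values.
actions = https_listener_actions(
    user_pool_arn="arn:aws:cognito-idp:eu-west-1:111111111111:userpool/eu-west-1_EXAMPLE",
    client_id="example-client-id",
    user_pool_domain="jenkins-example",
    target_group_arn="arn:aws:elasticloadbalancing:eu-west-1:111111111111:targetgroup/jenkins/0123456789abcdef",
)
# To apply (load_balancer_arn and certificate_arn are placeholders you must supply):
# boto3.client("elbv2").create_listener(
#     LoadBalancerArn=load_balancer_arn, Protocol="HTTPS", Port=443,
#     Certificates=[{"CertificateArn": certificate_arn}], DefaultActions=actions)
```

For another OpenID Connect identity provider, the first action would use the type authenticate-oidc with its own configuration block instead.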
Second, you should encrypt data in transit and at rest. Luckily, the ALB handles HTTPS out-of-the-box. Jenkins and EFS provide TLS encrypted connections as well. Besides that, encryption-at-rest is only a configuration option away for EBS and EFS.
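As a rough sketch of what "a configuration option away" means, the following builds the parameters for an encrypted EFS file system (boto3 efs create_file_system) and an encrypted EBS root volume (a BlockDeviceMappings entry of an EC2 launch template). The creation token, device name, volume size, and tag are assumptions for illustration.

```python
def encrypted_efs_params(creation_token):
    """Build keyword arguments for boto3's efs create_file_system with encryption at rest."""
    return {
        "CreationToken": creation_token,
        "Encrypted": True,  # uses the AWS managed key unless KmsKeyId is set
        "Tags": [{"Key": "Name", "Value": "jenkins"}],
    }


def encrypted_root_volume(device_name="/dev/xvda", size_gib=30):
    """Build one BlockDeviceMappings entry for a launch template with an encrypted EBS volume."""
    return {
        "DeviceName": device_name,
        "Ebs": {
            "Encrypted": True,
            "VolumeSize": size_gib,
            "VolumeType": "gp3",
            "DeleteOnTermination": True,
        },
    }


efs_params = encrypted_efs_params("jenkins-home")
root_volume = encrypted_root_volume()
# To apply: boto3.client("efs").create_file_system(**efs_params)
```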
Third, I cannot stress enough the importance of least privilege for IAM roles. You should grant minimal access to AWS services for EC2 instances running the Jenkins controller and agents.
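What least privilege could look like for the controller is sketched below: an IAM policy document that only allows stopping, starting, and terminating instances carrying a specific tag. The tag key and value are hypothetical, and the action list is deliberately incomplete. Check the documentation of the ec2 plugin for the permissions it actually requires, for example to launch instances.

```python
import json


def agent_lifecycle_policy(tag_key="jenkins-agent", tag_value="true"):
    """Build an IAM policy document limited to instances tagged as Jenkins agents."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DescribeInstances",
                "Effect": "Allow",
                "Action": ["ec2:DescribeInstances"],
                # Describe actions do not support resource-level restrictions.
                "Resource": "*",
            },
            {
                "Sid": "ManageTaggedAgents",
                "Effect": "Allow",
                "Action": ["ec2:StartInstances", "ec2:StopInstances", "ec2:TerminateInstances"],
                "Resource": "arn:aws:ec2:*:*:instance/*",
                # Assumption: agents are tagged with this (hypothetical) key and value.
                "Condition": {"StringEquals": {f"aws:ResourceTag/{tag_key}": tag_value}},
            },
        ],
    }


policy = agent_lifecycle_policy()
print(json.dumps(policy, indent=2))
```

The condition is the important part: even if an attacker takes over the controller, the role cannot terminate instances that are not tagged as agents.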
Operations
Although operating Jenkins professionally creates immense value, there is no glory in managing Jenkins in most organizations. Therefore, it is even more critical to get the job done with as little effort as possible.
I recommend the following tools to operate Jenkins with little effort:
- Use CloudWatch to define alarms on metrics like CPUUtilization of the EC2 instances, HTTPCode_ELB_5XX_Count and TargetConnectionErrorCount of the ALB, and PercentIOLimit, BurstCreditBalance, as well as MeteredIOBytes of EFS.
- Configure all EC2 instances to send logs to CloudWatch. This allows you to search through the log messages of all instances, even those already terminated, to debug issues.
- Use the Session Manager provided by the Systems Manager to connect to your EC2 instances securely. No need to establish an SSH connection over the Internet anymore.
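The alarms mentioned above could be defined along these lines. This is a sketch that builds keyword arguments for boto3's cloudwatch put_metric_alarm. The thresholds, periods, resource identifiers, and the SNS topic are assumptions that you should adapt to your workload.

```python
def alarm_params(name, namespace, metric, dimensions, statistic, threshold,
                 comparison="GreaterThanThreshold", topic_arn=None):
    """Build keyword arguments for boto3's cloudwatch put_metric_alarm."""
    params = {
        "AlarmName": name,
        "Namespace": namespace,
        "MetricName": metric,
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "Statistic": statistic,
        "Period": 300,  # seconds
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": comparison,
        # Assumption: missing data points are not treated as a problem.
        "TreatMissingData": "notBreaching",
    }
    if topic_arn:
        params["AlarmActions"] = [topic_arn]  # notify operators, e.g. via an SNS topic
    return params


# Hypothetical resource identifiers and thresholds.
topic = "arn:aws:sns:eu-west-1:111111111111:jenkins-alerts"
efs = {"FileSystemId": "fs-0123456789abcdef0"}
alarms = [
    alarm_params("jenkins-controller-cpu", "AWS/EC2", "CPUUtilization",
                 {"AutoScalingGroupName": "jenkins-controller"}, "Average", 80.0, topic_arn=topic),
    alarm_params("jenkins-alb-5xx", "AWS/ApplicationELB", "HTTPCode_ELB_5XX_Count",
                 {"LoadBalancer": "app/jenkins/0123456789abcdef"}, "Sum", 0.0, topic_arn=topic),
    alarm_params("jenkins-efs-io-limit", "AWS/EFS", "PercentIOLimit",
                 efs, "Maximum", 90.0, topic_arn=topic),
    # A low burst credit balance is the problem, so compare with "less than".
    alarm_params("jenkins-efs-burst-credits", "AWS/EFS", "BurstCreditBalance",
                 efs, "Minimum", 1e12, comparison="LessThanThreshold", topic_arn=topic),
]
# To apply:
# cloudwatch = boto3.client("cloudwatch")
# for alarm in alarms:
#     cloudwatch.put_metric_alarm(**alarm)
```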
Summary
How to set up Jenkins on AWS? A few learnings.
- Launching a single EC2 instance is insufficient to build a highly available and scalable solution.
- Leveraging AWS services like the Application Load Balancer, an Auto Scaling Group, and an Elastic File System enables high availability for Jenkins and avoids costly downtimes.
- The topic of security is critical when running Jenkins. For example, I recommend implementing a separate authentication layer via the ALB and OpenID Connect.