This project demonstrates best-in-class standards and practices for running an Elasticsearch cluster on Azure Kubernetes Service (AKS) and aims to implement as many of the best practices recommended by Azure as possible.
- Ingress with valid certificates and DNS is not yet implemented.
- AKS Backup instances are not automatically configured when the backup feature flag is enabled.
Note: Backup requires Storage Accounts with Allow Storage Account Key Access, offering no support for Managed Identity -- Bad Practice.
Note: Flux Configurations with AzureBlob don't work with NAT Gateway, so they require Public Network Access -- Bad Practice.
-
Enable Threat Protection with Defender for Containers.
Enable Defender for Containers to help secure containers. Defender for Containers can assess cluster configurations and provide security recommendations, run vulnerability scans, and provide real-time protection and alerting for Kubernetes nodes and clusters.
-
Use Microsoft Entra ID and Kubernetes role-based access control (Kubernetes RBAC) to secure API server access.
Secure access to the Kubernetes API server. To control access to the API server, integrate Kubernetes RBAC with Microsoft Entra ID. This integration also allows MFA to be enforced for API server access.
-
Upgrade AKS clusters to the latest Kubernetes version with Automatic Upgrades.
Stay current on new features and bug fixes, with automated regular upgrades of the Kubernetes version in the AKS cluster.
-
Keep nodes up to date and automatically apply node OS security patches.
Linux nodes in AKS get security patches through their distro update channel nightly.
-
Use Azure Linux for the nodes.
The Azure Linux Container Host is an operating system image that's optimized for running container workloads on Azure Kubernetes Service (AKS). Microsoft maintains the Azure Linux Container Host and based it on CBL-Mariner, an open-source Linux distribution created by Microsoft.
-
Disable SSH access to the nodes using AKS Preview feature DisableSSHPreview.
To improve security and support your corporate security requirements or strategy, AKS supports disabling SSH (preview) both on the cluster and at the node pool level.
-
Prohibit changes made directly to resources in the node resource group using AKS Preview feature NRGLockdownPreview.
To prevent changes from being made to the node resource group, AKS can apply a deny assignment that blocks users from modifying resources created as part of the AKS cluster.
-
Scan for and remediate image vulnerabilities.
Verify the security of images and runtime used in applications being hosted in the cluster.
[!NOTE] This solution uses the latest version of the Elastic Cluster operator by default but allows a specific version to be specified.
-
Automatically trigger and redeploy container images when a base image is updated.
Use automation to build new images when the base image is updated. Since updated base images typically include security fixes, update any downstream application container images.
[!NOTE] This solution uses Flux but does not currently use the image automation controller.
-
Remove any unused images from the node to reduce vulnerabilities using Image Cleaner.
Perform automatic image identification and removal, which mitigates the risk of stale images and reduces the time required to clean them up.
-
Use pod security context to limit access to processes and services and to prevent privilege escalation.
To run as a different user or group and limit access to the underlying node processes and services, define pod security context settings. Assign the least number of privileges required.
[!NOTE] This solution does not currently use AppArmor.
-
Use a Microsoft Entra Workload Identity to authenticate the workload to Azure services.
A workload identity is an identity used by an application running on a pod that can authenticate itself against other Azure services that support it, such as Storage or Key Vault.
-
Use Azure Key Vault with Secrets Store CSI Driver to manage secrets at runtime.
Secrets Store CSI Driver is an open-source CSI driver that lets you store secrets in Azure Key Vault and mount them as files for use by workloads running in your cluster.
-
Use Policy to watch for and alert on best practices using AKS Policy Deployment Safeguards.
Deployment safeguards enforce Kubernetes best practices in your AKS cluster through Azure Policy controls.
-
Use Azure CNI Overlay for enhanced network security and performance.
Cluster nodes are deployed into an Azure Virtual Network (VNet) subnet. Pods are assigned IP addresses from a private CIDR logically different from the VNet hosting the nodes. Pod and node traffic within the cluster uses an overlay network. Network Address Translation (NAT) uses the node's IP address to reach resources outside the cluster.
-
Use a managed NAT Gateway to provide outbound access to the internet.
NAT Gateway is a managed service that provides outbound internet connectivity for the AKS cluster. Because outbound traffic leaves through known addresses, access to other Azure resources such as Key Vault can be isolated with network firewall rules.
-
Use a managed App Routing or Service Mesh add-on to route external traffic to the cluster.
App Routing is an nginx ingress controller add-on for AKS that provides a fully integrated ingress controller for applications running in the cluster.
-
Use Azure Managed Disks for storage.
Azure Managed Disks are block-level storage volumes that are attached to a VM for the purposes of storing data.
-
Use Azure AKS Backup to back up the cluster.
Azure Backup is a fully managed backup service for Azure resources. It provides a secure and reliable way to back up and restore data from Azure resources.
-
Use Azure Verified Modules to deploy the infrastructure as code.
Azure Verified Modules are pre-validated modules that are designed to work together seamlessly to deploy infrastructure as code.
-
Use Azure GitOps Configurations to deploy the workload applications.
Azure GitOps Configurations are a set of tools and services that enable you to use Git as a single source of truth for your infrastructure and workload applications.
-
Use Application Configuration Provider to manage feature flags and configuration information.
Application Configuration Provider is a fully managed feature flag and configuration management service for applications running on AKS.
-
Use Node Auto Provisioning to reduce the number of nodes in the cluster and optimize for cost.
Node Auto Provisioning is a feature that automatically provisions nodes in the cluster based on the configuration specified in the node template.
-
Use AKS managed KEDA to scale the cluster pods based on the workload.
KEDA is a Kubernetes-based Event Driven Autoscaler. It provides a way to scale Kubernetes workloads based on events from external systems.
-
Use Vertical Pod Autoscaler to ensure the proper memory and CPU resources are allocated for pods.
When configured, the VPA automatically sets resource requests and limits on containers per workload based on past usage. The VPA frees up CPU and memory for other pods and helps ensure effective utilization of your AKS clusters.
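Several of the practices above correspond to flags on `az aks update`. The following is a minimal sketch only, not how this project deploys its infrastructure (that is done with Azure Verified Modules). The function name `enable_cluster_features`, the resource group and cluster name arguments, and the chosen channel and interval values are illustrative assumptions.

```shell
# Hypothetical helper: enable a handful of the features described above
# on an existing cluster. $1 = resource group, $2 = cluster name.
enable_cluster_features() {
  # Defender for Containers, automatic Kubernetes and node OS upgrades,
  # Image Cleaner, KEDA, and Vertical Pod Autoscaler.
  # The channel values and the 24-hour interval are assumed examples.
  az aks update \
    --resource-group "$1" \
    --name "$2" \
    --enable-defender \
    --auto-upgrade-channel stable \
    --node-os-upgrade-channel NodeImage \
    --enable-image-cleaner \
    --image-cleaner-interval-hours 24 \
    --enable-keda \
    --enable-vpa
}
```

It could be invoked as `enable_cluster_features my-resource-group my-aks-cluster`, where both names are placeholders.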
Note: Observability is not yet completed.
- Managed Prometheus
- Container Insights
- Azure Managed Grafana
- Container Insights Workbooks
- Azure Policy Dashboards
- Prometheus Alert Rules
- Azure Action Groups
-
Test Workload (Test Stamp)
-
Elasticsearch (Elastic Stamp)
-
PostgreSQL (PostgreSQL Stamp) -- Not yet implemented.
-
Redis (Redis Stamp) -- Not yet implemented.
-
Airflow (Airflow Stamp) -- Not yet implemented.
To use AKS Automatic in preview, you must register feature flags for other required features. Register the following flags using the az feature register command.
az feature register --namespace Microsoft.ContainerService --name EnableAPIServerVnetIntegrationPreview
az feature register --namespace Microsoft.ContainerService --name NRGLockdownPreview
az feature register --namespace Microsoft.ContainerService --name SafeguardsPreview
az feature register --namespace Microsoft.ContainerService --name NodeAutoProvisioningPreview
az feature register --namespace Microsoft.ContainerService --name DisableSSHPreview
az feature register --namespace Microsoft.ContainerService --name AutomaticSKUPreview
Verify the registration status by using the az feature show command. It takes a few minutes for the status to show Registered:
az feature show --namespace Microsoft.ContainerService --name AutomaticSKUPreview
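Because six flags are registered, it can be convenient to check them all at once. The sketch below assumes the flag list shown above and uses the `--query properties.state` output of `az feature show`. The variable `FLAGS` and the function `check_flags` are hypothetical names introduced for illustration.

```shell
# Preview feature flags registered above.
FLAGS="EnableAPIServerVnetIntegrationPreview NRGLockdownPreview SafeguardsPreview NodeAutoProvisioningPreview DisableSSHPreview AutomaticSKUPreview"

# Print the state of each flag; return non-zero if any flag is not yet Registered.
check_flags() {
  pending=0
  for flag in $FLAGS; do
    state=$(az feature show \
      --namespace Microsoft.ContainerService \
      --name "$flag" \
      --query properties.state \
      --output tsv)
    echo "$flag: $state"
    [ "$state" = "Registered" ] || pending=1
  done
  return $pending
}
```

Running `check_flags` repeatedly (for example every minute) until it succeeds is one way to wait for registration to finish.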
When the status reflects Registered, refresh the registration of the Microsoft.ContainerService resource provider by using the az provider register command: