Navigating the EC2 Instance Maze
Picture this: You're ready to launch an EC2 instance, and suddenly, you're faced with an alphabet soup of options - t2.micro, t3a.xlarge, m6a.4xlarge, c6i.2xlarge, x2iezn.metal. With AWS offering well over 750 instance types, choosing the right one can feel overwhelming. Don't worry - this guide will help you navigate through the maze of AWS instance types and make informed decisions for your workload.
The Foundation: Understanding Nitro
Before diving into instance types, let's discuss something fundamental: AWS Nitro. Nitro is AWS's underlying virtualization platform that powers modern EC2 instances. It's a collection of purpose-built hardware and software components that provide:
- Enhanced security through hardware-based isolation
- Improved performance with dedicated hardware acceleration
- Better networking capabilities
- More consistent performance across instance types
Pro Tip: As a general rule, prefer instance types that support Nitro. Nitro is used for all new generations and offers better performance and security features. Previous-generation instances lack support for integrations such as Network Load Balancers and other common features, and the performance difference can be noticeable.
Instance Naming Conventions
AWS instance names might look like cryptic codes, but they follow a somewhat logical pattern. Let's break down an instance name like "c7gn.xlarge":
- Instance Family (c): Indicates the use case (e.g., compute optimised)
- Generation Number (7): Higher numbers mean newer hardware
- Additional Capabilities (gn): Processor type and/or special features
- Size (xlarge): Determines the share of the underlying physical host's resources allocated to the instance
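The breakdown above can be sketched as a small parser. This is a minimal illustration, not an official AWS grammar: the regular expression and the `InstanceName` and `parse_instance_name` names are assumptions made for this example, and some names (for example the high-memory `u-` types) will not match it.

```python
import re
from typing import NamedTuple


class InstanceName(NamedTuple):
    family: str        # e.g. "c" (compute optimised)
    generation: int    # e.g. 7
    capabilities: str  # e.g. "gn"; empty when there are none
    size: str          # e.g. "xlarge"


# Family letters, generation digits, optional capability letters
# (plus an optional "-flex" style suffix), a dot, then the size.
_PATTERN = re.compile(r"^([a-z]+)(\d+)([a-z]*(?:-[a-z]+)?)\.([a-z0-9-]+)$")


def parse_instance_name(name: str) -> InstanceName:
    """Split a name such as "c7gn.xlarge" into its four parts."""
    match = _PATTERN.match(name.lower())
    if match is None:
        raise ValueError(f"unrecognised instance type name: {name!r}")
    family, generation, capabilities, size = match.groups()
    return InstanceName(family, int(generation), capabilities, size)


if __name__ == "__main__":
    for example in ("c7gn.xlarge", "t2.micro", "x2iezn.metal", "m7i-flex.large"):
        print(example, "->", parse_instance_name(example))
```

Names that fall outside the pattern raise a `ValueError` rather than being guessed at, which is usually the safer behaviour for a naming scheme with this many exceptions.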
Instance Families
AWS provides groupings of instance families, which are types of hardware targeted for specific use cases.
- General Purpose (m, t)
- Balanced compute, memory, and networking resources. Best for applications that use these resources in equal proportions (web servers, development environments, small databases, code repositories)
- Compute Optimised (c)
- Optimized for compute-intensive workloads. Ideal for batch processing
- Memory Optimised (r, u, u-1, x, z)
- For workloads that process large datasets in memory. Perfect for databases, distributed web scale cache stores, real-time big data analytics
- Storage Optimised (i, im, is, d, h)
- High, sequential read/write access to large datasets on local storage. Built for data warehousing, log processing, distributed file systems
- Accelerated Computing (p, g, trn, inf, dl, f, vt)
- Hardware accelerators or co-processors for graphics and data pattern matching. Designed for machine learning, video encoding, 3D visualizations
- High-Performance Computing (hpc)
- Optimized for high-performance computing workloads requiring high levels of inter-instance communication and network performance. Used for complex scientific simulations, financial risk modeling, weather prediction
Generation
The generation number of an instance type refers to the hardware generation. Generation numbers are often aligned across families, such as general-purpose, compute-optimised, and memory-optimised. Specialised instance types typically increment on their own timelines.
For the latest generations of General-purpose, Compute and Memory-optimized instances, the generation + CPU type combination results in the same underlying physical CPU type. This trend follows for additional capabilities - you can visualise these additional capabilities as running the same hardware, but using additional Nitro cards to support things like additional networking (n) or onboard storage (d).
- m6a, c6a, r6a = AMD EPYC 7R13 Processor
- m6i, c6i, r6i = Intel Xeon 8375C (Ice Lake)
- m6g, m6gd = AWS Graviton 2 Processor
- m8g, c8g, r8g = AWS Graviton 4 Processor
As a result, when moving across the same generation of instances, you can be reasonably confident you will get very similar CPU performance from two comparable instances. For example, using benchmarking tools such as PassMark in CPU-bound tests, you will get similar results for r8g.4xlarge, c8g.4xlarge and m8g.4xlarge instances. You can observe this pattern on published benchmark websites, such as RunsOn.
This makes capacity planning and instance type selection slightly more straightforward to understand. However, it should be noted this rule does not apply to older instances; for example, the c5 uses Intel Xeon Platinum 8124M vs the m5's Intel Xeon Platinum 8175. As with all parts of EC2 instance selection, there are exceptions.
Processor Types and Architecture Indicators
For new generations, the letters after the generation number tell you essential information about the processor, for example:
- a: AMD processors (e.g., c7a = AMD EPYC 9R14)
- i: Intel processors (e.g., c7i = Intel Xeon Sapphire Rapids)
- g: AWS Graviton processors (e.g., m7g = AWS Graviton 3)
This convention can break down for specialised instance types which don't come with CPU distinctions, such as the g6 or p5 instances. In these cases, instances are typically provisioned with Intel CPUs.
Additional Capabilities
Further letters highlight additional capabilities of specific instance types:
- z: High-frequency CPU boost (e.g. all-core turbo up to 4.5 GHz)
- b: Block storage optimised - comes with additional EBS networking capacity
- e: High memory-to-CPU ratio
- n: Enhanced networking capacity
- d: Instance storage (local SSDs)
- flex: Burstable CPU
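The processor and capability letters listed above can be turned into a simple lookup. This is a sketch only: `describe_capabilities` is a hypothetical helper, it covers just the letters mentioned in this article, and it expects the part of the name after the generation number (for example "gn" from c7gn), not the family letter.

```python
# Meanings taken from the two lists above; anything else is reported as unknown.
PROCESSORS = {
    "a": "AMD processors",
    "i": "Intel processors",
    "g": "AWS Graviton processors",
}
CAPABILITIES = {
    "z": "High-frequency CPU boost",
    "b": "Block storage optimised",
    "e": "High memory-to-CPU ratio",
    "n": "Enhanced networking capacity",
    "d": "Instance storage (local SSDs)",
}


def describe_capabilities(suffix: str) -> list[str]:
    """Explain the letters that follow the generation number."""
    # "i-flex" splits into the single letters "i" and the word "flex".
    letters, _, extra = suffix.partition("-")
    notes = []
    for letter in letters:
        meaning = PROCESSORS.get(letter) or CAPABILITIES.get(letter)
        notes.append(meaning if meaning else f"unknown suffix letter {letter!r}")
    if extra == "flex":
        notes.append("Burstable CPU")
    elif extra:
        notes.append(f"unknown suffix {extra!r}")
    return notes


if __name__ == "__main__":
    for suffix in ("gn", "iezn", "i-flex"):
        print(suffix, "->", describe_capabilities(suffix))
```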
Instance Size
The sizing approach is relatively consistent across instance types. Instance sizes affect CPU and memory allocation, but they have other impacts, as discussed below.
AWS uses the term vCPU to describe an allocation of the host's CPU. For fixed-performance instance types (not burstable), the whole CPU core is typically dedicated to the launched instance.
The term vCPU comes with the following rules (and exceptions!):
- For Intel/AMD, an instance is assigned a minimum of 1 CPU core (to avoid specific CPU vulnerabilities such as side-channel attacks). This core is typically multithreaded using Simultaneous Multi-Threading (SMT). Therefore, 2 vCPUs map to one underlying multithreaded CPU core. This rule currently has exceptions for M7a, R7a, C7a instances, T2 instances, and m3.medium [1].
- For Graviton/ARM, each vCPU is a dedicated single-threaded core.
The labelling for sizes follows this table for "vCPUs".
Size Category | vCPU Allocation | Note |
---|---|---|
medium | 1 | Supported by instances without hyperthreading (ARM, c7a, m7a, r7a, t2, etc.) |
large | 2 | |
xlarge | 4 | |
2xlarge | 8 | |
4xlarge | 16 | |
8xlarge | 32 | |
12xlarge | 48 | |
16xlarge | 64 | |
24xlarge | 96 | |
32xlarge | 128 | |
48xlarge | 192 | |
metal | Host Capacity | |
Therefore, it follows that the following instance types all have 32 vCPUs:
- m6a.8xlarge
- c6i.8xlarge
- r6g.8xlarge
The AMD and Intel instance types will have 32 vCPUs over 16 physical cores (unless you work with the exceptions listed above), and the Graviton instance type will have 32 vCPUs across 32 physical cores.
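The vCPU arithmetic can be checked with a short sketch. It assumes the size table and the exceptions exactly as listed in this article, and it guesses Graviton from a "g" directly after the generation number; `threads_per_core` and `physical_cores` are illustrative names, not AWS APIs, and sizes outside the table (such as metal) are not handled.

```python
import string

# vCPUs per size, copied from the table above.
SIZE_TO_VCPUS = {
    "medium": 1, "large": 2, "xlarge": 4, "2xlarge": 8, "4xlarge": 16,
    "8xlarge": 32, "12xlarge": 48, "16xlarge": 64, "24xlarge": 96,
    "32xlarge": 128, "48xlarge": 192,
}

# Families this article lists as exceptions to the 2-vCPUs-per-core rule.
NO_SMT_FAMILIES = {"m7a", "r7a", "c7a", "t2"}


def threads_per_core(instance_type: str) -> int:
    """Return 1 for single-threaded cores, 2 where SMT is assumed."""
    family = instance_type.split(".")[0]
    if family in NO_SMT_FAMILIES or instance_type == "m3.medium":
        return 1
    # Strip the family letters and generation digits, e.g. "r6g" -> "g".
    suffix = family.lstrip(string.ascii_lowercase).lstrip(string.digits)
    # Assumption: a leading "g" in the suffix means Graviton (one thread per core).
    return 1 if suffix.startswith("g") else 2


def physical_cores(instance_type: str) -> int:
    """Estimate physical cores from the size label and threading model."""
    size = instance_type.split(".")[1]
    vcpus = SIZE_TO_VCPUS[size]  # raises KeyError for sizes not in the table
    return max(1, vcpus // threads_per_core(instance_type))


if __name__ == "__main__":
    for name in ("m6a.8xlarge", "c6i.8xlarge", "r6g.8xlarge"):
        print(name, "->", physical_cores(name), "physical cores")
```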
The hidden details that will catch you out
You've launched a c7i.2xlarge, great, but wait - writing to EBS becomes slow after some time - why is this?
There are further aspects of the hardware which are tied to instance size; these are:
- Networking Performance
- EBS Baseline, Maximum Throughput and I/O Operations/second
- Allocatable GPUs or other accelerated Computing modules
- NVMe Storage Capacity
Looking at the c7i instance type, we can see that EBS throughput is limited on smaller instance sizes.
Such information is available in AWS Documentation. There is currently a bug in Vantage.sh which shows this EBS throughput information incorrectly.
Type | Networking (Gbps) | EBS MB/s Baseline | EBS MB/s Burst |
---|---|---|---|
c7i.large | Up to 12.5 | 81.25 | 1250.0 |
c7i.xlarge | Up to 12.5 | 156.25 | 1250.0 |
c7i.2xlarge | Up to 12.5 | 312.5 | 1250.0 |
c7i.4xlarge | Up to 12.5 | 625.0 | 1250.0 |
c7i.8xlarge | 12.5 | 1250.0 | 1250.0 |
c7i.12xlarge | 18.75 | 1875.0 | 1875.0 |
c7i.16xlarge | 25 | 2500.0 | 2500.0 |
c7i.24xlarge | 37.5 | 3750.0 | 3750.0 |
c7i.metal-24xl | 37.5 | 3750.0 | 3750.0 |
c7i.48xlarge | 50 | 5000.0 | 5000.0 |
c7i.metal-48xl | 50 | 5000.0 | 5000.0 |
Important: Check the full details of the instance type you are launching to understand the limitations! It's not just CPU + Memory!
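One way to check these details programmatically is the EC2 DescribeInstanceTypes API. The sketch below assumes boto3 is installed and AWS credentials are configured; `summarise_instance_type` and `fetch_instance_type` are hypothetical helper names, and the default region is only a placeholder.

```python
from typing import Any, Dict


def summarise_instance_type(info: Dict[str, Any]) -> Dict[str, Any]:
    """Pick the size-dependent limits out of one DescribeInstanceTypes entry."""
    # EBS-optimised details may be absent on some types, so fall back to None.
    ebs = info.get("EbsInfo", {}).get("EbsOptimizedInfo", {})
    return {
        "type": info["InstanceType"],
        "vcpus": info["VCpuInfo"]["DefaultVCpus"],
        "memory_gib": info["MemoryInfo"]["SizeInMiB"] / 1024,
        "network": info["NetworkInfo"]["NetworkPerformance"],
        "ebs_baseline_mbps": ebs.get("BaselineThroughputInMBps"),
        "ebs_max_mbps": ebs.get("MaximumThroughputInMBps"),
    }


def fetch_instance_type(name: str, region: str = "us-east-1") -> Dict[str, Any]:
    """Query the EC2 API for one instance type and summarise it."""
    import boto3  # imported lazily so the summary helper works without it

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_instance_types(InstanceTypes=[name])
    return summarise_instance_type(response["InstanceTypes"][0])
```

Calling `fetch_instance_type("c7i.2xlarge")` should return a small dictionary that puts the EBS baseline and maximum throughput next to the vCPU and memory figures, which makes the gap between baseline and burst easy to spot.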
When running between .large -> .4xlarge, instances can typically be throttled on EBS throughput, networking throughput, and IOPS. This is worth bearing in mind if you have a persistent high-throughput workload.
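For a persistent workload, the baseline figure is the one to size against. As a rough sketch, the lookup below uses the c7i values from the table above (metal sizes omitted) to find the smallest size whose EBS baseline covers a sustained throughput target; `smallest_size_for` is an illustrative name, and the figures should be re-checked against current AWS documentation before relying on them.

```python
from typing import Optional

# (instance type, EBS baseline MB/s, EBS burst MB/s), copied from the c7i table,
# ordered from smallest to largest.
C7I_EBS = [
    ("c7i.large", 81.25, 1250.0),
    ("c7i.xlarge", 156.25, 1250.0),
    ("c7i.2xlarge", 312.5, 1250.0),
    ("c7i.4xlarge", 625.0, 1250.0),
    ("c7i.8xlarge", 1250.0, 1250.0),
    ("c7i.12xlarge", 1875.0, 1875.0),
    ("c7i.16xlarge", 2500.0, 2500.0),
    ("c7i.24xlarge", 3750.0, 3750.0),
    ("c7i.48xlarge", 5000.0, 5000.0),
]


def smallest_size_for(sustained_mb_s: float) -> Optional[str]:
    """Return the smallest c7i size whose baseline meets the sustained target."""
    for name, baseline, _burst in C7I_EBS:
        # Burst is ignored on purpose: it cannot be held indefinitely.
        if baseline >= sustained_mb_s:
            return name
    return None  # no single c7i instance in the table can sustain this rate


if __name__ == "__main__":
    for target in (300, 500, 1000, 6000):
        print(f"{target} MB/s sustained -> {smallest_size_for(target)}")
```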
So, how do I decide on an instance type?
With this knowledge, some key questions come to mind:
- Does the workload require dedicated CPU allocation?
- No - Use latest-generation burstable (t3, t3a, t4g, m7i-flex) to save costs
- Yes - Continue
- Does the workload have any special requirements?
- Extra low latency instance store local SSD => Storage Optimised
- GPU / Inference / FPGA => Accelerated Computing Instances
- HPC workloads (200Gbps interconnects) => HPC Instances
- Is a specific balance of CPU and Memory required?
- Compute? 1CPU:2GB => Compute Optimised
- Balanced? 1CPU:4GB => General Purpose
- Memory? 1CPU:8GB => Memory Optimised
- Are there any requirements for EBS throughput and Network Throughput?
- This could determine the minimum Instance Size selectable!
- Check the full specifications of the instance
- If the size is just right, but a specific capability is required, there may be an additional capability instance type:
- c6i.large = Up to 12.5Gbps
- c6in.large = Up to 25Gbps (Network enhanced)
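The first three questions can be sketched as a small function. This is a simplification under stated assumptions: `suggest_family` and its parameter names are hypothetical, the `special` keys are invented labels for the special requirements listed above, and rounding a memory ratio up to the next family is an assumption rather than an AWS rule. Size selection (EBS and network throughput) still has to be checked separately.

```python
from typing import Optional

# Special requirements from the list above, mapped to instance groupings.
SPECIAL_REQUIREMENTS = {
    "local_ssd": "Storage Optimised",
    "accelerator": "Accelerated Computing",
    "hpc": "High-Performance Computing",
}


def suggest_family(
    needs_dedicated_cpu: bool,
    special: Optional[str] = None,
    gib_per_vcpu: float = 4.0,
) -> str:
    """Walk the decision questions in order and return a family grouping."""
    if not needs_dedicated_cpu:
        return "Burstable (t3, t3a, t4g, m7i-flex)"
    if special is not None:
        return SPECIAL_REQUIREMENTS[special]  # KeyError for unknown labels
    # Ratios from the article: 1:2 compute, 1:4 general purpose, 1:8 memory.
    if gib_per_vcpu <= 2:
        return "Compute Optimised"
    if gib_per_vcpu <= 4:
        return "General Purpose"
    return "Memory Optimised"


if __name__ == "__main__":
    print(suggest_family(False))
    print(suggest_family(True, special="accelerator"))
    print(suggest_family(True, gib_per_vcpu=8))
```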
Vantage.sh has a great tool for reviewing instance specs; see, for example, the c6a.large instance type.
Final Thoughts: Don't Over-Optimise Early
Remember: EC2 instances aren't permanent decisions. Start with a reasonable choice, monitor your application's performance, and adjust as needed. The beauty of cloud computing is its flexibility.
Start Smart:
- Go with recent generations (m6a or higher is a solid starting point)
- If cost is your primary concern, try Graviton instances (e.g. r8g)
- Benchmark your workload between Intel, AMD and Graviton to see what works best for you
Once you've gathered real-world performance data, you can optimize further and commit to savings plans or reserved instances for cost optimisation.
Check out tools like Vantage.sh for detailed instance specifications and pricing comparisons. They provide valuable insights into instance capabilities and can help you make more informed decisions.
Interested in how AWS instances translate to real-world metal? Check out my other post here!