
Azure Virtual Machines

Last updated: 10/1/2025
  • Azure Virtual Machines (Intro, Supported OS, Advantages)
  • Azure Virtualization (Intro, Type 1 and 2, Azure Hypervisor Vs. Hyper-V, BareMetal Infrastructure on Azure, Running Azure on-premises? [Azure Stack HCI])
  • Azure VM Images:
  • Containers
  • Azure Data Centers / Geographies / Regions
  • Azure Virtual Machine Availability
    • Availability Zones (Zonal Architecture (IaaS resources), Zone-redundant Architecture (PaaS services))
    • Availability Sets
    • Paired Regions
    • Virtual Machine Scale Set
  • Azure Pricing Options (Free Azure Options, Pay as You Go, Savings Plan, Reserved Virtual Machine Instances, Spot Virtual Machines)
  • Azure Virtual Machine Sizes Naming Conventions

Pricing: https://azure.microsoft.com/en-us/pricing/details/machine-learning/

Azure VM Images

  • Configuring a new VM from scratch every time you need one is a daunting task.
  • Hence, Azure provides pre-configured VM images to choose from in the Azure Marketplace.
  • Azure VM Gallery: Like the Azure Marketplace, a gallery is a repository for managing and sharing images and other resources.

An image source can be an existing Azure VM that is either generalized or specialized, a managed image, a snapshot, or an image version in another image gallery. It is pre-configured by various providers to meet the requirements of popular, commonly used OS setups.

Image definitions are created within an Image gallery and carry information about the image and requirements for using it internally. This includes whether the image is Windows or Linux, release notes, and minimum and maximum memory requirements. It is a definition of a type of image.

Whenever you want to spin up a VM, you basically choose from an existing VM image.

💡
If you can’t find your desired VM SKUs (specifications) when you attempt to create the VM, it could be that your Azure subscription doesn’t have enough quota to deploy GPU-intensive VMs, the region doesn’t support the SKUs, or your Azure offer doesn't support GPU.

Containers

Containers are a different kind of virtualization that are often touted as alternatives to using virtual machines.

A container is a lightweight, standardized digital vessel in which you can store everything you need to run a piece of software. Placing code, configurations, and dependencies into a container means that you can run that software anywhere, and because it’s “boxed up,” the software is easy to move between environments.

Often, when you transfer software from one place to another, you can’t be sure it’ll run reliably. It can be like trying to put a puzzle piece into a hole that it wasn’t designed to fit into. But by putting a program into a container and running it from there, you can be sure that your application is going to “fit” anywhere you put it, making deployments quick, reliable, and consistent no matter what environment you’re using.

Think of it like zipping up files, except a container zips up code and settings into a uniform box that fits anywhere. If you’re moving a program from your developer’s computer to a sandbox, from a staging environment to a live production, or from a physical machine to a private or public cloud, using a container will make sure that that program will work the same way wherever you put it.

The main difference between a container and a virtual machine is that a virtual machine packages up the operating system along with the code. Containers tend to be smaller, so you can fit more containers than virtual machines onto a single server.

Azure Virtual Machine Availability (Zones, Sets, Virtual Machine Scale Set, Paired Regions)

Availability Zones (AZs)

  • Separated groups of datacenters within a region.
  • Each AZ has its own power, network, and cooling, and is fault-isolated against hardware failures, natural disasters, and other unforeseen events.
  • AZs are physically and logically separated.
  • AZs are used to protect applications and data from datacenter-level failures.
  • If a region supports Availability Zones, it must have a minimum of 3 separate zones to achieve failover.
  • Each AZ is created from one or more data centers.
    • The mapping from AZs to data centers is dynamic; different subscriptions might have a different mapping order.
  • Data traversing within or between regions is encrypted.
  • See Azure regions with availability zone support.

  • Azure services that support Availability Zones are divided into:
    • Zonal services: A resource is pinned to a specific zone; you configure the replication.
    • Zone-redundant services: Azure replicates automatically across zones. (Not all services support this.)

[1] Zonal Architecture (IaaS resources)

  • In this approach, resources are deployed to specific Availability Zones within a region.
  • Why? Choosing a specific AZ to pin the resource to allows you to meet stringent latency or performance requirements by placing resources closer to users or other necessary components in the same AZ.
  • In general, zonal configuration is supported by infrastructure as a service (IaaS) resources, like a virtual machine or managed disk.
  • By itself, a zonal approach doesn't give you resiliency. If your chosen availability zone fails, the zonal services in the failed zone become unavailable until the zone has recovered.
    • In zonal architecture, the responsibility for replicating resources typically falls on the customer or the application architect.
    • Azure provides the infrastructure and tools necessary for replication, such as storage redundancy options, but it's up to the customer to configure and manage replication according to their specific requirements.
    • By yourself, you need to replicate your applications and data to one or more zones within the region so that you're resilient to a zone outage.
    • You can configure your zonal solution to use synchronous or asynchronous replication.
🚦
For most scenarios, it's a good idea to use synchronous replication across availability zones. Most applications aren't sensitive enough to be affected by the small amount of latency required for synchronous replication.
  • When creating a single VM, you have to select the AZ you want to put it into. The only time you can choose multiple AZs is when you're creating a Virtual Machine Scale Set (VMSS).
    • VMSS allows you to manage, configure, and scale load-balanced machines.
    • So if you select Zones 1, 2, and 3 for your scale set, it will create 3 VMs, one in each zone.
    • I need to read more about VMSS
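The zone-spreading behavior described above can be sketched as a tiny round-robin placement. This is a hypothetical illustration only; Azure's actual placement logic is internal to the platform.

```python
# Hypothetical sketch: round-robin spreading of scale-set instances
# across the availability zones selected at creation time.
def spread_across_zones(instance_count, zones):
    """Assign each instance index to a zone, round-robin."""
    return {i: zones[i % len(zones)] for i in range(instance_count)}

# Selecting zones 1, 2, and 3 for a 3-instance scale set puts
# one instance in each zone.
placement = spread_across_zones(3, [1, 2, 3])
print(placement)  # {0: 1, 1: 2, 2: 3}
```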

[2] Zone-redundant Architecture (PaaS services)

  • With zone-redundant architecture, the Azure platform automatically replicates the resource and data across zones.
  • In general, zone-redundant configuration is supported by platform as a service (PaaS) services, including Azure Storage, Azure Service Bus, Azure Application Gateway, VPN gateways, Azure SQL.
🚦
For many production workloads, a zone-redundant deployment provides the best balance of tradeoffs. If you aren't sure which approach to select, start with this type of deployment.
💡
Production workloads should be configured to use availability zones if the region supports them. For mission-critical workloads, you should consider a solution that is both multi-region and multi-zone.
https://k21academy.com/microsoft-azure/architect/azure-availability-zones-and-regions/

Availability Sets

An availability set is simply a logical grouping of multiple VMs that helps keep the application available during maintenance.

When the VM is part of an availability set, the Azure fabric updates are sequenced so not all of the associated VMs are rebooted at the same time. VMs are put into different update domains. Update domains indicate groups of VMs and underlying physical hardware that can be rebooted at the same time. Update domains are a logical part of each data center and are implemented with software and logic. The group of virtual machines that share common hardware are in the same fault domain. A fault domain is essentially a rack of servers.

VMs can be placed across fault domains and update domains depending on the need and function of applications run on them by the user.
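The update-domain and fault-domain placement described above can be sketched with a simplified model. The round-robin assignment is a hypothetical illustration; 2 fault domains and 5 update domains are common portal defaults, but the real fabric controller decides actual placement.

```python
# Hypothetical sketch: how an availability set might place VMs across
# fault domains (racks) and update domains (reboot groups).
def place_in_availability_set(vm_count, fault_domains=2, update_domains=5):
    """Return (fault_domain, update_domain) per VM, assigned round-robin."""
    return [(i % fault_domains, i % update_domains) for i in range(vm_count)]

# With 2 fault domains and 5 update domains, no two of the first 2 VMs
# share a rack, and no two of the first 5 VMs reboot together during
# planned maintenance.
for vm, (fd, ud) in enumerate(place_in_availability_set(6)):
    print(f"VM{vm}: fault domain {fd}, update domain {ud}")
```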

Paired Regions

Virtual Machine Scale Set


Azure Pricing Options

[0] Free Azure Options:

  • Azure provides free 750 hours each of B1s, B2pts v2 (Arm-based), and B2ats v2 (AMD-based) burstable VMs (For Linux and For Windows) for 12 months to new customers only.
  • More Free Azure Services here

[1] Pay as You Go:

  • No long-term commitments or upfront payments, you can increase or decrease capacity as you need it and pay (by the second) only for what you use.
  • More expensive than Azure’s other pricing options.
  • Useful for mission-critical, unpredictable workloads.

[2] Savings Plan:

[3] Azure Reserved Virtual Machine Instances (RIs):

  • https://spot.io/resources/azure-pricing/azure-reserved-instances-the-complete-guide
  • Azure Reserved Instances (RIs) are an Azure pricing plan that can help you reduce cloud costs.
  • Azure RIs can be applied to many Azure services/products, not VMs only.
  • It offers discounts in return for a commitment to use Azure offerings for a duration of one or three years.
  • You can reduce costs by up to 72% when signing up for Azure Reservations.
  • The discount is applied upon billing and does not affect the runtime state of any Azure resources.
  • Azure lets you pay for reservations either upfront or on a monthly basis (no extra fee for monthly).
  • You should try to get the most out of reservations because unused reservations cannot be saved for later use.
  • If your usage exceeds the reserved capacity, you will be charged the pay-per-use rate.
  • Azure provides automated reservation recommendations by analyzing your hourly usage over the past 7, 30, and 60 days.
  • For example, if a workload uses 100 VMs regularly, but occasionally demand spikes to 150, Azure will calculate potential savings for a reservation of 100 VMs as well as 150.
  • Reserved Instances are best suited for workloads with predictable usage patterns or steady-state workloads.
💡
EXCELLENT: You can exchange a reservation for another reservation of the same type. You can also refund a reservation, up to $50,000 USD in a 12 month rolling window, if you no longer need it.
  • To purchase a reservation —> select All Services > Reservations.
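The rule that usage above the reserved quantity falls back to pay-as-you-go can be illustrated with a small calculation. The rates below are placeholders for illustration, not real Azure prices.

```python
# Illustrative sketch (made-up rates): hourly bill when usage exceeds
# the reserved quantity -- the excess is billed at the pay-as-you-go rate.
def hourly_cost(vms_running, reserved_qty, reserved_rate, paygo_rate):
    covered = min(vms_running, reserved_qty)   # VMs covered by the reservation
    overflow = vms_running - covered           # VMs billed pay-as-you-go
    return covered * reserved_rate + overflow * paygo_rate

# 100 VMs reserved; demand spikes to 150 (rates are placeholders).
print(hourly_cost(150, reserved_qty=100, reserved_rate=0.028, paygo_rate=0.10))
```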

[4] Azure Spot Virtual Machines:

  • To explore Spot VM savings and eviction rates —> use the Azure Spot advisor: https://azure.microsoft.com/en-us/pricing/spot-advisor/
  • Azure Spot Virtual Machines offer access to unused Azure compute capacity at significantly reduced prices compared to pay-as-you-go pricing.
  • Azure Spot VMs offer discounts of up to 90% compared to pay-as-you-go prices.
    • Formerly known as Low Priority VMs.
  • IMPORTANT: The lower cost comes with the provision that these Azure spot instances can be taken away with minimal warning (with only a 30-second warning) if demand for capacity increases or instances are needed to service reserved instances or pay-as-you-go customers.
    • Azure evicts Spot VMs if Azure needs the capacity —> giving priority to regular pay-as-you-go VMs.
  • Who is it suitable for?
    • Spot VMs are suitable for workloads that are fault-tolerant and can handle interruptions, such as batch processing, dev/test environments, or workloads that can be checkpointed and restarted.
  • Customers can mix compute across Azure Spot VMs, pay-as-you-go, and reserved instance (RI) options.
  • With Spot VMs, customers can bid on spare capacity and set the price they are willing to pay for compute.
  • Prices for an Azure Spot VM fluctuate based on demand and availability and vary based on the capacity for size or SKU in an Azure region or time of day.
  • When will a Spot VM be terminated?
    1. Capacity availability
    2. Your configured MAX price.
  • Scenario: When other customers request more compute, the Azure spot pricing will increase. With a max price set, your VMs will terminate if the cost exceeds the limit you set.
  • A running Azure Spot instance is then managed just like any other Windows or Linux VM, and functions just like pay-as-you-go or reserved instances.
    These are the options I see when I create a VM normally from Azure Portal, it normally asks if I want to have it with Spot Discount.
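The two termination triggers above can be sketched as a simple check. This is a hypothetical model; using `-1` as the "never evict on price" sentinel follows the convention in the Azure docs.

```python
# Hypothetical sketch of the two Spot eviction triggers: Azure needing
# the capacity back, or the current Spot price exceeding your max price.
def should_evict(current_spot_price, max_price, azure_needs_capacity):
    if azure_needs_capacity:       # capacity eviction (30-second warning)
        return True
    if max_price >= 0 and current_spot_price > max_price:
        return True                # price eviction
    return False

# A max_price of -1 conventionally means "pay up to the pay-as-you-go
# rate and never evict on price" in the Azure docs.
print(should_evict(0.12, max_price=0.10, azure_needs_capacity=False))  # True
print(should_evict(0.12, max_price=-1, azure_needs_capacity=False))    # False
```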

Azure Disks

  • Definition: Azure Disks refer to virtual hard disks (VHDs) that are used to provide persistent storage for virtual machines (VMs) running in the Microsoft Azure cloud platform.
  • Managed Disk Storage
    • (1) managed by Microsoft Azure.
    • (2) you don't need any storage account while creating a new disk.
    • (3) The storage account is managed by Azure; you do not have full control of the managed disks.
  • Unmanaged Disk Storage
    • (1) requires you to create a storage account before you create any new disk.
    • (2) The storage account is created and owned by you, you have full control over all the data that is present on your storage account.
    • (3) Additionally, you also need to take care of encryption, data recovery plans, etc.
    • (4) No longer a thing
      • As of January 30, 2024, new customers won't be able to create unmanaged disks.
      • On September 30, 2025, customers will no longer be able to start IaaS VMs by using unmanaged disks. Any VMs that are still running or allocated will be stopped and deallocated.
Managed disks are more expensive than unmanaged disks.
In 2017, we launched Azure managed disks. We've been enhancing capabilities ever since. Because Azure managed disks now have the full capabilities of unmanaged disks and other advancements, we'll begin deprecating unmanaged disks on September 13, 2022. https://learn.microsoft.com/en-us/azure/virtual-machines/unmanaged-disks-deprecation

Azure Managed Disks

  • Introduction to Azure Managed Disks
  • Managed Disks pricing
  • Managed disks are like a physical disk in an on-premises server but, virtualized.
  • With managed disks, all you have to do is specify the disk size, the disk type, and provision the disk.
  • They are persistent!
  • Persistent: data will be available through reboots, start/stop, or other lifecycle events.
  • Managed Disks are just one of Azure’s storage offerings, alongside Azure Blob Storage and Azure Files.
  • Four Disk Types: ultra disks, premium solid-state drives (SSD), standard SSDs, and standard hard disk drives (HDD).
  • For a VM, we have an OS disk (C:) and optionally Data Disks.
  • Managed disks are designed for 99.999% availability. (three replicas of your data)
  • Disk roles — For an Azure VM, we have three main disk roles (OS, Data, Temporary)
    • Nice: When you create a new VM, you choose the OS disk settings (OS Disk Size, OS Disk Type, Delete with VM, …).
      • You can additionally, attach or create data disks for your virtual machine.
      • Additionally, all VMs come with temporary storage.
    • The Data Disk
      • Store application data, or other data you need to keep.
      • labeled with a letter that you choose.
      • Each VM size specifies the max number of data disks that can be attached to it.
      • It’s optional.
    • The OS Disk
      • Every VM has one attached operating system disk.
      • That OS disk has a pre-installed OS, which was selected when the VM was created. This disk contains the boot volume.
    • The Temporary Disk
      • It’s not a managed disk
      • Discussed in detail here:
  • If you have a VHD from on-premises VMs, you can use Direct Upload and transfer your vhd to Azure through CLI or Powershell.
  • Attach a managed data disk to a Windows VM by using the Azure portal
Example of how you configure OS disks and Data disks for creating VM machines
Showing the OS disk we have for my GPU machine
💡
In general: Azure offers various storage solutions for persistent data storage, such as Azure Blob Storage, Azure Files, Azure Disk Storage, and Azure Managed Disks, which are designed for different use cases and requirements.

Premium SSD

Premium SSD v2

Standard SSD

Standard HDD

Ultra Disk

  • Ultra disk: SSD. Scenario: IO-intensive workloads such as SAP HANA, top-tier databases (for example, SQL, Oracle), and other transaction-heavy workloads. Max disk size: 65,536 GiB. Max throughput: 10,000 MB/s. Max IOPS: 400,000. Usable as OS disk: No.
  • Premium SSD v2: SSD. Scenario: Production and performance-sensitive workloads that consistently require low latency and high IOPS and throughput. Max disk size: 65,536 GiB. Max throughput: 1,200 MB/s. Max IOPS: 80,000. Usable as OS disk: No.
  • Premium SSD: SSD. Scenario: Production and performance-sensitive workloads. Max disk size: 32,767 GiB. Max throughput: 900 MB/s. Max IOPS: 20,000. Usable as OS disk: Yes.
  • Standard SSD: SSD. Scenario: Web servers, lightly used enterprise applications, and dev/test. Max disk size: 32,767 GiB. Max throughput: 750 MB/s. Max IOPS: 6,000. Usable as OS disk: Yes.
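As a quick sanity check on the comparison above, here is a hypothetical helper that picks the lowest tier meeting a given IOPS/throughput requirement (the per-tier maxima come from the comparison above; the "cheapest first" ordering is an assumption).

```python
# Sketch: pick the lowest-tier disk that satisfies the IOPS/throughput
# needs, using the per-tier maxima from the comparison above.
DISK_MAXIMA = {  # tier: (max IOPS, max MB/s), ordered cheapest-first
    "Standard SSD": (6_000, 750),
    "Premium SSD": (20_000, 900),
    "Premium SSD v2": (80_000, 1_200),
    "Ultra disk": (400_000, 10_000),
}

def pick_tier(need_iops, need_mbps):
    for tier, (max_iops, max_mbps) in DISK_MAXIMA.items():
        if need_iops <= max_iops and need_mbps <= max_mbps:
            return tier
    raise ValueError("no single disk meets the requirement")

print(pick_tier(15_000, 500))     # Premium SSD
print(pick_tier(100_000, 2_000))  # Ultra disk
```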

Azure Temporary Storage (D:)

  • In the previous section , we discussed that Azure VMs come with an OS disk to store persistent data, and you pay for it separately away from the VM pricing.
    • We also learned that we can attach Data Disks to Azure VMs.
  • In addition, every Azure VM – regardless of Linux or Windows – gets a temporary disk assigned automatically.
  • The OS disk size doesn’t have to be the same as the temporary storage size. —> You define the OS disk size at VM creation, but the temporary storage size is fixed by the VM size you choose.
  • If an Azure VM is moved from its current host to a new host at any time (due to maintenance, hardware failures, or other reasons) —> the data on the temporary storage will not be moved to the new host.
  • Really, the temporary disk should NEVER be used for data that has to be persistent. It’s non-replicated.
  • To avoid misconfiguration, the disk is labeled in the OS as “Temporary Storage” and includes a text file “DATALOSS_WARNING_README.txt”.
  • The size of the temporary disk varies, based on the size of the virtual machine (and its available physical memory).
  • It's intended for temporary data, such as the paging file (virtual memory), swap files, or SQL Server tempdb, or temporary files used by applications.
  • In Windows VMs, it’s always the D: drive.
  • On Azure Linux VMs, the temporary disk is typically /dev/sdb

  • Azure Temporary storage is measured using base-2 numbering systems (GiBs).
    • GB (Gigabyte) typically refers to a billion bytes, where each byte is considered as 8 bits. So, 1 GB is 1,000,000,000 bytes (or 10^9 bytes). This is the decimal-based measurement system.
    • GiB (Gibibyte), on the other hand, uses the binary-based numbering system, where each unit is based on powers of 2. So, 1 GiB is 1,073,741,824 bytes (or 2^30 bytes).
    • In a nutshell, a capacity number given in GiB may appear smaller, but it represents more bytes. [A smaller GiB figure equals a larger GB figure] [Example: 1023 GiB = 1098.4 GB].
    • Kibibyte (KiB) (2^10), Mebibyte (MiB) (2^20), Gibibyte (GiB) (2^30), Tebibyte (TiB) (2^40), Pebibyte (PiB) (2^50), …
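The GiB-to-GB arithmetic above is easy to verify:

```python
# GiB (2^30 bytes) vs GB (10^9 bytes) conversion from the notes above.
GIB = 2**30   # 1,073,741,824 bytes
GB = 10**9    # 1,000,000,000 bytes

def gib_to_gb(gib):
    return gib * GIB / GB

print(round(gib_to_gb(1), 6))     # 1.073742
print(round(gib_to_gb(1023), 1))  # 1098.4
```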

  • To ensure that you are not incorrectly using the temporary disk, we recommend that you take an action that will cause the temporary disk to be reset as part of your testing procedures.
    • The simplest method to reset the temporary disk is by changing the VM size.

How can it be useful to have Temporary Storage?

  • A great example of this type of data for Windows is the pagefile (i.e. swapfiles).
    • Remember in my ppt when I studied (Swap File / Page File / Paging File / Virtual Memory).
    • Created when a VM is powered on and deleted when it is powered off.
    • When a new Windows VM is provisioned from an image in Azure we configure the pagefile to be located on this temporary disk.

What is diskful?

Azure Virtual Machine Sizes Naming Conventions

[Family] + [Subfamily]* + [# of vCPUs] + [Constrained vCPUs]* + [Additive Features] + [Accelerator Type]* + [Version]   (* = optional)
  • Family: The VM Family Series (A, B, D, E, F, N, H)
  • Subfamily [Optional]: Used for differentiating within one Family, like in GPUs (N-Series), we have NC, ND, and NV.
    • V for example is for powerful remote visualization workloads and other graphics-intensive applications.
  • # of vCPUs: How many physical cores will get virtualized for your VM?
  • Constrained vCPUs [Optional]: Used for certain VM sizes only. Denotes the number of vCPUs for the constrained vCPU capable size
  • Additive Features: lower case letters denote additive features.
    • a = AMD-based processor (This is I found in NCasT4_v3 )
    • b = Block Storage performance
    • d = diskful (that is, a local temp disk is present); this feature is for newer Azure VMs, see Ddv4 and Ddsv4-series
    • i = isolated size
    • l = low memory; a lower amount of memory than the memory-intensive size
    • m = memory intensive; the most amount of memory in a particular size
    • p = ARM-based processor
    • t = tiny memory; the smallest amount of memory in a particular size
    • s = Premium Storage capable, including the possible use of Ultra SSD
    • C = Confidential
    • NP = node packing
  • Accelerator Type [Optional]: Denotes the type of hardware accelerator in specialized/GPU SKUs (for example, T4 in NCasT4_v3).
  • Version: Denotes the version of the VM Family Series. Some versions are retired over time.
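The convention above can be sketched as a small parser. This is hypothetical and simplified; real SKU names have edge cases (multi-letter families, promo suffixes) that it ignores.

```python
import re

# Hypothetical parser for the naming convention above.
def parse_vm_size(name):
    parts = name.split("_")
    assert parts[0] == "Standard"
    # family, optional subfamily, vCPU count, optional constrained
    # vCPU count, lowercase additive-feature letters
    m = re.match(r"([A-Z])([A-Z]?)(\d+)(?:-(\d+))?([a-z]*)$", parts[1])
    family, subfamily, vcpus, constrained, features = m.groups()
    accelerator = version = None
    for token in parts[2:]:
        if re.fullmatch(r"v\d+", token):
            version = token            # e.g. v3, v4
        else:
            accelerator = token        # e.g. T4, A100
    return {"family": family, "subfamily": subfamily or None,
            "vcpus": int(vcpus), "constrained": constrained,
            "features": features, "accelerator": accelerator,
            "version": version}

print(parse_vm_size("Standard_NV48s_v3"))
print(parse_vm_size("Standard_NC24ads_A100_v4"))
```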

Example: Standard_NV48s_v3

  • 48 Cores (Virtual CPUs) from Intel E5-2690 v4.
    • Intel E5-2690 v4 has a max of 14 cores.
    • However, the beauty of vCPUs is that the underlying infrastructure is highly virtualized and abstracted from the physical hardware.
    • The hypervisor (Azure Hypervisor) picks any hardware from infrastructure that satisfies our requirements.
    • The physical hardware running Azure VMs is often based on various CPU architectures.
    • Azure abstracts away the underlying hardware details, allowing users to focus on provisioning VMs based on their performance and resource requirements rather than the specific hardware specifications.
  • Temporary Storage: shown in units of GiB or 2^30 bytes.
    • Discussed in detail here:
  • Max Data Disks: The maximum number of data disks that can be attached to a virtual machine (VM) instance.
  • GPU Memory:
  • Disk throughput:
    • Measured in input/output operations per second (IOPS) and MBps where MBps = 10^6 bytes/sec.
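IOPS and MBps are linked by the IO size. The 64 KiB IO size below is an assumption for illustration only:

```python
# Relationship between IOPS and throughput: MB/s = IOPS x IO size.
def throughput_mbps(iops, io_size_kib=64):
    # MBps here uses 10^6 bytes/sec, as noted above; IO size is in KiB.
    return iops * io_size_kib * 1024 / 1e6

# 20,000 IOPS at 64 KiB per IO:
print(round(throughput_mbps(20_000), 1))  # 1310.7
```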

Data disks can operate in cached or uncached modes. For cached data disk operation, the host cache mode is set to ReadOnly or ReadWrite. For uncached data disk operation, the host cache mode is set to None.

To learn how to get the best storage performance for your VMs, see Virtual Machine and disk performance.

Expected network bandwidth is the maximum aggregated bandwidth allocated per VM type across all NICs, for all destinations. For more information, see Virtual Machine Network Bandwidth.

  • 448 GB of RAM (the VM’s memory, not the GPU’s)
  • Temp Storage (SSD): 1,280 GiB
  • Standard - recommended tier
  • N – GPU enabled (NVIDIA)
  • V – Remote visualization workloads and other graphics-intensive applications
  • s – Premium Storage capable
  • v3 – version
  • GPU Memory: 32 GB

Azure Machine Learning Compute Instance

Data Science Virtual Machines (DSVM)

  • If you need sudo command, and it asks for passwords go to azure portal —> VM —> Reset password —> Reset Password.
    • Another way to set password is using: sudo passwd azureuser
  • Install NVIDIA GPU drivers: Go to extensions in the VM on Azure Portal —> NvidiaGpuDriverLinux —> Voila
  • On this VM, I got multiple notebooks on '/home/azureuser/notebooks/sdkv1/how-to-use-azureml/machine-learning-pipelines/intro-to-pipelines'

Azure VM Family Series:

  • This documentation describes the available sizes and options for the Azure virtual machines
  • Virtual Machine Series Summary by Azure
  1. General Purpose (A-series / Bs-series / D-series)
    • Balanced CPU, memory, and storage resources.
    • Suitable for a wide range of workloads.
    • Cost-effective option for moderate workloads.
    • Use Cases: Web servers (Low-Mid Traffic), Small databases (Low-Mid), and Development/testing environments.
  2. Memory Optimized
  3. Compute Optimized
    • Designed for CPU-intensive tasks.
    • More powerful CPUs compared to general-purpose VMs.
    • Use Cases: Medium Traffic Web Servers, Network Appliances, Batch Processes, and Application Servers (Like the backend of Firebase).
  4. Storage Optimized (L-series):
  5. GPUs (N-series):
    • Check if a series is available at a region: https://azure.microsoft.com/en-us/explore/global-infrastructure/products-by-region/
    • The letter (N) stands for Nvidia GPUs.
    • One N-Series VM can have a SINGLE GPU, MULTIPLE GPUS, or Fractional GPU (Part of ONE GPU, a specific number of its cores).
      • Single GPU: Standard NC6s V3
      • Multiple GPUs: Standard NC12s V3
      • Fractional GPU: Standard NV12s V3
    • GPU Series
      1. NC Series — (NC, NC v2, NC V3, NCas T4 V3, NC A100 v4, NCads A10 v4, NCads H100 v5) Series — (High Performance Computing & Machine Learning Workloads) — (V100, T4, A100, A10)
      2. ND Series — (ND, ND V2, ND A100 V4, NDasr A100 V4, ND H100 V5) Series — (Deep Learning Training and Inference) — ()
      3. NV Series — (NV, NV V3, NV V4) Series — (Visualization)
      4. NG Series — (NGads V620) — (Gaming)
    • NVIDIA has different architectures, each architecture builds upon its predecessor.
    • The ND series is focused on training and inference scenarios for deep learning. It uses the NVIDIA Tesla P40 GPUs. The latest version – NDv2 – features the NVIDIA Tesla V100 GPUs.
    • The NV-series enables powerful remote visualization workloads and other graphics-intensive applications backed by the NVIDIA Tesla M60 GPU.

    The N-series has three different offerings aimed at specific workloads:

    NCsv3, NCsv2, NC, and NDs VMs offer optional InfiniBand interconnect to enable scale-up performance.

NCs V3 Series

  • GPU: NVIDIA Tesla V100 GPUs (Volta)
  • CPU: Intel Xeon E5-2690 v4 (Broadwell)
  • CUDA Cores: 5,120
  • Tensor Cores: 640 (Much Better than T4)
  • s = Premium Storage capable, including the possible use of Ultra SSD
  • Use cases: HPC workloads that will benefit from a performance boost, powering scenarios like reservoir modeling, DNA sequencing, protein analysis, Monte Carlo simulations, and others.
  • 1 GPU = one V100 card.
  • It can provide 1.5x the computational performance of the NCv2-series, which was Tesla P100 (June, 2016).
  • If we notice, NC24rs V3 has (r) —> Remote Direct Memory Access (RDMA) Enabled. [I don’t know much about it, most probably not needing it]
    • This allows applications running on the VM to take advantage of RDMA for high-performance, low-latency data transfers over the network.
    • It provides a low latency, high-throughput network interface optimized for tightly coupled parallel computing workloads.
💡
For this VM series, the vCPU (core) quota in your subscription is initially set to 0 in each region. (But generally available in East US & East US 2)

NCas T4 V3 Series

  • GPU: Nvidia Tesla T4 (Turing)
  • CPU: AMD EPYC 7V12 (Rome)
  • CUDA Cores: 2,560
  • Tensor Turing Cores: 320
  • a = AMD-based processor
  • s = Premium Storage capable, including the possible use of Ultra SSD.
  • Specifically designed for the AI and machine learning workloads.
  • Optimized for compute-intensive GPU-accelerated applications.
  • Use cases:
    • Ideal for deploying AI services- such as real-time inferencing of user-generated requests.
    • Interactive graphics and visualization workloads using NVIDIA's GRID driver.
    • CUDA, TensorRT, Caffe, ONNX, and other frameworks, or GPU-accelerated graphical applications based on OpenGL and DirectX, can be deployed economically.

NC A100 V4 Series

  • GPU: NVIDIA A100 PCIe (2020)
  • CPU: 3rd-generation AMD EPYC™ 7V13 (Milan)
  • Focused on midrange AI training and batch inference workloads.
  • Flexibility to select one, two, or four NVIDIA A100 80GB PCIe Tensor Core GPUs per VM.
  • 1 GPU = one A100 card
  • It features up to 4 NVIDIA A100 PCIe GPUs with 80GB memory each, up to 96 non-multithreaded AMD EPYC Milan processor cores, and 880 GiB of system memory.
  • Pricing is only available on Azure ML Studio.
    • Standard_NC24ads_A100_v4: $3.67/hr
    • Standard_NC48ads_A100_v4: $7.35/hr
    • Standard_NC96ads_A100_v4: $14.69/hr

NCads A10 v4 Series (Preview)

  • It’s mentioned only here, but only available in West US 3 (Preview).

NCads H100 v5 Series (Preview)

💡
So far, it seems Standard_NC6s_v3 or Standard_NC24ads_A100_v4

ND V2 Series

ND A100 V4 Series (MONSTER)

  • Not available in East US 2 😟

NDm A100 V4 Series

  • They are using NVLink 3.0 here, which I discussed earlier; it lets each pair of GPUs connect directly to each other.

ND H100 V5 Series


NV V3 Series

  • GPU: NVIDIA Tesla M60 (Aug, 2015)
  • CPU: Intel E5-2690 v4 (Broadwell)
  • CUDA Cores: 4096 NVIDIA CUDA Cores (2048 per GPU)
  • NO TENSOR CORES! NOT SUITABLE FOR AI
  • Standard NV12s V3 is the one Neill assigned to me earlier.
  • 1 GPU = one-half M60 card!!!
    • 1 x M60 GPU (1/2 Physical Card)
    • So, in NV12s V3, we actually got half an M60 Card, and a full one should have been 16 GB of vRAM as in NV24s V3.
    • But actually, the M60 board has 2 GPUs, that’s why.
  • Use cases: The NV series is mostly for graphic needs related to engineering and 3D modeling.
  • Data Sheet

NV V4 Series

NVads A10 V5 Series


NGads V620 Series

They're optimized for high-performance, interactive gaming experiences hosted in Azure, powered by AMD Radeon PRO V620 GPUs and AMD EPYC 7763 (Milan) CPUs.


Tensor Cores

  • Introduced in Volta, and advanced with each generation.
  • It performs matrix-matrix multiplications and addition with support of mixed precisions.

Analysis of a Tensor Core

Nvidia P100 (2016) (Retired)

Nvidia V100 (2017)

The NVIDIA V100 offers advanced features for data science and AI, along with efficient memory usage. NVIDIA claims a single 32 GB V100 can deliver the performance of up to 100 CPUs, which makes it pretty efficient.

Nvidia A100 (2020)

  • It’s a powerhouse! a monster! an intimidating, gargantuan titan of unparalleled magnitude!
  • Designed for data center applications, including deep learning, high-performance computing (HPC), and data analytics.
  • The A100 is powered by the NVIDIA Ampere Architecture.
    • NVIDIA has developed other architectures, each architecture builds upon its predecessor.
    • Volta —> Turing —> Ampere —> Ada Lovelace —> Hopper
  • A100 provides up to 20X higher performance compared to the previous generation, Volta.
    • The A100 is the successor to the V100.
  • Memory: 80GB (There was 40GB but it’s discontinued, and no longer produced)
💡
The A100 80GB debuts the world’s fastest memory bandwidth at over 2 terabytes per second (TB/s) to run the largest models and datasets.
  • Side note: the M1 Pro offers up to 200 GB/s of memory bandwidth and the M1 Max up to 400 GB/s. The A100 is way faster.
  • NVIDIA Multi-Instance GPU (MIG): An A100 GPU Card can be partitioned into as many as seven GPU instances, fully isolated at the hardware level with their own high-bandwidth memory, cache, and compute cores.
  • NVIDIA NVLink Bridge: It is a high-speed point-to-point peer transfer connection, where one GPU can transfer data to and receive data from one other GPU.
    • A100 80GB card supports NVLink bridge connection with a single adjacent A100 80GB card.
  • Number of Transistors: 54.2 billion transistors with 7 nm process.
  • Cores: The A100 doesn't rely solely on the number of traditional CUDA cores for its processing power:
    1. Pipelines / CUDA Cores: 6,912 CUDA cores.
    2. Tensor Cores: This is where the A100 shines in AI and HPC applications.
  • No DirectX support and no ray-tracing (RT) cores; the chip is mainly full of Tensor Cores for AI work. Even its CUDA core count is not huge compared to, say, a GeForce RTX 3090.
  • Ampere Architecture:
    1. Third-Generation Tensor Cores
    2. Multi-Instance GPU (MIG):
      • Not every application requires the full power of a GPU.
      • MIG, available on A100 and A30 GPUs, allows partitioning a GPU into multiple instances.
      • Each instance is isolated, secured, and has its own memory, cache, and compute cores.
    3. Third-Generation NVLink 3.0:
      • For scaling across multiple GPUs with rapid data movements across them.
      • Doubles the GPU-to-GPU direct bandwidth to 600 gigabytes per second (GB/s).
    4. Sparsity
    5. RT Cores (second generation; present on consumer Ampere GPUs, but not on the A100)
    6. Memory Acceleration
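As a rough illustration of the MIG partitioning described above, here is a minimal Python sketch that checks whether a requested set of instance profiles fits on a single A100 80GB. The profile names and sizes are the commonly documented A100 80GB ones, and the fit check is simplified (real MIG placement rules are stricter); verify on actual hardware with `nvidia-smi mig -lgip`.

```python
# Simplified model of MIG slice accounting on an A100 80GB.
# Profile names/sizes are the commonly documented ones; real MIG
# placement has stricter rules than this sketch enforces.
MIG_PROFILES = {
    "1g.10gb": (1, 10),  # (compute slices, memory in GB)
    "2g.20gb": (2, 20),
    "3g.40gb": (3, 40),
    "4g.40gb": (4, 40),
    "7g.80gb": (7, 80),
}

def partition_fits(requested, total_slices=7, total_mem_gb=80):
    """Check whether a list of MIG profiles fits on one GPU (simplified)."""
    slices = sum(MIG_PROFILES[p][0] for p in requested)
    mem = sum(MIG_PROFILES[p][1] for p in requested)
    return slices <= total_slices and mem <= total_mem_gb

print(partition_fits(["1g.10gb"] * 7))         # seven fully isolated instances
print(partition_fits(["3g.40gb", "4g.40gb"]))  # a mixed split
print(partition_fits(["7g.80gb", "1g.10gb"]))  # more than the GPU has
```

The takeaway: the seven compute slices and 80 GB of memory are a hard budget, so instance profiles can be mixed as long as the totals fit.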
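To get a feel for what the 600 GB/s NVLink figure means in practice, here is a back-of-envelope sketch. The ~32 GB/s PCIe 4.0 x16 number is an assumed rough peak added for comparison, not from this document.

```python
# Back-of-envelope: time to copy one set of FP16 gradients GPU-to-GPU.
# 600 GB/s is the NVLink 3.0 figure above; ~32 GB/s is an assumed
# rough peak for PCIe 4.0 x16, used only for comparison.
def transfer_time_ms(num_params, bytes_per_param, bandwidth_gb_s):
    size_gb = num_params * bytes_per_param / 1e9
    return size_gb / bandwidth_gb_s * 1e3

params = 1e9  # a 1B-parameter model, FP16 (2 bytes/param)
print(f"NVLink 3.0:             {transfer_time_ms(params, 2, 600):.2f} ms")
print(f"PCIe 4.0 x16 (assumed): {transfer_time_ms(params, 2, 32):.2f} ms")
```

Multi-GPU training repeats this kind of transfer every step, which is why the interconnect matters as much as raw compute.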

Precision modes from the A100 spec sheet: TF32, FP32, FP16, FP64, BFLOAT16, INT8 Tensor.

TFLOPS stands for TeraFLOPS, which stands for Trillion Floating-Point Operations Per Second. It is a measure of computer performance, particularly relevant in scientific computing that involves floating-point calculations. A system rated at 1 TFLOPS can execute one trillion floating-point operations (additions, multiplications, etc.) per second. This metric is commonly used to assess the computational power of GPUs, CPUs, and supercomputers.
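A quick way to see FLOPS counting in action is to time a matrix multiply and divide the operation count by the elapsed time. A minimal NumPy sketch (the achieved number depends entirely on your CPU and BLAS build):

```python
import time
import numpy as np

# Measure achieved FLOP/s of a dense matmul: multiplying two n x n
# matrices takes roughly 2 * n^3 floating-point operations.
n = 1024
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

start = time.perf_counter()
c = a @ b
elapsed = time.perf_counter() - start

flops = 2 * n**3 / elapsed
print(f"~{flops / 1e9:.1f} GFLOPS ({flops / 1e12:.4f} TFLOPS)")
```

Comparing the measured number against a chip's peak TFLOPS rating shows how far real workloads sit below the theoretical maximum.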

  • Tensor Cores: This is where the A100 shines in AI and HPC applications. Tensor Cores are specialized cores designed to accelerate AI workloads and specific HPC tasks. They achieve this through:
    • Higher Throughput: Tensor Cores can perform certain operations, like matrix multiplication, significantly faster than traditional CUDA cores. This translates to quicker training times for deep learning models and faster processing of HPC applications that rely on similar math operations.
    • TF32 Precision: A100 introduces a new data format called TF32. This format offers a balance between precision (typically provided by FP32 format) and performance (often achieved by FP16 format). TF32 allows for efficient training of AI models while maintaining good accuracy.
    • Double Precision Support: Unlike previous generations, the A100’s Tensor Cores can handle double-precision (FP64) calculations. This is a major leap for HPC tasks that require higher precision for scientific simulations and other computationally intensive workloads.
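TF32 keeps FP32's 8-bit exponent but only 10 mantissa bits (versus FP32's 23). A rough software simulation of that mantissa reduction is sketched below; real Tensor Cores round rather than truncate, so this slightly understates TF32's accuracy.

```python
import struct

# Rough simulation of TF32's reduced mantissa: float32 carries 23
# mantissa bits, TF32 carries 10, so zero out the low 13 bits.
# (Truncation only; real hardware rounds.)
def to_tf32(x):
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= ~((1 << 13) - 1)  # drop the 13 least significant mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

x = 1.0000001
print(f"FP32 value:        {x!r}")
print(f"TF32 (simulated):  {to_tf32(x)!r}")  # detail below ~2^-10 is lost
```

This is why TF32 trains most deep learning models acceptably: gradients rarely need more than ~3 decimal digits of mantissa precision, while the full FP32 exponent range is preserved.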

https://huggingface.co/spaces/Vokturz/can-it-run-llm

  • For Llama 2 70B —> you need about 48GB of VRAM to fit the model in quantized form (FP16 weights alone take roughly 140GB). That means 2x RTX 3090 or better.
  • What’s the difference between the A100 in NC vs. ND series? Roughly: NC A100 v4 VMs use PCIe A100 80GB cards and target single-node training and inference, while ND A100 v4 VMs use SXM A100 40GB cards with NVLink plus InfiniBand networking, aimed at large multi-node distributed training.
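The VRAM estimates above follow from simple arithmetic: weight memory is parameter count times bytes per parameter. A minimal sketch (weights only; activations, KV-cache, and optimizer state add more on top):

```python
# Rough VRAM needed just to hold a model's weights. Real usage is
# higher: activations, KV-cache, and optimizer state are not counted.
def weights_vram_gb(num_params_b, bits_per_param):
    """num_params_b: parameter count in billions."""
    return num_params_b * 1e9 * bits_per_param / 8 / 1e9

for bits in (16, 8, 4):
    print(f"Llama 2 70B @ {bits}-bit: ~{weights_vram_gb(70, bits):.0f} GB")
```

At 16-bit the 70B weights alone are ~140 GB, which is why quantizing to 4-bit (~35 GB) is what brings the model within reach of a pair of 24 GB consumer cards.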

Nvidia H100 (2023)

https://resources.nvidia.com/en-us-tensor-core/nvidia-tensor-core-gpu-datasheet

Quick Comparison

● The NVIDIA A30 provides up to ten times the speed of the NVIDIA T4.

● Like the NVIDIA A100, the NVIDIA V100 is aimed at data-science workloads, but the V100 is not suitable for gaming.

● The Quadro RTX 8000 is a top-end NVIDIA workstation card; for pure gaming, GeForce cards are the usual choice.

  1. High-Performance Compute (H-series):
    • These instances provide premium CPU support and resources and high-throughput network interfaces, such as RDMA. These VMs were meant for high compute, mission-critical workloads.

    These VMs are the fastest and most powerful available on Azure, built to handle intense predictive scenarios like financial risk modeling and scientific simulations like weather and molecular modeling.

Azure services cost money. Azure Cost Management helps you set budgets and configure alerts to keep spending under control.
There is no additional charge for Azure Machine Learning itself. However, along with compute, you will incur separate charges for other Azure services consumed, including but not limited to Azure Blob Storage, Azure Key Vault, Azure Container Registry, and Azure Application Insights.

Which One for Deep Learning: https://learn.microsoft.com/en-us/azure/virtual-machines/sizes-gpu

Virtual Machine Scale Sets? / Azure Load Balancer

Storage Accounts, Disks, Blob Storage, File Storage

IOPS (Input/Output Operations Per Second) [Speed]

  • Definition: The maximum number of input/output operations that a storage device can perform per second.
  • Easy Explanation: Think of it as the maximum speed at which a storage device can handle read and write requests in a second.

Max Throughput [Volume]

  • Definition: The maximum amount of data that can be transferred in a unit of time (typically measured in MB/s or GB/s).
  • Easy Explanation: It's like the maximum bandwidth or data transfer rate of a storage device.

Bursting (with both Max IOPS and Max Throughput):

  • Definition: Bursting allows a storage resource to temporarily exceed its baseline performance level for a limited duration.
  • Easy Explanation: Bursting is like a short burst of extra speed. It allows the storage device to go faster than its usual maximum performance for a short period, typically to handle temporary spikes in workload demand.
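The two limits interact: effective throughput is roughly IOPS times I/O size, capped by the disk's max throughput. A minimal sketch with illustrative numbers (not a specific Azure disk SKU):

```python
# Which limit binds depends on I/O size: small I/Os hit the IOPS cap,
# large I/Os hit the throughput cap. Numbers are illustrative only.
def effective_throughput_mb_s(iops_limit, io_size_kb, max_throughput_mb_s):
    return min(iops_limit * io_size_kb / 1024, max_throughput_mb_s)

# Small 4 KiB I/Os: the IOPS limit binds (~19.5 MB/s).
print(effective_throughput_mb_s(5000, 4, 200))
# Large 1 MiB I/Os: the throughput cap binds (200 MB/s).
print(effective_throughput_mb_s(5000, 1024, 200))
```

This is why database workloads (many small random I/Os) care about the IOPS figure while backup or streaming workloads (large sequential I/Os) care about max throughput.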

Storage Account:

A Storage Account is a general-purpose storage solution provided by Azure. It can store various types of data, including blobs, files, tables, queues, and disks. Blob storage within a Storage Account is typically used for storing unstructured data such as images, videos, documents, backups, and logs. File storage within a Storage Account provides a fully managed file share in the cloud, accessible via the SMB protocol. Tables and queues are used for storing structured data and message queues, respectively. A Storage Account is suitable for a wide range of storage scenarios and provides features like high availability, durability, scalability, and security.

Disk:

A disk in Azure refers to a virtual hard disk (VHD) that can be attached to a virtual machine (VM) to provide persistent storage for the VM's operating system, applications, and data. Disks can be either managed disks or unmanaged disks. Managed disks are recommended because they offer simplified management, better reliability, and integration with Azure features like snapshots, backups, and availability sets. Managed disks are created and managed within the Azure Disk Storage service, which handles tasks such as replication, encryption, and maintenance. Disks attached to VMs are typically used for primary storage, where the operating system and applications are installed, or for data storage where persistent data needs to be retained even if the VM is deallocated or deleted.

CUDA (NVIDIA’s parallel computing platform and programming model), TensorRT (NVIDIA’s optimizer and runtime for fast inference), Caffe (a deep learning framework), ONNX (an open interchange format for trained models), and other frameworks, or GPU-accelerated graphical applications based on OpenGL and DirectX (cross-platform and Windows graphics APIs, respectively), can be deployed economically.

Memory Intensive vs Low Memory

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb