NVIDIA-Certified Professional AI Networking Online Practice
Last updated: June 6, 2025
You can work through these online practice questions to gauge how well you know the NVIDIA NCP-AIN exam material before deciding whether to register for the exam.
If you want to pass the exam with a 100% success rate and save 35% of your preparation time, choose the NCP-AIN dumps (latest real exam questions), which currently include 70 up-to-date exam questions and answers.
Answer:
Explanation:
In the Spectrum-X architecture, BlueField-3 SuperNICs are responsible for executing the congestion control algorithm. They handle millions of congestion control events per second with microsecond reaction latency, applying fine-grained rate decisions to manage data flow effectively. This ensures optimal network performance by preventing congestion and packet loss.
Reference: NVIDIA Spectrum-X Networking Platform
Answer:
Explanation:
Spectrum-X achieves network isolation in multi-tenant environments by implementing Layer 3 Virtual Network Identifiers (L3VNIs) per Virtual Routing and Forwarding (VRF) instance. This approach allows each tenant to have a separate routing table and network segment, ensuring that traffic is isolated and secure between tenants.
Reference Extracts from NVIDIA Documentation:
"Spectrum-X enhances multi-tenancy with performance isolation to ensure tenants' AI workloads perform optimally and consistently."
Answer:
Explanation:
EVPN multi-homing enables active-active redundancy without inter-switch links by using overlay routing over VXLAN and a distributed control plane based on BGP EVPN; a minimal configuration sketch follows the key benefits list below.
From the official NVIDIA Cumulus Linux EVPN Multihoming Documentation:
"EVPN multihoming allows multiple Top-of-Rack (ToR) switches to connect to the same server while maintaining full layer-2 redundancy without the need for inter-switch links or traditional MLAG configuration."
Key benefits:
Simplified topology (no ISL/peer-link needed)
BGP-based control plane
Fast convergence
Active-active links per host NIC
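A minimal NVUE sketch of EVPN multihoming on a Cumulus Linux ToR, assuming a hypothetical server-facing bond (bond1) and an illustrative segment local-id; every ToR attached to the same server would configure the same local-id:
nv set evpn multihoming enable on
# Server-facing bond; swp1 is an illustrative member port
nv set interface bond1 bond member swp1
# ToRs sharing this local-id form one Ethernet segment for active-active forwarding
nv set interface bond1 evpn multihoming segment local-id 1
nv config apply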
Incorrect Options:
MLAG requires ISL between switches and peer-link configuration.
VSS (Virtual Switching System) is a Cisco term, not supported in NVIDIA networking.
Reference: Cumulus Linux Docs – EVPN Multihoming
Answer: A
Explanation:
The Local Route Header (LRH) in InfiniBand is termed "local" because it is used exclusively for routing packets within a single subnet. The LRH contains the destination and source Local Identifiers (LIDs), which are unique within a subnet, facilitating efficient routing without the need for global addressing. This design optimizes performance and simplifies routing within localized network segments.
InfiniBand is a high-performance, low-latency interconnect technology widely used in AI and HPC data centers, supported by NVIDIA’s Quantum InfiniBand switches and adapters. The Local Routing Header (LRH) is a critical component of the InfiniBand packet structure, used to facilitate routing within an InfiniBand fabric. The question asks why the LRH is called a “local header,” which relates to its role in the InfiniBand network architecture.
According to NVIDIA’s official InfiniBand documentation, the LRH is termed “local” because it contains the addressing information necessary for routing packets between nodes within the same InfiniBand subnet. The LRH includes fields such as the Source Local Identifier (SLID) and Destination Local Identifier (DLID), which are assigned by the subnet manager to identify the source and destination endpoints within the local subnet. These identifiers enable switches to forward packets efficiently within the subnet without requiring global routing information, distinguishing the LRH from the Global Routing Header (GRH), which is used for inter-subnet routing.
Exact Extract from NVIDIA Documentation:
“The Local Routing Header (LRH) is used for routing InfiniBand packets within a single subnet. It contains the Source LID (SLID) and Destination LID (DLID), which are assigned by the subnet manager to identify the source and destination nodes in the local subnet. The LRH is called a ‘local header’ because it facilitates intra-subnet routing, enabling switches to forward packets based on LID-based forwarding tables.”
―NVIDIA InfiniBand Architecture Guide
This extract confirms that option A is the correct answer, as the LRH’s primary function is to route traffic between nodes within the local subnet, leveraging LID-based addressing. The term “local” reflects its scope, which is limited to a single InfiniBand subnet managed by a subnet manager.
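As an illustrative check, the subnet-local addressing carried in the LRH can be observed with standard InfiniBand diagnostics (the GID and LID values below are hypothetical):
ibaddr
# Example output: GID fe80::0002:c903:0000:1491 LID start 0x4 end 0x4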
Reference: LRH and GRH InfiniBand Headers – NVIDIA Enterprise Support Portal
Answer:
Explanation:
To enforce strict multi-tenancy, where:
Tenant A’s GPU cannot talk to Tenant B’s GPU
But both can access shared storage
The correct solution is:
Storage system → Full PKey membership
Each tenant's GPU → Limited PKey membership
From the NVIDIA InfiniBand P_Key Partitioning Guide:
"A port with limited membership can only communicate with full members of the same PKey. It cannot communicate with other limited members, even within the same partition."
This isolates tenants from each other, while allowing shared access to storage.
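A hedged sketch of how this could look in OpenSM's partitions.conf (the PKey value and port GUIDs are hypothetical placeholders; consult the OpenSM documentation for the exact syntax in your release):
# Storage port = full member; tenant GPU ports = limited members.
# Limited members can reach full members (storage) but not each other.
TenantShared=0x8001 :
0x0002c90300001111=full,
0x0002c90300002222=limited,
0x0002c90300003333=limited;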
Incorrect Options:
A permits tenant-to-tenant communication.
B isolates everything, including access to storage.
C prevents GPU access to storage.
Reference: NVIDIA InfiniBand – Multi-Tenant PKey Partitioning Design
Answer:
Explanation:
The NVIDIA UFM Cyber-AI Platform is specifically designed to enhance security and operational efficiency in InfiniBand data centers. It leverages AI-powered analytics to detect security threats, operational anomalies, and predict potential network failures. By analyzing real-time telemetry data, it identifies abnormal behaviors and performance degradation, enabling proactive maintenance and threat mitigation.
This platform integrates with existing UFM Enterprise and Telemetry services to provide a comprehensive view of the network's health and security posture. It utilizes machine learning algorithms to establish baselines for normal operations and detect deviations that may indicate security breaches or hardware issues.
Reference: NVIDIA UFM Cyber-AI Documentation v2.9.1
Answer:
Explanation:
To check the status and link layer of InfiniBand interfaces, the ibstat command is used.
For example:
ibstat -d mlx5_0
This command provides detailed information about the InfiniBand device, including its state (e.g., Active), physical state (e.g., LinkUp), and link layer (e.g., InfiniBand).
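An abbreviated, illustrative excerpt of the output (values are hypothetical):
CA 'mlx5_0'
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 400
        Base lid: 4
        SM lid: 1
        Link layer: InfiniBand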
Reference: NVIDIA DGX BasePOD Deployment Guide – Network Operator Section
Answer:
Explanation:
Before upgrading the DOCA SDK on a BlueField DPU, it is mandatory to uninstall the existing OFED drivers to prevent compatibility conflicts.
From the NVIDIA DOCA Installation Guide:
"Before upgrading DOCA or BlueField-related software, you must remove existing OFED packages using: /usr/sbin/ofed_uninstall.sh -force."
This ensures:
Clean driver state
No residual kernel modules or user space libraries
Proper registration of new DOCA/OFED versions
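A hedged sketch of the overall sequence on the DPU host (the DOCA package name below is an illustrative placeholder, not a guaranteed meta-package name):
# 1. Remove the existing OFED stack, as quoted from the installation guide above
/usr/sbin/ofed_uninstall.sh --force
# 2. Install the new DOCA packages (package name is illustrative)
apt-get install -y doca-all
# 3. Reboot so the new drivers and tools load cleanly
reboot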
Incorrect Options:
A and C may not resolve conflicts.
D installs but doesn't remove conflicting packages.
Reference: DOCA SDK Installation – Uninstall OFED Requirement
Answer:
Explanation:
From NVIDIA Performance Tuning Guide (ib_write_bw Tool Usage):
"-S <SL>: Specifies the Service Level (SL) to use for the InfiniBand traffic. SL is used for setting priority and mapping to virtual lanes (VLs) on the IB fabric."
This flag is useful when testing QoS-aware setups or validating SL/VL mappings.
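An illustrative invocation (device name, SL value, and server address are placeholders):
# Server side
ib_write_bw -d mlx5_0 -S 3
# Client side, pointing at the server
ib_write_bw -d mlx5_0 -S 3 192.168.1.10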
Incorrect Options:
A – No such flag for burst size.
B – -q defines the number of QPs.
C – --rate or -R is used for rate limiting.
Reference: NVIDIA InfiniBand Performance Guide – ib_write_bw Options Section
Answer:
Explanation:
The ibdiagnet utility is a fundamental tool for InfiniBand fabric discovery, error detection, and diagnostics. It provides comprehensive reports on the fabric's health, including error reporting, switch and Host Channel Adapter (HCA) configuration dumps, various counters reported by the switches and HCAs, and parameters of devices such as switch fans, power supply units, cables, and PCI lanes. Additionally, ibdiagnet performs validation for Unicast Routing, Adaptive Routing, and Multicast Routing to ensure correctness and a credit-loop-free routing environment.
Reference Extracts from NVIDIA Documentation:
"The ibdiagnet utility is one of the basic tools for InfiniBand fabric discovery, error detection and diagnostic. The output files of the ibdiagnet include error reporting, switch and HCA configuration dumps, various counters reported by the switches and the HCAs."
"ibdiagnet also performs Unicast Routing, Adaptive Routing and Multicast Routing validation for correctness and credit-loop free routing."
Answer:
Explanation:
To identify the active Subnet Manager (SM) node in an InfiniBand fabric, the correct command sequence is:
sminfo
Displays general information about the active SM in the fabric, including its LID.
smpquery ND <LID>
Resolves the Node Description (ND) at the given LID, revealing the exact hostname or label of the SM server.
From the InfiniBand Tools Guide:
"The sminfo utility provides the LID of the master SM. Use smpquery ND <LID> to resolve the node name hosting the SM."
This two-step approach is standard for locating and validating the SM identity in fabric diagnostics.
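An illustrative two-step run (the LID, GUID, and node description shown are hypothetical):
sminfo
# e.g.: sminfo: sm lid 1 sm guid 0x248a070300a1b2c3, activity count 123456 priority 14 state 3 SMINFO_MASTER
smpquery ND 1
# e.g.: Node Description:.......head-node-01 HCA-1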
Incorrect Options:
B (Nl) is an invalid query type.
C and D do not identify SMs.
Reference: InfiniBand SM Tools – sminfo & smpquery Usage
Answer:
Explanation:
Modern AI training (especially with LLMs) requires extremely high-speed, parallel access to large datasets. A dedicated storage fabric separates data I/O traffic from the training compute path and avoids contention.
From NVIDIA DGX Infrastructure Reference Architectures:
"Dedicated storage networks eliminate I/O bottlenecks by providing low-latency, high-bandwidth access to distributed storage for large-scale training jobs."
"Parallel access to datasets is key for performance, especially in multi-node, multi-GPU AI clusters."
Security (B) is important, but not the core reason for a storage fabric.
Cost (D) is typically increased, not reduced, with dedicated fabrics.
Reference: NVIDIA Base POD/AI Infrastructure Deployment Guidelines – Storage Section
Answer:
Explanation:
In NVIDIA Spectrum-X, congestion is evaluated based on egress queue loads. Spectrum-4 switches assess the load on each egress queue and select the port with the minimal load for packet transmission. This approach ensures that all ports are well-balanced, optimizing network performance and minimizing congestion.
Answer:
Explanation:
NVIDIA Air is a cloud-based network simulation tool designed to create digital twins of data center infrastructure, including Spectrum-X networks. It allows users to model switches, SuperNICs, and storage components, enabling the simulation, validation, and automation of network configurations before physical deployment. This facilitates Day 0, 1, and 2 operations, ensuring that network designs are tested and optimized for AI workloads.
Reference Extracts from NVIDIA Documentation:
"NVIDIA Air enables cloud-scale efficiency by creating identical replicas of real-world data center infrastructure deployments."
"NVIDIA Air allows users to model data center deployments with full software functionality, creating a digital twin. Transform and accelerate time to AI by simulating, validating, and automating changes and updates."
"NVIDIA Air supports simulation of NVIDIA Spectrum Ethernet (Cumulus Linux and SONiC) switches and NVIDIA BlueField DPUs and SuperNICs as well as the NetQ network operations toolset."
Answer:
Explanation:
Direct Memory Access (DMA) in InfiniBand networks allows data to be transferred directly between the memory of two devices without involving the CPU. This capability significantly reduces CPU overhead, lowers latency, and increases throughput, making it ideal for AI workloads that demand efficient data transfers.