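NVIDIA Certified Professional AI Infrastructure Online Practice
Last updated: October 3, 2025
You can use these online practice questions to gauge how well you know the NVIDIA NCP-AII exam material before deciding whether to register for the exam.
If you want a 100% pass and to cut your preparation time by 35%, choose the NCP-AII dumps (the latest real exam questions), which currently include the latest 300 exam questions and answers.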
Answer:
Explanation:
DCGM provides the most comprehensive and automated solution for dynamic power management. It can monitor GPU utilization in real time and adjust the power limit based on predefined policies, ensuring optimal power efficiency without manual intervention. Manually adjusting the power limit is possible but requires scripting and continuous monitoring. Dynamic Boost is typically for laptops, and BIOS power profiles may not be fine-grained enough. Disabling ECC reduces power but compromises data integrity.
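The idea of a utilization-driven power policy can be sketched in a few lines. This is only an illustration of the policy logic, not DCGM's API: the function name `choose_power_limit`, the 30% and 80% thresholds, and the wattage bounds are all hypothetical assumptions chosen for the example.

```python
def choose_power_limit(utilization_pct, min_watts, max_watts, low=30.0, high=80.0):
    """Map GPU utilization (0-100) to a power limit in watts.

    Hypothetical policy: at or below `low` use the minimum limit, at or
    above `high` use the maximum, and interpolate linearly in between.
    """
    if not 0.0 <= utilization_pct <= 100.0:
        raise ValueError("utilization_pct must be between 0 and 100")
    if utilization_pct <= low:
        return float(min_watts)
    if utilization_pct >= high:
        return float(max_watts)
    fraction = (utilization_pct - low) / (high - low)
    return min_watts + fraction * (max_watts - min_watts)


if __name__ == "__main__":
    # Illustrative utilization samples for a GPU with a 200-400 W range.
    for sample in (10, 55, 90):
        print(sample, choose_power_limit(sample, 200, 400))
```

In a real deployment the chosen value would be handed to whatever tool applies the limit; the sketch stops at the decision itself.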
Answer:
Explanation:
VMware DirectPath I/O (Passthrough) allows a VM to have exclusive access to a physical PCIe device, such as a GPU. This provides the best performance because the VM can directly access the GPU without virtualization overhead. vGPU allows sharing of a GPU among multiple VMs, but DirectPath I/O provides dedicated access. vMotion migrates VMs. HA restarts VMs after failure. DRS balances resources across hosts.
Answer:
Explanation:
Using Slurm’s node features is the most straightforward and recommended approach for tagging nodes with specific capabilities. The ‘--constraint’ option allows jobs to request nodes with particular features. GresTypes can be used, but node features provide more flexibility and control. Installing drivers dynamically is impractical and inefficient. DCGM is primarily for monitoring, not core scheduling requirements.
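As a rough sketch of how a job might request tagged nodes, the helper below assembles an `sbatch` command line with a feature constraint. The feature names `a100` and `nvlink`, the script name `train.sh`, and the helper `build_sbatch_command` are hypothetical; real feature names are whatever the cluster administrator assigned to the nodes.

```python
import shlex


def build_sbatch_command(script, features, gpus_per_node=None):
    """Build an sbatch argument list that requires every listed node feature."""
    if not features:
        raise ValueError("at least one feature is required")
    # In Slurm constraint syntax, '&' means all listed features must be present.
    constraint = "&".join(features)
    command = ["sbatch", f"--constraint={constraint}"]
    if gpus_per_node is not None:
        command.append(f"--gres=gpu:{gpus_per_node}")
    command.append(script)
    return command


if __name__ == "__main__":
    cmd = build_sbatch_command("train.sh", ["a100", "nvlink"], gpus_per_node=4)
    # shlex.join quotes the '&' so the line is safe to paste into a shell.
    print(shlex.join(cmd))
```

Building the command as a list rather than a string avoids shell-quoting mistakes if it is later passed to `subprocess.run`.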
Answer:
Explanation:
IPMI is a standard interface for out-of-band server management, commonly used for monitoring hardware sensors like temperature and utilization. BMCs typically support IPMI. SDRs are the data format used by IPMI for sensor data. SNMP is also an option, but IPMI is more directly tied to hardware monitoring. The rest are less efficient or require additional software installation.
Answer:
Explanation:
‘CUDA_VISIBLE_DEVICES’ is essential for GPU affinity. It specifies which GPUs are visible to a particular process. Without it, all processes might try to use the same GPU, leading to performance bottlenecks. Of the remaining options, one controls the order in which GPUs are enumerated, one specifies the path to shared libraries, one is hypothetical, and one forces synchronous CUDA calls.
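A minimal sketch of per-process GPU affinity, assuming one worker process per GPU. The helper names `env_for_gpu` and `launch_worker` are hypothetical, and the child process here only echoes the variable back so the example runs on a machine without a GPU.

```python
import os
import subprocess
import sys


def env_for_gpu(gpu_index, base_env=None):
    """Return a copy of the environment that exposes a single GPU index."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def launch_worker(gpu_index):
    """Start a child process pinned to one GPU and return what it sees."""
    # Stand-in for a real CUDA workload: the child just prints the variable.
    code = "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env_for_gpu(gpu_index),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    for gpu in range(2):
        print(f"worker {gpu} sees GPU list: {launch_worker(gpu)}")
```

The variable must be set before the CUDA runtime initializes in the process, which is why it is passed through the child's environment rather than changed afterwards.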
Answer:
Explanation:
All of the options are valid reasons. The NVIDIA driver must be present on the host, the nodes must be labeled so that Kubernetes recognizes them, the NVIDIA Container Toolkit is required to run GPU-enabled containers, and the GPU Operator must be configured correctly.
Answer:
Explanation:
Memory leaks and single-allocation limits are common causes of ‘out of memory’ errors, even when sufficient physical memory exists. ‘cuda-memcheck’ (replaced by ‘compute-sanitizer’ in recent CUDA toolkits) is specifically designed to find memory errors in CUDA applications. While driver incompatibility is possible, leaks and allocation size limits are more frequent occurrences.
Answer:
Explanation:
Liquid cooling is the most effective way to remove heat from high-power components like GPUs and CPUs, allowing them to operate at their maximum performance without overheating. Choosing lower-TDP GPUs will reduce thermal output but will also significantly reduce performance. Frequency throttling is useful, but liquid cooling enables optimal performance within thermal constraints. Raising the data center's ambient temperature may cut cooling costs, but it works against lowering server temperatures.
Answer:
Explanation:
Purging the existing drivers using the package manager ensures that all related files and configurations are removed, preventing conflicts with the new driver. Rebooting after purging allows the system to load without the old drivers. While using the .run file is an option, using the package manager (if available) is generally preferred for easier management.
Answer:
Explanation:
‘dcgmi diag -t 1004’ is the correct command. ‘nvidia-smi’ provides basic GPU information, but ‘dcgmi diag -t 1004’ (part of the Data Center GPU Manager) provides specific diagnostic tests for NVLink connectivity. ‘lspci’ lists PCIe devices, not specifically NVLink. ‘gpustat’ is a monitoring tool. ‘nvlink_info’ is hypothetical.
Answer:
Explanation:
Incorrect BIOS/UEFI settings are the most likely cause when GPUs are physically present but not detected. The BIOS controls PCIe lane allocation and slot enabling. Reseating GPUs is a good first step, but if the BIOS is misconfigured, it won’t resolve the issue. Insufficient power is also a possibility, but BIOS misconfiguration is more common during initial setup.
Answer: A, D
Explanation:
The most likely causes are network configuration issues (incorrect IP, subnet, or VLAN). The BMC requires a valid IP configuration and network connectivity to be accessible. While other options are possible, they are less common as initial causes.
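The IP and subnet part of this check can be sketched with Python's standard `ipaddress` module. The function name `check_bmc_network` and the sample addresses are hypothetical, and the sketch assumes an ordinary subnet (not a /31 or /32 point-to-point link). It cannot detect a VLAN mismatch, which has to be verified on the switch port.

```python
import ipaddress


def check_bmc_network(bmc_ip, prefix_len, gateway):
    """Return a list of problems found in a BMC's static IP settings."""
    interface = ipaddress.ip_interface(f"{bmc_ip}/{prefix_len}")
    gateway_ip = ipaddress.ip_address(gateway)
    network = interface.network
    problems = []
    if gateway_ip not in network:
        problems.append(f"gateway {gateway_ip} is outside {network}")
    if interface.ip in (network.network_address, network.broadcast_address):
        problems.append(f"{interface.ip} is the network or broadcast address")
    if interface.ip == gateway_ip:
        problems.append("BMC address is the same as the gateway address")
    return problems


if __name__ == "__main__":
    # Illustrative values only: the second call has a mismatched subnet.
    print(check_bmc_network("10.0.0.50", 24, "10.0.0.1"))
    print(check_bmc_network("10.0.1.50", 24, "10.0.0.1"))
```

An empty list means the static settings are at least internally consistent; it does not prove the BMC is reachable.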
Answer:
Explanation:
VXLAN is most suitable for multi-tenant environments because it provides a larger address space (24-bit VNI) compared to VLANs (12-bit VLAN ID), allowing for a greater number of isolated networks. VXLAN also supports Layer 2 connectivity across Layer 3 networks, facilitating VM mobility across different subnets. While QinQ can extend the VLAN ID space, it’s not as scalable as VXLAN. GRE provides tunneling but doesn’t inherently provide isolation. IPsec is primarily for secure communication.
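The difference in address space follows directly from the identifier widths given above, as this short calculation shows. The constant names are chosen for the example.

```python
VLAN_ID_BITS = 12    # width of the 802.1Q VLAN ID field
VXLAN_VNI_BITS = 24  # width of the VXLAN Network Identifier field

vlan_ids = 2 ** VLAN_ID_BITS      # 4096 possible VLAN IDs
vxlan_vnis = 2 ** VXLAN_VNI_BITS  # 16777216 possible VNIs
ratio = vxlan_vnis // vlan_ids    # 4096 times more identifiers

if __name__ == "__main__":
    print(f"VLAN IDs:   {vlan_ids}")
    print(f"VXLAN VNIs: {vxlan_vnis}")
    print(f"VXLAN offers {ratio}x as many segment identifiers")
```

In practice slightly fewer VLANs are usable, since IDs 0 and 4095 are reserved, which leaves 4094.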
Answer:
Explanation:
Direct-attached NVLink provides significantly higher bandwidth and lower latency compared to routing traffic over a traditional network. This is crucial for applications that require intensive GPU-to-GPU communication, such as large-scale AI training. While direct-attached NVLink can simplify configuration in some cases, its primary advantage is the performance improvement.
Answer:
Explanation:
InfiniBand is designed for high-performance computing and offers significantly lower latency and higher bandwidth compared to Ethernet or Fibre Channel, making it the most suitable choice for demanding workloads like recommendation systems. While 100 Gigabit Ethernet provides high bandwidth, InfiniBand generally offers lower latency.