Exam Dumps
Every month, we help more than 1,000 people prepare for and pass their exams.

CompTIA DY0-001 시험

CompTIA DataX Certification Exam 온라인 연습

Last updated: June 18, 2025

You can use the online practice questions to gauge how well you know the CompTIA DY0-001 exam material before deciding whether to register for the exam.

To pass the exam 100% and save 35% of your preparation time, choose the DY0-001 dumps (the latest real exam questions), which currently include the latest 85 exam questions and answers.


Question No : 1


A data scientist built several models that perform about the same but vary in the number of features.
Which of the following models should the data scientist recommend for production according to Occam's razor?

Answer:
Explanation:
According to Occam's razor, when models perform equivalently, you should choose the simplest one: in this case, the model that achieves the needed performance with the fewest features.
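The selection rule above can be sketched in a few lines. This is only an illustration: the candidate models, their scores, and the `tolerance` that defines "about the same" performance are all made-up assumptions, not values from the exam.

```python
# Hypothetical candidates: (name, validation accuracy, number of features).
candidates = [
    ("model_a", 0.912, 40),
    ("model_b", 0.910, 12),
    ("model_c", 0.911, 25),
    ("model_d", 0.850, 5),
]

def pick_simplest(models, tolerance=0.005):
    """Among models within `tolerance` of the best score, return the one with the fewest features."""
    best_score = max(score for _, score, _ in models)
    # Keep only the models whose performance is "about the same" as the best.
    near_best = [m for m in models if best_score - m[1] <= tolerance]
    # Occam's razor: prefer the simplest of the equivalent performers.
    return min(near_best, key=lambda m: m[2])

chosen = pick_simplest(candidates)
print(chosen)  # ('model_b', 0.91, 12)
```

Note that `model_d` has the fewest features overall but is excluded because its accuracy is not comparable to the others.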

Question No : 2


Which of the following JOINS would generate the largest amount of data?

Answer:
Explanation:
A CROSS JOIN produces the Cartesian product of the two tables (every row from the first paired with every row from the second), yielding far more rows than any of the other join types.
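A small in-memory SQLite session can make the row counts concrete. The `customers` and `orders` tables and their contents are invented for this sketch; the point is only that the cross join returns rows(customers) × rows(orders).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Toy tables (hypothetical data): 3 customers, 4 orders, one order with no matching customer.
cur.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")
cur.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bo"), (3, "Cy")])
cur.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 1), (12, 2), (13, 9)])

def count(sql):
    """Run a SELECT COUNT(*) query and return the single number."""
    return cur.execute(sql).fetchone()[0]

inner_rows = count("SELECT COUNT(*) FROM customers c INNER JOIN orders o ON c.id = o.customer_id")
left_rows = count("SELECT COUNT(*) FROM customers c LEFT JOIN orders o ON c.id = o.customer_id")
cross_rows = count("SELECT COUNT(*) FROM customers CROSS JOIN orders")

print(inner_rows, left_rows, cross_rows)  # 3 4 12
```

On this toy data the inner join keeps only matching pairs, the left join adds the one unmatched customer, and the cross join pairs every customer with every order.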

Question No : 3


A computer vision model is trained to identify cats on a training set that is composed of both cat and dog images. The model predicts that a picture of a cat is a dog.
Which of the following describes this error?

Answer:
Explanation:
Classifying an actual cat (positive instance) as a dog (negative prediction) is a false negative, which corresponds to a Type II error.
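The four outcomes of a binary classifier can be counted directly. The labels below are hypothetical; "cat" is treated as the positive class because that is what the model is trained to identify.

```python
# Hypothetical ground truth and predictions; "cat" is the positive class.
actual    = ["cat", "cat", "dog", "dog", "cat", "dog"]
predicted = ["cat", "dog", "dog", "cat", "cat", "dog"]

def confusion_counts(actual, predicted, positive="cat"):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            counts["TP"] += 1
        elif a != positive and p == positive:
            counts["FP"] += 1  # Type I error: a dog called a cat
        elif a == positive and p != positive:
            counts["FN"] += 1  # Type II error: a cat called a dog
        else:
            counts["TN"] += 1
    return counts

counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 2}
```

The second pair (actual cat, predicted dog) is the case described in the question and lands in the `FN` bucket.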

Question No : 4


In a modeling project, people evaluate phrases and provide reactions as the target variable for the model.
Which of the following best describes what this model is doing?

Answer:
Explanation:
The model predicts people’s reactions (e.g., positive, negative, neutral) to given phrases, which is the core of sentiment analysis.

Question No : 5


Which of the following techniques enables automation and iteration of code releases?

Answer:
Explanation:
Continuous Integration/Continuous Deployment pipelines automate the building, testing, and delivery of code, enabling rapid, repeatable, and iterative releases with minimal manual intervention.

Question No : 6


Which of the following does k represent in the k-means model?

Answer:
Explanation:
In k-means clustering, the parameter k directly defines how many clusters the algorithm will partition the data into.
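To see k acting as the number of clusters, here is a minimal one-dimensional sketch of the usual assign-then-update loop (Lloyd's algorithm). The data points and the deterministic initialization are assumptions chosen to keep the example reproducible; real implementations typically use random or k-means++ initialization.

```python
def kmeans_1d(points, k, iterations=20):
    """Partition 1-D points into k clusters by alternating assignment and centroid updates."""
    data = sorted(points)
    # Deterministic initialization (an assumption of this sketch): k evenly spaced data values.
    step = len(data) // k
    centroids = [data[i * step] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        # Update step: each centroid moves to the mean of its cluster (unchanged if empty).
        centroids = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
    return centroids, clusters

# Made-up data with three visible groups around 1, 5, and 9.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.1, 8.9]
centroids, clusters = kmeans_1d(points, k=3)
print(len(clusters))  # 3
```

Calling the same function with `k=2` returns two clusters instead, which is the only thing k controls.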

Question No : 7


An analyst wants to show how the component pieces of a company's business units contribute to the company's overall revenue.
Which of the following should the analyst use to best demonstrate this breakdown?

Answer:
Explanation:
A Sankey diagram visualizes flows from individual business units into the total, with the width of each flow proportional to its revenue contribution, making it ideal for showing how each component feeds the overall total.

Question No : 8


A data scientist is building a forecasting model for the price of copper. The only input in this model is the daily price of copper for the last ten years.
Which of the following forecasting techniques is the most appropriate for the data scientist to use?

Answer:
Explanation:
An autoregressive model uses past values of the series itself (here, historical daily copper prices) as predictors for future values, making it the most suitable technique when only the time-series history is available.
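As a rough illustration, a first-order autoregressive model, x[t] = c + phi * x[t-1], can be fitted by least squares using nothing but the series itself. The price series below is synthetic (generated from known coefficients so the fit can be checked), and the helper name `fit_ar1` is made up for this sketch.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1] from a single series."""
    x_prev = series[:-1]  # lagged values (the predictors)
    x_next = series[1:]   # current values (the targets)
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    var = sum((a - mean_prev) ** 2 for a in x_prev)
    phi = cov / var
    c = mean_next - phi * mean_prev
    return c, phi

# Synthetic "prices" generated exactly by x[t] = 2 + 0.5 * x[t-1], starting at 10.
prices = [10.0]
for _ in range(9):
    prices.append(2 + 0.5 * prices[-1])

c, phi = fit_ar1(prices)
forecast = c + phi * prices[-1]  # one-step-ahead forecast from the last observed value
print(round(c, 6), round(phi, 6))
```

Real price data would not follow the recursion exactly, and a practical model would usually include more lags and a check for stationarity.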

Question No : 9


Which of the following distance metrics for KNN is best described as a straight line?

Answer:
Explanation:
Euclidean distance measures the straight-line distance between two points in space, matching the geometric “as-the-crow-flies” notion of distance.
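For comparison, the sketch below computes the Euclidean distance and the Manhattan (city-block) distance between the same two points; the points themselves are arbitrary example values.

```python
import math

a = (0.0, 0.0)
b = (3.0, 4.0)

# Straight-line distance: square root of the summed squared coordinate differences.
euclidean = math.dist(a, b)

# Manhattan distance, for contrast: sum of absolute coordinate differences.
manhattan = sum(abs(x - y) for x, y in zip(a, b))

print(euclidean, manhattan)  # 5.0 7.0
```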

Question No : 10


A data scientist needs to analyze a company's chemical businesses and is using the master database of the conglomerate company. Nothing in the data differentiates the data observations for the different businesses.
Which of the following is the most efficient way to identify the chemical businesses' observations?

Answer:
Explanation:
Engaging the business team leverages domain expertise to pinpoint which records pertain to chemical operations, allowing you to extract and analyze just the relevant subset. This avoids the time and resource waste of ingesting and sifting through unrelated data.

Question No : 11


A statistician notices gaps in data associated with age-related illnesses and wants to further aggregate these observations.
Which of the following is the best technique to achieve this goal?

Answer:
Explanation:
Binning groups continuous age values into discrete intervals (e.g., age ranges), filling gaps by aggregating observations into broader categories. This directly addresses uneven or sparse age data by creating consistent age groups.
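A minimal sketch of fixed-width binning follows. The ages and the 20-year bin width are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical patient ages with uneven coverage.
ages = [23, 37, 41, 58, 62, 64, 79, 85]

def age_bin(age, width=20):
    """Map an age to a fixed-width interval label such as '60-79'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"

# Aggregate individual observations into counts per age range.
counts = Counter(age_bin(a) for a in ages)
print(dict(counts))  # {'20-39': 2, '40-59': 2, '60-79': 3, '80-99': 1}
```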

Question No : 12


Which of the following best describes the minimization of the residual term in a ridge linear regression?

Answer:
Explanation:
Ridge regression extends ordinary least squares by adding an L2 penalty on the coefficients, but the residual term it minimizes is still the sum of squared residuals (Σe²).
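The two parts of the ridge objective can be written out for the simplest case: one feature and no intercept, where the loss is Σ(y - βx)² + λβ² and the minimizer is β = Σxy / (Σx² + λ). The data and the value of λ below are made up for this sketch.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form minimizer of sum((y - b*x)^2) + lam * b^2 for one feature, no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def ridge_loss(xs, ys, beta, lam):
    """Residual term (sum of squared residuals) plus the L2 penalty on the coefficient."""
    rss = sum((y - beta * x) ** 2 for x, y in zip(xs, ys))
    return rss + lam * beta ** 2

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

ols_beta = ridge_slope(xs, ys, lam=0.0)     # lam = 0 reduces to ordinary least squares
ridge_beta = ridge_slope(xs, ys, lam=14.0)  # the penalty shrinks the coefficient toward zero

print(ols_beta, ridge_beta)  # 2.0 1.0
```

With λ = 14, the shrunken coefficient gives a lower total ridge loss than the ordinary least squares coefficient, even though its squared residuals alone are larger.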

Question No : 13


A data scientist has constructed a model that meets the minimum performance requirements specified in the proposal for a prediction project. The data scientist thinks the model's accuracy should be improved, but the proposed deadline is approaching.
Which of the following actions should the data scientist take first?

Answer:
Explanation:
Since the model already meets the agreed-upon requirements and the deadline is near, the first step is to confirm with the stakeholder whether pursuing further accuracy gains is worth the additional time and resources. This ensures you align with business priorities before collecting more data, requesting funding, or tweaking the model further.

Question No : 14


Which of the following distribution methods or models can most effectively represent the actual arrival times of a bus that runs on an hourly schedule?

Answer:
Explanation:
Scheduled buses tend to arrive around a fixed time with random delays that cluster symmetrically around the hour. A normal distribution effectively models those continuous, bell-shaped deviations from the exact schedule.
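As an illustration, arrival deviations can be modeled with Python's `statistics.NormalDist`. The mean of 0 minutes and standard deviation of 3 minutes are assumed values for this sketch, not figures from the question.

```python
from statistics import NormalDist

# Assumed model: deviation from the scheduled time, in minutes (negative = early).
delay = NormalDist(mu=0.0, sigma=3.0)

# Probability that the bus arrives within 3 minutes of schedule (one standard deviation).
p_within_3 = delay.cdf(3) - delay.cdf(-3)

# Symmetry around the scheduled time: early and late arrivals are equally likely.
p_early = delay.cdf(0)

print(round(p_within_3, 4), p_early)
```

Under these assumptions roughly 68% of arrivals fall within one standard deviation of the scheduled time.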

Question No : 15


During EDA, a data scientist wants to look for patterns, such as linearity, in the data.
Which of the following plots should the data scientist use?

Answer:
Explanation:
Scatter plots display pairs of numeric values on two axes, letting you visually assess relationships and patterns, such as linear trends, between variables.
