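Microsoft DP-100 Exam

Designing and Implementing a Data Science Solution on Azure Online Practice

Last updated: October 9, 2025

You can use the online practice questions to gauge how well you know the Microsoft DP-100 exam material, and then decide whether to register for the exam.

To pass the exam with a 100% success rate and save 35% of your preparation time, choose the DP-100 dumps (latest real exam questions), which currently include the latest 110 exam questions and answers.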


Question No : 1

You train a machine learning model.
You must deploy the model as a real-time inference service for testing. The service requires low CPU utilization and less than 48 MB of RAM. The compute target for the deployed service must initialize automatically while minimizing cost and administrative overhead.
Which compute target should you use?

A.Azure Kubernetes Service (AKS) inference cluster
B.Azure Machine Learning compute cluster
C.Azure Container Instance (ACI)
D.attached Azure Databricks cluster

Answer:
Explanation:
Azure Container Instances (ACI) are suitable only for small models less than 1 GB in size.
Use it for low-scale CPU-based workloads that require less than 48 GB of RAM.
Note: Microsoft recommends using single-node Azure Kubernetes Service (AKS) clusters for dev-test of larger models.
Reference: https://docs.microsoft.com/id-id/azure/machine-learning/how-to-deploy-and-where

Question No : 2

You are solving a classification task.
You must evaluate your model on a limited data sample by using k-fold cross-validation. You start by configuring a k parameter as the number of splits.
You need to configure the k parameter for the cross-validation.
Which value should you use?

A.k=1
B.k=10
C.k=0.5
D.k=0.9

Answer:
Explanation:
Leave One Out (LOO) cross-validation
Setting K = n (the number of observations) yields n-fold and is called leave-one out cross-validation (LOO), a special case of the K-fold approach.
LOO CV is sometimes useful but typically doesn’t shake up the data enough. The estimates from each fold are highly correlated and hence their average can have high variance.
This is why the usual choice is K=5 or 10. It provides a good compromise for the bias-variance tradeoff.
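
The constraint on k can be checked mechanically: k counts folds, so it has to be a whole number of at least 2 and at most the number of observations. The following is a minimal plain-Python sketch, not scikit-learn's implementation; `k_fold_indices` and `is_valid_k` are hypothetical helpers, and the 50-row sample size is an assumption made for illustration.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k consecutive folds (no shuffling)."""
    # k counts folds, so it must be a whole number between 2 and n_samples.
    if not isinstance(k, int) or k < 2 or k > n_samples:
        raise ValueError(f"k must be an integer in [2, {n_samples}], got {k!r}")
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


def is_valid_k(n_samples, k):
    """Return True when k is usable as a fold count for n_samples rows."""
    try:
        list(k_fold_indices(n_samples, k))
        return True
    except ValueError:
        return False


# Check the four answer options against an assumed 50-row sample.
options = {"k=1": 1, "k=10": 10, "k=0.5": 0.5, "k=0.9": 0.9}
validity = {name: is_valid_k(50, k) for name, k in options.items()}
print(validity)
```

Only k=10 passes the check; k=1 would leave no data to train on, and fractional values are not fold counts at all.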

Question No : 3

You use Azure Machine Learning Studio to build a machine learning experiment.
You need to divide data into two distinct datasets.
Which module should you use?

A.Split Data
B.Load Trained Model
C.Assign Data to Clusters
D.Group Data into Bins

Answer:
Explanation:
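The Split Data module divides a dataset into two distinct sets, for example a training set and a test set, which is what the question asks for. By contrast, the Group Data into Bins module supports multiple options for binning data: you can customize how the bin edges are set and how values are apportioned into the bins, but it does not split the dataset.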
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/group-data-into-bins

Question No : 4

You are building a recurrent neural network to perform a binary classification. You review the training loss, validation loss, training accuracy, and validation accuracy for each training epoch.
You need to analyze model performance.
Which observation indicates that the classification model is overfitted?

A.The training loss stays constant and the validation loss stays at a constant value close to the training loss value when training the model.
B.The training loss increases while the validation loss decreases when training the model.
C.The training loss decreases while the validation loss increases when training the model.
D.The training loss stays constant and the validation loss decreases when training the model.

Answer:
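
As background, overfitting typically shows up as a training loss that keeps falling while the validation loss starts to rise. The sketch below flags such epochs; `overfitting_epochs` is a hypothetical helper and the loss values are invented for illustration, not taken from any real training run.

```python
def overfitting_epochs(train_loss, val_loss):
    """Return 1-based epochs where training loss fell while validation loss rose."""
    flagged = []
    for epoch in range(1, len(train_loss)):
        train_improved = train_loss[epoch] < train_loss[epoch - 1]
        val_worsened = val_loss[epoch] > val_loss[epoch - 1]
        if train_improved and val_worsened:
            flagged.append(epoch + 1)
    return flagged


# Hypothetical per-epoch losses, invented for illustration.
train_loss = [0.90, 0.60, 0.45, 0.30, 0.20]
val_loss = [0.95, 0.70, 0.65, 0.72, 0.80]
print(overfitting_epochs(train_loss, val_loss))  # [4, 5]
```

A single flagged epoch can be noise; a sustained run of flagged epochs is the more reliable signal.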

Question No : 5

You are evaluating a completed binary classification machine learning model.
You need to use the precision as the evaluation metric.
Which visualization should you use?

A.scatter plot
B.coefficient of determination
C.Receiver Operating Characteristic (ROC) curve
D.Gradient descent

Answer:
Explanation:
Receiver operating characteristic (or ROC) is a plot of the correctly classified labels vs. the incorrectly classified labels for a particular model.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-understand-automated-ml#confusion-matrix
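
An ROC curve can be traced by sweeping a decision threshold over the model's scores and recording the true positive rate against the false positive rate at each step. This is a minimal plain-Python sketch; `roc_points` is a hypothetical helper, and the labels and scores are made up for illustration.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical true labels and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

The curve always starts at (0, 0) and ends at (1, 1); the closer it bends toward the top-left corner, the better the model separates the two classes.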

Question No : 6

You are creating a new experiment in Azure Machine Learning Studio. You have a small dataset that has missing values in many columns. The data does not require the application of predictors for each column. You plan to use the Clean Missing Data module to handle the missing data.
You need to select a data cleaning method.
Which method should you use?

A.Synthetic Minority
B.Replace using Probabilistic PCA
C.Replace using MICE
D.Normalization

Answer:

Question No : 7

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are creating a model to predict the price of a student’s artwork depending on the following variables: the student’s length of education, degree type, and art form.
You start by creating a linear regression model.
You need to evaluate the linear regression model.
Solution: Use the following metrics: Mean Absolute Error, Root Mean Absolute Error, Relative Absolute Error, Accuracy, Precision, Recall, F1 score, and AUC.
Does the solution meet the goal?

A.Yes
B.No

Answer:
Explanation:
Accuracy, Precision, Recall, F1 score, and AUC are metrics for evaluating classification models.
Note: Mean Absolute Error, Root Mean Squared Error (which the solution presumably means by "Root Mean Absolute Error"), and Relative Absolute Error are appropriate metrics for a linear regression model.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/evaluate-model
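
The regression metrics named above are all computed from the difference between actual and predicted values, which is why they suit a price prediction and the classification metrics do not. A minimal sketch follows, assuming the standard definitions of MAE, RMSE, and RAE; `regression_metrics` is a hypothetical helper and the prices are invented.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MAE, RMSE, and RAE for paired actual/predicted values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    mae = sum(abs_errors) / n
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    # RAE compares total error with that of always predicting the mean.
    rae = sum(abs_errors) / sum(abs(a - mean_actual) for a in actual)
    return {"MAE": mae, "RMSE": rmse, "RAE": rae}


# Hypothetical artwork prices and model predictions.
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 230.0]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

An RAE below 1 means the model beats the naive baseline of predicting the mean price for every artwork.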

Question No : 8

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You are creating a new experiment in Azure Machine Learning Studio.
One class has a much smaller number of observations than the other classes in the training set.
You need to select an appropriate data sampling strategy to compensate for the class imbalance.
Solution: You use the Stratified split for the sampling mode.
Does the solution meet the goal?

A.Yes
B.No

Answer:
Explanation:
Instead use the Synthetic Minority Oversampling Technique (SMOTE) sampling mode.
Note: SMOTE is used to increase the number of underrepresented cases in a dataset used for machine learning. SMOTE is a better way of increasing the number of rare cases than simply duplicating existing cases.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/smote
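
The core idea of SMOTE, creating new minority cases by interpolating between an existing case and one of its nearest minority neighbours, can be sketched in a few lines. This is a simplified illustration and not the Studio module's implementation; `smote_sketch`, its parameters, and the sample points are all assumptions.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating toward near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class feature vectors.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sketch(minority, n_new=4)
print(new_points)
```

Because each synthetic point lies on a segment between two real minority cases, the new cases stay inside the region the minority class already occupies instead of being exact duplicates.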

Question No : 9

HOTSPOT
You are analyzing the asymmetry in a statistical distribution.
The following image contains two density curves that show the probability distribution of two datasets.

Use the drop-down menus to select the answer choice that answers each question based on the information presented in the graphic. NOTE: Each correct selection is worth one point.

Answer:

Explanation:
Box 1: Positive skew
Positive skew values mean the distribution is skewed to the right.
Box 2: Negative skew
Negative skewness values mean the distribution is skewed to the left.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/compute-elementary-statistics
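
The sign convention can be verified numerically. The sketch below uses the population (moment-based) skewness, which is one common definition and may differ by a small correction factor from what a given tool reports; `skewness` is a hypothetical helper and the data are invented.

```python
def skewness(values):
    """Population skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# A long right tail gives positive skew; mirroring the data flips the sign.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]
left_tailed = [-v for v in right_tailed]
print(skewness(right_tailed) > 0)  # True
print(skewness(left_tailed) < 0)   # True
```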

Question No : 10

You create a binary classification model by using Azure Machine Learning Studio.
You must tune hyperparameters by performing a parameter sweep of the model.
The parameter sweep must meet the following requirements:
iterate all possible combinations of hyperparameters
minimize computing resources required to perform the sweep
You need to perform a parameter sweep of the model.
Which parameter sweep mode should you use?

A.Random sweep
B.Sweep clustering
C.Entire grid
D.Random grid
E.Random seed

Answer:
Explanation:
Maximum number of runs on random grid: This option also controls the number of iterations over a random sampling of parameter values, but the values are not generated randomly from the specified range; instead, a matrix is created of all possible combinations of parameter values and a random sampling is taken over the matrix. This method is more efficient and less prone to regional oversampling or undersampling.
If you are training a model that supports an integrated parameter sweep, you can also set a range of seed values to use and iterate over the random seeds as well. This is optional, but can be useful for avoiding bias introduced by seed selection.
Incorrect Answers:
B: If you are building a clustering model, use Sweep Clustering to automatically determine the optimum number of clusters and other parameters.
C: Entire grid: When you select this option, the module loops over a grid predefined by the system, to try different combinations and identify the best learner. This option is useful for cases where you don't know what the best parameter settings might be and want to try all possible combinations of values.
A: If you choose a random sweep, you can specify how many times the model should be trained, using a random combination of parameter values.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/tune-model-hyperparameters
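
The difference between an entire grid and a random grid can be sketched as follows: both start from the matrix of all combinations, but the random grid trains only on a sample drawn from it. The parameter names, value ranges, and run limit below are hypothetical and chosen only for illustration.

```python
import itertools
import random

# Hypothetical hyperparameter ranges.
param_ranges = {
    "learning_rate": [0.01, 0.1, 1.0],
    "num_iterations": [50, 100, 200],
    "hidden_nodes": [10, 50],
}

# Entire grid: every combination of the listed values.
names = list(param_ranges)
entire_grid = [
    dict(zip(names, combo)) for combo in itertools.product(*param_ranges.values())
]

# Random grid: a sample without replacement from that same matrix.
rng = random.Random(42)
max_runs = 5  # assumed budget, analogous to a maximum number of runs
random_grid = rng.sample(entire_grid, max_runs)

print(len(entire_grid), "combinations in the entire grid")
print(len(random_grid), "combinations trained in the random grid")
```

With 18 combinations in the full matrix, the random grid here trains 5 models instead of 18, which is where the saving in compute comes from.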

Question No : 11

HOTSPOT
You are evaluating a Python NumPy array that contains six data points defined as follows:
data = [10, 20, 30, 40, 50, 60]
You must generate the following output by using the k-fold algorithm implementation in the Python Scikit-learn machine learning library:
train: [10 40 50 60], test: [20 30]
train: [20 30 40 60], test: [10 50]
train: [10 20 30 50], test: [40 60]
You need to implement a cross-validation to generate the output.
How should you complete the code segment? To answer, select the appropriate code segment in the dialog box in the answer area. NOTE: Each correct selection is worth one point.

Answer:

Explanation:
Box 1: k-fold
Box 2: 3
K-Folds cross-validator provides train/test indices to split data in train/test sets. Split dataset into k consecutive folds (without shuffling by default).
The parameter n_splits (int, default=5 as of scikit-learn 0.22; default=3 in earlier versions) is the number of folds. Must be at least 2.
Box 3: data
Example:
>>> import numpy as np
>>> from sklearn.model_selection import KFold
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4])
>>> kf = KFold(n_splits=2)
>>> kf.get_n_splits(X)
2
>>> print(kf)
KFold(n_splits=2, random_state=None, shuffle=False)
>>> for train_index, test_index in kf.split(X):
...     print("TRAIN:", train_index, "TEST:", test_index)
...     X_train, X_test = X[train_index], X[test_index]
...     y_train, y_test = y[train_index], y[test_index]
TRAIN: [2 3] TEST: [0 1]
TRAIN: [0 1] TEST: [2 3]
Reference: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html

Question No : 12

You use the Two-Class Neural Network module in Azure Machine Learning Studio to build a binary classification model. You use the Tune Model Hyperparameters module to tune accuracy for the model.
You need to select the hyperparameters that should be tuned using the Tune Model Hyperparameters module.
Which two hyperparameters should you use? Each correct answer presents part of the solution. NOTE: Each correct selection is worth one point.

A.Number of hidden nodes
B.Learning Rate
C.The type of the normalizer
D.Number of learning iterations
E.Hidden layer specification

Answer:
Explanation:
D: For Number of learning iterations, specify the maximum number of times the algorithm should process the training cases.
E: For Hidden layer specification, select the type of network architecture to create.
Between the input and output layers you can insert multiple hidden layers. Most predictive tasks can be accomplished easily with only one or a few hidden layers.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/two-class-neural-network

Question No : 13

You create a binary classification model.
You need to evaluate the model performance.
Which two metrics can you use? Each correct answer presents a complete solution. NOTE: Each correct selection is worth one point.

A.relative absolute error
B.precision
C.accuracy
D.mean absolute error
E.coefficient of determination

Answer:
Explanation:
The evaluation metrics available for binary classification models are: Accuracy, Precision, Recall, F1 Score, and AUC.
Note: A very natural question is: ‘Out of the individuals whom the model predicted to be positive (TP+FP), how many were classified correctly (TP)?’
This question can be answered by looking at the Precision of the model, which is the proportion of positives that are classified correctly.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio/evaluate-model-performance
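
The binary classification metrics listed above all derive from the four confusion-matrix counts. The following is a minimal sketch using the standard definitions; `binary_metrics` is a hypothetical helper and the labels are invented for illustration.

```python
def binary_metrics(actual, predicted):
    """Compute accuracy, precision, recall, and F1 from 0/1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Precision: of everything predicted positive, how much really was positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical true labels and predictions.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
results = binary_metrics(actual, predicted)
print(results)
```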

Question No : 14

You are building a machine learning model for translating English language textual content into French language textual content.
You need to build and train the machine learning model to learn the sequence of the textual content.
Which type of neural network should you use?

A.Multilayer Perceptrons (MLPs)
B.Convolutional Neural Networks (CNNs)
C.Recurrent Neural Networks (RNNs)
D.Generative Adversarial Networks (GANs)

Answer:
Explanation:
To translate a corpus of English text to French, we need to build a recurrent neural network (RNN).
Note: RNNs are designed to take sequences of text as inputs or return sequences of text as outputs, or both.
They’re called recurrent because the network’s hidden layers have a loop in which the output and cell state from each time step become inputs at the next time step. This recurrence serves as a form of memory. It allows contextual information to flow through the network so that relevant outputs from previous time steps can be applied to network operations at the current time step.
Reference: https://towardsdatascience.com/language-translation-with-rnns-d84d43b40571
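
The recurrence described above, where the hidden state from one time step feeds into the next, can be sketched with a plain (Elman-style) RNN step. This is a simplification: it has no separate cell state, unlike the LSTM-style networks commonly used for translation, and the weights and inputs are toy values chosen for illustration.

```python
import math


def rnn_step(x, h_prev, w_x, w_h, b):
    """One recurrent step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    h_new = []
    for i in range(len(h_prev)):
        total = b[i]
        total += sum(w_x[i][j] * x[j] for j in range(len(x)))
        total += sum(w_h[i][j] * h_prev[j] for j in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new


# Toy weights and a three-step input sequence (all values hypothetical).
w_x = [[0.5, -0.2], [0.1, 0.3]]
w_h = [[0.4, 0.0], [-0.3, 0.2]]
b = [0.0, 0.1]
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

h = [0.0, 0.0]  # initial hidden state
states = []
for x in sequence:
    h = rnn_step(x, h, w_x, w_h, b)  # the previous state is fed back in
    states.append(h)
print(states)
```

Each entry in `states` depends on every earlier input through `h_prev`, which is the "memory" that lets the network account for word order.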

Question No : 15

HOTSPOT
You are using C-Support Vector classification to do a multi-class classification with an unbalanced training dataset.
The C-Support Vector classification uses the Python code shown below:

You need to evaluate the C-Support Vector classification code.
Which evaluation statement should you use? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.

Answer:

Explanation:
Box 1: Automatically adjust weights inversely proportional to class frequencies in the input data
The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).
Box 2: Penalty parameter
Parameter: C: float, optional (default=1.0)
Penalty parameter C of the error term.
Reference: https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
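
The "balanced" weighting formula quoted above, n_samples / (n_classes * np.bincount(y)), can be reproduced in plain Python to see how rare classes receive larger weights. `balanced_class_weights` is a hypothetical helper written for illustration, not part of scikit-learn, and the label distribution is invented.

```python
from collections import Counter


def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Hypothetical unbalanced labels: six of class 0, three of class 1, one of class 2.
y = [0] * 6 + [1] * 3 + [2] * 1
weights = balanced_class_weights(y)
for label in sorted(weights):
    print(label, round(weights[label], 3))
```

The rarest class ends up with the largest weight, so misclassifying it is penalized more heavily during training.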

