
Which of the following statements a), b) or c) is false?


A) We can use k-means clustering via scikit-learn's KMeans estimator (from the sklearn.cluster module) to place each sample in a dataset into a cluster. The KMeans estimator hides from you the algorithm's complex mathematical details, making it straightforward to use.
B) The following code creates a KMeans object:
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=3, random_state=11)
C) The keyword argument n_clusters specifies the k-means clustering algorithm's hyperparameter k (in this case, 3), which KMeans requires to calculate the clusters and label each sample. The default value for n_clusters is 8.
D) All of the above statements are true.
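For reference, a minimal sketch (not part of the question) of creating and fitting the KMeans object described in options A and B above, assuming scikit-learn and its bundled Iris dataset:

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# n_clusters is the k-means hyperparameter k (it defaults to 8 if omitted);
# random_state makes the centroid initialization reproducible
kmeans = KMeans(n_clusters=3, random_state=11)
kmeans.fit(load_iris().data)   # assign each of the 150 Iris samples to one of 3 clusters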


Which of the following statements a), b) or c) is false?


A) In big data, samples can have hundreds, thousands or even millions of features.
B) To visualize a dataset with many features (that is, many dimensions), you must first reduce the data to two or three dimensions. This requires a supervised machine learning technique called dimensionality reduction.
C) When you graph the resulting data after dimensionality reduction, you might see patterns in the data that will help you choose the most appropriate machine learning algorithms to use. For example, if the visualization contains clusters of points, it might indicate that there are distinct classes of information within the dataset.
D) All of the above statements are true.
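For reference, a minimal sketch of reducing a many-featured dataset to two dimensions for plotting, as described above. PCA is used here only as one example of a dimensionality-reduction estimator; the question itself does not name one:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

iris = load_iris()                       # 150 samples, 4 features each

pca = PCA(n_components=2, random_state=11)
reduced = pca.fit_transform(iris.data)   # shape (150, 2): now plottable

# clusters of points in the scatter plot can hint at distinct classes in the data
plt.scatter(reduced[:, 0], reduced[:, 1], c=iris.target)
plt.show()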


Which of the following statements a), b) or c) is false?


A) Because the Iris dataset is labeled, we can look at its target array values to get a sense of how well the k-means algorithm clustered the samples for the three Iris species.
B) In the Iris dataset, the first 50 samples are Iris setosa, the next 50 are Iris versicolor, and the last 50 are Iris virginica.
C) If the KMeans estimator chose the Iris dataset clusters perfectly, then each group of 50 elements in the estimator's labels_ array should have mostly the same label.
D) All of the above statements are true.
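For reference, a sketch (assuming scikit-learn's bundled Iris dataset) of the comparison described in options B and C above, printing the cluster labels assigned to each group of 50 samples:

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()   # samples ordered: 50 setosa, 50 versicolor, 50 virginica

kmeans = KMeans(n_clusters=3, random_state=11)
kmeans.fit(iris.data)

# each slice corresponds to one species; a good clustering shows mostly one
# label per slice (the label numbers themselves are arbitrary)
print(kmeans.labels_[0:50])
print(kmeans.labels_[50:100])
print(kmeans.labels_[100:150])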


Which of the following statements about the k-means clustering algorithm is false?


A) Each cluster of samples is grouped around a centroid, the cluster's center point.
B) Initially, the algorithm chooses k centroids at random from the dataset's samples. Then the remaining samples are placed in the cluster whose centroid is the closest.
C) The centroids are iteratively recalculated and the samples re-assigned to clusters until, for all clusters, the distances from a given centroid to the samples in its cluster are maximized.
D) The algorithm's results are a one-dimensional array of labels indicating the cluster to which each sample belongs, and a two-dimensional array of centroids representing the center of each cluster.


Which of the following statements is false?


A) The two main types of machine learning are supervised machine learning, which works with unlabeled data, and unsupervised machine learning, which works with labeled data.
B) If you're developing a computer vision application to recognize dogs and cats, you'll train your model on lots of dog photos labeled "dog" and cat photos labeled "cat." If your model is effective, when you put it to work processing unlabeled photos it will recognize dogs and cats it has never seen before. The more photos you train with, the greater the chance that your model will accurately predict which new photos are dogs and which are cats.
C) In this era of big data and massive, economical computer power, you should be able to build some pretty accurate machine learning models.


Which of the following statements a), b) or c) is false?


A) Scikit-learn provides many metrics functions for evaluating how well estimators predict results and for comparing estimators to choose the best one(s) for your particular study.
B) Scikit-learn's metrics vary by estimator type.
C) Functions confusion_matrix and classification_report (from the module sklearn.metrics) are two of many metrics functions specifically for evaluating regression estimators.
D) All of the above statements are true.
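For reference, a minimal sketch showing how confusion_matrix and classification_report are typically called; the KNeighborsClassifier and Digits dataset used here are assumptions for illustration:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, classification_report

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=11)

knn = KNeighborsClassifier()
knn.fit(X_train, y_train)            # train on the labeled training samples
predicted = knn.predict(X_test)      # predict classes for the held-out samples

print(confusion_matrix(y_test, predicted))
print(classification_report(y_test, predicted))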


Which of the following statements a), b) or c) is false?


A) We can make machines learn.
B) The "secret sauce" of machine learning is data-and lots of it.
C) With machine learning, rather than programming expertise into our applications, we program them to learn from data.
D) All of the above statements are true.


Which of the following statements a), b) or c) is false?


A) The amount of data that's available today is already enormous and continues to grow exponentially; the data produced in the world in the last few years alone equals the amount produced up to that point since the dawn of civilization.
B) People used to say, "I'm drowning in data and I don't know what to do with it." With machine learning, we now say, "Flood me with big data so I can use machine-learning technology to extract insights and make predictions from it."
C) The big data phenomenon is occurring at a time when computing power is exploding and computer memory and secondary storage are exploding in capacity while costs dramatically decline. This enables us to think differently about solution approaches.
D) All of the above statements are true.


Which of the following statements a), b) or c) is false?


A) It's difficult to know in advance which machine learning model(s) will perform best for a given dataset, especially when they hide the details of how they operate from their users.
B) Even though the KNeighborsClassifier predicts digit images with a high degree of accuracy, it's possible that other scikit-learn estimators are even more accurate.
C) Scikit-learn provides many models with which you can quickly train and test your data. This encourages you to run multiple models to determine which is the best for a particular machine learning study.
D) All of the above statements are true.
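For reference, a sketch of running several estimators on the same data to compare their accuracy, as option C suggests; the particular estimators chosen here are assumptions for illustration:

from sklearn.datasets import load_digits
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

digits = load_digits()
kfold = KFold(n_splits=10, random_state=11, shuffle=True)

estimators = {'KNeighborsClassifier': KNeighborsClassifier(),
              'SVC': SVC(),
              'GaussianNB': GaussianNB()}

# run the same 10-fold cross-validation with each estimator
for name, estimator in estimators.items():
    scores = cross_val_score(estimator=estimator, X=digits.data,
                             y=digits.target, cv=kfold)
    print(f'{name}: mean accuracy={scores.mean():.2%}')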


Which of the following statements a), b) or c) is false?


A) Another common metric for regression models is the mean squared error, which
- calculates the difference between each expected and predicted value (this is called the error),
- squares each difference and
- calculates the average of the squared values.
B) To calculate a regression estimator's mean squared error, call function mean_squared_error (from module sklearn.metrics) with the arrays representing the expected and predicted results, as in:
In [46]: metrics.mean_squared_error(expected, predicted)
Out[46]: 0.5350149774449119
C) When comparing estimators with the mean squared error metric, the one with the value closest to 1 best fits your data.
D) All of the above statements are true.
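For reference, a small worked example of the mean squared error calculation described in option A; the numbers are made up for illustration:

from sklearn.metrics import mean_squared_error

expected  = [3.0, 5.0, 2.5, 7.0]    # the "correct" target values
predicted = [2.5, 5.0, 4.0, 8.0]    # a model's predictions

# errors: 0.5, 0.0, -1.5, -1.0; squared: 0.25, 0.0, 2.25, 1.0; average: 0.875
print(mean_squared_error(expected, predicted))   # 0.875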


Which of the following statements is false?


A) Regression models predict a continuous output, such as the predicted temperature output in a weather time-series analysis.
B) The LinearRegression estimator can perform simple linear regression.
C) The LinearRegression estimator also can perform multiple linear regression.
D) The LinearRegression estimator, by default, uses all the nonnumerical features in a dataset to make more sophisticated predictions than you can with a single-feature simple linear regression.
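For reference, a minimal sketch of simple linear regression with the LinearRegression estimator described in options B and C; the single-feature data here is made up for illustration:

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[2010], [2011], [2012], [2013], [2014]])   # one feature per sample
y = np.array([14.1, 14.3, 14.5, 14.8, 15.0])             # continuous targets

lr = LinearRegression()
lr.fit(X, y)

print(lr.coef_, lr.intercept_)          # slope and intercept learned from the data
print(lr.predict(np.array([[2015]])))   # predicted continuous output for a new sample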


Which of the following statements is false?


A) In machine learning, a model implements a machine-learning algorithm. In scikit-learn, models are called estimators.
B) There are two parameter types in machine learning: those the estimator calculates as it learns from the data you provide, and those you specify in advance when you create the scikit-learn estimator object that represents the model.
C) The machine-learning parameters the estimator calculates as it learns from the data are called hyperparameters; in the k-nearest neighbors algorithm, k is a hyperparameter.
D) For simplicity, we use scikit-learn's default hyperparameter values. In real-world machine-learning studies, you'll want to experiment with different values of k to produce the best possible models for your studies; this process is called hyperparameter tuning.
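For reference, a sketch of simple hyperparameter tuning as described in option D, trying a few values of k for a KNeighborsClassifier; the Digits dataset and cross-validation setup are assumptions for illustration:

from sklearn.datasets import load_digits
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()
kfold = KFold(n_splits=10, random_state=11, shuffle=True)

# k (n_neighbors) is specified in advance when the estimator object is created
for k in [1, 3, 5]:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(estimator=knn, X=digits.data,
                             y=digits.target, cv=kfold)
    print(f'k={k}: mean accuracy={scores.mean():.2%}')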


Which of the following statements is false?


A) By default, train_test_split reserves 75% of the data for training and 25% for testing.
B) To specify different splits, you can set the sizes of the testing and training sets with the train_test_split function's keyword arguments test_size and train_size. Use floating-point values from 0.0 through 100.0 to specify the percentages of the data to use for each.
C) You can use integer values to set the precise numbers of samples.
D) If you specify one of the keyword arguments test_size and train_size, the other is inferred. For example, the statement
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=11, test_size=0.20)
specifies that 20% of the data is for testing, so train_size is inferred to be 0.80.
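For reference, a runnable sketch of the 80/20 split described in option D, assuming scikit-learn's bundled Digits dataset:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()   # 1797 samples

# reserve 20% of the samples for testing; train_size is inferred to be 0.80
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=11, test_size=0.20)

print(X_train.shape, X_test.shape)   # roughly an 80/20 split of the samples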


Which of the following statements a), b) or c) is false?


A) We can use a TSNE estimator (from the sklearn.manifold module) to perform dimensionality reduction. This estimator analyzes a dataset's features and reduces them to the specified number of dimensions.
B) The following code creates a TSNE object for reducing a dataset's features to two dimensions, as specified by the keyword argument n_components:
In [3]: from sklearn.manifold import TSNE
In [4]: tsne = TSNE(n_components=2, random_state=11)
C) When using TSNE on the Digits dataset bundled with scikit-learn, the TSNE estimator's random_state keyword argument in Part (b) ensures the reproducibility of the "render sequence" when we display the digit clusters, for example.
D) All of the above statements are true.
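For reference, a sketch of using the TSNE object from option B to reduce the Digits dataset to two dimensions and plot the result; the plotting code is an assumption for illustration:

from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

digits = load_digits()   # 1797 samples, 64 features each

tsne = TSNE(n_components=2, random_state=11)
reduced = tsne.fit_transform(digits.data)   # shape (1797, 2)

# color each point by its digit class to see whether the clusters separate
plt.scatter(reduced[:, 0], reduced[:, 1], c=digits.target, s=5)
plt.show()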


Which of the following statements a), b) or c) is false?


A) The simplest supervised machine-learning algorithm we use is k-means clustering.
B) In k-means clustering, each cluster's centroid is the cluster's center point.
C) You'll often run multiple clustering estimators to compare their ability to divide a dataset's samples effectively into clusters.
D) All of the above statements are true.


Which of the following statements a), b) or c) is false?


A) By default, a LinearRegression estimator uses all the features in the dataset's data array to perform a multiple linear regression.
B) An error occurs if any of the features passed to a LinearRegression estimator for training are categorical rather than numeric. If a dataset contains categorical data, you must exclude the categorical features from the training process.
C) A benefit of working with scikit-learn's bundled datasets is that they're already in the correct format for machine learning using scikit-learn's models.
D) All of the above statements are true.


Unsupervised machine learning uses ________ algorithms.


A) classification
B) clustering
C) regression
D) None of the above


Which of the following statements a), b) or c) is false?


A) "Toy" datasets, generally have a small number of samples with a limited number of features. In the world of big data, datasets commonly have millions and billions of samples, or even more.
B) There's an enormous number of free and open datasets available for data science studies. Libraries like scikit-learn bundle popular datasets for you to experiment with and provide mechanisms for loading datasets from various repositories (such as openml.org) .
C) Governments, businesses and other organizations worldwide offer datasets on a vast range of subjects.
D) All of the above statements are true.
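For reference, a sketch of loading a bundled "toy" dataset and fetching a larger one from openml.org, as option B describes; the 'mnist_784' dataset name is just one example and is not taken from the question:

from sklearn.datasets import load_iris, fetch_openml

iris = load_iris()            # bundled toy dataset: few samples, few features
print(iris.data.shape)        # (150, 4)

# downloads the dataset from the openml.org repository on first use
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
print(mnist.data.shape)       # (70000, 784)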


Which of the following statements a), b) or c) is false?


A) We train the KMeans estimator by calling the object's fit method; this performs the k-means algorithm.
B) As with the other estimators, the fit method returns the estimator object.
C) When the training completes, the KMeans object contains a labels_ array with values from 0 to n_clusters - 1 (in the Iris dataset example, 0-2), indicating the clusters to which the samples belong, and a cluster_centers_ array in which each row represents a cluster.
D) All of the above statements are true.
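For reference, a sketch (assuming the Iris dataset) showing the fit call and the two arrays described in options A and C:

from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()

kmeans = KMeans(n_clusters=3, random_state=11)
kmeans.fit(iris.data)                  # fit returns the KMeans object itself

print(kmeans.labels_[:10])             # cluster numbers 0-2 for the first 10 samples
print(kmeans.cluster_centers_.shape)   # (3, 4): one row per cluster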


Consider the following code and output:

In [57]: for k in range(1, 20, 2):
    ...:     kfold = KFold(n_splits=10, random_state=11, shuffle=True)
    ...:     knn = KNeighborsClassifier(n_neighbors=k)
    ...:     scores = cross_val_score(estimator=knn,
    ...:         X=digits.data, y=digits.target, cv=kfold)
    ...:     print(f'k={k:<2}; mean accuracy={scores.mean():.2%}; ' +
    ...:           f'standard deviation={scores.std():.2%}')
    ...:
k=1 ; mean accuracy=98.83%; standard deviation=0.58%
k=3 ; mean accuracy=98.78%; standard deviation=0.78%
k=5 ; mean accuracy=98.72%; standard deviation=0.75%
k=7 ; mean accuracy=98.44%; standard deviation=0.96%
k=9 ; mean accuracy=98.39%; standard deviation=0.80%
k=11; mean accuracy=98.39%; standard deviation=0.80%
k=13; mean accuracy=97.89%; standard deviation=0.89%
k=15; mean accuracy=97.89%; standard deviation=1.02%
k=17; mean accuracy=97.50%; standard deviation=1.00%
k=19; mean accuracy=97.66%; standard deviation=0.96%

Which of the following statements is false?


A) The loop creates KNeighborsClassifiers with odd k values from 1 through 19 and performs k-fold cross-validation on each.
B) The k value 7 in kNN produces the most accurate predictions for the Digits dataset.
C) The accuracy tends to decrease for higher k values.
D) Compute time grows with k, because k-NN needs to perform many more calculations to find the nearest neighbors.

