Machine Learning Interview Questions

Machine learning interview questions are an integral part of the data science interview and of the path to becoming a data scientist, machine learning engineer, or data engineer. If you have attended a Machine Learning interview recently, please paste those interview questions in the comments section and we’ll answer them as soon as possible. You can also comment below with any question you think you might face in your Machine Learning interview. You may also go through this recording of Machine Learning Interview Questions and Answers, where our instructor explains the topics in detail with examples that will help you understand the concepts better.

1. Explain false negative, false positive, true negative, and true positive with a simple example?

Let’s consider a scenario of a fire emergency:

• True Positive: The alarm goes off and there is a fire.
Fire is the positive class, and the system’s prediction is true.
• False Positive: The alarm goes off, but there is no fire.
The system predicted fire (positive), which is a wrong prediction, hence the prediction is false.
• False Negative: The alarm does not go off, but there is a fire.
The system predicted no fire (negative), which was false since there was a fire.
• True Negative: The alarm does not go off and there is no fire.
The prediction of no fire (negative) was true.
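To make the four outcomes concrete, here is a minimal sketch using scikit-learn’s confusion_matrix; the fire/alarm labels are invented purely for illustration.

```python
from sklearn.metrics import confusion_matrix

# 1 = fire, 0 = no fire (hypothetical ground truth and alarm predictions)
actual    = [1, 0, 1, 0, 1, 0, 0, 1]
predicted = [1, 1, 0, 0, 1, 0, 0, 1]

# For labels ordered [0, 1], confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(actual, predicted).ravel()
print(f"TP={tp}, FP={fp}, FN={fn}, TN={tn}")  # TP=3, FP=1, FN=1, TN=3
```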

2. What’s the trade-off between bias and variance?

Bias is error due to erroneous or overly simplistic assumptions in the learning algorithm you’re using. This can lead the model to underfit your data, making it hard for it to have high predictive accuracy and for you to generalize your knowledge from the training set to the test set.
Variance is error due to too much complexity in the learning algorithm you’re using. This leads to the algorithm being highly sensitive to small variations in your training data, which can cause your model to overfit the data. You’ll be carrying too much noise from your training data for your model to be very useful for your test data.
The trade-off: increasing model complexity typically lowers bias but raises variance, so the goal is a model complex enough to capture the underlying signal but simple enough not to fit the noise.
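A rough sketch of the trade-off, assuming a noisy sine curve as synthetic data and polynomial regressions of arbitrary degree (degree 1 tends to underfit / high bias, degree 15 tends to overfit / high variance):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 100)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 100)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):  # high bias -> balanced -> high variance
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

Typically the degree-1 model shows high error on both sets (underfitting), while the degree-15 model shows very low training error but a noticeably worse test error (overfitting).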

3. How would you explain Machine Learning to a school-going kid?

• Suppose your friend invites you to his party where you meet total strangers. Since you have no idea about them, you will mentally classify them on the basis of gender, age group, clothing, etc.
• In this scenario, the strangers represent unlabeled data and the process of classifying unlabeled data points is nothing but unsupervised learning.
• Since you didn’t use any prior knowledge about people and classified them on-the-go, this becomes an unsupervised learning problem.

4. What is the difference between supervised and unsupervised machine learning?

Supervised learning requires labeled training data. For example, in order to do classification (a supervised learning task), you’ll need to first label the data you’ll use to train the model to classify data into your labeled groups. Unsupervised learning, in contrast, does not require explicitly labeled data.
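A minimal sketch of the difference, assuming scikit-learn and the Iris dataset (the choice of LogisticRegression and KMeans is purely illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised: the model is fit on the features X *and* the labels y.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("supervised predictions:", clf.predict(X[:5]))

# Unsupervised: the model never sees y; it groups the points on its own.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster assignments:  ", km.labels_[:5])
```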

5. What do you understand by selection bias?

• It is a statistical error that causes a bias in the sampling portion of an experiment.
• The error causes one sampling group to be selected more often than other groups included in the experiment.
• Selection bias may produce inaccurate conclusions if it is not identified.
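A small simulation can make this concrete; the groups, means, and sampling proportions below are entirely hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: two groups with different outcome means.
group_a = rng.normal(50, 5, 10_000)   # e.g. people who are easy to reach
group_b = rng.normal(70, 5, 10_000)   # e.g. people who are hard to reach

population_mean = np.mean(np.concatenate([group_a, group_b]))

# Biased sample: group A ends up selected far more often than group B.
biased_sample = np.concatenate([rng.choice(group_a, 900),
                                rng.choice(group_b, 100)])

print(f"population mean:    {population_mean:.1f}")      # ~60
print(f"biased sample mean: {biased_sample.mean():.1f}")  # ~52, pulled toward group A
```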

6. What do you understand by Precision and Recall?

Let me explain this with an analogy:
• Imagine that your girlfriend has given you a birthday surprise every year for the last 10 years. One day, your girlfriend asks you: ‘Sweetie, do you remember all the birthday surprises from me?’
• To stay on good terms with your girlfriend, you need to recall all 10 events from your memory. Therefore, recall is the ratio of the number of events you can correctly recall to the total number of events that actually happened.
• If you can recall all 10 events correctly, your recall ratio is 1.0 (100%); if you can recall 7 events correctly, your recall ratio is 0.7 (70%).
• Precision, in contrast, is the ratio of the number of events you correctly recall to the total number of events you recall (correct recalls plus wrong guesses). If 15 events come to mind and only 10 of them are correct, your precision is 10/15, i.e. about 0.67 (67%).
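A minimal sketch of how both metrics are computed with scikit-learn; the labels below are invented for illustration:

```python
from sklearn.metrics import precision_score, recall_score

# 1 = relevant event, 0 = not relevant (hypothetical labels)
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

# precision = TP / (TP + FP), recall = TP / (TP + FN)
print("precision:", precision_score(y_true, y_pred))  # 3 / (3 + 1) = 0.75
print("recall:   ", recall_score(y_true, y_pred))     # 3 / (3 + 2) = 0.6
```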

7. How is KNN different from k-means clustering?

K-Nearest Neighbors is a supervised classification algorithm, while k-means clustering is an unsupervised clustering algorithm. While the mechanisms may seem similar at first, what this really means is that in order for K-Nearest Neighbors to work, you need labeled data to classify an unlabeled point into (thus the nearest neighbor part). K-means clustering requires only a set of unlabeled points and a pre-chosen number of clusters k: the algorithm takes the unlabeled points and gradually learns how to cluster them into groups by repeatedly assigning each point to the nearest centroid and recomputing each centroid as the mean of the points assigned to it.
The critical difference here is that KNN needs labeled points and is thus supervised learning, while k-means doesn’t, and is thus unsupervised learning.
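A short sketch of the two algorithms side by side, assuming scikit-learn and the Iris dataset (the query point and parameter values are arbitrary):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)
query = [[5.0, 3.5, 1.4, 0.2]]  # an arbitrary new observation

# KNN (supervised): needs the labels y to classify a new point.
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print("KNN class prediction:", knn.predict(query))

# k-means (unsupervised): needs only X and the number of clusters k.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("k-means cluster index:", km.predict(query))
```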

8. Explain how a ROC curve works?

The ROC curve is a graphical representation of the contrast between the true positive rate and the false positive rate at various thresholds. It’s often used as a proxy for the trade-off between the sensitivity of the model (true positive rate) and the fall-out, i.e. the probability that it will trigger a false alarm (false positive rate).

9. What is ROC curve and what does it represent?

Receiver Operating Characteristic curve (or ROC curve) is a fundamental tool for diagnostic test evaluation and is a plot of the true positive rate (Sensitivity) against the false positive rate (1 − Specificity) for the different possible cut-off points of a diagnostic test.
• It shows the tradeoff between sensitivity and specificity (any increase in sensitivity will be accompanied by a decrease in specificity).
• The closer the curve follows the left-hand border and then the top border of the ROC space, the more accurate the test.
• The closer the curve comes to the 45-degree diagonal of the ROC space, the less accurate the test.
• The slope of the tangent line at a cutpoint gives the likelihood ratio (LR) for that value of the test.
• The area under the curve is a measure of test accuracy.
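A minimal sketch of how the ROC curve and its AUC can be computed with scikit-learn, assuming a synthetic binary classification problem and a logistic regression scorer:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, roc_auc_score

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scores (predicted probabilities) for the positive class
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]

# roc_curve sweeps the cut-off and returns FPR (1 - specificity) and TPR (sensitivity)
fpr, tpr, thresholds = roc_curve(y_test, scores)
print("AUC:", round(roc_auc_score(y_test, scores), 3))
```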
