Interview questions for Machine Learning

In this Machine Learning interview questions blog, I have collected the questions that interviewers ask most frequently.

1. What do you understand by Machine learning?

Machine learning is a branch of Artificial Intelligence that automates data analysis so that computers can learn from experience and act on it without being explicitly programmed.
For example, robots can be programmed to perform tasks based on the data they collect from sensors. They learn their behaviour from that data and improve with experience.

2. What are the five popular algorithms of Machine Learning?

The five most popular algorithms that are used in machine learning are support vector machines, decision trees, neural networks, nearest neighbor and probabilistic networks.

3. Differentiate between inductive learning and deductive learning?

  • In inductive learning, the model learns from a set of observed instances and draws a generalized conclusion from them. In deductive learning, on the other hand, the model starts from an existing rule or conclusion and applies it to specific cases.
  • Inductive learning is the method of using observations to draw conclusions.
  •  Deductive learning is the method of using conclusions to form observations.
    For example, suppose we have to explain to a kid that playing with fire can cause burns. There are two ways to do this. We can show training examples of various fire accidents or images of burnt people and label them as “Hazardous”. The kid generalizes from these examples and does not play with fire. This is the inductive form of learning. The other way is to give the kid the general rule first, “fire causes burns”, and let the kid conclude that any particular flame is dangerous and should be avoided. This is the deductive form of learning.

4. How is KNN different from k-means clustering?

The first difference between the two is that KNN is a supervised classification algorithm, while k-means is an unsupervised clustering algorithm. For KNN to work, you need labelled data against which an unlabelled point can be classified, while for k-means clustering you only need a set of unlabelled points and the number of clusters, k.
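The contrast can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function names `knn_predict` and `kmeans`, the toy points, and the naive "first k points" initialisation are all assumptions made for the example.

```python
from collections import Counter
import math

def knn_predict(labelled_points, query, k=3):
    """Supervised: classify `query` by majority vote among its k nearest labelled points."""
    by_distance = sorted(labelled_points, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

def kmeans(points, k=2, iterations=10):
    """Unsupervised: group unlabelled points into k clusters (Lloyd's algorithm)."""
    centroids = list(points[:k])  # naive initialisation (assumption): first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# KNN needs labels; k-means works on the bare points.
labelled = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((8.0, 8.0), "B"), ((9.0, 9.5), "B")]
unlabelled = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5)]

prediction = knn_predict(labelled, (2.0, 1.5), k=3)
centroids, clusters = kmeans(unlabelled, k=2)
print(prediction)
print(centroids)
```

Note that `knn_predict` returns one of the labels it was given, whereas `kmeans` only returns group memberships; any meaning attached to a cluster has to come from the person reading the result.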

5. What is the difference between Data Mining and Machine Learning?

Data mining can be described as the process of extracting knowledge or interesting, previously unknown patterns from data. Machine learning algorithms are used during this process.
Machine learning is the study, design, and development of algorithms that give computers the ability to learn without being explicitly programmed.

6. Explain how a ROC curve works?

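The ROC (Receiver Operating Characteristic) curve is a graphical plot of the true positive rate against the false positive rate at various classification thresholds. Each threshold gives one point on the graph, and joining these points forms the ROC curve. The closer the curve gets to the top-left corner, the better the classifier separates the two classes.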
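To make the threshold sweep concrete, here is a small sketch in plain Python. The helper names `roc_points` and `auc` and the six example scores are made up for illustration; a library routine would normally be used instead.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    `labels` are binary (1 = positive, 0 = negative); a sample is predicted
    positive when its score is at or above the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]   # model confidence that each sample is positive
labels = [1, 1, 0, 1, 0, 0]               # ground truth

curve = roc_points(scores, labels)
area = auc(curve)
print(curve)
print(round(area, 3))  # 0.889 for this toy data
```

The area under the curve summarizes the whole plot in one number: 1.0 means perfect separation, and 0.5 is what random guessing would give.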

7. What is the meaning of Overfitting in Machine learning?

Overfitting occurs in machine learning when a statistical model describes random error or noise instead of the underlying relationship. It is usually observed when a model is excessively complex, that is, when it has too many parameters relative to the amount of training data. An overfitted model performs well on the training data but poorly on new, unseen data.

8. How would you explain Machine Learning to a school-going kid?

  • Suppose your friend invites you to his party, where you meet total strangers. Since you know nothing about them, you mentally group them on the basis of gender, age group, clothing, etc.
  • In this scenario, the strangers represent unlabeled data, and grouping unlabeled data points in this way is unsupervised learning.
  • It is unsupervised because you used no prior knowledge about the people and grouped them on the go.

9. Explain false negative, false positive, true negative and true positive with a simple example?

Let’s consider the scenario of a fire alarm:
  • True Positive: the alarm goes off and there is a fire.
    The system predicted fire (positive), and the prediction was correct (true).
  • False Positive: the alarm goes off, but there is no fire.
    The system predicted fire (positive), but the prediction was wrong (false).
  • False Negative: the alarm does not ring, but there is a fire.
    The system predicted no fire (negative), but the prediction was wrong (false).
  • True Negative: the alarm does not ring and there is no fire.
    The system predicted no fire (negative), and the prediction was correct (true).
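The four outcomes of the fire alarm scenario can be counted with a short sketch. The function name `confusion_counts` and the five sample events are illustrative assumptions.

```python
def confusion_counts(actual, predicted):
    """Count the four outcomes for boolean lists (True = fire / alarm rang)."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for fire, alarm in zip(actual, predicted):
        if alarm and fire:
            counts["TP"] += 1      # alarm rang, there was a fire
        elif alarm and not fire:
            counts["FP"] += 1      # alarm rang, no fire
        elif not alarm and fire:
            counts["FN"] += 1      # alarm silent, there was a fire
        else:
            counts["TN"] += 1      # alarm silent, no fire
    return counts

fire  = [True, False, True, False, False]   # what actually happened
alarm = [True, True, False, False, False]   # what the system predicted

counts = confusion_counts(fire, alarm)
print(counts)  # {'TP': 1, 'FP': 1, 'FN': 1, 'TN': 2}
```

In the name of each outcome, the second word (positive or negative) is what the system predicted, and the first word (true or false) says whether that prediction matched reality.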

10. What is the method to avoid overfitting?

Overfitting is most likely when a model tries to learn from a small dataset, so using a large amount of data helps to avoid it. But if we have a small dataset and are forced to build a model on it, we can use a technique known as cross-validation. In this method, the data is divided into a set of known data on which the model is trained (the training set) and a set of unseen data against which the model is tested (the validation set). The primary aim of cross-validation is to define a dataset to “test” the model during the training phase. If there is sufficient data, ‘Isotonic Regression’ can be used to calibrate the model's predictions, although it is itself prone to overfitting on small datasets.
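A minimal sketch of k-fold cross-validation in plain Python follows. Everything named here (`k_fold_splits`, `cross_validate`, the mean-predicting toy model, and the six data points) is an assumption chosen to keep the example short; folds are contiguous and unshuffled, which real data would usually not allow.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

def cross_validate(xs, ys, k, fit, predict, loss):
    """Average held-out loss over k folds; every sample is tested exactly once."""
    fold_losses = []
    for train, test in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_loss = sum(loss(predict(model, xs[i]), ys[i]) for i in test) / len(test)
        fold_losses.append(fold_loss)
    return sum(fold_losses) / len(fold_losses)

# Toy "model" (assumption): always predict the mean of the training targets.
def fit_mean(train_xs, train_ys):
    return sum(train_ys) / len(train_ys)

def predict_mean(model, x):
    return model

def squared_error(prediction, target):
    return (prediction - target) ** 2

xs = [0, 1, 2, 3, 4, 5]
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

score = cross_validate(xs, ys, k=3, fit=fit_mean, predict=predict_mean, loss=squared_error)
print(score)
```

Because each fold is scored only on data the model did not see while fitting, a large gap between training error and this cross-validated error is a practical sign of overfitting.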

11. How does Machine Learning differ from Deep Learning?

  • Machine learning is all about algorithms which are used to parse data, learn from that data, and then apply whatever they have learned to make informed decisions.
  • Deep learning is a subset of machine learning that uses multi-layer neural networks, inspired by the structure of the human brain, and is particularly useful in feature detection.

12. What is the difference between Gini Impurity and Entropy in a Decision Tree?

  • Gini Impurity and Entropy are the metrics used for deciding how to split a Decision Tree.
  • Gini Impurity is the probability of a random sample being classified incorrectly if you randomly pick a label according to the distribution in the branch.
  • Entropy measures the uncertainty (lack of information) in the labels of a branch. The Information Gain of a split is the difference between the entropy before the split and the weighted entropy after it. Choosing splits with a high gain reduces the uncertainty about the output label.
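Both metrics can be computed directly from the label counts in a branch. The sketch below uses the standard formulas, Gini = 1 - Σ p² and Entropy = -Σ p·log₂(p); the function names and the "yes"/"no" toy labels are illustrative assumptions.

```python
from collections import Counter
import math

def gini_impurity(labels):
    """Chance of mislabelling a random sample when labels are drawn from the branch distribution."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Shannon entropy of the label distribution, in bits."""
    n = len(labels)
    return sum(-(count / n) * math.log2(count / n) for count in Counter(labels).values())

mixed = ["yes", "yes", "no", "no"]     # evenly split branch: maximum uncertainty
pure = ["yes", "yes", "yes", "yes"]    # single-class branch: no uncertainty

print(gini_impurity(mixed), entropy(mixed))  # 0.5 1.0
print(gini_impurity(pure), entropy(pure))    # 0.0 0.0
```

Both measures are zero for a pure branch and largest for an evenly mixed one. For two classes, Gini peaks at 0.5 and entropy at 1.0, so they usually rank candidate splits similarly, and Gini avoids the logarithm.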

13. What is the difference between Entropy and Information Gain?

  • Entropy is an indicator of how messy (mixed) your data is. It generally decreases as you move closer to the leaf nodes, where the samples become purer.
  • The Information Gain is the decrease in entropy after a dataset is split on an attribute. At each node, the attribute with the highest Information Gain is chosen for the split.
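As a worked example, the gain of a split is the parent's entropy minus the size-weighted entropy of the children. The sketch below assumes a hypothetical four-sample node and two candidate splits; the function names are illustrative.

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy of the label distribution, in bits."""
    n = len(labels)
    return sum(-(count / n) * math.log2(count / n) for count in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of its child branches."""
    n = len(parent)
    weighted_child_entropy = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted_child_entropy

parent = ["yes", "yes", "no", "no"]

# Split A separates the classes perfectly; split B leaves both branches as mixed as the parent.
split_a = [["yes", "yes"], ["no", "no"]]
split_b = [["yes", "no"], ["yes", "no"]]

gain_a = information_gain(parent, split_a)
gain_b = information_gain(parent, split_b)
print(gain_a, gain_b)  # 1.0 0.0
```

A decision tree learner that uses this criterion would pick split A here, since it removes all of the uncertainty in the parent node.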

14. Mention the difference between Data Mining and Machine learning?

Data mining is the extraction of knowledge or interesting patterns from data, which may be unstructured.
Machine learning is the study, design and application of algorithms that help computers learn and improve without being explicitly programmed.

15. Describe ‘Training set’ and ‘Test set’?

In various areas of information science, such as machine learning, the set of data used to discover a potentially predictive relationship is known as the ‘Training Set’. The training set consists of the examples given to the learner. The ‘Test set’, on the other hand, is used to test the accuracy of the hypotheses generated by the learner. It is the set of instances held back from the learner. Thus, the training set is distinct from the test set.
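Holding back a test set can be sketched as follows. The helper names, the 25% test fraction, the fixed seed, and the toy threshold "learner" are all assumptions made for illustration.

```python
import random

def train_test_split(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of `samples` and hold back a fraction as the test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(hypothesis, test_set):
    """Fraction of held-back (x, label) pairs that the hypothesis labels correctly."""
    correct = sum(1 for x, label in test_set if hypothesis(x) == label)
    return correct / len(test_set)

# Toy data: x is labelled 1 when it is at least 10, else 0.
data = [(x, int(x >= 10)) for x in range(20)]
train_set, test_set = train_test_split(data)

# Toy learner: place the threshold at the smallest positive example seen in training.
threshold = min(x for x, label in train_set if label == 1)
test_accuracy = accuracy(lambda x: int(x >= threshold), test_set)

print(len(train_set), len(test_set))  # 15 5
print(test_accuracy)
```

The learner only ever sees `train_set`; the accuracy is measured on `test_set`, so it estimates how the hypothesis behaves on instances it has not seen.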
