
Top 10 Machine Learning Algorithms

Machine learning algorithms are computer programs that learn hidden patterns in data to provide insights. Multiple ML algorithms are available that utilize different mathematical models to generate insights or predict values from unseen data.

Today, we will get an overview of the most popular and most commonly used ML algorithms. We can broadly classify ML algorithms into the following categories.

  1. Regression
  2. Classification
  3. Clustering
  4. Association

We suggest you also read this article on the Machine Learning Lifecycle to understand how ML algorithms are developed.

List of Most Common Machine Learning Algorithms

Here is the list of the most popular and commonly used algorithms:
  1. Linear Regression
  2. Logistic Regression
  3. Support Vector Machines (SVM)
  4. K-Nearest Neighbor (KNN)
  5. Decision Tree
  6. Ensemble Learning
  7. Random Forest
  8. Principal Component Analysis (PCA)
  9. K-means Clustering
  10. XGBoost

We can group these algorithms into the following three categories:

  1. Supervised Machine Learning
  2. Unsupervised Machine Learning
  3. Reinforcement Learning

Linear Regression

Regression is a statistical technique to establish a relationship between the dependent (y) and the independent (X) variable. This relationship can be linear, parabolic, or of some other form.

But in simple linear regression, the relation between dependent (y) and independent (X) variables is linear. In other words, we can define the relationship using a straight line.

Types of Linear Regression

We can further classify Linear Regression into two types:

  1. Simple Linear Regression
  2. Multiple Linear Regression

Let’s consider a dataset where X is the independent and y is the dependent variable. This is a supervised machine learning problem because the output (dependent) variable is known.


The first step in ML problem-solving is to visualize the data. When we plot a graph between the independent (X) and dependent (y) variables and find that the relationship is linear, we become confident that we can solve this problem using simple linear regression.

In simple linear regression, we find the best-fit line that ensures a minimum error (such as the residual sum of squares or root mean squared error) across all training data. Afterward, we can predict the unknown y values for given values of X from the best-fit line.

Equation of Line for Linear Regression

The basic equation of a simple linear regression model with one independent and one dependent variable can be represented in the following ways.

y = m X + C

y = b0 + b1 X

hθ(x) = θ0 + θ1 X

Where:
y = Dependent variable;
X = Independent variable;
b0 and b1 (or θ0 and θ1) are the linear regression model parameters or weights.

In simple linear regression, we try to find the best values for m and C (the constant) while ensuring minimum error. We can use scikit-learn's LinearRegression class for linear regression tasks.
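
As a quick illustration, here is a minimal sketch of simple linear regression with scikit-learn's LinearRegression; the tiny dataset is made up for demonstration.

    # Simple linear regression with scikit-learn (illustrative, made-up data)
    import numpy as np
    from sklearn.linear_model import LinearRegression

    X = np.array([[1], [2], [3], [4], [5]])  # independent variable (one feature)
    y = np.array([3, 5, 7, 9, 11])           # dependent variable, follows y = 2X + 1

    model = LinearRegression()
    model.fit(X, y)

    print(model.coef_[0], model.intercept_)  # learned slope (m) and constant (C)
    print(model.predict([[6]]))              # predict y for an unseen X value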

Polynomial Regression

In some regression problems, we get complex data where a simple line cannot fit on all data points. In these cases, you can try polynomial regression to create an n-degree polynomial that fits the training data.

Polynomial regression gives a non-linear relationship between dependent and independent variables.

Equation for Polynomial Regression

y = c + θ1 X + θ2 X²

Where: y = dependent variable; X = independent variable; c = constant.

We can use scikit-learn's LinearRegression class for polynomial regression tasks as well, once the polynomial features are added.

How is the polynomial model different from the linear model?

In the linear model, we used only one independent variable, X. In polynomial regression, we add polynomial features such as X² as additional independent variables.

We can import PolynomialFeatures from the sklearn.preprocessing module to add these features.
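
Here is a minimal sketch of polynomial regression, assuming made-up data that follows y = X² + 1: PolynomialFeatures adds the X² term, and LinearRegression fits the expanded features.

    # Polynomial regression: add X^2 as a feature, then fit a linear model
    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    X = np.array([[-2], [-1], [0], [1], [2], [3]])
    y = np.array([5, 2, 1, 2, 5, 10])  # made-up data following y = X^2 + 1

    model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
    model.fit(X, y)
    print(model.predict([[4]]))  # expect a value close to 17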

Applications of Regression models

We can use regression algorithms for the following applications:

  1. Preparing market strategies by doing market analysis.
  2. Evaluating returns on investments, such as marketing budget vs. movie earnings.
  3. Predicting how much a consumer is going to spend.
  4. Modeling the relationship between two variables, such as the change in resistance with temperature or the change in water consumption with temperature.

Logistic Regression

Despite its name, logistic regression is used for classification tasks. We can apply it where the positive and negative classes are linearly separable.


Logistic regression is a supervised ML algorithm that estimates the probability that an instance belongs to a particular class. If the estimated probability for a class is greater than 50%, we assign the instance to that class.

Unlike linear regression, logistic regression outputs a value between 0 and 1, where 0 is the lowest probability and 1 is the highest.

Types of Logistic Regression

The common types are:

  1. Binary logistic regression: the target has two possible classes.
  2. Multinomial logistic regression: the target has three or more classes with no natural order.
  3. Ordinal logistic regression: the target has three or more ordered classes.

How does logistic regression work?

Assumption: We denote positive points as 1 and negative points as -1, with 0 as the decision boundary of the linear score.
Logistic regression passes this score through a sigmoid function so that large outliers do not distort the best-fit line.
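
As a rough illustration, here is a minimal sketch of binary classification with scikit-learn's LogisticRegression; the one-feature dataset is made up.

    # Logistic regression: predicted probabilities come from the sigmoid output
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X = np.array([[1], [2], [3], [6], [7], [8]])  # one feature (made-up data)
    y = np.array([0, 0, 0, 1, 1, 1])              # class labels

    clf = LogisticRegression()
    clf.fit(X, y)

    print(clf.predict_proba([[4.0]]))  # probability of each class, between 0 and 1
    print(clf.predict([[4.0]]))        # class whose probability exceeds 50%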

Support Vector Machine

Support vector machine (SVM) is a supervised ML algorithm that we can use for linear and non-linear classification, regression, and outlier detection tasks.

SVM: Machine Learning Algorithms for Classification

We can apply the SVM algorithm for classification tasks on small and medium-sized complex data sets.


Unlike a plain linear classifier, an SVM classifier does not separate the two classes with just any line. It tries to fit the widest possible street (represented by two dashed lines) between the two classes. Therefore, SVM classifiers are also known as large margin classifiers.

  • We can call this path (the space between the two dashed lines) a street.
  • The SVM classifier will not be affected if we add more training instances outside this street.
  • The instances located on the edge of the street, which define its width, are known as support vectors.
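
Below is a minimal sketch of a large margin classifier using scikit-learn's SVC with a linear kernel; the 2D dataset is made up for illustration.

    # Linear SVM classification: support_vectors_ holds the street-defining points
    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 6]])
    y = np.array([0, 0, 0, 1, 1, 1])

    clf = SVC(kernel="linear", C=1.0)
    clf.fit(X, y)

    print(clf.support_vectors_)   # the instances that define the street
    print(clf.predict([[4, 4]]))  # classify an unknown instance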

SVM: Machine Learning Algorithms for Regression

During the classification task, we try to fit the widest possible street while ensuring no observation lies on the street.


The regression task is just the opposite of the classification task: in SVM regression, we try to fit as many instances as possible on the street.
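
Here is a minimal sketch of SVM regression with scikit-learn's SVR on made-up data; the epsilon parameter controls the width of the street we try to fit instances inside.

    # SVM regression: a wider epsilon means a wider street
    import numpy as np
    from sklearn.svm import SVR

    X = np.array([[1], [2], [3], [4], [5]])
    y = np.array([1.1, 2.0, 2.9, 4.2, 5.1])  # made-up, roughly y = X

    reg = SVR(kernel="linear", epsilon=0.5)
    reg.fit(X, y)
    print(reg.predict([[6]]))  # predicted value for an unseen X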

Applications of Support vector Machine Learning Algorithms

Here is a list of applications of SVM algorithms:

  • Face detection
  • Linear and Non-Linear Regression
  • Linear and Non-Linear Classification
  • Image classification
  • Diseases Classification such as type of cancer
  • Handwriting Recognition
  • Intrusion Detection
  • Outlier Detection

K-Nearest Neighbor (KNN)

K-nearest neighbor or KNN is a non-parametric supervised ML algorithm that we can use for classification and regression problems. Its primary application is in classification.

In KNN, we try to identify the nearest neighbor to assign a label or class to an unknown instance. We need to calculate the distance between an unknown instance and other known instances to identify the nearest neighbor.

From distance metrics, we create a decision boundary. Euclidean distance, Manhattan distance, Minkowski distance, and Hamming distance are popular distance measures in KNN.
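
Here is a minimal sketch of KNN classification with scikit-learn, using Euclidean distance on made-up 2D data.

    # KNN: the predicted label is the majority vote of the k nearest neighbors
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    X = np.array([[1, 1], [1, 2], [2, 2], [8, 8], [8, 9], [9, 9]])
    y = np.array([0, 0, 0, 1, 1, 1])

    knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
    knn.fit(X, y)
    print(knn.predict([[2, 1]]))  # majority label among the 3 nearest neighbors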

Applications of KNN Machine Learning Algorithms

K-Nearest Neighbor has the following applications.

  • Data Preprocessing such as filling in the missing values
  • Recommendation Engine
  • Predicting stock prices
  • Predicting the risk of heart attack or cancer
  • Pattern Recognition.

Decision Tree

Decision trees are versatile ML algorithms that we can use for regression and classification tasks. They are also the base components of the Random Forest algorithm.


Each decision tree consists of multiple nodes:

  • A decision tree starts with a root node. It is the topmost node, where the decision process begins.
  • A decision node is where the data branches out into possible outcomes. These outcomes lead to additional nodes (decision nodes and leaf nodes). Each decision node consists of parameters such as the condition, Gini score, samples, value, and predicted class.
  • A leaf node is where we reach the final decision. It does not contain a condition or ask any further questions; it simply outputs a decision.

Samples: the number of training instances at the node.

Value: the number of training instances from each class.

Gini: a measure of impurity; a pure node has a Gini score of 0. A node is pure if all instances in the node belong to the same class.
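
A minimal sketch with scikit-learn's DecisionTreeClassifier on made-up data; export_text prints the learned condition at each node.

    # Decision tree: each internal node tests a condition on one feature
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    X = np.array([[1, 1], [2, 1], [3, 2], [7, 8], [8, 7], [9, 9]])
    y = np.array([0, 0, 0, 1, 1, 1])

    tree = DecisionTreeClassifier(max_depth=2, random_state=0)
    tree.fit(X, y)
    print(export_text(tree, feature_names=["f0", "f1"]))  # the tree's conditions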

Applications of Decision Tree: ML Algorithm

Decision trees have the following applications.

  • Evaluating customers with high default risk.
  • Retaining customers by analyzing their behavior.
  • Preparing a marketing strategy.

Ensemble Learning

Voting is one of the oldest techniques for making a decision. During voting, we aggregate answers from multiple people, and this aggregated answer is often better than a single expert's answer.

Voting classifiers are among the most popular ensemble learning methods. We train multiple machine learning classifiers on the same or different data.

For any unknown input, we get a predicted value from each classifier and feed these results to the voting classifier. The voting classifier aggregates the results from all classifiers and gives the final output.

Steps in Ensemble Machine Learning Algorithm

Here are the steps to implement ensemble learning.

Step 1: Train multiple classifiers on the training data

For example, suppose our goal is to classify an email as spam or not spam.

In ensemble learning, we train multiple classifiers, such as Logistic Regression, an SVM classifier, and a Random Forest, on the training data.

Step 2: Predict values for new unknown instances using all trained classifiers

The next step is to predict values for unknown instances using all trained classifiers. For a particular email, suppose Logistic Regression and Random Forest predict that the email is spam, whereas the SVM classifier predicts that it is not spam.

Step 3: Aggregate the predictions of all classifiers

The final step is to aggregate the results of all classifiers: two classifiers predict the email as spam, and one predicts it as not spam. The ensemble algorithm outputs that the email is spam because the majority of the votes say so.
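
This majority-vote scheme maps directly onto scikit-learn's VotingClassifier. Below is a minimal sketch combining the three classifiers mentioned above; the tiny dataset stands in for real spam features and is entirely made up.

    # Hard voting: the final class is the majority of the classifiers' predictions
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    X = np.array([[1, 1], [2, 1], [2, 2], [7, 8], [8, 8], [9, 7]])
    y = np.array([0, 0, 0, 1, 1, 1])  # 0 = not spam, 1 = spam (made-up labels)

    voting_clf = VotingClassifier(
        estimators=[
            ("lr", LogisticRegression()),
            ("svm", SVC()),
            ("rf", RandomForestClassifier(random_state=0)),
        ],
        voting="hard",  # majority vote over predicted classes
    )
    voting_clf.fit(X, y)
    print(voting_clf.predict([[8, 9]]))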

Random Forest

Random forest is an ensemble learning technique that utilizes multiple decision trees to predict values from unknown instances. It combines the output from multiple decision trees to get one result.
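As a brief illustration, here is a minimal sketch of a random forest with scikit-learn on made-up data; n_estimators controls how many decision trees are combined.

    # Random forest: aggregate the predictions of many decision trees
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    X = np.array([[1, 1], [2, 1], [2, 2], [7, 8], [8, 8], [9, 7]])
    y = np.array([0, 0, 0, 1, 1, 1])

    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    forest.fit(X, y)
    print(forest.predict([[8, 9]]))  # combined prediction of all 100 trees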

Applications of Random Forest

Random Forest has applications to evaluate customers with high default risk, analyze drug responses, and make product recommendations for the e-commerce industry.

Principal Component Analysis (PCA)

Sometimes, we have thousands of features for each training instance. The higher the number of features:

  • The slower the training becomes.
  • The harder it is to find an optimized solution.

PCA can reduce the number of features while retaining most of the information (variance). For example, in the MNIST data, the pixels on the image borders are white in most of the images. These pixels do not add much value during predictions, so we can drop them.

We humans can understand 2D and 3D data easily, but it is very difficult to visualize data with more than three dimensions. Therefore, we can use PCA to reduce the number of features and visualize the data.

What is PCA?


In Principal Component Analysis, we identify the hyperplane that lies closest to the data, and then we project the data onto it.

For example, if we have two features, X1 and X2, we find the axis that minimizes the mean squared distance between the data and its projection. Projecting the data onto this axis preserves the maximum amount of variance.
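
A minimal sketch with scikit-learn's PCA on made-up data where the two features are strongly correlated, so a single principal component keeps almost all of the variance.

    # PCA: project two correlated features onto one principal component
    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    x1 = rng.normal(size=100)
    x2 = 2 * x1 + rng.normal(scale=0.1, size=100)  # x2 is nearly a multiple of x1
    X = np.column_stack([x1, x2])

    pca = PCA(n_components=1)
    X_reduced = pca.fit_transform(X)      # data projected onto one axis
    print(pca.explained_variance_ratio_)  # share of variance the projection keeps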

K-means Clustering

K-means is an unsupervised machine learning algorithm that groups similar items into clusters.

It assigns instances to clusters while ensuring the sum of squared distances between each instance and its cluster centroid is minimal.

Example of K-means Clustering

To understand k-means clustering, we will consider a customer dataset where we have each customer's income and spending.

We can divide customers into the following four categories:

  1. Low-income high spending
  2. Low-income low spending
  3. High-income high spending
  4. High-income low spending

Let’s try to understand how k-means clustering works.

Step 1: Assume the number of clusters K is known. In this case, there are a total of 4 clusters.
Step 2: Select K random points from the data as cluster centroids. (Mark each centroid with a different color.)
Step 3: Assign every point to the closest cluster centroid by measuring the distance from the instance to each centroid.
Step 4: Update the centroids to minimize the sum of squared distances between the instances and their cluster centroids. Repeat steps 3 and 4 until the centroids stop moving.
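
These steps correspond to scikit-learn's KMeans, which repeats the assign-and-update loop until the centroids converge. Here is a minimal sketch on made-up income/spending data:

    # K-means: n_clusters=4 matches the four customer categories above
    import numpy as np
    from sklearn.cluster import KMeans

    # columns: [income, spending]; the values are illustrative only
    X = np.array([[20, 80], [22, 85], [25, 20], [24, 15],
                  [80, 90], [85, 88], [90, 20], [88, 15]])

    kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
    labels = kmeans.fit_predict(X)
    print(labels)                   # cluster assignment of each customer
    print(kmeans.cluster_centers_)  # final centroids after convergence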

Conclusion

Machine learning is changing the way we do business. It has applications in industries such as medical, manufacturing, hospitality, and entertainment. Multiple Machine Learning algorithms are available, and each has a different application. We need to select the best algorithm for an application. The best way is to try multiple algorithms.

We suggest you read this article on the steps in the Machine Learning Lifecycle.

Thanks for Reading! Enjoy Learning
