Best Machine Learning Algorithms for Beginners in 2020


The past decade has seen exponential growth in the use of Machine Learning. Tasks our grandparents thought only an intelligent human could do are now done by machines without human interference. Such is the power of Machine Learning.

For those of you who don’t yet know what machine learning is:

Machine learning is a method of data analysis that automates analytical model building. It is a branch of technology that allows systems to learn from data, identify patterns and make decisions with minimal human intervention.

Machine learning makes it possible to identify connections between elements that would otherwise be impossible for a human being to find.

Today machine learning helps us to

  • Identify market trends
  • Predict future events
  • Make medical discoveries
  • Strengthen cybersecurity

In the near future, ML is even expected to be used extensively on battlefields.

Hence we can say that the time of the machines is here.

Machine Learning is the hottest topic of the decade, and perhaps eventually of the century, so there are a ton of ML aspirants out there who want to grab this opportunity. Because:

If you don’t learn how to build a machine, a machine built by someone else will take your job.


Before we get into the story, it is important to get the basics right, so you can head over to this article to learn more about Artificial Intelligence and Machine Learning.

First, let’s get our feet wet in the pond of Supervised and Unsupervised Learning.

Supervised Learning

Supervised learning is when you have some input variables, say x, and an output variable y, and you use an algorithm to learn the mapping function from the input to the output: y = f(x).

The goal of supervised learning is to approximate the mapping function well enough that, when you have new input data (x), you can predict the output variable (y) for that data accurately.

It is called supervised learning because the algorithm learns from a training dataset with known answers, which can be thought of as a teacher supervising the learning process of some students.

Unsupervised Learning

Unsupervised learning is when you only have input data x and no corresponding output variables.

The goal of unsupervised learning is to model the underlying structure or distribution in the data in order to learn more about the data.

It is called unsupervised learning because, unlike supervised learning, there are no correct answers and no teacher to supervise. The algorithms are left on their own to discover and present the interesting structure in the data.

So let’s get into the best Machine Learning algorithms, the ones we have all heard about a hundred times, but this time with clarity about their applications and powers. They are in no particular order of importance.


1. Linear Regression

Linear Regression Line

Linear Regression is a machine learning algorithm based on supervised learning. It models the relationship between a dependent variable and one or more independent variables. Linear regression is mainly used when working with a scalar dependent variable and explanatory variables.

Linear Regression is applied to determine the extent to which there exists a linear relationship between a dependent variable (scalar) and one or more independent variables (explanatory). In the simplest case, simple linear regression, a single independent variable is used to predict the value of the dependent variable.
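To make the single-variable case concrete, here is a minimal sketch of fitting a straight line with ordinary least squares, using only plain Python. The function name fit_line and the hours/scores data are made up for illustration; real projects would normally reach for a library instead.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours studied (x) and exam score (y), lying exactly on y = 5x + 50
hours = [1, 2, 3, 4, 5]
scores = [55, 60, 65, 70, 75]

slope, intercept = fit_line(hours, scores)
predicted = slope * 6 + intercept  # prediction for an unseen input, x = 6
print(slope, intercept, predicted)
```

Because the made-up points lie exactly on a line, the fit recovers that line; with noisy real data the line would only approximate the points.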

Real Life Applications of Linear Regression

  1. Risk Management in financial services or insurance domain
  2. Predictive Analytics
  3. Econometrics
  4. Epidemiology
  5. Weather data analysis
  6. Customer survey results analysis

2. Logistic Regression

Logistic Regression is used when the dependent variable is binary. It is a go-to method for binary classification problems in statistics. First, it is essential to understand when to use linear regression and when to use logistic regression.

What is the difference between Linear and Logistic Regression?

Linear regression is used when the dependent variable is continuous and the nature of the regression line is linear.

Logistic regression is used when the dependent variable is binary in nature.

When to use Logistic regression?

It is closely related to linear regression, adapted to a target variable that is categorical in nature: instead of modeling the target directly, it models the log of the odds as a linear function of the inputs.

The sigmoid function, also called the logistic function, gives an ‘S’ shaped curve that can take any real-valued number and map it into a value between 0 and 1.

  1. As the input to the sigmoid goes to positive infinity, the predicted y approaches 1
  2. As the input goes to negative infinity, the predicted y approaches 0
  3. If the output of the sigmoid function is more than 0.5, we can classify the outcome as 1 or YES, and if it is less than 0.5, we can classify it as 0 or NO
  4. If the output is 0.75, we can say in terms of probability: there is a 75 percent chance that the patient will suffer from cancer.

Thus, Logistic Regression predicts the probability of occurrence of a binary event utilizing a sigmoid function.
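The sigmoid and the 0.5 cut-off described above can be sketched in a few lines of Python. The weights and bias below are hand-picked, hypothetical values rather than a trained model, and the helper names are our own.

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1 (numerically stable form)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(features, weights, bias):
    # The linear part is the log of the odds; the sigmoid turns it into a probability
    log_odds = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(log_odds)


def classify(probability, threshold=0.5):
    # Above the threshold means class 1 (YES), otherwise class 0 (NO)
    return 1 if probability > threshold else 0


# Hypothetical, hand-picked parameters for a model with a single feature
weights, bias = [2.0], -3.0
p = predict_probability([2.5], weights, bias)  # log-odds = 2.0 * 2.5 - 3.0 = 2.0
label = classify(p)
print(p, label)
```

Note that a log-odds value of ln(3) corresponds to odds of 3 to 1, which the sigmoid maps to the 0.75 probability used in the example above.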

Real life applications of Logistic Regression

  1. Cancer Detection
  2. Trauma and Injury Severity Score
  3. Image Segmentation and Categorization
  4. Geographic Image Processing
  5. Handwriting recognition
  6. Predicting whether a person is depressed based on a bag of words from the corpus

3. Support Vector Machine

Machine learning largely involves predicting and classifying data. To do so, we have a set of machine learning algorithms to implement depending on the dataset. One of these ML algorithms is SVM. The idea is simple: create a line or a hyperplane which separates the data into classes.

SVM Hyperplane and the classes

Support Vector Machine (SVM) is a supervised machine learning algorithm which can be used for both classification and regression challenges. However, it is mostly used in classification problems. SVM transforms your data and, based on that transformation, finds an optimal boundary between the possible outputs.

Support Vector Machine performs classification by finding the hyperplane that maximizes the margin between the two classes.

The training points closest to the hyperplane, which define it, are called the support vectors.

The SVM Algorithm

  1. Define an optimal hyperplane with a maximized margin
  2. Map data to a high dimensional space where it is easier to classify with linear decision surfaces
  3. Reformulate problem so that data is mapped implicitly into this space
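To give a feel for the first step only, the sketch below does not train an SVM. It simply measures the margin of two hand-picked candidate hyperplanes on a tiny made-up dataset, so you can see what "maximized margin" is comparing. The points, labels and both candidates are assumptions for illustration, and the kernel mapping of steps 2 and 3 is not covered.

```python
import math


def decision_value(w, b, x):
    """Value of w . x + b; its sign says which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any point to the hyperplane w . x + b = 0.

    Labels are +1 or -1. The result is positive only if every point lies
    on the correct side of the hyperplane.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * decision_value(w, b, x) / norm for x, y in zip(points, labels))


# Hypothetical 2-D data: two points per class
points = [(1, 1), (2, 1), (4, 4), (5, 4)]
labels = [-1, -1, 1, 1]

# Two hand-picked separating lines (not the output of a solver)
w_a, b_a = (1.0, 1.0), -5.5   # the line x1 + x2 = 5.5
w_b, b_b = (1.0, 0.0), -2.5   # the line x1 = 2.5

margin_a = geometric_margin(w_a, b_a, points, labels)
margin_b = geometric_margin(w_b, b_b, points, labels)
print(margin_a, margin_b)
```

Both lines separate the classes, but the first keeps the nearest points further away, so a margin-maximizing method would prefer it over the second. An actual SVM searches over all hyperplanes for the one with the largest such margin.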

Real Life Applications of SVM

  1. Face detection — classify between face and non-face areas on images
  2. Text and hypertext categorization
  3. Classification of images
  4. Bioinformatics — protein, genes, biological or cancer classification.
  5. Handwriting recognition
  6. Drug Discovery for Therapy

In recent times, SVM has played a very important role in cancer detection and its therapy with its application in classification.


4. Decision Trees

Source: Wikipedia

A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences. It covers event outcomes, resource costs, and utility of decisions. Decision Trees resemble an algorithm or a flowchart that contains only conditional control statements.

A decision tree is drawn upside down with the root node at the top. Each decision tree has 3 key parts: a root node, leaf nodes, and branches.

In a decision tree, each internal node represents a test or an event, say, heads or tails in a coin flip. Each branch represents an outcome of the test, and each leaf node represents a class label: a decision taken after computing all attributes. The paths from the root to the leaf nodes represent the classification rules.

Decision trees can be a powerful machine learning algorithm for classification and regression. A classification tree predicts a category for the target, for example whether a flip was heads or tails. Regression trees are represented in a similar manner, but they predict continuous values like house prices in a neighborhood.
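Here is a small sketch of how a prediction walks from the root to a leaf. The tree is written by hand as nested dictionaries rather than learned from data, and the features (outlook, humidity, windy) and labels are hypothetical.

```python
# A hand-built tree: internal nodes test one feature, leaves hold a class label
tree = {
    "feature": "outlook",
    "branches": {
        "sunny": {
            "feature": "humidity",
            "branches": {
                "high": {"label": "stay in"},
                "normal": {"label": "play"},
            },
        },
        "overcast": {"label": "play"},
        "rainy": {
            "feature": "windy",
            "branches": {
                True: {"label": "stay in"},
                False: {"label": "play"},
            },
        },
    },
}


def predict(node, sample):
    """Follow the branches matching the sample until a leaf is reached.

    Returns the leaf's label and the list of (feature, value) tests taken,
    which together form the classification rule for this sample.
    """
    path = []
    while "label" not in node:
        value = sample[node["feature"]]
        path.append((node["feature"], value))
        node = node["branches"][value]
    return node["label"], path


label, path = predict(tree, {"outlook": "sunny", "humidity": "high", "windy": False})
print(label, path)
```

Learning algorithms build such a tree automatically by choosing, at each node, the test that best splits the training data; that part is left out here.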

The best part about decision trees:

  1. Handle both numerical and categorical data
  2. Handle multi-output problems
  3. Decision trees require relatively less effort in data preparation
  4. Nonlinear relationships between parameters do not affect tree performance

Real life applications of Decision Trees

  1. Selecting a flight to travel
  2. Predicting high occupancy dates for hotels
  3. Number of drug stores nearby was particularly effective for a client X
  4. Cancer vs non-cancerous cell classification where cancerous cells are rare, say 1%
  5. Suggesting to a customer what car to buy

5. Random Forests

Random Forests in machine learning is an ensemble learning technique for classification, regression and other tasks that builds a multitude of decision trees at training time and combines their outputs. They are fast, flexible, represent a robust approach to mining high-dimensional data and are an extension of the classification and regression decision trees we talked about above.

Ensemble learning, in general, can be defined as a model that makes predictions by combining individual models. The ensemble model tends to be more flexible, with less bias and less variance. Ensemble learning has two popular methods:

  • Bagging: Each individual tree is trained on a random sample drawn from the dataset, resulting in different trees
  • Boosting: Each individual tree/model learns from the mistakes made by the previous model and improves on them
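Bagging can be sketched with the simplest possible tree, a one-split "decision stump", on one-dimensional data. Everything below (the data, the stump trainer, the choice of 25 models and the fixed random seed) is an assumption made to keep the example small. A real random forest also randomizes which features each tree may use, which is not shown.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) rows at random, with replacement."""
    return [rng.choice(data) for _ in data]


def train_stump(sample):
    """Fit a one-split tree on (x, label) pairs with labels 0 or 1.

    Tries every observed x as a threshold, in both orientations,
    and keeps the split with the fewest training errors.
    """
    best = None
    for threshold, _ in sample:
        for left, right in ((0, 1), (1, 0)):
            errors = sum(
                1 for x, y in sample if (left if x <= threshold else right) != y
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    _, threshold, left, right = best
    return lambda x: left if x <= threshold else right


def bagged_predict(models, x):
    """Majority vote over the individual models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]

# Each stump sees a different random resampling of the same dataset
models = [train_stump(bootstrap_sample(data, rng)) for _ in range(25)]
print(bagged_predict(models, 1), bagged_predict(models, 9))
```

Individual stumps disagree near the class boundary because each saw a different sample; the vote smooths those differences out, which is the variance reduction bagging is after.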

Random forest run times are quite fast, and they are pretty efficient in dealing with missing and incorrect data. On the negative side, they cannot predict beyond the range of values seen in the training data, and they may over-fit data sets that are particularly noisy.

As a rule of thumb, a random forest is often built with somewhere between 64 and 128 trees.

Real life applications of Random Forests

  1. Estimated loss or profit while purchasing a particular stock

6. K-nearest neighbors

K-nearest neighbors (kNN) is a simple supervised machine learning algorithm that can be used to solve both classification and regression problems.

kNN stores the available inputs and classifies new inputs based on a similarity measure, i.e. a distance function. kNN has found its major application in statistical estimation and pattern recognition.

How does kNN work?

kNN works by finding the distances between a query and all inputs in the data. Next, it selects a specified number of inputs, say K, closest to the query. Then it votes for the most frequent label (in the case of classification) or averages the labels (in the case of regression).

The kNN Algorithm:

  1. Load the data
  2. Initialize k to a chosen number of neighbors in the data
  3. For each example in the data, calculate the distance between the query example and the current example
  4. Add the distance and the index of the example to a collection
  5. Sort the collection of distances and indices in ascending order of distance
  6. Pick the first K entries from the sorted collection
  7. Get the labels of the selected K entries
  8. If regression, return the mean of the K labels; If classification, return the mode of the K labels
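The steps above can be sketched in plain Python roughly as follows. Euclidean distance is an assumption on our part (the steps do not fix a distance function), and the function name and sample data are made up.

```python
import math
from collections import Counter


def knn_predict(data, query, k, mode="classification"):
    """data is a list of (features, label) pairs; query is a feature tuple."""
    # Steps 3-4: distance from the query to every stored example, kept with its index
    distances = [
        (math.dist(features, query), index)
        for index, (features, _) in enumerate(data)
    ]
    # Steps 5-6: sort by distance and keep the K closest entries
    nearest = sorted(distances)[:k]
    # Step 7: look up the labels of those entries
    labels = [data[index][1] for _, index in nearest]
    # Step 8: mean for regression, most frequent label for classification
    if mode == "regression":
        return sum(labels) / len(labels)
    return Counter(labels).most_common(1)[0][0]


# Hypothetical labelled points for classification
points = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((6, 6), "B"), ((7, 6), "B"), ((6, 7), "B"),
]
print(knn_predict(points, (2, 2), k=3))

# Hypothetical one-feature data for regression
readings = [((1,), 10.0), ((2,), 20.0), ((3,), 30.0), ((10,), 100.0)]
print(knn_predict(readings, (2,), k=3, mode="regression"))
```

There is no training step here: the whole dataset is kept and scanned for every query, which is simple but gets slow as the data grows.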

Real world applications of kNN

  1. Fingerprint detection
  2. Forecasting stock market
  3. Currency exchange rate
  4. Bank bankruptcies
  5. Credit rating
  6. Loan management
  7. Money laundering analyses
  8. Estimate the amount of glucose in the blood of a diabetic person from the IR absorption spectrum of that person’s blood.
  9. Identify the risk factors for a cancer based on clinical & demographic variables


7. K-means clustering

K-means clustering is one of the simplest and most popular unsupervised machine learning algorithms.

Did we not talk about something so similar above?

Difference between k-nearest neighbors and k-means clustering

Despite the similar names, the two are different: kNN is a supervised algorithm that labels a new input from its K closest labeled examples, while K-means is an unsupervised algorithm that groups unlabeled data into K clusters.

The K-means algorithm identifies k centroids and then allocates every data point to the nearest cluster, while keeping the clusters as compact as possible. The ‘means’ in K-means refers to averaging of the data; that is, finding the centroid.

The K-means algorithm starts with a first group of randomly selected centroids, which are used as the beginning points for every cluster, and then performs iterative (repetitive) calculations to optimize the positions of the centroids. It stops creating and optimizing clusters when either the centroids have stabilized or a defined number of iterations has been reached.

The K-means clustering algorithm:

  1. Specify the number of clusters K.
  2. Initialize the centroids by first shuffling the dataset and then randomly selecting K data points for the centroids without replacement.
  3. Keep iterating steps 4 to 6 until the centroids have stabilized.
  4. Compute the sum of the squared distances between the data points and all centroids.
  5. Assign each data point to the closest cluster (centroid).
  6. Compute the centroids for the clusters by taking the average of the data points that belong to each cluster.
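A minimal plain-Python sketch of this loop is shown below. The function names, the fixed seed, the iteration cap and the sample points are assumptions for illustration. Keeping an empty cluster's centroid where it was is just one simple way to handle that edge case.

```python
import math
import random


def k_means(points, k, max_iterations=100, seed=0):
    rng = random.Random(seed)
    # Step 2: pick K distinct data points as the starting centroids
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(max_iterations):
        # Steps 4-5: measure distances and assign each point to its closest centroid
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Step 6: move each centroid to the average of the points assigned to it
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centroids.append(
                    tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                )
            else:
                new_centroids.append(centroids[i])  # empty cluster: leave it in place
        # Step 3: stop once the centroids no longer move
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


def inertia(centroids, clusters):
    """Sum of squared distances from each point to its cluster's centroid."""
    return sum(
        math.dist(point, centroid) ** 2
        for centroid, cluster in zip(centroids, clusters)
        for point in cluster
    )


# Hypothetical data: two well-separated groups of three points each
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = k_means(points, k=2)
print(sorted(centroids), inertia(centroids, clusters))
```

On data this cleanly separated the result does not depend much on the starting centroids; on messier data, it is common to run the algorithm several times with different seeds and keep the run with the lowest inertia.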

Real-World applications of K-means Clustering

  1. Identifying fake news
  2. Spam detection and filtering
  3. Classify books or movies by genre
  4. Popular transport routes while town planning.

8. Naive Bayes

Naive Bayes is a super effective, commonly used machine learning classifier. Naive Bayes is in its own right a family of algorithms, most commonly applied to supervised classification.

Naive Bayes classifiers are a collection of classification algorithms based on Bayes’ Theorem. It is not a single algorithm but a family of algorithms that all share a common principle, i.e. every pair of features being classified is independent of each other, given the class.

In order to understand Naive Bayes, let us recall Bayes’ rule:

P(A | B) = P(B | A) × P(A) / P(B)

What is so “naive” in Naive Bayes?

Naive Bayes (NB) is naive because it makes the assumption that the attributes of a measurement are independent of each other. This lets us treat each attribute as an independent quantity: for a given class, we determine the proportion of previous measurements in that class that have the same value for that attribute alone, and then combine these proportions across attributes.

Naive Bayes is used primarily to predict the probability of different classes based on multiple attributes. It is mostly used for text classification in data mining. If you look at the applications of Naive Bayes, many of the projects you always wanted to do can be done well with this family of algorithms.
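For text classification, the idea can be sketched as a tiny word-count (multinomial-style) Naive Bayes in plain Python. The training sentences and labels are made up. Add-one (Laplace) smoothing and skipping words never seen in training are our assumptions, chosen so that an unseen word does not force a probability of zero.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents):
    """documents is a list of (list_of_words, label) pairs."""
    class_counts = Counter()             # how many documents per class
    word_counts = defaultdict(Counter)   # word frequencies per class
    vocabulary = set()
    for words, label in documents:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def predict(model, words):
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Start from log P(class), then add log P(word | class) for each word,
        # treating the words as independent given the class (the naive step)
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in words:
            if word not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[label][word]
            # Add-one smoothing keeps unseen (word, class) pairs above zero
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training messages
documents = [
    ("win free prize now".split(), "spam"),
    ("free money offer".split(), "spam"),
    ("meeting schedule for monday".split(), "ham"),
    ("project meeting notes".split(), "ham"),
]
model = train_naive_bayes(documents)
print(predict(model, "free prize offer".split()))
print(predict(model, "monday meeting".split()))
```

Working with sums of logarithms instead of products of probabilities avoids the tiny numbers you would otherwise get when multiplying many small values together.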

Real-world applications of Naive Bayes

  1. Classify a news article about technology, politics, or sports
  2. Sentiment analysis on social media
  3. Facial recognition software
  4. Recommendation Systems as in Netflix, Amazon
  5. Spam filtering

Keep training, keep crunching!!

