Marginal Books

UDEMY Course: The Mahalanobis Distance Test for Outliers

Using a 7-Step Procedure, learn how to calculate and validate the Mahalanobis Distance Test

The Objectives of the Course

Most Machine Learning methods and algorithms that analyze datasets need to measure how “close” or “far” the items of the dataset are from each other. This allows analysts to look for outliers and anomalies, classify data items into clusters, and establish whether there are associations between the items.

To do that, Machine Learning methods rely on a mathematical concept: the distance between items in a dataset. We are used to thinking of distance as the length between two points. Mathematicians use the term more broadly. A customer dataset covering thousands of customers will have a set of M attributes about each customer, and M can be in the two-digit range. If M = 1, 2 or 3, we can visualize the distance between points on charts. This stops being possible for M > 3. The distance then becomes a mathematical expression computed from vectors: each item in the dataset is represented by a vector holding that item's values for the M attributes.

What makes this more interesting is that there are various ways distances can be calculated: Euclidean, Manhattan, Minkowski and Chebyshev distances. The Euclidean is the most common. However, over time it became clear that Machine Learning methods using the Euclidean Distance could produce anomalous, invalid results, typically when the attributes are measured on different scales or are correlated with each other.
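As a rough illustration of how these distances differ, here is a minimal Python sketch using only the standard library. The two customer vectors and the function names are hypothetical and chosen for easy arithmetic; they do not come from the course material.

```python
def minkowski(x, y, p):
    """Minkowski distance of order p between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def manhattan(x, y):
    """Sum of absolute differences (Minkowski with p = 1)."""
    return sum(abs(a - b) for a, b in zip(x, y))


def euclidean(x, y):
    """Square root of the sum of squared differences (Minkowski with p = 2)."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


def chebyshev(x, y):
    """Largest absolute difference over all attributes."""
    return max(abs(a - b) for a, b in zip(x, y))


# Two hypothetical customers, each described by M = 3 attributes
customer_a = [1.0, 2.0, 3.0]
customer_b = [4.0, 6.0, 3.0]

print(manhattan(customer_a, customer_b))     # 7.0
print(euclidean(customer_a, customer_b))     # 5.0
print(chebyshev(customer_a, customer_b))     # 4.0
print(minkowski(customer_a, customer_b, 1))  # 7.0, same as Manhattan
```

The same pair of items is 7, 5 or 4 units apart depending on the definition, which is why the choice of distance matters for any method built on it.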

The Euclidean Distance is calculated in multivariate space by multiplying the transpose of the dataset, P^T, with the dataset P. This is where Mr. Prasanta Chandra Mahalanobis, with his genius in statistics, came up with the idea: why not transform the dataset P before multiplying it by its transpose? This resolves a large number of issues with the Euclidean Distance.
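To make the idea concrete, the sketch below compares the two distances for points with M = 2 attributes. It assumes the standard textbook form of the Mahalanobis distance, in which the difference vector is weighted by the inverse of the covariance matrix before being multiplied by its transpose. The covariance values and points are hypothetical, and the function names are illustrative only.

```python
def euclidean(x, y):
    """Euclidean distance between two 2-D points."""
    return ((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2) ** 0.5


def mahalanobis_2d(x, y, cov):
    """Mahalanobis distance between 2-D points x and y for a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Inverse of the 2x2 covariance matrix
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx, dy = x[0] - y[0], x[1] - y[1]
    # Quadratic form (x - y)^T * inv * (x - y)
    q = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return q ** 0.5


# Hypothetical covariance: attribute 1 has variance 4, attribute 2 has variance 1
cov = [[4.0, 0.0], [0.0, 1.0]]
centre = (0.0, 0.0)
p1 = (2.0, 0.0)  # 2 units away along the high-variance attribute
p2 = (0.0, 2.0)  # 2 units away along the low-variance attribute

print(euclidean(p1, centre), euclidean(p2, centre))                      # 2.0 2.0
print(mahalanobis_2d(p1, centre, cov), mahalanobis_2d(p2, centre, cov))  # 1.0 2.0
```

Both points are equally far from the centre in Euclidean terms, but the Mahalanobis distance treats the point lying along the low-variance attribute as twice as unusual. With an identity covariance matrix the two distances coincide.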

The objective of the course is to present a 7-Step procedure used to calculate the Mahalanobis Distances and, from the resulting matrix, identify the outliers. Identification will be based on specifying a significance level (such as 0.1%, 1% or 5%).
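The 7 steps themselves are not listed on this page, so the following is not the course's procedure. It is only a sketch of one common convention: compare each item's squared Mahalanobis distance from the dataset mean with a chi-square cutoff at the chosen significance level. It assumes M = 2 attributes, for which the chi-square cutoff has the closed form -2 ln(alpha); for other values of M a chi-square quantile function from a statistics library (such as scipy.stats.chi2.ppf) would be needed. The dataset and function names are hypothetical.

```python
import math


def mean_and_cov(points):
    """Sample mean and sample covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def squared_mahalanobis(p, centre, cov):
    """Squared Mahalanobis distance of a 2-D point from the centre."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = p[0] - centre[0], p[1] - centre[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def flag_outliers(points, alpha):
    """Return the points whose squared distance exceeds the chi-square cutoff."""
    centre, cov = mean_and_cov(points)
    cutoff = -2.0 * math.log(alpha)  # chi-square quantile, valid for 2 attributes only
    return [p for p in points if squared_mahalanobis(p, centre, cov) > cutoff]


# Hypothetical dataset: eight points around the origin plus one distant point
corners = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
points = corners + corners + [(6.0, 6.0)]

print(flag_outliers(points, 0.05))  # [(6.0, 6.0)]
print(flag_outliers(points, 0.01))  # []
```

The distant point is flagged at the 5% level but not at the 1% level, which shows how the chosen significance level changes the result. In a dataset this small, the outlier itself inflates the estimated covariance, so its distance cannot grow very large.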

The course also provides supporting lectures covering the prerequisite knowledge and practices needed to apply the 7 steps.


Details of the Course

Charges: $19.99 is the price if you use this link and coupon:

However, UDEMY regularly posts the course at different prices (up or down!). Check the site for the current price.

Click to go to the Course Page at UDEMY