The Objectives of the Course
A) The Purpose of the Course
Most courses on this subject are aimed at Machine Learning and Data Science
experts. Often, they are presented for use with specialized development
platforms or even as part of advanced off-the-shelf applications. At the same
time, Bayes' Theorem and its applications rest on statistical principles and
concepts that are rarely explained clearly.
The purpose of this course is educational. The techniques, algorithms and
procedures presented here aim at making machine learning methods based on the
Theorem easier to understand, rather than simply at putting them to use.
Bayes' Theorem is one of those theorems to which the proverb applies:
“Still waters run deep”.
The Theorem appeared in an essay by Thomas Bayes, published posthumously in
1763. In due course, it found its way into a wide variety of statistical
applications, the Theorem itself being a tool of statistical inference.
From there on, and especially with the advent of Machine Learning algorithms,
the Theorem became the core of a wide variety of applications such as
Classification, Bayesian Networks and Optimization.
The Theorem and its applications are normally developed in specialized
programming environments, simply because these applications must handle large
volumes of data in performance-intensive settings.
B) So, why do we Present a Course based on Excel?
Analysts who need to use or develop such applications have the following
environments available to them:
· Off-the-shelf applications, ready-made and commercially available.
· Open-source or free integrated development environments (IDEs) that host a
large number of scientific and statistical libraries for building such
applications.
In both cases, the Analyst faces a steep, often insurmountable learning curve.
Whether the objective is to use off-the-shelf products or to develop their own
applications, neither environment is suited to learning how the machine
learning methods themselves work.
This course therefore uses Excel strictly for educational purposes, not as a
machine learning tool. Excel is known to everyone, and if not, it is easy to
learn. It is also highly flexible in exposing how things work. The course
exploits these facilities to walk the Analyst, in a common-sense, step-by-step
manner, through the foundations and procedures of these algorithms.
C) What Does the Course Cover?
The course is made up of 5 major sections, the first of which is a short
introduction.
Section 1: Introducing the Course
This section consists of one lecture that presents the objectives of the
course, its structure and resources as well as what to expect and what not to
expect.
Section 2: An In-Depth Presentation of Probability Rules and Practices
The section starts with lectures that provide a detailed exposure to the
fundamentals and practices of probability rules. Bayes' Theorem is so closely
tied to these rules that analysts embarking on its use (and on the
understanding of its extensions) cannot learn and apply these algorithms
without a solid grounding in probability.
The section uses common sense to clarify often obscure concepts in probability.
Many examples are presented and explained in detail.
Section 3: The Use of the Confusion Matrix for Evaluating Bayesian Results
Some might wonder why this course introduces the Confusion Matrix and its
useful KPI’s. The answer is that in Sections 4 and 5 we will need to evaluate
our results in terms of precision, accuracy, error rates, etc. The Confusion
Matrix is a contingency table of four counts obtained by comparing the
algorithm’s outcome with the historically known outcome of the classes in a
Test Table. The four counts are True Positives, True Negatives, False
Positives and False Negatives, and they can be combined in a variety of ways
to measure KPI’s such as accuracy, precision and error rates. (The Confusion
Matrix is also used with a variety of other classification machine learning
methods: logistic regression, decision trees, etc.) A small illustration
follows.
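As a minimal sketch (not taken from the course materials, which use Excel),
the following Python snippet builds the four counts from a small, invented
test table and derives accuracy, precision and error rate from them:

```python
# Minimal confusion-matrix sketch; the labels below are invented for illustration.
actual    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]  # historically known classes
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]  # classifier's outcomes

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # True Positives
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # True Negatives
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # False Positives
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # False Negatives

accuracy   = (tp + tn) / (tp + tn + fp + fn)   # share of correct predictions
precision  = tp / (tp + fp)                    # correctness of positive calls
error_rate = 1 - accuracy                      # share of wrong predictions

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} error_rate={error_rate:.2f}")
```

The same four counts, laid out as a 2x2 table in Excel, drive all the KPI’s
discussed in this section.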
Section 4: The Fundamental Application of Bayes’ Theorem
This section presents Bayes’ Theorem, first running through a common-sense
example. This is followed by the derivation of the Theorem and a clear
explanation of the terms in the Bayes' Theorem formula. A set of 8 major
workouts presents the use of the Theorem in different formats (vertical and
horizontal tables, decision trees and graphic solutions). The last 3 workouts
output their results to a Confusion Matrix and show how it can be used to
evaluate the results of the Theorem.
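For orientation, the formula at the heart of this section is
P(A|B) = P(B|A) · P(A) / P(B). A minimal Python sketch with invented numbers
(a hypothetical diagnostic-test scenario, not one of the course workouts):

```python
# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
# Hypothetical diagnostic-test numbers, invented for illustration.
p_disease = 0.01            # prior: P(A)
p_pos_given_disease = 0.95  # likelihood: P(B|A), the test's sensitivity
p_pos_given_healthy = 0.05  # false-positive rate: P(B|not A)

# Total probability of a positive test: P(B)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior: probability of disease given a positive test, P(A|B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive) = {p_disease_given_pos:.3f}")  # about 0.161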
Section 5: How to Use the Naïve Bayes Classifiers
This is the heart of the course. It presents a wide variety of algorithms
whose purpose is the supervised classification of data. The Naïve Bayes
Classifiers are a family of algorithms based on Bayes’ Theorem that differ
from each other in various ways; they are listed below.
Interleaved with the lectures detailing these algorithms through clear
examples are “support” lectures that present topics these algorithms rely on.
After two opening lectures that present the fundamentals of Naïve Bayes
Classifiers and the required theory, the course proceeds with a set of
lectures covering 8 Naïve Bayes Classifier variants (a minimal sketch of the
shared underlying idea follows the list):
1) Categorical Naïve Bayes Classifiers
2) Gaussian and Continuous Naïve Bayes Classifiers
3) Non-Gaussian Continuous Naïve Bayes Classifiers
4) Bernoulli Naïve Bayes Classifier
5) Multinomial Naïve Bayes Classifier
6) Weighted Naive Bayes Classification
7) Complement Naïve Bayes Classification
8) Kernel Distance Estimation and Naive Bayes Classification
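To give a flavour of what these variants have in common, here is a minimal
Gaussian Naïve Bayes sketch in Python (pure standard library; the tiny
training set is invented for illustration, and details such as smoothing are
deferred to the support lectures):

```python
import math
from collections import defaultdict

# Tiny invented training set: one continuous feature per sample, two classes.
train = [(1.0, "A"), (1.2, "A"), (0.9, "A"), (3.0, "B"), (3.2, "B"), (2.8, "B")]

# Estimate per-class priors, means and variances from the training data.
by_class = defaultdict(list)
for x, c in train:
    by_class[c].append(x)

stats = {}
for c, xs in by_class.items():
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    stats[c] = (len(xs) / len(train), mean, var)  # (prior, mean, variance)

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def classify(x):
    # Pick the class maximizing prior * likelihood (Bayes' rule; the constant
    # evidence term P(x) is the same for all classes and can be dropped).
    return max(stats, key=lambda c: stats[c][0] * gaussian_pdf(x, stats[c][1], stats[c][2]))

print(classify(1.1))  # -> "A"
print(classify(2.9))  # -> "B"
```

The other variants replace the Gaussian likelihood with one suited to the
feature type (categorical, Bernoulli, multinomial, kernel-based, etc.).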
To support the presentations above, the course interleaves detailed lectures
on the following methods, topics and procedures:
1) Laplace Smoothing Correction (a minimal sketch follows this list)
2) Extensions to Continuous Features: checking for normality, checking for
independence of features, smoothing corrections for Gaussian features
3) Two Discrete Distributions - Bernoulli and Categorical
4) Two Discrete Distributions - Binomial and Multinomial
5) Entropy and Information and how they are used in Naïve Bayes Classification
6) Kononenko Information Gain and Evaluation of Classifiers
7) Log Odds Ratio and Nomograms used in Bayes Classification
8) Kernel Distance Estimation - Estimating the Bandwidth h.
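As a taste of the first support topic, here is a minimal Laplace (add-one)
smoothing sketch in Python; the category counts are invented for illustration:

```python
# Laplace (add-one) smoothing: avoids zero probabilities for unseen categories.
# Invented counts of a categorical feature within one class.
counts = {"red": 3, "green": 2, "blue": 0}  # "blue" never observed in this class
alpha = 1                                   # smoothing constant (add-one)
k = len(counts)                             # number of possible categories
total = sum(counts.values())

smoothed = {cat: (n + alpha) / (total + alpha * k) for cat, n in counts.items()}
print(smoothed)  # "blue" gets (0 + 1) / (5 + 3) = 0.125 instead of 0
```

Without this correction, a single unseen category would zero out the whole
product of likelihoods in a Naïve Bayes Classifier.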
Resources
All lectures will be supported by a variety of resources:
· Solved and documented workouts in Excel
· Dedicated workbooks that animate and describe various probability
distributions