Glossary of ML & AI Terms


Accuracy

The proportion of correct predictions a model makes over a given set of data (the proportion of incorrect predictions is called the error rate). It is calculated by dividing the total number of correct predictions by the total number of examples.
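As a minimal sketch of the calculation above, in Python (the function name `accuracy` is our own, not taken from any particular library):

```python
def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must be the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# 3 of the 4 predictions are correct, so accuracy is 0.75 (error rate 0.25).
print(accuracy(["spam", "normal", "spam", "spam"],
               ["spam", "normal", "normal", "spam"]))  # 0.75
```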


Anomaly Detection  

Identifying suspicious elements within a given stream of data. These are located based on how much they differ from the rest of the dataset on the relevant criteria.
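One simple way to measure "how much an element differs" is its z-score: the distance from the mean in units of standard deviation. The sketch below assumes a single numeric stream and a hypothetical threshold; real systems often use more robust statistics or learned models.

```python
from statistics import mean, pstdev


def zscore_anomalies(values, threshold=3.0):
    """Return the indices of values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # every value is identical, so nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical sensor readings with one suspicious spike at index 7.
readings = [10, 11, 9, 10, 12, 10, 11, 95]
print(zscore_anomalies(readings, threshold=2.0))  # [7]
```

With only a handful of points, one extreme value inflates the standard deviation itself (here the spike's z-score is about 2.6), which is why the example lowers the threshold; on larger streams a threshold of around 3 is a common starting point.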


Baseline

A minimum or starting figure used to test the efficacy of your later models. This is usually based on a reasonable hypothesis that takes into account relevant factors, or on the results of simpler models previously deployed with similar functions.
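A common "simpler model" for classification is the majority-class baseline, which always predicts the most frequent label seen in training. The sketch below assumes hypothetical spam/normal labels; any later model should beat the accuracy it produces.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return a predictor that ignores its input and always outputs the most common training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda _example: most_common_label


train_labels = ["normal"] * 8 + ["spam"] * 2
predict = majority_baseline(train_labels)

test_examples = ["email 1", "email 2", "email 3", "email 4", "email 5"]
test_labels = ["normal", "spam", "normal", "normal", "spam"]
preds = [predict(x) for x in test_examples]
baseline_accuracy = sum(p == y for p, y in zip(preds, test_labels)) / len(test_labels)
print(baseline_accuracy)  # 0.6
```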

Binary Classification

A task in which a Machine Learning model assigns each input to one of two mutually exclusive classes. An example would be a model that analyses a chess board and classifies each piece as either “black” or “white”.


Churn Factor (Churn Prediction)  

The process of a customer choosing to switch to a different product or service provider is classified as churn. Simply put, an organisation’s Churn Factor is the rate at which this occurs, and Churn Prediction is the preemptive process of anticipating when this is going to happen based on historic and current data.
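Definitions of churn rate vary between organisations; one common convention, assumed in this sketch, divides the customers lost during a period by the customers at the start of that period.

```python
def churn_rate(customers_at_start, customers_lost):
    """Churn over one period: customers lost during the period / customers at its start."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start


# Hypothetical figures: 50 of 1,000 customers leave during the month.
print(f"{churn_rate(1000, 50):.1%}")  # 5.0%
```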


Computer Vision

Training computers to process, analyse and make sense of digital imagery or video.  


Confirmation Bias

The tendency to seek out, interpret or favour results that confirm pre-existing beliefs or hypotheses, often without realising it. In Machine Learning this can extend to the process itself, with developers preparing their data in ways that skew the eventual outcomes.

Cross Validation

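Cross-validation is a technique to evaluate predictive Machine Learning models by splitting the original sample into a training set, used to fit the model, and a test set, used to evaluate it. The split is usually repeated several times (for example, in k-fold cross-validation each of k subsets takes a turn as the test set) and the results are averaged.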
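The sketch below implements plain k-fold cross-validation with contiguous folds. The `fit` and `score` callables and the "always predict the mean" model are hypothetical stand-ins for a real model; in practice the data is usually shuffled first (scikit-learn's `sklearn.model_selection.KFold` offers a `shuffle` option for this).

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, non-overlapping folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_validate(xs, ys, k, fit, score):
    """Refit on each training split, score on its test split, and average the scores."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]))
    return sum(scores) / len(scores)


# Hypothetical "model" that always predicts the mean of its training labels.
def fit_mean(train_xs, train_ys):
    return sum(train_ys) / len(train_ys)


def mean_absolute_error(model, test_xs, test_ys):
    return sum(abs(y - model) for y in test_ys) / len(test_ys)


for train_idx, test_idx in k_fold_splits(5, k=3):
    print(train_idx, test_idx)
# [2, 3, 4] [0, 1]
# [0, 1, 4] [2, 3]
# [0, 1, 2, 3] [4]

xs = [1, 2, 3, 4, 5]
ys = [1.0, 2.0, 3.0, 4.0, 5.0]
print(cross_validate(xs, ys, 5, fit_mean, mean_absolute_error))  # 1.5
```

Every example is used for testing exactly once, so the averaged score is less sensitive to a lucky or unlucky single split.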


Data Augmentation 

Supplementing a dataset with additional data so that a model has enough examples to learn from. An example would be training a model to distinguish between normal and spam email without having enough emails to kick-start the learning process, and so introducing additional emails from elsewhere into the dataset to allow the model to train effectively.
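Augmented examples can also be generated from the data already in hand, for instance by adding a mirrored copy of each image. The sketch below assumes tiny images stored as nested lists and labels that do not change when an image is flipped (true for "cat", but not for, say, handwritten digits).

```python
def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]


def augment(dataset):
    """Return the original (image, label) pairs plus a flipped copy of each one."""
    augmented = list(dataset)
    for image, label in dataset:
        augmented.append((horizontal_flip(image), label))
    return augmented


# Hypothetical 3x3 greyscale "image" with a label that survives flipping.
image = [[0, 1, 1],
         [0, 0, 1],
         [0, 1, 1]]
data = augment([(image, "cat")])
print(len(data))   # 2
print(data[1][0])  # [[1, 1, 0], [1, 0, 0], [1, 1, 0]]
```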


Data Cleansing

The process of improving data quality, usually by removing or correcting values that are incorrect, incomplete or duplicated. This is usually done before a Machine Learning project begins, although the knowledge discovery process may reveal that further cleansing is needed to improve data quality.
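A minimal sketch of a cleansing pass over hypothetical customer records: it corrects formatting (whitespace and letter case), removes records with missing or implausible values, and drops duplicates. The field names and validity rules are assumptions chosen for illustration.

```python
def clean_records(records):
    """Normalise, validate and de-duplicate a list of {"email", "age"} records."""
    cleaned, seen = [], set()
    for rec in records:
        email = (rec.get("email") or "").strip().lower()  # correct: normalise formatting
        age = rec.get("age")
        if not email or "@" not in email:
            continue  # remove: missing or malformed email
        if age is None or not 0 < age < 120:
            continue  # remove: missing or implausible age
        if email in seen:
            continue  # remove: duplicate of a record already kept
        seen.add(email)
        cleaned.append({"email": email, "age": age})
    return cleaned


raw = [
    {"email": " Ann@Example.com ", "age": 34},
    {"email": "ann@example.com", "age": 34},
    {"email": "bob", "age": 29},
    {"email": "cat@example.com", "age": -5},
    {"email": "dan@example.com", "age": 41},
]
print(clean_records(raw))
# [{'email': 'ann@example.com', 'age': 34}, {'email': 'dan@example.com', 'age': 41}]
```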

Data Collection

The entire process of collecting relevant information in anticipation of a Machine Learning project.


Data Sourcing

Locating adequate avenues for the collection of data. This is a fundamental step in most Machine Learning projects, as good sourcing methods lead to higher-quality results later on.



Generalisation

Applicable once a model has been trained on an existing dataset, this refers to how well it produces accurate predictions on new, previously unseen data.


Intelligent Social Media Integration

The act of incorporating data from social media into existing databases in order to take advantage of a larger dataset when drawing insights.



Iteration

One iteration signifies a single update of a model’s weights during the training process. It is a fundamental part of learning, as repeating this update many times gradually calibrates the model based on feedback from its errors.
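To make "a single update of a model's weights" concrete, here is a sketch of plain gradient descent on a hypothetical one-weight model `y = w * x` with a mean-squared-error loss; each pass through the loop body is one iteration.

```python
def train(xs, ys, learning_rate=0.01, iterations=200):
    """Fit y = w * x by gradient descent; each loop pass performs one weight update."""
    w = 0.0
    for _ in range(iterations):
        # Gradient of the mean squared error (1/n) * sum((w*x - y)^2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad  # one iteration = one update of the weight
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # hypothetical data generated by y = 2x

print(round(train(xs, ys), 4))  # 2.0
```

With these numbers each iteration shrinks the remaining error in `w` by a constant factor (0.85), which is why the loss falls gradually rather than in one step; a learning rate that is too large would instead make the updates overshoot and diverge.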


Labeled Dataset

Refers to a dataset in which each element has been clearly defined or tagged with a label. For example, a set of pictures each tagged with the person or item it contains, or a list of emails each marked as “spam” or “normal”.

Machine Learning

A subset of Artificial Intelligence based on the idea that systems can continuously learn from data, identify patterns and eventually extrapolate and act on this knowledge with minimal human involvement.


Machine Learning Pipeline

Processes that allow for the automation of Machine Learning workflows.



Model

A model is what is produced by training an algorithm on a particular set of data. Once trained, it can be used to make predictions on new data.


Natural Language Processing

Using computers to collate and analyse language and speech. It represents a computer’s ability to understand, interpret and even emulate language through an analysis of its underlying qualities.


Neural Network

Named as such because of its loose resemblance to the human brain. It consists of neurons (nodes) arranged in multiple layers and joined by weighted connections; information passes from an input layer, through one or more hidden layers, to an output layer.
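The flow from input to output layer can be sketched as a forward pass through a tiny two-layer network. The weights and biases below are arbitrary hypothetical values (an untrained network), and training, which would adjust them, is not shown.

```python
import math


def dense(inputs, weights, biases, activation):
    """One layer: each neuron takes a weighted sum of its inputs, adds a bias, applies an activation."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights: 2 inputs -> 2 hidden neurons -> 1 output neuron.
hidden_w = [[0.5, -0.2], [0.3, 0.8]]
hidden_b = [0.0, 0.1]
output_w = [[1.0, -1.0]]
output_b = [0.2]

x = [1.0, 2.0]                                     # input layer
h = dense(x, hidden_w, hidden_b, relu)             # hidden layer
y = dense(h, output_w, output_b, sigmoid)          # output layer
print(round(y[0], 3))  # 0.154
```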


Outliers

Anomalous results that either deviate too far from the mean or differ widely from other results. Outliers can distort the training of a model and therefore need to be identified and handled when creating models.
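One widely used rule of thumb, assumed here, is Tukey's fences: flag any value more than 1.5 times the interquartile range (IQR) below the first quartile or above the third. The sketch uses `statistics.quantiles` from the Python standard library (3.8+); the data is hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


print(iqr_outliers([10, 12, 11, 13, 12, 11, 40]))  # [40]
```

Because quartiles are barely affected by a single extreme value, this rule is less easily distorted by the outlier it is trying to detect than a mean-and-standard-deviation rule.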



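Performance

When referring to a model’s performance in Machine Learning, this usually means how accurate its results are, in other words whether its predictions are correct. In practice it is measured with metrics such as accuracy, precision or recall, depending on the task.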


Prediction Bias

In Machine Learning terms this refers to the difference between the average of a model’s predictions and the average of the labels in the dataset.
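Following the definition above directly: a positive value means the model predicts too high on average, a negative value too low. The spam probabilities and labels below are hypothetical.

```python
def prediction_bias(predictions, labels):
    """Average prediction minus average label."""
    return sum(predictions) / len(predictions) - sum(labels) / len(labels)


preds = [0.9, 0.7, 0.8, 0.6]  # predicted probability that each email is spam
labels = [1, 0, 1, 0]         # 1 = spam, 0 = normal
print(round(prediction_bias(preds, labels), 2))  # 0.25
```

Here the model predicts spam 75% of the time on average while only half the emails are spam, so it is biased upwards even though it ranks the two spam emails highest.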


Recommendation Engine

A system most commonly used to offer customers alternative purchasing options. Recommendations are often based on a specific customer’s historic purchasing patterns, but can also draw on browsing habits or information entered by the customer.
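A very simple recommendation approach, assumed here for illustration, is "customers who bought X also bought Y": score each item by how often it appears in past baskets alongside items the customer already owns. The purchase histories are hypothetical; production engines typically use more sophisticated collaborative filtering or learned models.

```python
from collections import Counter


def recommend(purchase_histories, customer_items, top_n=2):
    """Rank items that co-occur in past baskets with items the customer already has."""
    owned = set(customer_items)
    scores = Counter()
    for basket in purchase_histories:
        items = set(basket)
        overlap = len(items & owned)
        if overlap:
            for item in items - owned:
                scores[item] += overlap  # stronger signal when more owned items co-occur
    return [item for item, _ in scores.most_common(top_n)]


histories = [
    ["tent", "sleeping bag", "torch"],
    ["tent", "torch"],
    ["tent", "sleeping bag", "torch"],
    ["tent", "stove"],
    ["novel", "bookmark"],
]
print(recommend(histories, ["tent"]))  # ['torch', 'sleeping bag']
```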


Reinforcement Learning

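Training a model (often called an agent) through trial and error: it takes actions, receives rewards or penalties as feedback, and over many iterations strengthens the behaviour that earns the greatest reward.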
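The simplest reinforcement learning setting is a multi-armed bandit: repeatedly choose one of several actions and observe a reward. The sketch below uses an epsilon-greedy strategy (mostly pick the action that has paid off best so far, occasionally explore at random). The reward probabilities are hypothetical and hidden from the agent.

```python
import random


def epsilon_greedy_bandit(arm_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn which arm pays off most often by trial and error."""
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms
    values = [0.0] * n_arms  # running average reward observed for each arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore: try a random action
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit: best action so far
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean update
    return values, counts


values, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
best = max(range(3), key=lambda a: values[a])
print(best, counts)
```

After enough steps the agent's estimates approach the hidden probabilities, so it should settle on arm 2 and spend most of its pulls there; the small exploration rate keeps it from locking onto an early lucky guess.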


Sentiment Analysis

Determining any given group’s attitude towards something based on the language they use when describing it. This can apply to many subjects, including companies, films and products, and is typically classified on a scale from “positive” to “negative”. Advances in Machine Learning now allow for more nuanced classifications.
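The oldest and simplest form of sentiment analysis counts words from hand-made positive and negative lists. The sketch below shows that lexicon-based approach with hypothetical word lists; it is not Machine Learning itself, and an ML approach would instead train a classifier on labelled example texts, which is what makes the more nuanced classifications possible.

```python
POSITIVE = {"great", "love", "excellent", "good", "enjoyed"}
NEGATIVE = {"terrible", "hate", "awful", "bad", "boring"}


def sentiment(text):
    """Classify text by counting positive minus negative words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("I love this film, the acting was excellent!"))  # positive
print(sentiment("Boring plot and awful dialogue."))              # negative
print(sentiment("It was a film."))                               # neutral
```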


Supervised Learning

Training models by exposing them to datasets that are already labeled correctly. From these examples the model learns patterns that it can then apply to new, unlabeled data.
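A nearest-neighbour classifier is one of the simplest supervised methods: it predicts the label of whichever labeled training example is closest to the new input. The two-feature points and "small"/"large" labels below are hypothetical.

```python
import math


def nearest_neighbour_classifier(training_data):
    """Build a predictor from (features, label) pairs using 1-nearest-neighbour."""
    def predict(point):
        _, label = min(training_data, key=lambda item: math.dist(item[0], point))
        return label
    return predict


training_data = [
    ((1.0, 1.0), "small"),
    ((1.2, 0.8), "small"),
    ((8.0, 9.0), "large"),
    ((9.0, 8.5), "large"),
]
predict = nearest_neighbour_classifier(training_data)
print(predict((1.1, 0.9)))  # small
print(predict((8.5, 8.8)))  # large
```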


Unsupervised Learning

Training a model to find patterns in an unlabeled dataset. The model recognizes common features and can extrapolate this knowledge when exposed to further datasets.
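Clustering is a typical unsupervised task. The sketch below runs k-means on one-dimensional, unlabeled values: it alternates between assigning each value to its nearest centroid and moving each centroid to the mean of its cluster. The evenly spaced initialisation is a simplifying assumption; common implementations use random or k-means++ initialisation.

```python
def kmeans_1d(values, k, iterations=10):
    """Cluster numbers into k groups; returns final centroids and the last assignment."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    # Simple deterministic start: k values spread evenly through the sorted data.
    centroids = sorted(values)[::max(1, len(values) // k)][:k]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to its cluster mean (keep it in place if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


centroids, clusters = kmeans_1d([1.0, 1.5, 2.0, 10.0, 10.5, 11.0], k=2)
print(centroids)  # [1.5, 10.5]
```

No labels are supplied; the two groups emerge purely from how close the values are to one another.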



Variance

This refers to how much a model’s predictions change when it is trained on different samples of data. Low variance indicates that predictions stay consistent from one training set to the next, whereas high variance indicates that they change a lot, which often signals that the model is overfitting to its training data.

Video Segmentation 

The act of partitioning a video into different sections, often used to allow for more accurate and precise analysis of video content.
