Accuracy
The rate of correct predictions made by a model over a given set of data. This is generally calculated by dividing the total number of correct predictions by the total number of examples.
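The calculation above can be sketched in a few lines of Python (the function name and sample data are illustrative, not from the source):

```python
# Accuracy: the number of correct predictions divided by the total
# number of examples.
def accuracy(predictions, labels):
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

preds = ["spam", "normal", "spam", "normal"]
truth = ["spam", "normal", "normal", "normal"]
print(accuracy(preds, truth))  # 3 of 4 correct -> 0.75
```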
Anomaly Detection
Identifying suspicious or unusual elements within a given stream of data. These are located based on how they differ from the rest of the dataset according to relevant criteria.
Baseline
A minimum or starting figure used to test the efficacy of your later models. This is usually based on a reasonable hypothesis that takes into account relevant factors, or on the results of simpler models previously deployed with similar functions.
Binary Classification
When a Machine Learning model outputs one of two mutually exclusive classifications. An example would be if a model analysed a chess board to determine which pieces were “black” or “white”.
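As a minimal sketch of the two-class idea, a binary classifier can be reduced to thresholding a score (the threshold value and labels here are hypothetical, not from the source):

```python
# A binary classifier outputs one of two mutually exclusive classes.
# Here a spam probability is thresholded into "spam" or "normal".
def classify(spam_probability, threshold=0.5):
    return "spam" if spam_probability >= threshold else "normal"

print(classify(0.8))  # spam
print(classify(0.2))  # normal
```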
Churn Factor (Churn Prediction)
The process of a customer choosing to switch product or service provider is classified as churn. Simply put, an organisation’s Churn Factor is the rate at which this occurs, and Churn Prediction is the preemptive process of anticipating when this will happen based on historic and current data.
Computer Vision
Training computers to process, analyse and make sense of digital imagery or video.
Confirmation Bias
The tendency to seek out or favour results that confirm pre-existing beliefs or hypotheses. In Machine Learning this can extend to the process itself, with developers preparing their data in a way that skews eventual outcomes.
Cross Validation
Cross-validation is a technique for evaluating predictive Machine Learning models by repeatedly splitting an original sample into a training set used to fit the model and a test set used to evaluate it.
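The common k-fold variant of this splitting can be sketched as follows (a simplified illustration assuming the dataset divides evenly into k folds):

```python
# Minimal k-fold cross-validation splitter: each fold serves once as
# the test set while the remaining folds form the training set.
def k_fold_splits(data, k):
    fold_size = len(data) // k
    for i in range(k):
        test = data[i * fold_size:(i + 1) * fold_size]
        train = data[:i * fold_size] + data[(i + 1) * fold_size:]
        yield train, test

for train, test in k_fold_splits(list(range(6)), 3):
    print(train, test)
```

In practice a library routine such as scikit-learn's `KFold` would typically be used instead of a hand-rolled splitter.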
Data Augmentation
Supplementing datasets with additional data to ensure there is enough information for a model to learn from. For example, when training a model to distinguish normal email from spam but lacking enough examples to kick-start the learning process, additional emails from elsewhere can be introduced into the dataset so the model can train effectively.
Data Cleansing
The process of improving the data quality, usually by removing or correcting data values that are incorrect. This is usually done before a Machine Learning project, although throughout the knowledge discovery process it may become apparent that further cleansing is important to improve data quality.
Data Collection
The entire process of collecting relevant information in anticipation of a Machine Learning project.
Data Sourcing
Locating adequate avenues for the collection of data. Usually a fundamental process in any Machine Learning project as good sourcing methods can result in a higher quality of results later on.
Generalisation
Applicable after a model has trained on an existing dataset, this refers to how effective it will be at producing accurate predictions on future datasets.
Intelligent Social Media Integration
The act of incorporating data from social media into existing databases in order to take advantage of a larger dataset when drawing insights.
Iteration
One iteration signifies a single update of a model’s weights during the training process. It is a fundamental part of learning, as repeated iterations allow the model to be gradually calibrated based on feedback.
Labeled Dataset
Refers to a group of data that consists of elements that are clearly defined or tagged. For example, pictures all containing one person or item, or a list of emails defined as “spam” and “normal”.
Machine Learning
A subset of Artificial Intelligence based on the idea that systems can continuously learn from data, identify patterns and eventually extrapolate and act on this knowledge without human involvement.
Machine Learning Pipeline
Processes that allow for the automation of Machine Learning workflows.
Model
Models are created through training an algorithm on a particular set of data.
Natural Language Processing
Using computers to collate and analyse language and speech. It represents a computer’s ability to understand, interpret and even emulate language through an analysis of its underlying qualities.
Neural Network
Named as such because of its resemblance to the human brain. Consists of neurons (nodes), connected across multiple layers, that transfer streams of information through an input and output process.
Outliers
Anomalous results that either deviate significantly from the mean or differ widely from the other results produced. These can be problematic when training models and therefore need to be accounted for when creating them.
Performance
When referring to a model’s performance in Machine Learning, this usually signifies its level of accuracy: more specifically, whether the results are correct or not.
Prediction Bias
In Machine Learning terms this quite simply refers to the differences in averages between the machine’s predictions and the labels in the dataset.
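That difference in averages can be computed directly (the function name and sample values below are illustrative, not from the source):

```python
# Prediction bias: the mean of the model's predictions minus the
# mean of the labels in the dataset. Zero indicates no bias.
def prediction_bias(predictions, labels):
    return sum(predictions) / len(predictions) - sum(labels) / len(labels)

preds = [0.9, 0.8, 0.7, 0.6]   # model's predicted probabilities
labels = [1, 1, 0, 0]          # actual outcomes
print(round(prediction_bias(preds, labels), 2))  # 0.75 - 0.5 -> 0.25
```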
Recommendation Engine
A system most commonly used to offer customers alternative purchasing options. This can often come from historic purchasing patterns of a specific customer, or can be as a result of browsing habits or information inputted by the consumer.
Reinforcement Learning
Training models through continuous iteration, strengthening their results through trial and error.
Sentiment Analysis
Determining any given group’s attitude towards something based on the language they use when describing it. This can apply to multiple items, including companies, film and products, and is typically classified on a scale from “positive” to “negative”. It should be noted that advancements in Machine Learning allow for the ability to offer more nuanced classifications.
Supervised Learning
Training models by exposing them to datasets that are already labeled correctly. From this the Machine can readily identify patterns and apply these to new datasets in the future.
Unsupervised Learning
Training a model to find patterns in an unlabelled dataset. The model recognises common features and can extrapolate this knowledge when exposed to further datasets.
Variance
This refers to the consistency of predictions being produced by a given model. Low variance indicates that there is not much deviation in predictions after each iteration, whereas high variance indicates that the models are varying a lot in terms of results.
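The contrast between low and high variance can be sketched numerically (the sample values below are hypothetical runs of a model, not from the source):

```python
# Variance of a model's results across repeated runs: low variance
# means the runs agree closely, high variance means they scatter.
def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

stable_runs = [0.70, 0.71, 0.69, 0.70]
unstable_runs = [0.40, 0.90, 0.55, 0.95]
print(variance(stable_runs) < variance(unstable_runs))  # True
```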
Video Segmentation
The act of partitioning a video into different sections, often used to allow for more accurate and precise analysis of video content.