The Ultimate Machine Learning Glossary: Key Terms & Concepts Explained
This machine learning glossary is your go-to resource. We’ve broken down the most important terms and concepts used in ML, data science, and artificial intelligence in a beginner-friendly yet comprehensive way.
Whether you're a data scientist, software engineer, or tech enthusiast, this glossary will help you speak the language of machine learning fluently.
General Machine Learning Terms
Machine Learning (ML)
Machine learning is a subset of artificial intelligence (AI) that enables computers to learn from data without being explicitly programmed. It focuses on developing algorithms that improve automatically through experience.
Supervised Learning
Supervised learning involves training a model on labeled data, where the input comes with the correct output. Common examples include classification (e.g., spam detection) and regression (e.g., predicting house prices).
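To make this concrete, here is a minimal sketch of a supervised classifier: a 1-nearest-neighbor rule fit on a tiny labeled dataset. The data (message lengths mapped to "spam"/"ham") is entirely made up for illustration.

```python
# A minimal supervised-learning sketch: 1-nearest-neighbor classification.
# All training data below is invented for illustration.

def nearest_neighbor_predict(train_X, train_y, x):
    """Predict the label of x as the label of its closest training point."""
    best_label, best_dist = None, float("inf")
    for xi, yi in zip(train_X, train_y):
        dist = abs(xi - x)
        if dist < best_dist:
            best_label, best_dist = yi, dist
    return best_label

# Toy labeled data: message length -> "spam" / "ham"
train_X = [5, 7, 40, 55]
train_y = ["ham", "ham", "spam", "spam"]

print(nearest_neighbor_predict(train_X, train_y, 6))   # ham
print(nearest_neighbor_predict(train_X, train_y, 50))  # spam
```

The key point is that the labels are known at training time; the model's job is to generalize them to new inputs.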
Unsupervised Learning
In unsupervised learning, the model works with unlabeled data to discover hidden patterns, such as grouping similar customers (clustering) or reducing data dimensions (dimensionality reduction).
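As a sketch of clustering, here is a bare-bones k-means on unlabeled 1-D points (k = 2). The points and starting centroids are arbitrary illustrations, not a real dataset.

```python
# A minimal unsupervised-learning sketch: k-means clustering with k=2.
# Points and initial centroids are invented for illustration.

def kmeans_1d(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid
    and moving each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.5, 2.0, 10.0, 11.0, 12.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids)  # [1.5, 11.0]
```

Notice that no labels were given; the algorithm discovered the two groups on its own.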
Reinforcement Learning
Reinforcement learning is a type of machine learning where an agent learns to make decisions by performing actions and receiving rewards or penalties. It's often used in robotics and game AI.
Data and Feature Engineering
Dataset
A dataset is a structured collection of data used for training, validating, or testing machine learning models.
Features
Features are the input variables or attributes used by models to make predictions. In a dataset about houses, features could include size, location, and number of rooms.
Label
A label is the output or target variable in supervised learning. For example, the label for a house might be its price.
Feature Engineering
Feature engineering is the process of selecting, transforming, or creating new features to improve model performance.
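A small sketch of the idea, using made-up house records: deriving a new feature (price per square meter) from two raw ones.

```python
# A minimal feature-engineering sketch: creating a derived feature.
# Field names and values are invented for illustration.

houses = [
    {"size_sqm": 100, "rooms": 4, "price": 300_000},
    {"size_sqm": 50,  "rooms": 2, "price": 200_000},
]

for h in houses:
    # New engineered feature: price normalized by size
    h["price_per_sqm"] = h["price"] / h["size_sqm"]

print([h["price_per_sqm"] for h in houses])  # [3000.0, 4000.0]
```

A feature like this can expose a pattern (value density) that the raw columns only imply.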
Modeling and Algorithms
Model
A model is a mathematical representation that maps input data to outputs. In machine learning, models are trained to recognize patterns in data and make predictions or decisions.
Overfitting
Overfitting happens when a model performs well on training data but poorly on new, unseen data. It means the model has memorized the training data rather than generalized from it.
Underfitting
Underfitting occurs when a model is too simple to learn the patterns in the data, resulting in poor performance even on training data.
Neural Network
A neural network is a type of model inspired by the human brain. It consists of layers of interconnected nodes ("neurons") and is the foundation of deep learning.
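The basic unit can be sketched in a few lines: a single "neuron" computes a weighted sum of its inputs plus a bias, then passes it through an activation function (sigmoid here). The weights below are arbitrary illustrations.

```python
# A minimal sketch of one neuron: weighted sum + sigmoid activation.
# Weights, bias, and inputs are invented for illustration.
import math

def neuron(inputs, weights, bias):
    """The basic unit of a neural network: weighted sum, then activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 / (1 + math.exp(-z))  # sigmoid squashes z into (0, 1)

out = neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
print(out)  # z = 0.5*1 - 0.25*2 = 0, and sigmoid(0) = 0.5
```

A network stacks many of these neurons into layers, with each layer's outputs feeding the next.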
Evaluation Metrics in Machine Learning
Accuracy
Accuracy measures how many predictions the model got right out of all predictions. It’s useful when classes are balanced.
Precision
Precision tells you what proportion of positive identifications were actually correct.
Recall
Recall measures how many actual positives were correctly identified by the model.
F1 Score
The F1 score is the harmonic mean of precision and recall. It balances both metrics and is especially useful for imbalanced datasets.
Confusion Matrix
A confusion matrix is a table that shows the performance of a classification model by breaking down predictions into true positives, false positives, true negatives, and false negatives.
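The four metrics above all fall out of the confusion-matrix counts. Here is a sketch computing them on toy predictions (the labels are invented for illustration):

```python
# Confusion-matrix counts and the metrics defined above, on toy data.
# y_true / y_pred are invented for illustration.

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # true negatives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives

accuracy  = (tp + tn) / len(y_true)
precision = tp / (tp + fp)                                # correct positives / predicted positives
recall    = tp / (tp + fn)                                # correct positives / actual positives
f1        = 2 * precision * recall / (precision + recall)  # harmonic mean

print(accuracy, precision, recall, f1)  # 0.75 0.75 0.75 0.75
```

On heavily imbalanced data, accuracy alone can look good while precision or recall is poor, which is why all four are worth checking.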
Training and Optimization
Training
Training is the process of feeding a model data so it can learn from it. This is where the model adjusts its internal parameters to minimize error.
Validation
Validation involves using a separate portion of the data to tune hyperparameters and prevent overfitting during training.
Testing
Testing evaluates the final model on a completely separate dataset to assess its real-world performance.
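The three splits above are typically carved out of one dataset. A sketch of a 60/20/20 split (the proportions are a common convention, not a rule):

```python
# A minimal sketch of a 60/20/20 train/validation/test split.
# The "dataset" is just 100 dummy examples for illustration.
import random

data = list(range(100))   # stand-in for 100 examples
random.seed(0)
random.shuffle(data)      # shuffle so the split is not ordered

n_train = int(0.6 * len(data))
n_val   = int(0.2 * len(data))

train = data[:n_train]                   # used to fit the model
val   = data[n_train:n_train + n_val]    # used to tune hyperparameters
test  = data[n_train + n_val:]           # held out for final evaluation

print(len(train), len(val), len(test))  # 60 20 20
```

Keeping the test set untouched until the very end is what makes its score an honest estimate of real-world performance.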
Gradient Descent
Gradient descent is an optimization algorithm that updates a model's parameters to minimize the loss function.
Learning Rate
The learning rate controls how much the model updates in response to the error it sees. A rate that is too high can overshoot the minimum; one that is too low makes learning slow.
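Both ideas fit in one sketch: gradient descent minimizing the loss f(w) = (w − 3)², whose gradient is 2(w − 3). The starting point and learning rates are arbitrary choices for illustration.

```python
# Gradient descent on f(w) = (w - 3)^2, gradient 2*(w - 3).
# Start point and learning rates are arbitrary illustrations.

def gradient_descent(start, learning_rate, steps):
    w = start
    for _ in range(steps):
        grad = 2 * (w - 3)          # gradient of (w - 3)^2
        w -= learning_rate * grad   # step opposite the gradient
    return w

# A moderate learning rate converges to the minimum at w = 3:
print(round(gradient_descent(start=0.0, learning_rate=0.1, steps=100), 4))  # 3.0

# Too large a learning rate overshoots more each step and diverges:
print(abs(gradient_descent(start=0.0, learning_rate=2.0, steps=10)) > 1000)  # True
```

In practice the loss and its gradient come from the model and data, but the update rule is exactly this one line.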
Advanced Machine Learning Concepts
Transfer Learning
Transfer learning allows you to use a pre-trained model on a new but similar problem, saving time and resources. It’s widely used in NLP and computer vision.
Hyperparameter Tuning
Hyperparameter tuning is the process of optimizing parameters that aren’t learned during training (like learning rate, tree depth, etc.) to improve model performance.
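A sketch of the simplest form, grid search: try each candidate value of a hyperparameter (here, a decision threshold) and keep the one that scores best on the validation set. The scores, labels, and grid are invented for illustration.

```python
# A minimal hyperparameter-tuning sketch: grid search over a decision
# threshold, scored on a validation set. All data below is invented.

def accuracy(threshold, scores, labels):
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

val_scores = [0.1, 0.25, 0.35, 0.8, 0.65, 0.9]
val_labels = [0,   0,    1,    1,   1,    1]

# The "grid" of candidate hyperparameter values to try:
grid = [0.2, 0.3, 0.5, 0.7]
best = max(grid, key=lambda t: accuracy(t, val_scores, val_labels))
print(best)  # 0.3
```

Note that the search uses the validation set, not the test set, so the final test score stays an unbiased estimate.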
Bias-Variance Tradeoff
The bias-variance tradeoff describes the tension between a model that is too simple (high bias) and one that is too complex (high variance). The goal is to find the right balance for generalization.
Ensemble Methods
Ensemble methods combine predictions from multiple models to produce more accurate results. Examples include bagging, boosting, and algorithms such as Random Forest and XGBoost.
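The simplest way to combine models is a majority vote, sketched below. The three "models" are stand-ins; in a real ensemble each vote would come from a trained model.

```python
# A minimal ensemble sketch: combining model predictions by majority vote.
# The votes below stand in for three hypothetical trained models.
from collections import Counter

def majority_vote(predictions):
    """Return the most common prediction among the ensemble's models."""
    return Counter(predictions).most_common(1)[0][0]

votes = ["spam", "ham", "spam"]   # three models vote on the same input
print(majority_vote(votes))       # spam wins 2-to-1
```

If the individual models make partly independent errors, the vote can be more accurate than any single model.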