Source: Dataiku
Published: April 2020
Circulated: June 1, 2020
Key Data Science Concepts
Machine Learning: programming systems to perform a task without coding rule-based instructions
Deep Learning: a subset of ML in which multi-layered neural networks learn hidden patterns from data
Model: a representation of the real world using mathematics
Algorithm: a set of rules used to make a calculation
Training set: a dataset used to find potentially predictive relationships, which are then used to create a model
Test set: a dataset, with the same structure as the training set, used to measure the performance of models
Training: the process of creating a model from the training set
Target: a dependent variable that is the output that a model predicts (e.g., price of a house)
Feature: an independent variable that is a measurable piece of data (e.g., # of bathrooms in a house)
Overfitting: when a model corresponds too closely to a particular set of data (e.g., the training set) and may therefore fail to fit additional data (e.g., the test set)
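
Several of the terms above (feature, target, training set, test set, training, overfitting) can be seen together in a small sketch. Everything in it is an assumption made for illustration: the house data is synthetic, the feature is floor area rather than number of bathrooms so that every value is distinct, and the names fit_line, fit_memorizer, and mean_squared_error are hypothetical helpers, not part of any library. The "memorizer" simply returns the price of the closest training house, which is one simple way to produce a model that matches its training set too closely.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

def make_house():
    """Return one synthetic (feature, target) pair: (floor area in m2, price)."""
    area = random.uniform(50, 200)                 # feature (independent variable)
    price = 50 + 2 * area + random.gauss(0, 20)    # target (dependent variable) plus noise
    return area, price

data = [make_house() for _ in range(40)]
train_set, test_set = data[:30], data[30:]         # same structure, disjoint rows

def fit_line(rows):
    """Training: least-squares fit of price = a + b * area on the given rows."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x

def fit_memorizer(rows):
    """Training: remember every row and predict the price of the closest area."""
    def predict(x):
        nearest = min(rows, key=lambda row: abs(row[0] - x))
        return nearest[1]
    return predict

def mean_squared_error(model, rows):
    """Performance measure: average squared gap between prediction and target."""
    return sum((model(x) - y) ** 2 for x, y in rows) / len(rows)

line_model = fit_line(train_set)
memorizer = fit_memorizer(train_set)

for name, model in [("line", line_model), ("memorizer", memorizer)]:
    print(name,
          "train MSE:", round(mean_squared_error(model, train_set), 1),
          "test MSE:", round(mean_squared_error(model, test_set), 1))
```

The memorizer reproduces every training price exactly, so its training error is zero, yet it still makes errors on the test set. That gap between training and test performance is the symptom that the definition of overfitting describes, and it is why performance is measured on a test set rather than on the data used for training.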