Key Data Science Concepts

Source: Dataiku Published: April 2020

Circulated: June 1, 2020

  • Machine Learning: programming systems to perform a task by learning from data rather than by following explicitly coded, rule-based instructions

  • Deep Learning: a subset of ML in which systems, typically multi-layered neural networks, learn hidden patterns from data

  • Model: a representation of the real world using mathematics

  • Algorithm: a set of rules used to make a calculation

  • Training set: a dataset used to find potentially predictive relationships, which are then used to create a model

  • Test set: a dataset with the same structure as the training set, held out from training and used to measure the performance of models

  • Training: the process of creating a model from the training set

  • Target: the dependent variable, i.e., the output a model predicts (e.g., the price of a house)

  • Feature: an independent variable, i.e., a measurable piece of data used as input to a model (e.g., the number of bathrooms in a house)

  • Overfitting: when a model corresponds too closely to a particular set of data (i.e., the training set) and may therefore fail to fit additional data (e.g., the test set)
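To see how several of these terms fit together, here is a minimal Python sketch on a small, made-up housing dataset. Everything in it is an assumption for illustration and does not come from the Dataiku source. The feature is floor area in square metres (the glossary's own example feature is number of bathrooms). The target is price in thousands. The training and test sets are hard-coded lists. Two algorithms, ordinary least-squares linear regression and Lagrange polynomial interpolation, are each trained on the training set to produce a model. Both models are then scored on the test set with mean squared error (MSE).

```python
# Hypothetical data (illustration only): feature = floor area (m^2),
# target = price (thousands). Training targets follow 50 + 2*area
# with alternating +/-10 noise; test targets follow the same trend without noise.
TRAIN_X = [50, 60, 70, 80, 90, 100]
TRAIN_Y = [160, 160, 200, 200, 240, 240]
TEST_X = [55, 75, 95]
TEST_Y = [160, 200, 240]


def fit_linear(xs, ys):
    """Training step: ordinary least squares for y = a + b*x; returns the model."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x


def fit_interpolating_polynomial(xs, ys):
    """Training step: Lagrange polynomial of degree n-1 that passes through
    every training point, so it memorizes the noise as well as the trend."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            weight = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of a model's predictions against known targets."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


simple = fit_linear(TRAIN_X, TRAIN_Y)
overfit = fit_interpolating_polynomial(TRAIN_X, TRAIN_Y)

for name, model in [("linear", simple), ("degree-5 polynomial", overfit)]:
    print(f"{name}: train MSE = {mse(model, TRAIN_X, TRAIN_Y):.2f}, "
          f"test MSE = {mse(model, TEST_X, TEST_Y):.2f}")
# Expected output:
# linear: train MSE = 91.43, test MSE = 7.84
# degree-5 polynomial: train MSE = 0.00, test MSE = 416.67
```

The interpolating polynomial passes through every training point, so its training error is zero. However, it also memorizes the noise in the training targets, and its test error is far higher than the straight line's. That gap between training and test performance is the signature of overfitting.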