DIGITAL WIKI
Model
A mathematical relationship between our objective and the data. It can be a formula, a process, an approach or a machine learning relationship.
PS
Problem Solving: brainstorming possible solutions to a problem in a meeting with industry experts.
AI
Artificial Intelligence: a computer system making intelligent decisions on the go that a human would otherwise have made.
Random Forest
A decision tree is a machine learning model built from some of the columns and data points in the data. A collection of decision trees forms a random forest. Each tree in the forest is built from a different random combination of columns and data points, and the forest combines the trees' predictions (by majority vote or by averaging) into a single answer.
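A minimal sketch of the idea, using only the Python standard library. To keep it short, each "tree" is a decision stump (a tree with a single split), and each tree gets one random subset of columns; full implementations such as scikit-learn's RandomForestClassifier grow deeper trees and typically re-sample columns at every split. All function names here are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows, labels, columns):
    """Fit a one-split decision tree ("stump") using only the given columns."""
    fallback = majority(labels)
    best = None
    for col in columns:
        for threshold in sorted({row[col] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[col] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[col] > threshold]
            # Each side predicts its majority label (or the overall one if empty).
            left_label = majority(left) if left else fallback
            right_label = majority(right) if right else fallback
            errors = sum(lab != left_label for lab in left) + sum(
                lab != right_label for lab in right
            )
            if best is None or errors < best[0]:
                best = (errors, col, threshold, left_label, right_label)
    return best[1:]  # (column, threshold, left_label, right_label)


def predict_stump(stump, row):
    col, threshold, left_label, right_label = stump
    return left_label if row[col] <= threshold else right_label


def train_forest(rows, labels, n_trees=25, n_columns=1, seed=0):
    """Train each tree on a bootstrap sample of rows and a random subset of columns."""
    rng = random.Random(seed)
    n_rows, n_cols = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n_rows) for _ in range(n_rows)]  # rows drawn with replacement
        cols = rng.sample(range(n_cols), n_columns)  # random subset of columns
        forest.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx], cols))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees in the forest."""
    return majority([predict_stump(stump, row) for stump in forest])


# Hypothetical toy data: two columns, two classes.
rows = [[1, 1], [2, 2], [3, 3], [10, 10], [11, 11], [12, 12]]
labels = ["a", "a", "a", "b", "b", "b"]
forest = train_forest(rows, labels)
print(predict_forest(forest, [0, 0]), predict_forest(forest, [20, 20]))
```

Because each tree sees a slightly different sample, individual trees can make mistakes, but the vote across many trees is much more stable than any single tree.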
Regression
The process of fitting a model that predicts a numeric value from the data, choosing the model's parameters so as to minimize the error between predicted and actual values.
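A minimal sketch of the simplest case, a straight line fitted by ordinary least squares (the error being minimized is the sum of squared differences). The function name and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The slope that minimizes the sum of squared errors has this closed form.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]  # hypothetical noisy measurements
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # about 1.97 and 1.09
```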
R-Squared
A measure, usually between 0 and 1, of how much of the variation in the data is captured (explained) by the model. A high R-squared means the model's predictions closely follow the actual values, and a poor model results in a low R-squared. A high R-squared alone does not prove a model is good: an overfitted model can score highly on its training data.
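A sketch of the standard formula, R-squared = 1 - (residual sum of squares) / (total sum of squares), with hypothetical values. The last example shows why the range is only "usually" 0 to 1: a model worse than simply predicting the mean scores below 0.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # error left over
    ss_tot = sum((a - mean) ** 2 for a in actual)  # total variation in the data
    return 1 - ss_res / ss_tot


actual = [1, 2, 3, 4]
print(r_squared(actual, [1, 2, 3, 4]))          # 1.0: every value captured
print(r_squared(actual, [2.5, 2.5, 2.5, 2.5]))  # 0.0: no better than the mean
print(r_squared(actual, [4, 3, 2, 1]))          # -3.0: worse than the mean
```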
Machine Learning
An approach in which a computer learns the relationship between data and an objective from examples, instead of being explicitly programmed with rules.
R
An open-source programming language and environment for statistical computing and graphics.
Python
A programming language, first released in 1991, that is popular with people doing data science. Python is noted for its ease of use for beginners and its power in the hands of advanced users.
DE
Data Engineer: the person responsible for the data architecture and pipelines, data quality checks and data ingestion.
Pipeline
The sequence of steps needed to get data and generate model output.
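A minimal sketch of a pipeline as a list of functions, where each step's output becomes the next step's input. The steps `clean` and `model` are hypothetical stand-ins for real cleaning and modeling code.

```python
from functools import reduce


def clean(raw):
    # Hypothetical cleaning step: drop blank values and convert text to numbers.
    return [float(v) for v in raw if v.strip()]


def model(values):
    # Hypothetical "model" step: here just the average of the cleaned values.
    return sum(values) / len(values)


def run_pipeline(data, steps):
    """Pass data through each step in order; each step's output feeds the next."""
    return reduce(lambda result, step: step(result), steps, data)


raw = ["3", "5", "", "8"]
output = run_pipeline(raw, [clean, model])
print(output)  # 5.333...
```

Keeping the steps as separate functions makes it easy to add, remove or reorder a step without touching the others.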
Ingestion
Reading data from its source and writing it in a format that an R/Python model can process to produce further output.
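A minimal sketch of ingestion with Python's built-in `csv` module: raw text rows are read and each field is converted to the type a model expects. The column names are hypothetical, and `io.StringIO` stands in for a real file opened with `open()`.

```python
import csv
import io
from datetime import date

# Stand-in for a raw file; the columns site/date/sales are hypothetical.
raw_file = io.StringIO("site,date,sales\nA,2023-01-01,120\nB,2023-01-01,95\n")


def ingest(file_obj):
    """Read CSV rows and convert each text field to a typed value."""
    records = []
    for row in csv.DictReader(file_obj):
        records.append(
            {
                "site": row["site"],
                "date": date.fromisoformat(row["date"]),  # text -> date
                "sales": float(row["sales"]),  # text -> number
            }
        )
    return records


records = ingest(raw_file)
print(records[0])
```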
Data Points
The rows of a dataset; each row is one observation.
Pivot
A summary table of data, such as a count of the number of days in the data. The most notable feature of pivot tables is that you can rearrange them dynamically. This is usually done in Excel but is also possible in R/Python.
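A minimal standard-library sketch of a pivot: sum one column for every combination of two others. Swapping the row and column fields rearranges the same summary, which is what Excel's drag-and-drop does. The data and field names are hypothetical; in practice, pandas offers `pivot_table` for this job.

```python
from collections import defaultdict

# Hypothetical sales records.
rows = [
    {"site": "A", "month": "Jan", "sales": 120},
    {"site": "A", "month": "Feb", "sales": 80},
    {"site": "B", "month": "Jan", "sales": 95},
    {"site": "A", "month": "Jan", "sales": 30},
]


def pivot(rows, index, columns, value):
    """Sum `value` for every (index, columns) combination."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        table[row[index]][row[columns]] += row[value]
    return {k: dict(v) for k, v in table.items()}


by_site = pivot(rows, "site", "month", "sales")
print(by_site)   # {'A': {'Jan': 150.0, 'Feb': 80.0}, 'B': {'Jan': 95.0}}
by_month = pivot(rows, "month", "site", "sales")  # same data, rearranged
print(by_month)  # {'Jan': {'A': 150.0, 'B': 95.0}, 'Feb': {'A': 80.0}}
```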
Variance
How much a list of numbers varies from its mean (average) value. It is calculated as the average of the squared differences from the mean.
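A short sketch computing the variance by hand and checking it against Python's `statistics` module. The data are illustrative. Note that `statistics.variance` (sample variance) divides by n - 1 instead of n, so it gives a slightly larger value.

```python
import statistics


def variance(values):
    """Population variance: the average squared distance from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5
print(variance(data))             # 4.0
print(statistics.pvariance(data))  # same result from the standard library
print(statistics.variance(data))   # sample variance: 32 / 7
```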
Standard Deviation
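The square root of the variance. It is often easier to interpret than the variance because it is in the same units as the data. For roughly normally distributed data, an observation more than three standard deviations away from the mean is quite rare (about 0.3% of observations).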
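A sketch showing the standard deviation and the "z-score" (how many standard deviations a value lies from the mean), plus the share of a normal distribution lying beyond three standard deviations. The data and the `z_score` helper are illustrative.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(data)   # 5
std = statistics.pstdev(data)  # square root of the variance 4.0, i.e. 2.0


def z_score(x, mean, std):
    """How many standard deviations x lies from the mean."""
    return (x - mean) / std


print(z_score(9, mean, std))   # 2.0
print(z_score(12, mean, std))  # 3.5: more than 3 away, so quite rare

# Fraction of a normal distribution more than 3 standard deviations from the mean.
normal_tail = math.erfc(3 / math.sqrt(2))
print(normal_tail)  # about 0.0027
```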
Outliers
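Data points that are so high, so low or otherwise so abnormal that they can be considered unexpected values. Outliers often come from data issues, in which case they can be removed, but some are genuine extreme values, so they should be checked before removal.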
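One common way to flag candidate outliers is Tukey's rule: anything more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. This is a sketch using that rule as an assumption (other rules, such as the three-standard-deviation cutoff, also exist); the readings are hypothetical.

```python
import statistics


def find_outliers(values, k=1.5):
    """Flag values outside Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


readings = [10, 12, 11, 13, 12, 11, 10, 95]  # hypothetical sensor readings
print(find_outliers(readings))  # [95]
```

The function only flags values; deciding whether 95 is a data issue or a genuine extreme still needs a human check.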
Data Science
A field that works with and analyzes large amounts of data to provide meaningful information that can be used to make decisions and solve problems
Data Scientist
Someone with expertise in programming languages such as R and Python who helps businesses collect, compile, clean, interpret, model and make predictions from all kinds of data.
Big data
Data that is very large in size (volume), arrives at very high speed (velocity) and comes in many different types (variety). These three characteristics are known as the 3Vs.
Overfitting
A situation in which a model fits the training data (almost) perfectly but still predicts new data poorly. This happens because the model has learned the noise and quirks of the training data rather than the general pattern, so it does not generalize to real-world situations that differ from the data it was trained on.
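A small illustration, under made-up data: a "memorizer" model (1-nearest neighbor, which returns the y of the closest training point) reproduces the noisy training data exactly, while a straight line captures the general trend. The memorizer wins on training data and loses on new data. All names and numbers here are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares straight line: a simple model of the general trend."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def memorize(xs, ys):
    """1-nearest-neighbor model: returns the y of the closest training x, noise included."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared error of a model on some data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Training data: y = 2x plus alternating noise of +/-0.5.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [0.5, 1.5, 4.5, 5.5, 8.5, 9.5]
# New data from the same rule y = 2x, without the noise.
test_x = [0.4, 1.4, 2.4, 3.4, 4.4]
test_y = [0.8, 2.8, 4.8, 6.8, 8.8]

line, memory = fit_line(train_x, train_y), memorize(train_x, train_y)
print(mse(memory, train_x, train_y), mse(memory, test_x, test_y))  # 0.0 and about 0.73
print(mse(line, train_x, train_y), mse(line, test_x, test_y))      # about 0.23 and 0.015
```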
Training
The process in which a computer/server fits a model to the data, i.e. learns the model's parameters from the data; the term is also used for the time this takes.
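A sketch of one common training method, gradient descent: the parameters start at zero and are nudged step by step in the direction that reduces the mean squared error. The learning rate and number of epochs are illustrative choices, not recommended settings.

```python
def train_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Learn slope w and intercept b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = mean((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Move each parameter a small step against its gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = train_line([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])  # data follow y = 2x + 1
print(w, b)  # very close to 2 and 1
```

More epochs (or larger data) mean more computation, which is why training time matters for big models.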
Correlation
A measure, between -1 and +1, of how strongly two sets of values move together. Correlation does not imply causation: a high correlation means that two things go hand in hand, but the relationship might be coincidental. For example, India winning cricket matches might be highly correlated with rain, but that relationship would be coincidental.
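A sketch of the most common correlation measure, the Pearson correlation coefficient, with illustrative data. Python 3.10 and later also provide `statistics.correlation` for the same calculation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient, between -1 and +1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)


print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0: move together perfectly
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0: move in opposite directions
```

A value near +1 or -1 says only that the numbers move together; it says nothing about which one, if either, causes the other.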
Clustering
The grouping of similar points/sites/data together. Members of the same group (cluster) have more in common with each other than with members of other groups.
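A sketch of k-means, one widely used clustering algorithm, on one-dimensional points (other clustering algorithms exist). Starting from the k smallest points is a simplifying assumption; real implementations usually choose starting centers more carefully.

```python
def k_means_1d(points, k, iterations=100):
    """Group 1-D points into k clusters around moving centers."""
    centers = sorted(points)[:k]  # simple deterministic start (an assumption)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: abs(p - centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:  # nothing moved, so the clusters are stable
            break
        centers = new_centers
    return labels, centers


labels, centers = k_means_1d([1, 2, 3, 10, 11, 12], k=2)
print(labels, centers)  # [0, 0, 0, 1, 1, 1] [2.0, 11.0]
```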
A/B testing
The process of comparing two variants (A and B) of the same thing, such as two versions of a web page, to find out which one performs better.
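A sketch assuming the performance metric is a conversion rate (the share of visitors who do something). It uses a two-proportion z-test, one common way to judge whether the difference between A and B is larger than chance would explain. The visitor and conversion counts are hypothetical.

```python
import math


def ab_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-proportion z-test: is B's conversion rate different from A's?"""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    # Pooled rate under the assumption that A and B really perform the same.
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the normal distribution
    return z, p_value


z, p = ab_test(100, 1000, 150, 1000)  # 10% vs 15% conversion
print(z, p)  # z is about 3.4 and p is about 0.0007
```

A small p-value (commonly below 0.05) suggests the difference is unlikely to be due to chance alone.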
Alexa
Amazon’s voice assistant, available on devices such as the Amazon Echo, that responds to voice commands to do various things like answer questions, control smart-home devices (for example, turning on a projector), give weather updates and play podcasts.
Dashboard
A view (in Excel, PowerBI or as an image) that displays aggregated data about the performance of a project.
Heat map
A graphical representation that uses color to show the magnitude of values in data. For example, high values can be represented by red, low values by green and all others by amber.
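A sketch of the banded red/amber/green coloring described above. The thresholds 20 and 80, the site names and the values are hypothetical; real heat maps often use a continuous color scale instead of three bands.

```python
def heat_color(value, low_threshold, high_threshold):
    """Map a value to a color band: high -> red, low -> green, in between -> amber."""
    if value >= high_threshold:
        return "red"
    if value <= low_threshold:
        return "green"
    return "amber"


# Hypothetical values per site, colored with hypothetical thresholds 20 and 80.
grid = {"Site A": [10, 50, 90], "Site B": [85, 20, 60]}
for site, values in grid.items():
    print(site, [heat_color(v, 20, 80) for v in values])
# Site A ['green', 'amber', 'red']
# Site B ['red', 'green', 'amber']
```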
PowerBI
A Microsoft tool for building dashboards and charts that users can filter as needed.
PDF
Portable Document Format: a digital document format that presents text and graphics as a fixed-layout digital image. PDFs are usually not easily editable. A major advantage of PDF over Excel/Word documents is that its layout stays the same when opened on any computer.