Statistics for Machine Learning

Hey everyone, hope you are doing great. In this article, we are going to cover the statistics you need to know to get started with machine learning. Here goes:

There are two branches of statistics you need to know for ML:

  1. Descriptive statistics: This branch of statistics deals with summarizing and describing data using measures such as mean, median, mode, variance, and standard deviation.

  2. Inferential statistics: This branch of statistics deals with using sample data to make inferences about a larger population. It includes techniques such as hypothesis testing, confidence intervals, and regression analysis.

Types of Descriptive statistics

  • Measures of Central Tendency

    1. Mean: The average of a set of values. It is calculated by adding up all the values and dividing the sum by the total number of values.

    2. Median: The middle value of a set of values. It is calculated by ordering the values and selecting the value in the middle (or the average of the two middle values if there is an even number of values).

    3. Mode: The most common value in a set of values.
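To make the three measures concrete, here is a minimal sketch using Python's built-in statistics module. The sample values are made up purely for illustration.

```python
import statistics

# Hypothetical sample (made-up values, not from any real dataset).
values = [1, 2, 2, 3, 4, 7, 9]

# Mean: add up all the values and divide by how many there are.
mean_value = statistics.mean(values)

# Median: sort the values and take the middle one.
median_value = statistics.median(values)

# Mode: the value that appears most often.
mode_value = statistics.mode(values)

# With an even number of values, the median is the average of the two middle values.
even_median = statistics.median([1, 2, 3, 4])

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
print("median of [1, 2, 3, 4]:", even_median)
```

Notice that the mean gets pulled up by the larger values (7 and 9), while the median and mode stay lower. That is why the median is often preferred when the data has outliers.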

  • Measures of Dispersion

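    1. Variance: A measure of the spread or dispersion of a set of values. It is calculated as the average of the squared differences between each value and the mean. (When working with a sample rather than the whole population, the sum of squared differences is usually divided by n - 1 instead of n.)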

    2. Standard Deviation: A measure of the spread or dispersion of a set of values around the mean. It is calculated as the square root of the variance.
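The two definitions above can be worked through step by step. This is a small sketch with made-up numbers. It computes the population variance by hand and then compares it with the functions from Python's statistics module.

```python
import math
import statistics

# Hypothetical data (made-up values chosen so the arithmetic is easy to follow).
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Step 1: the mean.
mean_value = sum(values) / len(values)

# Step 2: squared difference between each value and the mean.
squared_diffs = [(x - mean_value) ** 2 for x in values]

# Step 3: population variance is the average of those squared differences.
pop_variance = sum(squared_diffs) / len(values)

# Step 4: standard deviation is the square root of the variance.
pop_std = math.sqrt(pop_variance)

# The statistics module gives the same population results...
lib_pop_variance = statistics.pvariance(values)
lib_pop_std = statistics.pstdev(values)

# ...and also a sample variance, which divides by n - 1 instead of n.
sample_variance = statistics.variance(values)

print("population variance:", pop_variance, "| library:", lib_pop_variance)
print("population std dev:", pop_std, "| library:", lib_pop_std)
print("sample variance:", sample_variance)
```

The standard deviation is usually easier to interpret than the variance because it is in the same units as the original data.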

  • Distribution: Normal distribution, skewed distribution, bimodal distribution, exponential distribution, etc. (We will talk about them in detail in the next article.)

Before starting with distributions, you will want to understand how data is classified. Here is the classification that will give you a better understanding.

Quantitative: It refers to the numerical data

  1. Discrete: It is a type of data that can take only distinct, countable values (usually whole numbers), with nothing in between. Ex-: The number of students in a class, the number of goals scored in a soccer game.

  2. Continuous: Continuous data is a type of quantitative data that can take on any value within a given range, limited only by how precisely it is measured. Ex-: Height and weight of a person, time taken to complete a task, etc.

Qualitative: Qualitative data is a type of non-numerical data that describes or explains a phenomenon through characteristics, attributes, or qualities.

  1. Ordinal: Ordinal data is a type of categorical data that represents a ranking or order of values or categories. Ex-: Education level, salary band (low, medium, high), etc.

  2. Nominal: Nominal data is a type of categorical data that represents a set of values or categories that do not have any natural order or hierarchy. Ex-: Gender, marital status etc.
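The practical difference between ordinal and nominal data is which operations make sense on them. Here is a rough sketch with made-up survey answers. The names EDUCATION_RANK, education, and marital_status are hypothetical, and the ranking of education levels is an assumption made for illustration.

```python
from collections import Counter

# Hypothetical survey responses (made-up values).
education = ["Bachelor", "High school", "Master", "Bachelor", "PhD"]
marital_status = ["single", "married", "single", "divorced", "single"]

# Ordinal data has a natural order, so we can define a ranking (assumed here)
# and use it to sort the categories.
EDUCATION_RANK = {"High school": 0, "Bachelor": 1, "Master": 2, "PhD": 3}
education_sorted = sorted(education, key=EDUCATION_RANK.get)

# Because the order is meaningful, a "middle" category (median) makes sense.
median_education = education_sorted[len(education_sorted) // 2]

# Nominal data has no order, so sorting or taking a median is meaningless.
# Counting categories and finding the most common one (the mode) still works.
status_counts = Counter(marital_status)
most_common_status = status_counts.most_common(1)[0][0]

print("education in order:", education_sorted)
print("median education level:", median_education)
print("marital status counts:", dict(status_counts))
print("most common marital status:", most_common_status)
```

In short, the mode works for both kinds of qualitative data, while the median only makes sense when the categories can be ranked.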

I hope you got some value out of it. Next, we will look at distributions in detail.

Thank you :)