**Introduction**: Anyone who wants to learn data science or machine learning needs a working knowledge of statistics. Candidates in data science and data analyst interviews are almost certain to face statistics questions. For someone new to the data science world, however, it can be hard to connect statistical concepts to their implementation in Python code.

In this blog, I walk through Python code, with key notes, covering the basics of descriptive statistics.

**What is Statistics?** To answer this, we first need two terms: **Population** and **Sample**. A **population** is the complete set of data relevant to an analysis; its size is denoted **N**. A **sample** is a subset of the population; its size is denoted **n**. A **statistic** is a numerical summary (such as a mean) computed from **sample** data, while a **parameter** is the corresponding numerical summary of the whole **population**.

**For example**, a company wants to survey employee satisfaction across the entire company. You are asked to collect your project members' opinions and submit them to the HR manager. Is this population or sample data? What is the value you report called?

**Answer**: It is sample data, and the value you report is a statistic, because you collected data only from your project members, who are a small part of the whole company.
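To make the distinction concrete, here is a minimal sketch using made-up numbers: a hypothetical company of N = 500 employees with satisfaction scores from 1 to 5 (generated randomly, not real survey data), from which a team of n = 10 is drawn. The mean over all employees is a parameter; the mean over the team is a statistic.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed so the run is reproducible

# Hypothetical population: a satisfaction score (1 to 5) for every employee
N = 500
population = rng.integers(1, 6, size=N)  # upper bound 6 is exclusive

# Hypothetical sample: one team of n employees, drawn without replacement
n = 10
sample = rng.choice(population, size=n, replace=False)

population_mean = population.mean()  # parameter (usually written as mu)
sample_mean = sample.mean()          # statistic (usually written as x-bar)

print(f"N = {N}, parameter (population mean) = {population_mean:.2f}")
print(f"n = {n}, statistic (sample mean)     = {sample_mean:.2f}")
```

The two means will usually differ somewhat: the statistic is only an estimate of the parameter, and the gap tends to shrink as n grows.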

**Types of Data**: Next, we need to understand which types of data exist. There are **two** main types: **categorical** and **numerical**. **Numerical** data is further divided into **discrete** (countable values, such as the number of employees) and **continuous** (any value within a range, such as height).

Now let's start coding in Python.

**Import all required packages** in a Jupyter notebook:

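```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.ticker import PercentFormatter
# Jupyter magic: render plots inside the notebook
%matplotlib inline

# Categorical data stored in a list
cate_list = ["Apple", "Banana", "Orange"]
print(cate_list)   # Output: ['Apple', 'Banana', 'Orange']

# Categorical data stored in a dictionary
cate_dict = {1: "Apple", 2: "Banana", 3: "Orange"}
print(cate_dict)   # Output: {1: 'Apple', 2: 'Banana', 3: 'Orange'}

# Numerical data stored in a list
num_list = [1, 2, 6, 7, 4]
print(num_list)    # Output: [1, 2, 6, 7, 4]

# rand(5): 5 continuous values in [0, 1)
# randint(1, 16, 10): 10 discrete integers from 1 to 15
print(np.random.rand(5))
print(np.random.randint(1, 16, 10))

# Types of data: categorical and numerical
# Types of numerical data: discrete and continuous
# Month (1 to 12) is discrete; age is continuous in principle,
# although here it is recorded in whole years.
# Convert a dictionary to a data frame
data = {'Age': [20, 25, 20, 35, 40, 45, 50, 55, 60, 65, 70, 75],
        'Month': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]}
df1 = pd.DataFrame.from_dict(data)
df1
```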
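As a rough sketch, pandas dtypes can hint at which type of data a column holds. The rule below is a heuristic assumed for this example, not a statistical definition: non-numeric columns are treated as categorical, integer columns as likely discrete, and float columns as likely continuous. A dtype only reflects how values are stored, so the final decision still depends on what the variable measures. The `classify` helper and the `Height_m` column are hypothetical, added for illustration.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype, is_numeric_dtype


def classify(series: pd.Series) -> str:
    """Hypothetical helper: guess the data type of a column from its dtype."""
    if not is_numeric_dtype(series):
        return "categorical"
    if is_integer_dtype(series):
        return "numerical (likely discrete)"
    if is_float_dtype(series):
        return "numerical (likely continuous)"
    return "numerical"


df = pd.DataFrame({
    "Fruit": ["Apple", "Banana", "Orange"],  # strings -> categorical
    "Month": [1, 2, 3],                      # integers -> likely discrete
    "Height_m": [1.60, 1.67, 1.70],          # floats -> likely continuous (hypothetical)
})

for col in df.columns:
    print(col, "->", classify(df[col]))
```

This prints `Fruit -> categorical`, `Month -> numerical (likely discrete)` and `Height_m -> numerical (likely continuous)`.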

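**Levels of Measurement**: Data can also be classified by its level of measurement, which falls into two groups: qualitative and quantitative. The two qualitative levels are **nominal** (categories with no order, such as hair color) and **ordinal** (categories with a meaningful order, such as a satisfaction rating). The two quantitative levels are **interval** (equal differences are meaningful, but there is no true zero) and **ratio** (there is a true zero, so ratios such as "twice as tall" are also meaningful).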

```python
# Nominal data: categories with no natural order
nominal_dict = {'Gender': ['Female', 'Male'], 'Hair_Color': ['Black', 'White']}
df2 = pd.DataFrame.from_dict(nominal_dict)
df2
```

Output:

```
   Gender Hair_Color
0  Female      Black
1    Male      White
```

```python
# Ordinal data: categories with a meaningful order
ordinal_dict = {'Rating': ['Satisfied', 'Avg Satisfied', 'Not Satisfied']}
df3 = pd.DataFrame.from_dict(ordinal_dict)
df3
```

Output:

```
          Rating
0      Satisfied
1  Avg Satisfied
2  Not Satisfied
```
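The difference between nominal and ordinal data can be made explicit in pandas with an ordered categorical type. In this sketch, the ranking Not Satisfied < Avg Satisfied < Satisfied is an assumption about how the rating scale is meant to be read, and the responses are made up. With `ordered=True`, pandas sorts by that rank and allows comparisons, which would be meaningless for a nominal column such as hair color.

```python
import pandas as pd

# Ordinal: declare the category order explicitly (lowest to highest, assumed)
levels = ["Not Satisfied", "Avg Satisfied", "Satisfied"]
ratings = pd.Series(
    ["Satisfied", "Not Satisfied", "Avg Satisfied", "Satisfied"],
    dtype=pd.CategoricalDtype(categories=levels, ordered=True),
)

print(ratings.sort_values().tolist())     # sorted by rank, not alphabetically
print(ratings.max())                      # highest rating present
print((ratings > "Avg Satisfied").sum())  # responses ranked above average

# Nominal: categories with no order, so counting is the meaningful operation
hair = pd.Series(["Black", "White", "Black"], dtype="category")
print(hair.value_counts())
```

Calling `max()` or using `>` on the unordered `hair` column would raise an error, which is pandas enforcing the nominal/ordinal distinction.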

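```python
# Interval data: equal differences are meaningful, but there is no true zero.
# Note: income does have a true zero (no income), so it is usually treated
# as ratio data; temperature in °C is the classic interval example.
interval_data = {'Income': [25000, 30000, 40000]}
df4 = pd.DataFrame.from_dict(interval_data)
df4
```

Output:

```
   Income
0   25000
1   30000
2   40000
```

```python
# Ratio data: measurements of height in cm (true zero, so ratios are meaningful)
ratio_data = {'Height': [160, 167, 170]}
df5 = pd.DataFrame.from_dict(ratio_data)
df5
```

Output:

```
   Height
0     160
1     167
2     170
```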
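Why does a true zero matter? In this small sketch, the temperatures are made-up illustration values (temperature in °C is the textbook interval example and is not part of the original data), while the heights reuse the ratio data above. Converting units changes the ratio between two temperatures, because the zero point of each temperature scale is arbitrary, but it leaves the ratio between two heights unchanged.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32


def cm_to_inches(cm: float) -> float:
    """Convert a length from centimetres to inches."""
    return cm / 2.54


# Interval scale: 20 °C vs 10 °C (illustrative values)
t1_c, t2_c = 20.0, 10.0
ratio_c = t1_c / t2_c
ratio_f = celsius_to_fahrenheit(t1_c) / celsius_to_fahrenheit(t2_c)  # 68 / 50
print(f"Temperature ratio: {ratio_c:.2f} in °C vs {ratio_f:.2f} in °F")

# Ratio scale: two heights from the example above
h1_cm, h2_cm = 170.0, 160.0
ratio_cm = h1_cm / h2_cm
ratio_in = cm_to_inches(h1_cm) / cm_to_inches(h2_cm)
print(f"Height ratio: {ratio_cm:.4f} in cm vs {ratio_in:.4f} in inches")
```

This prints `Temperature ratio: 2.00 in °C vs 1.36 in °F` and `Height ratio: 1.0625 in cm vs 1.0625 in inches`. So "20 °C is twice as hot as 10 °C" is not a meaningful statement, while "170 cm is 1.0625 times 160 cm" holds in any unit.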

**Conclusion**: So far, we have covered the different types of data and their levels of measurement. In the next blog, we will discuss **central tendency (the next part of descriptive statistics) with Python**.
