Module 7: Missing Data

What is a Missing Value?

  • If the value for a given row and column of a dataframe is not available, it is called a missing value.

  • More formally, missing data (or missing values) are data values that are not stored for some row and column.

  • Consider this small dataset which has some missing values in it (shown in the red box).
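
  • As a minimal sketch, a dataset of this kind could be built as follows. The cell values here are hypothetical and chosen only for illustration; the column names 'Regd. No', 'Names' and 'Marks%' are the ones used later in this module.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the values are invented for illustration.
# None and np.nan are both treated by pandas as missing values.
df = pd.DataFrame({
    "Regd. No": [101, 102, 103, 104],
    "Names": ["Asha", None, "Ravi", "Meena"],   # one missing name
    "Marks%": [82.0, 75.5, np.nan, 91.0],       # one missing mark
})

print(df)
```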

  • Because this dataset is small, the missing values are easy to spot by eye. Real datasets are usually much larger, so missing values cannot easily be found just by looking at the dataframe.

  • Pandas provides the isnull() and isna() methods to detect missing values. They do the same thing: isnull() is an alias of isna().

  • df.isna() (or df.isnull()) returns a dataframe of the same shape filled with boolean values: True where a value is missing and False everywhere else.
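
  • A short sketch of this check, assuming a hypothetical dataframe with the columns used in this module (the cell values are invented):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing name and one missing mark.
df = pd.DataFrame({
    "Regd. No": [101, 102, 103, 104],
    "Names": ["Asha", None, "Ravi", "Meena"],
    "Marks%": [82.0, 75.5, np.nan, 91.0],
})

# Boolean mask with the same shape as df: True marks a missing value.
mask = df.isna()
print(mask)

# isnull() is an alias of isna(), so both masks are identical.
print(mask.equals(df.isnull()))
```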

  • We can get a column-wise count of the missing values by applying the aggregation function sum() to this result, because each True is counted as 1.
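
  • The count can be sketched like this, again on a hypothetical dataframe whose cell values are invented for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing name and one missing mark.
df = pd.DataFrame({
    "Regd. No": [101, 102, 103, 104],
    "Names": ["Asha", None, "Ravi", "Meena"],
    "Marks%": [82.0, 75.5, np.nan, 91.0],
})

# isna() gives a boolean mask; sum() adds up the True values per column.
missing_per_column = df.isna().sum()
print(missing_per_column)
```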

  • The columns 'Names' and 'Marks%' each have one missing value. 'Regd. No' has no missing values, so its count is 0.

  • Pandas also provides the fillna() method to fill missing values. fillna() supports several strategies, such as filling with a single constant or with a different value for each column.

  • Let's say we want to fill the missing values in the 'Names' column with 'unknown'.

  • By default, however, the changes are not made in place: fillna() returns a new object and leaves the original unchanged.

  • The dataframe still contains missing values in the 'Names' column.
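
  • This behaviour can be sketched as follows. The dataframe is hypothetical (invented cell values), and the variable name filled_names is used only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing name and one missing mark.
df = pd.DataFrame({
    "Regd. No": [101, 102, 103, 104],
    "Names": ["Asha", None, "Ravi", "Meena"],
    "Marks%": [82.0, 75.5, np.nan, 91.0],
})

# fillna() returns a new Series with the gaps replaced by 'unknown'.
filled_names = df["Names"].fillna("unknown")
print(filled_names)

# The original dataframe is untouched: 'Names' still has a missing value.
print(df["Names"].isna().sum())
```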

  • We can pass inplace=True to the fillna() method so that it modifies the original dataframe instead of returning a new one.
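
  • One way to sketch this, on a hypothetical dataframe with invented values, is to call fillna() on the dataframe itself with a dictionary that maps column names to fill values. This form is an assumption about how to apply the step, chosen because calling fillna(inplace=True) on a selected column such as df['Names'] can fail to update the dataframe in recent pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing name and one missing mark.
df = pd.DataFrame({
    "Regd. No": [101, 102, 103, 104],
    "Names": ["Asha", None, "Ravi", "Meena"],
    "Marks%": [82.0, 75.5, np.nan, 91.0],
})

# The dict restricts the fill to the 'Names' column;
# inplace=True modifies df directly and returns None.
df.fillna({"Names": "unknown"}, inplace=True)

print(df)
print(df.isna().sum())
```

    Assigning the result back, as in df['Names'] = df['Names'].fillna('unknown'), is an equivalent alternative that does not rely on inplace.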

  • There is still a missing value in the 'Marks%' column. Let's say we want to fill it with the mean of the marks scored by the other students.
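
  • A minimal sketch of mean imputation, assuming the same kind of hypothetical dataframe (invented values) in which 'Names' has already been filled with 'unknown':

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 'Names' is already filled, 'Marks%' has one gap.
df = pd.DataFrame({
    "Regd. No": [101, 102, 103, 104],
    "Names": ["Asha", "unknown", "Ravi", "Meena"],
    "Marks%": [82.0, 75.5, np.nan, 91.0],
})

# mean() skips missing values by default, so this is the average
# of the three marks that are present.
mean_marks = df["Marks%"].mean()

# Assign the filled column back to the dataframe.
df["Marks%"] = df["Marks%"].fillna(mean_marks)

print(df)
print(df.isna().sum())
```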

  • Now, our dataframe has no missing values.
