Module 7: Missing Data
What is a Missing Value?
If the value in any row or column of a dataframe is not available, it is called a missing value.
So, to define missing data: missing data (or missing values) are data values that are not stored for a given row or column.
Consider this small dataset which has some missing values in it (shown in the red box).
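A dataset like this can be rebuilt in code as a minimal sketch. Only the column names 'Regd. No', 'Names' and 'Marks%' come from this module; the row values below are made up for illustration, with np.nan marking the missing entries.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; only the column names are taken from the text.
df = pd.DataFrame({
    "Regd. No": [101, 102, 103, 104, 105],
    "Names": ["Asha", "Ben", np.nan, "Dev", "Eva"],   # one missing name
    "Marks%": [82.0, 74.5, 91.0, np.nan, 68.5],       # one missing mark
})

print(df)
```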
This is a small dataset, so we can easily spot the missing values by eye. Real datasets are usually much larger, and you cannot find the missing values just by looking at the dataframe.
Pandas provides the isnull() and isna() methods to detect missing values. isnull() is an alias of isna(), so both do the same thing.
df.isna() or df.isnull() returns a dataframe of the same shape with boolean values: True where a value is missing and False otherwise.
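As a small sketch of this, using the same made-up rows as an assumption (only the column names come from this module):

```python
import numpy as np
import pandas as pd

# Hypothetical rows; only the column names are taken from the text.
df = pd.DataFrame({
    "Regd. No": [101, 102, 103, 104, 105],
    "Names": ["Asha", "Ben", np.nan, "Dev", "Eva"],
    "Marks%": [82.0, 74.5, 91.0, np.nan, 68.5],
})

# Boolean dataframe: True marks a missing value.
mask = df.isna()
print(mask)

# isnull() gives an identical result.
same_result = mask.equals(df.isnull())
print(same_result)
```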

We can get a column-wise count of the missing values by chaining the aggregation function sum(), which counts each True as 1.
The columns 'Names' and 'Marks%' have one missing value each. 'Regd. No' has no missing values, so its count is 0.
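A minimal sketch of the column-wise count, again on made-up rows (an assumption; only the column names come from this module):

```python
import numpy as np
import pandas as pd

# Hypothetical rows; only the column names are taken from the text.
df = pd.DataFrame({
    "Regd. No": [101, 102, 103, 104, 105],
    "Names": ["Asha", "Ben", np.nan, "Dev", "Eva"],
    "Marks%": [82.0, 74.5, 91.0, np.nan, 68.5],
})

# Summing the boolean mask counts the True values in each column.
missing_counts = df.isna().sum()
print(missing_counts)
```

Here missing_counts is a Series indexed by column name, so a single column's count can be read with missing_counts['Names'].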
Pandas also provides the fillna() method to fill missing values. fillna() supports several different strategies for filling them.
Let's say we want to fill the missing values in 'Names' column with 'unknown'.
But by default fillna() returns a new object, so the changes are not made in place.
The original dataframe still contains the missing value in the 'Names' column.
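The behaviour described above can be sketched as follows. The rows are made up for illustration, and the variable name filled is just a placeholder.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; only the column names are taken from the text.
df = pd.DataFrame({
    "Regd. No": [101, 102, 103, 104, 105],
    "Names": ["Asha", "Ben", np.nan, "Dev", "Eva"],
    "Marks%": [82.0, 74.5, 91.0, np.nan, 68.5],
})

# fillna() returns a new Series; df itself is left unchanged.
filled = df["Names"].fillna("unknown")
print(filled)

# The original column still has its missing value.
print(df["Names"].isna().sum())
```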
We can pass inplace=True to the fillna() method. It will then make the changes in the original dataframe instead of returning a new one.
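One way to do this is sketched below, assuming the same made-up rows. It calls fillna() on the whole dataframe with a dictionary that maps the column name to its fill value, so only 'Names' is touched.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; only the column names are taken from the text.
df = pd.DataFrame({
    "Regd. No": [101, 102, 103, 104, 105],
    "Names": ["Asha", "Ben", np.nan, "Dev", "Eva"],
    "Marks%": [82.0, 74.5, 91.0, np.nan, 68.5],
})

# Fill only the 'Names' column, modifying df itself.
df.fillna({"Names": "unknown"}, inplace=True)
print(df)
```

Note that in recent pandas versions, calling fillna(inplace=True) on a selected column such as df['Names'] may not update the parent dataframe. Assigning the result back, as in df['Names'] = df['Names'].fillna('unknown'), is a common alternative that avoids this.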

There is still a missing value in 'Marks%' column. Let's say we want to fill the missing value in this column with the mean of the marks scored by other people.
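A minimal sketch of mean filling, again on made-up rows (an assumption). mean() skips missing values by default, so the average is taken over the marks that are present.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; only the column names are taken from the text.
df = pd.DataFrame({
    "Regd. No": [101, 102, 103, 104, 105],
    "Names": ["Asha", "Ben", np.nan, "Dev", "Eva"],
    "Marks%": [82.0, 74.5, 91.0, np.nan, 68.5],
})

# Fill 'Names' first, as in the earlier step.
df["Names"] = df["Names"].fillna("unknown")

# Mean of the available marks: (82.0 + 74.5 + 91.0 + 68.5) / 4 = 79.0
mean_marks = df["Marks%"].mean()
df["Marks%"] = df["Marks%"].fillna(mean_marks)

print(df)
print(df.isna().sum())
```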
Now, our dataframe has no missing values.