Inspecting Data
How to inspect data in Pandas
Once you've loaded data into a Pandas DataFrame, it's important to inspect it before performing any analysis. Pandas provides several methods and attributes to view, summarize, and check the quality of your data.
Viewing data
1. Viewing the First Few Rows (head())
head(n) returns the first n rows of the DataFrame (n defaults to 5), which makes it a quick way to preview the data.
import pandas as pd
data = {
"Name": ["Jasmeet", "Rob", "Charlie", "David", "Emma"],
"Age": [25, 30, 35, 40, 29],
"City": ["New York", "Los Angeles", "Chicago", "Houston", "Seattle"]
}
df = pd.DataFrame(data)
# Display first 3 rows
print(df.head(3))
Output:
      Name  Age         City
0  Jasmeet   25     New York
1      Rob   30  Los Angeles
2  Charlie   35      Chicago
- Use df.head(10) to display the first 10 rows.
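head() also accepts values of n outside the usual range. The sketch below reuses the same sample data. It shows that an n larger than the row count returns every row without an error, and that a negative n returns all rows except the last |n|.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jasmeet", "Rob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 29],
    "City": ["New York", "Los Angeles", "Chicago", "Houston", "Seattle"],
})

# n larger than the number of rows: no error, all 5 rows come back
all_rows = df.head(10)
print(len(all_rows))  # 5

# Negative n: every row except the last 2 (same as df[:-2])
all_but_last_two = df.head(-2)
print(all_but_last_two["Name"].tolist())  # ['Jasmeet', 'Rob', 'Charlie']
```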
2. Viewing the Last Few Rows (tail())
Similar to head(), but returns the last n rows (n also defaults to 5).
print(df.tail(2))
Output:
   Name  Age     City
3  David   40  Houston
4   Emma   29  Seattle
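tail() mirrors head() for negative values: tail(-n) drops the first n rows. A common pattern is to combine both methods to look at the start and end of a frame together. Here is a minimal sketch on the same sample data, using pd.concat to stack the two slices.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jasmeet", "Rob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 29],
    "City": ["New York", "Los Angeles", "Chicago", "Houston", "Seattle"],
})

# Negative n: every row except the first 1 (same as df[1:])
without_first = df.tail(-1)
print(without_first.index.tolist())  # [1, 2, 3, 4]

# Peek at both ends of the frame in a single table
ends = pd.concat([df.head(2), df.tail(2)])
print(ends)
```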
Checking DataFrame Information
Use info() to get an overview of the index, column names, data types, non-null counts, and memory usage.
# info() prints its report itself and returns None, so no print() is needed
df.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   Name    5 non-null      object
 1   Age     5 non-null      int64
 2   City    5 non-null      object
dtypes: int64(1), object(2)
memory usage: 248.0+ bytes
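- Comparing each column's Non-Null Count with the total number of entries reveals missing values, and the Dtype column shows each column's data type.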
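Because info() writes its report instead of returning it, you capture the text through its buf parameter. The sketch below uses a small hypothetical frame with one missing age and collects the report in an io.StringIO buffer, so the report can be searched or logged. The missing value appears as a lower non-null count.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data: Rob's age is missing
df = pd.DataFrame({
    "Name": ["Jasmeet", "Rob", "Charlie"],
    "Age": [25, np.nan, 35],
})

buffer = io.StringIO()
df.info(buf=buffer)  # writes the report into buffer instead of stdout
report = buffer.getvalue()
print(report)

# Age has 2 non-null values out of 3 entries, so one value is missing
print("2 non-null" in report)  # True
```

Passing memory_usage="deep" to info() makes it measure object columns exactly, which removes the "+" from the memory figure.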
Checking Shape
shape is an attribute (not a method, so it takes no parentheses) that returns the number of rows and columns as a tuple (rows, columns).
print(df.shape)
Output:
(5, 3) # 5 rows, 3 columns
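Since shape is a plain tuple, you can unpack it directly. The sketch below on the sample data also shows two related built-ins: len(df) gives just the row count, and df.size gives the total number of cells.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jasmeet", "Rob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 29],
    "City": ["New York", "Los Angeles", "Chicago", "Houston", "Seattle"],
})

n_rows, n_cols = df.shape  # unpack the (rows, columns) tuple
print(n_rows, n_cols)      # 5 3

print(len(df))   # 5  (row count only)
print(df.size)   # 15 (rows * columns, the total number of cells)
```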
Checking Column Names
The columns attribute lists all column names in the DataFrame as an Index object.
print(df.columns)
Output:
Index(['Name', 'Age', 'City'], dtype='object')
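The Index returned by columns converts easily to a list and supports membership tests. Those tests help confirm that the columns you expect were loaded. In this sketch, the expected names (including "Salary") are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jasmeet", "Rob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 29],
    "City": ["New York", "Los Angeles", "Chicago", "Houston", "Seattle"],
})

cols = df.columns.tolist()
print(cols)  # ['Name', 'Age', 'City']

has_age = "Age" in df.columns
print(has_age)  # True

# Which expected columns are absent? ("Salary" is a hypothetical name)
missing = {"Name", "Salary"} - set(df.columns)
print(missing)  # {'Salary'}
```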
Checking Summary Statistics
describe() provides summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for numerical columns.
print(df.describe())
Output:
             Age
count   5.000000
mean   31.800000
std     5.805170
min    25.000000
25%    29.000000
50%    30.000000
75%    35.000000
max    40.000000
- By default, only numeric columns are summarized.
- For text (object) columns, use df.describe(include="object"). It reports count, unique, top (the most frequent value), and freq instead.
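describe() accepts a few more options. In the short sketch below on the sample data, include="all" summarizes numeric and text columns in one table, with NaN in cells that do not apply. The percentiles parameter replaces the default quartiles, and the median (50%) is always included.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jasmeet", "Rob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 29],
    "City": ["New York", "Los Angeles", "Chicago", "Houston", "Seattle"],
})

# One table for every column; text columns get count/unique/top/freq
full = df.describe(include="all")
print(full)
print(full.loc["unique", "City"])  # 5

# Custom percentiles for the numeric columns
custom = df.describe(percentiles=[0.1, 0.9])
print(custom.index.tolist())
# ['count', 'mean', 'std', 'min', '10%', '50%', '90%', 'max']
```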
Checking Data Types
The dtypes attribute shows the data type of each column.
print(df.dtypes)
Output:
Name    object
Age      int64
City    object
dtype: object
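Once you know the types, two follow-up steps are common. You can select columns by type, or convert a column that was loaded with the wrong type. The sketch below uses select_dtypes to keep only numeric columns and astype to convert Age. The conversion to float64 is only an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jasmeet", "Rob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 29],
    "City": ["New York", "Los Angeles", "Chicago", "Houston", "Seattle"],
})

# Keep only numeric columns (ints, floats)
numeric = df.select_dtypes(include="number")
print(numeric.columns.tolist())  # ['Age']

# Convert a column's type (float64 chosen here for illustration)
df["Age"] = df["Age"].astype("float64")
print(df.dtypes["Age"])  # float64
```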
Checking for Missing Values
isnull().sum() counts the number of missing values in each column.
print(df.isnull().sum())
Output:
Name    0
Age     0
City    0
dtype: int64
- If there are missing values, you can handle them with fillna() (replace them) or dropna() (remove the affected rows or columns).
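The sample data has no gaps, so the sketch below uses a hypothetical frame with missing values to walk through the full loop: count the missing values, then either fill them or drop the affected rows. The fill values (mean age, "Unknown" city) are illustrative choices, not a general rule. Note that isna() is an alias of isnull().

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing Age and one missing City
df = pd.DataFrame({
    "Name": ["Jasmeet", "Rob", "Charlie"],
    "Age": [25, np.nan, 35],
    "City": ["New York", None, "Chicago"],
})

missing = df.isnull().sum()
print(missing.tolist())  # [0, 1, 1]

# Option 1: fill gaps per column (illustrative choices)
filled = df.fillna({"Age": df["Age"].mean(), "City": "Unknown"})
print(filled.loc[1, "Age"], filled.loc[1, "City"])  # 30.0 Unknown

# Option 2: drop every row that contains at least one missing value
dropped = df.dropna()
print(len(dropped))  # 2
```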
Viewing Unique Values
Use unique() to list the distinct values in a specific column, in order of first appearance.
print(df["City"].unique())
Output:
['New York' 'Los Angeles' 'Chicago' 'Houston' 'Seattle']
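If you only need how many distinct values there are, nunique() returns the count directly, and it also works on a whole DataFrame. The second part of this sketch uses a hypothetical series with repeats. It shows that unique() keeps values in order of first appearance rather than sorting them.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jasmeet", "Rob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 29],
    "City": ["New York", "Los Angeles", "Chicago", "Houston", "Seattle"],
})

print(df["City"].nunique())  # 5
print(df.nunique())          # distinct-value count for every column

# Hypothetical data with repeats: unique() keeps first-appearance order
cities = pd.Series(["Chicago", "Seattle", "Chicago", "Houston"])
print(list(cities.unique()))  # ['Chicago', 'Seattle', 'Houston']
```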
Checking Value Counts
value_counts() counts the occurrences of each unique value in a column, sorted from most to least frequent.
print(df["City"].value_counts())
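Output (pandas 2.0 and later; older versions omit the City header line and end with Name: City, dtype: int64):
City
New York       1
Los Angeles    1
Chicago        1
Houston        1
Seattle        1
Name: count, dtype: int64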
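Every city in the sample data appears once, so the counts are not very informative. The sketch below uses a hypothetical series with a repeated value instead. It shows the most frequent value sorted to the top, and normalize=True returning proportions instead of raw counts.

```python
import pandas as pd

# Hypothetical data with a repeated value
cities = pd.Series(["Chicago", "Seattle", "Chicago", "Houston"], name="City")

counts = cities.value_counts()
print(counts.index[0], counts.iloc[0])  # Chicago 2

# Proportions instead of counts (they sum to 1.0)
shares = cities.value_counts(normalize=True)
print(shares["Chicago"])  # 0.5
```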
Key Inspection Methods
Method | Purpose |
---|---|
df.head(n) | Show first n rows (default 5). |
df.tail(n) | Show last n rows. |
df.info() | Summary of dtypes, non-null counts, and memory usage. |
df.shape | Get (rows, columns) count. |
df.columns | List all column names. |
df.describe() | Get statistics for numerical columns. |
df.dtypes | Check data types of columns. |
df.isnull().sum() | Count missing values. |
df["col"].unique() | Get unique values of a column. |
df["col"].value_counts() | Count occurrences of each value. |
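To run several of the checks in the table at once, you can bundle them into a small function. The sketch below is a hypothetical helper (quick_inspect is not part of pandas) that returns the results as a dictionary, which is easy to print or log.

```python
import pandas as pd


def quick_inspect(df: pd.DataFrame) -> dict:
    """Hypothetical helper: gathers several inspection checks into one dict."""
    return {
        "shape": df.shape,
        "columns": df.columns.tolist(),
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": df.isnull().sum().to_dict(),
        "n_unique": df.nunique().to_dict(),
    }


df = pd.DataFrame({
    "Name": ["Jasmeet", "Rob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 29],
    "City": ["New York", "Los Angeles", "Chicago", "Houston", "Seattle"],
})

summary = quick_inspect(df)
print(summary["shape"])    # (5, 3)
print(summary["columns"])  # ['Name', 'Age', 'City']
```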