Inspecting Data

How to inspect data in Pandas

Once you've loaded data into a Pandas DataFrame, it's important to inspect it before performing any analysis. Pandas provides several methods to view, summarize, and check data quality.

Viewing Data

1. Viewing the First Few Rows (head())

head() previews the first N rows of the dataset (default is 5).

import pandas as pd

data = {
    "Name": ["Jasmeet", "Rob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 29],
    "City": ["New York", "Los Angeles", "Chicago", "Houston", "Seattle"]
}

df = pd.DataFrame(data)

# Display first 3 rows
print(df.head(3))

Output:

      Name  Age         City
0  Jasmeet   25     New York
1      Rob   30  Los Angeles
2  Charlie   35      Chicago
  • Use df.head(10) to display the first 10 rows.

2. Viewing the Last Few Rows (tail())

tail() works like head(), but shows the last N rows (default is 5).

print(df.tail(2))

Output:

    Name  Age     City
3  David   40  Houston
4   Emma   29  Seattle

Checking DataFrame Information

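Use info() to get an overview of the number of rows and columns, each column's data type, the non-null count per column, and memory usage.

# info() prints its report directly and returns None, so wrapping it in print() would add a stray "None" line
df.info()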

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    5 non-null      object
 1   Age     5 non-null      int64 
 2   City    5 non-null      object
dtypes: int64(1), object(2)
memory usage: 248.0 bytes
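  • Helps detect missing values: a non-null count lower than the number of entries means that column has gaps.
  • Shows the data type of every column at a glance.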
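
To make the missing-value signal concrete, here is a minimal sketch using a small hypothetical DataFrame (df_missing and its contents are illustrative, not part of the sample data above). It captures the info() report in a string buffer so the text can be examined as well as printed.

```python
import io

import pandas as pd

# Hypothetical data: one missing age and one missing city
df_missing = pd.DataFrame({
    "Name": ["Jasmeet", "Rob", "Charlie"],
    "Age": [25, None, 35],
    "City": ["New York", "Los Angeles", None],
})

# info() normally writes to stdout; passing buf captures the report instead
buf = io.StringIO()
df_missing.info(buf=buf)
report = buf.getvalue()

# Age and City should each report fewer non-null values than the 3 entries
print(report)
```

Because the Age column now holds a missing value, pandas stores it as float64 rather than int64, which is another hint worth looking for in the report.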

Checking Shape

shape is an attribute (no parentheses) that holds the number of rows and columns as a tuple (rows, columns).

print(df.shape)

Output:

(5, 3)  # 5 rows, 3 columns
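
Since shape is a plain tuple, it can be unpacked into separate variables. The sketch below rebuilds the sample DataFrame so it runs on its own; the names n_rows and n_cols are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Jasmeet", "Rob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 29],
    "City": ["New York", "Los Angeles", "Chicago", "Houston", "Seattle"],
})

# Unpack the (rows, columns) tuple
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")

# len(df) gives the row count only
print(len(df) == n_rows)
```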

Checking Column Names

columns lists all column names in the DataFrame.

print(df.columns)

Output:

Index(['Name', 'Age', 'City'], dtype='object')

Checking Summary Statistics

describe() provides statistical insights (count, mean, std, min, max, etc.) for numerical columns.

print(df.describe())

Output:

             Age
count   5.000000
mean   31.800000
std     5.805170
min    25.000000
25%    29.000000
50%    30.000000
75%    35.000000
max    40.000000
  • Summarizes only numerical columns by default.
  • For text (object) columns, use df.describe(include="object"), which reports count, unique, top, and freq instead.
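
As a sketch of what describe(include="object") returns, the example below uses a hypothetical DataFrame with a repeated city so that top and freq are meaningful. The text columns are pinned to dtype="object" as a precaution, on the assumption that some pandas versions may infer a dedicated string dtype instead.

```python
import pandas as pd

# Hypothetical data with repeated cities; dtype pinned to object on purpose
df_text = pd.DataFrame({
    "Name": pd.Series(["Jasmeet", "Rob", "Charlie", "David", "Emma"], dtype="object"),
    "Age": [25, 30, 35, 40, 29],
    "City": pd.Series(
        ["New York", "Chicago", "New York", "Seattle", "New York"], dtype="object"
    ),
})

# Summarize only the object columns: count, unique, top (most common), freq
summary = df_text.describe(include="object")
print(summary)
```

The Age column is left out of this summary because it is numeric.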

Checking Data Types

dtypes shows the data type of each column.

print(df.dtypes)

Output:

Name    object
Age      int64
City    object
dtype: object

Checking for Missing Values

isnull().sum() counts the number of missing values in each column.

print(df.isnull().sum())

Output:

Name    0
Age     0
City    0
dtype: int64
  • If there are missing values, you can handle them using fillna() or dropna().
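
The sample data has no gaps, so here is a minimal sketch of both options on a hypothetical DataFrame with missing entries. The fill values (the column mean for Age, the placeholder "Unknown" for City) are illustrative choices, not a recommendation for every dataset.

```python
import pandas as pd

# Hypothetical data: row 1 is missing both Age and City
df_missing = pd.DataFrame({
    "Name": ["Jasmeet", "Rob", "Charlie"],
    "Age": [25.0, None, 35.0],
    "City": ["New York", None, "Chicago"],
})

# Count the gaps per column first
missing_per_column = df_missing.isnull().sum()
print(missing_per_column)

# Option 1: drop every row that contains at least one missing value
dropped = df_missing.dropna()

# Option 2: fill each column with its own replacement value
filled = df_missing.fillna({"Age": df_missing["Age"].mean(), "City": "Unknown"})

print(dropped)
print(filled)
```

Both calls return a new DataFrame and leave df_missing unchanged.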

Viewing Unique Values

Use unique() to check unique values in a specific column.

print(df["City"].unique())

Output:

['New York' 'Los Angeles' 'Chicago' 'Houston' 'Seattle']

Checking Value Counts

value_counts() counts occurrences of each unique value in a column.

print(df["City"].value_counts())

Output:

New York       1
Los Angeles    1
Chicago        1
Houston        1
Seattle        1
Name: City, dtype: int64
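
Every city appears once in the sample data, so the counts above are all 1. The sketch below uses a hypothetical Series with repeated values to show that value_counts() sorts by frequency, and that normalize=True returns proportions instead of raw counts.

```python
import pandas as pd

# Hypothetical column with repeated values
cities = pd.Series(
    ["New York", "Chicago", "New York", "Seattle", "New York", "Chicago"],
    name="City",
)

# Raw counts, most frequent value first
counts = cities.value_counts()
print(counts)

# Proportions of the total instead of counts
shares = cities.value_counts(normalize=True)
print(shares)
```

The exact header and footer lines printed around the counts can differ between pandas versions.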

Key Inspection Methods

Method                      Purpose
df.head(n)                  Show first n rows (default 5).
df.tail(n)                  Show last n rows.
df.info()                   Summary of data types & missing values.
df.shape                    Get (rows, columns) count.
df.columns                  List all column names.
df.describe()               Get statistics for numerical columns.
df.dtypes                   Check data types of columns.
df.isnull().sum()           Count missing values.
df["col"].unique()          Get unique values of a column.
df["col"].value_counts()    Count occurrences of each value.
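
Several of these checks can be bundled into one call. The sketch below is only one possible arrangement: quick_inspect is a hypothetical helper written for this example, not a pandas function.

```python
import pandas as pd


def quick_inspect(df: pd.DataFrame) -> dict:
    """Collect a few basic inspection results in one dictionary (hypothetical helper)."""
    return {
        "shape": df.shape,
        "columns": list(df.columns),
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": df.isnull().sum().to_dict(),
    }


df = pd.DataFrame({
    "Name": ["Jasmeet", "Rob", "Charlie", "David", "Emma"],
    "Age": [25, 30, 35, 40, 29],
    "City": ["New York", "Los Angeles", "Chicago", "Houston", "Seattle"],
})

report = quick_inspect(df)
for key, value in report.items():
    print(f"{key}: {value}")
```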