Loading data

Loading Data into DataFrames

Pandas provides several functions for loading data from files, databases, and in-memory Python objects into a DataFrame for analysis. The most common ones are described below.

Loading Data from a CSV File

CSV (Comma-Separated Values) files are one of the most common formats for storing tabular data.

Reading a CSV File

import pandas as pd

df = pd.read_csv("data.csv")  # Load CSV file into DataFrame
print(df.head())  # Display the first 5 rows

Common Parameters:

  • sep=";": Use if the file uses a different delimiter (e.g., semicolon instead of a comma).
  • header=None: Use if the file has no header row; columns are then numbered 0, 1, 2, ... unless names is also given.
  • names=['A', 'B', 'C']: Assign custom column names.
  • index_col=0: Use the first column as the index.
  • usecols=['Name', 'Age']: Load only selected columns.
  • dtype={"Age": int}: Specify data types.

# Example: reading specific columns
df = pd.read_csv("data.csv", usecols=['Name', 'Age'])
print(df)

Output

      Name  Age
0  Jasmeet   25
1    Chris   30
2  Charlie   35
3    David   40
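
The parameters listed above can be combined in a single call. The following is a minimal sketch, assuming semicolon-delimited data with no header row. The inline text and the column names id, Name, and Age are made up for illustration, and io.StringIO stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited data with no header row.
raw = "1;Jasmeet;25\n2;Chris;30\n3;Charlie;35\n"

df = pd.read_csv(
    io.StringIO(raw),             # a file path or any file-like object works here
    sep=";",                      # delimiter used in the data
    header=None,                  # the data has no header row
    names=["id", "Name", "Age"],  # assign column names
    index_col="id",               # use the "id" column as the index
    dtype={"Age": "int64"},       # read the Age column as 64-bit integers
)
print(df)
```

Passing a real path such as "data.csv" in place of the StringIO object should behave the same way, provided the file has the same layout.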

Loading Data from an Excel File

Reading an Excel File

df = pd.read_excel("data.xlsx", sheet_name="Sheet1")
print(df.head())

Common Parameters:

  • sheet_name="Sheet1": Read a specific sheet.
  • usecols="A:C": Read only the columns in the given Excel range (here, columns A through C).

Loading Data from a JSON File

JSON (JavaScript Object Notation) is widely used for storing structured data.

Reading a JSON File

df = pd.read_json("data.json")
print(df.head())

Common Parameters:

  • orient="records": Use if JSON is formatted as a list of dictionaries.
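
As a rough illustration of the records layout, the sketch below parses a small JSON string in which each list element is one row. The sample data is hypothetical, and io.StringIO is used only so the example runs without a file on disk.

```python
import io

import pandas as pd

# Hypothetical JSON text formatted as a list of records (one dictionary per row).
raw = '[{"Name": "Jasmeet", "Age": 25}, {"Name": "Chris", "Age": 30}]'

# Wrapping the string in StringIO lets read_json treat it like a file.
df_json = pd.read_json(io.StringIO(raw), orient="records")
print(df_json)
```

Each key in the records becomes a column, and each dictionary becomes one row.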

Loading Data from a SQL Database

Reading from a SQL Table

import sqlite3

conn = sqlite3.connect("database.db")  # Connect to SQLite database
df = pd.read_sql("SELECT * FROM customers", conn)
print(df.head())

Common Parameters:

  • index_col="id": Use a specific column as the index.
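
To try index_col without an existing database file, one option is an in-memory SQLite database. This is only a sketch: the customers table, its columns, and its rows are invented for the example.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database with a hypothetical customers table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customers (id, name, age) VALUES (?, ?, ?)",
    [(1, "Jasmeet", 25), (2, "Chris", 30), (3, "Charlie", 35)],
)
conn.commit()

# Use the id column as the DataFrame index instead of the default 0..n-1 range.
df_sql = pd.read_sql("SELECT * FROM customers", conn, index_col="id")
conn.close()
print(df_sql)
```

With index_col set, id no longer appears as a regular column, and rows can be looked up by it with df_sql.loc.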

Loading Data from a Web URL

Reading CSV from a Web URL

url = "https://example.com/data.csv"
df = pd.read_csv(url)
print(df.head())

Loading Data from a Python Dictionary

Reading from a Dictionary

data = {
    "Name": ["Jasmeet", "Chris", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"]
}

df = pd.DataFrame(data)
print(df)
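
The dictionary above maps each column name to a list of values. In-memory data is often shaped the other way, as one dictionary per row, and pd.DataFrame accepts that shape as well. A small sketch with made-up rows:

```python
import pandas as pd

# Hypothetical records: one dictionary per row.
rows = [
    {"Name": "Jasmeet", "Age": 25},
    {"Name": "Chris", "Age": 30},
]

# Keys become column names; each dictionary becomes one row.
df_rows = pd.DataFrame(rows)
print(df_rows)
```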

Summary

File Format    Method
CSV            pd.read_csv("file.csv")
Excel          pd.read_excel("file.xlsx")
JSON           pd.read_json("file.json")
SQL            pd.read_sql("SQL Query", connection)
Web URL        pd.read_csv("http://example.com/data.csv")
Dictionary     pd.DataFrame(data_dict)