Descriptive Statistics

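Pandas provides built-in DataFrame and Series methods for computing the summary statistics commonly needed in data analysis.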

Basic Statistics

python

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
    'C': [5, 4, 3, 2, 1]
})

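# Sum of each column
print(df.sum())

# Mean of each column
print(df.mean())

# Median of each column
print(df.median())

# Standard deviation (sample estimator, ddof=1 by default)
print(df.std())

# Variance (sample estimator, ddof=1 by default)
print(df.var())

# Minimum of each column
print(df.min())

# Maximum of each column
print(df.max())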
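
One detail about the standard deviation and variance calls above is worth checking: pandas uses the sample estimator (`ddof=1`) by default, whereas NumPy's `np.std` defaults to the population estimator (`ddof=0`). The following is a minimal sketch comparing the two. The Series `s` is illustrative data, not part of the original example.

```python
import numpy as np
import pandas as pd

# Illustrative data, not taken from the tutorial's DataFrame
s = pd.Series([1, 2, 3, 4, 5])

sample_std = s.std()              # pandas default: ddof=1, divides by n - 1
population_std = s.std(ddof=0)    # population form: divides by n
numpy_std = np.std(s.to_numpy())  # NumPy default: ddof=0

print(sample_std)
print(population_std)
print(numpy_std)
```

If results from pandas and NumPy need to agree, passing `ddof` explicitly on both sides avoids the mismatch.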

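Summary Statistics

python

# describe() summarizes the numeric columns by default
print(df.describe())

# Illustrative frame with an added text column, so non-numeric output is visible
df_mixed = df.assign(label=pd.Series(['x', 'y', 'x', 'y', 'x'], dtype=object))

# Include all columns (numeric and non-numeric)
print(df_mixed.describe(include='all'))

# Include only specific dtypes (raises ValueError if no column matches)
print(df_mixed.describe(include=['object']))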

Quantiles

python

# Quartiles
print(df.quantile([0.25, 0.5, 0.75]))

# Custom quantiles
print(df.quantile([0.1, 0.5, 0.9]))
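
When a requested quantile falls between two data points, `quantile` interpolates linearly between them by default. The sketch below shows this on a standalone Series and compares two of the other interpolation rules. The name `s` is illustrative only.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# With linear interpolation the position is q * (n - 1); for q=0.1 that is 0.4,
# so the result lies 40% of the way from the first value (1) to the second (2).
q10 = s.quantile(0.1)

# 'lower' and 'higher' return an actual data point instead of interpolating
q10_lower = s.quantile(0.1, interpolation='lower')
q10_higher = s.quantile(0.1, interpolation='higher')

print(q10, q10_lower, q10_higher)
```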

Cumulative Statistics

python

# Cumulative sum
print(df.cumsum())

# Cumulative product
print(df.cumprod())

# Cumulative minimum
print(df.cummin())

# Cumulative maximum
print(df.cummax())
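
Unlike reductions such as `sum()` or `max()`, the cumulative methods return an object of the same length as the input, where each position holds the statistic over all values up to and including that position. A small sketch, using illustrative Series that are not part of the original example:

```python
import pandas as pd

# Illustrative data
s = pd.Series([1, 2, 3, 4, 5])
t = pd.Series([3, 1, 4, 1, 5])

running_total = s.cumsum()  # each entry is the sum of all values so far
running_max = t.cummax()    # each entry is the largest value seen so far

print(running_total.tolist())
print(running_max.tolist())
```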

Other Statistical Methods

python

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': [10, np.nan, 30, 40, 50]
})

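# Count of non-null values per column
print(df.count())

# Same count, computed from a boolean mask
print(df.notna().sum())

# Number of null values per column
print(df.isna().sum())

# Number of unique values (NaN is excluded by default)
print(df['A'].nunique())

# Mode (returns every value tied for most frequent)
print(df['A'].mode())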
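
Most reductions skip missing values by default, which is why the null counts above matter when interpreting a sum or mean. The sketch below illustrates the `skipna` and `dropna` parameters. It rebuilds the same data under the illustrative name `df_nan`.

```python
import numpy as np
import pandas as pd

# Same data as above, under an illustrative name
df_nan = pd.DataFrame({
    'A': [1, 2, np.nan, 4, 5],
    'B': [10, np.nan, 30, 40, 50]
})

# Default (skipna=True): NaN is ignored, so the mean covers the 4 present values
mean_skip = df_nan['A'].mean()

# skipna=False: any NaN in the column makes the result NaN
mean_strict = df_nan['A'].mean(skipna=False)

# nunique() ignores NaN unless dropna=False is passed
n_unique = df_nan['A'].nunique()
n_unique_with_nan = df_nan['A'].nunique(dropna=False)

print(mean_skip, mean_strict, n_unique, n_unique_with_nan)
```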

Statistics Along an Axis

python

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

# Sum across each row
print(df.sum(axis=1))

# Mean across each row
print(df.mean(axis=1))
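
The `axis` argument names the axis that gets collapsed. With `axis=0` (the default) the reduction runs down the rows and yields one value per column. With `axis=1` it runs across the columns and yields one value per row. A brief sketch, using the illustrative name `df_axis`:

```python
import pandas as pd

# Illustrative frame with the same values as above
df_axis = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

col_sums = df_axis.sum()                      # axis=0: one result per column
row_sums = df_axis.sum(axis=1)                # axis=1: one result per row
row_sums_named = df_axis.sum(axis='columns')  # string alias for axis=1

print(col_sums.to_dict())
print(row_sums.tolist())
```

The string aliases `'index'` (for 0) and `'columns'` (for 1) can make the intent easier to read than the bare integers.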
