Practical Data Visualization with Seaborn: Analyzing Superstore Sales Data

In our previous post, we explored the fundamentals of seaborn. Now, let’s put those concepts into practice with real-world data. We’ll use the Tableau Superstore dataset—a rich sales dataset containing 9,994 transaction records with information about customers, products, sales, and profits across different regions of the United States. This dataset provides an excellent playground for demonstrating how seaborn can transform raw business data into actionable insights.

Setting the Stage: Understanding Our Data

The Superstore dataset represents a retail company’s sales transactions from 2015 to 2018. Each row captures details about an order including the customer segment (Consumer, Corporate, or Home Office), product category, geographic information, and financial metrics. With total sales exceeding $2.2 million and an overall profit margin of 12.47%, this dataset offers numerous opportunities to uncover business patterns and performance drivers.

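Before diving into visualizations, there are two pieces of setup. First, if your copy of the dataset is the legacy .xls file, convert it to .xlsx. pandas reads .xlsx files through the openpyxl engine, while .xls files need the separate xlrd package. Second, set up the environment and load the data: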

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

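# Load the Superstore dataset
df = pd.read_excel('superstore.xlsx')

# Make sure the date column is datetime so the .dt accessor used later works
df['Order Date'] = pd.to_datetime(df['Order Date'])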

# Set seaborn's aesthetic style
sns.set_theme(style="whitegrid")

# Quick data exploration
print(f"Dataset shape: {df.shape}")
print(f"Total orders: {df['Order ID'].nunique():,}")
print(f"Date range: {df['Order Date'].min()} to {df['Order Date'].max()}")
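
The headline figures quoted earlier (total sales and the overall profit margin) come from two column sums. Here is a minimal sketch of that arithmetic. It assumes the column names used throughout this post. The helper `headline_metrics` and the `sample` rows are hypothetical and made up for illustration, so pass the real `df` to reproduce the dataset's figures.

```python
import pandas as pd


def headline_metrics(data: pd.DataFrame) -> dict:
    """Summarize row count, distinct orders, totals, and overall profit margin."""
    total_sales = data['Sales'].sum()
    total_profit = data['Profit'].sum()
    return {
        'rows': len(data),
        'orders': data['Order ID'].nunique(),
        'total_sales': total_sales,
        'total_profit': total_profit,
        # Overall margin = total profit / total sales (not an average of per-row margins)
        'margin_pct': total_profit / total_sales * 100,
    }


# Hypothetical sample rows; use the real `df` for the dataset's actual figures
sample = pd.DataFrame({
    'Order ID': ['A-1', 'A-1', 'B-2', 'C-3'],
    'Sales': [100.0, 50.0, 200.0, 150.0],
    'Profit': [20.0, -5.0, 30.0, 5.0],
})
metrics = headline_metrics(sample)
print(metrics)
```

Note that one order can span several rows, which is why the row count and the distinct `Order ID` count differ.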

Revealing Category Performance Across Regions

One of the first questions any retail analyst might ask is how different product categories perform across various regions. Seaborn makes this multi-dimensional analysis straightforward with grouped visualizations:

# Prepare aggregated data
category_region = df.groupby(['Category', 'Region'])['Sales'].sum().reset_index()

# Create grouped bar plot
plt.figure(figsize=(10, 6))
sns.barplot(data=category_region, x='Category', y='Sales', hue='Region', palette='Set2')
plt.title('Sales Performance by Category and Region', fontsize=14, fontweight='bold')
plt.xlabel('Product Category')
plt.ylabel('Total Sales ($)')
plt.legend(title='Region', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.tight_layout()
plt.show()

This visualization immediately reveals that Technology products generate the highest sales across all regions, with the West region showing particularly strong performance. Office Supplies, while having lower total sales, maintains consistent performance across regions—suggesting a stable, predictable revenue stream.
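
Grouped bars are easy to compare at a glance but hard to read precisely. A pivot table gives the exact values behind the chart, and dividing each row by its total shows each category's share of a region's sales. This is a minimal sketch. The helper name and the sample rows are hypothetical, so run it on `df` for the real figures.

```python
import pandas as pd


def category_region_table(data: pd.DataFrame) -> pd.DataFrame:
    """Total sales with regions as rows and categories as columns."""
    return pd.pivot_table(data, values='Sales', index='Region',
                          columns='Category', aggfunc='sum', fill_value=0)


# Hypothetical sample rows for illustration only
sample = pd.DataFrame({
    'Region': ['West', 'West', 'East', 'East', 'East'],
    'Category': ['Technology', 'Office Supplies', 'Technology', 'Furniture', 'Technology'],
    'Sales': [300.0, 120.0, 250.0, 180.0, 50.0],
})
table = category_region_table(sample)

# Each category's share of its region's total sales, in percent
shares = table.div(table.sum(axis=1), axis=0) * 100
print(table)
print(shares.round(1))
```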

Understanding Profit Distribution with Violin Plots

While sales figures grab attention, profitability determines business sustainability. Violin plots excel at showing both the distribution and density of profit across customer segments:

plt.figure(figsize=(10, 6))
sns.violinplot(data=df, x='Segment', y='Profit', inner='box')
plt.title('Profit Distribution by Customer Segment', fontsize=14, fontweight='bold')
plt.xlabel('Customer Segment')
plt.ylabel('Profit ($)')
plt.axhline(y=0, color='red', linestyle='--', alpha=0.5, label='Break-even')
plt.legend()
plt.show()

The violin plot reveals a crucial insight: all three customer segments have transactions that result in losses (profits below zero), but the Consumer segment shows the widest distribution of both profits and losses. The embedded box plot within each violin provides quartile information, showing that despite some loss-making transactions, the median profit for all segments remains positive.
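
The violin plot shows the shape of each distribution. A small summary table turns the two claims in the paragraph above into numbers: how often each segment loses money, and where its median profit sits. This is a sketch only. The helper `segment_loss_summary` and the sample rows are hypothetical.

```python
import pandas as pd


def segment_loss_summary(data: pd.DataFrame) -> pd.DataFrame:
    """Per-segment median profit, share of loss-making rows, and row count."""
    return (data.assign(is_loss=data['Profit'] < 0)
                .groupby('Segment')
                .agg(median_profit=('Profit', 'median'),
                     loss_share=('is_loss', 'mean'),   # mean of booleans = fraction of losses
                     rows=('Profit', 'count')))


# Hypothetical sample rows for illustration only
sample = pd.DataFrame({
    'Segment': ['Consumer', 'Consumer', 'Consumer', 'Corporate', 'Corporate'],
    'Profit': [12.0, -8.0, 30.0, 5.0, 9.0],
})
summary = segment_loss_summary(sample)
print(summary)
```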

Tracking Temporal Patterns with Line Plots

Understanding how sales evolve over time is crucial for identifying trends, seasonal patterns, and growth opportunities. Seaborn’s line plots, combined with data aggregation, make time series analysis intuitive:

# Prepare monthly aggregated data
df['Year-Month'] = df['Order Date'].dt.to_period('M')
monthly_sales = df.groupby('Year-Month')['Sales'].sum().reset_index()
monthly_sales['Year-Month'] = monthly_sales['Year-Month'].dt.to_timestamp()

# Create trend visualization
plt.figure(figsize=(12, 5))
sns.lineplot(data=monthly_sales, x='Year-Month', y='Sales', marker='o')
plt.title('Monthly Sales Trend', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Total Sales ($)')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

The resulting visualization shows clear growth over the four-year period, with notable spikes typically occurring in September and November—likely corresponding to back-to-school and holiday shopping seasons. This pattern suggests opportunities for inventory planning and targeted marketing campaigns.
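
Growth over several years is easier to check with yearly totals and year-over-year change than by eye. Here is a minimal sketch, assuming `Order Date` is already a datetime column. The helper name and the sample rows are hypothetical.

```python
import pandas as pd


def yearly_growth(data: pd.DataFrame) -> pd.DataFrame:
    """Total sales per calendar year and the percent change from the prior year."""
    yearly = data.groupby(data['Order Date'].dt.year)['Sales'].sum().to_frame('Sales')
    yearly.index.name = 'Year'
    # pct_change compares each year with the previous one; the first year has no prior, so NaN
    yearly['YoY %'] = yearly['Sales'].pct_change() * 100
    return yearly


# Hypothetical sample rows for illustration only
sample = pd.DataFrame({
    'Order Date': pd.to_datetime(['2015-03-01', '2015-11-15', '2016-09-10', '2017-12-01']),
    'Sales': [100.0, 100.0, 250.0, 300.0],
})
growth = yearly_growth(sample)
print(growth)
```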

Uncovering Seasonal Patterns

To better understand seasonality, we can aggregate sales by month across all years:

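# Extract year and month from the order date
df['Year'] = df['Order Date'].dt.year
df['Month'] = df['Order Date'].dt.month

# Total sales for each month of each year, then the average of those totals across years
monthly_totals = df.groupby(['Year', 'Month'])['Sales'].sum().reset_index()
monthly_avg = monthly_totals.groupby('Month')['Sales'].mean().reset_index()

# Visualize seasonal pattern (hue with legend=False applies the palette; seaborn 0.13+ warns on palette without hue)
plt.figure(figsize=(10, 5))
sns.barplot(data=monthly_avg, x='Month', y='Sales', hue='Month', palette='coolwarm', legend=False)
plt.title('Average Sales by Month (Seasonal Pattern)', fontsize=14, fontweight='bold')
plt.xlabel('Month')
plt.ylabel('Average Monthly Sales ($)')
plt.show()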

This analysis confirms strong performance in September and November, with March also showing above-average sales. January and February show the lowest average sales, suggesting potential for promotional campaigns during these slower periods.
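
To make "above-average" precise, divide each month's average by the mean across all months. Values above 1.0 mark stronger-than-typical months. This is a simple ratio-to-mean seasonal index, offered as a sketch rather than a full seasonal decomposition. The helper name and the sample values are hypothetical.

```python
import pandas as pd


def seasonal_index(monthly_avg: pd.Series) -> pd.Series:
    """Each month's average sales relative to the mean across months (1.0 = average)."""
    return monthly_avg / monthly_avg.mean()


# Hypothetical average sales for four months, indexed by month number
avg = pd.Series([20.0, 40.0, 60.0, 80.0], index=[1, 2, 9, 11], name='Sales')
idx = seasonal_index(avg)
above = idx[idx > 1].index.tolist()
print(idx)
print('Above-average months:', above)
```

On the real data, the `monthly_avg` frame from the previous step can be passed as `seasonal_index(monthly_avg.set_index('Month')['Sales'])`.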

The Discount-Profit Paradox

One of the most important business questions involves the relationship between discounts and profitability. Let’s investigate how discount levels impact profit:

# Categorize discounts; the top edge of 1.0 keeps discounts above 50% from becoming NaN
df['Discount_Category'] = pd.cut(df['Discount'],
                                 bins=[-0.01, 0, 0.2, 1.0],
                                 labels=['No Discount', 'Low (up to 20%)', 'High (over 20%)'])

# Visualize discount impact
plt.figure(figsize=(10, 6))
sns.boxplot(data=df, x='Discount_Category', y='Profit')
plt.title('Impact of Discount on Profit', fontsize=14, fontweight='bold')
plt.xlabel('Discount Category')
plt.ylabel('Profit ($)')
plt.axhline(y=0, color='red', linestyle='--', alpha=0.5, label='Break-even')
plt.legend()
plt.show()

The visualization reveals a striking pattern: higher discounts correlate with lower profits and increased variability. Products sold without discounts show the highest median profit and least variance, while high discounts (above 20%) frequently result in losses. This suggests the company might benefit from reevaluating its discounting strategy.
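
The box plot shows the spread. A table of loss rates per discount band states "frequently result in losses" as a number. This is a minimal sketch with a hypothetical helper and made-up rows. One assumption is built in: the upper bin edge is 1.0 rather than 0.5, so any discount above 50% still lands in a band instead of becoming NaN and silently dropping out of the analysis.

```python
import pandas as pd


def loss_rate_by_discount(data: pd.DataFrame) -> pd.DataFrame:
    """Share of loss-making rows and median profit for each discount band."""
    bands = pd.cut(data['Discount'],
                   bins=[-0.01, 0, 0.2, 1.0],  # intervals are right-inclusive: 0.2 falls in "Low"
                   labels=['No Discount', 'Low (up to 20%)', 'High (over 20%)'])
    return (data.assign(Band=bands, is_loss=data['Profit'] < 0)
                .groupby('Band', observed=False)   # keep every band, even if empty
                .agg(loss_share=('is_loss', 'mean'),
                     median_profit=('Profit', 'median')))


# Hypothetical sample rows for illustration only
sample = pd.DataFrame({
    'Discount': [0.0, 0.0, 0.1, 0.2, 0.5, 0.8],
    'Profit': [10.0, 4.0, 3.0, -1.0, -20.0, -50.0],
})
result = loss_rate_by_discount(sample)
print(result)
```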

Correlation Analysis with Heatmaps

Understanding relationships between numerical variables helps identify business drivers. Seaborn’s heatmap provides an intuitive correlation matrix:

# Select numerical columns
numeric_cols = ['Sales', 'Quantity', 'Discount', 'Profit']
correlation = df[numeric_cols].corr()

# Create heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(correlation, annot=True, cmap='RdBu_r', center=0, 
            square=True, fmt='.2f', linewidths=1)
plt.title('Correlation Between Key Metrics', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

The heatmap reveals several insights: Sales and Profit show a moderate positive correlation (0.48), indicating that larger transactions tend to bring larger profits. The negative correlation between Discount and Profit (-0.22) reinforces our earlier finding about the dangers of excessive discounting. Interestingly, Quantity shows only a weak correlation with Profit, suggesting that volume alone doesn’t guarantee profitability.
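
`DataFrame.corr()` computes Pearson correlation by default. Pearson measures linear association, so a few very large transactions can pull it around. Passing `method='spearman'` ranks the values first and measures monotonic association instead. Here is a sketch comparing the two on made-up rows that include one extreme sale. On the real data, `df[numeric_cols].corr(method='spearman')` can be passed straight to `sns.heatmap`.

```python
import pandas as pd

# Hypothetical rows with one extreme sale, the kind of outlier that can dominate Pearson r
sample = pd.DataFrame({
    'Sales': [10.0, 20.0, 30.0, 40.0, 5000.0],
    'Profit': [1.0, 3.0, 2.0, 5.0, 4.0],
})

pearson = sample.corr(method='pearson')    # linear association on raw values
spearman = sample.corr(method='spearman')  # linear association on ranks (monotonic)
print(pearson.round(2))
print(spearman.round(2))
```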

Identifying Top Performers

For strategic decision-making, identifying the most profitable product sub-categories helps focus resources:

# Find top 10 most profitable sub-categories
top_subcats = df.groupby('Sub-Category')['Profit'].sum().nlargest(10)

# Visualize top performers
plt.figure(figsize=(10, 6))
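# Assigning hue (with legend=False) applies the palette per bar; seaborn 0.13+ warns when palette is used without hue
sns.barplot(x=top_subcats.values, y=top_subcats.index, hue=top_subcats.index, palette='viridis', legend=False)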
plt.title('Top 10 Most Profitable Sub-Categories', fontsize=14, fontweight='bold')
plt.xlabel('Total Profit ($)')
plt.ylabel('Sub-Category')
plt.tight_layout()
plt.show()

Copiers emerge as the profit leader by a significant margin, followed by Phones and Accessories. This concentration of profits in a few sub-categories suggests opportunities for focused marketing and potential risks if these categories face competition or market changes.
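
Profit concentration can be stated as the share of total profit contributed by the top N sub-categories. This is a sketch with a hypothetical helper and made-up sub-category names and numbers. One caveat is worth building in: when some sub-categories lose money, total profit is a net figure, so the top-N share can exceed 100%. Listing the loss-makers alongside makes that visible.

```python
import pandas as pd


def profit_concentration(profit_by_subcat: pd.Series, top_n: int) -> float:
    """Percent of net total profit contributed by the top_n most profitable sub-categories."""
    total = profit_by_subcat.sum()  # net total: loss-makers reduce it, so the share can exceed 100%
    return profit_by_subcat.nlargest(top_n).sum() / total * 100


# Hypothetical profit per sub-category for illustration only
sample = pd.Series({'A': 50.0, 'B': 30.0, 'C': 20.0, 'D': -10.0})
share_top2 = profit_concentration(sample, 2)
share_top3 = profit_concentration(sample, 3)
losers = sample[sample < 0].index.tolist()
print(f"Top 2 share: {share_top2:.1f}%, top 3 share: {share_top3:.1f}%, loss-makers: {losers}")
```

On the real data, the input would be `df.groupby('Sub-Category')['Profit'].sum()`.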

Regional Profit Margin Analysis

Finally, let’s examine profit margins across regions to understand geographic performance:

# Calculate regional metrics
region_performance = df.groupby('Region').agg({
    'Sales': 'sum',
    'Profit': 'sum'
}).reset_index()
region_performance['Profit Margin'] = (region_performance['Profit'] / 
                                       region_performance['Sales']) * 100

# Visualize profit margins
plt.figure(figsize=(8, 5))
bars = plt.bar(region_performance['Region'], 
               region_performance['Profit Margin'], 
               color='lightcoral')
plt.axhline(y=region_performance['Profit Margin'].mean(), 
            color='blue', linestyle='--', 
            label=f"Average: {region_performance['Profit Margin'].mean():.1f}%")
plt.title('Profit Margin by Region', fontsize=14, fontweight='bold')
plt.xlabel('Region')
plt.ylabel('Profit Margin (%)')
plt.legend()

# Add value labels on bars
for bar, value in zip(bars, region_performance['Profit Margin']):
    plt.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.1, 
             f'{value:.1f}%', ha='center', va='bottom')
plt.show()
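
The dashed line in this chart is the plain mean of the regional margins, which weights every region equally. That is not the same as the company-wide margin (total profit divided by total sales), which weights regions by their sales. This sketch uses a hypothetical helper and made-up numbers to show how far apart the two can be. On the real data, pass `region_performance`.

```python
import pandas as pd


def margin_comparison(region_perf: pd.DataFrame) -> tuple[float, float]:
    """Return (unweighted mean of regional margins, overall margin), both in percent."""
    regional = region_perf['Profit'] / region_perf['Sales'] * 100
    # Overall margin weights each region by its sales
    overall = region_perf['Profit'].sum() / region_perf['Sales'].sum() * 100
    return regional.mean(), overall


# Hypothetical regions: a large low-margin one and a small high-margin one
sample = pd.DataFrame({
    'Region': ['A', 'B'],
    'Sales': [900.0, 100.0],
    'Profit': [90.0, 30.0],
})
unweighted, overall = margin_comparison(sample)
print(f"Mean of regional margins: {unweighted:.1f}%, overall margin: {overall:.1f}%")
```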

Margins are far from uniform across regions. The West leads at roughly 15%, followed by the East at about 13.5% and the South at about 12%. The Central region trails at under 8%, despite outselling the South. That gap points to a region-specific problem in Central that is worth investigating further.

Key Takeaways and Business Insights

Through these seaborn visualizations, we’ve uncovered several actionable insights from the Superstore data:

Strategic Insights:

  • Technology products drive the highest sales, but profit concentration in Copiers and Phones creates both opportunity and risk
  • The West region leads in both sales and profit margin, while the Central region’s margin lags well behind the other three
  • Seasonal patterns are pronounced, with September and November showing peak performance

Operational Recommendations:

  • The negative correlation between discounts and profitability suggests a need to reevaluate pricing strategies
  • All customer segments contribute profitably, but the Consumer segment shows the highest variability
  • The weak margin in the Central region points to a region-specific issue that merits a closer look

Technical Achievements:

  • Seaborn enabled us to create publication-quality visualizations with minimal code
  • Statistical plots like violin plots and correlation heatmaps revealed patterns that simple bar charts might miss
  • The combination of different plot types provided a comprehensive view of business performance

Moving Forward with Seaborn

This practical exploration demonstrates seaborn’s power in real-world business analytics. By combining different visualization types—from distribution plots to time series to correlation matrices—we’ve built a comprehensive understanding of the Superstore’s business dynamics. The library’s statistical focus and elegant defaults allowed us to focus on insights rather than formatting details.

As you apply seaborn to your own datasets, remember that the choice of visualization should align with your analytical goals. Use distribution plots to understand variability, time series for trends, correlation matrices for relationships, and categorical plots for comparisons. The key is letting the data guide your visualization choices while leveraging seaborn’s statistical capabilities to uncover deeper insights.

The complete code for these visualizations is available for download, allowing you to experiment with different parameters and adapt these techniques to your own data. Whether you’re analyzing sales data, customer behavior, or operational metrics, seaborn provides the tools to transform numbers into narratives that drive business decisions.


Discover more from Adman Analytics
