Web Development

Streamlit

An open-source Python library that transforms data scripts into shareable web applications in minutes, with no front-end experience required.

Tags: Beginner · UI · Data Visualization · Python · Dashboard · Deployment

Alternative To

  • Gradio
  • Dash
  • Flask
  • Shiny

Difficulty Level

Beginner

Suitable for users with basic technical knowledge. Easy to set up and use.

Overview

Streamlit is an open-source Python framework that enables data scientists and machine learning engineers to create interactive web applications directly from Python scripts. With its simple, declarative approach to UI development, Streamlit eliminates the need for front-end expertise, allowing developers to transform data scripts into fully functional web applications in minutes rather than days or weeks.

The framework is designed with a “script runs from top to bottom” philosophy, making the development process intuitive for Python users. As you write and save your code, Streamlit automatically updates the web application, creating a rapid development cycle that accelerates prototyping and deployment. This approach has made Streamlit particularly popular among data professionals who want to quickly share insights, build dashboards, or create interactive demonstrations of machine learning models without getting bogged down in web development complexities.
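As a minimal illustration of this model, consider the sketch below (a hypothetical app.py, not taken from the Streamlit docs). The entire app is one short script; moving the slider reruns it from the top and redraws the output:

import streamlit as st

st.title("Hello, Streamlit")

# The script reruns top to bottom on every interaction,
# so the slider's current value is simply read on each run.
n = st.slider("Pick a number", 1, 10, 5)
st.write(f"{n} squared is {n ** 2}")

Run it with streamlit run app.py and the app opens in the browser.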

Key Features

| Feature | Description |
| --- | --- |
| Pure Python Development | Build complete web applications using only Python, without HTML, CSS, or JavaScript |
| Live Reloading | Changes to your script are automatically reflected in the app when you save the file |
| Caching System | Built-in caching mechanism to optimize performance for data-heavy applications |
| Rich Widget Library | Extensive collection of UI components including sliders, buttons, selectboxes, and more |
| Data Visualization | Native support for various plotting libraries (Matplotlib, Plotly, Altair, etc.) |
| Layout Options | Flexible layout capabilities with columns, containers, expanders, and sidebars |
| File Uploading/Downloading | Easy handling of file uploads and downloads |
| Session State | Persistent state management across reruns |
| Theming | Customizable themes and appearance |
| Component Ecosystem | Extensible with custom components from the community |
| Cloud Deployment | Free hosting for public apps via Streamlit Community Cloud |
| Authentication | User authentication capabilities for secure applications |
| Multipage Apps | Support for building applications with multiple pages |
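As an example of the last row, multipage apps follow a simple directory convention: scripts placed in a pages/ folder next to the main script each become a page, with navigation generated automatically in the sidebar. A hypothetical layout:

streamlit_app.py        # main page; run with: streamlit run streamlit_app.py
pages/
    1_Dashboard.py      # shows up as "Dashboard" in the sidebar
    2_Data_Explorer.py  # shows up as "Data Explorer"

Numeric prefixes control page order and are stripped from the sidebar labels.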

Technical Details

Architecture

Streamlit follows a unique architecture that differs from traditional web frameworks (a short code sketch follows the list):

  1. Script Execution Model:

    • The entire Python script is re-executed from top to bottom on each interaction
    • This creates a reactive programming model without explicit event handlers
    • Changes to widgets trigger script reruns, updating the UI automatically
  2. Server Components:

    • Tornado web server handles HTTP requests
    • WebSocket connections maintain real-time communication between browser and server
    • Server-side caching system optimizes performance for expensive computations
  3. Frontend:

    • React-based frontend renders the UI components
    • Components are automatically generated from Python function calls
    • Bidirectional data flow between Python backend and JavaScript frontend
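Here is a sketch of how the rerun model interacts with session state: because the whole script re-executes on each interaction, any value that must survive reruns belongs in st.session_state rather than in a plain variable. A hypothetical counter app:

import streamlit as st

# Plain variables are reset on every rerun; session_state persists.
if "clicks" not in st.session_state:
    st.session_state.clicks = 0

# st.button returns True only on the rerun its click triggered.
if st.button("Click me"):
    st.session_state.clicks += 1

st.write(f"Button clicked {st.session_state.clicks} times")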

Version Information

The current stable version of Streamlit is 1.42.0 (as of March 2025), which requires Python 3.8 or higher. Streamlit has evolved significantly since its initial public release:

| Version | Release Date | Major Features |
| --- | --- | --- |
| 1.42.0 | February 2025 | Authentication, dataframe improvements |
| 1.30.0 | August 2024 | Multipage apps, improved performance |
| 1.20.0 | March 2024 | Enhanced theming, better mobile support |
| 1.10.0 | September 2023 | Session state improvements, new widgets |
| 1.0.0 | October 2021 | First stable release with production-ready features |
| 0.1.0 | October 2019 | Initial public release |

Component System

Streamlit’s component system is designed for simplicity and expressiveness:

  • Core Widgets: Text inputs, buttons, sliders, checkboxes, radio buttons, selectboxes
  • Media Components: Image, video, audio display and capture
  • Data Display: Dataframes, tables, metrics, JSON viewers
  • Layout Elements: Columns, containers, expanders, tabs, sidebars
  • Status Elements: Progress bars, spinners, balloons, success/error/info messages
  • Chart Components: Native support for various plotting libraries

Each component is invoked as a simple Python function call, with the state automatically managed by Streamlit’s reactive execution model.
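For instance, a few of these components composed together (a brief sketch; the data and labels are made up):

import streamlit as st
import pandas as pd

st.sidebar.selectbox("Theme", ["Light", "Dark"])    # core widget in the sidebar

col1, col2 = st.columns(2)                          # layout: two columns
with col1:
    st.metric("Revenue", "$12,340", delta="+4.2%")  # data display
with col2:
    st.progress(0.72)                               # status element

with st.expander("Show raw data"):                  # collapsible layout element
    st.dataframe(pd.DataFrame({"a": [1, 2], "b": [3, 4]}))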

Why Use Streamlit

Compared to Gradio

  • Broader Focus: While Gradio excels at ML model demos, Streamlit is designed for general data applications
  • More Layout Control: Offers more sophisticated layout options for complex dashboards
  • Richer Ecosystem: Larger community with more components and extensions
  • Data-Centric: Better optimized for data exploration and visualization workflows

Compared to Dash

  • Simpler Learning Curve: No need to understand callback patterns or React concepts
  • Faster Development: Significantly less code required for basic applications
  • Automatic Reactivity: No need to manually define callbacks for interactivity (see the sketch after this list)
  • Easier Deployment: Streamlit Community Cloud offers free, one-click deployment
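To make the reactivity point concrete, the Streamlit version of a "slider drives a chart" page needs no callback at all (a sketch; in Dash, the same behavior requires an explicit @app.callback wiring inputs to outputs):

import streamlit as st

# No callback registration: moving the slider reruns the script,
# and the chart is rebuilt with the new value.
points = st.slider("Number of points", 10, 1000, 100)
st.line_chart([i ** 0.5 for i in range(points)])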

Compared to Flask

  • No Web Development Knowledge Required: No HTML, CSS, or JavaScript needed
  • Built-in Components: Pre-built UI components for common data science tasks
  • Automatic State Management: No need to manually handle HTTP requests or sessions
  • Live Reloading: Changes are immediately reflected without manual server restarts

Compared to Shiny (R)

  • Python Ecosystem: Leverages the entire Python data science stack
  • Simpler Programming Model: More straightforward than Shiny’s reactive programming
  • Less Boilerplate: Requires less code for similar functionality
  • Better Performance: Generally faster for complex applications

System Requirements

Minimum Requirements

  • Python: 3.8 or higher
  • CPU: 2+ cores
  • RAM: 4GB+
  • Storage: 1GB+
  • Operating System: Windows, macOS, or Linux
  • Browser: Chrome, Firefox, Edge, or Safari (latest versions)

Recommended Requirements

  • Python: 3.10 or higher
  • CPU: 4+ cores
  • RAM: 8GB+
  • Storage: 5GB+
  • Operating System: Windows 10/11, macOS 12+, or Ubuntu 20.04+
  • Browser: Chrome or Firefox (latest versions)

Dependencies

Streamlit has several key dependencies:

  • NumPy and Pandas (data manipulation)
  • Pillow (image processing)
  • Tornado (web server)
  • Protobuf (data serialization)
  • Watchdog (file system monitoring)
  • Toml (configuration)
  • Python-dateutil (date utilities)
  • Typing-extensions (type hints)

Installation Guide

Basic Installation

The simplest way to install Streamlit is via pip:

pip install streamlit

To verify the installation and see a demo app:

streamlit hello

This will open a browser window with Streamlit’s demo application.

Installation in a Virtual Environment

For a more isolated environment, use a virtual environment:

# Create a virtual environment
python -m venv streamlit-env

# Activate the environment (Windows)
streamlit-env\Scripts\activate

# Activate the environment (macOS/Linux)
source streamlit-env/bin/activate

# Install Streamlit
pip install streamlit

Installation with Conda

If you’re using Anaconda or Miniconda:

# Create a new conda environment
conda create -n streamlit-env python=3.10

# Activate the environment
conda activate streamlit-env

# Install Streamlit
pip install streamlit

Development Installation

For contributing to Streamlit or using the latest development version:

# Clone the repository
git clone https://github.com/streamlit/streamlit.git
cd streamlit

# Install in development mode
pip install -e ".[development]"

Practical Exercise: Creating a Data Dashboard

Let’s build a simple but functional data dashboard using Streamlit. This example will demonstrate how to:

  1. Load and display data
  2. Add interactive filters
  3. Create visualizations
  4. Organize the layout

import streamlit as st
import pandas as pd
import numpy as np
import plotly.express as px

# Set page configuration
st.set_page_config(
    page_title="Sales Dashboard",
    page_icon="📊",
    layout="wide",
    initial_sidebar_state="expanded"
)

# Add a title and description
st.title("📊 Sales Dashboard")
st.markdown("An interactive dashboard to analyze sales data across regions and product categories.")

# Create some sample data
@st.cache_data
def load_data():
    # Generate random sales data
    np.random.seed(42)
    dates = pd.date_range(start="2024-01-01", end="2024-12-31", freq="D")
    regions = ["North", "South", "East", "West"]
    categories = ["Electronics", "Clothing", "Food", "Home Goods"]

    data = []
    for date in dates:
        for region in regions:
            for category in categories:
                sales = np.random.randint(100, 5000)
                profit = sales * np.random.uniform(0.1, 0.4)
                data.append({
                    "Date": date,
                    "Region": region,
                    "Category": category,
                    "Sales": sales,
                    "Profit": profit
                })

    return pd.DataFrame(data)

# Load the data
df = load_data()

# Add sidebar filters
st.sidebar.header("Filters")

# Date range filter
date_range = st.sidebar.date_input(
    "Select Date Range",
    value=[df["Date"].min().date(), df["Date"].max().date()],
    min_value=df["Date"].min().date(),
    max_value=df["Date"].max().date()
)

# Region filter
selected_regions = st.sidebar.multiselect(
    "Select Regions",
    options=df["Region"].unique(),
    default=df["Region"].unique()
)

# Category filter
selected_categories = st.sidebar.multiselect(
    "Select Categories",
    options=df["Category"].unique(),
    default=df["Category"].unique()
)

# Apply filters (st.date_input returns a single date until the
# second end of the range has been picked, so guard against that)
if len(date_range) == 2:
    start_date, end_date = date_range
else:
    start_date = end_date = date_range[0]

filtered_df = df[
    (df["Date"].dt.date >= start_date) &
    (df["Date"].dt.date <= end_date) &
    (df["Region"].isin(selected_regions)) &
    (df["Category"].isin(selected_categories))
]

# Key metrics
with st.container():
    st.subheader("Key Metrics")
    metric1, metric2, metric3, metric4 = st.columns(4)

    with metric1:
        total_sales = filtered_df["Sales"].sum()
        st.metric("Total Sales", f"${total_sales:,.2f}")

    with metric2:
        total_profit = filtered_df["Profit"].sum()
        st.metric("Total Profit", f"${total_profit:,.2f}")

    with metric3:
        # Guard against an empty filter selection (total_sales == 0)
        profit_margin = (total_profit / total_sales * 100) if total_sales else 0.0
        st.metric("Profit Margin", f"{profit_margin:.2f}%")

    with metric4:
        avg_daily_sales = filtered_df.groupby("Date")["Sales"].sum().mean()
        st.metric("Avg. Daily Sales", f"${avg_daily_sales:,.2f}")

# Create a two-column layout for the charts (below the metrics,
# since Streamlit renders elements in call order)
col1, col2 = st.columns(2)

# Sales by region chart
with col1:
    st.subheader("Sales by Region")
    region_sales = filtered_df.groupby("Region")["Sales"].sum().reset_index()
    fig_region = px.bar(
        region_sales,
        x="Region",
        y="Sales",
        color="Region",
        title="Total Sales by Region"
    )
    st.plotly_chart(fig_region, use_container_width=True)

# Sales by category chart
with col2:
    st.subheader("Sales by Category")
    category_sales = filtered_df.groupby("Category")["Sales"].sum().reset_index()
    fig_category = px.pie(
        category_sales,
        values="Sales",
        names="Category",
        title="Sales Distribution by Category"
    )
    st.plotly_chart(fig_category, use_container_width=True)

# Sales trend over time
st.subheader("Sales Trend Over Time")
time_series = filtered_df.groupby("Date")[["Sales", "Profit"]].sum().reset_index()
fig_time = px.line(
    time_series,
    x="Date",
    y=["Sales", "Profit"],
    title="Sales and Profit Trends",
    labels={"value": "Amount ($)", "variable": "Metric"}
)
st.plotly_chart(fig_time, use_container_width=True)

# Data table
with st.expander("View Detailed Data"):
    st.dataframe(filtered_df, use_container_width=True)

# Add download button
csv = filtered_df.to_csv(index=False).encode('utf-8')
st.download_button(
    label="Download Data as CSV",
    data=csv,
    file_name="sales_data.csv",
    mime="text/csv"
)

# Footer
st.markdown("---")
st.caption("Dashboard created with Streamlit • Data refreshes daily")

To run this dashboard, save the code to a file (e.g., sales_dashboard.py) and execute:

streamlit run sales_dashboard.py

Advanced Example: Machine Learning Model Explorer

Here’s a more advanced example that demonstrates how to create an interactive machine learning model explorer:

import streamlit as st
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris, load_wine, load_breast_cancer, load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import accuracy_score, r2_score, confusion_matrix
import plotly.express as px
import plotly.figure_factory as ff

# Page configuration
st.set_page_config(
    page_title="ML Model Explorer",
    page_icon="🤖",
    layout="wide"
)

# Title and description
st.title("🤖 Machine Learning Model Explorer")
st.markdown("""
This app allows you to explore different datasets and train a Random Forest model with customizable parameters.
Adjust the settings in the sidebar and see how the model performance changes.
""")

# Sidebar for dataset selection and model parameters
st.sidebar.header("Settings")

# Dataset selection
dataset_name = st.sidebar.selectbox(
    "Select Dataset",
    ("Iris", "Wine", "Breast Cancer", "Diabetes (Regression)")
)

# Load the selected dataset
@st.cache_data
def get_dataset(name):
    if name == "Iris":
        data = load_iris()
        task = "classification"
    elif name == "Wine":
        data = load_wine()
        task = "classification"
    elif name == "Breast Cancer":
        data = load_breast_cancer()
        task = "classification"
    else:  # Diabetes
        data = load_diabetes()
        task = "regression"

    X = pd.DataFrame(data.data, columns=data.feature_names)
    y = pd.Series(data.target, name="target")

    return X, y, task, data.target_names if task == "classification" else None

X, y, task, target_names = get_dataset(dataset_name)

# Display dataset info
st.sidebar.subheader("Dataset Information")
st.sidebar.write(f"Shape: {X.shape}")
st.sidebar.write(f"Task: {task.capitalize()}")

# Model parameters
st.sidebar.subheader("Model Parameters")

n_estimators = st.sidebar.slider("Number of trees", 10, 500, 100, 10)
max_depth = st.sidebar.slider("Maximum depth", 1, 30, 10)
min_samples_split = st.sidebar.slider("Minimum samples to split", 2, 20, 2)
test_size = st.sidebar.slider("Test size", 0.1, 0.5, 0.2, 0.05)
random_state = st.sidebar.slider("Random state", 0, 100, 42)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=test_size, random_state=random_state
)

# Train model based on task type. st.cache_resource is the idiomatic
# cache for fitted models (st.cache_data would pickle-copy the model
# on every rerun).
@st.cache_resource
def train_model(X_train, y_train, task, params):
    if task == "classification":
        model = RandomForestClassifier(**params)
    else:
        model = RandomForestRegressor(**params)

    model.fit(X_train, y_train)
    return model

model_params = {
    "n_estimators": n_estimators,
    "max_depth": max_depth,
    "min_samples_split": min_samples_split,
    "random_state": random_state
}

model = train_model(X_train, y_train, task, model_params)

# Make predictions
y_pred = model.predict(X_test)

# Calculate performance metrics
if task == "classification":
    accuracy = accuracy_score(y_test, y_pred)
    cm = confusion_matrix(y_test, y_pred)
    performance_metric = accuracy
    metric_name = "Accuracy"
else:
    r2 = r2_score(y_test, y_pred)
    performance_metric = r2
    metric_name = "R² Score"

# Main content
col1, col2 = st.columns([2, 1])

# Model performance
with col2:
    st.subheader("Model Performance")
    st.metric(metric_name, f"{performance_metric:.4f}")

    # Feature importance
    st.subheader("Feature Importance")
    feature_importance = pd.DataFrame({
        'Feature': X.columns,
        'Importance': model.feature_importances_
    }).sort_values('Importance', ascending=False)

    fig_importance = px.bar(
        feature_importance.head(10),
        x='Importance',
        y='Feature',
        orientation='h',
        title="Top 10 Features by Importance"
    )
    st.plotly_chart(fig_importance, use_container_width=True)

# Data visualization
with col1:
    st.subheader("Data Visualization")

    # Select features for visualization
    if X.shape[1] > 1:
        viz_features = st.multiselect(
            "Select features for visualization",
            options=X.columns.tolist(),
            default=X.columns.tolist()[:2]
        )

        if len(viz_features) >= 2:
            # Scatter plot: discrete colors for classes, continuous scale for regression targets
            if task == "classification":
                # Cast the integer class labels to strings so Plotly
                # assigns discrete colors rather than a continuous scale
                fig = px.scatter(
                    pd.concat([X, y.astype(str)], axis=1),
                    x=viz_features[0],
                    y=viz_features[1],
                    color=y.name,
                    title=f"{viz_features[0]} vs {viz_features[1]} by Class"
                )
            else:
                fig = px.scatter(
                    pd.concat([X, y], axis=1),
                    x=viz_features[0],
                    y=viz_features[1],
                    color=y.name,
                    color_continuous_scale="Viridis",
                    title=f"{viz_features[0]} vs {viz_features[1]} by Target Value"
                )

            st.plotly_chart(fig, use_container_width=True)

    # Confusion matrix for classification
    if task == "classification":
        st.subheader("Confusion Matrix")

        # Create confusion matrix heatmap, labeling axes with the class names
        fig_cm = ff.create_annotated_heatmap(
            z=cm,
            x=[f"Predicted {name}" for name in target_names],
            y=[f"Actual {name}" for name in target_names],
            colorscale="Blues"
        )
        fig_cm.update_layout(title="Confusion Matrix")
        st.plotly_chart(fig_cm, use_container_width=True)

    # Prediction vs Actual for regression
    if task == "regression":
        st.subheader("Prediction vs Actual")

        pred_vs_actual = pd.DataFrame({
            'Actual': y_test,
            'Predicted': y_pred
        })

        fig_reg = px.scatter(
            pred_vs_actual,
            x='Actual',
            y='Predicted',
            title="Predicted vs Actual Values"
        )

        # Add perfect prediction line
        min_val = min(pred_vs_actual['Actual'].min(), pred_vs_actual['Predicted'].min())
        max_val = max(pred_vs_actual['Actual'].max(), pred_vs_actual['Predicted'].max())
        fig_reg.add_shape(
            type="line",
            x0=min_val,
            y0=min_val,
            x1=max_val,
            y1=max_val,
            line=dict(color="red", dash="dash")
        )

        st.plotly_chart(fig_reg, use_container_width=True)

# Raw data
with st.expander("View Raw Data"):
    st.dataframe(pd.concat([X, y], axis=1), use_container_width=True)

# Footer
st.markdown("---")
st.caption("Machine Learning Model Explorer • Built with Streamlit")

To run this ML explorer, save the code to a file (e.g., ml_explorer.py) and execute:

streamlit run ml_explorer.py

Resources