Web Development

Streamlit

An open-source Python library that transforms data scripts into shareable web applications in minutes, with no front-end experience required.

Tags: Beginner · UI · Data Visualization · Python · Dashboard · Deployment

Alternative To

  • Gradio
  • Dash
  • Flask
  • Shiny

Difficulty Level

Beginner

Suitable for users with basic technical knowledge. Easy to set up and use.

Overview

Streamlit is an open-source Python framework that enables data scientists and machine learning engineers to create interactive web applications directly from Python scripts. With its simple, declarative approach to UI development, Streamlit eliminates the need for front-end expertise, allowing developers to transform data scripts into fully functional web applications in minutes rather than days or weeks.

The framework is designed with a “script runs from top to bottom” philosophy, making the development process intuitive for Python users. As you write and save your code, Streamlit automatically updates the web application, creating a rapid development cycle that accelerates prototyping and deployment. This approach has made Streamlit particularly popular among data professionals who want to quickly share insights, build dashboards, or create interactive demonstrations of machine learning models without getting bogged down in web development complexities.
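As a minimal illustration of this model, consider the sketch below (a hypothetical app.py, not taken from the Streamlit docs). The entire app is one short script; moving the slider reruns it from the top and redraws the output:

import streamlit as st

st.title("Hello, Streamlit")

# The script reruns top to bottom on every interaction,
# so the slider's current value is simply read on each run.
n = st.slider("Pick a number", 1, 10, 5)
st.write(f"{n} squared is {n ** 2}")

Run it with streamlit run app.py and the app opens in the browser.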

Key Features

| Feature | Description |
| --- | --- |
| Pure Python Development | Build complete web applications using only Python, without HTML, CSS, or JavaScript |
| Live Reloading | Changes to your script are automatically reflected in the app when you save the file |
| Caching System | Built-in caching mechanism to optimize performance for data-heavy applications |
| Rich Widget Library | Extensive collection of UI components including sliders, buttons, selectboxes, and more |
| Data Visualization | Native support for various plotting libraries (Matplotlib, Plotly, Altair, etc.) |
| Layout Options | Flexible layout capabilities with columns, containers, expanders, and sidebars |
| File Uploading/Downloading | Easy handling of file uploads and downloads |
| Session State | Persistent state management across reruns |
| Theming | Customizable themes and appearance |
| Component Ecosystem | Extensible with custom components from the community |
| Cloud Deployment | Free hosting for public apps via Streamlit Community Cloud |
| Authentication | User authentication capabilities for secure applications |
| Multipage Apps | Support for building applications with multiple pages |
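As an example of the last row, multipage apps follow a simple directory convention: scripts placed in a pages/ folder next to the main script each become a page, with navigation generated automatically in the sidebar. A hypothetical layout:

streamlit_app.py        # main page; run with: streamlit run streamlit_app.py
pages/
    1_Dashboard.py      # shows up as "Dashboard" in the sidebar
    2_Data_Explorer.py  # shows up as "Data Explorer"

Numeric prefixes control page order and are stripped from the sidebar labels.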

Technical Details

Architecture

Streamlit follows a unique architecture that differs from traditional web frameworks (a short code sketch follows the list):

  1. Script Execution Model:

    • The entire Python script is re-executed from top to bottom on each interaction
    • This creates a reactive programming model without explicit event handlers
    • Changes to widgets trigger script reruns, updating the UI automatically
  2. Server Components:

    • Tornado web server handles HTTP requests
    • WebSocket connections maintain real-time communication between browser and server
    • Server-side caching system optimizes performance for expensive computations
  3. Frontend:

    • React-based frontend renders the UI components
    • Components are automatically generated from Python function calls
    • Bidirectional data flow between Python backend and JavaScript frontend
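Here is a sketch of how the rerun model interacts with session state: because the whole script re-executes on each interaction, any value that must survive reruns belongs in st.session_state rather than in a plain variable. A hypothetical counter app:

import streamlit as st

# Plain variables are reset on every rerun; session_state persists.
if "clicks" not in st.session_state:
    st.session_state.clicks = 0

# st.button returns True only on the rerun its click triggered.
if st.button("Click me"):
    st.session_state.clicks += 1

st.write(f"Button clicked {st.session_state.clicks} times")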

Version Information

The current stable version of Streamlit is 1.42.0 (as of March 2025), which requires Python 3.8 or higher. Streamlit has evolved significantly since its initial public release:

| Version | Release Date | Major Features |
| --- | --- | --- |
| 1.42.0 | February 2025 | Authentication, dataframe improvements |
| 1.30.0 | August 2024 | Multipage apps, improved performance |
| 1.20.0 | March 2024 | Enhanced theming, better mobile support |
| 1.10.0 | September 2023 | Session state improvements, new widgets |
| 1.0.0 | October 2021 | First stable release with production-ready features |
| 0.1.0 | October 2019 | Initial public release |

Component System

Streamlit’s component system is designed for simplicity and expressiveness:

  • Core Widgets: Text inputs, buttons, sliders, checkboxes, radio buttons, selectboxes
  • Media Components: Image, video, audio display and capture
  • Data Display: Dataframes, tables, metrics, JSON viewers
  • Layout Elements: Columns, containers, expanders, tabs, sidebars
  • Status Elements: Progress bars, spinners, balloons, success/error/info messages
  • Chart Components: Native support for various plotting libraries

Each component is invoked as a simple Python function call, with the state automatically managed by Streamlit’s reactive execution model.
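For instance, a few of these components composed together (a brief sketch; the data and labels are made up):

import streamlit as st
import pandas as pd

st.sidebar.selectbox("Theme", ["Light", "Dark"])    # core widget in the sidebar

col1, col2 = st.columns(2)                          # layout: two columns
with col1:
    st.metric("Revenue", "$12,340", delta="+4.2%")  # data display
with col2:
    st.progress(0.72)                               # status element

with st.expander("Show raw data"):                  # collapsible layout element
    st.dataframe(pd.DataFrame({"a": [1, 2], "b": [3, 4]}))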

Why Use Streamlit

Compared to Gradio

  • Broader Focus: While Gradio excels at ML model demos, Streamlit is designed for general data applications
  • More Layout Control: Offers more sophisticated layout options for complex dashboards
  • Richer Ecosystem: Larger community with more components and extensions
  • Data-Centric: Better optimized for data exploration and visualization workflows

Compared to Dash

  • Simpler Learning Curve: No need to understand callback patterns or React concepts
  • Faster Development: Significantly less code required for basic applications
  • Automatic Reactivity: No need to manually define callbacks for interactivity (see the sketch after this list)
  • Easier Deployment: Streamlit Community Cloud offers free, one-click deployment
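To make the reactivity point concrete, the Streamlit version of a "slider drives a chart" page needs no callback at all (a sketch; in Dash, the same behavior requires an explicit @app.callback wiring inputs to outputs):

import streamlit as st

# No callback registration: moving the slider reruns the script,
# and the chart is rebuilt with the new value.
points = st.slider("Number of points", 10, 1000, 100)
st.line_chart([i ** 0.5 for i in range(points)])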

Compared to Flask

  • No Web Development Knowledge Required: No HTML, CSS, or JavaScript needed
  • Built-in Components: Pre-built UI components for common data science tasks
  • Automatic State Management: No need to manually handle HTTP requests or sessions
  • Live Reloading: Changes are immediately reflected without manual server restarts

Compared to Shiny (R)

  • Python Ecosystem: Leverages the entire Python data science stack
  • Simpler Programming Model: More straightforward than Shiny’s reactive programming
  • Less Boilerplate: Requires less code for similar functionality
  • Better Performance: Generally faster for complex applications

System Requirements

Minimum Requirements

  • Python: 3.8 or higher
  • CPU: 2+ cores
  • RAM: 4GB+
  • Storage: 1GB+
  • Operating System: Windows, macOS, or Linux
  • Browser: Chrome, Firefox, Edge, or Safari (latest versions)

Recommended Requirements

  • Python: 3.10 or higher
  • CPU: 4+ cores
  • RAM: 8GB+
  • Storage: 5GB+
  • Operating System: Windows 10/11, macOS 12+, or Ubuntu 20.04+
  • Browser: Chrome or Firefox (latest versions)

Dependencies

Streamlit has several key dependencies:

  • NumPy and Pandas (data manipulation)
  • Pillow (image processing)
  • Tornado (web server)
  • Protobuf (data serialization)
  • Watchdog (file system monitoring)
  • Toml (configuration)
  • Python-dateutil (date utilities)
  • Typing-extensions (type hints)

Installation Guide

Basic Installation

The simplest way to install Streamlit is via pip:

pip install streamlit

To verify the installation and see a demo app:

streamlit hello

This will open a browser window with Streamlit’s demo application.

Installation in a Virtual Environment

For a more isolated environment, use a virtual environment:

# Create a virtual environment
python -m venv streamlit-env

# Activate the environment (Windows)
streamlit-env\Scripts\activate

# Activate the environment (macOS/Linux)
source streamlit-env/bin/activate

# Install Streamlit
pip install streamlit

Installation with Conda

If you’re using Anaconda or Miniconda:

# Create a new conda environment
conda create -n streamlit-env python=3.10

# Activate the environment
conda activate streamlit-env

# Install Streamlit
pip install streamlit

Development Installation

For contributing to Streamlit or using the latest development version:

# Clone the repository
git clone https://github.com/streamlit/streamlit.git
cd streamlit

# Install in development mode
pip install -e ".[development]"

Practical Exercise: Creating a Data Dashboard

Let’s build a simple but functional data dashboard using Streamlit. This example will demonstrate how to:

  1. Load and display data
  2. Add interactive filters
  3. Create visualizations
  4. Organize the layout

import streamlit as st
import pandas as pd
import numpy as np
import plotly.express as px

# Set page configuration
st.set_page_config(
    page_title="Sales Dashboard",
    page_icon="📊",
    layout="wide",
    initial_sidebar_state="expanded"
)

# Add a title and description
st.title("📊 Sales Dashboard")
st.markdown("An interactive dashboard to analyze sales data across regions and product categories.")

# Create some sample data
@st.cache_data
def load_data():
    # Generate random sales data
    np.random.seed(42)
    dates = pd.date_range(start="2024-01-01", end="2024-12-31", freq="D")
    regions = ["North", "South", "East", "West"]
    categories = ["Electronics", "Clothing", "Food", "Home Goods"]

    data = []
    for date in dates:
        for region in regions:
            for category in categories:
                sales = np.random.randint(100, 5000)
                profit = sales * np.random.uniform(0.1, 0.4)
                data.append({
                    "Date": date,
                    "Region": region,
                    "Category": category,
                    "Sales": sales,
                    "Profit": profit
                })

    return pd.DataFrame(data)

# Load the data
df = load_data()

# Add sidebar filters
st.sidebar.header("Filters")

# Date range filter
date_range = st.sidebar.date_input(
    "Select Date Range",
    value=[df["Date"].min().date(), df["Date"].max().date()],
    min_value=df["Date"].min().date(),
    max_value=df["Date"].max().date()
)

# Region filter
selected_regions = st.sidebar.multiselect(
    "Select Regions",
    options=df["Region"].unique(),
    default=df["Region"].unique()
)

# Category filter
selected_categories = st.sidebar.multiselect(
    "Select Categories",
    options=df["Category"].unique(),
    default=df["Category"].unique()
)

# Apply filters (st.date_input returns a single date until the
# second end of the range has been picked, so guard against that)
if len(date_range) == 2:
    start_date, end_date = date_range
else:
    start_date = end_date = date_range[0]

filtered_df = df[
    (df["Date"].dt.date >= start_date) &
    (df["Date"].dt.date <= end_date) &
    (df["Region"].isin(selected_regions)) &
    (df["Category"].isin(selected_categories))
]

# Key metrics
with st.container():
    st.subheader("Key Metrics")
    metric1, metric2, metric3, metric4 = st.columns(4)

    with metric1:
        total_sales = filtered_df["Sales"].sum()
        st.metric("Total Sales", f"${total_sales:,.2f}")

    with metric2:
        total_profit = filtered_df["Profit"].sum()
        st.metric("Total Profit", f"${total_profit:,.2f}")

    with metric3:
        # Guard against an empty filter selection (total_sales == 0)
        profit_margin = (total_profit / total_sales * 100) if total_sales else 0.0
        st.metric("Profit Margin", f"{profit_margin:.2f}%")

    with metric4:
        avg_daily_sales = filtered_df.groupby("Date")["Sales"].sum().mean()
        st.metric("Avg. Daily Sales", f"${avg_daily_sales:,.2f}")

# Create a two-column layout for the charts (below the metrics,
# since Streamlit renders elements in call order)
col1, col2 = st.columns(2)

# Sales by region chart
with col1:
    st.subheader("Sales by Region")
    region_sales = filtered_df.groupby("Region")["Sales"].sum().reset_index()
    fig_region = px.bar(
        region_sales,
        x="Region",
        y="Sales",
        color="Region",
        title="Total Sales by Region"
    )
    st.plotly_chart(fig_region, use_container_width=True)

# Sales by category chart
with col2:
    st.subheader("Sales by Category")
    category_sales = filtered_df.groupby("Category")["Sales"].sum().reset_index()
    fig_category = px.pie(
        category_sales,
        values="Sales",
        names="Category",
        title="Sales Distribution by Category"
    )
    st.plotly_chart(fig_category, use_container_width=True)

# Sales trend over time
st.subheader("Sales Trend Over Time")
time_series = filtered_df.groupby("Date")[["Sales", "Profit"]].sum().reset_index()
fig_time = px.line(
    time_series,
    x="Date",
    y=["Sales", "Profit"],
    title="Sales and Profit Trends",
    labels={"value": "Amount ($)", "variable": "Metric"}
)
st.plotly_chart(fig_time, use_container_width=True)

# Data table
with st.expander("View Detailed Data"):
    st.dataframe(filtered_df, use_container_width=True)

# Add download button
csv = filtered_df.to_csv(index=False).encode('utf-8')
st.download_button(
    label="Download Data as CSV",
    data=csv,
    file_name="sales_data.csv",
    mime="text/csv"
)

# Footer
st.markdown("---")
st.caption("Dashboard created with Streamlit • Data refreshes daily")

To run this dashboard, save the code to a file (e.g., sales_dashboard.py) and execute:

streamlit run sales_dashboard.py

Advanced Example: Machine Learning Model Explorer

Here’s a more advanced example that demonstrates how to create an interactive machine learning model explorer:

import streamlit as st
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris, load_wine, load_breast_cancer, load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import accuracy_score, r2_score, confusion_matrix
import plotly.express as px
import plotly.figure_factory as ff

# Page configuration
st.set_page_config(
    page_title="ML Model Explorer",
    page_icon="🤖",
    layout="wide"
)

# Title and description
st.title("🤖 Machine Learning Model Explorer")
st.markdown("""
This app allows you to explore different datasets and train a Random Forest model with customizable parameters.
Adjust the settings in the sidebar and see how the model performance changes.
""")

# Sidebar for dataset selection and model parameters
st.sidebar.header("Settings")

# Dataset selection
dataset_name = st.sidebar.selectbox(
    "Select Dataset",
    ("Iris", "Wine", "Breast Cancer", "Diabetes (Regression)")
)

# Load the selected dataset
@st.cache_data
def get_dataset(name):
    if name == "Iris":
        data = load_iris()
        task = "classification"
    elif name == "Wine":
        data = load_wine()
        task = "classification"
    elif name == "Breast Cancer":
        data = load_breast_cancer()
        task = "classification"
    else:  # Diabetes
        data = load_diabetes()
        task = "regression"

    X = pd.DataFrame(data.data, columns=data.feature_names)
    y = pd.Series(data.target, name="target")

    return X, y, task, data.target_names if task == "classification" else None

X, y, task, target_names = get_dataset(dataset_name)

# Display dataset info
st.sidebar.subheader("Dataset Information")
st.sidebar.write(f"Shape: {X.shape}")
st.sidebar.write(f"Task: {task.capitalize()}")

# Model parameters
st.sidebar.subheader("Model Parameters")

n_estimators = st.sidebar.slider("Number of trees", 10, 500, 100, 10)
max_depth = st.sidebar.slider("Maximum depth", 1, 30, 10)
min_samples_split = st.sidebar.slider("Minimum samples to split", 2, 20, 2)
test_size = st.sidebar.slider("Test size", 0.1, 0.5, 0.2, 0.05)
random_state = st.sidebar.slider("Random state", 0, 100, 42)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=test_size, random_state=random_state
)

# Train model based on task type. st.cache_resource is the idiomatic
# cache for fitted models (st.cache_data would pickle-copy the model
# on every rerun).
@st.cache_resource
def train_model(X_train, y_train, task, params):
    if task == "classification":
        model = RandomForestClassifier(**params)
    else:
        model = RandomForestRegressor(**params)

    model.fit(X_train, y_train)
    return model

model_params = {
    "n_estimators": n_estimators,
    "max_depth": max_depth,
    "min_samples_split": min_samples_split,
    "random_state": random_state
}

model = train_model(X_train, y_train, task, model_params)

# Make predictions
y_pred = model.predict(X_test)

# Calculate performance metrics
if task == "classification":
    accuracy = accuracy_score(y_test, y_pred)
    cm = confusion_matrix(y_test, y_pred)
    performance_metric = accuracy
    metric_name = "Accuracy"
else:
    r2 = r2_score(y_test, y_pred)
    performance_metric = r2
    metric_name = "R² Score"

# Main content
col1, col2 = st.columns([2, 1])

# Model performance
with col2:
    st.subheader("Model Performance")
    st.metric(metric_name, f"{performance_metric:.4f}")

    # Feature importance
    st.subheader("Feature Importance")
    feature_importance = pd.DataFrame({
        'Feature': X.columns,
        'Importance': model.feature_importances_
    }).sort_values('Importance', ascending=False)

    fig_importance = px.bar(
        feature_importance.head(10),
        x='Importance',
        y='Feature',
        orientation='h',
        title="Top 10 Features by Importance"
    )
    st.plotly_chart(fig_importance, use_container_width=True)

# Data visualization
with col1:
    st.subheader("Data Visualization")

    # Select features for visualization
    if X.shape[1] > 1:
        viz_features = st.multiselect(
            "Select features for visualization",
            options=X.columns.tolist(),
            default=X.columns.tolist()[:2]
        )

        if len(viz_features) >= 2:
            # Scatter plot: discrete colors for classes, continuous scale for regression targets
            if task == "classification":
                # Cast the integer class labels to strings so Plotly
                # assigns discrete colors rather than a continuous scale
                fig = px.scatter(
                    pd.concat([X, y.astype(str)], axis=1),
                    x=viz_features[0],
                    y=viz_features[1],
                    color=y.name,
                    title=f"{viz_features[0]} vs {viz_features[1]} by Class"
                )
            else:
                fig = px.scatter(
                    pd.concat([X, y], axis=1),
                    x=viz_features[0],
                    y=viz_features[1],
                    color=y.name,
                    color_continuous_scale="Viridis",
                    title=f"{viz_features[0]} vs {viz_features[1]} by Target Value"
                )

            st.plotly_chart(fig, use_container_width=True)

    # Confusion matrix for classification
    if task == "classification":
        st.subheader("Confusion Matrix")

        # Create confusion matrix heatmap, labeling axes with the class names
        fig_cm = ff.create_annotated_heatmap(
            z=cm,
            x=[f"Predicted {name}" for name in target_names],
            y=[f"Actual {name}" for name in target_names],
            colorscale="Blues"
        )
        fig_cm.update_layout(title="Confusion Matrix")
        st.plotly_chart(fig_cm, use_container_width=True)

    # Prediction vs Actual for regression
    if task == "regression":
        st.subheader("Prediction vs Actual")

        pred_vs_actual = pd.DataFrame({
            'Actual': y_test,
            'Predicted': y_pred
        })

        fig_reg = px.scatter(
            pred_vs_actual,
            x='Actual',
            y='Predicted',
            title="Predicted vs Actual Values"
        )

        # Add perfect prediction line
        min_val = min(pred_vs_actual['Actual'].min(), pred_vs_actual['Predicted'].min())
        max_val = max(pred_vs_actual['Actual'].max(), pred_vs_actual['Predicted'].max())
        fig_reg.add_shape(
            type="line",
            x0=min_val,
            y0=min_val,
            x1=max_val,
            y1=max_val,
            line=dict(color="red", dash="dash")
        )

        st.plotly_chart(fig_reg, use_container_width=True)

# Raw data
with st.expander("View Raw Data"):
    st.dataframe(pd.concat([X, y], axis=1), use_container_width=True)

# Footer
st.markdown("---")
st.caption("Machine Learning Model Explorer • Built with Streamlit")

To run this ML explorer, save the code to a file (e.g., ml_explorer.py) and execute:

streamlit run ml_explorer.py

Resources