Streamlit
An open-source Python library that transforms data scripts into shareable web applications in minutes, with no front-end experience required.
Alternative To
- Gradio
- Dash
- Flask
- Shiny
Difficulty Level
Suitable for users with basic Python knowledge. Easy to set up and use.
Overview
Streamlit is an open-source Python framework that enables data scientists and machine learning engineers to create interactive web applications directly from Python scripts. With its simple, declarative approach to UI development, Streamlit eliminates the need for front-end expertise, allowing developers to transform data scripts into fully functional web applications in minutes rather than days or weeks.
The framework is designed with a “script runs from top to bottom” philosophy, making the development process intuitive for Python users. When you save changes to your script, Streamlit detects them and offers to rerun the app (or reruns it automatically once you select “Always rerun”), creating a rapid development cycle that accelerates prototyping and deployment. This approach has made Streamlit particularly popular among data professionals who want to quickly share insights, build dashboards, or create interactive demonstrations of machine learning models without getting bogged down in web development complexities.
Key Features
Feature | Description |
---|---|
Pure Python Development | Build complete web applications using only Python, without HTML, CSS, or JavaScript |
Live Reloading | Saving the script prompts the app to rerun, or reruns it automatically once “Always rerun” is enabled |
Caching System | Built-in caching mechanism to optimize performance for data-heavy applications |
Rich Widget Library | Extensive collection of UI components including sliders, buttons, selectboxes, and more |
Data Visualization | Native support for various plotting libraries (Matplotlib, Plotly, Altair, etc.) |
Layout Options | Flexible layout capabilities with columns, containers, expanders, and sidebars |
File Uploading/Downloading | Easy handling of file uploads and downloads |
Session State | Per-session state (st.session_state) that persists across reruns |
Theming | Customizable themes and appearance |
Component Ecosystem | Extensible with custom components from the community |
Cloud Deployment | Free hosting for public apps via Streamlit Community Cloud |
Authentication | Native user login via OpenID Connect providers (st.login, st.logout) |
Multipage Apps | Support for building applications with multiple pages |
Technical Details
Architecture
Streamlit's architecture differs from that of traditional web frameworks:
Script Execution Model:
- The entire Python script is re-executed from top to bottom on each interaction
- This creates a reactive programming model without explicit event handlers
- Changes to widgets trigger script reruns, updating the UI automatically
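The rerun model can be illustrated without Streamlit at all. The toy simulation below is plain Python: app and ToyRunner are hypothetical names, and this is not how Streamlit is implemented internally. It treats the script as a function that is re-executed from the top every time a widget value changes, so no event handlers are needed.

```python
from __future__ import annotations

from typing import Any, Callable


def app(widgets: dict[str, Any]) -> list[str]:
    """Toy 'script': runs top to bottom and returns the lines it would render."""
    rendered = []
    # In Streamlit this value would come from a widget such as st.text_input("Name")
    name = widgets.get("name", "")
    rendered.append(f"Hello, {name or 'stranger'}!")
    # In Streamlit this value would come from a widget such as st.slider("n", 1, 10)
    n = widgets.get("n", 1)
    rendered.append(f"{n} squared is {n * n}")
    return rendered


class ToyRunner:
    """Re-executes the whole script whenever a widget value changes."""

    def __init__(self, script: Callable[[dict[str, Any]], list[str]]):
        self.script = script
        self.widgets: dict[str, Any] = {}
        self.runs = 0
        self.rendered = self._rerun()  # the initial page load is also a full run

    def _rerun(self) -> list[str]:
        self.runs += 1
        # Pass a copy so the script cannot mutate the stored widget values
        return self.script(dict(self.widgets))

    def interact(self, key: str, value: Any) -> list[str]:
        """Simulate a user changing a widget: store the value, then rerun everything."""
        self.widgets[key] = value
        self.rendered = self._rerun()
        return self.rendered


runner = ToyRunner(app)
runner.interact("n", 4)
runner.interact("name", "Ada")
print(runner.runs, runner.rendered)  # 3 ['Hello, Ada!', '4 squared is 16']
```

Note that nothing in app registers a callback: every output is recomputed from the current widget values on each run, which is what makes the model easy to reason about.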
Server Components:
- Tornado web server handles HTTP requests
- WebSocket connections maintain real-time communication between browser and server
- Server-side caching system optimizes performance for expensive computations
Frontend:
- React-based frontend renders the UI components
- Each Streamlit call is serialized as a Protobuf message and sent over the WebSocket, and the frontend renders the corresponding component
- Bidirectional data flow between Python backend and JavaScript frontend
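The server-side caching mentioned above is exposed through the st.cache_data and st.cache_resource decorators. Roughly, st.cache_data keys the cache on the function's arguments, stores the result in serialized form, and hands each caller a fresh copy, while st.cache_resource returns the same shared object (suited to ML models or database connections). The standard-library sketch below approximates the st.cache_data behavior only; toy_cache_data is a hypothetical name, and keying on pickled arguments is a simplification (Streamlit uses its own hashing).

```python
import functools
import pickle


def toy_cache_data(func):
    """Rough approximation of st.cache_data semantics; not Streamlit's implementation."""
    store: dict[bytes, bytes] = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Key on the arguments (pickling them is a simplification of Streamlit's hasher)
        key = pickle.dumps((args, sorted(kwargs.items())))
        if key not in store:
            # Cache miss: run the function and keep a serialized copy of the result
            store[key] = pickle.dumps(func(*args, **kwargs))
        # Deserialize on every call so each caller gets an independent copy
        return pickle.loads(store[key])

    return wrapper


calls = []


@toy_cache_data
def load_rows(n):
    calls.append(n)  # records real executions, i.e. cache misses
    return list(range(n))


first = load_rows(3)
first.append(99)  # mutating the returned copy does not touch the cache
print(load_rows(3), calls)  # [0, 1, 2] [3]
```

Returning copies is why mutating a cached DataFrame inside one session cannot corrupt what other reruns or sessions see; the trade-off is the serialization cost on every call, which is why large unpicklable or shared objects belong in st.cache_resource instead.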
Version Information
Streamlit 1.42.0, released in February 2025, requires Python 3.9 or higher. Streamlit has evolved significantly since its public launch in October 2019; selected milestones:
Version | Release Date | Major Features |
---|---|---|
1.42.0 | February 2025 | Native authentication (st.login, st.logout), dataframe improvements |
1.36.0 | June 2024 | st.navigation and st.Page for defining multipage apps in code |
1.18.0 | February 2023 | st.cache_data and st.cache_resource caching decorators |
1.10.0 | June 2022 | Multipage apps via a pages/ directory |
1.0.0 | October 2021 | First 1.x release |
0.84.0 | July 2021 | Session State (st.session_state) |
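Because some features only exist from a given release onward (for example, the authentication support listed for 1.42.0), it can help to check the installed version before relying on them. The following is a minimal standard-library sketch; installed_version, parse_version, and is_at_least are hypothetical helpers, and the simple numeric parsing ignores most PEP 440 rules (pre-releases, for instance), which the packaging library's Version class handles properly. Comparing parsed tuples rather than raw strings matters, because as text "1.9.3" sorts after "1.10.0".

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "streamlit") -> str | None:
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def parse_version(text: str) -> tuple[int, ...]:
    """Parse the leading numeric release segment, e.g. '1.43.0rc1' -> (1, 43, 0)."""
    parts: list[int] = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):  # a suffix such as 'rc1' ends the release segment
            break
    return tuple(parts)


def is_at_least(current: str, minimum: str) -> bool:
    """Compare release segments numerically, padding with zeros so '1.42' == '1.42.0'."""
    a, b = parse_version(current), parse_version(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


current = installed_version()
if current is None:
    print("Streamlit is not installed")
elif is_at_least(current, "1.42.0"):
    print(f"Streamlit {current}: st.login is available")
else:
    print(f"Streamlit {current}: upgrade to 1.42.0 or later for st.login")
```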
Component System
Streamlit’s component system is designed for simplicity and expressiveness:
- Core Widgets: Text inputs, buttons, sliders, checkboxes, radio buttons, selectboxes
- Media Components: Image, video, audio display and capture
- Data Display: Dataframes, tables, metrics, JSON viewers
- Layout Elements: Columns, containers, expanders, tabs, sidebars
- Status Elements: Progress bars, spinners, balloons, success/error/info messages
- Chart Components: Native support for various plotting libraries
Each component is invoked as a plain Python function call. Widgets return their current value, and Streamlit keeps that value across reruns as part of its reactive execution model.
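A consequence of the rerun model is that ordinary local variables are reset on every rerun, while values stored in st.session_state survive. The plain-Python simulation below passes a dict in place of st.session_state and a boolean in place of the value st.button returns (True only on the rerun triggered by the click); counter_script is a hypothetical name, and the comments show the equivalent Streamlit lines.

```python
from __future__ import annotations


def counter_script(session_state: dict, button_clicked: bool) -> tuple[int, int]:
    """One rerun of a toy counter app; returns (local_count, persisted_count)."""
    local_count = 0  # plain variable: recreated from scratch on every rerun
    # Streamlit: if "clicks" not in st.session_state: st.session_state.clicks = 0
    if "clicks" not in session_state:
        session_state["clicks"] = 0
    # Streamlit: if st.button("Click me"): st.session_state.clicks += 1
    if button_clicked:
        local_count += 1
        session_state["clicks"] += 1
    return local_count, session_state["clicks"]


state: dict = {}  # survives between reruns, like st.session_state
for clicked in (False, True, True, True):
    print(counter_script(state, clicked))
# prints (0, 0), then (1, 1), (1, 2), (1, 3) on separate lines
```

The local counter never exceeds 1 because each click starts a fresh run, which is the usual symptom when app state is kept in ordinary variables instead of session state.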
Why Use Streamlit
Compared to Gradio
- Broader Focus: While Gradio excels at ML model demos, Streamlit is designed for general data applications
- More Layout Control: Offers more sophisticated layout options for complex dashboards
- Component Ecosystem: A large community and a directory of third-party components and extensions
- Data-Centric: Better optimized for data exploration and visualization workflows
Compared to Dash
- Simpler Learning Curve: No need to understand callback patterns or React concepts
- Faster Development: Significantly less code required for basic applications
- Automatic Reactivity: No need to manually define callbacks for interactivity
- Easier Deployment: Streamlit Community Cloud offers free, one-click deployment
Compared to Flask
- No Web Development Knowledge Required: No HTML, CSS, or JavaScript needed
- Built-in Components: Pre-built UI components for common data science tasks
- Automatic State Management: No need to manually handle HTTP requests or sessions
- Live Reloading: Changes are immediately reflected without manual server restarts
Compared to Shiny (R)
- Python Ecosystem: Leverages the entire Python data science stack
- Simpler Programming Model: More straightforward than Shiny’s reactive programming
- Less Boilerplate: Requires less code for similar functionality
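- Performance: Caching (st.cache_data, st.cache_resource) keeps full-script reruns fast, although Shiny's fine-grained reactivity can recompute less per interaction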
System Requirements
Minimum Requirements
- Python: 3.9 or higher
- CPU: 2+ cores
- RAM: 4GB+
- Storage: 1GB+
- Operating System: Windows, macOS, or Linux
- Browser: Chrome, Firefox, Edge, or Safari (latest versions)
Recommended for Data-Intensive Applications
- Python: 3.10 or higher
- CPU: 4+ cores
- RAM: 8GB+
- Storage: 5GB+
- Operating System: Windows 10/11, macOS 12+, or Ubuntu 20.04+
- Browser: Chrome or Firefox (latest versions)
Dependencies
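Streamlit's key dependencies include:
- NumPy and Pandas (data manipulation)
- PyArrow (dataframe serialization)
- Altair (built-in charts)
- Pillow (image processing)
- Tornado (web server)
- Protobuf (data serialization)
- Click (command-line interface)
- Watchdog (file system monitoring; not installed automatically on macOS, where installing it separately speeds up change detection)
- Toml (configuration)
- Python-dateutil (date utilities)
- Typing-extensions (type hints)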
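The exact dependency list changes between releases, so the installed package's own metadata is the authoritative source. A minimal sketch that reads it with the standard library's importlib.metadata follows; core_requirements, drop_extras, and requirement_name are hypothetical helpers.

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, requires


def core_requirements(dist_name: str = "streamlit") -> list[str] | None:
    """Return the non-optional requirement strings of an installed distribution."""
    try:
        reqs = requires(dist_name) or []  # requires() returns None if nothing is declared
    except PackageNotFoundError:
        return None
    return drop_extras(reqs)


def drop_extras(reqs: list[str]) -> list[str]:
    """Drop requirements that only apply to an optional extra (marker 'extra == ...')."""
    return [r for r in reqs if not re.search(r"extra\s*==", r)]


def requirement_name(req: str) -> str:
    """Extract the project name from a requirement string such as 'numpy<3,>=1.23'."""
    match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", req)
    return match.group() if match else req


reqs = core_requirements()
if reqs is None:
    print("Streamlit is not installed")
else:
    print(sorted(requirement_name(r) for r in reqs))
```

Platform markers (such as the macOS exclusion for Watchdog) stay attached to each requirement string, so printing the full strings instead of just the names shows which dependencies are conditional.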
Installation Guide
Basic Installation
The simplest way to install Streamlit is via pip:
```bash
pip install streamlit
```
To verify the installation and see a demo app:
```bash
streamlit hello
```
This will open a browser window with Streamlit’s demo application.
Installation in a Virtual Environment
For a more isolated environment, use a virtual environment:
```bash
# Create a virtual environment
python -m venv streamlit-env
# Activate the environment (Windows)
streamlit-env\Scripts\activate
# Activate the environment (macOS/Linux)
source streamlit-env/bin/activate
# Install Streamlit
pip install streamlit
```
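Before running pip install, it can be worth confirming that the virtual environment is actually active. Inside a venv, sys.prefix points at the environment while sys.base_prefix still points at the base Python installation, and the sketch below relies on that (in_virtualenv is a hypothetical helper). Conda environments do not follow this convention; for those, check the CONDA_PREFIX environment variable that conda activate sets.

```python
from __future__ import annotations

import sys


def in_virtualenv(prefix: str | None = None, base_prefix: str | None = None) -> bool:
    """True when running inside a venv, i.e. sys.prefix differs from sys.base_prefix."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


if in_virtualenv():
    print(f"Virtual environment active: {sys.prefix}")
else:
    print("No virtual environment active; activate streamlit-env first")
```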
Installation with Conda
If you’re using Anaconda or Miniconda:
```bash
# Create a new conda environment
conda create -n streamlit-env python=3.10
# Activate the environment
conda activate streamlit-env
# Install Streamlit
pip install streamlit
```
Development Installation
For contributing to Streamlit or using the latest development version:
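```bash
# Clone the repository
git clone https://github.com/streamlit/streamlit.git
cd streamlit
# The Python package lives in lib/ and the React frontend must be built separately;
# follow the repository's contributing guide for the current development setup
```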
Practical Exercise: Creating a Data Dashboard
Let’s build a simple but functional data dashboard using Streamlit. Besides Streamlit (which already installs pandas and NumPy), the example needs Plotly (pip install plotly). It demonstrates how to:
- Load and display data
- Add interactive filters
- Create visualizations
- Organize the layout
```python
import numpy as np
import pandas as pd
import plotly.express as px
import streamlit as st

# Set page configuration
st.set_page_config(
    page_title="Sales Dashboard",
    page_icon="📊",
    layout="wide",
    initial_sidebar_state="expanded",
)

# Add a title and description
st.title("📊 Sales Dashboard")
st.markdown("An interactive dashboard to analyze sales data across regions and product categories.")


# Create some sample data (cached so it is not regenerated on every rerun)
@st.cache_data
def load_data():
    # Generate random sales data with a fixed seed for reproducibility
    np.random.seed(42)
    dates = pd.date_range(start="2024-01-01", end="2024-12-31", freq="D")
    regions = ["North", "South", "East", "West"]
    categories = ["Electronics", "Clothing", "Food", "Home Goods"]
    data = []
    for date in dates:
        for region in regions:
            for category in categories:
                sales = np.random.randint(100, 5000)
                profit = sales * np.random.uniform(0.1, 0.4)
                data.append({
                    "Date": date,
                    "Region": region,
                    "Category": category,
                    "Sales": sales,
                    "Profit": profit,
                })
    return pd.DataFrame(data)


# Load the data
df = load_data()

# Add sidebar filters
st.sidebar.header("Filters")

# Date range filter
min_date = df["Date"].min().date()
max_date = df["Date"].max().date()
date_range = st.sidebar.date_input(
    "Select Date Range",
    value=(min_date, max_date),
    min_value=min_date,
    max_value=max_date,
)
# While the user is still picking the end date, date_input returns a 1-tuple
if len(date_range) != 2:
    st.info("Select an end date to apply the date filter.")
    st.stop()
start_date, end_date = date_range

# Region filter
region_options = sorted(df["Region"].unique())
selected_regions = st.sidebar.multiselect(
    "Select Regions",
    options=region_options,
    default=region_options,
)

# Category filter
category_options = sorted(df["Category"].unique())
selected_categories = st.sidebar.multiselect(
    "Select Categories",
    options=category_options,
    default=category_options,
)

# Apply filters
filtered_df = df[
    (df["Date"].dt.date >= start_date)
    & (df["Date"].dt.date <= end_date)
    & (df["Region"].isin(selected_regions))
    & (df["Category"].isin(selected_categories))
]

# Stop early if nothing matches (this also avoids dividing by zero below)
if filtered_df.empty:
    st.warning("No data matches the selected filters.")
    st.stop()

# Key metrics (created before the chart columns so they render above them)
with st.container():
    st.subheader("Key Metrics")
    metric1, metric2, metric3, metric4 = st.columns(4)
    total_sales = filtered_df["Sales"].sum()
    total_profit = filtered_df["Profit"].sum()
    with metric1:
        st.metric("Total Sales", f"${total_sales:,.2f}")
    with metric2:
        st.metric("Total Profit", f"${total_profit:,.2f}")
    with metric3:
        profit_margin = (total_profit / total_sales) * 100
        st.metric("Profit Margin", f"{profit_margin:.2f}%")
    with metric4:
        avg_daily_sales = filtered_df.groupby("Date")["Sales"].sum().mean()
        st.metric("Avg. Daily Sales", f"${avg_daily_sales:,.2f}")

# Create a two-column layout for the charts
col1, col2 = st.columns(2)

# Sales by region chart
with col1:
    st.subheader("Sales by Region")
    region_sales = filtered_df.groupby("Region")["Sales"].sum().reset_index()
    fig_region = px.bar(
        region_sales,
        x="Region",
        y="Sales",
        color="Region",
        title="Total Sales by Region",
    )
    st.plotly_chart(fig_region, use_container_width=True)

# Sales by category chart
with col2:
    st.subheader("Sales by Category")
    category_sales = filtered_df.groupby("Category")["Sales"].sum().reset_index()
    fig_category = px.pie(
        category_sales,
        values="Sales",
        names="Category",
        title="Sales Distribution by Category",
    )
    st.plotly_chart(fig_category, use_container_width=True)

# Sales trend over time (full width, below the columns)
st.subheader("Sales Trend Over Time")
time_series = filtered_df.groupby("Date")[["Sales", "Profit"]].sum().reset_index()
fig_time = px.line(
    time_series,
    x="Date",
    y=["Sales", "Profit"],
    title="Sales and Profit Trends",
    labels={"value": "Amount ($)", "variable": "Metric"},
)
st.plotly_chart(fig_time, use_container_width=True)

# Data table
with st.expander("View Detailed Data"):
    st.dataframe(filtered_df, use_container_width=True)

# Add a download button for the filtered data
csv = filtered_df.to_csv(index=False).encode("utf-8")
st.download_button(
    label="Download Data as CSV",
    data=csv,
    file_name="sales_data.csv",
    mime="text/csv",
)

# Footer
st.markdown("---")
st.caption("Dashboard created with Streamlit • Sample data generated with NumPy")
```
To run this dashboard, save the code to a file (e.g., sales_dashboard.py) and execute:
```bash
streamlit run sales_dashboard.py
```
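As the dashboard grows, it helps to keep the filtering and metric calculations in plain functions that can be unit-tested without starting Streamlit. The sketch below mirrors the dashboard's filters and key metrics using a dataclass and the standard library instead of pandas, so it runs on its own; SaleRecord, filter_sales, and key_metrics are hypothetical names, and in the real app the same logic would operate on the DataFrame.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SaleRecord:
    """One row of the dashboard's sales table (hypothetical record type)."""
    day: date
    region: str
    category: str
    sales: float
    profit: float


def filter_sales(records, start, end, regions, categories):
    """Keep rows inside [start, end] whose region and category are selected."""
    return [
        r for r in records
        if start <= r.day <= end and r.region in regions and r.category in categories
    ]


def key_metrics(records) -> dict[str, float]:
    """Totals, profit margin (%), and average daily sales; zeros for an empty selection."""
    total_sales = sum(r.sales for r in records)
    total_profit = sum(r.profit for r in records)
    daily = defaultdict(float)
    for r in records:
        daily[r.day] += r.sales
    return {
        "total_sales": total_sales,
        "total_profit": total_profit,
        # Guard against division by zero when the filters match nothing
        "profit_margin": 100 * total_profit / total_sales if total_sales else 0.0,
        "avg_daily_sales": sum(daily.values()) / len(daily) if daily else 0.0,
    }


rows = [
    SaleRecord(date(2024, 1, 1), "North", "Food", 100, 20),
    SaleRecord(date(2024, 1, 2), "South", "Food", 300, 90),
    SaleRecord(date(2024, 2, 1), "North", "Clothing", 200, 50),
]
january_food = filter_sales(rows, date(2024, 1, 1), date(2024, 1, 31), {"North", "South"}, {"Food"})
print(key_metrics(january_food))
# {'total_sales': 400, 'total_profit': 110, 'profit_margin': 27.5, 'avg_daily_sales': 200.0}
```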
Advanced Example: Machine Learning Model Explorer
Here’s a more advanced example that demonstrates how to create an interactive machine learning model explorer. It additionally requires scikit-learn and Plotly (pip install scikit-learn plotly):
```python
import pandas as pd
import plotly.express as px
import streamlit as st
from sklearn.datasets import load_breast_cancer, load_diabetes, load_iris, load_wine
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import accuracy_score, confusion_matrix, r2_score
from sklearn.model_selection import train_test_split

# Page configuration
st.set_page_config(
    page_title="ML Model Explorer",
    page_icon="🤖",
    layout="wide",
)

# Title and description
st.title("🤖 Machine Learning Model Explorer")
st.markdown("""
This app allows you to explore different datasets and train a Random Forest model with customizable parameters.
Adjust the settings in the sidebar and see how the model performance changes.
""")

# Sidebar for dataset selection and model parameters
st.sidebar.header("Settings")

# Dataset selection
dataset_name = st.sidebar.selectbox(
    "Select Dataset",
    ("Iris", "Wine", "Breast Cancer", "Diabetes (Regression)"),
)


# Load the selected dataset (cached per dataset name)
@st.cache_data
def get_dataset(name):
    if name == "Iris":
        data = load_iris()
        task = "classification"
    elif name == "Wine":
        data = load_wine()
        task = "classification"
    elif name == "Breast Cancer":
        data = load_breast_cancer()
        task = "classification"
    else:  # Diabetes
        data = load_diabetes()
        task = "regression"
    X = pd.DataFrame(data.data, columns=data.feature_names)
    y = pd.Series(data.target, name="target")
    # Only the classification datasets provide class names
    target_names = [str(n) for n in data.target_names] if task == "classification" else None
    return X, y, task, target_names


X, y, task, target_names = get_dataset(dataset_name)

# Display dataset info
st.sidebar.subheader("Dataset Information")
st.sidebar.write(f"Shape: {X.shape}")
st.sidebar.write(f"Task: {task.capitalize()}")

# Model parameters
st.sidebar.subheader("Model Parameters")
n_estimators = st.sidebar.slider("Number of trees", 10, 500, 100, 10)
max_depth = st.sidebar.slider("Maximum depth", 1, 30, 10)
min_samples_split = st.sidebar.slider("Minimum samples to split", 2, 20, 2)
test_size = st.sidebar.slider("Test size", 0.1, 0.5, 0.2, 0.05)
random_state = st.sidebar.slider("Random state", 0, 100, 42)

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=test_size, random_state=random_state
)


# Train the model. st.cache_resource keeps the fitted model object as-is,
# whereas st.cache_data would pickle and copy it on every rerun.
@st.cache_resource
def train_model(X_train, y_train, task, params):
    if task == "classification":
        model = RandomForestClassifier(**params)
    else:
        model = RandomForestRegressor(**params)
    model.fit(X_train, y_train)
    return model


model_params = {
    "n_estimators": n_estimators,
    "max_depth": max_depth,
    "min_samples_split": min_samples_split,
    "random_state": random_state,
}
model = train_model(X_train, y_train, task, model_params)

# Make predictions
y_pred = model.predict(X_test)

# Calculate performance metrics
if task == "classification":
    performance_metric = accuracy_score(y_test, y_pred)
    metric_name = "Accuracy"
    # Fix the label order so the matrix has one row/column per class,
    # even if a class is missing from the test split
    cm = confusion_matrix(y_test, y_pred, labels=list(range(len(target_names))))
else:
    performance_metric = r2_score(y_test, y_pred)
    metric_name = "R² Score"

# Main content
col1, col2 = st.columns([2, 1])

# Model performance
with col2:
    st.subheader("Model Performance")
    st.metric(metric_name, f"{performance_metric:.4f}")
    # Feature importance
    st.subheader("Feature Importance")
    feature_importance = pd.DataFrame({
        "Feature": X.columns,
        "Importance": model.feature_importances_,
    }).sort_values("Importance", ascending=False)
    fig_importance = px.bar(
        feature_importance.head(10),
        x="Importance",
        y="Feature",
        orientation="h",
        title="Top 10 Features by Importance",
    )
    st.plotly_chart(fig_importance, use_container_width=True)

# Data visualization
with col1:
    st.subheader("Data Visualization")
    # Select features for visualization
    viz_features = st.multiselect(
        "Select features for visualization",
        options=X.columns.tolist(),
        default=X.columns.tolist()[:2],
    )
    if len(viz_features) >= 2:
        plot_df = X.copy()
        if task == "classification":
            # Map class indices to names so Plotly uses discrete colors and a legend
            plot_df["Class"] = y.map(dict(enumerate(target_names)))
            fig = px.scatter(
                plot_df,
                x=viz_features[0],
                y=viz_features[1],
                color="Class",
                color_discrete_sequence=px.colors.qualitative.G10,
                title=f"{viz_features[0]} vs {viz_features[1]} by Class",
            )
        else:
            # Continuous color scale for the numeric regression target
            plot_df["Target"] = y
            fig = px.scatter(
                plot_df,
                x=viz_features[0],
                y=viz_features[1],
                color="Target",
                color_continuous_scale="Viridis",
                title=f"{viz_features[0]} vs {viz_features[1]} by Target Value",
            )
        st.plotly_chart(fig, use_container_width=True)
    else:
        st.info("Select at least two features to draw a scatter plot.")

    # Confusion matrix for classification (rows: actual class, columns: predicted class)
    if task == "classification":
        st.subheader("Confusion Matrix")
        fig_cm = px.imshow(
            cm,
            x=[f"Predicted {name}" for name in target_names],
            y=[f"Actual {name}" for name in target_names],
            color_continuous_scale="Blues",
            text_auto=True,
            title="Confusion Matrix",
        )
        st.plotly_chart(fig_cm, use_container_width=True)

    # Prediction vs actual for regression
    if task == "regression":
        st.subheader("Prediction vs Actual")
        pred_vs_actual = pd.DataFrame({
            "Actual": y_test,
            "Predicted": y_pred,
        })
        fig_reg = px.scatter(
            pred_vs_actual,
            x="Actual",
            y="Predicted",
            title="Predicted vs Actual Values",
        )
        # Add a perfect-prediction reference line (y = x)
        min_val = min(pred_vs_actual["Actual"].min(), pred_vs_actual["Predicted"].min())
        max_val = max(pred_vs_actual["Actual"].max(), pred_vs_actual["Predicted"].max())
        fig_reg.add_shape(
            type="line",
            x0=min_val,
            y0=min_val,
            x1=max_val,
            y1=max_val,
            line=dict(color="red", dash="dash"),
        )
        st.plotly_chart(fig_reg, use_container_width=True)

# Raw data
with st.expander("View Raw Data"):
    st.dataframe(pd.concat([X, y], axis=1), use_container_width=True)

# Footer
st.markdown("---")
st.caption("Machine Learning Model Explorer • Built with Streamlit")
```
To run this ML explorer, save the code to a file (e.g., ml_explorer.py) and execute:
```bash
streamlit run ml_explorer.py
```
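The Model Performance panel reports scikit-learn's accuracy_score for classifiers and r2_score for regressors, and the confusion matrix counts predictions per (actual, predicted) class pair. The plain-Python sketch below spells out those definitions (R² = 1 - SS_res / SS_tot) to make the numbers easier to interpret. The function names are hypothetical, and unlike scikit-learn's r2_score this version simply raises an error for a constant target instead of returning a fallback score.

```python
from __future__ import annotations

from statistics import fmean


def accuracy(y_true, y_pred) -> float:
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    mean = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares around the mean
    if ss_tot == 0:
        raise ValueError("R² is undefined for a constant target")
    return 1 - ss_res / ss_tot


def confusion_counts(y_true, y_pred, labels) -> list[list[int]]:
    """Rows are actual classes and columns are predicted classes, as in scikit-learn."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


print(accuracy([0, 1, 1, 2], [0, 1, 0, 2]))  # 0.75
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0: no better than predicting the mean
print(confusion_counts([0, 1, 1, 2], [0, 1, 0, 2], labels=[0, 1, 2]))  # [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
```

Passing labels explicitly is what keeps the matrix square with one row per class even when a class never appears in the test split, which is the same reason the explorer passes labels to confusion_matrix.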
Resources
- Official Documentation
- GitHub Repository
- Streamlit Community Cloud - Free hosting platform for Streamlit apps
- Streamlit Gallery - Collection of example applications
- Streamlit Components - Directory of community-built extensions
- Streamlit Forum - Community forum for questions and discussions
- Streamlit Blog - Official blog with tutorials and updates
- Streamlit Cheat Sheet - Quick reference for common functions
- Streamlit YouTube Channel - Video tutorials and demonstrations
- Awesome Streamlit - Curated list of Streamlit resources