Practice

1084 problems across SQL, Python, Data Modeling, and Architecture

SQL854 problems

The Missing Refunds

Find orders from the last 30 days with no matching refund record using the anti-join pattern.

Easy12m

The Duplicate Detection Sprint

Find duplicate email addresses with occurrence counts and signup date ranges using GROUP BY and HAVING.

Easy10m

Weekend Warriors

Compare event volumes on weekdays vs weekends using CASE WHEN pivot pattern.

Easy12m

The Dormant Accounts

Find paying customers inactive for 90+ days with their subscription tier and lifetime spend.

Easy15m

30-Day Page View Counts

Product analytics wants a quick engagement snapshot.

Easy17m

Above Average

Easy5m

Above Average Interactions

Find users exceeding average activity count.

Easy10m

Above Category Average

Find rows whose value exceeds a joined reference.

Easy10m

Active API Tokens

The admin team needs all API tokens that have actually been used.

Easy5m

Active Campaigns

Easy5m

Active Token Owners in 2026

The security team is auditing API token usage.

Easy5m

Active Users With April Transactions

The monetization team wants to know how many active users made at least one transaction in April 2026.

Easy7m

Activity Histogram

Two level GROUP BY to build a frequency histogram.

Easy10m

Ad Clickers

Easy5m

Ad Revenue 2026

Finance is pulling annual ad revenue.

Easy5m

Alert Hotspots by Service and Severity

The infrastructure team is mapping alert hotspots.

Easy5m

All Infra Regions

The data catalog team needs a list of all distinct regions present in the infrastructure nodes table.

Easy5m

Annual Cloud Spend

The cloud finance team needs total annual cloud spending.

Easy9m

Annual Cloud Spend Summary

The finance team wants total cloud spending and the number of distinct services billed in each fiscal year.

Easy5m

Annual Pipeline Failures

The data platform team is auditing pipeline failures.

Easy9m

April and May Active Users

The analytics team needs a list of users who were active in April or May.

Easy5m

Auth Endpoints

The platform team needs all API endpoints that contain the word 'health' (case-insensitive).

Easy5m

Authors With Successful Deploys

The delivery team tracks successful deployment authors.

Easy5m

Auth Service Health Checks

The compliance team needs a full audit trail for the 'auth-service'.

Easy5m

Average Brand Campaign Revenue

The ad analytics team needs a quick benchmark.

Easy5m

Average Build Duration by Repo

The platform team is benchmarking build performance.

Easy5m

Average DQ Fail Rate

The data quality team computes average check fail rates per table.

Easy7m

Average GPU Node CPU Usage

The infra team is sizing capacity for GPU nodes.

Easy5m

Average Headcount by Department

HR analytics is benchmarking compensation.

Easy5m

Average High-Range Accuracy

The ML platform team needs the average accuracy for models scoring between 91 and 100 inclusive.

Easy5m

Average Latency by Health Status

For a quarterly reliability review, the team needs the average latency broken out by health status for 'healthy', 'degra

Easy5m

Average Latency by Status

The SRE team wants the average latency for each API call status.

Easy5m

Average Node Utilization

The cloud finance team needs average CPU and memory utilization broken down by region and node type.

Easy5m

Average Rating by Category

The product analytics team wants the average rating for products in each category.

Easy5m

Average Response Time by Hour

The SRE team needs average response time broken down by hour of day, sorted chronologically.

Easy5m

Average Search Endpoint Latency

The performance team wants the average latency for the '/api/search' endpoint.

Easy5m

Average Search Results Per User

The search relevance team is benchmarking query volume per user.

Easy5m

Average Session Duration by Device

The performance team wants to benchmark session lengths by device.

Easy7m

Bargain Bin

Easy5m

Best-Selling Reps Each Month

The marketplace team needs the top 3 sellers by total amount in each product category for January.

Easy13m

Big Spenders

Easy5m

Budget Flag

Join tables and label rows as over or under budget.

Easy10m

Budget-Friendly Products

The product team wants all distinct product names that are budget-friendly, meaning priced between 5 and 20 inclusive.

Easy5m

Campaign Match Rate

The marketing analytics team wants to measure campaign reach.

Easy17m

Campaign Revenue Totals

The finance team needs total ad revenue grouped by campaign.

Easy5m

Cart Sizes

Easy5m

Category Census

Easy5m

Category Sales Summary

The merchandising team is reviewing category performance for 2026.

Easy15m

Category-Specific Product Volume

Sum transactions for a specific payment type.

Easy10m

CDN Image Request Paths

The networking team needs all CDN log entries related to image requests.

Easy5m

CDN-Related DNS Lookups

The DNS team is filtering lookup records.

Easy5m

Character Position in Endpoint

The growth team is debugging URL patterns.

Easy5m

Cheapest Cost Per Region

The cost optimization team is finding the lowest cloud spend per region.

Easy5m

Cheapest Transaction per User

The commerce team wants to identify each customer's cheapest purchase.

Easy5m

Clean Averages

Easy5m

Clean Cache CDN Edges

The network team needs to identify CDN edges that serve cached content with no errors.

Easy5m

Clean Latency Cast

The data quality team found that the latency column in service health records contains some non-numeric strings.

Easy5m

Clicked Ad Impressions

The product analytics team needs all ad impression records where the user clicked.

Easy5m

Click Revenue

Easy5m

Cloud Cost by Team

The FinOps team is reviewing spend by team.

Easy7m

Common Age Buckets

The data quality team suspects duplicate records in the users table.

Easy11m

Completed Priority-1 Jobs

The batch processing team needs all completed priority-1 jobs.

Easy5m

Compute Nodes in Key Regions

The infra team is inventorying compute nodes across key regions.

Easy5m

Content by Specific Users

The content team wants a quick audit of items created by 'alice' or 'bob'.

Easy5m

Content Duration Snapshot

The content team wants a popularity snapshot.

Easy5m

Content Mix

Easy5m

Content Published in 2026

The product analytics team is investigating historical engagement trends and needs all content items published during 20

Easy5m

Content Sorted by Duration

The content team is preparing a catalog sorted by length.

Easy5m

Content Type Distribution

The data catalog team needs a count of records by content type, sorted alphabetically by type.

Easy5m

Content Types by Creator

The content moderation team needs to audit a specific creator.

Easy5m

Cost Efficiency Ratio

Finance is calculating efficiency ratios for cloud services.

Easy5m

Count Distinct Services

The platform team wants to understand the breadth of the service mesh.

Easy5m

Count Nodes in Region

The ops team needs a quick count of how many infrastructure nodes are in the 'us-east-1' region.

Easy5m

CPU Utilization Summary

For the quarterly infrastructure report, the team needs the minimum, average, and maximum CPU utilization across all nod

Easy9m

Customer Full Name Concat

For the internal directory, the HR team needs each customer's first and last name combined into a single full name field

Easy5m

Daily and Weekly Active Users

Count distinct users per day and per week.

Easy10m

Daily Cross-Platform Users

The platform analytics team needs daily unique user counts across mobile and web sessions.

Easy11m

Daily Deployment Count

The ops dashboard needs daily deployment counts.

Easy5m

Department Spend Difference

HR is benchmarking compensation across teams.

Easy9m

Deploy Cadence

Easy5m

Deploy Count by Service

The release management team needs a deploy frequency report showing how many deployments each service has had, sorted fr

Easy5m

Deployed Models by Framework

The ML platform team is cataloging model frameworks by deployment status.

Easy5m

Deployment Duration by Status

The ops team is reviewing deployment outcomes.

Easy7m

Deprecated Model Count

The ML platform team is auditing model versions.

Easy5m

Device Mix

Easy5m

Device Types With Chrome Users

The growth team is profiling power users by device type.

Easy7m

Disabled Feature Flags

The SRE team needs to audit disabled feature flags.

Easy5m

Distinct Blog Referrers

The analytics team needs a deduplicated list of all referrer sources for blog content.

Easy5m

Distinct Product Categories

The product catalog team needs a quick inventory check.

Easy5m

Early 2026 Data Pipelines

The data governance team needs a historical audit.

Easy5m

Email Census

Easy5m

Employees Per Department

Count members grouped by location.

Easy10m

Error Severity Buckets

The product team wants to classify error severity for reporting.

Easy5m

Errors With Service Health

The observability team needs to enrich error data with service health context.

Easy5m

Even-ID February Signups

The analytics team is running a signup cohort analysis and wants to isolate February signups with odd-numbered user IDs.

Easy5m

Even-ID June Signups

HR is auditing onboarding patterns and wants to look at June signups whose user IDs are even numbers.

Easy5m

Event Count on Key Days

The product team is tracking event volume on key days.

Easy7m

Events by Month Across Years

The product analytics team wants to see total event volume for each calendar month (1 through 12), combining all years t

Easy9m

Event Types Spanning Multiple Months

The product team wants to find versatile users who have triggered both 'click' and 'purchase' event types in event_data.

Easy7m

Expensive AWS Services

The procurement team needs a count of distinct AWS services that have at least one cost entry of 200 or more.

Easy5m

Extreme Headcount Departments

HR needs to find employees with extreme compensation.

Easy5m

Failed Payment Deployments

The SRE team is tracking failed deployments.

Easy5m

Features With Missing Values

The data quality team is investigating missing data.

Easy5m

February 2024 Signups

The HR analytics team needs a cohort snapshot of all users who signed up during February 2026.

Easy5m

Filter By Domain

Select rows matching a text suffix pattern.

Easy10m

Filtered User Roster

The people ops team needs a clean roster for the all-hands.

Easy9m

Find Deploy Authors

The HR team needs to find all deploy authors whose last name is 'Chen'.

Easy5m

First Build per Repository

The devops team wants to know when each repository first appeared in the CI pipeline.

Easy9m

First Migration Record

The migrations team wants the very first migration ever applied, identified by the smallest migration ID.

Easy7m

First Run Row Count

The batch operations team needs each job's initial row count.

Easy7m

Flag Check

Easy5m

Full Customer Order List

The ops team needs a full roster of customers alongside their order history, including customers who have never placed a

Easy5m

Gateway Connection Timeouts

The SRE team is hunting for specific error patterns.

Easy5m

Health Check Distribution

The SRE team wants to see the distribution of health check results for auth-service.

Easy5m

Health Checks per Service

The observability team needs a quick service inventory showing how many health checks exist for each service.

Easy5m

Heavy Searchers in August

The search relevance team wants to flag power searchers.

Easy7m

High and Critical Alerts in 2026

The alerting team wants to review stale alerts.

Easy5m

Higher Performing Variant

The A/B testing team wants to know whether the control or treatment variant produces a higher average metric value.

Easy9m

Higher Than Supervisor

Self join to find employees outscoring their manager.

Easy10m

Highest Cost Per Team

The executive dashboard needs peak cost metrics per team.

Easy5m

Highest Latency Endpoints

The SRE team wants the 3 endpoints with the highest peak latency, showing each endpoint and its maximum latency value.

Easy7m

High-Output Creators

The content team is evaluating high-engagement creators.

Easy9m

High Price Products

The procurement team needs a list of all products priced above 100.

Easy5m

High-Rated In-Stock Percentage

The sustainability team needs a quick metric: what percentage of products are both in stock and rated 4 or above?

Easy9m

High-Spend 2025 Campaigns

The marketing team wants to find ad campaigns that generated more than 100,000 in total revenue during 2026.

Easy5m

High-Traffic Endpoints in February

The product team wants to know how many distinct endpoints received 100 or more total API calls during February 2026.

Easy13m

High-Value Electronics

Easy5m

High Volume Batch Jobs

The data platform team needs all batch jobs that processed more than 3 million rows.

Easy5m

Holiday Promo Campaign Click Year

The growth team wants to find the year in which the influencer ad campaign surpassed 2000 total clicks.

Easy7m

Holiday Sale Campaign Revenue

Product analytics wants to spotlight the holiday sale campaign.

Easy7m

Inactive Unverified Users

The engagement team wants to find dormant trial users.

Easy5m

Initial Count

Easy5m

In-Stock Product Count

The inventory team needs a count of how many products are currently in stock.

Easy5m

Issuance Disparity

Max minus min metric per group, ranked by gap.

Easy10m

Japan Revenue for April

The regional sales team is pulling last month's numbers for the APAC region.

Easy13m

Joined Employee Details

Combine two related tables with a join.

Easy10m

Largest Group

Find the group with the highest member count.

Easy10m

Last Five Batch Jobs

The data pipeline team needs a quick tail check.

Easy5m

Last Migration Record

The migrations team wants the most recently applied migration, identified by the largest migration ID.

Easy7m

Last Server Activity

The infra team wants to know the last time each server reported a log entry.

Easy5m

Latency vs Regional Average

Engineering leadership wants to compare each service's latency against the regional average.

Easy9m

Latest Metric Values

The data platform team suspects stale records in employee metrics.

Easy11m

Latest Session Per User

For each user in user_sessions, find their most recent session start date.

Easy5m

Latest Version Per Service

The deployment dashboard needs the latest version deployed for each service.

Easy5m

Log Entries by Level

The SRE dashboard needs a count of server log entries at each log level, sorted by level alphabetically.

Easy5m

Log Levels

Easy5m

Log Priority

Easy5m

Log Volume by Day of Week

The SRE team needs incident frequency by day of week.

Easy11m

Longest Active Membership Streak

The compensation benchmarking team wants the five highest distinct metric values from employee metrics.

Easy7m

Longest Deploy With Full Identifier

The deploy report needs full identification for the deployment(s) with the longest duration.

Easy7m

Long Searches Containing 'er'

The search team is analyzing query patterns.

Easy5m

Low-Byte CDN Responses

The CDN team is investigating low-traffic edges.

Easy5m

Low-Engagement User Count

The product team is sizing a migration cohort.

Easy9m

Lowest Average Price Category

The cost optimization team wants to find the cheapest service category.

Easy9m

Low Latency API Calls

The SRE team wants to validate that certain endpoints are performing well.

Easy5m

Low Severity DQ Checks

The security team needs all data quality checks classified as 'low' severity.

Easy5m

Low Throughput Pipelines

The data quality team wants to identify low-throughput data pipelines.

Easy5m

Low Uptime Services

The SRE team is investigating underperforming services.

Easy5m

Max Value Per Location

Find the highest metric in each group.

Easy10m

Memory-Heavy Pods

The capacity team is scoping memory-intensive workloads.

Easy5m

Merge-Triggered Builds 2026

The release engineering team wants to know how many CI builds were triggered by merges in 2026.

Easy9m

Message Length

Easy5m

Messages Containing Keyword

The content moderation team is scanning for flagged terms.

Easy5m

Messages From Specific Users

The support team is investigating specific user complaints.

Easy5m

Metric Range Per Group

Find the max-min spread within each group.

Easy10m

Metric Value Quarter Complement

The data integrity team is spot-checking records where two metrics coincidentally match.

Easy5m

Mid-CPU Nodes

The capacity team needs to scope mid-tier nodes.

Easy5m

Mid-Range Cost Allocations

The compensation team needs a filtered report of cost allocation entries where the amount falls between 500 and 10,000 i

Easy5m

Mid-Tier Batch Jobs

The leaderboard team needs all batch jobs that rank between positions 8 and 10 by rows processed.

Easy7m

Missing Email for Non-Active Users

The notification platform team found an anomaly: some users from 2026 have no email on file but their account status is

Easy5m

Mobile Event Counts

The product analytics team is measuring engagement by device.

Easy7m

Monthly Active Users per Endpoint

For the platform health dashboard, count the distinct active users per endpoint during January 2026.

Easy11m

Monthly Category Totals

Sum amounts by category and month.

Easy10m

Monthly Deployment Count

The ops team tracks monthly deployment frequency.

Easy13m

Monthly Signup Counts

The growth team needs to track user registration trends by month.

Easy13m

Monthly Transaction Counts

Count transactions per user per month.

Easy10m

Monthly Unique Users per Campaign

The engagement team tracks monthly reach per ad campaign.

Easy13m

Morning Warning Logs

The SRE team needs all warning-level server log entries that occurred before noon.

Easy5m

Most Common Export Job Status

The analytics team wants the single most common status among batch jobs whose name contains 'etl'.

Easy11m

Most Recent Token Usage

The security team needs each user's most recent API token activity.

Easy5m

Multi-Column User Sort

Product analytics wants a quick directory of users sorted alphabetically by username, with ties broken by age bucket in

Easy5m

Multi-OS Users

The platform team wants to find users who have logged in from devices running different operating systems across session

Easy11m

Multi-Provider Cost Lookup

The finance team needs cloud cost amounts for the providers AWS, GCP, and Azure.

Easy5m

Multi-Variant Experiments

The experimentation platform team needs to find users who participated in both an 'onboarding' variant and a 'retention'

Easy15m

Never-Ordered Products

The inventory team wants to find products that exist in the catalog but have never been ordered.

Easy5m

Nodes in Target Regions

The infra team is triaging node issues and needs all infrastructure nodes located in 'us-east-1', 'us-west-2', or 'eu-we

Easy5m

Node Summary Per Region

The infrastructure team needs a region-level summary showing the total number of nodes and the number of distinct node t

Easy7m

No Gaps

Easy5m

Non-Bot Acknowledged Alerts

The on-call team is reviewing alert acknowledgments.

Easy5m

Non-Draft Content

The content team is filtering out draft content.

Easy5m

Notifications Opened on Date

The marketing team needs the total number of push notifications that were opened on January 2, 2026.

Easy5m

Nth Highest Salary

Find the 3rd highest amount from an employee table. A classic consulting firm screener.

Easy10m

Nth Largest Value

Select the row with a specific rank position.

Easy10m

NULL Join Behavior

Understand how NULLs affect join results.

Easy10m

Oldest and Newest User Sessions

The growth team wants to spotlight extremes in the user base.

Easy5m

One-Star Product Review Count

The product team wants to see review volume for the lowest-rated products.

Easy5m

Overall Average API Latency

The platform team needs the overall average latency across all API calls.

Easy5m

Peak Activity by Device

The platform team tracks user activity windows per device type.

Easy19m

Peak Ad Revenue Moment

The ad revenue team is looking for the peak earning moment.

Easy5m

Peak Metric Per Department

For the quarterly investor deck, the data team needs peak metrics per department.

Easy5m

Peak Non-Converting Month

The product team wants to find the month with the most users who had sessions but never made a purchase.

Easy17m

Peak Satisfaction

Easy5m

Peak Spending Month

The cost optimization team wants to identify the single billing month with the highest total cloud spend.

Easy9m

Pending Batch Jobs

The batch processing team needs to audit stuck jobs.

Easy5m

Pipeline Run History

The data catalog team needs a lineage summary.

Easy9m

Pipeline Throughput Ratio

Compute current-to-initial value ratio per period.

Easy10m

Platform Check

Easy5m

Platform Team Feature Flags

The data team is auditing feature flag ownership and needs all feature flags owned by a platform team, along with each f

Easy5m

Platform Team Mobile Flags

The feature flags team needs all flags owned by the 'platform' team where the flag name contains the word 'beta'.

Easy5m

Pod Distribution by Restart Count

The reliability team wants to understand low-restart pods.

Easy7m

Popular Categories

Easy5m

Price Check

Easy5m

Production Deployment Count

The SRE team is auditing deployment volume.

Easy5m

Production Deploys From April Onward

The DevOps team wants to count how many deployments went to the production environment during April or later (month 4 on

Easy5m

Product Name Letter Replace

The localization team needs a quick text transform on the product catalog.

Easy5m

Product Name Prefix

For a compact dashboard, the team needs just the first three characters of each product name, sorted by product ID.

Easy5m

Product Page Sale Searches

The search quality team wants to count search queries originating from the products page where the search term contains

Easy5m

Product Revenue Ranking

Easy10m

Products Without Sales

The content team wants to find stale inventory.

Easy5m

Profitable Categories by Price

Finance wants to spotlight the most profitable product categories.

Easy7m

Promo Campaign Cost per Acquisition

The marketing team needs cost-per-acquisition for campaigns advertised via push notifications (campaign names containing

Easy11m

Provider Cost Change H1

The FinOps team tracks cost fluctuations between the start and middle of the year.

Easy17m

Purchase Log

Easy5m

Push Reach

Easy5m

Q2 Search Volume

The product analytics team needs Q2 2026 search volume.

Easy5m

Quarterly Deployment Count

The release engineering team needs deployment counts broken down by quarter.

Easy13m

Recurring Error Types

The SRE team wants to identify recurring error types.

Easy5m

Regional Profits

Easy5m

Regional Status

Easy5m

Regions With 5+ Nodes

The infrastructure team needs to flag busy regions.

Easy5m

Retargeting Campaign Impressions

The campaign team wants all ad impressions tied to retargeting campaigns.

Easy5m

Revenue by Product

Easy5m

Revenue for Specific Users

Finance needs a quick revenue total for users alice and bob.

Easy5m

Reviews Per Reviewer

The engineering manager wants a workload summary.

Easy5m

Running Node Pairs

The infra team needs to find pairs of servers in the same region that are both healthy.

Easy13m

Satisfaction Score by Region

The platform team ran a developer satisfaction survey but forgot to collect region data.

Easy9m

Search Endpoint Status Distribution

The QA team needs the distribution of HTTP status codes for API calls to the health endpoint.

Easy5m

Searches by Users With Email

The product analytics team is reviewing search behavior for a specific user.

Easy5m

Search Terms Starting With G

The search relevance team wants to audit queries.

Easy5m

Second Highest Salary

Find the second largest distinct value using DENSE_RANK.

Easy10m

Second Highest Value

Find the second largest value without LIMIT.

Easy10m

Service Alert Frequency

The platform team wants to know how frequently each service appears in the alert events table.

Easy7m

Services With Most Error Occurrences

The reliability team needs to surface the noisiest services.

Easy9m

Service User Growth Rate

The growth team tracks user expansion by service.

Easy15m

Session Logins Dec 13 to 19

The security team needs to identify users who logged in during a specific window.

Easy5m

Session Pulse

Easy5m

Sessions Per Device Type

The product analytics team is segmenting user activity by device.

Easy5m

Signups by Age Bucket Since April

The hiring team wants to track recent growth.

Easy5m

Signups Jan to Jul 2026

HR is counting new hires during a specific window.

Easy7m

Slow Batch Jobs

Find orders delivered after the promised time.

Easy10m

Slow Failures

Easy5m

Slow Production Deploys

The SRE team needs to find production deployments that took longer than 150 seconds.

Easy5m

Sort Tokens by Scope Character

The data governance team needs to sort API token scopes for a compliance report.

Easy5m

Spend Categories

Easy5m

Status Report

Easy5m

Stock Status

Easy5m

Storage Node Lookup

The infra team is auditing critical nodes and needs to find all nodes with a database-type designation, along with each

Easy5m

Successful Deploy Endpoint Calls

The SRE team needs a count of successful API calls to the deploy endpoint, where success means a 200 status.

Easy5m

Successful Pipeline Runs

The reliability team wants to know which data pipelines have completed successfully and how often.

Easy9m

Successful Production Deploys

The operations team is reviewing successful production deployments that have duration data.

Easy5m

Suspected Bot Sessions

The trust and safety team flags sessions shorter than 5 seconds as potential bot activity.

Easy5m

Targeted Ad Campaigns

The ad ops team needs to find high-value campaign impressions.

Easy5m

Third Largest Batch Job

The data platform team needs to find the third largest batch job by total rows processed.

Easy7m

Threads Excluding User

The social graph team needs to count chat message threads that user 1 is not part of.

Easy5m

Three Lowest Distinct Cloud Cost Amounts

The cost optimization team wants the three lowest distinct cloud cost amounts.

Easy7m

Tiered Transaction Summary

Compute multiple date windowed aggregates in a single query.

Easy10m

Timeout Status Records

The data engineering team suspects some service health records have an unknown status marked as '?

Easy5m

Timeout Warning Logs

The SRE team is reviewing server logs for incident postmortems.

Easy5m

Titles Ending With S

The content catalog team is auditing naming conventions.

Easy5m

Top 100 Batch Jobs Total Output

The data platform team needs the total rows processed by the top 100 batch jobs ranked by rows completed.

Easy11m

Top 10 Batch Jobs

The platform team wants the top 10 batch jobs by rows processed.

Easy7m

Top 10 Model Accuracies

The ML team wants to spotlight top model performance.

Easy5m

Top 10 Slowest Endpoints

For the quarterly product review, rank endpoints by their peak single-request latency in 2026.

Easy19m

Top 5 Slowest DNS Lookups

The network team is auditing the longest DNS lookups.

Easy9m

Top Ad Campaigns by Revenue

The revenue team wants to rank ad campaigns by total earnings.

Easy5m

Top API Token Scopes

Engineering management wants to identify the scope associated with the highest-value API tokens.

Easy7m

Top Average By Region

Rank regions by their average metric value.

Easy10m

Top Deployed Model

The ML team wants to surface their best-performing deployed model.

Easy5m

Top Device by Sessions

The product team wants to know which device type generates the most user sessions.

Easy7m

Top Duration Content Items

The content team wants to find all content items that held the number one position (highest rating) in their category at

Easy9m

Top Five

Easy5m

Top Mobile OS by Session Duration

The growth team is studying engagement by device type and wants to know which mobile operating system has the highest av

Easy9m

Top Performing Models

The ML ops team needs all models considered top-performing, meaning those with accuracy at 95 or above.

Easy5m

Top Product Categories by Sales

The product team wants to identify the highest-grossing categories.

Easy5m

Top-Ranked Wines by Variety

The content recommendation engine needs to suggest content that fits within a user's average session duration.

Easy7m

Top Recent Sellers

Rank products by sales in a recent window.

Easy10m

Top Selling Items

Rank items by total revenue.

Easy10m

Top Shelf

Easy5m

Top Spenders Dense Rank

The product team ranks users by total transaction value.

Easy9m

Total Compute Cloud Cost

The FinOps team needs the total cloud spend for the 'compute' service category.

Easy5m

Total Cost by Category

The FinOps team needs total spend per category.

Easy5m

Total Engineering Cost Allocation

The finance dashboard needs the total allocated budget for the engineering team.

Easy5m

Total Rows by Pipeline Status

The data pipeline team needs row-level context alongside aggregates.

Easy5m

Total User Spend

Finance needs a summary of each customer's total spend.

Easy5m

Transaction Overview

The executive dashboard needs a quick snapshot: the total number of distinct users who have at least one transaction and

Easy5m

Transaction Source Features

The ML team needs all distinct feature names reviewed by the pipeline_v2 source that have a recorded average value.

Easy5m

Transactions With Product Names

Simple select progressing to a join

Easy10m

Trim Endpoints Right

The data quality team found trailing whitespace in endpoint names.

Easy5m

Trim Search Terms Left

The search team discovered leading whitespace in some search terms.

Easy5m

Tutorial Content Count

The content catalog team needs a quick count of how many content items have 'tutorial' somewhere in the title.

Easy5m

Unassigned Records

Find rows with no matching entry in another table.

Easy10m

Unique Hosts by Node Type

The capacity planning team needs a quick headcount.

Easy5m

Unique Searchers

The product team needs a quick count of search engagement.

Easy5m

Unique Searchers Count

The product team wants to know how many unique users have performed a search.

Easy5m

Unique Stream Topics

The data catalog team needs a clean inventory of all streaming topics.

Easy5m

Unique Visitors

Easy5m

Unmatched Categories

LEFT JOIN anti pattern to find categories with no products.

Easy10m

Unreviewed Models

The compliance team needs to flag models that have never been evaluated.

Easy5m

Unused Read Tokens

The security team needs to identify single-scope API tokens that are currently active.

Easy5m

US-East KV Store Entries

The KV store team needs a quick inventory of all entries stored in the us-east-1 region.

Easy5m

User Age Ranking

The platform team wants to rank users by age bucket in descending order.

Easy5m

User Engagement Totals

The product analytics team wants per-user engagement totals.

Easy15m

User Event Type Count

The engagement team needs unique event type counts per user, including users with zero events.

Easy9m

User Roster

Easy5m

User Sessions on Specific Days

The analytics team needs session data for user 4, but only sessions that started on a Saturday or Monday.

Easy5m

Users Per Device Type

For each device type, count the number of distinct users who have at least one session.

Easy13m

Users Who Clicked Ads

The engagement team needs all users who clicked on at least one ad, along with their account status.

Easy7m

Users With Purchase Events

The engagement team needs all users who have triggered at least one purchase event, along with their account status.

Easy7m

Verify Commit ID Uniqueness

The data engineering team suspects duplicate commit IDs in the repo.

Easy5m

View Count Per Page

The content analytics team needs view counts per page.

Easy5m

Weekly Transaction Volume

The ops team needs a weekly transaction volume report.

Easy7m

Whale Watch

Easy5m

Yearly Output

Easy5m

2026 Signup Count

The growth team needs a simple count of how many users signed up in 2026.

Easy5m

Join Behavior With Empty Tables

Predict the row count for INNER, LEFT, RIGHT, and FULL OUTER joins when one table has 0 rows. From an EY walk-in interview.

Easy8m

The Revenue Cliff

Track monthly revenue trends with MoM percentage change and flag drops >10% using LAG window function.

Medium18m

The Phantom Readers

Find active content readers who never made a purchase using anti-join pattern with mixed temporal scopes.

Medium18m

The Day-7 Retention Cohort

Build a weekly cohort retention report showing what percentage of users return after 7 days.

Medium20m

The Latest Transaction Per Product

Join products to transactions and use ROW_NUMBER to find each product's most recent sale.

Medium15m

10 Lowest Uptime Services

The SRE team wants the 10 services with the lowest uptime scores.

Medium18m

2FA Confirmation Rate

The security team tracks 2FA verification via push notifications.

Medium20m

7-Check Rolling Average

Compute a moving average over the last 7 entries.

Medium10m

7-Day Token Retention

The subscription analytics team tracks daily active premium users.

Medium24m

80th Percentile API Latency

The analytics team is benchmarking API latency percentiles.

Medium12m

90th Pctl Model Accuracy Gap

The operations team is measuring prediction accuracy for ML models trained within the first 14 days of 2026.

Medium20m

Above-Average Cloud Spend

The cloud cost optimization team wants to identify services with above-average spending.

Medium18m

Above Average Product Prices

The pricing team wants to find products whose base price (minimum transaction amount) exceeds the average base price acr

Medium14m

Above the Curve

Medium5m

Active Duo

Medium5m

Active Tokens on Target Date

The compliance team needs to identify token owners who were active on November 1, 2026.

Medium10m

Active Users by Session Count

Medium10m

Active vs Regional User Count

The people analytics team wants to know whether there are more active users or more users in the us-east-1 region among

Medium10m

Ad Revenue by Age Bucket

The marketplace team needs total ad revenue broken down by customer country.

Medium12m

After Hours API Calls

The compliance team needs to flag API calls made outside business hours in December 2026.

Medium12m

Alert Count by Severity Tier

The incident response team needs alert counts by severity tier.

Medium14m

Alert Severity Pivot by Service

The SRE team needs a pivot of alert counts by severity for each service.

Medium20m

All Known Endpoints

The platform team needs a unified, deduplicated list of all endpoints that appear in either the API calls table or the r

Medium12m

API Calls With and Without Errors

The data quality team needs a breakdown by endpoint showing how many API calls have an error message versus how many do

Medium12m

API Calls With Matching Status

The platform team suspects duplicate latency patterns.

Medium10m

API Token Churn Rate

The platform team needs the overall churn rate for API tokens.

Medium14m

API Traffic by CDN Edge

The marketplace team wants to profile CDN paths containing 'api' that serve the 'us-east' region.

Medium16m

App Stability by Region

Calculate crash rate using conditional aggregation

Medium10m

Auction Lot Summary

Aggregate bids per listing with highest bidder.

Medium10m

Authors Deploying to Dev and Production

The deployment team wants to find authors who have deployed to all three environments: dev, staging, and production.

Medium12m

Average Accuracy by Framework

The ML platform team wants to benchmark accuracy by framework, but only for deployed models with versions between 2 and

Medium14m

Average API Latency by Year

For the weekly platform review, the team needs the average latency per endpoint per year from the API calls table, sorte

Medium10m

Average Compensation by Department and Status

HR wants the average total compensation by department and account status.

Medium10m

Average Deploy Duration By Environment

Avg trip length per category within a time window.

Medium10m

Average Fulfillment Lag

The logistics team wants to measure fulfillment speed.

Medium14m

Average Initial Call Latency

The API performance team wants to benchmark first-contact latency.

Medium22m

Average Results for Python Searches

The search relevance team is benchmarking click-through.

Medium10m

Average Review Comments by Author

Engineering leadership is reviewing code quality by author.

Medium12m

Average Session Duration

Product analytics needs average session duration per user.

Medium20m

Average Update Call Latency

The API performance team measures latency for follow-up calls.

Medium22m

Avg Alerts by Severity

The people analytics team is studying alert response patterns by severity.

Medium20m

Avg Daily Active Users per Endpoint

The engagement team wants to measure average daily active users per endpoint for January 2026.

Medium16m

Avg Session Duration by Creator

The product analytics team wants the average session duration for users reviewed by each content creator, treating the c

Medium10m

Batch Job Performance Tiers

The ops team is grading batch job performance.

Medium12m

Best Accuracy to Training Time Ratio

The ML platform team wants to find the model with the best accuracy-to-training-time ratio.

Medium16m

Best Day for Ad Revenue

The analytics team wants to find which day of the month is best for ad revenue.

Medium20m

Biggest Deployment Decline

The ops team wants to know which team had the biggest month-over-month drop in deployments from March to April 2026.

Medium26m

Binary Flag Indicators

The feature management team needs a quick view of every flag's on/off state as binary columns.

Medium10m

Bottom Endpoints by POST Volume

The mobile team wants to identify endpoints with the least POST traffic.

Medium14m

Build Health

Medium5m

Builds per Author per Branch

The QA team wants to know how many builds each author triggered, broken down by branch.

Medium10m

Build Success Rate by Trigger

The CI/CD team wants to measure build success rates by trigger type.

Medium14m

Build Success vs Failure by Repo

The CI/CD dashboard needs a build outcome report.

Medium10m

Busiest Pipeline Month

The data engineering team wants to know which month saw the most pipeline runs.

Medium14m

Busiest Route by Passenger Volume

The engagement team wants Friday ad click counts.

Medium18m

Busy Authors

Medium5m

Campaign Click-Through Rates

The analytics team is comparing ad engagement across campaigns.

Medium28m

Campaign Cost Effectiveness

The marketing team needs cost effectiveness for each ad campaign between 2025 and 2026 inclusive.

Medium14m

Campaign Revenue by Click Channel

The marketing team is evaluating campaign effectiveness across ad formats.

Medium20m

Campaigns With Most Clicks

The ad platform team is reviewing campaign effectiveness.

Medium12m

Categories With Mixed Price Tiers

The cross-sell team targets users who engage with multiple content types.

Medium14m

Category Buyers

Medium5m

Category Revenue

Medium5m

CDN Traffic by Day and Hour

The ops analytics team wants to profile CDN traffic patterns.

Medium18m

Cheapest High-Rated Product

The procurement team is looking for the cheapest product with a rating above 4.

Medium12m

Classify Services by Name

The infrastructure team wants to classify each service into categories: 'api_service' if the name contains 'api', 'cache

Medium14m

Clicked Holiday Impressions

The engagement team wants the count of clicked ad impressions belonging to campaigns with 'holiday' in the name, limited

Medium10m

Click Rate

Medium5m

Click vs Non-Click Rates

The search quality team needs two metrics in a single row: the percentage of all search queries where a result was click

Medium18m

Cloud Cost Stats by Provider

Finance needs the minimum, average, and maximum cloud cost per provider, combining data from both the cloud costs table

Medium18m

Cloud Cost Trend Analysis

The procurement team tracks cloud cost trends across billing periods.

Medium38m

Combined Cloud Spend by Region and Service

The finance team needs total cloud spending for each combination of region and service, merging data from both the cloud

Medium18m

Completion Rate

Medium5m

Consistent High-Quantity Revenue

The revenue team wants total transaction revenue for each user and product pair, but only for pairs where every single t

Medium16m

Content Recommendation Engine

The content recommendations team wants to surface pages users haven't discovered yet.

Medium20m

Content Session Counts

The product analytics team needs session metrics per content item.

Medium30m

Cost Density Extremes

The cloud team needs to identify regions with the highest and lowest cost density, defined as total cost divided by the

Medium16m

Cost Share Within Category

The FinOps team wants to see how each cost entry contributes to its category's total.

Medium16m

Creator Stats

Medium5m

Creators With Top-Rated Content

The content team wants to find creators with the most top-rated items.

Medium22m

Cross-Region Customers

The compliance team needs to count how many distinct customers have placed orders in a region different from their home

Medium10m

Cross-Variant User Pairs

The experimentation platform needs to pair users who participated in the same experiment under different variants but on

Medium10m

Cumulative Monthly Revenue Avg

Finance needs a cumulative average of monthly revenue for 2026.

Medium26m

Currently Active Feature Flags

The ops team monitors feature flag state changes.

Medium16m

Customers Without Orders

The CRM team needs a count of customers who have never placed an order.

Medium12m

Custom Message Type Counts

The messaging team wants to count how many times each user received a message with a custom type (anything other than 't

Medium12m

Daily Error Count Change

The SRE team is tracking daily error trends.

Medium18m

Daily Error Resolution Ratio

The trust and safety team needs a daily ratio of removed content to reported content.

Medium22m

Daily Metric Percentage Change

Calculate day over day percentage change in a metric per entity using LAG.

Medium10m

Daily Session and User Counts

The engagement team wants daily session metrics.

Medium12m

Daily Spam Impression Rate

Trust and safety wants the daily spam rate for ad impressions that had an associated page view.

Medium22m

Daily Top Endpoints

The ops dashboard needs the top 3 endpoints by call count for each day.

Medium26m

Data Repo Fix Commits

The data engineering team needs to count commits mentioning 'fix' in repos with 'data' in the name (case-insensitive), e

Medium12m

Days with More Edited Than Unedited Messages

The messaging analytics team needs all records from days when the number of distinct message recipients exceeded the num

Medium16m

Deduplicate and Keep Latest

ROW_NUMBER dedup keeping latest row per key.

Medium10m

Deduplicated Sales Volume by Category

Deduplicate transactions, then sum sales per category.

Medium10m

Department Cost by Status

HR is building a headcount and compensation dashboard.

Medium10m

Department Running Totals

Compute cumulative metric values within each department using window operations.

Medium10m

Department Snapshot

Medium5m

Deploy Author Performance Score

The platform team wants to score deployment authors over the past 12 months using the formula (100 minus average duratio

Medium26m

Deployment Failure Impact

The reliability team wants to measure failure impact by service.

Medium16m

Deployments per Environment

The DevOps team needs a quick environment breakdown showing how many deployments went to each environment.

Medium10m

Deploy Reliability Scores

Engineering is running a deploy reliability competition among teams.

Medium10m

Devices Per Age Bucket

Product analytics needs a breakdown of device diversity among younger users.

Medium16m

Device Type Serving Most Users

The growth team wants to know which device serves the most unique users.

Medium14m

Disabled Flag Ratio

The platform team needs to calculate the ratio of disabled feature flags among all flags that were updated on January 10

Medium14m

Distinct Chat Conversations

The messaging analytics team needs total unique conversations in chat_msgs.

Medium18m

Diverse Shoppers

Medium5m

DQ Fail Rate by Table

The data quality team is benchmarking check pass rates across tables.

Medium18m

DQ Score Spread

The QA team wants to measure the spread in data quality scores.

Medium10m

Duplicate DQ Check Records

The data quality team suspects duplicate check entries.

Medium10m

Duplicated User Event Messages

The stream ops team needs a duplicated report of messages from the 'alerts' topic.

Medium16m

Duplicate Training Runs

The ML platform team is flagging duplicate model training runs.

Medium20m

Early User Activation

The onboarding team wants to track early activation.

Medium30m

Efficient Pipeline Throughput

The SRE team is benchmarking pipeline efficiency.

Medium14m

Employee Seniority Rank

Self join to compare rows within the same table

Medium10m

Endpoint Latency Spread

The SRE team needs to understand latency spread across API endpoints.

Medium18m

Endpoint With Most GET-Only Users

The platform analytics team wants the endpoint with the most users who have exclusively used the GET method.

Medium18m

Engagement by Content Type

The content team needs total engagement broken down by content type, using duration in seconds as the engagement proxy.

Medium12m

Engagement Gap

Medium5m

Error Rate by Region

Error rate per day and region via conditional aggregation.

Medium10m

Exclusive Users per Device Type

The product team measures platform exclusivity.

Medium12m

Experiment Conversion Pivot

The A/B testing platform needs a pivot report of conversion outcomes by variant.

Medium16m

Extract Deploy Versions

The release management team needs to extract the numeric version from deploy log entries for the staging environment.

Medium14m

Extreme API Token Usage

The platform team needs to flag outlier API tokens.

Medium16m

Extreme Category Totals

Find groups with the highest and lowest sums.

Medium10m

Extremely Late Resolutions

The SRE team flags incidents where resolution took more than 20 minutes beyond the predicted time.

Medium32m

Failed Constraint Checks Count

The ops team wants to know how many data quality checks with a constraint-related rule resulted in a failure.

Medium10m

Failure Rate

Medium5m

Fastest CI Build Date

The data team wants to find the build date of the fastest CI build ever recorded, showing the date and the duration.

Medium10m

Fastest Completion Per Day

Rank by speed within daily partitions

Medium10m

Fastest Regions by Latency

The performance team benchmarks API endpoints by region.

Medium18m

Feature Flag Adoption

The platform team is auditing feature flag adoption.

Medium10m

Feature Quality by Source

The ML platform team is profiling feature quality across data sources.

Medium24m

Feature Vote Winner

The product team runs a feature voting system where each user splits a single vote equally among all the flags they vote

Medium26m

Filtered Multi-Join Aggregation

Join three tables with filters and aggregate.

Medium10m

Find the Fifth Largest Cost

The analytics team wants to find the fifth highest cloud cost amount.

Medium14m

First and Last Peak Accuracy Dates

The ML team wants to know when peak accuracy was first and last recorded.

Medium10m

First and Last Timeout Per Service

The reliability team needs the first and last times each service experienced a timeout.

Medium12m

First Deploy Attribution

The growth team measures first-touch attribution by service.

Medium26m

First Half of Page Views

The data team wants to retrieve the first half of all page view records, based on the total row count, ordered by view I

Medium12m

First Time Learners Per Day

Find each day's brand new users

Medium10m

First Touch Attribution

Find each user's earliest channel interaction.

Medium10m

Flaky Domains

Medium5m

Frequent Message Senders

The messaging team wants to identify spammy senders.

Medium12m

Fulfillable Order Percentage

Logistics wants the percentage of orders that can be fulfilled.

Medium20m

Ghost Products

Medium5m

Heavy Ad Exposure

Find users with concentrated or broad ad exposure across campaigns.

Medium10m

Heavy Hitters

Medium5m

Heavy Namespaces

Medium5m

Highest and Lowest Cloud Costs

Finance wants to spotlight the extremes in cloud spending.

Medium12m

Highest Daily Spend

Finance needs the customers with the highest daily spending between March 1 and June 1, 2026.

Medium24m

Highest Node Density Regions

The infrastructure capacity team wants to find the top 3 regions by node density, defined as distinct nodes divided by d

Medium24m

Highest Throughput Pipelines

The data engineering team needs to find which pipelines produce the highest throughput for runs started in 2026.

Medium12m

Inactive Android Control Users

The experimentation platform team is debugging a test cohort.

Medium10m

Inactive Users in Date Range

The growth team needs to identify users who had zero sessions between June 1 and July 1, 2026.

Medium14m

Inactive vs Suspended Engagement

The product team compares engagement between premium and free-tier users.

Medium22m

iOS Adoption by Age Bucket

Product analytics is measuring mobile adoption across age buckets.

Medium26m

iOS Sessions by Device Type

The growth team wants to measure mobile engagement by region.

Medium12m

Job Status Duration

The data pipeline team tracks job state transitions.

Medium32m

Keep Most Recent Record

Deduplicate rows keeping the latest per key

Medium10m

Keyword-Based User Search

The search quality team needs to find users whose search terms contain the words 'deploy', 'release', 'build', or 'ship'

Medium14m

Largest A/B Test by Participants

The experimentation platform team wants to find the test with the most unique participants.

Medium16m

Largest Single Cloud Cost

The FinOps team needs to find the single largest cloud cost entry, showing the service name and the amount.

Medium10m

Latency Gap to 10th Fastest

The SRE team wants to compare server 'web-prod-07' against the 10th-fastest server by average response time.

Medium20m

Latest Migration Output per Author

The data pipeline team needs each author's most recent applied migration balance.

Medium18m

Leading ML Frameworks by Accuracy

The ML platform team wants to identify the most popular training frameworks.

Medium12m

Least Viewed Content

The product team needs to find the least-viewed content for potential deprecation.

Medium16m

Longest Gap Between Token Events

The people ops team tracks hiring and attrition gaps.

Medium22m

Longest Running Pipeline

The platform team wants to find the single data pipeline that ran the longest, returning just that pipeline's name.

Medium10m

Long Messages

Medium5m

Long-Running Feature Flags

The platform team is auditing long-running feature flags.

Medium16m

Lowest Cost Network-Heavy Team

The cost optimization team is investigating teams that spend more on compute than storage.

Medium14m

Lowest Latency per Service

The SRE team wants the lowest latency recorded for each service in the us-east-1 region.

Medium12m

Low Severity Checks in 2026

The compliance team needs the total number of 'low' severity data quality checks that ran in 2026.

Medium10m

Low-Volume Stream Topics

The stream processing team is looking for low-volume topics.

Medium10m

March Revenue by Customer

The revenue team needs March 2026 transaction totals per customer.

Medium14m

Median Null Percentage of Float Features

For the data quality report, the team needs the median null percentage across all ML features that have a data type of '

Medium10m

Mentorship User Pairs

The people ops team is building mentorship pairs.

Medium10m

Metric Count

Medium5m

Metric Value Pairs Over Threshold

The QA team is exploring metric correlations.

Medium10m

Minimum Cost Per Provider

The FinOps team needs the lowest monthly cloud bill for each provider.

Medium10m

Mobile vs Desktop Session Duration

The growth team wants to compare user engagement by device type in 2026.

Medium12m

Models With Variable Accuracy

The ML platform team wants to find models whose accuracy has varied across versions.

Medium12m

Model Training Completion Rate

The ML platform team reviews model evaluation results.

Medium20m

Monthly Cohort Retention

Compute month over month retention rates for user signup cohorts.

Medium10m

Monthly Revenue Comparison

The growth team is tracking month-over-month revenue changes per product.

Medium28m

Monthly Running Total

Cumulative sales per product across months.

Medium10m

Monthly Spend Pivot by Provider

The FinOps team needs a pivot table of monthly cloud spend by provider.

Medium20m

Monthly Transaction Summary

The product analytics team needs a monthly engagement summary for 2026.

Medium16m

Month With Fewest Deploys

The on-call team wants to find which calendar month (across all years) had the fewest deployments.

Medium16m

Most Active Chat Users

The community team wants to recognize power users on the chat platform.

Medium22m

Most Active Recent Committers

The code velocity team wants to find the most active committers over the past 2 years relative to the latest commit in t

Medium18m

Most Active Servers by Log Volume

Infrastructure wants to identify the busiest servers by log volume in 2026.

Medium14m

Most Commented Code Review

Product wants to surface the code review that received the most comments.

Medium12m

Most Common Monday Outcome

The A/B testing team wants to find the most common experiment outcome among experiments created on a Monday.

Medium14m

Most Efficient API Endpoint

The SRE team is evaluating which API endpoint delivers the best throughput efficiency.

Medium10m

Most Frequent Error Types

The platform team wants to know which error types occur most frequently.

Medium10m

Most Ordered Product by Country

The international expansion team needs to identify popular products in specific markets.

Medium18m

Most Popular Content Type

The product team wants to identify the most popular content type.

Medium12m

Most Popular Signup Day

The product team wants to know which day of the week sees the most user signups.

Medium12m

Most Profitable Region Month

Finance wants to find the single most profitable region-month combination in 2026.

Medium22m

Multi-Host Regions by Node Type

The growth team is analyzing regional infrastructure density.

Medium14m

Multi-Table Report

Join three tables into a summary report.

Medium10m

Mutual Channel Connections

The social features team wants to find mutual connections between two specific users.

Medium12m

Negative Outcome Rate for New Users

Calculate the ratio of problematic events for users within their first 14 days.

Medium10m

Net Lines

Medium5m

New Customers Per Day

Count users whose first order falls on each date.

Medium10m

New User Purchases

Medium5m

Nodes by Region and Type

The infra team needs a headcount of nodes broken down by region and node type, sorted alphabetically by region.

Medium10m

Nodes in Key Regions

The infra team is auditing node distribution across six key regions: us-east-1, us-west-2, eu-west-1, eu-central-1, ap-s

Medium10m

Noisiest Tables by DQ Failures

The data quality team wants to surface the noisiest tables.

Medium12m

Noisy Endpoints

Medium5m

Non-Trivial Fatal Errors

The analytics team categorizes error messages by length: 'short' (under 50 chars), 'mid' (50 to 150), and 'long' (over 1

Medium12m

Notification Delivery Ratio

The messaging reliability team needs the delivery ratio: what fraction of all sent push notifications were actually open

Medium12m

Notification Open Rate

The ops team tracks how often push notifications are opened.

Medium14m

Notifications Pivot by Weekday

The product team wants notification volume pivoted by platform and day of week.

Medium22m

Nth Highest Salary Per Department

Find the 3rd-ranked value within each partition.

Medium10m

Opened Notifications in Jan-Feb

The marketing team needs to know how many push notifications were successfully opened during January and February combin

Medium10m

Over-Budget Services

The PMO team needs to flag over-budget data pipelines.

Medium32m

Overlapping User Sessions

Self join to detect overlapping date ranges per user.

Medium10m

Overloaded Infrastructure Nodes

The capacity planning team considers a node overloaded if its CPU exceeds 90% or its memory exceeds 85%.

Medium10m

Pages Viewed by Session Duration

The engagement team wants to understand which session durations drive the most page views.

Medium10m

Pairwise Latency Maximum

The analytics team needs all pairwise latency comparisons.

Medium20m

Peak API Hour

The capacity planning team needs to identify the peak traffic hour of the day.

Medium18m

Peak Hour Power Callers

The support analytics team wants to count how many distinct users made 3 or more API calls during the afternoon rush (15

Medium20m

Peak Latency for 2026-Era Endpoints

The platform team is reviewing peak API load.

Medium16m

Peak Retargeting Revenue Month

The revenue team needs a monthly breakdown for ad campaigns containing 'retarget' in 2026.

Medium20m

Pipeline Completion Rate

The onboarding team tracks how far users get through multi-step flows.

Medium18m

Pipeline Overhead by Environment

The data pipeline team compares processing overhead between production and staging.

Medium16m

Pipeline Recovery by Priority

The data platform team is benchmarking pipeline recovery times.

Medium22m

Pipe Throughput

Medium5m

Pivot Event Counts

Reshape rows into columns by event type.

Medium10m

Platform Speed

Medium5m

Pod CPU to Memory Ratio

The capacity planning team is analyzing resource efficiency.

Medium16m

Power Users

Medium5m

Power Users by Session Activity

The growth team defines a power user as someone with more than 3 sessions and over 100 total pages viewed across all ses

Medium16m

Power Users by Session Count

The growth team wants to find power users who have logged more than 3 sessions.

Medium14m

Price Rank

Medium5m

Priciest Item in Each Category

The product team wants to surface the highest-priced item per category for a pricing review.

Medium12m

Product Ratings vs Sales

Medium5m

Products With Strong Unit Price

The marketplace team wants to identify high-performing budget products.

Medium18m

Profit Tiers

Medium5m

Prolific Authors in Largest Service Teams

The people analytics team wants to identify senior leads in the largest service teams.

Medium20m

Provider Spend Variance Between Halves

The FinOps team wants to compare regional cloud spend between March 11, 2026 and April 11, 2026.

Medium18m

Push Notification Open Rate

The growth team tracks push notification engagement.

Medium26m

Push Notification Status Pivot

The notification analytics team needs a pivot showing push notification counts by status for each user.

Medium14m

Push Opens by Platform and Campaign

The push notification team is measuring opt-in engagement.

Medium10m

Quarterly Consolidated Cloud Costs

The finance team needs to consolidate international cloud costs into a single currency, using regional conversion factor

Medium20m

Rank Users by Search Query Count

The platform team wants to rank users by total search queries.

Medium24m

Rapid Retry Detection

Detect retried API calls within 5 minutes of failure.

Medium10m

Rate Limit Rules Per Endpoint

The rate limiting team wants to document threshold rules per endpoint.

Medium20m

Rating Tiers

Medium5m

Recent Price Drops

Find items with a price decrease or sale event recently.

Medium10m

Regional Order Summary

The regional ops team needs order metrics by region: the number of orders, the number of customers, and the total profit

Medium18m

Regions by Alert Volume

The ops team wants to rank regions by incident volume.

Medium10m

Region With Best Uptime

The reliability team needs the single region with the highest average service uptime.

Medium12m

Region With Most Nodes

The infrastructure team wants to know which region hosts the most nodes.

Medium12m

Repeat Buyers Across Halves

The retention team wants to find power users who made at least 3 transactions in both 2025 and 2026.

Medium16m

Repeated Transactions

Detect same amount transactions within 10 minutes.

Medium10m

Repeat Purchases Within a Week

Find users with orders less than 7 days apart.

Medium10m

Repeat Purchase Window

The retention squad is looking for repeat purchasers.

Medium14m

Repository Commit Ranking

The data engineering team wants to rank repositories by total lines added across all commits.

Medium16m

Repos with More Builds Than Commits

The CI/CD team wants to compare build volume to deploy volume.

Medium12m

Response Buckets

Medium5m

Returning Buyers

The retention team wants to identify returning buyers.

Medium16m

Revenue Per Product With Zeros

Finance is calculating total revenue per product, including products that have never sold.

Medium12m

Reviewer Performance Metrics

The code review analytics team wants performance metrics for each reviewer who has reviewed at least one merged pull req

Medium28m

Reviewers Per Repo Per Year

The code review team wants to know which repos had the most reviewers each year.

Medium16m

Revoked Tokens by Scope

The trust and safety team counts blocked API tokens by scope for December 2026.

Medium26m

Rolling Weekly Total

Rolling 7 day sum per user via window frame.

Medium10m

Rows With Multiple Flag Conditions

The data engineering team is cleaning err_tracks.

Medium12m

Runner-Up Cost Without ORDER BY

The cost team wants to identify the second highest cloud cost amount on record.

Medium10m

Running Tab

Medium5m

Rush Hour API Latency

The capacity planning team wants to analyze rush-hour API behavior in the 'us-east' region.

Medium14m

Same-Day Signup Rate

Percentage of transactions on the signup date.

Medium10m

Same First and Last Reply Target

The messaging platform team wants to find users whose first and last message in a channel on the same day were sent to t

Medium14m

Satisfaction by Platform

The product team is benchmarking user satisfaction across platforms.

Medium16m

Second Highest Cloud Cost

The compensation team wants the second highest distinct cloud cost amount on record.

Medium12m

Second Highest Latency by Method

Engineering leadership wants the second-highest latency API endpoint in each HTTP method group.

Medium18m

Senior to Junior Ratio

HR analytics is studying workforce age distribution per department.

Medium14m

Servers Returning to Origin

The infra team tracks server migrations between regions.

Medium24m

Server With Most Errors

The SRE team needs to find the single server with the most logged errors.

Medium14m

Service Budget per Head

The PMO wants to know budget-per-head for each data pipeline.

Medium18m

Service Component Classification

The observability team is classifying services by naming pattern.

Medium18m

Service Provider Statistics

Per endpoint stats: count, avg latency, high latency count.

Medium10m

Service Reliability Tiers

The product team wants to classify services by reliability tier based on uptime: 99.

Medium16m

Services at Median Uptime

The SRE team needs to identify services performing exactly at the median uptime.

Medium10m

Service Uptime Minutes

Calculate uptime from status change events.

Medium10m

Session Overview

Medium5m

Session Rank

Medium5m

Sessions by Content Type

The product analytics team is breaking down engagement by content format.

Medium10m

Shared Category Purchasers

The product team is building a recommendation engine and wants to find products purchased by more than one user.

Medium24m

Shared Endpoints

The security team is looking for shared credentials.

Medium10m

Signup to Subscription Rate

Conditional aggregation for conversion rates

Medium10m

Silent Users

Medium5m

Single Service Owners

The platform team is running an infrastructure audit and needs to identify owners who are responsible for exactly one se

Medium18m

Spending Tiers

Medium5m

Split Metric Sums

The analytics team needs two separate sums from employee metrics: one for metric IDs below 5 and another for metric IDs

Medium14m

Subscribers Without Premium

Find users who have one plan but never another

Medium10m

Successful Build Duration by Repository

The engineering productivity team measures CI throughput.

Medium20m

Sum Excluding Extremes

The finance team needs to remove outliers from a latency report.

Medium16m

Super Reviewers

The code review team wants to identify prolific reviewers.

Medium14m

Symmetric Reply Network

The messaging team stores directed replies in chat messages.

Medium10m

Tables With Many DQ Failures

The data quality team wants to find tables that have never passed a single check.

Medium16m

Tables With Most DQ Failures

The compliance team needs to know which tables have the most data quality failures.

Medium12m

Teams Below Double Average Spend

The data platform team tracks cost allocation by team.

Medium12m

Third Highest Spender

Finance is reviewing top spenders.

Medium18m

Three-Item Combinations

Generate all unique 3-item sets with total cost.

Medium10m

Three-Value Sum Combinations

The analytics team needs all combinations of 3 distinct employee metric records whose values sum to exactly 8.

Medium10m

Throttled Clients

Medium5m

Token Churn Rate

The growth team calculates churn for API tokens in September 2026.

Medium28m

Tokens With Non-Read Scope Prefix

The security team wants to verify that all API tokens have a scope starting with 'read'.

Medium12m

Top 10 AB Test Variants

The growth team is compiling a leaderboard of top-performing A/B test variants for the 'onboarding_flow_v2' test.

Medium14m

Top 10 CPU-Heavy Nodes

The infrastructure team needs a quick resource report showing the ten nodes consuming the most CPU, with their node ID,

Medium10m

Top 10 Rated Products

The product catalog team wants to highlight top-rated items.

Medium12m

Top 2 Active Push Days

The product team wants the two most active days during a recent push notification campaign (August 1 to 7).

Medium28m

Top 2 Ad Campaigns by Spend

The marketing team wants the top two ad campaigns by total spend, excluding any campaign whose name contains 'test'.

Medium18m

Top 2 Busiest API Slots

The ops team wants to identify their two busiest time slots each week.

Medium24m

Top 2 Callers per Endpoint

The team lead dashboard shows the top 2 users by API call volume for each endpoint.

Medium20m

Top 2 Cloud Services by Cost

The analytics team is forecasting demand for the holiday season.

Medium22m

Top 2 Rate-Limited Clients

The rate limiting team needs to find the two clients generating the most blocked requests.

Medium14m

Top 3 First-View Pages

The content team wants to identify the top 3 pages that users engage with earliest after signup.

Medium18m

Top 3 Revenue Months

Finance wants to identify the three highest-grossing months.

Medium18m

Top Accuracy Model

The ML platform team wants to highlight the best performing model.

Medium10m

Top Active API Tokens

The API platform team wants the issuance dates of the 5 most-used API tokens that have not yet expired.

Medium10m

Top Active Senders per Channel

The comms team wants the top 3 messages by reply count in each chat channel.

Medium26m

Top Alert Resolvers

The reliability team wants to recognize the engineers who resolve the most alerts.

Medium12m

Top API Caller

The analytics team wants to find the user who triggered the most API calls, showing their user ID and total call count.

Medium12m

Top AWS Non-APAC Service Costs

The procurement team wants the highest cloud cost in the AWS provider for each service that appears exclusively in US re

Medium26m

Top Batch Job Under Priority 1

The batch processing team wants to find the job(s) with priority 1 that achieved the highest rows completed.

Medium16m

Top Campaign by Opens

The growth team wants to find which campaign drove the most push notification opens.

Medium14m

Top Campaign by User Revenue

For each user who clicked an ad, the marketing team wants to know which campaign generated the most revenue for that use

Medium22m

Top Chat Contributors

The messaging platform team wants the top 10 most active users by total messages.

Medium18m

Top Committers in 2025

Engineering leadership wants to know which repo authors had the most top-10-ranked commits by lines added in 2026.

Medium20m

Top Content by Lifetime Value

The content team defines a content item's lifetime value as the total viewing duration (sum of dur_ms from page views) a

Medium40m

Top Content by Views

The product team wants the top 5 content items by total page views.

Medium24m

Top Content Flagger

The trust and safety team reviews flagged content.

Medium26m

Top Cost Categories

The cloud finance team wants the top 3 cost categories by total spend.

Medium20m

Top Cost Entry per Team

Engineering leadership wants the highest-cost entry within each team.

Medium12m

Top Earner Per Campaign

The growth team needs to know which user earns the most revenue per ad campaign.

Medium14m

Top Error Categories in 2025

The error tracking team wants the top error categories for 2026.

Medium10m

Top Frameworks by Accuracy

The ML platform team wants the top 3 frameworks by average accuracy among production models.

Medium20m

Top Identified Event Types

The content team wants to identify the top 3 users by total event count, but only those who had more 'create' events tha

Medium20m

Top Lessons Each Month

Rank items within time periods and keep top 3

Medium10m

Top Metric per Department

Engineering leadership wants to spotlight the top performer in each team.

Medium12m

Top Pattern Matches

Find rows matching a text pattern, ranked by count.

Medium10m

Top Percentile Spenders

Top 1% of users by total spend via percentile bucketing.

Medium10m

Top Product Categories

The product team wants the top 3 content categories by total page views.

Medium18m

Top Product Category by Transactions

The regional sales team wants to find organic transactions in August 2026, meaning purchases where the user had no activ

Medium20m

Top Products by Quantity Sold

The revenue team is preparing their Q4 analysis and needs to see which products moved the most volume.

Medium14m

Top Products per Category

Find the top 5 products by total sales within each product category.

Medium10m

Top Products per Category

The merchandising team wants the top 2 products by total quantity sold in each category.

Medium22m

Top Ranked Content Creators

Rank creators by top-10 content appearances.

Medium10m

Top Region by Order Volume

The ops team needs the single region that generates the most orders.

Medium12m

Top Regions by Critical Alerts

The SRE team wants the top 5 regions by critical alert volume.

Medium20m

Top Regions by Effective Uptime

The SRE team calculates effective uptime hours for each service region.

Medium30m

Top Repos by Successful Builds

The DevOps team wants to find which repos produce the most successful CI builds.

Medium16m

Top Revenue Products H1

The revenue team is doing their H1 review and wants to see the top 5 products by total amount for January through June.

Medium16m

Top Services by Regional Cost

The FinOps team is analyzing spend distribution in the 'us-west-2' region.

Medium12m

Top Services by Uptime

The SRE team wants to rank services by average uptime.

Medium28m

Top Services Per Provider

Rank services by total spend within each cloud provider and return the top two.

Medium10m

Top Spender

Medium5m

Top Users by Pages Viewed

The engagement team wants the top 5 users with the most total pages viewed across all sessions.

Medium12m

Top Users by Recent Spend

Find the top 10 users by total transaction amount in the last 30 days.

Medium10m

Top Users by Session Time

The product team wants to find the top 10 users by total session duration.

Medium12m

Transaction Share of User Spend

The finance team is analyzing each transaction's share of its user's total spend.

Medium22m

Transaction Timeline

Medium5m

Trend Spotter

Medium5m

Unclicked Searches by Campaign

The product team wants to know how many search queries were performed per ad campaign in the us-east-1 region by users w

Medium10m

Unique Hostnames per Region

The infrastructure team needs to count unique hostnames per region.

Medium22m

Unique Reporters per Content

The content moderation team needs a per-item flag count.

Medium22m

Unmatched Deploy Services

The data platform team maintains two pipeline registries and wants to find service names that appear in deploy logs but

Medium16m

Unsold Product Categories

The catalog team suspects dead inventory is inflating storage costs.

Medium18m

US Active User Share

The regional analytics team wants to know what percentage of active users are based in the US.

Medium14m

User Devices

Medium5m

User Engagement Summary

The engagement team needs a holistic user summary showing total sessions and total search queries per user.

Medium12m

Users Outperforming Control

The experimentation team suspects some users perform better in treatment than control.

Medium10m

Users With Admin Tokens

The permissions team needs to find users who hold API tokens with 'admin' in the scope.

Medium14m

Users Without Sessions

The product team wants to find users who have never started a session.

Medium10m

User With Most Transactions

The ops team wants to identify top contributors.

Medium14m

Word Count Per Message

The content team wants to estimate message verbosity.

Medium10m

Workers Earning Above Department Average

The product analytics team needs to know how many transactions each product has.

Medium10m

Yearly Build Duration by Repo

Engineering leadership wants to benchmark build times.

Medium10m

Year-over-Year Content Launches

Product management wants to measure launch velocity changes.

Medium16m

Zero Accuracy on First Training

The ML platform team wants to know what percentage of models had an accuracy of 0 on their very first training run.

Medium26m

Cumulative Sales Per Customer

Compute a running total of transactions per customer, ordered by transaction date. Reported from a Deloitte technical round.

Medium15m

Funnel Leakage Report

Build a multi-step conversion funnel with drop-off rates, broken down by tags.

Hard25m

The Session Stitcher

Sessionize a stream of page views using gap-based detection with LAG and running SUM window functions.

Hard30m

The Regional Cost Reconciliation

Build a running financial balance per region from transactions, orders, and cloud_costs using UNION ALL and window SUM.

Hard25m

The Cannibalization Report

Analyze product cannibalization by comparing category sales before and after new product launches.

Hard30m

2nd Most Common Content Type

The product team is investigating the 3rd most common content type among items created by creators who also appear in th

Hard28m

7-Day Onboarding Conversion

The growth team is measuring onboarding-to-engagement conversion.

Hard38m

Above Category Avg

Hard5m

Active User Penetration Rate

The growth team defines an active user as one whose last session started within the last 30 days, who has at least 5 ses

Hard36m

Adopters Before Migration

Find users of feature A who never used B before A

Hard10m

Aggregate Votes by Paper Subject

The finance team needs daily net revenue for product 101 in the US region between April 15 and April 28, 2026.

Hard38m

Allocations in Top Spending Region

The finance team needs all cost allocation entries for teams operating in whichever region has the highest total cloud s

Hard32m

Alphabetical Tag Sort

The content moderation team needs to normalize tag lists.

Hard20m

API Call Distribution Fraction

The SRE team needs to see what fraction of total API calls each method and status combination represents.

Hard26m

Average Event Progression Time

The product analytics team is measuring funnel velocity.

Hard30m

Average Sessions Per User

The social features team wants to know the average number of sessions per user.

Hard20m

Best Selling Product by Month

The revenue team needs a monthly report showing the highest-grossing product each month by total transaction amount (qua

Hard38m

Bottom 2% Services by Spend

The cost optimization team needs to identify the bottom 2% of services by total cloud spend in May 2026.

Hard36m

Cache Efficiency

Hard5m

Campaign Bookend Engagement

Product analytics wants to compare engagement bookends for each ad campaign.

Hard36m

Campaign Conversion Count

The growth team ran a push notification campaign starting one day after each user's first transaction.

Hard34m

Campaign Conversion Window

Hard10m

Campaign Engagement Rank Shift

The international growth team wants to compare ad engagement across countries between December 2025 and January 2026.

Hard36m

Category Deep Dive

Hard5m

Cheapest and Most Expensive Service per Region

The finance team needs the cheapest and most expensive service in each region by cloud cost, combining data from both th

Hard30m

Cheapest CDN Route

The infra team is optimizing cross-region data transfer costs.

Hard38m

Classify Accounts by Activity Tier

The analytics team wants to find the highest latency value that appears exactly once across all API calls.

Hard32m

Clean Migrations

Hard5m

Cloud Cost Breakdown by Provider

The FinOps team is reviewing cloud costs by provider from 2022 to 2025.

Hard26m

Commit Cadence

Hard5m

Consecutive Cost Growth Periods

The finance team wants regions where cloud spending has increased for 5 consecutive billing periods.

Hard34m

Content Page Spreads

The documentation team formats content items into two-column page spreads.

Hard28m

Cost Efficiency Variance

The cloud finance team analyzes cost efficiency.

Hard44m

Creator Favorite Content Type

For each content creator, the editorial team wants to know which content type they publish most often.

Hard28m

Department Quarterly Pivot

Engineering leadership wants a pivot table showing employee count by department and fiscal quarter.

Hard32m

Deploy Velocity

Hard5m

Endpoint Name Word Count

The data catalog team wants to measure endpoint name complexity.

Hard22m

Endpoint Ranking

Hard5m

Error Category Breakdown

The SRE team is categorizing error tracks for an incident postmortem.

Hard24m

Exact Keyword Counts in Logs

The SRE team needs to count exact occurrences of the words 'error' and 'warning' in server log messages.

Hard34m

Experiment Impact

Hard5m

Experiment Variant Ratios

The experimentation platform needs a participation ratio report.

Hard26m

Fastest and Slowest Services by Region

The platform team needs the highest and lowest latency services per region, excluding any region whose name contains 'te

Hard32m

Fastest Page View to Click

The performance team is investigating page interaction latency.

Hard36m

Feature Flag Engagement Impact

The product analytics team wants to understand how feature flags impact engagement.

Hard42m

Feature Flag Fan vs Detractor Pairs

The experimentation team wants to pair the strongest fans of each feature flag with its strongest detractors, matched by

Hard26m

Feature Name Intersection

The ML features team maintains feature names from two sources: 'training' and 'serving'.

Hard20m

First-Day Session Retention

The growth team needs the first-day retention rate.

Hard38m

First Interaction Credit

Attribute transactions to earliest touchpoint

Hard10m

Flatten Org Chart Hierarchy

Traverse a tree structure with recursive queries.

Hard10m

Friday Spending Analysis

The finance team is analyzing Friday spending patterns during Q1.

Hard26m

Full Funnel

Hard5m

Healthiest Service Check History

The SRE team wants to audit the healthiest service's check history.

Hard32m

High Engagement Pages

The product analytics team measures content engagement depth.

Hard28m

Impressions by Search Keyword

The ad analytics team wants to measure campaign performance by search keyword.

Hard26m

Incident Keyword Messages

The content moderation team is flagging chat messages that mention incident keywords: 'outage', 'incident', 'rollback',

Hard20m

Intra-Region Latency Diff

The infra team is comparing latency differences between nodes in the same region.

Hard24m

Largest CDN Response

The CDN team is debugging edge performance and wants to know which edge location served the largest single response in 2

Hard24m

Latency Quartiles Per Endpoint

The performance analytics team wants quartile breakdowns of API latency per endpoint.

Hard24m

Latency Variance and Std Dev

The ML team wants to understand how much latency varies across successful API calls (status 200).

Hard20m

Longest Uptime Streak

The SRE team tracks service health check results as 'pass' or 'fail'.

Hard40m

Longest Visit Streaks

The growth team needs the top 3 users with the longest continuous daily visit streaks up to 2026-08-10.

Hard44m

Lowest CPU Pods per Namespace

For each Kubernetes namespace, the infra team wants the 5 pods with the lowest CPU usage.

Hard26m

Market Share

Hard5m

Median Cloud Cost by Service

The compensation team needs the median cloud cost amount for each service.

Hard26m

Median Failure Rate by Table

The data quality team wants the median failure percentage for each table being monitored, rounded to the nearest whole n

Hard24m

Median Household Earnings

The cost optimization team needs employees whose cloud cost allocation is above average for their team but who are not a

Hard40m

Median Model Accuracy

The ML platform team needs the median accuracy for each model.

Hard28m

Median Transaction by Category

The finance team needs the median transaction amount for each product category.

Hard28m

Minimum Parallel Workers

The platform team needs to determine the minimum number of parallel workers required to run all batch jobs without confl

Hard36m

Model Accuracy Drift

The ML platform team wants to evaluate model drift.

Hard46m

Mode of Small Team Costs

The cloud FinOps team wants to find the most common per-service cost amount among small teams (teams with no more than 3

Hard22m

Monthly Cloud Cost Forecast Error

The data science team is building a naive forecast for cost per row, defined as cloud cost amount divided by pipeline ro

Hard46m

Monthly Deploy Counts Pivoted

The observability team wants to see monthly deployment counts for the 'production' environment, pivoted into separate co

Hard30m

Monthly Revenue Change

The finance team needs month-over-month revenue trends.

Hard36m

Monthly Service Retention

The product analytics team calculates monthly retention per service.

Hard42m

Most Efficient High-Volume Campaign

The marketing team wants to find the ad campaign with the lowest peak monthly spend that still delivers more than 1500 c

Hard34m

Most Efficient Region by Token Usage

The growth team ranks regions by token efficiency.

Hard30m

Multi-Category Buyers

Hard5m

Multi-Month Active Users

The growth team's retention analysis needs to identify users who logged sessions on at least 3 consecutive calendar days

Hard34m

New Services With Poor Health

The product quality team reviews services launched in H2 2026.

Hard30m

New vs Returning User Share

The growth analytics team calculates the monthly share of new vs returning users.

Hard38m

Oldest Alert per Service

For the incident review, the on-call team needs the oldest unresolved alert for each service.

Hard20m

Peak Concurrent Pods

The infra team needs peak concurrent pod usage.

Hard26m

Peak Concurrent Tokens

For every API token that has ever been issued, find the greatest number of concurrently active tokens during its lifetim

Hard36m

Pipeline Duration vs Throughput

The data engineering team wants to know whether pipeline throughput correlates with duration.

Hard28m

Previous Day Top Service

The daily revenue dashboard needs each date's top-spending service from the previous day.

Hard42m

Price Pairs

Hard5m

Quarterly Peak Cloud Costs

Finance needs a quarterly pivot for the cloud cost review.

Hard24m

Quarter-over-Quarter Latency Trend

The SRE team is tracking how average API latency has changed quarter over quarter from Q1 2023 through Q4 2025.

Hard36m

Regional Sales Growth QoQ

The finance team needs quarter-over-quarter sales growth by region.

Hard30m

Regional Spend Leaders

Hard5m

Resolved vs Unresolved Alerts

The incident response team needs a breakdown of alerts by severity.

Hard22m

Rolling Revenue Average

Finance needs a 3-month rolling average of total revenue from transactions, excluding refunds (negative amounts).

Hard44m

Running Total With CTE

CTE plus window function for running aggregate

Hard10m

Same-Day Session and Transaction Correlation

The product team wants to correlate sessions with same-day orders.

Hard26m

Search Algorithm Rating

The search quality team is evaluating search result ranking.

Hard30m

Search Term Length vs Click Rates

The search team is exploring a correlation between query length and click behavior.

Hard24m

Second Purchase

Hard5m

Sequential Service Transitions

The career mobility team analyzes job transitions.

Hard24m

Service Scorecard

Hard5m

Services Hitting Cost Threshold

The finance team's monthly review needs to know what percentage of services reached at least $100 in monthly cloud spend

Hard40m

Services With Most Checks in 2025

The compliance team wants to find services that had more health checks in 2026 than in any other individual year.

Hard34m

Services With Multi-Quarter Uptime

The reliability team monitors service uptime streaks.

Hard34m

Service Uptime Turnaround

The product team wants to find services that experienced a turnaround in health: at least 3 consecutive months of declin

Hard44m

Service With Most Critical Alerts

The incident commander wants the full details for the service that has received the most critical-severity alerts.

Hard30m

Session Count Distribution

The engagement analytics team wants to understand session count distribution among users who signed up between 2025 and

Hard32m

Session Page View Distance

The ML platform team is computing session travel distances from page view data.

Hard38m

Shared Channel Contacts

The trust and safety team is mapping user networks through chat messages.

Hard22m

Spend and Rank

Hard5m

Spending Range

Hard5m

Streak Status Changes

Detect value changes across consecutive rows

Hard10m

Stream Gaps

Hard5m

Team Cost Allocation Comparison

Engineering leadership is comparing individual cost allocations to their team lead and department average.

Hard42m

Tenure Spread for Active Tokens

The team analytics dashboard compares tenure extremes among active API tokens.

Hard30m

Top 3 Monthly Costs per Team

The FinOps team needs a top-spend breakdown.

Hard30m

Top and Bottom Cloud Spenders

Finance wants a quick compensation snapshot for the board deck.

Hard24m

Top Commit Authors by Repo

For the engineering review, find the top 3 commit authors in each repo by average lines added.

Hard40m

Top CPU Pods per Namespace

The engineering review board wants the two highest-CPU pods in each namespace, pivoted so each row is a namespace with i

Hard30m

Top Endpoint by Power Users

The platform team wants to find the most popular endpoint among power users.

Hard34m

Top Flagged Campaign Resolutions

The content moderation dashboard needs the ad campaign(s) that received the most user-reported alerts.

Hard42m

Top Framework by Deployments

The data team wants to identify the top framework used by deployed models.

Hard30m

Top Models by Framework

The ML ops team wants to find the top-performing models by their primary framework.

Hard38m

Top Per Category

Hard5m

Top Percentile API Tokens

The security team wants to identify the most suspicious API tokens.

Hard30m

Top Regions by High CPU Nodes

The infra team wants the top 5 regions by count of nodes running above 90% CPU.

Hard32m

Total Hours Between Consecutive Events

The platform team tracks user session state changes.

Hard32m

Transaction-Only Features

The data team needs to find features exclusive to the 'warehouse' source that do not appear in 'streaming' or 'api'.

Hard24m

Upvote Percentage by Age Cohort

The search quality team wants to compare search success rates between new and existing users.

Hard44m

User 360

Hard5m

User Campaign Overlap Percentage

The recommendations team is measuring user similarity based on ad campaign exposure.

Hard30m

User Connection Score

The social analytics team calculates each user's connection score.

Hard40m

User Spend Segmentation by Category

The customer analytics team segments users by spending behavior per region.

Hard34m

Users Who Churned in February

The platform team is investigating churn signals.

Hard22m

Users With and Without Ad Clicks

The compensation team needs a count of users who received at least one ad click bonus versus those who did not.

Hard32m

Viewer-to-Purchaser Activity

The content analytics team wants to know which users became creators after starting as consumers.

Hard34m

Weekly Order Status Report

The ops team needs a weekly order status report.

Hard34m

Weekly Transaction Day Split

The revenue analytics team is analyzing transaction patterns by day of week.

Hard34m

Weighted Variant Selection

Select a row using cumulative weight probabilities.

Hard10m

Worst Table Per Year by DQ Failures

The data quality team wants to find the table with the most failed checks each year.

Hard32m

YoY Signup Growth Rate

The growth team needs to calculate year-over-year user signup growth.

Hard42m

Zero-Retry Job Ratio by Priority

The HR analytics team needs the ratio of batch jobs that completed without any retries to total jobs, broken down by pri

Hard28m

Slowly Changing Dimension Type 2

Given a customer dimension table with historical records, query for the current address and handle SCD Type 2 logic. Reported from Cognizant and EY interviews.

Hard20m

Normalization Tradeoffs in Practice

Given a denormalized orders table with data anomalies, normalize it to 3NF and discuss when you would denormalize back. From EY and PwC senior interviews.

Hard20m
Python192 problems

The Max Frequency Element

Find the hottest items in the transaction log, ties included.

Easy6m

The Dedup Preserver

Clean up duplicate events without losing the timeline.

Easy6m

The Forward Fill

Patch the gaps in a noisy sensor stream.

Easy8m

The Word Mismatch

Spot the differences between two text snippets.

Easy8m

The Friend Counter

How popular is everyone in a messy social graph?

Easy10m

The Sequel Spotter

Automatically detect which books in a catalog are sequels.

Easy8m

The Nth Highest Key

Who's sitting in nth place on the leaderboard?

Easy8m

The Character Encoder

Squeeze a string down to its shortest representation.

Easy8m

The Monotonic Check

Is this time-series feed moving in one direction?

Easy8m

The IP Validator

Sort the real IP addresses from the garbage.

Easy10m

The Log Frequency Counter

What's blowing up in the log dump?

Easy10m

The Deduplicator

Strip the repeats, keep the originals.

Easy10m

The Config Merger

Two configs walk into a bar. One walks out.

Easy12m

Flatten the Feed

Nested lists all the way down. Make them flat.

Easy10m

Batch With Metadata

Chop a list into labeled chunks.

Easy10m

Caesar Shift Check

Check if a fixed alphabet shift transforms one string into another.

Easy10m

Character Frequency Counter

Count character occurrences and return as a dictionary.

Easy10m

Coalesce Fields

Easy5m

Column Max

Easy5m

Column Range

Easy5m

Column Sum

Easy5m

Compute Median

Compute list median without built in functions.

Easy10m

Count Vowels

Count vowel characters in a string.

Easy10m

Dominant Element

Find the element that appears more than half the time in a collection.

Easy10m

Even Filter

Easy5m

Explode List

Easy5m

Extract Domain

Easy5m

Fibonacci List

Generate the first n Fibonacci numbers iteratively.

Easy10m

Find Duplicate Values

Return elements that appear more than once.

Easy10m

Find Maximum Value

Find the largest number without using max().

Easy10m

Find Maximum Without Built-in

Iterate to find the largest value manually.

Easy10m

First Unique Character

Find the first character that appears only once.

Easy10m

Flat Map

Flatten a mixed list of ints and sublists into one list.

Easy10m

Forward Fill Missing Values

Replace None entries in a list with the most recent non None value.

Easy10m

Greeting Formatter Class

Class with string init and case sorting method

Easy10m

Letter Frequency Counter

Count occurrences of each character in a string.

Easy10m

Majority Element Median

Find median when one value dominates.

Easy10m

Merge Sorted Lists

Combine two sorted lists into one sorted list.

Easy10m

Most Reviewed Entry

Find the dictionary key whose list value has the most elements.

Easy10m

Normalize Name

Easy5m

Nth Highest Value

Find the nth largest value from an unsorted dictionary.

Easy10m

Nth Missing Number

Find the nth number absent from a sorted list.

Easy10m

Null Counter

Easy5m

Ordered Character Check

Check if all As appear before all Bs.

Easy10m

Primes Below N

Return all prime numbers less than n.

Easy10m

Progress Milestones

Report progress at each 10% increment.

Easy10m

Quality Gate

Easy5m

Quantile Calculator

Find the value at a given percentile.

Easy10m

Record Filter

Easy5m

Reverse Field

Easy5m

Run Length Encoding

Compress consecutive repeated characters with counts.

Easy10m

Sample Variance

Compute the sample variance of a list of numbers, rounded to 2 decimal places.

Easy10m

Sanitize Field

Easy5m

Schema Checker

Easy5m

Second Largest Element

Find the second-highest value in a list.

Easy10m

Sequential Word Pairs

Extract all consecutive two word pairs from a sentence.

Easy10m

Single Element Among Pairs

Find the one unpaired element in a sorted array.

Easy10m

Singleton Value Filter

Extract values that appear exactly once in a collection.

Easy10m

Sort by Value

The keys stay. The order changes.

Easy8m

Sort Descending

Easy5m

Sort Version Numbers

Sort semantic version strings oldest to newest.

Easy10m

Stack Class

Implement push, pop, and peek on a list.

Easy10m

String Compression

Compress consecutive repeated characters.

Easy10m

The Bracket Validator

Open it. Close it. In the right order.

Easy10m

The Bug Spotter

The code runs. The output is wrong. Fix it.

Easy8m

The Clock Angle

Hour hand, minute hand. How far apart?

Easy10m

The Column Transformer

A function per column. Apply them all.

Easy10m

The Manual Dedup

Remove repeats. No shortcuts allowed.

Easy8m

The Password Builder

Random characters. Fixed rules.

Easy6m

The Roman Converter

Letters to numbers. Old school.

Easy10m

Tokenize

Easy5m

Total Activity Minutes

Sum durations for matching activities.

Easy10m

Transform Column

Easy5m

Type Caster

Easy5m

Unique Dictionary Values Sorted

Sorted list of values appearing exactly once.

Easy10m

Unique Values

Easy5m

Valid Palindrome

Check if a string reads the same forwards and backwards.

Easy10m

Value Count

Easy5m

Word Count and Unique Words

Count words and return unique ones.

Easy10m

Word Counter

Easy5m

Word Frequency

Count occurrences of each word in a text.

Easy10m

Zip to Record

Easy5m

Maximum in a List

Write a function to find the maximum number in a list of integers. A warm-up screener from Wipro and Infosys.

Easy8m

The Hourly Event Bucketer

Count event types per hour from timestamped log entries.

Easy10m

The Sorted List Merger

Merge two sorted lists into one sorted list without using sort().

Easy10m

The Dictionary Inverter

Invert a dictionary so values become keys mapping to sorted lists of original keys.

Easy10m

The String Compressor

Compress consecutive character runs, returning original if not shorter.

Easy10m

The Bracket Validator

Validate that all bracket pairs in a string are properly balanced and nested.

Easy10m

The Best Trade Finder

Find the single best buy-then-sell opportunity in a price series.

Easy10m

The Stream Averager

Track running averages per key from a stream of tagged readings.

Easy10m

The Biggest Tipper

Find the most generous customer across all their orders.

Medium8m

The JSON Flattener

Turn a deeply nested API response into a flat row.

Medium12m

The Resume Filter

Extract candidate info from URLs, skip the ones you already know.

Medium10m

The Highest Salary Finder

Which job titles belong to the company's top earners?

Medium10m

The Most Frequent Review

Which piece of customer feedback shows up everywhere?

Medium10m

The File Size Profiler

Which file types are eating all the disk space?

Medium10m

The Interval Merger

Overlapping viewing sessions need to become one clean timeline.

Medium12m

The Stock Range Finder

Find the most volatile trading day for a given ticker.

Medium12m

The Status Code Counter

Make sense of a pile of raw Nginx access logs.

Medium15m

The Budget Allocator

Divide project money fairly when people wear multiple hats.

Medium12m

The Trade Log Aggregator

Break down daily trading volume by exchange from raw logs.

Medium12m

The Timezone Trip Counter

Monthly trip counts, but the timezone matters more than you think.

Medium15m

The Host Ranker

Rank vacation rental hosts by their total inventory.

Medium12m

The Email Ranker

Rank users by how much email activity they generate.

Medium12m

The Consecutive Streak

Which users showed up every single day without a gap?

Medium15m

The Schema Differ

What changed between yesterday's schema and today's?

Medium15m

The Rate Limiter

Too many requests? Not on your watch.

Medium15m

The Event Aggregator

Bucket a firehose of events into tidy time windows.

Medium15m

The Record Reconciler

Two systems, same data. What doesn't match?

Medium18m

The Dependency Resolver

Figure out what runs first when everything depends on everything.

Medium20m

Batch Partitioner

Medium5m

Batch Records

Medium5m

Char Profile

Medium5m

Cumulative Sum

Medium5m

Decode Frequency String

Parse encoded character frequencies with multipliers.

Medium10m

Deep Flatten

Medium5m

Deep Get

Medium5m

Detect Cycle in Sequence

Check if following next-pointers forms a loop.

Medium10m

Detect Outliers

Medium5m

Diagonal Extract

Medium5m

Dice Roll Scoring

Score a set of dice by matching patterns.

Medium10m

Dictionary Key Intersection

Shared keys between two dicts, sorted by key.

Medium10m

Execution Timer Wrapper

Build a reusable wrapper that measures how long a function takes to run.

Medium10m

Extract Leaf Values

Medium5m

Find Indices

Medium5m

Find Mode

Medium5m

Flatten Config Map

Convert a nested dictionary into a flat dictionary with dot separated keys.

Medium10m

Full Outer Zip

Medium5m

Group By

Medium5m

Lag Column

Medium5m

Least Recently Used Cache

Implement a fixed capacity cache that evicts the least recently used entry.

Medium10m

Left Join

Medium5m

Longest Palindromic Substring

Find the longest palindromic substring within a given string.

Medium10m

Maximum Subarray Sum

Find the contiguous subarray with the largest sum.

Medium10m

Max Length Token

Medium5m

Merge Counters

Medium5m

Merge Overlapping Time Ranges

Merge overlapping time ranges into non overlapping intervals.

Medium10m

Parse Log Line

Medium5m

Permissions Manager

Manage user permissions with config updates.

Medium10m

Pivot Rows to Columns

Reshape ordered rows into numbered column dict

Medium10m

Portfolio Profit Calculator

Compute total gain from purchase history and prices.

Medium10m

Precision and Recall

Compute classification accuracy metrics.

Medium10m

Prefix Based Word Replacement

Replace words with their shortest matching root prefix.

Medium10m

Rank Metrics

Medium5m

Rename Keys

Medium5m

Rotate Buffer

Medium5m

Row Aggregates

Medium5m

Running Batch Average

Maintain a running average across multiple calls.

Medium10m

Running Distinct Count

Medium5m

Search in Rotated Sorted Array

Find a target value in a sorted array that has been rotated at an unknown pivot.

Medium10m

Sliding Window Average

Compute a rolling average over recent values.

Medium10m

The Change Tracker

Old snapshot, new snapshot. What changed?

Medium15m

The Chunked Reader

Too big for memory. Read it in pieces.

Medium10m

The Iterator Flattener

Nested iterators. One stream out.

Medium15m

The Level Summer

A tree of numbers. Sum each level.

Medium15m

The Map Reducer

Map each item. Reduce to one answer.

Medium12m

The Response Aggregator

Multiple pages of results. One summary.

Medium12m

The Scoreboard Race

Simulate rounds until someone hits the target.

Medium15m

The Yahtzee Scorer

Five dice. Score each category.

Medium15m

Threshold Filter

Medium5m

Top N Keys

Medium5m

Transpose Table

Medium5m

Triplet Sum

Find all unique triplets summing to a target.

Medium10m

Unflatten Keys

Medium5m

URL Shortener

Map long URLs to short codes and back.

Medium10m

Validate Email

Medium5m

Valid Triangle Count

Count how many triplets from a list of lengths can form a valid triangle.

Medium10m

Weekly Buckets

Group timestamps into 7-day intervals.

Medium10m

Distribute Values Into Container Types

Sort values from a list into rotating container types (set, list, tuple). A real PwC data engineer interview question.

Medium15m

The Nearest Value Mapper

Map each key to its nearest value, breaking ties by picking the smaller one.

Medium15m

The Target Sum Pairer

Find all index-unique pairs that sum to a target, preserving scan order.

Medium20m

The Event Overlap Detector

Find all pairs of named events whose time intervals overlap.

Medium15m

The Consecutive Sequence Finder

Find the longest run of consecutive integers in an unsorted list using O(n) time.

Medium15m

The File Tree Builder

Build a nested directory tree from a flat list of file paths.

Medium15m

The Two-Stack Queue

Implement a FIFO queue using two stacks with O(1) amortized operations.

Medium20m

The Category Ranker

Return the top N records per category, sorted by value descending.

Medium15m

The Rate Limiter

Throttle abusive API clients without blocking everyone else.

Hard20m

The Change Data Capture

What's new, what changed, and what vanished?

Hard20m

The Stream Joiner

Match events across two streams when timestamps almost line up.

Hard25m

The Anomaly Detector

Spot the outliers before they page someone.

Hard20m

The Schema Migrator

Old schema in, new schema out. Hundreds of records at a time.

Hard25m

The DAG Executor

Wire up a mini pipeline and watch it run.

Hard30m

Common Prefix

Hard5m

Data Quality Report

Hard5m

Group Average

Hard5m

Merge Intervals

Hard5m

Pivot Records

Hard5m

Stream-Process a Large CSV

Write a generator that processes a large CSV file line-by-line without loading it all into memory. From Deloitte and EY interviews.

Hard20m

The Meeting Room Allocator

Find minimum conference rooms needed for a set of meeting intervals.

Hard20m

The Streaming Median Tracker

Track the running median of a number stream with efficient insertion.

Hard20m

The Hierarchy Builder

Convert flat parent-child pairs into a nested dictionary tree.

Hard20m

The Peak Performance Window

Find the contiguous subarray with the largest sum using Kadane's algorithm.

Hard20m
Data Modeling26 problems

Customer Address History

People move. Sometimes twice in a month. How do you remember where everyone was, and when?

Easy20m

B2B Invoicing Data Model

Invoices go out, partial payments trickle in, and some customers are three months overdue.

Easy15m

Fitness Studio Membership Schema

Classes fill up, waitlists grow, members no-show, and billing happens monthly. Keep it all straight.

Easy15m

Loan Management Schema

Money goes out, payments come back in (hopefully), and the balance in between needs to be exactly right.

Easy15m

Toll Road Sensor Analytics

Cars enter, cars exit. Except when they don't. And the sensors aren't always paying attention.

Easy15m

Fitness App Data Model

Reps, sets, streaks, and personal bests. Gym rats love their stats.

Easy15m

Ride-Sharing Platform Schema

People need rides, drivers need fares, and someone needs to keep track of it all.

Medium25m

Employee Transfer Tracking System

People switch teams, move offices, get new managers. HR wants the full story, start to finish.

Medium25m

Movie Streaming Analytics Schema

Someone pressed play. Did they finish? Did they skip the intro? Content licensing adds another twist.

Medium25m

Log Parsing Pipeline Schema

Raw text files, terabytes of them, full of buried signals and cryptic error codes.

Medium25m

Livestream Analytics Schema

Someone goes live, thousands tune in, chat explodes, and virtual gifts start flying.

Medium25m

POS Sales Data Warehouse

Every beep at every register across hundreds of locations. Coupons and returns make it messy.

Medium25m

Online Retail Star Schema

Prices change, categories shift, and the BI team wants to slice revenue every way imaginable.

Medium25m

Social Platform Data Model

Follows, posts, likes, replies to replies. The interactions never stop and they all connect.

Medium25m

Subscription Churn Analysis Model

Subscribers are leaving and nobody knows why. Somewhere in the listening data, there are clues.

Medium25m

Retailer Data Warehouse Design

Queries are crawling and the analytics team is getting impatient. Time to rethink how the data is organized.

Medium20m

Clickstream and Session Schema

Millions of clicks, most of them anonymous. Somehow you need to stitch them into coherent stories.

Medium25m

Housing Marketplace Analytics

Sellers want buyers, buyers want deals, and the company wants to know why conversions are so low.

Medium25m

Trending Dishes Dashboard

What's everyone eating right now? The answer changes by the hour and varies by neighborhood.

Medium25m

Telecom Network Connectivity Warehouse

Thousands of devices, all connected. When one goes down, the ripple effects get interesting.

Hard35m

Metric Definition Reverse Engineering

Five numbers on a dashboard. Your job: figure out where they come from.

Hard35m

Property Booking Platform

Vacations are complicated. Between availability, pricing, and guest reviews, there is a lot to keep straight.

Hard30m

E-Commerce Supply Chain Tracking

A package takes a winding journey from origin to doorstep. Sometimes it splits up along the way.

Hard30m

SCD Type 2 Customer Dimension

Things were different six months ago. Can you prove it?

Hard25m

Financial Trading Warehouse

Every trade, every tick, every fraction of a share. The regulators want receipts.

Hard30m

Insurance Claims Lifecycle

A claim gets filed. Then it gets complicated. Then it gets reassigned. Then it loops back.

Hard25m
Architecture12 problems