Practice
1084 problems across SQL, Python, Data Modeling, and Architecture
The Missing Refunds
Find orders from the last 30 days with no matching refund record using the anti-join pattern.
The Duplicate Detection Sprint
Find duplicate email addresses with occurrence counts and signup date ranges using GROUP BY and HAVING.
Weekend Warriors
Compare event volumes on weekdays vs weekends using CASE WHEN pivot pattern.
The Dormant Accounts
Find paying customers inactive for 90+ days with their subscription tier and lifetime spend.
30-Day Page View Counts
Product analytics wants a quick engagement snapshot.
Above Average
Above Average Interactions
Find users exceeding average activity count.
Above Category Average
Find rows whose value exceeds a joined reference.
Active API Tokens
The admin team needs all API tokens that have actually been used.
Active Campaigns
Active Token Owners in 2026
The security team is auditing API token usage.
Active Users With April Transactions
The monetization team wants to know how many active users made at least one transaction in April 2026.
Activity Histogram
Two level GROUP BY to build a frequency histogram.
Ad Clickers
Ad Revenue 2026
Finance is pulling annual ad revenue.
Alert Hotspots by Service and Severity
The infrastructure team is mapping alert hotspots.
All Infra Regions
The data catalog team needs a list of all distinct regions present in the infrastructure nodes table.
Annual Cloud Spend
The cloud finance team needs total annual cloud spending.
Annual Cloud Spend Summary
The finance team wants total cloud spending and the number of distinct services billed in each fiscal year.
Annual Pipeline Failures
The data platform team is auditing pipeline failures.
April and May Active Users
The analytics team needs a list of users who were active in April or May.
Auth Endpoints
The platform team needs all API endpoints that contain the word 'health' (case-insensitive).
Authors With Successful Deploys
The delivery team tracks successful deployment authors.
Auth Service Health Checks
The compliance team needs a full audit trail for the 'auth-service'.
Average Brand Campaign Revenue
The ad analytics team needs a quick benchmark.
Average Build Duration by Repo
The platform team is benchmarking build performance.
Average DQ Fail Rate
The data quality team computes average check fail rates per table.
Average GPU Node CPU Usage
The infra team is sizing capacity for GPU nodes.
Average Headcount by Department
HR analytics is benchmarking compensation.
Average High-Range Accuracy
The ML platform team needs the average accuracy for models scoring between 91 and 100 inclusive.
Average Latency by Health Status
For a quarterly reliability review, the team needs the average latency broken out by health status for 'healthy', 'degra
Average Latency by Status
The SRE team wants the average latency for each API call status.
Average Node Utilization
The cloud finance team needs average CPU and memory utilization broken down by region and node type.
Average Rating by Category
The product analytics team wants the average rating for products in each category.
Average Response Time by Hour
The SRE team needs average response time broken down by hour of day, sorted chronologically.
Average Search Endpoint Latency
The performance team wants the average latency for the '/api/search' endpoint.
Average Search Results Per User
The search relevance team is benchmarking query volume per user.
Average Session Duration by Device
The performance team wants to benchmark session lengths by device.
Bargain Bin
Best-Selling Reps Each Month
The marketplace team needs the top 3 sellers by total amount in each product category for January.
Big Spenders
Budget Flag
Join tables and label rows as over or under budget.
Budget-Friendly Products
The product team wants all distinct product names that are budget-friendly, meaning priced between 5 and 20 inclusive.
Campaign Match Rate
The marketing analytics team wants to measure campaign reach.
Campaign Revenue Totals
The finance team needs total ad revenue grouped by campaign.
Cart Sizes
Category Census
Category Sales Summary
The merchandising team is reviewing category performance for 2026.
Category-Specific Product Volume
Sum transactions for a specific payment type.
CDN Image Request Paths
The networking team needs all CDN log entries related to image requests.
CDN-Related DNS Lookups
The DNS team is filtering lookup records.
Character Position in Endpoint
The growth team is debugging URL patterns.
Cheapest Cost Per Region
The cost optimization team is finding the lowest cloud spend per region.
Cheapest Transaction per User
The commerce team wants to identify each customer's cheapest purchase.
Clean Averages
Clean Cache CDN Edges
The network team needs to identify CDN edges that serve cached content with no errors.
Clean Latency Cast
The data quality team found that the latency column in service health records contains some non-numeric strings.
Clicked Ad Impressions
The product analytics team needs all ad impression records where the user clicked.
Click Revenue
Cloud Cost by Team
The FinOps team is reviewing spend by team.
Common Age Buckets
The data quality team suspects duplicate records in the users table.
Completed Priority-1 Jobs
The batch processing team needs all completed priority-1 jobs.
Compute Nodes in Key Regions
The infra team is inventorying compute nodes across key regions.
Content by Specific Users
The content team wants a quick audit of items created by 'alice' or 'bob'.
Content Duration Snapshot
The content team wants a popularity snapshot.
Content Mix
Content Published in 2026
The product analytics team is investigating historical engagement trends and needs all content items published during 20
Content Sorted by Duration
The content team is preparing a catalog sorted by length.
Content Type Distribution
The data catalog team needs a count of records by content type, sorted alphabetically by type.
Content Types by Creator
The content moderation team needs to audit a specific creator.
Cost Efficiency Ratio
Finance is calculating efficiency ratios for cloud services.
Count Distinct Services
The platform team wants to understand the breadth of the service mesh.
Count Nodes in Region
The ops team needs a quick count of how many infrastructure nodes are in the 'us-east-1' region.
CPU Utilization Summary
For the quarterly infrastructure report, the team needs the minimum, average, and maximum CPU utilization across all nod
Customer Full Name Concat
For the internal directory, the HR team needs each customer's first and last name combined into a single full name field
Daily and Weekly Active Users
Count distinct users per day and per week.
Daily Cross-Platform Users
The platform analytics team needs daily unique user counts across mobile and web sessions.
Daily Deployment Count
The ops dashboard needs daily deployment counts.
Department Spend Difference
HR is benchmarking compensation across teams.
Deploy Cadence
Deploy Count by Service
The release management team needs a deploy frequency report showing how many deployments each service has had, sorted fr
Deployed Models by Framework
The ML platform team is cataloging model frameworks by deployment status.
Deployment Duration by Status
The ops team is reviewing deployment outcomes.
Deprecated Model Count
The ML platform team is auditing model versions.
Device Mix
Device Types With Chrome Users
The growth team is profiling power users by device type.
Disabled Feature Flags
The SRE team needs to audit disabled feature flags.
Distinct Blog Referrers
The analytics team needs a deduplicated list of all referrer sources for blog content.
Distinct Product Categories
The product catalog team needs a quick inventory check.
Early 2026 Data Pipelines
The data governance team needs a historical audit.
Email Census
Employees Per Department
Count members grouped by location.
Error Severity Buckets
The product team wants to classify error severity for reporting.
Errors With Service Health
The observability team needs to enrich error data with service health context.
Even-ID February Signups
The analytics team is running a signup cohort analysis and wants to isolate February signups with odd-numbered user IDs.
Even-ID June Signups
HR is auditing onboarding patterns and wants to look at June signups whose user IDs are even numbers.
Event Count on Key Days
The product team is tracking event volume on key days.
Events by Month Across Years
The product analytics team wants to see total event volume for each calendar month (1 through 12), combining all years t
Event Types Spanning Multiple Months
The product team wants to find versatile users who have triggered both 'click' and 'purchase' event types in event_data.
Expensive AWS Services
The procurement team needs a count of distinct AWS services that have at least one cost entry of 200 or more.
Extreme Headcount Departments
HR needs to find employees with extreme compensation.
Failed Payment Deployments
The SRE team is tracking failed deployments.
Features With Missing Values
The data quality team is investigating missing data.
February 2024 Signups
The HR analytics team needs a cohort snapshot of all users who signed up during February 2026.
Filter By Domain
Select rows matching a text suffix pattern.
Filtered User Roster
The people ops team needs a clean roster for the all-hands.
Find Deploy Authors
The HR team needs to find all deploy authors whose last name is 'Chen'.
First Build per Repository
The devops team wants to know when each repository first appeared in the CI pipeline.
First Migration Record
The migrations team wants the very first migration ever applied, identified by the smallest migration ID.
First Run Row Count
The batch operations team needs each job's initial row count.
Flag Check
Full Customer Order List
The ops team needs a full roster of customers alongside their order history, including customers who have never placed a
Gateway Connection Timeouts
The SRE team is hunting for specific error patterns.
Health Check Distribution
The SRE team wants to see the distribution of health check results for auth-service.
Health Checks per Service
The observability team needs a quick service inventory showing how many health checks exist for each service.
Heavy Searchers in August
The search relevance team wants to flag power searchers.
High and Critical Alerts in 2026
The alerting team wants to review stale alerts.
Higher Performing Variant
The A/B testing team wants to know whether the control or treatment variant produces a higher average metric value.
Higher Than Supervisor
Self join to find employees outscoring their manager.
Highest Cost Per Team
The executive dashboard needs peak cost metrics per team.
Highest Latency Endpoints
The SRE team wants the 3 endpoints with the highest peak latency, showing each endpoint and its maximum latency value.
High-Output Creators
The content team is evaluating high-engagement creators.
High Price Products
The procurement team needs a list of all products priced above 100.
High-Rated In-Stock Percentage
The sustainability team needs a quick metric: what percentage of products are both in stock and rated 4 or above?
High-Spend 2025 Campaigns
The marketing team wants to find ad campaigns that generated more than 100,000 in total revenue during 2026.
High-Traffic Endpoints in February
The product team wants to know how many distinct endpoints received 100 or more total API calls during February 2026.
High-Value Electronics
High Volume Batch Jobs
The data platform team needs all batch jobs that processed more than 3 million rows.
Holiday Promo Campaign Click Year
The growth team wants to find the year in which the influencer ad campaign surpassed 2000 total clicks.
Holiday Sale Campaign Revenue
Product analytics wants to spotlight the holiday sale campaign.
Inactive Unverified Users
The engagement team wants to find dormant trial users.
Initial Count
In-Stock Product Count
The inventory team needs a count of how many products are currently in stock.
Issuance Disparity
Max minus min metric per group, ranked by gap.
Japan Revenue for April
The regional sales team is pulling last month's numbers for the APAC region.
Joined Employee Details
Combine two related tables with a join.
Largest Group
Find the group with the highest member count.
Last Five Batch Jobs
The data pipeline team needs a quick tail check.
Last Migration Record
The migrations team wants the most recently applied migration, identified by the largest migration ID.
Last Server Activity
The infra team wants to know the last time each server reported a log entry.
Latency vs Regional Average
Engineering leadership wants to compare each service's latency against the regional average.
Latest Metric Values
The data platform team suspects stale records in employee metrics.
Latest Session Per User
For each user in user_sessions, find their most recent session start date.
Latest Version Per Service
The deployment dashboard needs the latest version deployed for each service.
Log Entries by Level
The SRE dashboard needs a count of server log entries at each log level, sorted by level alphabetically.
Log Levels
Log Priority
Log Volume by Day of Week
The SRE team needs incident frequency by day of week.
Longest Active Membership Streak
The compensation benchmarking team wants the five highest distinct metric values from employee metrics.
Longest Deploy With Full Identifier
The deploy report needs full identification for the deployment(s) with the longest duration.
Long Searches Containing 'er'
The search team is analyzing query patterns.
Low-Byte CDN Responses
The CDN team is investigating low-traffic edges.
Low-Engagement User Count
The product team is sizing a migration cohort.
Lowest Average Price Category
The cost optimization team wants to find the cheapest service category.
Low Latency API Calls
The SRE team wants to validate that certain endpoints are performing well.
Low Severity DQ Checks
The security team needs all data quality checks classified as 'low' severity.
Low Throughput Pipelines
The data quality team wants to identify low-throughput data pipelines.
Low Uptime Services
The SRE team is investigating underperforming services.
Max Value Per Location
Find the highest metric in each group.
Memory-Heavy Pods
The capacity team is scoping memory-intensive workloads.
Merge-Triggered Builds 2026
The release engineering team wants to know how many CI builds were triggered by merges in 2026.
Message Length
Messages Containing Keyword
The content moderation team is scanning for flagged terms.
Messages From Specific Users
The support team is investigating specific user complaints.
Metric Range Per Group
Find the max-min spread within each group.
Metric Value Quarter Complement
The data integrity team is spot-checking records where two metrics coincidentally match.
Mid-CPU Nodes
The capacity team needs to scope mid-tier nodes.
Mid-Range Cost Allocations
The compensation team needs a filtered report of cost allocation entries where the amount falls between 500 and 10,000 i
Mid-Tier Batch Jobs
The leaderboard team needs all batch jobs that rank between positions 8 and 10 by rows processed.
Missing Email for Non-Active Users
The notification platform team found an anomaly: some users from 2026 have no email on file but their account status is
Mobile Event Counts
The product analytics team is measuring engagement by device.
Monthly Active Users per Endpoint
For the platform health dashboard, count the distinct active users per endpoint during January 2026.
Monthly Category Totals
Sum amounts by category and month.
Monthly Deployment Count
The ops team tracks monthly deployment frequency.
Monthly Signup Counts
The growth team needs to track user registration trends by month.
Monthly Transaction Counts
Count transactions per user per month.
Monthly Unique Users per Campaign
The engagement team tracks monthly reach per ad campaign.
Morning Warning Logs
The SRE team needs all warning-level server log entries that occurred before noon.
Most Common Export Job Status
The analytics team wants the single most common status among batch jobs whose name contains 'etl'.
Most Recent Token Usage
The security team needs each user's most recent API token activity.
Multi-Column User Sort
Product analytics wants a quick directory of users sorted alphabetically by username, with ties broken by age bucket in
Multi-OS Users
The platform team wants to find users who have logged in from devices running different operating systems across session
Multi-Provider Cost Lookup
The finance team needs cloud cost amounts for the providers AWS, GCP, and Azure.
Multi-Variant Experiments
The experimentation platform team needs to find users who participated in both an 'onboarding' variant and a 'retention'
Never-Ordered Products
The inventory team wants to find products that exist in the catalog but have never been ordered.
Nodes in Target Regions
The infra team is triaging node issues and needs all infrastructure nodes located in 'us-east-1', 'us-west-2', or 'eu-we
Node Summary Per Region
The infrastructure team needs a region-level summary showing the total number of nodes and the number of distinct node t
No Gaps
Non-Bot Acknowledged Alerts
The on-call team is reviewing alert acknowledgments.
Non-Draft Content
The content team is filtering out draft content.
Notifications Opened on Date
The marketing team needs the total number of push notifications that were opened on January 2, 2026.
Nth Highest Salary
Find the 3rd highest amount from an employee table. A classic consulting firm screener.
Nth Largest Value
Select the row with a specific rank position.
NULL Join Behavior
Understand how NULLs affect join results.
Oldest and Newest User Sessions
The growth team wants to spotlight extremes in the user base.
One-Star Product Review Count
The product team wants to see review volume for the lowest-rated products.
Overall Average API Latency
The platform team needs the overall average latency across all API calls.
Peak Activity by Device
The platform team tracks user activity windows per device type.
Peak Ad Revenue Moment
The ad revenue team is looking for the peak earning moment.
Peak Metric Per Department
For the quarterly investor deck, the data team needs peak metrics per department.
Peak Non-Converting Month
The product team wants to find the month with the most users who had sessions but never made a purchase.
Peak Satisfaction
Peak Spending Month
The cost optimization team wants to identify the single billing month with the highest total cloud spend.
Pending Batch Jobs
The batch processing team needs to audit stuck jobs.
Pipeline Run History
The data catalog team needs a lineage summary.
Pipeline Throughput Ratio
Compute current-to-initial value ratio per period.
Platform Check
Platform Team Feature Flags
The data team is auditing feature flag ownership and needs all feature flags owned by a platform team, along with each f
Platform Team Mobile Flags
The feature flags team needs all flags owned by the 'platform' team where the flag name contains the word 'beta'.
Pod Distribution by Restart Count
The reliability team wants to understand low-restart pods.
Popular Categories
Price Check
Production Deployment Count
The SRE team is auditing deployment volume.
Production Deploys From April Onward
The DevOps team wants to count how many deployments went to the production environment during April or later (month 4 on
Product Name Letter Replace
The localization team needs a quick text transform on the product catalog.
Product Name Prefix
For a compact dashboard, the team needs just the first three characters of each product name, sorted by product ID.
Product Page Sale Searches
The search quality team wants to count search queries originating from the products page where the search term contains
Product Revenue Ranking
Products Without Sales
The content team wants to find stale inventory.
Profitable Categories by Price
Finance wants to spotlight the most profitable product categories.
Promo Campaign Cost per Acquisition
The marketing team needs cost-per-acquisition for campaigns advertised via push notifications (campaign names containing
Provider Cost Change H1
The FinOps team tracks cost fluctuations between the start and middle of the year.
Purchase Log
Push Reach
Q2 Search Volume
The product analytics team needs Q2 2026 search volume.
Quarterly Deployment Count
The release engineering team needs deployment counts broken down by quarter.
Recurring Error Types
The SRE team wants to identify recurring error types.
Regional Profits
Regional Status
Regions With 5+ Nodes
The infrastructure team needs to flag busy regions.
Retargeting Campaign Impressions
The campaign team wants all ad impressions tied to retargeting campaigns.
Revenue by Product
Revenue for Specific Users
Finance needs a quick revenue total for users alice and bob.
Reviews Per Reviewer
The engineering manager wants a workload summary.
Running Node Pairs
The infra team needs to find pairs of servers in the same region that are both healthy.
Satisfaction Score by Region
The platform team ran a developer satisfaction survey but forgot to collect region data.
Search Endpoint Status Distribution
The QA team needs the distribution of HTTP status codes for API calls to the health endpoint.
Searches by Users With Email
The product analytics team is reviewing search behavior for a specific user.
Search Terms Starting With G
The search relevance team wants to audit queries.
Second Highest Salary
Find the second largest distinct value using DENSE_RANK.
Second Highest Value
Find the second largest value without LIMIT.
Service Alert Frequency
The platform team wants to know how frequently each service appears in the alert events table.
Services With Most Error Occurrences
The reliability team needs to surface the noisiest services.
Service User Growth Rate
The growth team tracks user expansion by service.
Session Logins Dec 13 to 19
The security team needs to identify users who logged in during a specific window.
Session Pulse
Sessions Per Device Type
The product analytics team is segmenting user activity by device.
Signups by Age Bucket Since April
The hiring team wants to track recent growth.
Signups Jan to Jul 2026
HR is counting new hires during a specific window.
Slow Batch Jobs
Find orders delivered after the promised time.
Slow Failures
Slow Production Deploys
The SRE team needs to find production deployments that took longer than 150 seconds.
Sort Tokens by Scope Character
The data governance team needs to sort API token scopes for a compliance report.
Spend Categories
Status Report
Stock Status
Storage Node Lookup
The infra team is auditing critical nodes and needs to find all nodes with a database-type designation, along with each
Successful Deploy Endpoint Calls
The SRE team needs a count of successful API calls to the deploy endpoint, where success means a 200 status.
Successful Pipeline Runs
The reliability team wants to know which data pipelines have completed successfully and how often.
Successful Production Deploys
The operations team is reviewing successful production deployments that have duration data.
Suspected Bot Sessions
The trust and safety team flags sessions shorter than 5 seconds as potential bot activity.
Targeted Ad Campaigns
The ad ops team needs to find high-value campaign impressions.
Third Largest Batch Job
The data platform team needs to find the third largest batch job by total rows processed.
Threads Excluding User
The social graph team needs to count chat message threads that user 1 is not part of.
Three Lowest Distinct Cloud Cost Amounts
The cost optimization team wants the three lowest distinct cloud cost amounts.
Tiered Transaction Summary
Compute multiple date windowed aggregates in a single query.
Timeout Status Records
The data engineering team suspects some service health records have an unknown status marked as '?
Timeout Warning Logs
The SRE team is reviewing server logs for incident postmortems.
Titles Ending With S
The content catalog team is auditing naming conventions.
Top 100 Batch Jobs Total Output
The data platform team needs the total rows processed by the top 100 batch jobs ranked by rows completed.
Top 10 Batch Jobs
The platform team wants the top 10 batch jobs by rows processed.
Top 10 Model Accuracies
The ML team wants to spotlight top model performance.
Top 10 Slowest Endpoints
For the quarterly product review, rank endpoints by their peak single-request latency in 2026.
Top 5 Slowest DNS Lookups
The network team is auditing the longest DNS lookups.
Top Ad Campaigns by Revenue
The revenue team wants to rank ad campaigns by total earnings.
Top API Token Scopes
Engineering management wants to identify the scope associated with the highest-value API tokens.
Top Average By Region
Rank regions by their average metric value.
Top Deployed Model
The ML team wants to surface their best-performing deployed model.
Top Device by Sessions
The product team wants to know which device type generates the most user sessions.
Top Duration Content Items
The content team wants to find all content items that held the number one position (highest rating) in their category at
Top Five
Top Mobile OS by Session Duration
The growth team is studying engagement by device type and wants to know which mobile operating system has the highest av
Top Performing Models
The ML ops team needs all models considered top-performing, meaning those with accuracy at 95 or above.
Top Product Categories by Sales
The product team wants to identify the highest-grossing categories.
Top-Ranked Wines by Variety
The content recommendation engine needs to suggest content that fits within a user's average session duration.
Top Recent Sellers
Rank products by sales in a recent window.
Top Selling Items
Rank items by total revenue.
Top Shelf
Top Spenders Dense Rank
The product team ranks users by total transaction value.
Total Compute Cloud Cost
The FinOps team needs the total cloud spend for the 'compute' service category.
Total Cost by Category
The FinOps team needs total spend per category.
Total Engineering Cost Allocation
The finance dashboard needs the total allocated budget for the engineering team.
Total Rows by Pipeline Status
The data pipeline team needs row-level context alongside aggregates.
Total User Spend
Finance needs a summary of each customer's total spend.
Transaction Overview
The executive dashboard needs a quick snapshot: the total number of distinct users who have at least one transaction and
Transaction Source Features
The ML team needs all distinct feature names reviewed by the pipeline_v2 source that have a recorded average value.
Transactions With Product Names
Simple select progressing to a join
Trim Endpoints Right
The data quality team found trailing whitespace in endpoint names.
Trim Search Terms Left
The search team discovered leading whitespace in some search terms.
Tutorial Content Count
The content catalog team needs a quick count of how many content items have 'tutorial' somewhere in the title.
Unassigned Records
Find rows with no matching entry in another table.
Unique Hosts by Node Type
The capacity planning team needs a quick headcount.
Unique Searchers
The product team needs a quick count of search engagement.
Unique Searchers Count
The product team wants to know how many unique users have performed a search.
Unique Stream Topics
The data catalog team needs a clean inventory of all streaming topics.
Unique Visitors
Unmatched Categories
LEFT JOIN anti pattern to find categories with no products.
Unreviewed Models
The compliance team needs to flag models that have never been evaluated.
Unused Read Tokens
The security team needs to identify single-scope API tokens that are currently active.
US-East KV Store Entries
The KV store team needs a quick inventory of all entries stored in the us-east-1 region.
User Age Ranking
The platform team wants to rank users by age bucket in descending order.
User Engagement Totals
The product analytics team wants per-user engagement totals.
User Event Type Count
The engagement team needs unique event type counts per user, including users with zero events.
User Roster
User Sessions on Specific Days
The analytics team needs session data for user 4, but only sessions that started on a Saturday or Monday.
Users Per Device Type
For each device type, count the number of distinct users who have at least one session.
Users Who Clicked Ads
The engagement team needs all users who clicked on at least one ad, along with their account status.
Users With Purchase Events
The engagement team needs all users who have triggered at least one purchase event, along with their account status.
Verify Commit ID Uniqueness
The data engineering team suspects duplicate commit IDs in the repo.
View Count Per Page
The content analytics team needs view counts per page.
Weekly Transaction Volume
The ops team needs a weekly transaction volume report.
Whale Watch
Yearly Output
2026 Signup Count
The growth team needs a simple count of how many users signed up in 2026.
Join Behavior With Empty Tables
Predict the row count for INNER, LEFT, RIGHT, and FULL OUTER joins when one table has 0 rows. From an EY walk-in interview.
The Revenue Cliff
Track monthly revenue trends with MoM percentage change and flag drops >10% using LAG window function.
The Phantom Readers
Find active content readers who never made a purchase using anti-join pattern with mixed temporal scopes.
The Day-7 Retention Cohort
Build a weekly cohort retention report showing what percentage of users return after 7 days.
The Latest Transaction Per Product
Join products to transactions and use ROW_NUMBER to find each product's most recent sale.
10 Lowest Uptime Services
The SRE team wants the 10 services with the lowest uptime scores.
2FA Confirmation Rate
The security team tracks 2FA verification via push notifications.
7-Check Rolling Average
Compute a moving average over the last 7 entries.
7-Day Token Retention
The subscription analytics team tracks daily active premium users.
80th Percentile API Latency
The analytics team is benchmarking API latency percentiles.
90th Pctl Model Accuracy Gap
The operations team is measuring prediction accuracy for ML models trained within the first 14 days of 2026.
Above-Average Cloud Spend
The cloud cost optimization team wants to identify services with above-average spending.
Above Average Product Prices
The pricing team wants to find products whose base price (minimum transaction amount) exceeds the average base price acr
Above the Curve
Active Duo
Active Tokens on Target Date
The compliance team needs to identify token owners who were active on November 1, 2026.
Active Users by Session Count
Active vs Regional User Count
The people analytics team wants to know whether there are more active users or more users in the us-east-1 region among
Ad Revenue by Age Bucket
The marketplace team needs total ad revenue broken down by customer country.
After Hours API Calls
The compliance team needs to flag API calls made outside business hours in December 2026.
Alert Count by Severity Tier
The incident response team needs alert counts by severity tier.
Alert Severity Pivot by Service
The SRE team needs a pivot of alert counts by severity for each service.
All Known Endpoints
The platform team needs a unified, deduplicated list of all endpoints that appear in either the API calls table or the r
API Calls With and Without Errors
The data quality team needs a breakdown by endpoint showing how many API calls have an error message versus how many do
API Calls With Matching Status
The platform team suspects duplicate latency patterns.
API Token Churn Rate
The platform team needs the overall churn rate for API tokens.
API Traffic by CDN Edge
The marketplace team wants to profile CDN paths containing 'api' that serve the 'us-east' region.
App Stability by Region
Calculate crash rate using conditional aggregation
Auction Lot Summary
Aggregate bids per listing with highest bidder.
Authors Deploying to Dev and Production
The deployment team wants to find authors who have deployed to all three environments: dev, staging, and production.
Average Accuracy by Framework
The ML platform team wants to benchmark accuracy by framework, but only for deployed models with versions between 2 and
Average API Latency by Year
For the weekly platform review, the team needs the average latency per endpoint per year from the API calls table, sorte
Average Compensation by Department and Status
HR wants the average total compensation by department and account status.
Average Deploy Duration By Environment
Avg trip length per category within a time window.
Average Fulfillment Lag
The logistics team wants to measure fulfillment speed.
Average Initial Call Latency
The API performance team wants to benchmark first-contact latency.
Average Results for Python Searches
The search relevance team is benchmarking click-through.
Average Review Comments by Author
Engineering leadership is reviewing code quality by author.
Average Session Duration
Product analytics needs average session duration per user.
Average Update Call Latency
The API performance team measures latency for follow-up calls.
Avg Alerts by Severity
The people analytics team is studying alert response patterns by severity.
Avg Daily Active Users per Endpoint
The engagement team wants to measure average daily active users per endpoint for January 2026.
Avg Session Duration by Creator
The product analytics team wants the average session duration for users reviewed by each content creator, treating the c
Batch Job Performance Tiers
The ops team is grading batch job performance.
Best Accuracy to Training Time Ratio
The ML platform team wants to find the model with the best accuracy-to-training-time ratio.
Best Day for Ad Revenue
The analytics team wants to find which day of the month is best for ad revenue.
Biggest Deployment Decline
The ops team wants to know which team had the biggest month-over-month drop in deployments from March to April 2026.
Binary Flag Indicators
The feature management team needs a quick view of every flag's on/off state as binary columns.
Bottom Endpoints by POST Volume
The mobile team wants to identify endpoints with the least POST traffic.
Build Health
Builds per Author per Branch
The QA team wants to know how many builds each author triggered, broken down by branch.
Build Success Rate by Trigger
The CI/CD team wants to measure build success rates by trigger type.
Build Success vs Failure by Repo
The CI/CD dashboard needs a build outcome report.
Busiest Pipeline Month
The data engineering team wants to know which month saw the most pipeline runs.
Busiest Route by Passenger Volume
The engagement team wants Friday ad click counts.
Busy Authors
Campaign Click-Through Rates
The analytics team is comparing ad engagement across campaigns.
Campaign Cost Effectiveness
The marketing team needs cost effectiveness for each ad campaign between 2025 and 2026 inclusive.
Campaign Revenue by Click Channel
The marketing team is evaluating campaign effectiveness across ad formats.
Campaigns With Most Clicks
The ad platform team is reviewing campaign effectiveness.
Categories With Mixed Price Tiers
The cross-sell team targets users who engage with multiple content types.
Category Buyers
Category Revenue
CDN Traffic by Day and Hour
The ops analytics team wants to profile CDN traffic patterns.
Cheapest High-Rated Product
The procurement team is looking for the cheapest product with a rating above 4.
Classify Services by Name
The infrastructure team wants to classify each service into categories: 'api_service' if the name contains 'api', 'cache
Clicked Holiday Impressions
The engagement team wants the count of clicked ad impressions belonging to campaigns with 'holiday' in the name, limited
Click Rate
Click vs Non-Click Rates
The search quality team needs two metrics in a single row: the percentage of all search queries where a result was click
Cloud Cost Stats by Provider
Finance needs the minimum, average, and maximum cloud cost per provider, combining data from both the cloud costs table
Cloud Cost Trend Analysis
The procurement team tracks cloud cost trends across billing periods.
Combined Cloud Spend by Region and Service
The finance team needs total cloud spending for each combination of region and service, merging data from both the cloud
Completion Rate
Consistent High-Quantity Revenue
The revenue team wants total transaction revenue for each user and product pair, but only for pairs where every single t
Content Recommendation Engine
The content recommendations team wants to surface pages users haven't discovered yet.
Content Session Counts
The product analytics team needs session metrics per content item.
Cost Density Extremes
The cloud team needs to identify regions with the highest and lowest cost density, defined as total cost divided by the
Cost Share Within Category
The FinOps team wants to see how each cost entry contributes to its category's total.
Creator Stats
Creators With Top-Rated Content
The content team wants to find creators with the most top-rated items.
Cross-Region Customers
The compliance team needs to count how many distinct customers have placed orders in a region different from their home
Cross-Variant User Pairs
The experimentation platform needs to pair users who participated in the same experiment under different variants but on
Cumulative Monthly Revenue Avg
Finance needs a cumulative average of monthly revenue for 2026.
Currently Active Feature Flags
The ops team monitors feature flag state changes.
Customers Without Orders
The CRM team needs a count of customers who have never placed an order.
Custom Message Type Counts
The messaging team wants to count how many times each user received a message with a custom type (anything other than 't
Daily Error Count Change
The SRE team is tracking daily error trends.
Daily Error Resolution Ratio
The trust and safety team needs a daily ratio of removed content to reported content.
Daily Metric Percentage Change
Calculate day over day percentage change in a metric per entity using LAG.
Daily Session and User Counts
The engagement team wants daily session metrics.
Daily Spam Impression Rate
Trust and safety wants the daily spam rate for ad impressions that had an associated page view.
Daily Top Endpoints
The ops dashboard needs the top 3 endpoints by call count for each day.
Data Repo Fix Commits
The data engineering team needs to count commits mentioning 'fix' in repos with 'data' in the name (case-insensitive), e
Days with More Edited Than Unedited Messages
The messaging analytics team needs all records from days when the number of distinct message recipients exceeded the num
Deduplicate and Keep Latest
ROW_NUMBER dedup keeping latest row per key.
Deduplicated Sales Volume by Category
Deduplicate transactions, then sum sales per category.
Department Cost by Status
HR is building a headcount and compensation dashboard.
Department Running Totals
Compute cumulative metric values within each department using window operations.
Department Snapshot
Deploy Author Performance Score
The platform team wants to score deployment authors over the past 12 months using the formula (100 minus average duratio
Deployment Failure Impact
The reliability team wants to measure failure impact by service.
Deployments per Environment
The DevOps team needs a quick environment breakdown showing how many deployments went to each environment.
Deploy Reliability Scores
Engineering is running a deploy reliability competition among teams.
Devices Per Age Bucket
Product analytics needs a breakdown of device diversity among younger users.
Device Type Serving Most Users
The growth team wants to know which device serves the most unique users.
Disabled Flag Ratio
The platform team needs to calculate the ratio of disabled feature flags among all flags that were updated on January 10
Distinct Chat Conversations
The messaging analytics team needs total unique conversations in chat_msgs.
Diverse Shoppers
DQ Fail Rate by Table
The data quality team is benchmarking check pass rates across tables.
DQ Score Spread
The QA team wants to measure the spread in data quality scores.
Duplicate DQ Check Records
The data quality team suspects duplicate check entries.
Duplicated User Event Messages
The stream ops team needs a duplicated report of messages from the 'alerts' topic.
Duplicate Training Runs
The ML platform team is flagging duplicate model training runs.
Early User Activation
The onboarding team wants to track early activation.
Efficient Pipeline Throughput
The SRE team is benchmarking pipeline efficiency.
Employee Seniority Rank
Self join to compare rows within the same table
Endpoint Latency Spread
The SRE team needs to understand latency spread across API endpoints.
Endpoint With Most GET-Only Users
The platform analytics team wants the endpoint with the most users who have exclusively used the GET method.
Engagement by Content Type
The content team needs total engagement broken down by content type, using duration in seconds as the engagement proxy.
Engagement Gap
Error Rate by Region
Error rate per day and region via conditional aggregation.
Exclusive Users per Device Type
The product team measures platform exclusivity.
Experiment Conversion Pivot
The A/B testing platform needs a pivot report of conversion outcomes by variant.
Extract Deploy Versions
The release management team needs to extract the numeric version from deploy log entries for the staging environment.
Extreme API Token Usage
The platform team needs to flag outlier API tokens.
Extreme Category Totals
Find groups with the highest and lowest sums.
Extremely Late Resolutions
The SRE team flags incidents where resolution took more than 20 minutes beyond the predicted time.
Failed Constraint Checks Count
The ops team wants to know how many data quality checks with a constraint-related rule resulted in a failure.
Failure Rate
Fastest CI Build Date
The data team wants to find the build date of the fastest CI build ever recorded, showing the date and the duration.
Fastest Completion Per Day
Rank by speed within daily partitions
Fastest Regions by Latency
The performance team benchmarks API endpoints by region.
Feature Flag Adoption
The platform team is auditing feature flag adoption.
Feature Quality by Source
The ML platform team is profiling feature quality across data sources.
Feature Vote Winner
The product team runs a feature voting system where each user splits a single vote equally among all the flags they vote
Filtered Multi-Join Aggregation
Join three tables with filters and aggregate.
Find the Fifth Largest Cost
The analytics team wants to find the fifth highest cloud cost amount.
First and Last Peak Accuracy Dates
The ML team wants to know when peak accuracy was first and last recorded.
First and Last Timeout Per Service
The reliability team needs the first and last times each service experienced a timeout.
First Deploy Attribution
The growth team measures first-touch attribution by service.
First Half of Page Views
The data team wants to retrieve the first half of all page view records, based on the total row count, ordered by view I
First Time Learners Per Day
Find each day's brand new users
First Touch Attribution
Find each user's earliest channel interaction.
Flaky Domains
Frequent Message Senders
The messaging team wants to identify spammy senders.
Fulfillable Order Percentage
Logistics wants the percentage of orders that can be fulfilled.
Ghost Products
Heavy Ad Exposure
Find users with concentrated or broad ad exposure across campaigns.
Heavy Hitters
Heavy Namespaces
Highest and Lowest Cloud Costs
Finance wants to spotlight the extremes in cloud spending.
Highest Daily Spend
Finance needs the customers with the highest daily spending between March 1 and June 1, 2026.
Highest Node Density Regions
The infrastructure capacity team wants to find the top 3 regions by node density, defined as distinct nodes divided by d
Highest Throughput Pipelines
The data engineering team needs to find which pipelines produce the highest throughput for runs started in 2026.
Inactive Android Control Users
The experimentation platform team is debugging a test cohort.
Inactive Users in Date Range
The growth team needs to identify users who had zero sessions between June 1 and July 1, 2026.
Inactive vs Suspended Engagement
The product team compares engagement between premium and free-tier users.
iOS Adoption by Age Bucket
Product analytics is measuring mobile adoption across age buckets.
iOS Sessions by Device Type
The growth team wants to measure mobile engagement by region.
Job Status Duration
The data pipeline team tracks job state transitions.
Keep Most Recent Record
Deduplicate rows keeping the latest per key
Keyword-Based User Search
The search quality team needs to find users whose search terms contain the words 'deploy', 'release', 'build', or 'ship'
Largest A/B Test by Participants
The experimentation platform team wants to find the test with the most unique participants.
Largest Single Cloud Cost
The FinOps team needs to find the single largest cloud cost entry, showing the service name and the amount.
Latency Gap to 10th Fastest
The SRE team wants to compare server 'web-prod-07' against the 10th-fastest server by average response time.
Latest Migration Output per Author
The data pipeline team needs each author's most recent applied migration balance.
Leading ML Frameworks by Accuracy
The ML platform team wants to identify the most popular training frameworks.
Least Viewed Content
The product team needs to find the least-viewed content for potential deprecation.
Longest Gap Between Token Events
The people ops team tracks hiring and attrition gaps.
Longest Running Pipeline
The platform team wants to find the single data pipeline that ran the longest, returning just that pipeline's name.
Long Messages
Long-Running Feature Flags
The platform team is auditing long-running feature flags.
Lowest Cost Network-Heavy Team
The cost optimization team is investigating teams that spend more on compute than storage.
Lowest Latency per Service
The SRE team wants the lowest latency recorded for each service in the us-east-1 region.
Low Severity Checks in 2026
The compliance team needs the total number of 'low' severity data quality checks that ran in 2026.
Low-Volume Stream Topics
The stream processing team is looking for low-volume topics.
March Revenue by Customer
The revenue team needs March 2026 transaction totals per customer.
Median Null Percentage of Float Features
For the data quality report, the team needs the median null percentage across all ML features that have a data type of '
Mentorship User Pairs
The people ops team is building mentorship pairs.
Metric Count
Metric Value Pairs Over Threshold
The QA team is exploring metric correlations.
Minimum Cost Per Provider
The FinOps team needs the lowest monthly cloud bill for each provider.
Mobile vs Desktop Session Duration
The growth team wants to compare user engagement by device type in 2026.
Models With Variable Accuracy
The ML platform team wants to find models whose accuracy has varied across versions.
Model Training Completion Rate
The ML platform team reviews model evaluation results.
Monthly Cohort Retention
Compute month over month retention rates for user signup cohorts.
Monthly Revenue Comparison
The growth team is tracking month-over-month revenue changes per product.
Monthly Running Total
Cumulative sales per product across months.
Monthly Spend Pivot by Provider
The FinOps team needs a pivot table of monthly cloud spend by provider.
Monthly Transaction Summary
The product analytics team needs a monthly engagement summary for 2026.
Month With Fewest Deploys
The on-call team wants to find which calendar month (across all years) had the fewest deployments.
Most Active Chat Users
The community team wants to recognize power users on the chat platform.
Most Active Recent Committers
The code velocity team wants to find the most active committers over the past 2 years relative to the latest commit in t
Most Active Servers by Log Volume
Infrastructure wants to identify the busiest servers by log volume in 2026.
Most Commented Code Review
Product wants to surface the code review that received the most comments.
Most Common Monday Outcome
The A/B testing team wants to find the most common experiment outcome among experiments created on a Monday.
Most Efficient API Endpoint
The SRE team is evaluating which API endpoint delivers the best throughput efficiency.
Most Frequent Error Types
The platform team wants to know which error types occur most frequently.
Most Ordered Product by Country
The international expansion team needs to identify popular products in specific markets.
Most Popular Content Type
The product team wants to identify the most popular content type.
Most Popular Signup Day
The product team wants to know which day of the week sees the most user signups.
Most Profitable Region Month
Finance wants to find the single most profitable region-month combination in 2026.
Multi-Host Regions by Node Type
The growth team is analyzing regional infrastructure density.
Multi-Table Report
Join three tables into a summary report.
Mutual Channel Connections
The social features team wants to find mutual connections between two specific users.
Negative Outcome Rate for New Users
Calculate the ratio of problematic events for users within their first 14 days.
Net Lines
New Customers Per Day
Count users whose first order falls on each date.
New User Purchases
Nodes by Region and Type
The infra team needs a headcount of nodes broken down by region and node type, sorted alphabetically by region.
Nodes in Key Regions
The infra team is auditing node distribution across six key regions: us-east-1, us-west-2, eu-west-1, eu-central-1, ap-s
Noisiest Tables by DQ Failures
The data quality team wants to surface the noisiest tables.
Noisy Endpoints
Non-Trivial Fatal Errors
The analytics team categorizes error messages by length: 'short' (under 50 chars), 'mid' (50 to 150), and 'long' (over 1
Notification Delivery Ratio
The messaging reliability team needs the delivery ratio: what fraction of all sent push notifications were actually open
Notification Open Rate
The ops team tracks how often push notifications are opened.
Notifications Pivot by Weekday
The product team wants notification volume pivoted by platform and day of week.
Nth Highest Salary Per Department
Find the 3rd-ranked value within each partition.
Opened Notifications in Jan-Feb
The marketing team needs to know how many push notifications were successfully opened during January and February combin
Over-Budget Services
The PMO team needs to flag over-budget data pipelines.
Overlapping User Sessions
Self join to detect overlapping date ranges per user.
Overloaded Infrastructure Nodes
The capacity planning team considers a node overloaded if its CPU exceeds 90% or its memory exceeds 85%.
Pages Viewed by Session Duration
The engagement team wants to understand which session durations drive the most page views.
Pairwise Latency Maximum
The analytics team needs all pairwise latency comparisons.
Peak API Hour
The capacity planning team needs to identify the peak traffic hour of the day.
Peak Hour Power Callers
The support analytics team wants to count how many distinct users made 3 or more API calls during the afternoon rush (15
Peak Latency for 2026-Era Endpoints
The platform team is reviewing peak API load.
Peak Retargeting Revenue Month
The revenue team needs a monthly breakdown for ad campaigns containing 'retarget' in 2026.
Pipeline Completion Rate
The onboarding team tracks how far users get through multi-step flows.
Pipeline Overhead by Environment
The data pipeline team compares processing overhead between production and staging.
Pipeline Recovery by Priority
The data platform team is benchmarking pipeline recovery times.
Pipe Throughput
Pivot Event Counts
Reshape rows into columns by event type.
Platform Speed
Pod CPU to Memory Ratio
The capacity planning team is analyzing resource efficiency.
Power Users
Power Users by Session Activity
The growth team defines a power user as someone with more than 3 sessions and over 100 total pages viewed across all ses
Power Users by Session Count
The growth team wants to find power users who have logged more than 3 sessions.
Price Rank
Priciest Item in Each Category
The product team wants to surface the highest-priced item per category for a pricing review.
Product Ratings vs Sales
Products With Strong Unit Price
The marketplace team wants to identify high-performing budget products.
Profit Tiers
Prolific Authors in Largest Service Teams
The people analytics team wants to identify senior leads in the largest service teams.
Provider Spend Variance Between Halves
The FinOps team wants to compare regional cloud spend between March 11, 2026 and April 11, 2026.
Push Notification Open Rate
The growth team tracks push notification engagement.
Push Notification Status Pivot
The notification analytics team needs a pivot showing push notification counts by status for each user.
Push Opens by Platform and Campaign
The push notification team is measuring opt-in engagement.
Quarterly Consolidated Cloud Costs
The finance team needs to consolidate international cloud costs into a single currency, using regional conversion factor
Rank Users by Search Query Count
The platform team wants to rank users by total search queries.
Rapid Retry Detection
Detect retried API calls within 5 minutes of failure.
Rate Limit Rules Per Endpoint
The rate limiting team wants to document threshold rules per endpoint.
Rating Tiers
Recent Price Drops
Find items with a price decrease or sale event recently.
Regional Order Summary
The regional ops team needs order metrics by region: the number of orders, the number of customers, and the total profit
Regions by Alert Volume
The ops team wants to rank regions by incident volume.
Region With Best Uptime
The reliability team needs the single region with the highest average service uptime.
Region With Most Nodes
The infrastructure team wants to know which region hosts the most nodes.
Repeat Buyers Across Halves
The retention team wants to find power users who made at least 3 transactions in both 2025 and 2026.
Repeated Transactions
Detect same amount transactions within 10 minutes.
Repeat Purchases Within a Week
Find users with orders less than 7 days apart.
Repeat Purchase Window
The retention squad is looking for repeat purchasers.
Repository Commit Ranking
The data engineering team wants to rank repositories by total lines added across all commits.
Repos with More Builds Than Commits
The CI/CD team wants to compare build volume to deploy volume.
Response Buckets
Returning Buyers
The retention team wants to identify returning buyers.
Revenue Per Product With Zeros
Finance is calculating total revenue per product, including products that have never sold.
Reviewer Performance Metrics
The code review analytics team wants performance metrics for each reviewer who has reviewed at least one merged pull req
Reviewers Per Repo Per Year
The code review team wants to know which repos had the most reviewers each year.
Revoked Tokens by Scope
The trust and safety team counts blocked API tokens by scope for December 2026.
Rolling Weekly Total
Rolling 7 day sum per user via window frame.
Rows With Multiple Flag Conditions
The data engineering team is cleaning err_tracks.
Runner-Up Cost Without ORDER BY
The cost team wants to identify the second highest cloud cost amount on record.
Running Tab
Rush Hour API Latency
The capacity planning team wants to analyze rush-hour API behavior in the 'us-east' region.
Same-Day Signup Rate
Percentage of transactions on the signup date.
Same First and Last Reply Target
The messaging platform team wants to find users whose first and last message in a channel on the same day were sent to t
Satisfaction by Platform
The product team is benchmarking user satisfaction across platforms.
Second Highest Cloud Cost
The compensation team wants the second highest distinct cloud cost amount on record.
Second Highest Latency by Method
Engineering leadership wants the second-highest latency API endpoint in each HTTP method group.
Senior to Junior Ratio
HR analytics is studying workforce age distribution per department.
Servers Returning to Origin
The infra team tracks server migrations between regions.
Server With Most Errors
The SRE team needs to find the single server with the most logged errors.
Service Budget per Head
The PMO wants to know budget-per-head for each data pipeline.
Service Component Classification
The observability team is classifying services by naming pattern.
Service Provider Statistics
Per endpoint stats: count, avg latency, high latency count.
Service Reliability Tiers
The product team wants to classify services by reliability tier based on uptime: 99.
Services at Median Uptime
The SRE team needs to identify services performing exactly at the median uptime.
Service Uptime Minutes
Calculate uptime from status change events.
Session Overview
Session Rank
Sessions by Content Type
The product analytics team is breaking down engagement by content format.
Shared Category Purchasers
The product team is building a recommendation engine and wants to find products purchased by more than one user.
Shared Endpoints
The security team is looking for shared credentials.
Signup to Subscription Rate
Conditional aggregation for conversion rates
Silent Users
Single Service Owners
The platform team is running an infrastructure audit and needs to identify owners who are responsible for exactly one se
Spending Tiers
Split Metric Sums
The analytics team needs two separate sums from employee metrics: one for metric IDs below 5 and another for metric IDs
Subscribers Without Premium
Find users who have one plan but never another
Successful Build Duration by Repository
The engineering productivity team measures CI throughput.
Sum Excluding Extremes
The finance team needs to remove outliers from a latency report.
Super Reviewers
The code review team wants to identify prolific reviewers.
Symmetric Reply Network
The messaging team stores directed replies in chat messages.
Tables With Many DQ Failures
The data quality team wants to find tables that have never passed a single check.
Tables With Most DQ Failures
The compliance team needs to know which tables have the most data quality failures.
Teams Below Double Average Spend
The data platform team tracks cost allocation by team.
Third Highest Spender
Finance is reviewing top spenders.
Three-Item Combinations
Generate all unique 3-item sets with total cost.
Three-Value Sum Combinations
The analytics team needs all combinations of 3 distinct employee metric records whose values sum to exactly 8.
Throttled Clients
Token Churn Rate
The growth team calculates churn for API tokens in September 2026.
Tokens With Non-Read Scope Prefix
The security team wants to verify that all API tokens have a scope starting with 'read'.
Top 10 AB Test Variants
The growth team is compiling a leaderboard of top-performing A/B test variants for the 'onboarding_flow_v2' test.
Top 10 CPU-Heavy Nodes
The infrastructure team needs a quick resource report showing the ten nodes consuming the most CPU, with their node ID,
Top 10 Rated Products
The product catalog team wants to highlight top-rated items.
Top 2 Active Push Days
The product team wants the two most active days during a recent push notification campaign (August 1 to 7).
Top 2 Ad Campaigns by Spend
The marketing team wants the top two ad campaigns by total spend, excluding any campaign whose name contains 'test'.
Top 2 Busiest API Slots
The ops team wants to identify their two busiest time slots each week.
Top 2 Callers per Endpoint
The team lead dashboard shows the top 2 users by API call volume for each endpoint.
Top 2 Cloud Services by Cost
The analytics team is forecasting demand for the holiday season.
Top 2 Rate-Limited Clients
The rate limiting team needs to find the two clients generating the most blocked requests.
Top 3 First-View Pages
The content team wants to identify the top 3 pages that users engage with earliest after signup.
Top 3 Revenue Months
Finance wants to identify the three highest-grossing months.
Top Accuracy Model
The ML platform team wants to highlight the best performing model.
Top Active API Tokens
The API platform team wants the issuance dates of the 5 most-used API tokens that have not yet expired.
Top Active Senders per Channel
The comms team wants the top 3 messages by reply count in each chat channel.
Top Alert Resolvers
The reliability team wants to recognize the engineers who resolve the most alerts.
Top API Caller
The analytics team wants to find the user who triggered the most API calls, showing their user ID and total call count.
Top AWS Non-APAC Service Costs
The procurement team wants the highest cloud cost in the AWS provider for each service that appears exclusively in US re
Top Batch Job Under Priority 1
The batch processing team wants to find the job(s) with priority 1 that achieved the highest rows completed.
Top Campaign by Opens
The growth team wants to find which campaign drove the most push notification opens.
Top Campaign by User Revenue
For each user who clicked an ad, the marketing team wants to know which campaign generated the most revenue for that use
Top Chat Contributors
The messaging platform team wants the top 10 most active users by total messages.
Top Committers in 2025
Engineering leadership wants to know which repo authors had the most top-10-ranked commits by lines added in 2026.
Top Content by Lifetime Value
The content team defines a content item's lifetime value as the total viewing duration (sum of dur_ms from page views) a
Top Content by Views
The product team wants the top 5 content items by total page views.
Top Content Flagger
The trust and safety team reviews flagged content.
Top Cost Categories
The cloud finance team wants the top 3 cost categories by total spend.
Top Cost Entry per Team
Engineering leadership wants the highest-cost entry within each team.
Top Earner Per Campaign
The growth team needs to know which user earns the most revenue per ad campaign.
Top Error Categories in 2025
The error tracking team wants the top error categories for 2026.
Top Frameworks by Accuracy
The ML platform team wants the top 3 frameworks by average accuracy among production models.
Top Identified Event Types
The content team wants to identify the top 3 users by total event count, but only those who had more 'create' events tha
Top Lessons Each Month
Rank items within time periods and keep top 3
Top Metric per Department
Engineering leadership wants to spotlight the top performer in each team.
Top Pattern Matches
Find rows matching a text pattern, ranked by count.
Top Percentile Spenders
Top 1% of users by total spend via percentile bucketing.
Top Product Categories
The product team wants the top 3 content categories by total page views.
Top Product Category by Transactions
The regional sales team wants to find organic transactions in August 2026, meaning purchases where the user had no activ
Top Products by Quantity Sold
The revenue team is preparing their Q4 analysis and needs to see which products moved the most volume.
Top Products per Category
Find the top 5 products by total sales within each product category.
Top Products per Category
The merchandising team wants the top 2 products by total quantity sold in each category.
Top Ranked Content Creators
Rank creators by top-10 content appearances.
Top Region by Order Volume
The ops team needs the single region that generates the most orders.
Top Regions by Critical Alerts
The SRE team wants the top 5 regions by critical alert volume.
Top Regions by Effective Uptime
The SRE team calculates effective uptime hours for each service region.
Top Repos by Successful Builds
The DevOps team wants to find which repos produce the most successful CI builds.
Top Revenue Products H1
The revenue team is doing their H1 review and wants to see the top 5 products by total amount for January through June.
Top Services by Regional Cost
The FinOps team is analyzing spend distribution in the 'us-west-2' region.
Top Services by Uptime
The SRE team wants to rank services by average uptime.
Top Services Per Provider
Rank services by total spend within each cloud provider and return the top two.
Top Spender
Top Users by Pages Viewed
The engagement team wants the top 5 users with the most total pages viewed across all sessions.
Top Users by Recent Spend
Find the top 10 users by total transaction amount in the last 30 days.
Top Users by Session Time
The product team wants to find the top 10 users by total session duration.
Transaction Share of User Spend
The finance team is analyzing each transaction's share of its user's total spend.
Transaction Timeline
Trend Spotter
Unclicked Searches by Campaign
The product team wants to know how many search queries were performed per ad campaign in the us-east-1 region by users w
Unique Hostnames per Region
The infrastructure team needs to count unique hostnames per region.
Unique Reporters per Content
The content moderation team needs a per-item flag count.
Unmatched Deploy Services
The data platform team maintains two pipeline registries and wants to find service names that appear in deploy logs but
Unsold Product Categories
The catalog team suspects dead inventory is inflating storage costs.
US Active User Share
The regional analytics team wants to know what percentage of active users are based in the US.
User Devices
User Engagement Summary
The engagement team needs a holistic user summary showing total sessions and total search queries per user.
Users Outperforming Control
The experimentation team suspects some users perform better in treatment than control.
Users With Admin Tokens
The permissions team needs to find users who hold API tokens with 'admin' in the scope.
Users Without Sessions
The product team wants to find users who have never started a session.
User With Most Transactions
The ops team wants to identify top contributors.
Word Count Per Message
The content team wants to estimate message verbosity.
Workers Earning Above Department Average
The product analytics team needs to know how many transactions each product has.
Yearly Build Duration by Repo
Engineering leadership wants to benchmark build times.
Year-over-Year Content Launches
Product management wants to measure launch velocity changes.
Zero Accuracy on First Training
The ML platform team wants to know what percentage of models had an accuracy of 0 on their very first training run.
Cumulative Sales Per Customer
Compute a running total of transactions per customer, ordered by transaction date. Reported from a Deloitte technical round.
Funnel Leakage Report
Build a multi-step conversion funnel with drop-off rates, broken down by tags.
The Session Stitcher
Sessionize a stream of page views using gap-based detection with LAG and running SUM window functions.
The Regional Cost Reconciliation
Build a running financial balance per region from transactions, orders, and cloud_costs using UNION ALL and window SUM.
The Cannibalization Report
Analyze product cannibalization by comparing category sales before and after new product launches.
2nd Most Common Content Type
The product team is investigating the 3rd most common content type among items created by creators who also appear in th
7-Day Onboarding Conversion
The growth team is measuring onboarding-to-engagement conversion.
Above Category Avg
Active User Penetration Rate
The growth team defines an active user as one whose last session started within the last 30 days, who has at least 5 ses
Adopters Before Migration
Find users of feature A who never used B before A
Aggregate Votes by Paper Subject
The finance team needs daily net revenue for product 101 in the US region between April 15 and April 28, 2026.
Allocations in Top Spending Region
The finance team needs all cost allocation entries for teams operating in whichever region has the highest total cloud s
Alphabetical Tag Sort
The content moderation team needs to normalize tag lists.
API Call Distribution Fraction
The SRE team needs to see what fraction of total API calls each method and status combination represents.
Average Event Progression Time
The product analytics team is measuring funnel velocity.
Average Sessions Per User
The social features team wants to know the average number of sessions per user.
Best Selling Product by Month
The revenue team needs a monthly report showing the highest-grossing product each month by total transaction amount (qua
Bottom 2% Services by Spend
The cost optimization team needs to identify the bottom 2% of services by total cloud spend in May 2026.
Cache Efficiency
Campaign Bookend Engagement
Product analytics wants to compare engagement bookends for each ad campaign.
Campaign Conversion Count
The growth team ran a push notification campaign starting one day after each user's first transaction.
Campaign Conversion Window
Campaign Engagement Rank Shift
The international growth team wants to compare ad engagement across countries between December 2025 and January 2026.
Category Deep Dive
Cheapest and Most Expensive Service per Region
The finance team needs the cheapest and most expensive service in each region by cloud cost, combining data from both th
Cheapest CDN Route
The infra team is optimizing cross-region data transfer costs.
Classify Accounts by Activity Tier
The analytics team wants to find the highest latency value that appears exactly once across all API calls.
Clean Migrations
Cloud Cost Breakdown by Provider
The FinOps team is reviewing cloud costs by provider from 2022 to 2025.
Commit Cadence
Consecutive Cost Growth Periods
The finance team wants regions where cloud spending has increased for 5 consecutive billing periods.
Content Page Spreads
The documentation team formats content items into two-column page spreads.
Cost Efficiency Variance
The cloud finance team analyzes cost efficiency.
Creator Favorite Content Type
For each content creator, the editorial team wants to know which content type they publish most often.
Department Quarterly Pivot
Engineering leadership wants a pivot table showing employee count by department and fiscal quarter.
Deploy Velocity
Endpoint Name Word Count
The data catalog team wants to measure endpoint name complexity.
Endpoint Ranking
Error Category Breakdown
The SRE team is categorizing error tracks for an incident postmortem.
Exact Keyword Counts in Logs
The SRE team needs to count exact occurrences of the words 'error' and 'warning' in server log messages.
Experiment Impact
Experiment Variant Ratios
The experimentation platform needs a participation ratio report.
Fastest and Slowest Services by Region
The platform team needs the highest and lowest latency services per region, excluding any region whose name contains 'te
Fastest Page View to Click
The performance team is investigating page interaction latency.
Feature Flag Engagement Impact
The product analytics team wants to understand how feature flags impact engagement.
Feature Flag Fan vs Detractor Pairs
The experimentation team wants to pair the strongest fans of each feature flag with its strongest detractors, matched by
Feature Name Intersection
The ML features team maintains feature names from two sources: 'training' and 'serving'.
First-Day Session Retention
The growth team needs the first-day retention rate.
First Interaction Credit
Attribute transactions to earliest touchpoint
Flatten Org Chart Hierarchy
Traverse a tree structure with recursive queries.
Friday Spending Analysis
The finance team is analyzing Friday spending patterns during Q1.
Full Funnel
Healthiest Service Check History
The SRE team wants to audit the healthiest service's check history.
High Engagement Pages
The product analytics team measures content engagement depth.
Impressions by Search Keyword
The ad analytics team wants to measure campaign performance by search keyword.
Incident Keyword Messages
The content moderation team is flagging chat messages that mention incident keywords: 'outage', 'incident', 'rollback',
Intra-Region Latency Diff
The infra team is comparing latency differences between nodes in the same region.
Largest CDN Response
The CDN team is debugging edge performance and wants to know which edge location served the largest single response in 2
Latency Quartiles Per Endpoint
The performance analytics team wants quartile breakdowns of API latency per endpoint.
Latency Variance and Std Dev
The ML team wants to understand how much latency varies across successful API calls (status 200).
Longest Uptime Streak
The SRE team tracks service health check results as 'pass' or 'fail'.
Longest Visit Streaks
The growth team needs the top 3 users with the longest continuous daily visit streaks up to 2026-08-10.
Lowest CPU Pods per Namespace
For each Kubernetes namespace, the infra team wants the 5 pods with the lowest CPU usage.
Market Share
Median Cloud Cost by Service
The compensation team needs the median cloud cost amount for each service.
Median Failure Rate by Table
The data quality team wants the median failure percentage for each table being monitored, rounded to the nearest whole n
Median Household Earnings
The cost optimization team needs employees whose cloud cost allocation is above average for their team but who are not a
Median Model Accuracy
The ML platform team needs the median accuracy for each model.
Median Transaction by Category
The finance team needs the median transaction amount for each product category.
Minimum Parallel Workers
The platform team needs to determine the minimum number of parallel workers required to run all batch jobs without confl
Model Accuracy Drift
The ML platform team wants to evaluate model drift.
Mode of Small Team Costs
The cloud FinOps team wants to find the most common per-service cost amount among small teams (teams with no more than 3
Monthly Cloud Cost Forecast Error
The data science team is building a naive forecast for cost per row, defined as cloud cost amount divided by pipeline ro
Monthly Deploy Counts Pivoted
The observability team wants to see monthly deployment counts for the 'production' environment, pivoted into separate co
Monthly Revenue Change
The finance team needs month-over-month revenue trends.
Monthly Service Retention
The product analytics team calculates monthly retention per service.
Most Efficient High-Volume Campaign
The marketing team wants to find the ad campaign with the lowest peak monthly spend that still delivers more than 1500 c
Most Efficient Region by Token Usage
The growth team ranks regions by token efficiency.
Multi-Category Buyers
Multi-Month Active Users
The growth team's retention analysis needs to identify users who logged sessions on at least 3 consecutive calendar days
New Services With Poor Health
The product quality team reviews services launched in H2 2026.
New vs Returning User Share
The growth analytics team calculates the monthly share of new vs returning users.
Oldest Alert per Service
For the incident review, the on-call team needs the oldest unresolved alert for each service.
Peak Concurrent Pods
The infra team needs peak concurrent pod usage.
Peak Concurrent Tokens
For every API token that has ever been issued, find the greatest number of concurrently active tokens during its lifetim
Pipeline Duration vs Throughput
The data engineering team wants to know whether pipeline throughput correlates with duration.
Previous Day Top Service
The daily revenue dashboard needs each date's top-spending service from the previous day.
Price Pairs
Quarterly Peak Cloud Costs
Finance needs a quarterly pivot for the cloud cost review.
Quarter-over-Quarter Latency Trend
The SRE team is tracking how average API latency has changed quarter over quarter from Q1 2023 through Q4 2025.
Regional Sales Growth QoQ
The finance team needs quarter-over-quarter sales growth by region.
Regional Spend Leaders
Resolved vs Unresolved Alerts
The incident response team needs a breakdown of alerts by severity.
Rolling Revenue Average
Finance needs a 3-month rolling average of total revenue from transactions, excluding refunds (negative amounts).
Running Total With CTE
CTE plus window function for running aggregate
Same-Day Session and Transaction Correlation
The product team wants to correlate sessions with same-day orders.
Search Algorithm Rating
The search quality team is evaluating search result ranking.
Search Term Length vs Click Rates
The search team is exploring a correlation between query length and click behavior.
Second Purchase
Sequential Service Transitions
The career mobility team analyzes job transitions.
Service Scorecard
Services Hitting Cost Threshold
The finance team's monthly review needs to know what percentage of services reached at least $100 in monthly cloud spend
Services With Most Checks in 2025
The compliance team wants to find services that had more health checks in 2026 than in any other individual year.
Services With Multi-Quarter Uptime
The reliability team monitors service uptime streaks.
Service Uptime Turnaround
The product team wants to find services that experienced a turnaround in health: at least 3 consecutive months of declin
Service With Most Critical Alerts
The incident commander wants the full details for the service that has received the most critical-severity alerts.
Session Count Distribution
The engagement analytics team wants to understand session count distribution among users who signed up between 2025 and
Session Page View Distance
The ML platform team is computing session travel distances from page view data.
Shared Channel Contacts
The trust and safety team is mapping user networks through chat messages.
Spend and Rank
Spending Range
Streak Status Changes
Detect value changes across consecutive rows
Stream Gaps
Team Cost Allocation Comparison
Engineering leadership is comparing individual cost allocations to their team lead and department average.
Tenure Spread for Active Tokens
The team analytics dashboard compares tenure extremes among active API tokens.
Top 3 Monthly Costs per Team
The FinOps team needs a top-spend breakdown.
Top and Bottom Cloud Spenders
Finance wants a quick compensation snapshot for the board deck.
Top Commit Authors by Repo
For the engineering review, find the top 3 commit authors in each repo by average lines added.
Top CPU Pods per Namespace
The engineering review board wants the two highest-CPU pods in each namespace, pivoted so each row is a namespace with i
Top Endpoint by Power Users
The platform team wants to find the most popular endpoint among power users.
Top Flagged Campaign Resolutions
The content moderation dashboard needs the ad campaign(s) that received the most user-reported alerts.
Top Framework by Deployments
The data team wants to identify the top framework used by deployed models.
Top Models by Framework
The ML ops team wants to find the top-performing models by their primary framework.
Top Per Category
Top Percentile API Tokens
The security team wants to identify the most suspicious API tokens.
Top Regions by High CPU Nodes
The infra team wants the top 5 regions by count of nodes running above 90% CPU.
Total Hours Between Consecutive Events
The platform team tracks user session state changes.
Transaction-Only Features
The data team needs to find features exclusive to the 'warehouse' source that do not appear in 'streaming' or 'api'.
Upvote Percentage by Age Cohort
The search quality team wants to compare search success rates between new and existing users.
User 360
User Campaign Overlap Percentage
The recommendations team is measuring user similarity based on ad campaign exposure.
User Connection Score
The social analytics team calculates each user's connection score.
User Spend Segmentation by Category
The customer analytics team segments users by spending behavior per region.
Users Who Churned in February
The platform team is investigating churn signals.
Users With and Without Ad Clicks
The compensation team needs a count of users who received at least one ad click bonus versus those who did not.
Viewer-to-Purchaser Activity
The content analytics team wants to know which users became creators after starting as consumers.
Weekly Order Status Report
The ops team needs a weekly order status report.
Weekly Transaction Day Split
The revenue analytics team is analyzing transaction patterns by day of week.
Weighted Variant Selection
Select a row using cumulative weight probabilities.
Worst Table Per Year by DQ Failures
The data quality team wants to find the table with the most failed checks each year.
YoY Signup Growth Rate
The growth team needs to calculate year-over-year user signup growth.
Zero-Retry Job Ratio by Priority
The HR analytics team needs the ratio of batch jobs that completed without any retries to total jobs, broken down by pri
Slowly Changing Dimension Type 2
Given a customer dimension table with historical records, query for the current address and handle SCD Type 2 logic. Reported from Cognizant and EY interviews.
Normalization Tradeoffs in Practice
Given a denormalized orders table with data anomalies, normalize it to 3NF and discuss when you would denormalize back. From EY and PwC senior interviews.
The Max Frequency Element
Find the hottest items in the transaction log, ties included.
The Dedup Preserver
Clean up duplicate events without losing the timeline.
The Forward Fill
Patch the gaps in a noisy sensor stream.
The Word Mismatch
Spot the differences between two text snippets.
The Friend Counter
How popular is everyone in a messy social graph?
The Sequel Spotter
Automatically detect which books in a catalog are sequels.
The Nth Highest Key
Who's sitting in nth place on the leaderboard?
The Character Encoder
Squeeze a string down to its shortest representation.
The Monotonic Check
Is this time-series feed moving in one direction?
The IP Validator
Sort the real IP addresses from the garbage.
The Log Frequency Counter
What's blowing up in the log dump?
The Deduplicator
Strip the repeats, keep the originals.
The Config Merger
Two configs walk into a bar. One walks out.
Flatten the Feed
Nested lists all the way down. Make them flat.
Batch With Metadata
Chop a list into labeled chunks.
Caesar Shift Check
Check if a fixed alphabet shift transforms one string into another.
Character Frequency Counter
Count character occurrences and return as a dictionary.
Coalesce Fields
Column Max
Column Range
Column Sum
Compute Median
Compute list median without built in functions.
Count Vowels
Count vowel characters in a string.
Dominant Element
Find the element that appears more than half the time in a collection.
Even Filter
Explode List
Extract Domain
Fibonacci List
Generate the first n Fibonacci numbers iteratively.
Find Duplicate Values
Return elements that appear more than once.
Find Maximum Value
Find the largest number without using max().
Find Maximum Without Built-in
Iterate to find the largest value manually.
First Unique Character
Find the first character that appears only once.
Flat Map
Flatten a mixed list of ints and sublists into one list.
Forward Fill Missing Values
Replace None entries in a list with the most recent non None value.
Greeting Formatter Class
Class with string init and case sorting method
Letter Frequency Counter
Count occurrences of each character in a string.
Majority Element Median
Find median when one value dominates.
Merge Sorted Lists
Combine two sorted lists into one sorted list.
Most Reviewed Entry
Find the dictionary key whose list value has the most elements.
Normalize Name
Nth Highest Value
Find the nth largest value from an unsorted dictionary.
Nth Missing Number
Find the nth number absent from a sorted list.
Null Counter
Ordered Character Check
Check if all As appear before all Bs.
Primes Below N
Return all prime numbers less than n.
Progress Milestones
Report progress at each 10% increment.
Quality Gate
Quantile Calculator
Find the value at a given percentile.
Record Filter
Reverse Field
Run Length Encoding
Compress consecutive repeated characters with counts.
Sample Variance
Compute the sample variance of a list of numbers, rounded to 2 decimal places.
Sanitize Field
Schema Checker
Second Largest Element
Find the second-highest value in a list.
Sequential Word Pairs
Extract all consecutive two word pairs from a sentence.
Single Element Among Pairs
Find the one unpaired element in a sorted array.
Singleton Value Filter
Extract values that appear exactly once in a collection.
Sort by Value
The keys stay. The order changes.
Sort Descending
Sort Version Numbers
Sort semantic version strings oldest to newest.
Stack Class
Implement push, pop, and peek on a list.
String Compression
Compress consecutive repeated characters.
The Bracket Validator
Open it. Close it. In the right order.
The Bug Spotter
The code runs. The output is wrong. Fix it.
The Clock Angle
Hour hand, minute hand. How far apart?
The Column Transformer
A function per column. Apply them all.
The Manual Dedup
Remove repeats. No shortcuts allowed.
The Password Builder
Random characters. Fixed rules.
The Roman Converter
Letters to numbers. Old school.
Tokenize
Total Activity Minutes
Sum durations for matching activities.
Transform Column
Type Caster
Unique Dictionary Values Sorted
Sorted list of values appearing exactly once.
Unique Values
Valid Palindrome
Check if a string reads the same forwards and backwards.
Value Count
Word Count and Unique Words
Count words and return unique ones.
Word Counter
Word Frequency
Count occurrences of each word in a text.
Zip to Record
Maximum in a List
Write a function to find the maximum number in a list of integers. A warm-up screener from Wipro and Infosys.
The Hourly Event Bucketer
Count event types per hour from timestamped log entries.
The Sorted List Merger
Merge two sorted lists into one sorted list without using sort().
The Dictionary Inverter
Invert a dictionary so values become keys mapping to sorted lists of original keys.
The String Compressor
Compress consecutive character runs, returning original if not shorter.
The Bracket Validator
Validate that all bracket pairs in a string are properly balanced and nested.
The Best Trade Finder
Find the single best buy-then-sell opportunity in a price series.
The Stream Averager
Track running averages per key from a stream of tagged readings.
The Biggest Tipper
Find the most generous customer across all their orders.
The JSON Flattener
Turn a deeply nested API response into a flat row.
The Resume Filter
Extract candidate info from URLs, skip the ones you already know.
The Highest Salary Finder
Which job titles belong to the company's top earners?
The Most Frequent Review
Which piece of customer feedback shows up everywhere?
The File Size Profiler
Which file types are eating all the disk space?
The Interval Merger
Overlapping viewing sessions need to become one clean timeline.
The Stock Range Finder
Find the most volatile trading day for a given ticker.
The Status Code Counter
Make sense of a pile of raw Nginx access logs.
The Budget Allocator
Divide project money fairly when people wear multiple hats.
The Trade Log Aggregator
Break down daily trading volume by exchange from raw logs.
The Timezone Trip Counter
Monthly trip counts, but the timezone matters more than you think.
The Host Ranker
Rank vacation rental hosts by their total inventory.
The Email Ranker
Rank users by how much email activity they generate.
The Consecutive Streak
Which users showed up every single day without a gap?
The Schema Differ
What changed between yesterday's schema and today's?
The Rate Limiter
Too many requests? Not on your watch.
The Event Aggregator
Bucket a firehose of events into tidy time windows.
The Record Reconciler
Two systems, same data. What doesn't match?
The Dependency Resolver
Figure out what runs first when everything depends on everything.
Batch Partitioner
Batch Records
Char Profile
Cumulative Sum
Decode Frequency String
Parse encoded character frequencies with multipliers.
Deep Flatten
Deep Get
Detect Cycle in Sequence
Check if following next-pointers forms a loop.
Detect Outliers
Diagonal Extract
Dice Roll Scoring
Score a set of dice by matching patterns.
Dictionary Key Intersection
Shared keys between two dicts, sorted by key.
Execution Timer Wrapper
Build a reusable wrapper that measures how long a function takes to run.
Extract Leaf Values
Find Indices
Find Mode
Flatten Config Map
Convert a nested dictionary into a flat dictionary with dot separated keys.
Full Outer Zip
Group By
Lag Column
Least Recently Used Cache
Implement a fixed capacity cache that evicts the least recently used entry.
Left Join
Longest Palindromic Substring
Find the longest palindromic substring within a given string.
Maximum Subarray Sum
Find the contiguous subarray with the largest sum.
Max Length Token
Merge Counters
Merge Overlapping Time Ranges
Merge overlapping time ranges into non overlapping intervals.
Parse Log Line
Permissions Manager
Manage user permissions with config updates.
Pivot Rows to Columns
Reshape ordered rows into numbered column dict
Portfolio Profit Calculator
Compute total gain from purchase history and prices.
Precision and Recall
Compute classification accuracy metrics.
Prefix Based Word Replacement
Replace words with their shortest matching root prefix.
Rank Metrics
Rename Keys
Rotate Buffer
Row Aggregates
Running Batch Average
Maintain a running average across multiple calls.
Running Distinct Count
Search in Rotated Sorted Array
Find a target value in a sorted array that has been rotated at an unknown pivot.
Sliding Window Average
Compute a rolling average over recent values.
The Change Tracker
Old snapshot, new snapshot. What changed?
The Chunked Reader
Too big for memory. Read it in pieces.
The Iterator Flattener
Nested iterators. One stream out.
The Level Summer
A tree of numbers. Sum each level.
The Map Reducer
Map each item. Reduce to one answer.
The Response Aggregator
Multiple pages of results. One summary.
The Scoreboard Race
Simulate rounds until someone hits the target.
The Yahtzee Scorer
Five dice. Score each category.
Threshold Filter
Top N Keys
Transpose Table
Triplet Sum
Find all unique triplets summing to a target.
Unflatten Keys
URL Shortener
Map long URLs to short codes and back.
Validate Email
Valid Triangle Count
Count how many triplets from a list of lengths can form a valid triangle.
Weekly Buckets
Group timestamps into 7-day intervals.
Distribute Values Into Container Types
Sort values from a list into rotating container types (set, list, tuple). A real PwC data engineer interview question.
The Nearest Value Mapper
Map each key to its nearest value, breaking ties by picking the smaller one.
The Target Sum Pairer
Find all index-unique pairs that sum to a target, preserving scan order.
The Event Overlap Detector
Find all pairs of named events whose time intervals overlap.
The Consecutive Sequence Finder
Find the longest run of consecutive integers in an unsorted list using O(n) time.
The File Tree Builder
Build a nested directory tree from a flat list of file paths.
The Two-Stack Queue
Implement a FIFO queue using two stacks with O(1) amortized operations.
The Category Ranker
Return the top N records per category, sorted by value descending.
The Rate Limiter
Throttle abusive API clients without blocking everyone else.
The Change Data Capture
What's new, what changed, and what vanished?
The Stream Joiner
Match events across two streams when timestamps almost line up.
The Anomaly Detector
Spot the outliers before they page someone.
The Schema Migrator
Old schema in, new schema out. Hundreds of records at a time.
The DAG Executor
Wire up a mini pipeline and watch it run.
Common Prefix
Data Quality Report
Group Average
Merge Intervals
Pivot Records
Stream-Process a Large CSV
Write a generator that processes a large CSV file line-by-line without loading it all into memory. From Deloitte and EY interviews.
The Meeting Room Allocator
Find minimum conference rooms needed for a set of meeting intervals.
The Streaming Median Tracker
Track the running median of a number stream with efficient insertion.
The Hierarchy Builder
Convert flat parent-child pairs into a nested dictionary tree.
The Peak Performance Window
Find the contiguous subarray with the largest sum using Kadane's algorithm.
Customer Address History
People move. Sometimes twice in a month. How do you remember where everyone was, and when?
B2B Invoicing Data Model
Invoices go out, partial payments trickle in, and some customers are three months overdue.
Fitness Studio Membership Schema
Classes fill up, waitlists grow, members no-show, and billing happens monthly. Keep it all straight.
Loan Management Schema
Money goes out, payments come back in (hopefully), and the balance in between needs to be exactly right.
Toll Road Sensor Analytics
Cars enter, cars exit. Except when they don't. And the sensors aren't always paying attention.
Fitness App Data Model
Reps, sets, streaks, and personal bests. Gym rats love their stats.
Ride-Sharing Platform Schema
People need rides, drivers need fares, and someone needs to keep track of it all.
Employee Transfer Tracking System
People switch teams, move offices, get new managers. HR wants the full story, start to finish.
Movie Streaming Analytics Schema
Someone pressed play. Did they finish? Did they skip the intro? Content licensing adds another twist.
Log Parsing Pipeline Schema
Raw text files, terabytes of them, full of buried signals and cryptic error codes.
Livestream Analytics Schema
Someone goes live, thousands tune in, chat explodes, and virtual gifts start flying.
POS Sales Data Warehouse
Every beep at every register across hundreds of locations. Coupons and returns make it messy.
Online Retail Star Schema
Prices change, categories shift, and the BI team wants to slice revenue every way imaginable.
Social Platform Data Model
Follows, posts, likes, replies to replies. The interactions never stop and they all connect.
Subscription Churn Analysis Model
Subscribers are leaving and nobody knows why. Somewhere in the listening data, there are clues.
Retailer Data Warehouse Design
Queries are crawling and the analytics team is getting impatient. Time to rethink how the data is organized.
Clickstream and Session Schema
Millions of clicks, most of them anonymous. Somehow you need to stitch them into coherent stories.
Housing Marketplace Analytics
Sellers want buyers, buyers want deals, and the company wants to know why conversions are so low.
Trending Dishes Dashboard
What's everyone eating right now? The answer changes by the hour and varies by neighborhood.
Telecom Network Connectivity Warehouse
Thousands of devices, all connected. When one goes down, the ripple effects get interesting.
Metric Definition Reverse Engineering
Five numbers on a dashboard. Your job: figure out where they come from.
Property Booking Platform
Vacations are complicated. Between availability, pricing, and guest reviews, there is a lot to keep straight.
E-Commerce Supply Chain Tracking
A package takes a winding journey from origin to doorstep. Sometimes it splits up along the way.
SCD Type 2 Customer Dimension
Things were different six months ago. Can you prove it?
Financial Trading Warehouse
Every trade, every tick, every fraction of a share. The regulators want receipts.
Insurance Claims Lifecycle
A claim gets filed. Then it gets complicated. Then it gets reassigned. Then it loops back.
Hourly ETL Pipeline with Consistency
Design an ETL pipeline that loads data every hour with consistency and reliability.
Time Series CSV Ingestion Pipeline
Design a full ETL architecture to ingest a large CSV of time series data.
Order and Menu Recommendation Pipeline
Design a pipeline that ingests orders and matches them to menus for recommendations.
Streaming Pipeline Design
Design a data streaming pipeline.
Data Pipeline for Sales Analytics
Design a data pipeline for handling large volumes of sales data.
Batch ETL: MongoDB to Redshift
Design a batch ETL pipeline to move data from MongoDB to Redshift.
Whiteboard ETL Pipeline Design
Design an ETL pipeline on a whiteboard with file format, services, and data model decisions.
Real-Time Fraud Detection Pipeline
Design a scalable data pipeline for real-time fraud detection.
Event System for Multiple Consumers
Design a data processing pipeline for an event system shared across multiple applications.
Real-Time Sales Lakehouse Ingestion
Design an architecture to process real-time sales in a data lake.
Netflix-Scale Data Pipeline
Design a data pipeline that scales to Netflix's volume.
Data Ingest Pipeline with Access Tradeoffs
Design a data ingestion pipeline considering how the data will be accessed.