SQL窗口函数实战:5个高频场景完整代码
大家好我是船长。今天聊一个数据分析里最容易被问到的问题SQL窗口函数。很多人在面试的时候会被问到但真正用过的人其实不多。今天船长把实际工作中最常用的5个场景完整代码分享出来。01 什么是窗口函数先说基础概念。普通聚合函数如SUM、AVG、COUNT会把多行聚合成一行。窗口函数不一样。它保留原有行同时给每一行加上窗口计算的结果。简单说窗口函数 不减少行数的聚合计算基本语法SELECT 字段, 聚合函数 OVER (PARTITION BY 分组 ORDER BY 排序) FROM 表02 场景1计算累计销售额这个场景太常见了。按日期看每天的销售额同时想知道到这一天为止累计卖了多少钱。SELECT order_date, daily_sales, SUM(daily_sales) OVER ( ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS cumulative_sales FROM sales_daily关键点ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW意思是从第一行到当前行。这句决定了窗口的范围。03 场景2计算同比/环比增长率有了窗口函数计算增长率变得很简单。SELECT month, sales, LAG(sales, 1) OVER (ORDER BY month) AS last_month_sales, ROUND( (sales - LAG(sales, 1) OVER (ORDER BY month)) / LAG(sales, 1) OVER (ORDER BY month) * 100, 2 ) AS mom_growth_pct FROM monthly_salesLAG函数取当前行之前的第N行数据。这里用LAG(sales, 1)取上月销售额。04 场景3分组排名按部门分组计算员工排名。SELECT department, employee_name, sales, RANK() OVER ( PARTITION BY department ORDER BY sales DESC ) AS dept_rank FROM employee_salesPARTITION BY按部门分组后再排序每个部门独立排名。RANK()如果有并列会跳号1,1,3。如果需要不跳号用DENSE_RANK()。05 场景4计算移动平均值消除数据波动看趋势的时候特别有用。SELECT order_date, daily_sales, ROUND( AVG(daily_sales) OVER ( ORDER BY order_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ), 2 ) AS ma_7d FROM sales_dailyROWS BETWEEN 6 PRECEDING AND CURRENT ROW当前行 往前6行 7天窗口。算出来的就是7日移动平均。06 场景5计算留存率这是增长分析的灵魂。WITH first_login AS ( SELECT user_id, MIN(login_date) AS first_date FROM user_logins GROUP BY user_id ), retention AS ( SELECT fl.user_id, DATEDIFF(ul.login_date, fl.first_date) AS days_since_first FROM user_logins ul JOIN first_login fl ON ul.user_id fl.user_id ) SELECT days_since_first, COUNT(DISTINCT user_id) AS retained_users, ROUND( COUNT(DISTINCT user_id) * 100.0 / (SELECT COUNT(DISTINCT user_id) FROM first_login), 2 ) AS retention_rate FROM retention GROUP BY days_since_first ORDER BY days_since_first这个查询会输出每天的留存用户数和留存率。07 常用窗口函数速查聚合类窗口函数SUM() / AVG() / COUNT() / MAX() / MIN()导航类窗口函数LAG(col, N) — 取前N行LEAD(col, N) — 取后N行排名类窗口函数ROW_NUMBER() — 1,2,3,4不并列RANK() — 1,1,3并列跳号DENSE_RANK() — 1,1,2并列不跳号分布类窗口函数NTILE(N) — 把数据分成N组PERCENT_RANK() — 计算百分位排名记住窗口函数的核心是OVER子句。理解OVER怎么用就理解了一切。有问题评论区见。