Rewriting a Sort Merge Join in Oracle

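This problem comes from a real business scenario. We have a SQL statement that refreshes a CUBE in an Oracle environment. It normally runs for a little over 70 minutes, but recently it suddenly stopped making progress. The SQL has to compute cumulative values, for example the year-to-date number of customers.

What does a cumulative value mean? The following data illustrates it.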

select '201901' as c_month,100 as c_customers from dual union all
select '201902' as c_month,102 as c_customers from dual union all
select '201903' as c_month,120 as c_customers from dual union all
select '201904' as c_month,111 as c_customers from dual union all
select '201905' as c_month,155 as c_customers from dual union all
select '201906' as c_month,199 as c_customers from dual;

C_MONT C_CUSTOMERS
------ -----------
201901         100
201902         102
201903         120
201904         111
201905         155
201906         199

  

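In January 2019 the number of customers is 100, and in February 2019 it is 102.

So the year-to-date value for January 2019 is 100,

the year-to-date value for February 2019 is 202 (customers in January 2019 + customers in February 2019),

and the year-to-date value for March 2019 is 322 (customers in January 2019 + customers in February 2019 + customers in March 2019).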
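
As a quick cross-check of these numbers, the year-to-date values for the sample data can be computed with an analytic SUM. This is only a sketch to illustrate the definition, not the approach used in the CUBE SQL; the names sample_data and ytd_customers are made up for this example.

```sql
-- Running (year-to-date) total over the sample months.
-- sample_data and ytd_customers are hypothetical names used only in this sketch.
with sample_data as (
  select '201901' as c_month, 100 as c_customers from dual union all
  select '201902' as c_month, 102 as c_customers from dual union all
  select '201903' as c_month, 120 as c_customers from dual union all
  select '201904' as c_month, 111 as c_customers from dual union all
  select '201905' as c_month, 155 as c_customers from dual union all
  select '201906' as c_month, 199 as c_customers from dual
)
select c_month,
       c_customers,
       -- restart the total every year, accumulate month by month
       sum(c_customers) over (partition by substr(c_month, 1, 4)
                              order by c_month) as ytd_customers
  from sample_data
 order by c_month;
```

For the six sample months this should return 100, 202, 322, 433, 588 and 787 in the ytd_customers column.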

I use the following test SQL to illustrate the scenario.

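-- the sample data from above plus 201907, matching the join output below
create or replace view tab_test1 as 
select '201901' as c_month,100 as c_customers from dual union all 
select '201902' as c_month,102 as c_customers from dual union all 
select '201903' as c_month,120 as c_customers from dual union all 
select '201904' as c_month,111 as c_customers from dual union all 
select '201905' as c_month,155 as c_customers from dual union all 
select '201906' as c_month,199 as c_customers from dual union all 
select '201907' as c_month,108 as c_customers from dual;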

create or replace view tab_test2 as 
select '20190131' as monthlastday from dual union all
select '20190228' as monthlastday from dual union all
select '20190331' as monthlastday from dual union all
select '20190430' as monthlastday from dual union all
select '20190531' as monthlastday from dual union all
select '20190630' as monthlastday from dual union all
select '20190731' as monthlastday from dual;

select * 
  from tab_test1 a 
  join tab_test2 b 
    on to_date(a.c_month,'yyyymm') <= to_date(b.monthlastday,'yyyymmdd')
   and to_date(a.c_month,'yyyymm') >= trunc(to_date(b.monthlastday,'yyyymmdd'),'yyyy')
 order by b.monthlastday,a.c_month;

C_MONT C_CUSTOMERS MONTHLAS
------ ----------- --------
201901         100 20190131
201901         100 20190228
201902         102 20190228
201901         100 20190331
201902         102 20190331
201903         120 20190331
201901         100 20190430
201902         102 20190430
201903         120 20190430
201904         111 20190430
201901         100 20190531
201902         102 20190531
201903         120 20190531
201904         111 20190531
201905         155 20190531
201901         100 20190630
201902         102 20190630
201903         120 20190630
201904         111 20190630
201905         155 20190630
201906         199 20190630
201901         100 20190731
201902         102 20190731
201903         120 20190731
201904         111 20190731
201905         155 20190731
201906         199 20190731
201907         108 20190731

28 rows selected.

  

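The data returned by the SQL above shows that grouping by monthlastday and summing c_customers gives the year-to-date value easily. However, the join condition consists only of range (non-equi) predicates, so it cannot be executed as a hash join; Oracle has to fall back to a sort merge join or nested loops, which becomes a problem when table a is very large.

So the join has to be rewritten into an equi-join somehow.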
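
For reference, the grouping step described above can be sketched on the test views as follows. It keeps the original range join, so it shows the shape of the problem rather than the fix; the alias ytd_customers is an assumption made for this example.

```sql
-- Year-to-date total per month end, still using the original range join.
-- ytd_customers is a hypothetical alias used only in this sketch.
select b.monthlastday,
       sum(a.c_customers) as ytd_customers
  from tab_test1 a
  join tab_test2 b
    on to_date(a.c_month, 'yyyymm') <= to_date(b.monthlastday, 'yyyymmdd')
   and to_date(a.c_month, 'yyyymm') >= trunc(to_date(b.monthlastday, 'yyyymmdd'), 'yyyy')
 group by b.monthlastday
 order by b.monthlastday;
```

Both join predicates are inequalities, so there is no equality column on which a hash table could be built.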

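1. First create a time dimension table. It can be generated with a hierarchical query (connect by). Since the finest granularity of my test data is the month, my date dimension table is also at month level.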

create or replace view tab_test3 as 
select extract(year from c_date) as c_year,extract(month from c_date) as c_month,to_char(c_date,'yyyymm') as c_month2 
  from (select add_months(date'2019-01-01',level -1 ) as c_date 
          from dual 
       connect by level <= 8); 

  

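2. Self-join the time dimension table to get, for each month, the months that make up its cumulative period.

With the SQL below, filtering on the c_month2 column of t1 returns the cumulative months for any given month. For example, the cumulative months for 2019-07 are January through July 2019.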

select t1.c_month2 as groupcolumn,t2.c_month2 joincolumn
  from tab_test3 t1 
  join tab_test3 t2 
    on t1.c_year = t2.c_year
   and t1.c_month2 >= t2.c_month2
 order by 1,2;


GROUPC JOINCO
------ ------
201901 201901
201902 201901
201902 201902
201903 201901
201903 201902
201903 201903
201904 201901
201904 201902
201904 201903
201904 201904
201905 201901
201905 201902
201905 201903
201905 201904
201905 201905
201906 201901
201906 201902
201906 201903
201906 201904
201906 201905
201906 201906
201907 201901
201907 201902
201907 201903
201907 201904
201907 201905
201907 201906
201907 201907
201908 201901
201908 201902
201908 201903
201908 201904
201908 201905
201908 201906
201908 201907
201908 201908

36 rows selected.

  

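3. Replace the non-equi date join in the original SQL with an equi-join through this month mapping, so that the join against the large table can use a hash join.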
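
The original post does not show the rewritten statement, so the following is only a minimal sketch of what step 3 could look like on the test views. It assumes the large table is represented by tab_test1 and that its c_month column has the same yyyymm format as c_month2; the aliases m and ytd_customers are made up for this example.

```sql
-- Step 2 mapping (small dimension self-join) used as an inline view,
-- then joined to the large table on a plain equality.
-- m and ytd_customers are hypothetical names used only in this sketch.
select m.groupcolumn,
       sum(a.c_customers) as ytd_customers
  from (select t1.c_month2 as groupcolumn,
               t2.c_month2 as joincolumn
          from tab_test3 t1
          join tab_test3 t2
            on t1.c_year = t2.c_year
           and t1.c_month2 >= t2.c_month2) m
  join tab_test1 a
    on a.c_month = m.joincolumn      -- equi-join, eligible for a hash join
 group by m.groupcolumn
 order by m.groupcolumn;
```

The range comparison now only touches the small dimension table, while the large table is joined on equality. Whether the optimizer actually picks a hash join still depends on statistics and settings, so it is worth confirming with EXPLAIN PLAN and DBMS_XPLAN.DISPLAY. Note that tab_test3 covers eight months, so a row for 201908 also appears unless groupcolumn is filtered.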
