SQL: Locating the Beginning and End of a Range of Consecutive Values

SQL Practical Cases: Beginning and End of a Range of Consecutive Values


This recipe is an extension of the prior recipe , and it uses the same view V from the prior recipe. Now that you’ve located the ranges of consecutive values, you want to find just their start and end points. Unlike the prior recipe, if a row is not part of a set of consecutive values, you still want to return it. Why? Because such a row represents both the beginning and end of its range. Using the data from view V:

	select *
	  from V

	------- ----------- -----------
	      1 01-JAN-2005 02-JAN-2005
	      2 02-JAN-2005 03-JAN-2005
	      3 03-JAN-2005 04-JAN-2005
	      4 04-JAN-2005 05-JAN-2005
	      5 06-JAN-2005 07-JAN-2005
	      6 16-JAN-2005 17-JAN-2005
	      7 17-JAN-2005 18-JAN-2005
	      8 18-JAN-2005 19-JAN-2005
	      9 19-JAN-2005 20-JAN-2005
	     10 21-JAN-2005 22-JAN-2005
	     11 26-JAN-2005 27-JAN-2005
	     12 27-JAN-2005 28-JAN-2005
	     13 28-JAN-2005 29-JAN-2005
	     14 29-JAN-2005 30-JAN-2005

 you want the final result set to be:

	-------- ----------- -----------
	       1 01-JAN-2005 05-JAN-2005
	       2 06-JAN-2005 07-JAN-2005
	       3 16-JAN-2005 20-JAN-2005
	       4 21-JAN-2005 22-JAN-2005
	       5 26-JAN-2005 30-JAN-2005


This problem is a bit more involved than its predecessor. First, you must identify what the ranges are. A range of rows is defined by the values for PROJ_START and PROJ_END. For a row to be considered “consecutive” or part of a group, its PROJ_ START value must equal the PROJ_END value of the row before it. In the case where a row’s PROJ_START value does not equal the prior row’s PROJ_END value and its PROJ_END value does not equal the next row’s PROJ_START value, this is an instance of a single row group. Once you have identify the ranges, you need to be able to group the rows in these ranges together (into groups) and return only their start and end points.

Examine the first row of the desired result set. The PROJ_START is the PROJ_ START for PROJ_ID 1 from view V and the PROJ_END is the PROJ_END for PROJ_ID 4 from view V. Despite the fact that PROJ_ID 4 does not have a consecutive value following it, it is the last of a range of consecutive values, and thus it is included in the first group.

Example 1.

The most straight forward approach for this problem is to use the LAG OVER window function. Use LAG OVER to determine whether or not each prior row’s PROJ_END equals the current row’s PROJ_START to help place the rows into groups. Once they are grouped, use the aggregate functions MIN and MAX to find their start and end points:

	 1 select proj_grp, min(proj_start), max(proj_end)
	 2   from (
	 3 select proj_id,proj_start,proj_end,
	 4        sum(flag)over(order by proj_id) proj_grp
	 5   from (
	 6 select proj_id,proj_start,proj_end,
	 7        case when
	 8             lag(proj_end)over(order by proj_id) = proj_start
	 9             then 0 else 1
	10        end flag
	11   from V
	12        ) alias1
	13        ) alias2
	14  group by proj_grp


The window function LAG OVER is extremely useful in this situation. You can examine each prior row’s PROJ_END value without a self join, without a scalar sub-query, and without a view. The results of the LAG OVER function without the CASE expression are as follows:

	select proj_id,proj_start,proj_end,
	      lag(proj_end)over(order by proj_id) prior_proj_end
	  from V

	------- ----------- ----------- --------------
	      1 01-JAN-2005 02-JAN-2005
	      2 02-JAN-2005 03-JAN-2005 02-JAN-2005
	      3 03-JAN-2005 04-JAN-2005 03-JAN-2005
	      4 04-JAN-2005 05-JAN-2005 04-JAN-2005
	      5 06-JAN-2005 07-JAN-2005 05-JAN-2005
	      6 16-JAN-2005 17-JAN-2005 07-JAN-2005
	      7 17-JAN-2005 18-JAN-2005 17-JAN-2005
	      8 18-JAN-2005 19-JAN-2005 18-JAN-2005
	      9 19-JAN-2005 20-JAN-2005 19-JAN-2005
	     10 21-JAN-2005 22-JAN-2005 20-JAN-2005
	     11 26-JAN-2005 27-JAN-2005 22-JAN-2005
	     12 27-JAN-2005 28-JAN-2005 27-JAN-2005
	     13 28-JAN-2005 29-JAN-2005 28-JAN-2005
	     14 29-JAN-2005 30-JAN-2005 29-JAN-2005

 The CASE expression in the complete solution simply compares the value returned by LAG OVER to the current row’s PROJ_START value; if they are the same, return 0, else return 1. The next step is to create a running total on the 0’s and 1’s returned by the CASE expression to put each row into a group. The results of the running total can be seen below:

	select proj_id,proj_start,proj_end,
	       sum(flag)over(order by proj_id) proj_grp
	  from (
	select proj_id,proj_start,proj_end,
	       case when
	            lag(proj_end)over(order by proj_id) = proj_start
	            then 0 else 1
	       end flag
	  from V

	------- ----------- ----------- ----------
	      1 01-JAN-2005 02-JAN-2005          1
	      2 02-JAN-2005 03-JAN-2005          1
	      3 03-JAN-2005 04-JAN-2005          1
	      4 04-JAN-2005 05-JAN-2005          1
	      5 06-JAN-2005 07-JAN-2005          2
	      6 16-JAN-2005 17-JAN-2005          3
	      7 17-JAN-2005 18-JAN-2005          3
	      8 18-JAN-2005 19-JAN-2005          3
	      9 19-JAN-2005 20-JAN-2005          3
	     10 21-JAN-2005 22-JAN-2005          4
	     11 26-JAN-2005 27-JAN-2005          5
	     12 27-JAN-2005 28-JAN-2005          5
	     13 28-JAN-2005 29-JAN-2005          5
	     14 29-JAN-2005 30-JAN-2005          5

 Now that each row has been placed into a group, simply use the aggregate functions MIN and MAX on PROJ_START and PROJ_END respectively, and group by the values created in the PROJ_GRP running total column.


Вас заинтересует / Intresting for you:

A Brief History of SQL
A Brief History of SQL 3076 views Александров Попков Wed, 17 Oct 2018, 15:04:29
SQL: how to determine which ro...
SQL: how to determine which ro... 212 views Денис Tue, 06 Jul 2021, 18:47:53
Referencing an Aliased Column ...
Referencing an Aliased Column ... 345 views Денис Wed, 14 Jul 2021, 12:59:03
How to Retrieving a Subset of ...
How to Retrieving a Subset of ... 334 views Денис Wed, 14 Jul 2021, 04:45:37
Log in to comment