Blog do projektu Open Source JavaHotel

środa, 28 kwietnia 2010

Recursive SQL and graph theory

Preface

Seems very strange but it is possible to use "recursion" in SQL statements. By recursion I mean the method of defining function (or object) which inside its definition refers to itself. To write recursive SQL statement it is necessary to follow strict rules : http://msdn.microsoft.com/en-us/library/ms186243.aspx


Example
Consider simple table containing distances between cities.
CREATE TABLE CONNECTION (CITY_FROM VARCHAR(100), CITY_TO VARCHAR(100), DISTANCE DECIMAL);

delete from connection;
insert into connection values ('Warszawa', 'Augustów', 242);
insert into connection values ('Warszawa', 'Suwałki', 267);
insert into connection values ('Warszawa', 'Olsztyn', 192);
insert into connection values ('Augustów', 'Suwałki', 32);

insert into connection values ('Warszawa', 'Płock', 103);
insert into connection values ('Płock', 'Grudziądz', 133);
insert into connection values ('Grudziądz', 'Gdańsk',106);

insert into connection values ('Warszawa', 'Mława', 116);
insert into connection values ('Mława','Elbląg', 144);
insert into connection values ('Elbląg', 'Gdańsk', 58);
Now "recursive" statement:
WITH path (city_from, city_to, distance, stops)
    AS ( SELECT  f.city_from,  f.city_to, f.distance, 0
             FROM connection f
             WHERE  city_from='Warszawa'
          UNION ALL
             SELECT p.city_from, f.city_to,   
                    p.distance+f.distance, p.stops+1
             FROM connection f, path p
             WHERE p.city_to=f.city_from )
    SELECT substr(pa.city_from,1,20), substr(pa.city_to,1,20), min(distance) FROM path pa GROUP BY (pa.city_from, pa.city_to);
How it runs ?
The beginning is the statement before UNION ALL. It selects all direct connection from Warszawa.
  • 'Warszawa', 'Augustów', 242;
  • 'Warszawa', 'Suwałki', 267;
  • 'Warszawa', 'Olsztyn', 192
  • 'Warszawa', 'Płock', 103
  • 'Warszawa', 'Mława', 116
Then runs the second part of UNION ALL command. It adds all direct connections of the destinations selected in the previous step. This time it is:
  • 'Augustów', 'Suwałki', 32
  • 'Płock', 'Grudziądz', 133
  • 'Mława','Elbląg', 144
This second part is  "recursive" - it uses 'path'  table which is created in the whole statements.
The third and next steps are repetition of the second step. The only difference is that rows taken are rows added in the previous step (not the whole path table). This time following destinations will be added.
  • 'Elbląg', 'Gdańsk', 58
  • 'Grudziądz', 'Gdańsk',106
The fourth step will bring no rows - there is no destination for 'Gdańsk'  and it ends our 'recursion'.
The final statement selects minimal distance between Warszawa and all cities found.
Warszawa             Augustów                242,
Warszawa             Elbląg                  260,
Warszawa             Gdańsk                  318,
Warszawa             Grudziądz               236,
Warszawa             Mława                   116,
Warszawa             Olsztyn                 192,
Warszawa             Płock                   103,
Warszawa             Suwałki                 267,
During evaluation also 'stops' column is created, although not used. It contains number of cities visited during journey from 'Warszawa' to destination. In the final statement, by replacing minimal(distance) with minimal(stops) we can choose paths with having minimal number of nodes.

What this 'recursion SQL' has in common with graph theory ?

Answer is simple. Connection table describes nodes of directed graph. 'Recursive' statement is an implementation of Breadth-first_search. Distance is a weigth assigned to a node and our 'recursive' statement implements the algorithm of finding the Shortest path problem Shortest_path_problem

Danger.
Unfortunately, there is also a dark side oh this moon.
Consider connection table:
insert into connection values ('Warszawa', 'Augustów', 242);
insert into connection values ('Augustów', 'Warszawa', 242);
Our 'recursive' statement will run infinitely on that table. Using terminology from graph teory our statement is invalid if there is a 'cycle' in the graph, it is valid only for graphs without cycles. What's more important - there is no simply workaround to this problem.

Brak komentarzy:

Prześlij komentarz