I have the following data structure:
CREATE TABLE test_table(
id int8 PRIMARY KEY,
patientid VARCHAR(255) NOT NULL,
studyid VARCHAR(255) NOT NULL,
gid VARCHAR(255) NOT NULL,
labels int8,
timestamp TIMESTAMP
);
With the following data:
INSERT INTO test_table(id, patientid, studyid, gid, labels, timestamp)
VALUES (1, 1, 1, 1, 0, '2015-01-11 00:51:14'),
(4, 1, 1, 1, 1, '2015-01-11 00:54:14'),
(3, 1, 1, 1, 3, '2015-01-11 00:53:14'),
(2, 1, 1, 1, 0, '2015-01-11 00:52:14'),
(5, 1, 1, 1, 0, '2015-01-11 00:55:14'),
(6, 1, 2, 2, 1, '2015-01-12 08:10:14'),
(7, 1, 2, 2, 0, '2015-01-12 08:11:14'),
(8, 1, 2, 2, 1, '2015-01-12 08:13:25'),
(9, 2, 1, 3, 0, '2015-01-12 09:14:25'),
(10, 2, 1, 3, 0, '2015-01-12 09:15:25'),
(11, 2, 1, 3, 2, '2015-01-12 09:17:25'),
(12, 2, 1, 3, 6, '2015-01-12 09:18:25'),
(13, 3, 1, 4, 0, '2015-01-13 07:14:25'),
(14, 3, 1, 4, 0, '2015-01-13 07:15:25'),
(15, 3, 1, 4, 1, '2015-01-13 07:17:25'),
(16, 3, 1, 4, 0, '2015-01-13 07:18:25'),
(17, 3, 1, 4, 0, '2015-01-13 07:18:25'),
(18, 4, 1, 5, 0, '2015-01-13 03:14:25'),
(19, 4, 1, 5, 0, '2015-01-13 03:15:25'),
(20, 4, 1, 5, 0, '2015-01-13 03:17:25'),
(21, 4, 1, 5, 1, '2015-01-13 03:18:25'),
(22, 4, 2, 6, 0, '2015-01-13 03:12:13'),
(23, 4, 2, 6, 1, '2015-01-13 03:14:07'),
(24, 4, 2, 6, 1, '2015-01-13 03:17:05'),
(25, 4, 2, 6, 0, '2015-01-13 03:19:42'),
(26, 4, 2, 6, 0, '2015-01-13 03:22:51')
;
I seek to retrieve the first occurring rows with labels=0 grouped by the combination of patientid and studyid.
The desired output contains the rows with the ids:
[1, 2, 7, 9, 10, 13, 14, 18, 19, 20, 22]
What I have tried:
select rr.* from (
select tt.*,
rank() over (partition by patientid, studyid order by timestamp) rank
from test_table tt where labels = 0) rr
where rr.rank <= 20;
20 is just an arbitrary number, which is great enough to capture enough possible rows with labels=0.
I hope someone can help me out