question

AbdulRaheemGhani avatar image
AbdulRaheemGhani asked

How to count three consecutive records with same values in a column based on the value of another column in PostgreSQL?

I have a table attendance as follows: create table attendance (id int, name varchar(30), status varchar(10), date date) This table has the following records: insert into attendance values (1,'John','absent','2016-03-01'); insert into attendance values (1,'John','absent','2016-03-02'); insert into attendance values (1,'John','absent','2016-03-03'); insert into attendance values (2,'Sam','present','2016-03-04'); insert into attendance values (3,'Sam','absent','2016-03-05'); insert into attendance values (1,'John','absent','2016-03-06'); insert into attendance values (1,'John','absent','2016-03-07'); insert into attendance values (1,'John','absent','2016-03-08'); insert into attendance values (1,'John','present','2016-03-09'); insert into attendance values (1,'John','absent','2016-03-10'); insert into attendance values (1,'John','absent','2016-03-11'); insert into attendance values (1,'John','present','2016-03-12'); insert into attendance values (1,'John','absent','2016-03-13'); Now I want to count that how many times a person has three consecutive absent records. The result should be as follows: id name count 1 John 2 because John is absent for three consecutive days two times. If they have 6 absents on 6 consecutive dates, then it should count 2 i.e (3+3). It should count 1 for 3 consecutive absents on 3 consecutive dates i.e if John is absent for 1,2 and 3 march, it should count 1. but if John is absent for 1,2 and 4 march then it should not count 1. 4 or 5 absents should be counted as 1. If there is no entry for a date, It is considered 'Present'. Any help is appreciated.
postgresql
2 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Oleg avatar image Oleg commented ·
@AbdulRaheemGhani Suppose someone is absent for 4 consecutive days, then present for whatever number of days, then absent for 7 consecutive days, then present for whatever number of days, then absent for 5 consecutive days. In this situation, the number of absences should be 4, is this correct? 4 is because 4 days from the first set is 1 absence, 7 days from the second set counts as 2 absences, and last set of 5 absent days counts for 1 absence, thus making the total number of absences = 4 (1 + 2 + 1). Please confirm.
0 Likes 0 ·
AbdulRaheemGhani avatar image AbdulRaheemGhani commented ·
Yes, You got it very right. This is the actual case. It should count 4 as per the scenario you mentioned in your comment, but three non-consecutive absents should not be counted as 1. For example: If I add three rows like: insert into attendance values (1,'John','absent','2016-03-20'); insert into attendance values (1,'John','absent','2016-03-21'); insert into attendance values (1,'John','absent','2016-03-25'); In this case it counts 1, but It should not count 1. Because John is absent on 20,21 and 25 march where it is not occurring on 3 consecutive days. @Oleg
0 Likes 0 ·
Oleg avatar image
Oleg answered
Disclaimer: I am not a big fan of the language keywords used as column names, such as status or date, so the presence of them in the script is simply to accommodate the table design in the original question. From what I understand, PostgreSQL supports lead and lag, and the syntax for these 2 is identical to T-SQL, so I hope that the query in this answer will work. The idea is to query the date values of the previous absence and the absence before that. Once all 3 dates are on the same row, comparing them and requiring that the values are different by one day can serve as a criteria to determine whether the person had 3 consecutive absences. The question did not mention anything about the weekends, and also about what to do if someone had more than 3 consecutive absences. These details might necessitate some changes to the query: select id, name, count(1) absence_count from ( select *, lag([date], 1) over (partition by id order by [date]) PrevDate, lag([date], 2) over (partition by id order by [date]) BeforePrevDate from attendance where [status] = 'absent' ) a where datediff(day, PrevDate, [date]) = 1 and datediff(day, BeforePrevDate, PrevDate) = 1 group by id, name; Based on the data provided in the question, the query produces this result:
id          name       absence_count
----------- ---------- -------------
1           John       2
Hope this helps, Oleg
4 comments
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Oleg avatar image Oleg commented ·
@AbdulRaheemGhani Yes, it gives the incorrect count for now, because the query in my answer was written **before** you provided additional information clarifying how to handle the scenario with more than 3 consecutive absent days. I mentioned in my answer that the additional information might necessitate the changes to the query. Now that the requirements are very clear, I can take a look on how to modify the originally posted query to return correct results. I will edit my answer later today and let you know when this is done.
1 Like 1 ·
AbdulRaheemGhani avatar image AbdulRaheemGhani commented ·
I edited my question and added some more information for clarification. Date_Part() is used instead of datediff() in PostgreSQL. And the query did not give my required output. It comes with empty result. Thanks. @Oleg
0 Likes 0 ·
Oleg avatar image Oleg commented ·
@AbdulRaheemGhani I believe that lead and lag functions are available in PostgreSQL version 8 or higher, so I am not sure how it is possible that the query comes back with empty result (based on the sample data in question). Please confirm that when you substitute the **datediff** with **date\_part** which you must, as you already mentioned, you also **flip** the dates, which makes the where clause read this:
where
    date_part('day', [date] - PrevDate) = 1 
    and date_part('day', PrevDate - BeforePrevDate) = 1
The above should work fine because there is no time part in the date values. Please let me know. I will post another comment in the comments section under your question regarding what you posted for clarification.
0 Likes 0 ·
AbdulRaheemGhani avatar image AbdulRaheemGhani commented ·
I ran the query again with proper changes, but it gives count of 11 where it should count 4. I ran the query for the scenario you mentioned. @Oleg
0 Likes 0 ·
AbdulRaheemGhani avatar image
AbdulRaheemGhani answered
@Oleg, I found the following query which worked fine for me: WITH RECURSIVE a(id, name, date, n) as ( SELECT id, name, q.date, 1 as n FROM ( SELECT id, name, date, date-lag(date) OVER (PARTITION BY name ORDER BY date) as lag FROM attendance WHERE status='absent' ) q WHERE lag >1 or lag is null UNION SELECT a.id, a.name, a.date, a.n + 1 FROM a JOIN attendance at ON ( a.id = at.id and at.name = a.name and at.date = a.date + n) WHERE at.status='absent' ) SELECT id, name, sum(long_absences) FROM ( SELECT id, name, count(*)/3 as long_absences FROM a GROUP BY id, name, date having count(*) >=3 ) as absences GROUP BY id, name;
10 |1200

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.