How to count the number of NEW ids in hive SQL table by date?

I have a table with a bunch of months and ids. I want to count how many NEW ids I’ve gotten in each month. For example, say I have the following table:

Month | ID ------------ Jan   | 123 Jan   | 456 Jan   | 789 Feb   | 123 Feb   | 101112 Mar   | 456 Mar   | 12345 Mar   | 6789 

I want the output to be:

Month | # New IDS ------------------ Jan   | 3 Feb   | 1 Mar   | 2 

I’m truly lost on the best way to do this and haven’t been able to find anything that’s similar to this problem.

Add Comment
1 Answer(s)

One option uses two levels of aggregation. Assuming that month is of a date datatype (or at least something that can be consistently sorted as a date):

select month, count(*) new_ids from (select min(month) month from mytable group by id) t group by month 

You can also use window functions:

select month, count(*) new_ids from (     select month, row_number() over(partition by id order by month) rn     from mytable ) t where rn = 1 
Answered on September 1, 2020.
Add Comment

Your Answer

By posting your answer, you agree to the privacy policy and terms of service.