How to count the number of NEW ids in hive SQL table by date?
I have a table with a bunch of months and ids. I want to count how many NEW ids I’ve gotten in each month. For example, say I have the following table:
Month | ID ------------ Jan | 123 Jan | 456 Jan | 789 Feb | 123 Feb | 101112 Mar | 456 Mar | 12345 Mar | 6789
I want the output to be:
Month | # New IDS ------------------ Jan | 3 Feb | 1 Mar | 2
I’m truly lost on the best way to do this and haven’t been able to find anything that’s similar to this problem.
One option uses two levels of aggregation. Assuming that
month is of a
date datatype (or at least something that can be consistently sorted as a date):
select month, count(*) new_ids from (select min(month) month from mytable group by id) t group by month
You can also use window functions:
select month, count(*) new_ids from ( select month, row_number() over(partition by id order by month) rn from mytable ) t where rn = 1