@hpalmer, there are several approaches one can use for this; I will share two of them here:
Expand intermediate table, then filter:
1. inner join transactions and refdata on "some_id", with a Cartesian join on date (expanding to, e.g., over 10b records)
2. filter using BETWEEN, or the <= and >= operators (back to 100m records)
Note: the date fields need to be converted to an integer datatype
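The first approach can be sketched as a single query; here it runs against an in-memory SQLite database. The table and column names (transactions, refdata, some_id, tran_dt, from_dt, to_dt) follow the steps above, but the toy rows and the val column are made up for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE transactions (some_id INTEGER, tran_dt INTEGER);
CREATE TABLE refdata (some_id INTEGER, from_dt INTEGER, to_dt INTEGER, val TEXT);

-- dates stored as integers (yyyymmdd), per the note above
INSERT INTO transactions VALUES (1, 20240105), (1, 20240210), (2, 20240115);
INSERT INTO refdata VALUES
  (1, 20240101, 20240131, 'a'),
  (1, 20240201, 20240228, 'b'),
  (2, 20240110, 20240120, 'c');
""")

rows = con.execute("""
SELECT t.some_id, t.tran_dt, r.val
FROM transactions t
JOIN refdata r ON t.some_id = r.some_id       -- step 1: expand (every transaction
                                              -- pairs with every interval for its id)
WHERE t.tran_dt BETWEEN r.from_dt AND r.to_dt -- step 2: filter back down
ORDER BY t.some_id, t.tran_dt
""").fetchall()
print(rows)  # → [(1, 20240105, 'a'), (1, 20240210, 'b'), (2, 20240115, 'c')]
```

The intermediate result of step 1 here is only a handful of rows, but on real data it is where the blow-up to billions of records happens before step 2 cuts it back.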
Expand refdata on import, then simple inner join:
1. write an import UDF to expand the refdata table by adding a record for every day (ref_dt) between from_dt and to_dt, giving an imported table refdata (1m records)
2. inner join refdata and transactions on some_id = some_id and ref_dt = tran_dt
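A minimal sketch of the second approach, again using in-memory SQLite: a small Python generator stands in for the import UDF, emitting one row per day (ref_dt) in each [from_dt, to_dt] interval. Names follow the steps above; the toy rows and the val column are assumptions for illustration:

```python
import sqlite3
from datetime import date, timedelta

def expand_refdata(raw_rows):
    """Stand-in for the import UDF: yield (some_id, ref_dt, val)
    for every day in [from_dt, to_dt], dates as yyyymmdd integers."""
    for some_id, from_dt, to_dt, val in raw_rows:
        d = date(from_dt // 10000, from_dt // 100 % 100, from_dt % 100)
        end = date(to_dt // 10000, to_dt // 100 % 100, to_dt % 100)
        while d <= end:
            yield some_id, int(d.strftime("%Y%m%d")), val
            d += timedelta(days=1)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE transactions (some_id INTEGER, tran_dt INTEGER)")
con.execute("CREATE TABLE refdata (some_id INTEGER, ref_dt INTEGER, val TEXT)")
con.executemany("INSERT INTO transactions VALUES (?, ?)",
                [(1, 20240105), (2, 20240115)])

# "Import" step: expand each interval into per-day rows before loading.
raw = [(1, 20240101, 20240131, 'a'), (2, 20240110, 20240120, 'c')]
con.executemany("INSERT INTO refdata VALUES (?, ?, ?)", expand_refdata(raw))

# The join is now a plain two-key equi-join; no interval filter is needed.
rows = con.execute("""
SELECT t.some_id, t.tran_dt, r.val
FROM transactions t
JOIN refdata r ON t.some_id = r.some_id AND r.ref_dt = t.tran_dt
ORDER BY t.some_id
""").fetchall()
print(rows)  # → [(1, 20240105, 'a'), (2, 20240115, 'c')]
```

The trade-off is visible even here: the 31-day and 11-day intervals become 42 refdata rows, so the expansion cost is paid once at import time rather than on every join.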
I think the first approach (expand intermediate table, then filter) is the more extensible and natural one.
The second approach (expand refdata on import, then simple inner join) performs better when the granularity of the time intervals is high; the first approach then becomes significantly more resource-intensive.
There are a few other approaches that would require mapping UDFs or building intermediate tables.
Please let us know if this helped.