Skip to main content
Submitted

Change Junk Dimension Key Data Type

  • April 3, 2019
  • 8 replies
  • 4 views

Forum|alt.badge.img

Currently, junk dimension tables are automatically keyed with varbinary 64 fields.  This takes up a lot of room in a fact table and can’t be used as a join in Tabular.  Can junk dimensions share the functionality of the supernatural key stores by storing the hash locally, and using an integer as a key instead?

8 replies

Forum|alt.badge.img
  • Known Participant
  • April 4, 2019

Totally agree!


  • New Participant
  • April 4, 2019

Hi Doug,

The key data type is defined by the Hashing algorithm:

cisxQvJbsj5u4kPVufnE2A.png


Forum|alt.badge.img
  • Known Participant
  • April 4, 2019

We've been wrestling with this as well.

Our workaround is to cast the hash as bigint, which is in fact a plain old truncate of 64 bytes to 8 bytes, and use that for joins and key fields.

It's not optimal but given the chance of collisions is in the billions and - as much as our customers are growing - aren't running into tables with billions of entries, it works (and is very fast when used in multidimensional, tabular and qlik as key :)).

The best way to generate these keys would be to have TimeXtender include a CRC64 function, or a non-cryptographic hash implementation which can be used to generate these bigints natively.

Example for Postgres SQL: https://github.com/theory/pg-hash64

Wikipedia has a good list: https://en.wikipedia.org/wiki/List_of_hash_functions

fast-hash is used by many: https://github.com/ZilongTan/fast-hash

There are open-source CRC64 implementations in C-sharp: https://github.com/damieng/DamienGKit/blob/master/CSharp/DamienG.Library/Security/Cryptography/Crc64.cs

So if I might make a suggestion: implement this natively in TX DH https://github.com/ZilongTan/fast-hash


Forum|alt.badge.img
  • Author
  • Known Participant
  • April 4, 2019

Hi Radek,

You summarized my thoughts on this perfectly!  The legacy algorithms aren't safe to use, and may result in collisions. Thanks for weighing in so thoughtfully on this!

Best regards,

Doug Wynkoop


  • Community Manager
  • April 4, 2019

I know our current recommended fix for Qlik circular references is to use a "link table" which we create through a junk Dimension. So at least for Qlik users, this would be a big improvement. 


Forum|alt.badge.img
  • Known Participant
  • August 5, 2020

Just wanted to bring this up again in the context of Tabular, where the varbinary data types aren't supported.  Using legacy integer STILL feels unsafe.  We make heavy use of junk dimensions in our project and I would love to see this updated.  


Forum|alt.badge.img
  • Known Participant
  • October 12, 2020

This is important for a number of our implementations as well. As PowerBI becomes more and more popular as an integrated part of a data management platform, it becomes more and more of an unneeded hassle to work around this.


  • Community Manager
  • February 26, 2021

Thanks for submitting your ideas! Your continued feedback on our product is very important to us. 

We will be evaluating your input for future improvements in the product.

At the time of this posting, the current product roadmap is focused on a major evolution in foundational pieces of our software including splitting the development into ODX, MDW, Semantic Layer and User Portal. With this evolution, we will continue to see incredible innovation in the product that may exceed your expectations or make the need for this feature obsolete.

If you have any questions or would like to discuss further, please reach out to me or the dedicated TimeXtender Solution Specialist in you region. Thanks!