PLANNED

Dedicated settings for snapshot tables

  • September 28, 2017
  • 14 replies
  • 0 views


Snapshot fact tables are common in many data warehouse projects. Often the data they store cannot be reloaded, because the DW does the snapshotting and is its only storage.

I don't think there is currently a bulletproof way to do this in TX in terms of performance and data safety. There should be dedicated settings for snapshot tables.
I've used incremental loading, but that's not a safe approach, and I always take a backup of any snapshot tables before a deployment just to be safe.

I've also tried using the history setting, but I found the performance to be really bad for big tables, as the unnecessary comparison logic is complete overkill for what needs to be done.

My suggestions for how this setting should work:

1. It should make sure that the valid table is never ever truncated or dropped (unless a flashy warning dialog asks you to confirm).
2. You would set the snapshot value field.
3. It would delete any existing rows for the incoming snapshot value before inserting the dataset (see the sketch after this list).

Extra
----
4. Its features could be expanded with configuration for deriving the snapshot value from the execution time or a parameter, so it can take a daily/weekly/monthly snapshot.
5. It could take a daily snapshot but have a setting to keep only the latest one along with week-end/month-end snapshots.
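
To illustrate point 3, here is a minimal T-SQL sketch of the delete-then-insert pattern. All object names (FactInventorySnapshot, stgInventory, SnapshotDate, etc.) are made up for the example and are not names TX generates.

```sql
-- Illustrative delete-then-insert load for one snapshot period.
-- All object names are examples, not TX-generated names.
DECLARE @SnapshotDate date = CAST(GETDATE() AS date);

BEGIN TRANSACTION;

-- Remove any rows already loaded for the incoming snapshot value,
-- so re-running the same period never duplicates data and never touches older snapshots.
DELETE FROM dbo.FactInventorySnapshot
WHERE SnapshotDate = @SnapshotDate;

-- Insert the current state of the staging table as this period's snapshot.
INSERT INTO dbo.FactInventorySnapshot (SnapshotDate, ItemKey, WarehouseKey, QuantityOnHand)
SELECT @SnapshotDate, ItemKey, WarehouseKey, QuantityOnHand
FROM dbo.stgInventory;

COMMIT TRANSACTION;
```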

 

 

14 replies

  • Known Participant
  • September 28, 2017

Agreed. We also use a regular history table with the snapshot value included in the primary key, but the performance for a large history table is awful (in general, not just with snapshots).

Your first point should be the case for all history tables. These must never ever be dropped / truncated under any circumstance.


  • Known Participant
  • November 29, 2017

Had the same issue. I made a snapshot project in TX which holds the settings for my snapshots, such as snapshot period, delay, ... In the actual projects where the snapshots should be stored, I have one history table and one table that holds the actual values.

Once the snapshot should be taken, I push the contents of the actual table to the history table. For monthly snapshots, the slow history comparison process therefore only has to run once instead of on each scheduled execution, and it does not have to check for differences since every load adds a new snapshot period.

Using this method you avoid deletes and updates in the history table, and to rebuild the actual table you only use TX's truncate function, so there are no deletes and no huge log files in the end.
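
Roughly, the period-end push is just an append from the actual table into the history table, keyed by the snapshot period. The sketch below is only an illustration; the object names (FactStock_Actual, FactStock_History, SnapshotPeriod) are examples, not the actual TX table names.

```sql
-- Illustrative period-end push from the actual table into the history table.
-- Object names are examples only.
DECLARE @SnapshotPeriod int = YEAR(GETDATE()) * 100 + MONTH(GETDATE()); -- e.g. 201712 for a monthly snapshot

-- Every run adds a brand-new period, so no existing rows in the history table
-- ever need to be compared, updated, or deleted.
INSERT INTO dbo.FactStock_History (SnapshotPeriod, ItemKey, WarehouseKey, Quantity)
SELECT @SnapshotPeriod, ItemKey, WarehouseKey, Quantity
FROM dbo.FactStock_Actual;
```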


  • Known Participant
  • November 30, 2017

Hello everyone,

Doug Wynkoop here with TimeXtender US Support.  One way to deal with snapshots is to do an insert into the valid table of a history table.  This solution can work if:

1) you need to do no lookups or transformations in your snapshot table, 

2) each insert is guaranteed to be unique, and 

3) the data does not need to be validated in the snapshot table 

Those conditions usually apply, because a snapshot is simply trying to capture every record as it exists on a given day, with no further modification or validation.  The only reason you need history turned on at all is so that the table will not be truncated, so inserting into the valid table bypasses the costly history algorithm altogether.

This is a good workaround in a simple case, but it often requires a custom table insert if things are more complicated.  Oftentimes, snapshots include aggregation, which won't work with a normal insert.  Table and related-records inserts also don't support an explicit timestamp for the insert, although DW_TimeStamp can sometimes work for this.  It's also not uncommon to want to do some kind of left-join or "not exists" insert, which you can do with related records, but not if you also want aggregation or something fancier.

To summarize, there is a current workaround that's robust and performant in many cases, but it does require custom code. 
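
As a rough illustration of the kind of custom code involved, the sketch below inserts an aggregated snapshot with an explicit date stamp directly into the valid table. The object names (FactSales_V, Sales, DW_SnapshotDate) are examples only, not the names TX generates for you.

```sql
-- Illustrative custom insert into the valid table of a history-enabled snapshot table.
-- Object and column names are examples, not TX-generated names.
INSERT INTO dbo.FactSales_V (DW_SnapshotDate, CustomerKey, SalesAmount)
SELECT
    CAST(GETDATE() AS date) AS DW_SnapshotDate,  -- explicit stamp for this snapshot
    s.CustomerKey,
    SUM(s.SalesAmount) AS SalesAmount            -- aggregation that a plain table insert cannot do
FROM dbo.Sales AS s
GROUP BY s.CustomerKey;
```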

I hope that this helps in the meantime, but you are all right that this is a common requirement.  I personally think an automated solution for this would be really nice to have!

Best regards,

Doug Wynkoop


  • Author
  • Known Participant
  • December 12, 2017

I think the workarounds above really stress the need for a dedicated, "one way" feature for this common design scenario. I hope this will be added to the backlog and prioritized highly.

It needs to be automated!

Best regards,
Andri Heidberg


  • Known Participant
  • December 13, 2017

Two thumbs up for adding this feature!


  • New Participant
  • December 18, 2017

Another one here.

Snapshots are relatively common, and it would make sense to have some functionality for them built in.


Please update us on whether the snapshot functionality will be added.


  • Participating Frequently
  • March 7, 2018

Thumbs up for adding this feature!


  • Author
  • Known Participant
  • April 26, 2018

Can we expect the "PLANNED" tag on this feature any time soon? :)


  • New Participant
  • December 19, 2018

I would also appreciate this functionality.


  • New Participant
  • March 29, 2019

I agree with the above feature request - a snapshot feature would be very useful as a separate feature. I have received this request from partners/customers several times over the last few years ...


  • New Participant
  • March 29, 2019

Any expected implementation plans?


  • Known Participant
  • April 23, 2019

Any progress on this?



One thing that's lacking is the ability to specify that a certain type of history is forbidden in a certain table. For example, in a snapshot table (and also in a type 1 table) you will want to prevent type 2 fields and set the default for new fields to type 0 or 1. (And in a type 2 table you will want to default new fields to type 2.)

Without this, adding new fields will destroy the functionality.

 

Another thing that's not currently possible with a history table is snapshot updating. A real example from our DW: we have a daily snapshot table that we update hourly. Data from yesterday is no longer supplied, so it would get TombStone=1; this is data that you still want to report on. Data from today that no longer exists will also get TombStone=1; this is data that you do not want to report on.

Currently we fix this by running a data cleaning pre-script that deletes the current snapshot from the valid table.
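
For illustration, such a pre-script can be as small as a single delete; the table and column names below (FactOrders_V, SnapshotDate) are examples only.

```sql
-- Illustrative data-cleaning pre-script: remove today's snapshot from the valid table
-- before the hourly load re-inserts it, so rows deleted intraday do not linger with TombStone=1.
-- Object names are examples only.
DELETE FROM dbo.FactOrders_V
WHERE SnapshotDate = CAST(GETDATE() AS date);
```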