Enable different sampling methods in Data Refinery

In the current version of Data Refinery there is no way to change the sampling method for the source data: it is always top rows only. Combined with the limit of 10,000 sampled rows, this makes the Data Refinery flow view appear empty when certain filters are applied. For example, take a filter on country_code: if the source data is largely sorted by country_code, the first 10,000 rows might simply not contain any matching rows.

One way to improve this would be to allow changing the sampling type in the source data settings. Currently it is a fixed "top rows" setting.
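To illustrate the problem, here is a minimal Python sketch that compares the two sampling methods on a source sorted by country_code. It does not use any Data Refinery API; the data set, the function names (top_rows, random_rows, apply_filter) and the country codes are all made up for this example.

```python
import random

SAMPLE_SIZE = 10_000  # the sample limit described in this idea

# Hypothetical source: 30,000 rows, sorted by country_code.
rows = [{"country_code": code}
        for code in ("AT", "DE", "US")
        for _ in range(10_000)]


def top_rows(data, n):
    """Top-rows sampling: take the first n rows of the source."""
    return data[:n]


def random_rows(data, n, seed=0):
    """Random sampling: draw n rows from anywhere in the source."""
    return random.Random(seed).sample(data, n)


def apply_filter(sample, code):
    """Keep only the rows whose country_code equals the given code."""
    return [row for row in sample if row["country_code"] == code]


top_matches = apply_filter(top_rows(rows, SAMPLE_SIZE), "US")
random_matches = apply_filter(random_rows(rows, SAMPLE_SIZE), "US")

print(len(top_matches))     # 0: the first 10,000 rows are all "AT"
print(len(random_matches))  # non-zero: "US" rows make up a third of the source
```

With top rows the filtered view is empty even though a third of the source matches the filter, which is the behaviour described above.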

Another issue is that even with a random sampling method, Data Refinery might give the false impression that no data exists when a valid filter is applied. The view shows nothing, as if no rows matching these conditions exist. Perhaps users should be told that matching data could still exist beyond the sample?

The ideal solution, if feasible, would be to resample the view whenever the current set of filters returns an empty result.
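A rough sketch of what this could look like, again in plain Python rather than any real Data Refinery API. The function filtered_view, its messages and the fallback strategy (re-drawing the sample from rows that match the filter) are assumptions made for this example; how resampling would actually be implemented is up to the product.

```python
from itertools import islice

SAMPLE_SIZE = 10_000  # the sample limit described in this idea


def filtered_view(source, predicate, sample_size=SAMPLE_SIZE):
    """Return (rows, message) for a filtered view over a sampled source.

    Hypothetical behaviour: try the top-rows sample first; if the filter
    leaves it empty, resample from the source rows that match the filter.
    """
    sample = source[:sample_size]
    matches = [row for row in sample if predicate(row)]
    if matches:
        return matches, "matches found in the initial sample"

    # Fallback: scan the source and keep up to sample_size matching rows.
    resampled = list(islice((row for row in source if predicate(row)),
                            sample_size))
    if resampled:
        return resampled, "initial sample had no matches; view was resampled"
    return [], "no matching rows exist in the source"


# Hypothetical source: 30,000 rows, sorted by country_code.
rows = [{"country_code": code}
        for code in ("AT", "DE", "US")
        for _ in range(10_000)]

us_rows, us_message = filtered_view(rows, lambda r: r["country_code"] == "US")
fr_rows, fr_message = filtered_view(rows, lambda r: r["country_code"] == "FR")
print(len(us_rows), us_message)
print(len(fr_rows), fr_message)
```

The message returned alongside the rows also covers the feedback point: the user can tell an empty sample apart from a filter that really matches nothing in the source.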

Needed By

Quarter
