IBM Data and AI Ideas Portal for Customers



Status Delivered
Workspace Connectivity
Created by Guest
Created on Aug 6, 2019

Greenplum connector in DataStage: CSV instead of TEXT for external tables

When a field (varchar/text) in the source query (Postgres) contains a carriage return, we cannot use the Greenplum connector in DataStage.

Needed by Date Sep 1, 2019
  • Guest (Nov 30, 2020)

    Development of this enhancement is now complete. It is available as a patch (https://www.ibm.com/support/pages/apar/JR62978).

    You can open a support ticket at https://www.ibm.com/support/home/ to get access to the patch download.

  • Guest (Nov 18, 2020)

    Dear Virginie,

    Thanks a lot for the great news! We're looking forward to it.

    Any idea about the timing?

    Kind regards,

    Philippe

  • Guest (Nov 18, 2020)

    Dear Philippe,

    We are currently working on this enhancement request, so it should be part of the next release of the product.

    Have a great day.

  • Guest (Nov 17, 2020)

    Hi Virginie,

    Any news about the implementation of this idea?

    Kind regards,

    Philippe

  • Guest (Aug 4, 2020)

    Hi Virginie,

    For the Vlaamse Milieumaatschappij, this remains a very important improvement to the Greenplum Connector in DataStage.

    The problem can be described as follows:

    When a newline (linefeed, or carriage return + linefeed) is present in a text/varchar column of the source query, the load procedure using the Greenplum Connector aborts.

    This is caused by the command used to create the external table: the connector uses the TEXT format by default, and we cannot specify the QUOTE character. The <newline> character in the varchar column is therefore treated as the end-of-line delimiter rather than as part of the column value.

    CREATE [READABLE] EXTERNAL TABLE table_name
        ( column_name data_type [, ...] | LIKE other_table )
        LOCATION ('file://seghost[:port]/path/file' [, ...])
          | ('gpfdist://filehost[:port]/file_pattern[#transform=trans_name]' [, ...])
          | ('gpfdists://filehost[:port]/file_pattern[#transform=trans_name]' [, ...])
          | ('gphdfs://hdfs_host[:port]/path/file')
          | ('pxf://path-to-data?PROFILE[&custom-option=value[...]]')
          | ('s3://S3_endpoint[:port]/bucket_name/[S3_prefix]
               [region=S3-region] [config=config_file]')
        [ON MASTER]
        FORMAT 'TEXT'
              [( [HEADER]
                 [DELIMITER [AS] 'delimiter' | 'OFF']
                 [NULL [AS] 'null string']
                 [ESCAPE [AS] 'escape' | 'OFF']
                 [NEWLINE [ AS ] 'LF' | 'CR' | 'CRLF']
                 [FILL MISSING FIELDS] )]
            | 'CSV'
              [( [HEADER]
                 [QUOTE [AS] 'quote']
                 [DELIMITER [AS] 'delimiter']
                 [NULL [AS] 'null string']
                 [FORCE NOT NULL column [, ...]]
                 [ESCAPE [AS] 'escape']
                 [NEWLINE [ AS ] 'LF' | 'CR' | 'CRLF']
                 [FILL MISSING FIELDS] )]
            | 'AVRO'
            | 'PARQUET'
            | 'CUSTOM' (Formatter=<formatter_specifications>)
        [ ENCODING 'encoding' ]
        [ [LOG ERRORS] SEGMENT REJECT LIMIT count [ROWS | PERCENT] ]

    Workaround:

    As a workaround we replaced the <newline> character in the source query with a placeholder string we hope will not be used elsewhere, and substituted the <newline> back after the load.

    But this takes a lot of time and effort to implement, and it is error prone.
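
    A minimal sketch of that workaround, assuming a hypothetical placeholder token '@@NL@@' and made-up table and column names:

        -- In the source query, replace embedded newlines with the placeholder
        -- so the TEXT-format external table does not split the row:
        SELECT id,
               replace(replace(description, E'\r\n', '@@NL@@'),
                       E'\n', '@@NL@@') AS description
        FROM   source_table;

        -- After the load, restore the newlines in the target table:
        UPDATE target_table
        SET    description = replace(description, '@@NL@@', E'\n')
        WHERE  description LIKE '%@@NL@@%';

    The placeholder must never occur in real data, which is exactly why this approach is fragile.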

    Expected solution:

    The best solution would be to add new parameters to the Greenplum connector that expose options of the CREATE EXTERNAL TABLE statement or of the "gpfdist" command, so that every case we could encounter is covered.

    For example, we could specify the FORMAT (TEXT/CSV) and, with CSV, specify the QUOTE parameter so the <newline> is encapsulated (see the QUOTE option in the CSV branch of the syntax above).
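
    For illustration, with such options the connector could generate an external table along these lines (a hypothetical sketch; the table, host, and column names are made up):

        CREATE READABLE EXTERNAL TABLE ext_stage_table
        ( id integer, description varchar )
        LOCATION ('gpfdist://etlhost:8081/stage.csv')
        FORMAT 'CSV' (QUOTE '"' DELIMITER ',' NEWLINE 'LF');

    In CSV format a quoted field may contain embedded newlines, so a <newline> inside the "description" column no longer terminates the row.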

    Don't hesitate to contact me if something is not clear or if you need more information.

  • Guest (Aug 4, 2020)

    We still consider this a high priority.

  • Guest (Aug 4, 2020)

    Thanks for submitting this idea. It was submitted last year and tagged as urgent, so I'd like to know whether you were finally able to proceed with the ODBC option mentioned as a workaround, or whether you still consider this a high priority on your side.

    Thanks for any details you can provide.