In one customer case, after only 9 months the file-api-claim had grown to over 800k files, and a performance issue was suspected.
We disabled SELinux relabeling and also ran the cron job cleanup script, but this was not enough: we had to run further manual cleanup to remove Spark job files. That took us from 800k files down to 500k.
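For illustration, here is a minimal sketch of the kind of age-based cleanup we ran manually, assuming the file-api-claim PVC is mounted at a hypothetical path like /mnt/file-api and that files older than the retention window are safe to delete (the IBM support scripts linked below remain the authoritative procedure):

```python
import os
import time

# Hypothetical mount point of the file-api-claim PVC; adjust to your environment.
FILE_API_ROOT = "/mnt/file-api"
RETENTION_DAYS = 2  # the existing IBM cleanup script keeps 2 days of files

def cleanup_old_files(root: str, retention_days: int) -> int:
    """Delete regular files older than retention_days; return the count removed."""
    cutoff = time.time() - retention_days * 86400
    removed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    os.remove(path)
                    removed += 1
            except OSError:
                # File may have been removed concurrently; skip it.
                pass
    return removed

if __name__ == "__main__":
    print(f"Removed {cleanup_old_files(FILE_API_ROOT, RETENTION_DAYS)} files")
```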
It would be nice to add this functionality into the product as an auto-clean.
Also, the scripts referred to here did not go very deep into the file cleaning with CP4D 4.8.4; they could be enhanced further, as we discussed with Manjot from the Spark dev team.
https://wwwpoc.ibm.com/support/pages/node/6980928
Maybe we could add a usability option in the admin GUI where the customer sets how many days of logs to keep. IBM would then clean up automatically for the customer by default in the product, perhaps keeping 2 days as the existing script does.
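Before turning on any auto-clean, it would also help if an admin could preview what a given retention setting would remove. A self-contained sketch of that idea (the mount path and the age buckets are assumptions, not product behavior):

```python
import os
import time
from collections import Counter

FILE_API_ROOT = "/mnt/file-api"   # hypothetical PVC mount point
BUCKETS = [2, 7, 30, 90]          # age thresholds in days (assumptions)

def age_report(root: str) -> Counter:
    """Count files by age bucket so an admin can preview a retention setting."""
    now = time.time()
    counts: Counter = Counter()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            try:
                age_days = (now - os.path.getmtime(os.path.join(dirpath, name))) / 86400
            except OSError:
                continue  # file vanished while scanning; ignore it
            label = next((f"<= {b}d" for b in BUCKETS if age_days <= b),
                         f"> {BUCKETS[-1]}d")
            counts[label] += 1
    return counts

if __name__ == "__main__":
    counts = age_report(FILE_API_ROOT)
    for label in [f"<= {b}d" for b in BUCKETS] + [f"> {BUCKETS[-1]}d"]:
        print(f"{label}: {counts[label]} files")
```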
Needed By: Not sure -- Just thought it was cool
Another tip on asset-file-api cleanup:
https://www.ibm.com/support/pages/large-amount-files-stored-filesystem-causes-performance-degradation-and-some-cases-cluster-unavailability
Spark added a feature in 5.1 to improve this:
https://www.ibm.com/docs/en/cloud-paks/cp-data/5.1.x?topic=applications-configuring-spark-log-level-information
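The CP4D 5.1 doc above covers the product-level control; on the generic Spark side, log verbosity can also be lowered from application code, which reduces the volume of log files written in the first place. A minimal PySpark example (pyspark required; WARN is just an illustrative level):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("log-level-demo").getOrCreate()

# Raise the log threshold so routine INFO chatter is not written,
# reducing the volume of Spark log files that accumulate on disk.
spark.sparkContext.setLogLevel("WARN")

spark.range(10).show()
spark.stop()
```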
Also see this tip: https://wwwpoc.ibm.com/support/pages/node/6980928
This is a step in the right direction; we need other services to participate in this request.
It would be nice if we could set how long to retain logs in the CP4D / Software Hub GUI, and CP4D would handle cleanup daily or weekly.
Two more support cases are related to this: TS018772895 and TS018664310.
This is a high priority feature request.
Log and zip files (from project exports, even when using --hard-delete) have always been a problem (since CPD 3.2), causing the file-api-claim to grow without bounds FOR NO REASON. This is very bad behavior for any serious application.