Idea
Collaborate with the Db2 team (and Cloudant?) to support Watson AI (WML, Visual Recognition, NLC) natively in the database.
User experience
(Just one possible way it could work...)
Db2 already has a thing called user-defined functions (UDFs). Why not add an AI user-defined function (AIUDF)?
Scalar functions would be the easiest to start with.
Example 1
Imagine I have a Db2 table, called 'tweet_table', with a column containing tweet text, called 'tweet_txt'.
Imagine I have a WML model[1] already deployed to analyze the sentiment of tweet text. So I could create an AIUDF that uses that model and returns an integer: 0 (for negative sentiment) or 1 (for positive sentiment).
To analyze the sentiment of the tweets in my table, I could do this:
SELECT tweet_id, sentiment( tweet_txt ), tweet_txt FROM tweet_table;
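Just to make this concrete, here is one way registering that AIUDF could look. The syntax is completely made up (loosely modeled on Db2's existing CREATE FUNCTION statement); the AIUDF keyword, the USING WML DEPLOYMENT clause, and the deployment name are all hypothetical:

-- Hypothetical DDL; this syntax does not exist today, it only sketches the proposal.
CREATE AIUDF sentiment ( txt VARCHAR(1000) )
  RETURNS INTEGER                            -- 0 = negative, 1 = positive
  USING WML DEPLOYMENT 'tweet-sentiment-v1'  -- hypothetical WML deployment name
  NOT DETERMINISTIC
  EXTERNAL ACTION;                           -- each call may hit the WML scoring API

From the query's point of view, sentiment() would then behave like any other scalar UDF.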
Example 2
Imagine I have a Db2 table, called 'animals_table', with a BLOB column, called 'images', that contains images of animals.
Imagine I have a WML model already deployed to classify an image by the type of animal in the image. So I could create an AIUDF that uses that model and returns a string: the name of the top class (e.g. 'cat', 'bird', or 'dog').
To return all images that contain cats, I could do this:
SELECT image_id, image FROM animals_table
WHERE classify_animal( image ) = 'cat';
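And the matching (equally hypothetical) registration for the image classifier, this time taking a BLOB argument:

-- Hypothetical DDL, mirroring the sketch in Example 1; the syntax and names are invented.
CREATE AIUDF classify_animal ( img BLOB(10M) )
  RETURNS VARCHAR(64)                          -- top class label, e.g. 'cat', 'bird', 'dog'
  USING WML DEPLOYMENT 'animal-classifier-v1'  -- hypothetical WML deployment name
  NOT DETERMINISTIC
  EXTERNAL ACTION;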
[1] Besides models, this idea could also use the new AI OpenScale objects, "AI functions" (or whatever they end up being called).
Important technical details
Here's the important thing: running these AI queries over and over could be very slow and costly, because each query can trigger a WML API call per row.
So the clever part would be that Db2 would do things like:
- Handle authentication to the Watson services transparently
- Handle any asynchronous behaviour
- Elegantly handle throttling and API call limits
- Create shadow tables (or hidden columns) storing previous results to save API calls (see the sketch after this list)
- If previous results are stored, automatically update them if the data changes or the model is redeployed
- Performance: either process all the data in the table when the AIUDF is registered and store the results, or wait until queries are run and only store results then
- Allow users to configure how these issues are handled
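To make the shadow-table idea concrete, here is one hypothetical shape that hidden cache could take. Every name and column here is invented; nothing like this exists in Db2 today:

-- Hypothetical shadow table Db2 could maintain per AIUDF/column pair.
CREATE TABLE AIUDF_CACHE_SENTIMENT (
  source_rowid   BIGINT       NOT NULL,  -- identifies the row in tweet_table
  input_hash     VARCHAR(64)  NOT NULL,  -- hash of tweet_txt, to detect data changes
  model_version  VARCHAR(64)  NOT NULL,  -- WML deployment version, to detect redeployments
  cached_result  INTEGER,                -- previously returned sentiment score
  scored_at      TIMESTAMP    NOT NULL,
  PRIMARY KEY ( source_rowid )
);
-- A query like the one in Example 1 would read cached_result when input_hash and
-- model_version still match, and call the WML API (then refresh the row) otherwise.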
Business case
Today, users can create extra tables or columns by hand, run the WML API calls, and save the results. When a model is updated, they can use scripts to go back and refresh the saved results. When the data in a row changes, they can use triggers that call the WML API again. And so on. But that's a pain.
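For comparison, the manual workaround looks roughly like this today, assuming a scalar UDF wml_sentiment() has already been registered as an external routine (e.g. Java, NO SQL) that wraps the WML scoring REST call; that routine and the trigger are hypothetical:

-- Manual approach: cache the score in an extra column and keep it fresh yourself.
ALTER TABLE tweet_table ADD COLUMN sentiment_cached INTEGER;

-- Backfill: one WML API call per row.
UPDATE tweet_table
   SET sentiment_cached = wml_sentiment( tweet_txt );

-- Refresh the cached score whenever the tweet text changes.
CREATE TRIGGER tweet_sentiment_refresh
  NO CASCADE BEFORE UPDATE OF tweet_txt ON tweet_table
  REFERENCING NEW AS n
  FOR EACH ROW
  SET n.sentiment_cached = wml_sentiment( n.tweet_txt );
-- Model redeployments still need a separate script to re-run the backfill.

Every piece of that (the extra column, the backfill, the trigger, plus a redeployment script) is exactly what the AIUDF machinery described above would take off the user's hands.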
Sooner or later, IBM has to be able to say "Db2 has built-in support for AI queries" or "Db2 natively supports AI".
There are a lot of Db2 users out there with a lot of data in their databases. If even a small percentage of those users decided to analyze some of that data using Watson, that's a lot of people signing up for the Watson services and a lot of API calls processing all that data. $$
Future growth
What is described above is just the beginning. We should also make built-in models (e.g. Visual Recognition and NLC models) available for AI queries in Db2.
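For example, a built-in text classifier could be exposed as a function that needs no user-deployed model at all. Everything in this sketch (the function name, the class label, the table) is hypothetical:

-- Hypothetical built-in NLC-style classifier; no WML deployment required.
SELECT ticket_id, ticket_txt
FROM support_tickets
WHERE nlc_classify( ticket_txt ) = 'complaint';   -- top class predicted by the built-in model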
Terminology (and competitive evaluation)
Some people think an "AI database" is one that you can query using natural language instead of just SQL. But that's not what I'm talking about. I'm talking about something more like this: https://mldb.ai/