Top 4 AI questions for every SaaS provider

by Heidi Elmore

Is your AI data ethically sourced?

AI news these days is enough to make your head spin. The breadth and depth of announcements almost make one yearn for the “simple” days of microservices and cloud. Daily, we hear about updates to large language models (LLMs), updates to productized chat interfaces like ChatGPT and Bard, and updates to how every SaaS provider under the sun plans to use this new technology. It’s a lot.

AI is a siren song of time savings and productivity. It promises to summarize vast amounts of data, surface helpful tips, and streamline workflows. But before we commit to the promises of AI from every provider, software vendors need to be transparent with customers about where their AI models are coming from. We are used to asking questions about ethically sourced consumer products. What about ethically sourced data for AI?

As I wander the halls at Enterprise Connect this week, these are the types of conversations I want to have with software providers. 

1. Do you use my data to train your tuned or proprietary models?

As SaaS providers surface fresh AI innovations in their products, it’s only logical that end-user telemetry is part of the data fueling these capabilities. It will be interesting to see how transparent software providers are about their AI features’ origin story in the coming months.

Generative AI providers like Google, OpenAI/Microsoft, and Adobe (in partnership with NVIDIA) aim to democratize access to an aspirational array of new standard models and capabilities. But they have also been coy about what data was used to train their LLMs. Only Adobe, in its announcement last week, made clear that it trained on its own copyrighted or stock photography.

Regardless, AI for most SaaS providers and end customers will not be relevant with just off-the-shelf generative models, even those as vast as OpenAI’s or Google’s. So SaaS providers will have two choices: tune the generative AI models or create purpose-built models to sit alongside them.

In most cases, it will be both: adding data to an existing model or a new proprietary one. And the source for that data likely comes from end users in the form of product telemetry. From you.

Many software providers can use end-user data via language in their software user agreements. Whether you are indifferent, against it, or somewhere in between, I feel every customer should have a clear idea of how each provider uses their data. And your agreement should be in writing.

2. Are your proprietary models centralized or segmented? 

In cases where software surfaces insights to your end users through direct product usage, you should understand whether the data powering those insights is segmented. While Microsoft has been adamant about tenant-specific “context” in Copilot’s AI suggestions, we haven’t heard whether broader, anonymized insights are created across customer sets. By contrast, the productivity tool Smartsheet has been specific about not commingling insights across customers.

Privacy is one concern, but so is specificity: insights delivered to end users have to be meaningful. You can’t expect predictive analytics for an end user in K-12 education to be the same as those for a large, multinational enterprise.

Not every customer will want their data to be part of the insights powering other organizations. Regardless of strategy, software providers must be open about how their models are trained, updated, and also tested for accuracy and bias.

3. Can I opt out of you using my data in your proprietary models?

AI creates the ultimate bartering system between software providers and their end users. It’s more than just a give/get around monetization. Most users are happy to share their data to make a specific tool or application more useful, provided they get real value in return. But they have to be given a choice.

Not only do existing policies like the GDPR and the California Consumer Privacy Act (CCPA) require software providers to honor the “right to be forgotten,” but a host of new data protection laws are also underway. Cloud giant AWS recently made a data sovereignty pledge to ensure its services remain compliant by default. With hyperscalers on board, there are few excuses left for SaaS providers, outside of legacy product architecture.

SaaS providers must ensure that customers can easily opt out of sending data to AI services, both contractually and in-product. And time will tell whether that opt-out will require refactoring of existing models.

4. How can you enable me to tune my experience based on existing investments? 

“Bring your own” applies to many parts of SaaS, and AI seems like it’s no different. 

Collaborative (or distributed) learning enables decentralized tuning of models across organizational boundaries without sharing the raw data. For organizations that have already put in the work on their own AI models, this protects end-user data and keeps SaaS strategy selection modular.

There’s nothing worse than making yourself beholden to one specific software provider. The distributed model seems like a good compromise for organizations that want the value of AI without the potential privacy risks. 
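The core mechanic behind that distributed compromise is worth seeing concretely. In a federated-averaging setup, each organization computes a model update on its own data, and only the numeric updates — never the raw records — are shared and combined. A toy sketch (the two-weight “model” and learning rate are stand-ins, not a production recipe):

```python
# Minimal sketch of the "distributed learning" idea: clients train locally,
# and a coordinator averages their weights without ever seeing their data.

def local_update(weights, gradient, lr=0.1):
    """One gradient step computed entirely inside an organization."""
    return [w - lr * g for w, g in zip(weights, gradient)]


def federated_average(client_weights):
    """Average the clients' weights; raw training data never leaves a client."""
    n = len(client_weights)
    return [sum(ws) / n for ws in zip(*client_weights)]


global_weights = [0.5, -0.2]
# Each org trains on private data; the gradients below are stand-in values.
client_a = local_update(global_weights, gradient=[0.3, -0.1])
client_b = local_update(global_weights, gradient=[-0.1, 0.5])
print(federated_average([client_a, client_b]))  # the new shared model
```

What crosses the organizational boundary is two lists of floats, not customer records — which is exactly why this pattern appeals to organizations that want AI value without the privacy exposure.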


These new AI innovations are creating opportunities for more insight and optimization in the daily work of SaaS applications. But AI-surfaced suggestions come at a price the average end user often doesn’t understand. Software providers must be thoughtful and transparent about the value that AI provides, and about where that value originated.