Is Citizen Data Science A Good Thing?
Self-service Discovery Analytics established itself firmly in the mid-2010s thanks to pioneers like Qlik, Tableau and Tibco Spotfire. These vendors brought a new sales model: they approached the influencers and decision makers in the business directly and offered entry-level licenses at a price low enough to put on a Purchase Card, because it fell within the buyer’s discretionary budget authority. The User Experience of these platforms closed a huge and painful gap for business users, but it also opened new concerns around data security and governance. The IT teams developing and supporting the enterprise analytics solutions were taken aback and tried to resist the movement, to no avail. Then the industry researchers published points of view on the value proposition of self-service and offered frameworks and approaches to enable the movement. IT eventually came to the realization that it is best to embrace self-service analytics and collaborate with the business for the good of the enterprise. Once that acceptance caught on, the vendors enjoyed significant licensing revenue and account expansion. Then those business analysts felt their next constraint: they were still dependent on IT to deliver the desired data for their self-service discovery analytics and visualization use-cases.
On the heels of the incredible success and market disruption brought about by these self-service Discovery Analytics platforms came another wave of self-service enabling technologies: self-service data preparation / curation tools. The vendors followed the same commercial and user experience model as their self-service discovery analytics counterparts. They enjoyed the same success and caused a similar market disruption in the data integration & transformation segment. This time the research firms were out in front of the self-service data preparation products, offering adoption frameworks, product reviews, comparisons and ratings. IT again struggled and resisted supporting this next self-service wave because it blurred the line around who should be building the data pipelines, and where: the pipelines that extract, transform and load datasets with content considered fit for purpose by the business analysts producing and consuming those datasets. On the surface the self-service data prep platforms are functionally equivalent to the traditional IT-oriented ETL / ELT platforms, but they offer a significantly simpler user experience. This helped reduce constraints and the dependency on IT to provide datasets. However, business end-users building their own datasets also raised “data anarchy” concerns as these datasets proliferated across the organization. Those organizations that intentionally embraced the value proposition of self-service simply extended the rules of engagement and governance guidelines to account for self-service data prep in addition to self-service discovery analytics. Many of those same organizations also invested in modern data catalogs in order to automate the knowledge sharing, tracking and administration of those datasets throughout their entire life-cycle.
Enter Self-Service Predictive Analytic Platforms and the Citizen Data Scientist. Gartner defined the Citizen Data Scientist as “A person who creates or generates [machine learning] models that use advanced diagnostic analytics or predictive and prescriptive capabilities, but whose primary job function is outside the field of statistics and analytics.” Glassdoor reports that Data Scientist has been the number one job for the last four years. The U.S. Labor Department reports the demand for data science skills will drive a 27.9% rise in employment through 2026. There is a huge and mounting demand for a short supply of qualified data scientists. Add the fact that organizations across all industries and of all sizes are realizing that Machine Learning (ML) and Augmented Intelligence (AI) are the new analytics table stakes. Increased demand, a shortage of supply and mounting pressure to develop and operationalize ML models before the competition does: a perfect storm! There is early evidence of software platforms coming out that once again target the data-savvy Business Analyst / Citizen Data Scientist persona. The platforms provide a low-code or no-code user experience, offering pre-built ML models and algorithms targeted at pre-conceived classes of predictive analytics use-cases. The stakes (rewards and risks) are orders of magnitude higher in ML and AI than in descriptive / diagnostic analytics. We need to scale, but we need to keep our organizations out of trouble. How do we compensate for the lack of a qualified data scientist’s unique expertise, so that we are using proven methods, selecting the right models for the right problems, using the right data to feed the models for training and validation purposes, and ensuring that the data being used and the outcomes being produced are accurate, ethical, legal and unbiased? Start by leveraging and extending the self-service framework to include continuous innovation, exploration and vetting of new predictive analytics use-cases and hypotheses.
Quickly determine whether to pivot to a different use-case or persevere. If the decision is to persevere, it’s time to engage the qualified data scientist to perform a comprehensive Model Risk Management review before the use-case is added to the Analytics portfolio for continuous integration / deployment and operational release. The value proposition here is that the business domain expertise is combined with the data science expertise. We’re grooming the business analysts to be more self-sufficient, even in ML / AI based analytics. This helps scale the ideation of promising new ML / AI Analytic use-cases while conserving the data scientists’ time, so they can focus on mentoring the business analysts and developing the next generation of more complex models to be published to the library of pre-built ML models.
Yes, it’s a good idea to extend self-service into the predictive, machine learning analytics space. Extend the rules of engagement and governance frameworks to include [machine learning] Model Risk Management. Pair up the Citizen Data Scientists with qualified Data Scientists. Establish a continuous innovation / exploration discipline to incubate and validate new analytics-enabled use-cases. Then watch your competitors navigate your wake in the rear-view mirror.