Data Science

"Machine Learning is written in Python, Artificial Intelligence in PowerPoint"

With over 20 years of experience, Micro Source counts Data Science and Data Visualisation among its core competencies.

The "old" names for these activities are Management Support Systems, Decision Support Systems and Business Intelligence. The main issue in Business Intelligence is applying it in a useful manner. For that, Goodhart's law may be the most important consideration: "When a measure becomes a target, it ceases to be a good measure." Metrics are typically just a proxy for what we really care about, and we often forget that correlation is not causation. Many things we care about cannot be measured. We are inclined to focus on what can be measured and make mistakes as a result. In general, metrics will be gamed: people tweak what they do to satisfy the measure and focus on short-term results. Most long-term goals depend on a complex mix of factors and are hard to measure. Letting metrics replace strategy can harm a business. Metrics can be helpful, but we should never forget that they are just proxies.

Data Science becomes ethically problematic when it is used to mislead and deceive users through deceptive user interface design, known as dark patterns. Dark patterns are user interface design choices that benefit an online service by coercing, steering, or deceiving users into making decisions that they might not make if they were fully informed and capable of selecting alternatives. Such interface design is increasingly common on digital platforms, including social media websites, shopping websites, mobile apps, and video games. The following study by a group at Princeton University provides a lot of information on this issue: "Dark Patterns at Scale: Findings from a Crawl of 11K Shopping Websites".

(Read the blog posts by Rachel Thomas on https://www.fast.ai/ for more thoughts on these issues)

To perform useful Data Science we support the following "pillars":

- Data Science is all about "change", whereas most other business support systems are about "run". Therefore software provisioning, configuration management, and application-deployment tools such as Chef and Ansible are indispensable for effective Data Science.

- The next level is Metadata Management. Without knowing what the data means, it is impossible to collect, analyze and present data in a meaningful manner.

- Knowing how processes actually run is essential for creating useful data; therefore Process Mining is the third pillar of Data Science.

- Storing the data in a database is obviously a core technology, and it raises many design questions. The storage structure (Data Vault, etc.), granularity and real-time aspects are all important.

- Distilling meaning from ever more data requires sophisticated statistical analysis and Machine Learning.

- To present the data in a comprehensive and effective way we speak the language of Data Visualisation.

- Since data is only usable if it can be accessed by all parties involved, licensing is an important aspect of the software. We have considerable experience with the consequences of licensing models.
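
As a small illustration of the Process Mining pillar above, the sketch below counts how often one activity directly follows another within the same case, which is a common first step when exploring how a process really runs. It is a minimal sketch under stated assumptions, not a description of any particular tool: the function name `directly_follows`, the `(case_id, activity, timestamp)` tuple layout and the sample order data are all hypothetical.

```python
from collections import Counter, defaultdict


def directly_follows(event_log):
    """Count how often activity a is directly followed by activity b in a case.

    event_log is an iterable of (case_id, activity, timestamp) tuples;
    this layout is an assumption made for the example.
    """
    traces = defaultdict(list)
    # Order events by case and then by time, and collect the activities per case.
    for case_id, activity, _timestamp in sorted(event_log, key=lambda e: (e[0], e[2])):
        traces[case_id].append(activity)

    counts = Counter()
    for activities in traces.values():
        # Pair each activity with its immediate successor in the same case.
        for a, b in zip(activities, activities[1:]):
            counts[(a, b)] += 1
    return counts


# Hypothetical event log: two orders that share the first two steps.
EVENT_LOG = [
    ("order-1", "receive", 1),
    ("order-1", "check", 2),
    ("order-1", "ship", 3),
    ("order-2", "receive", 1),
    ("order-2", "check", 2),
    ("order-2", "reject", 3),
]

dfg = directly_follows(EVENT_LOG)
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```

The resulting counts can be read as the edges of a simple process graph: frequent edges show the main flow, while rare edges point at exceptions worth investigating.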

Henk Scholten is a certified Data Vault specialist and has completed the following online data science courses on Coursera:

- Stanford University Machine Learning by Andrew Ng

- Johns Hopkins Data Science

- TU Eindhoven Process Mining