A Data Science Platform (DSP) for Scientists

Scientific progress, like civil society, is based on a shared understanding of objective facts. Hippocampus Analytics advocates for and develops solutions for scientific data sharing that will also address problems in the modern data sharing economy. According to a 2017 Forbes article, data scientists spend 80% of their time preparing data and only 20% of their time using that data to solve a problem. This is problematic because data scientists tend to come from computer science backgrounds, while scientists focus their training on employing the scientific method. Even freely available data sets can be difficult to use unless a scientist is working with a data scientist. Additionally, the amount of data available has only grown since the publication of that article.

The DSP provides the scientific community with easy access to a range of basic data preparation services. Users search a catalog of data along with its metadata (e.g., experimental parameters). They can manipulate data sets using familiar operations, including Excel-like SORT and range functions and SQL-like SELECT and JOIN functions. These tools can be customized to the user's needs, and our basic visualization tools can be used to refine the data set. The finished product is delivered to the user's computer in their choice of formats. We also provide a mechanism by which users can publish their data and analyses, enabling collaboration and accelerating knowledge acquisition.
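As a rough illustration of the kinds of operations described above, the sketch below applies an Excel-like sort and range filter and a SQL-like SELECT with a JOIN to a tiny made-up data set. It is not the DSP's actual interface: the table names, column names, and the helper functions `sort_and_filter` and `select_join` are all hypothetical, and it uses only Python's standard library (`sqlite3`) as a stand-in for the platform's back end.

```python
import sqlite3

# Hypothetical catalog entries: measurements plus experimental metadata.
measurements = [
    {"sample_id": 1, "experiment_id": "E1", "value": 0.82},
    {"sample_id": 2, "experiment_id": "E2", "value": 0.35},
    {"sample_id": 3, "experiment_id": "E1", "value": 0.57},
    {"sample_id": 4, "experiment_id": "E2", "value": 0.91},
]
experiments = [
    {"experiment_id": "E1", "temperature_c": 22.0},
    {"experiment_id": "E2", "temperature_c": 37.0},
]


def sort_and_filter(rows, key, low, high):
    """Spreadsheet-style step: keep rows with low <= row[key] <= high, sorted by key."""
    kept = [row for row in rows if low <= row[key] <= high]
    return sorted(kept, key=lambda row: row[key])


def select_join(measurement_rows, experiment_rows, min_temperature_c):
    """SQL-style step: join measurements to their metadata and select columns."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute(
            "CREATE TABLE measurements "
            "(sample_id INTEGER, experiment_id TEXT, value REAL)"
        )
        conn.execute(
            "CREATE TABLE experiments (experiment_id TEXT, temperature_c REAL)"
        )
        conn.executemany(
            "INSERT INTO measurements VALUES (:sample_id, :experiment_id, :value)",
            measurement_rows,
        )
        conn.executemany(
            "INSERT INTO experiments VALUES (:experiment_id, :temperature_c)",
            experiment_rows,
        )
        cursor = conn.execute(
            "SELECT m.sample_id, m.value, e.temperature_c "
            "FROM measurements AS m "
            "JOIN experiments AS e ON m.experiment_id = e.experiment_id "
            "WHERE e.temperature_c >= ? "
            "ORDER BY m.value",
            (min_temperature_c,),
        )
        return cursor.fetchall()
    finally:
        conn.close()
```

For example, `sort_and_filter(measurements, "value", 0.5, 0.9)` keeps the two samples whose values fall in that range, ordered by value, and `select_join(measurements, experiments, 30.0)` returns only the samples recorded at 30 °C or above, each paired with its experiment's temperature.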

This website describes the technology being developed for the DSP. Our system is based on Kubernetes technology and can run in a hybrid cloud environment or on premises. The Kubernetes cluster is currently inaccessible externally. If you are interested in a demonstration of back-end processing, please email me.

The user interface (illustrated below) is also not yet implemented. The user will be able to specify complex data transformations as a set of steps, each performed by one or more modules.

Finally, as part of the DSP, we are developing efficient mechanisms for searching large data sets based on biological circuits that perform similar tasks. Recommender engines were initially developed for this purpose and remain widely used in industries such as social media. However, they are unsuitable for use in the scientific endeavor because they rely on algorithms that are not transparent. This introduces users to an unknown unknown and enables confirmation bias (among other problems): users do not know which data they are not seeing, and they do not know why. This is antithetical to the idea of a controlled experiment. To read more about the importance of critical thinking and how to think critically, please look here.


Features

I am always looking for partners and to connect with other data enthusiasts. Please contact me if you have questions or are interested in learning more.


Hippocampus Analytics is a Woman Owned Small Business.