Researchers, research engineers, have you thought about ElasticSearch to drive your R&D?
Updated: June 28, 2019
Well, maybe not “drive”, but at least make the most of your research efforts! Having been a researcher in a not-so-distant past life (which brings back good memories), I know how painstaking it can be to keep track of relevant research papers, for example when building a state of the art. In addition, if you work on a collaborative project and run batches of experiments, it does not take long before the data you and your partners produce and share grows large. It is a fact: R&D teams consume, produce and share impressive amounts of data.
Fortunately, with the growth of new technologies, there are many solutions to manage this frenetic “knowledge production”: for exploring and storing literature (Mendeley, Zotero, etc.), for finding and sharing data (GitHub, Research Compendia, etc.), for writing and publishing, or for evaluating your research (ScienceOpen). However, if I may, I have reasonable grounds for believing that the open-source ElasticSearch might be an interesting tool to add value to your research efforts, in terms of analytics and extraction of hidden information.
As a reminder, ElasticSearch is a search engine providing distributed, multitenant-capable full-text search. It is part of the wider Elastic Stack, which also includes Logstash, a collection and log-parsing engine for ETL (Extract-Transform-Load) tasks, and Kibana, a user-friendly platform to analyze and visualize “unexpected” trends in your data.
You will fall in love with ElasticSearch for its scalability, its astounding speed, its multitenancy, and its clients maintained in many languages (Java, Python, SQL, Ruby, and so on). ElasticSearch uses standard RESTful APIs and JSON, which lets you integrate, manage and query the indexed data in several ways. Initially used to parse and extract data from log files, ElasticSearch handles all data types, and you can enrich the whole thing by running machine learning tasks to highlight errors, behaviors, and so on. On the collection side, the log shippers, namely Beats, can run across thousands of host servers, collecting, tailing, and shipping logs to Logstash.
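To make the “RESTful APIs and JSON” point concrete, here is a minimal Python sketch of what indexing a document and running a full-text search look like as HTTP requests. It only builds the requests and does not send them. The node address (http://localhost:9200), the index name "papers" and the sample document are assumptions made up for illustration; the `_doc` and `_search` endpoints and the `match` query are part of ElasticSearch's REST API.

```python
import json
from urllib.request import Request

# Assumptions for illustration: a local node and a hypothetical "papers" index.
ES_URL = "http://localhost:9200"
INDEX = "papers"


def index_request(doc_id, document):
    """Build (without sending) the request that indexes one JSON document."""
    return Request(
        url=f"{ES_URL}/{INDEX}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def search_request(field, text):
    """Build (without sending) a full-text `match` query on one field."""
    query = {"query": {"match": {field: text}}}
    return Request(
        url=f"{ES_URL}/{INDEX}/_search",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# A made-up document, standing in for a paper you want to keep track of.
paper = {
    "title": "A made-up paper title",
    "abstract": "Anomaly detection in sensor logs",
    "year": 2019,
}

put = index_request(1, paper)
post = search_request("abstract", "anomaly detection")

print(put.get_method(), put.full_url)
print(post.get_method(), post.full_url, post.data.decode("utf-8"))
```

With a node actually running at that address, passing either request to `urllib.request.urlopen` would send it; the language clients mentioned above wrap this same kind of HTTP/JSON exchange.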
Who uses the Elastic Stack and what for?
In industry, there are already a lot of “ElasticSearch players”. We can cite Adobe, which uses Elastic to make search smarter with machine learning at scale: convolutional neural networks (CNNs) are used to compute image embeddings that describe the images of a dataset. This yields a set of descriptor vectors that can be indexed and searched using ElasticSearch. Similar images can then be found by picking the embeddings with the smallest Euclidean distance to the query. There are also the examples of Slack and Cisco, which use ElasticSearch for anomaly detection, and IEEE GlobalSpec, which uses ElasticSearch to replace its legacy product search. Many others, such as Facebook, Netflix and LinkedIn, use it too, and use cases abound (business intelligence, web analytics, compliance, and so on), since ELK (ElasticSearch-Logstash-Kibana) has become quite a popular log management platform.
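The similarity search described for images can be sketched in a few lines of plain Python. This is not Adobe's implementation, nor the way ElasticSearch stores or ranks vectors; it is a brute-force illustration of “find the embeddings closest to a query by Euclidean distance”. The file names and the three-dimensional vectors are invented for the example (real CNN embeddings have hundreds of dimensions).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(query, embeddings, k=2):
    """Return the names of the k embeddings closest to the query vector."""
    ranked = sorted(
        embeddings.items(),
        key=lambda item: euclidean(query, item[1]),
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical image embeddings (made-up values for illustration).
embeddings = {
    "cat_1.jpg": [0.9, 0.1, 0.0],
    "cat_2.jpg": [0.8, 0.2, 0.1],
    "car_1.jpg": [0.0, 0.9, 0.7],
}

query = [1.0, 0.0, 0.0]  # embedding of the image we want to match
print(most_similar(query, embeddings))
```

Comparing the query against every stored vector, as done here, does not scale to large collections; indexing the vectors in a search engine is precisely what makes this kind of lookup practical on a big dataset.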
Use cases for R&D
You can use it to monitor and manage your R&D team's activity, but not only that. Since all data types are supported (numbers, text, structured or not), you can imagine using ElasticSearch to solve plenty of new challenges within a research experiment. For instance, Dr Schulz from Yale University (Department of Medicine) needed flexible, easy analysis of a large amount of evolving data. He used ElasticSearch to search the database shared with clinicians and researchers to identify novel causes of cancer, therapeutic targets and eligible patients for clinical trials.
A second example is the Center for Open Science, which used ElasticSearch to improve scientific research and collaboration; it shows the potential of the tool for R&D teams.
There are plenty of possibilities to consider for speeding up your research, enhancing your efforts, and exploring new dimensions in your datasets using ElasticSearch. Depending on your workflow and your problem, you should carefully choose where to have ElasticSearch take action, which data to correlate, and so on. I hope this post will inspire you!