OPTIMIZED DATABASE FOR CHEMICAL LABORATORIES AND PRODUCTION

SKU0038

DESCRIPTION

The web service is designed for searching chemical compounds, the methods for their synthesis, and the articles that describe these methods. It aims to streamline the work of researchers in chemistry at every stage of research, from data collection to the development of new methods and substances. As a result, developing new compounds and working with the necessary substances becomes more efficient, which contributes to the advancement of chemical technology and related fields such as biotechnology, the food industry, ecology, and pharmaceuticals. The centralized chemical database includes both compounds and synthesis methods, with references to the original sources. Because all reactions and compounds are consolidated in one place, researchers no longer need to search through a large number of sources, each containing far more information than they need. Instead, they can enter the required compound in the search bar and get all the relevant information, in particular the reactions that can be used to synthesize it.

ADVANTAGES OF THE DEVELOPMENT

This structured database with multiple entities addresses the problem of information overload in chemistry. The entities, namely "compound" and "reaction," are populated automatically by a machine learning model that extracts each compound and its related reactions from articles. This approach collects all the information a chemist needs in one database. Open-access search engines such as PubChem are built around the "compound" entity and offer little information about reactions, which are crucial in chemical development. Chemists can therefore look up physical properties, toxicity, and similar data, but not the reactions that can be used to obtain the compound.
This database is more informative than existing solutions because it fills a gap in the available chemical data. In addition, most search engines today (PubChem, SciFinder, ChemSpider, ChemSynthesis) rely on manual data entry, which leads to errors and duplicates. Manual entry also delays updates: thousands of articles are published every month, and processing all of them by hand is not feasible. The machine learning model used in this service automates data extraction from articles, which speeds up the compilation of reaction and compound data and keeps the database up to date. The service stores its data in the Neo4j graph database rather than a traditional relational database, so connections between entities are represented directly. This allows faster searches and better visualization of data. To the developers' knowledge, no other chemical database currently employs this approach.
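
The graph model described above can be illustrated with a minimal sketch. It is not the service's actual schema or code: the node labels (Compound, Reaction, Article), the relationship types (PRODUCES, DESCRIBED_IN), the class and method names, and the sample data are all assumptions made for illustration. The Cypher string shows how a "which reactions produce this compound?" lookup might be phrased against such a hypothetical schema, and the small in-memory class mimics the same lookup so the example runs without a Neo4j server.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical Cypher query for an assumed schema; labels and relationship
# types are illustrative and not taken from the service itself.
ROUTES_QUERY = """
MATCH (r:Reaction)-[:PRODUCES]->(c:Compound {name: $name})
OPTIONAL MATCH (r)-[:DESCRIBED_IN]->(a:Article)
RETURN r, collect(a) AS articles
"""


@dataclass(frozen=True)
class Reaction:
    """One reaction node with its linked compounds and article references."""
    reaction_id: str
    reactants: tuple
    products: tuple
    article_refs: tuple = ()


class ReactionGraph:
    """Tiny in-memory stand-in for a compound/reaction graph."""

    def __init__(self):
        # Maps a compound name to the reactions that produce it,
        # mirroring the PRODUCES relationship in the assumed schema.
        self._produced_by = defaultdict(list)

    def add_reaction(self, reaction):
        for product in reaction.products:
            self._produced_by[product].append(reaction)

    def routes_to(self, compound):
        """Return the reactions that yield the given compound."""
        return list(self._produced_by.get(compound, []))


# Illustrative sample data; the article identifier is a placeholder.
graph = ReactionGraph()
graph.add_reaction(Reaction(
    "rxn-1",
    reactants=("salicylic acid", "acetic anhydride"),
    products=("aspirin", "acetic acid"),
    article_refs=("article-1",),
))
graph.add_reaction(Reaction(
    "rxn-2",
    reactants=("benzene", "nitric acid"),
    products=("nitrobenzene", "water"),
))

for reaction in graph.routes_to("aspirin"):
    print(reaction.reaction_id, reaction.reactants, reaction.article_refs)
```

In a graph store, this lookup follows relationships outward from the compound node instead of joining several tables, which is the kind of traversal the paragraph above refers to.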

CHARACTERISTICS

Number of compounds in the database: At least 1 million
Number of reactions in the database: At least 1.2 million
Number of articles and/or article references: At least 400,000
Query execution time: No more than 500 ms
Maximum number of simultaneous users: Up to 1,000

APPLICATION AREAS

Pharmaceutical and food laboratories
Educational institutions