The volume of chemical and biochemical research published in scientific publications and patents is rapidly increasing. This explosive growth makes it difficult for scientists to keep up with new discoveries and advances, even within relatively focused disciplinary fields. In response, automatic text mining tools have been developed to help scientists extract knowledge from this literature efficiently and effectively. In chemistry and biochemistry, key information about the synthesis, properties, and mode of action of chemicals is critical for pharmaceutical and life science applications but is often described only in natural language texts.
Biochemical texts contain rich information. In this Research Topic, we hope to explore text mining methods to analyze and convert unstructured natural language descriptions of chemicals and their interrelationships into actionable structured knowledge. The complexity of biochemical texts presents several challenges for achieving these goals. First, biochemical texts often contain many domain-specific terms, making models developed for general-purpose language processing less effective. Second, scientific literature often contains complex sentence structures. Finally, biochemical texts often couple chemical structural information with linguistic descriptions, so that images, graphs, and tables convey key information alongside the running text. This article collection therefore calls for novel approaches that address these challenges and improve the effectiveness of text mining for biochemical texts.
This Research Topic calls for research papers addressing natural language processing or text mining of chemical or biochemical texts, including scientific literature or patents, as well as information retrieval from natural language texts, semi-structured texts (e.g., tables), or procedural/instructional texts (e.g., recipes) where the methods can be adapted to biochemical texts. Exploratory research that has not yet been carried out is also welcome.
The nominated research themes include but are not limited to:
• information extraction tasks such as chemical or drug named entity recognition and identification of relations between chemical entities,
• (bio)chemical document summarization or classification, and
• construction of knowledge bases or knowledge graphs from (bio)chemical texts.
Methods that target the identification of crucial information in relevant texts are welcome, such as chemical entities of interest (e.g., chemicals, polymers, drug names, molecules) and their properties, details of chemical reactions or synthesis, or interactions between chemicals, biological molecules, or genetic variation. We also welcome chemical-based algorithms, tools, and methods targeting the identification of drug-drug interactions or drug repurposing evidence in biomedical text.
Research that addresses the linguistic characteristics of biochemical texts is also in scope. This includes resource development, such as annotated corpora or domain-specific terminologies, and methods for the constituent components of a chemical text mining system, such as specialized domain-specific tokenization or chemical structure analysis.
Keywords:
Chemical Text, Biochemical Text, Text Mining, Unstructured Natural Language Descriptions, Information Extraction
Important Note:
All contributions to this Research Topic must be within the scope of the section and journal to which they are submitted, as defined in their mission statements. Frontiers reserves the right to guide an out-of-scope manuscript to a more suitable section or journal at any stage of peer review.