Abstract:
With the complexity and volume of collected data continuing to rise, it is becoming ever more important to develop systems with high interactivity. Businesses with an interest in big data continue to seek solutions that limit cost while providing effective, simplified answers to current problems in data retrieval. Combined analysis and application of a multi-factorial system will likely make reporting on complex data easier for nontechnical end users. This survey focuses on natural language processing (NLP) implementations for data query systems, especially those operating on massive data sets (1 TB+) in OLTP databases, OLAP databases, and data warehouses. We seek the most up-to-date and effective uses of NLP for Speech-to-SQL and Text-to-SQL generation, and the most recent advancements in data warehousing that optimize ELT efficiency and data retrieval, focusing on the highest-performing code implementations on the Spider and WikiSQL datasets. Many models, including sequence-to-sequence (seq2seq), sequence-to-SQL (Seq2SQL), and fuzzy semantic to SQL (F-Semtosql), among others, are briefly described and compared. In addition, recent advancements in data warehousing technology, such as multi-disk buffering in the ELT process and hybrid multi-dimensional and relational OLAP databases (HOLAPs), are discussed. The findings gathered here are applied to fill a gap in the current industrial knowledge base, in service of more efficient data access, retrieval, and reporting in a customer-facing environment.
Published in: 2021 IEEE International Conference on Recent Advances in Systems Science and Engineering (RASSE)
Date of Conference: 12-14 December 2021
Date Added to IEEE Xplore: 25 January 2022