In the last few years, information extraction (IE) has become a rapidly expanding field as the volume of machine-readable documents keeps growing exponentially. IE offers a way to transform factual knowledge in publications into database entries. Many efforts have been made to automatically extract and mine scientific texts, ranging from biochemical literature to reports of terrorist attacks. This study explores the possibility of extracting important facts from PETRONAS health, safety and environment (HSE) reports for database construction and analysis. The reports are currently managed by PETRONAS Group HSE in Malaysia. They contain information on incidents and accidents that occurred during the design, construction, operation and maintenance activities of all PETRONAS Operating Units, locally and worldwide. Automating the processing of these reports would greatly benefit PETRONAS Group HSE by populating database entries automatically, a task that has traditionally been arduous and time-consuming. Many algorithms have been reported for IE, ranging from simple statistical methods to advanced natural language processing (NLP) methods. This study investigates one NLP approach, known as link grammar (LG), for extracting the relevant information. Based on a limited literature search, LG appears to be the most suitable candidate algorithm. However, an exhaustive literature search would be required to determine which algorithm is best suited to this application.