Optimizing an LLM Prompt for Accurate Data Extraction from Firearm-Related Listings in Dark Web Marketplaces | IEEE Conference Publication | IEEE Xplore

Abstract:

The Dark Web, known for its anonymity and illicit activities, presents considerable challenges for Law Enforcement Agencies (LEAs) due to the complexity and volume of data generated within it. Online marketplaces on the Dark Web are notorious for facilitating illegal activities such as drug trafficking and the sale of counterfeit goods and weapons, while using advanced obfuscation techniques to avoid detection. The unstructured nature of data on these platforms and their constantly evolving operations make manual extraction and analysis exceedingly difficult.

This paper addresses the pressing need for structured information extraction from Dark Web marketplaces, with a specific focus on firearm-related listings. Traditional rule-based methods have proven inadequate because they rely on HTML tags and pattern recognition, so more adaptive solutions are needed. This paper therefore explores the application of Large Language Models (LLMs) and prompt engineering to these challenges. By leveraging the capabilities of LLMs, the study aims to make the extraction process more efficient and accurate. Various generative models and prompt formulations are tested to determine the most effective approach for extracting detailed information such as product specifications, pricing, and seller details.

The proposed pipeline feeds crawled marketplace pages into a generative model, which first identifies Product Details Pages (PDPs) and then extracts the relevant information from them. The use of LLMs marks a significant advancement over traditional methods, improving the accuracy and comprehensiveness of data extraction. This research also highlights the effectiveness of prompt engineering in improving information retrieval.

This work underscores the critical need for sophisticated tools to monitor and combat illegal activities on the Dark Web, particularly firearm trafficking.
By refining techniques for automated data extraction and a...
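
The two-stage pipeline described in the abstract (identify PDPs, then extract fields from them) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The prompt wording, the JSON keys (`product_name`, `specifications`, `price`, `currency`, `seller`), and the `llm` callable (any function that sends a prompt to a generative model and returns its text reply) are all hypothetical. `fake_llm` is a stub standing in for a real model so that the example runs offline.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional

# Hypothetical prompt wording for illustration; not the paper's optimized prompt.
CLASSIFY_PROMPT = (
    "You are given the visible text of a crawled marketplace page.\n"
    "Reply with one word: PDP if the page is a Product Details Page for a "
    "single listing, otherwise OTHER.\n\nPage:\n{page}"
)
EXTRACT_PROMPT = (
    "Extract the listing on the page below. Reply with JSON only, using the keys "
    '"product_name", "specifications", "price", "currency" and "seller". '
    "Use null for any value that does not appear on the page.\n\nPage:\n{page}"
)

# Hypothetical interface: any function mapping a prompt string to the model's reply.
LLM = Callable[[str], str]


@dataclass
class Listing:
    product_name: Optional[str]
    specifications: List[str] = field(default_factory=list)
    price: Optional[float] = None
    currency: Optional[str] = None
    seller: Optional[str] = None


def is_pdp(page_text: str, llm: LLM) -> bool:
    """Stage 1: ask the model whether the page is a Product Details Page."""
    reply = llm(CLASSIFY_PROMPT.format(page=page_text))
    return reply.strip().upper().startswith("PDP")


def extract_listing(page_text: str, llm: LLM) -> Optional[Listing]:
    """Stage 2: ask for JSON and validate it; return None if the reply is unusable."""
    raw = llm(EXTRACT_PROMPT.format(page=page_text))
    # Replies may wrap the JSON in extra prose, so keep only the outermost {...} span.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return None
    # Normalise the price to a float; leave it empty if it is not numeric.
    try:
        price = float(data["price"]) if data.get("price") is not None else None
    except (TypeError, ValueError):
        price = None
    specs = data.get("specifications") or []
    if not isinstance(specs, list):
        specs = [specs]
    return Listing(
        product_name=data.get("product_name"),
        specifications=[str(s) for s in specs],
        price=price,
        currency=data.get("currency"),
        seller=data.get("seller"),
    )


def run_pipeline(pages: Iterable[str], llm: LLM) -> List[Listing]:
    """Classify every crawled page and extract fields only from PDPs."""
    listings = []
    for page in pages:
        if is_pdp(page, llm):
            listing = extract_listing(page, llm)
            if listing is not None:
                listings.append(listing)
    return listings


def fake_llm(prompt: str) -> str:
    """Offline stand-in for a real model, so the sketch runs without an API."""
    if prompt.startswith("You are given"):
        return "PDP" if "Price:" in prompt else "OTHER"
    return ('Here is the data: {"product_name": "Sample item", "specifications": '
            '["spec A"], "price": "120", "currency": "USD", "seller": "vendor_a"}')


if __name__ == "__main__":
    crawled = ["Home | Categories | Login",
               "Sample item\nPrice: 120 USD\nVendor: vendor_a"]
    for item in run_pipeline(crawled, fake_llm):
        print(item)
```

Requesting a fixed JSON schema and validating it in code keeps malformed or partial model replies out of the structured dataset. The navigation page in the usage example is filtered out at the classification stage, so only the listing page reaches extraction.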
Date of Conference: 15-18 December 2024
Date Added to IEEE Xplore: 16 January 2025
Conference Location: Washington, DC, USA

I. Introduction

The Dark Web has long been a domain of anonymity and illicit activity, posing significant challenges for Law Enforcement Agencies (LEAs) tasked with monitoring and investigating these hidden corners of the internet. As the digital underworld continues to expand, so too does the volume and complexity of the data generated within it. Online marketplaces on the Dark Web are particularly notorious for facilitating a wide range of illegal activities, from drug trafficking to the sale of counterfeit goods and weapons, all while employing sophisticated techniques to evade detection. The unstructured nature of data on these platforms, combined with the constant evolution of their operations, makes it exceedingly difficult for LEAs to extract and analyze relevant information manually. The sheer scale of illegal listings, coupled with the marketplace administrators’ deliberate efforts to hinder automated scraping tools, further intensifies these challenges.