A Comprehensive WebScraping of IMDb’s Top 50 Movies using Beautiful Soup
IEEE Conference Publication

Abstract:

Data encompasses all factual and measurable information that is accessible, quantifiable, or recordable. It can take many forms, including numerical figures, written content, visual depictions, and symbolic annotations. The primary goal of this work is to extract such data for the top 50 movies of all time from the official website of the Internet Movie Database (IMDb) using BeautifulSoup, a Python package for HTML parsing that builds a parse tree from each downloaded page. The parse tree can then be traversed to extract the data embedded in the site’s HTML markup, supporting further analysis and research. IMDb’s official website serves as the data source and Python as the programming language, with the third-party libraries NumPy, Pandas, Requests, and BeautifulSoup used to scrape and process the data. In this paper, we build a structured DataFrame with Pandas that lets users search on any of its attributes, such as release year, Metascore, genre, and parental guidance rating (certificate). The main objective is to ease the user’s search process by offering a range of attributes to select from. The IMDb example presented here is a basic illustration of the approach; applying the same methodology to websites in other domains could open up substantial opportunities for analysis and strategic data use.
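The pipeline described in the abstract (parse HTML with BeautifulSoup, collect per-movie attributes, load them into a Pandas DataFrame, then filter by attribute) can be sketched as follows. This is a minimal illustration, not the authors' code: the HTML snippet, its class names (`movie`, `year`, `certificate`, `genre`, `metascore`), the film titles and values, and the helper names `text_or_none`, `parse_movies`, and `search` are all hypothetical placeholders, since IMDb's real markup differs and changes over time. In a live run the HTML string would first be downloaded, for example with `requests.get(url, headers=..., timeout=10).text`; the sketch parses an inline string so it runs offline.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup with made-up values; NOT IMDb's actual HTML.
SAMPLE_HTML = """
<div class="movie">
  <h3><a>Example Film A</a> <span class="year">(1994)</span></h3>
  <span class="certificate">R</span>
  <span class="genre">Drama</span>
  <span class="metascore">82</span>
</div>
<div class="movie">
  <h3><a>Example Film B</a> <span class="year">(2008)</span></h3>
  <span class="certificate">PG-13</span>
  <span class="genre">Action, Crime, Drama</span>
  <span class="metascore">84</span>
</div>
<div class="movie">
  <h3><a>Example Film C</a> <span class="year">(1957)</span></h3>
  <span class="genre">Crime, Drama</span>
</div>
"""


def text_or_none(parent, selector):
    """Return the stripped text of the first CSS match, or None if absent."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node else None


def parse_movies(html):
    """Build a parse tree and turn each (hypothetical) movie block into a row."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for item in soup.select("div.movie"):
        year = text_or_none(item, "span.year")
        genre = text_or_none(item, "span.genre")
        score = text_or_none(item, "span.metascore")
        records.append({
            "title": text_or_none(item, "h3 a"),
            "year": int(year.strip("()")) if year else None,  # "(1994)" -> 1994
            "certificate": text_or_none(item, "span.certificate"),
            "genres": [g.strip() for g in genre.split(",")] if genre else [],
            # Missing Metascores become NaN once loaded into the DataFrame.
            "metascore": int(score) if score else None,
        })
    return pd.DataFrame(records)


def search(df, *, year=None, genre=None, certificate=None, min_metascore=None):
    """Filter rows by any combination of attributes (all optional)."""
    mask = pd.Series(True, index=df.index)
    if year is not None:
        mask &= df["year"] == year
    if genre is not None:
        mask &= df["genres"].apply(lambda gs: genre in gs)
    if certificate is not None:
        mask &= df["certificate"] == certificate
    if min_metascore is not None:
        mask &= df["metascore"] >= min_metascore  # NaN compares as False
    return df[mask]


if __name__ == "__main__":
    movies = parse_movies(SAMPLE_HTML)
    print(search(movies, genre="Crime")[["title", "year"]])
    print(search(movies, min_metascore=83)["title"].tolist())
```

Storing genres as a list per row lets a film match several genre queries, and building one boolean mask keeps each filter optional, which mirrors the abstract's idea of letting users pick whichever attributes they need.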
Date of Conference: 17-18 April 2024
Date Added to IEEE Xplore: 11 June 2024
Conference Location: Chennai, India
