Project Overview
This project aimed to analyze IMDb’s extensive dataset, encompassing international movies and TV episodes from 1925 to 2022. Focusing on two actors (Anthony Hopkins, Daniel Radcliffe) and two actresses (Natalie Wood, Emma Watson), our objective was to trace their career progression, analyzing movie counts, awards, and ratings over time. This analysis allowed us to observe trends in actor productivity and popularity, while also showcasing the capabilities of SQL for large-scale data handling.
Tools and Libraries Used
To conduct this analysis, we utilized:
SQL Developer: For writing and testing SQL queries before running the final versions in Omega CMD
Python: For additional data handling and visualization, with libraries like:
Pandas: For managing and manipulating the query output data
Seaborn and Matplotlib: For visualizing career progression and comparative metrics
Methodology
Data Exploration and Query Writing
Query Development: We started by exploring IMDb’s dataset to determine which columns and tables were essential for extracting the actors’ and actresses’ information. Using SQL, we structured queries to obtain metrics such as career span, total movies by year, and ratings.
Command Line Execution and Export: We ran the finalized queries on Omega CMD, then exported the data to CSV format for further analysis in Python.
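The query-then-export step can be illustrated with a minimal sketch. It is not our actual query. It uses Python's built-in sqlite3 module as a stand-in for the Omega database and the csv module for the export. The table and column names (name_basics, title_basics, title_principals, start_year, and so on) are assumptions loosely modeled on IMDb's public data files, and the helpers build_demo_db and export_csv are hypothetical. The rows are placeholders, not project data.

```python
import csv
import io
import sqlite3

# Hypothetical schema loosely modeled on IMDb's public data files
# (name.basics, title.basics, title.principals); the tables we queried
# on Omega may be named and structured differently.
SCHEMA = """
CREATE TABLE name_basics (nconst TEXT PRIMARY KEY, primary_name TEXT);
CREATE TABLE title_basics (tconst TEXT PRIMARY KEY, title_type TEXT,
                           start_year INTEGER);
CREATE TABLE title_principals (tconst TEXT, nconst TEXT, category TEXT);
"""

# Movies per person per year; TV episodes are filtered out by title_type.
YEARLY_COUNTS = """
SELECT n.primary_name AS primary_name,
       t.start_year AS start_year,
       COUNT(*) AS movie_count
FROM title_principals p
JOIN name_basics n ON n.nconst = p.nconst
JOIN title_basics t ON t.tconst = p.tconst
WHERE t.title_type = 'movie' AND p.category IN ('actor', 'actress')
GROUP BY n.primary_name, t.start_year
ORDER BY n.primary_name, t.start_year
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding placeholder rows (not real data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO name_basics VALUES (?, ?)",
                     [("nm1", "Actor A"), ("nm2", "Actress B")])
    conn.executemany("INSERT INTO title_basics VALUES (?, ?, ?)",
                     [("tt1", "movie", 1990), ("tt2", "movie", 1990),
                      ("tt3", "movie", 1995), ("tt4", "tvEpisode", 1995)])
    conn.executemany("INSERT INTO title_principals VALUES (?, ?, ?)",
                     [("tt1", "nm1", "actor"), ("tt2", "nm1", "actor"),
                      ("tt3", "nm2", "actress"), ("tt4", "nm2", "actress")])
    return conn


def export_csv(conn: sqlite3.Connection, query: str, out) -> int:
    """Run `query`, write a header row plus all result rows to `out` as CSV,
    and return the number of data rows written."""
    cur = conn.execute(query)
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])
    rows = cur.fetchall()
    writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    buf = io.StringIO()
    export_csv(build_demo_db(), YEARLY_COUNTS, buf)
    print(buf.getvalue())
```

To write a real file instead of a buffer, open it with newline="" (for example, open("yearly_counts.csv", "w", newline=""), a hypothetical filename) and pass the file object as `out`. The resulting CSV can then be loaded into Pandas with pandas.read_csv.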
Analysis 1: Career Overview for Each Actor/Actress
Career Span and Movie Counts: Using SQL queries, we gathered data on each actor's and actress's career span, yearly movie counts, and total movies over their careers.
Career Graphs: We visualized career progress for each actor/actress using line graphs, revealing peaks and declines in productivity. For instance:
Anthony Hopkins demonstrated an initial rise, followed by steady activity after his iconic role in The Silence of the Lambs.
Emma Watson showed consistent activity during her student years, followed by increased output in later years before she took a hiatus.
Insights: These career graphs showed that actors' output typically follows a roughly bell-shaped curve, rising to a few peak years and then gradually declining. This pattern is consistent with commonly observed industry trends.
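The career-span and yearly-count metrics can be sketched in SQL as follows. This is a minimal illustration, not our actual query. It assumes a hypothetical single table, filmography(person, title, year), filled with placeholder rows, and it uses Python's sqlite3 module in place of the Omega database. The helper yearly_series is also hypothetical. It fills in years with no movies as zeros, so a line graph shows idle periods instead of drawing straight across them.

```python
import sqlite3

# Hypothetical, denormalized table with one row per (person, movie);
# the rows below are placeholders, not results from the project.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE filmography (person TEXT, title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO filmography VALUES (?, ?, ?)",
    [("Actor A", "Film 1", 1968), ("Actor A", "Film 2", 1975),
     ("Actor A", "Film 3", 1991), ("Actress B", "Film 4", 2001),
     ("Actress B", "Film 5", 2004)],
)

# Career span counts both the first and the last active year (inclusive).
CAREER_SUMMARY = """
SELECT person,
       MIN(year) AS first_year,
       MAX(year) AS last_year,
       MAX(year) - MIN(year) + 1 AS span_years,
       COUNT(*) AS total_movies
FROM filmography
GROUP BY person
ORDER BY person
"""


def yearly_series(conn: sqlite3.Connection, person: str) -> list[tuple[int, int]]:
    """Return (year, movie_count) for every year of the career, with 0 for
    years without a movie, so a line graph does not skip idle years."""
    rows = conn.execute(
        "SELECT year, COUNT(*) FROM filmography WHERE person = ? "
        "GROUP BY year ORDER BY year",
        (person,),
    ).fetchall()
    if not rows:
        return []
    counts = dict(rows)
    return [(y, counts.get(y, 0)) for y in range(rows[0][0], rows[-1][0] + 1)]


summary = conn.execute(CAREER_SUMMARY).fetchall()
for row in summary:
    print(row)
print(yearly_series(conn, "Actress B"))
```

The (year, count) pairs from yearly_series can be passed straight to Matplotlib, for example plt.plot(*zip(*series)), to draw one career line per person.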
Analysis 1-b: Career Segmentation
Segmented Analysis: We divided each career into disjoint time intervals to observe variations in output over time.
Insights: This approach highlighted key periods, such as Natalie Wood’s peak after her Oscar nomination and Radcliffe’s busiest years post-Harry Potter.
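One way to build disjoint intervals is to assign each movie to a fixed-width bucket measured from the person's first credited year. The sketch below does this in SQL under stated assumptions. The 5-year width, the filmography table, and its placeholder rows are all hypothetical. The report does not specify how our intervals were chosen.

```python
import sqlite3

WIDTH = 5  # hypothetical interval width in years; the report does not fix one

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE filmography (person TEXT, title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO filmography VALUES (?, ?, ?)",
    [("Actor A", "Film 1", 2000), ("Actor A", "Film 2", 2003),
     ("Actor A", "Film 3", 2005), ("Actor A", "Film 4", 2006),
     ("Actor A", "Film 5", 2012)],  # placeholder rows, not real data
)

# Each year maps to exactly one interval [seg_start, seg_start + WIDTH - 1],
# measured from the person's first credited year. SQLite's integer division
# truncates, so (year - first_year) / width gives the interval index.
SEGMENTS = """
SELECT f.person AS person,
       s.first_year + ((f.year - s.first_year) / :width) * :width AS seg_start,
       COUNT(*) AS movies
FROM filmography f
JOIN (SELECT person, MIN(year) AS first_year
      FROM filmography GROUP BY person) s
  ON s.person = f.person
GROUP BY f.person, seg_start
ORDER BY f.person, seg_start
"""

segments = conn.execute(SEGMENTS, {"width": WIDTH}).fetchall()
for person, start, movies in segments:
    print(f"{person}: {start}-{start + WIDTH - 1}: {movies} movies")
```

Intervals with no movies do not appear in the result. If empty periods matter for the comparison, they would need to be filled in afterwards, as with zero-count years in a yearly series.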
Comparative Analysis: Ratings and Awards
Movie Counts and Ratings: Using bar charts, we compared the actors by number of movies and average rating; Emma Watson had the highest average rating despite appearing in fewer movies.
Awards and Acclaim: For awards like the Oscars, Golden Globes, and People’s Choice Awards, we found:
Anthony Hopkins led in Oscar nominations and wins, reflecting his career longevity and critical acclaim.
Emma Watson and Daniel Radcliffe fared better in the People's Choice Awards, reflecting their popularity among younger audiences.
Composite Rating Metric: We created a custom “rating-to-movie-count” ratio to balance quantity and quality, ranking Watson and Radcliffe as top performers based on this metric.
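The movie counts, average ratings, and the rating-to-movie-count ratio can be computed together in a single grouped query. This is a sketch under assumptions. We read the ratio literally as average rating divided by movie count. The credits table is hypothetical, and its rows are placeholders rather than IMDb ratings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE credits (person TEXT, title TEXT, rating REAL)")
conn.executemany(
    "INSERT INTO credits VALUES (?, ?, ?)",
    [("Actor A", "Film 1", 7.0), ("Actor A", "Film 2", 6.0),
     ("Actor A", "Film 3", 8.0), ("Actor A", "Film 4", 7.0),
     ("Actress B", "Film 5", 8.0), ("Actress B", "Film 6", 7.0)],
)  # placeholder rows, not real ratings

# Unrated titles are excluded, so movie_count here counts rated movies only.
# The ratio is our literal reading of "rating-to-movie-count": AVG / COUNT.
QUERY = """
SELECT person,
       COUNT(*) AS movie_count,
       ROUND(AVG(rating), 2) AS avg_rating,
       ROUND(AVG(rating) / COUNT(*), 3) AS rating_per_movie
FROM credits
WHERE rating IS NOT NULL
GROUP BY person
ORDER BY rating_per_movie DESC
"""

ranking = conn.execute(QUERY).fetchall()
for person, count, avg, ratio in ranking:
    print(f"{person}: {count} movies, avg {avg}, ratio {ratio}")
```

Because the count is in the denominator, a shorter filmography with a similar average rating scores higher on this ratio. This is worth keeping in mind when interpreting the resulting ranking.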
Analysis 2: Actor/Actress Productivity (1952-1961)
We extended our SQL analysis to the period 1952 to 1961 to identify the actors and actresses with the highest productivity, examining the maximum and minimum movie counts across several year ranges.
Findings: Notable results included actors like Mahmoud Al Meleji and actresses like Nirupa Roy, who had exceptionally high output during this period.
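The report does not spell out the exact Analysis 2 query. The following is a minimal sketch of one way to find the most and least productive people in a year window. It counts each person's movies inside the window, then keeps everyone tied for the maximum or the minimum. The filmography table, its category values, the extremes helper, and the rows are all hypothetical placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE filmography (person TEXT, category TEXT, title TEXT, year INTEGER)"
)
conn.executemany(
    "INSERT INTO filmography VALUES (?, ?, ?, ?)",
    [("Actor A", "actor", "F1", 1952), ("Actor A", "actor", "F2", 1955),
     ("Actor A", "actor", "F3", 1960), ("Actor B", "actor", "F4", 1958),
     ("Actor C", "actor", "F5", 1950), ("Actor C", "actor", "F6", 1961),
     ("Actress D", "actress", "F7", 1953)],  # placeholder rows
)

# Per-person counts inside the window (BETWEEN is inclusive), then every
# person tied for the maximum and every person tied for the minimum.
EXTREMES = """
WITH counts AS (
    SELECT person, COUNT(*) AS movies
    FROM filmography
    WHERE category = :category AND year BETWEEN :start AND :end
    GROUP BY person
)
SELECT 'max' AS kind, person, movies FROM counts
WHERE movies = (SELECT MAX(movies) FROM counts)
UNION ALL
SELECT 'min' AS kind, person, movies FROM counts
WHERE movies = (SELECT MIN(movies) FROM counts)
ORDER BY kind, person
"""


def extremes(conn, category, start=1952, end=1961):
    """Return (kind, person, movies) rows for the most and least productive
    people of `category` between `start` and `end`, inclusive."""
    params = {"category": category, "start": start, "end": end}
    return conn.execute(EXTREMES, params).fetchall()


print(extremes(conn, "actor"))
print(extremes(conn, "actress"))
print(extremes(conn, "actor", 1950, 1955))
```

The minimum is taken only over people with at least one movie in the window, since people with no credits there never appear in the counts. Passing different start and end years reruns the same comparison over other ranges.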
Challenges and Solutions
Login and Execution on Omega CMD: Logging in to and running queries on Omega CMD required a specific setup, so we first developed and tested our queries in SQL Developer, which made debugging simpler.
SQL Debugging: Because of our limited SQL experience, we initially ran into frequent query errors. As we became more familiar with SQL syntax, our queries became more accurate.
Ambiguity in Requirements: The requirements for Analysis 2 were ambiguous, so we analyzed both plausible interpretations, which gave a more complete overview.
Key Insights
Career Progression: Actors’ careers typically follow a pattern with periods of high productivity and later years of sporadic work.
Award Recognition: Anthony Hopkins ranked highest in career awards, while Emma Watson achieved the highest average rating, highlighting the difference between critical acclaim and popularity.
Productivity Trends: Analyzing career data over time showcased SQL’s effectiveness in handling extensive data sets and the utility of visualization for interpreting career trajectories.
Summary
This project highlighted how data analysis can reveal patterns in actor productivity and popularity, supporting industry norms about career trajectories and audience acclaim. It also demonstrated SQL's robustness for large data sets and the value of Python's visualization libraries in transforming data into actionable insights.