Why I created YouTube Movie Ranker
One of my favorite casual activities is watching free YouTube movies.
Before watching a movie, whether on a website or in theaters, I check its IMDb score, genre, and popularity before investing the next 2 hours in it.
The problem I saw with YouTube Movies was the difficulty of navigating back and forth between IMDb and YouTube to research movies.
Additionally, YouTube lacked any feature to sort movies by YouTube metrics such as likes, let alone IMDb data such as popularity, IMDb score, and release year.
I decided to solve this problem by creating an all-in-one integrated solution between IMDb and YouTube that lets viewers rank movies by IMDb rating, IMDb votes, YouTube likes, and release year.
My solution also shows each movie's genres and description, so users can browse, research, and watch YouTube movies without ever needing to leave the site.
Why I chose these technologies
The initial plan was to use Postgres, as I was already very familiar with it from building my Car Gurus scraping project.
However, with this new project, I had to account for potential scale and a higher volume of traffic to the site.
Another factor I considered was that the data would have an inconsistent schema across movies, as well as a large number of attributes per movie.
Because of this, I decided to use MongoDB for data storage: its schema-less document model keeps each movie's attributes together, so reads do not require joins.
Additionally, I planned to use Flask for both the front end and the back end, passing data from Flask to Jinja templates.
Upon initial deployment to Vercel, I discovered that Vercel limited Flask to transferring data through session storage only, which unfortunately maxed out at around 4 KB.
My maximum data transfer need was around 200 KB, which meant I had to implement a headless solution: separating the back end and front end by using Flask as a REST API server and a JavaScript framework for the client-facing application.
I decided to use Vue 3 due to its simplicity, future support, and the fact that it is a relatively new framework, released in 2020.
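To make the front-end/back-end split more concrete, here is a minimal sketch of the kind of logic the REST side has to provide: sort the movie documents by a requested field and return them as one JSON payload. It is plain Python rather than the actual Flask route, and every name in it (`SORT_FIELDS`, `sort_movies`, `build_response`, the field names, the sample movies) is an assumption made for illustration, not taken from the project's code.

```python
import json

# Hypothetical mapping from public sort names to document fields.
SORT_FIELDS = {
    "rating": "imdb_rating",
    "votes": "imdb_votes",
    "likes": "youtube_likes",
    "year": "year",
}


def sort_movies(movies, sort_by="rating", descending=True):
    """Order movies by a whitelisted field; unknown names fall back to rating."""
    field = SORT_FIELDS.get(sort_by, SORT_FIELDS["rating"])
    return sorted(movies, key=lambda m: m.get(field, 0), reverse=descending)


def build_response(movies, sort_by="rating", descending=True):
    """Serialize the sorted list as JSON and report its size in bytes."""
    body = json.dumps(sort_movies(movies, sort_by, descending))
    return body, len(body.encode("utf-8"))


# Made-up documents standing in for the MongoDB collection.
SAMPLE_MOVIES = [
    {"title": "Night Train", "imdb_rating": 6.4, "imdb_votes": 900,
     "youtube_likes": 150, "year": 1950},
    {"title": "Lost River", "imdb_rating": 7.8, "imdb_votes": 4200,
     "youtube_likes": 80, "year": 1972},
    {"title": "Paper Moonlight", "imdb_rating": 5.9, "imdb_votes": 310,
     "youtube_likes": 610, "year": 2004},
]

if __name__ == "__main__":
    # Repeat the sample to imitate a catalogue of roughly a thousand movies.
    body, size = build_response(SAMPLE_MOVIES * 400, sort_by="likes")
    print(f"{size} bytes for {len(json.loads(body))} movies")
```

Even with these tiny made-up documents, a catalogue of about a thousand movies serializes to far more than 4 KB, which is why the payload has to travel in a response body rather than in session storage. In a Flask app, a route would presumably read the sort field from the query string and return this body as JSON.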
Major Challenges.
The first major challenge was learning a completely new framework in JavaScript, a language I had barely touched before.
Managing dependencies, Vercel requirements, the many dependencies that were not compatible with Vue 3, and a couple of packages that were corrupted
when installation halted midway due to permission issues was probably the most difficult part of the project. End to end,
the whole project took around 2 weeks, with dependency and DevOps challenges taking up at least a quarter of the development time.
Another issue was building a data pipeline that could integrate YouTube data with IMDb data before transforming it into BSON for MongoDB.
The major challenge was looking up each movie in IMDb, which presented two issues. The first is that there is no shared unique ID field that can be used
to query movies in IMDb given YouTube data. The only field that worked to some extent was the movie title; however, around 30% of the YouTube movies
did not have a unique name when searching through the IMDb datasets. I did consider using runtime as well, but the runtimes reported by YouTube and IMDb differed too much
to be accurate enough. The second issue I encountered when building the pipeline was querying a TSV file with 9.7 million records.
I had to combine iterrows and loc from pandas with regex to create an optimized pipeline.
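The title-matching step can be sketched as follows. This is not the pandas pipeline described above. It is a standard-library stand-in that shows the same idea: normalize titles with regex, index the IMDb file once, and accept only titles that resolve to exactly one movie. The column names (`tconst`, `titleType`, `primaryTitle`) follow the public IMDb `title.basics` dataset; the function names and sample rows are made up, and the IDs are placeholders.

```python
import csv
import io
import re
from collections import defaultdict


def normalize(title):
    """Lowercase a title and strip a '(1950)' style year and punctuation."""
    title = re.sub(r"\(\d{4}\)", " ", title.lower())
    return re.sub(r"[^a-z0-9]+", " ", title).strip()


def build_index(tsv_file):
    """Read the TSV once and map each normalized movie title to its IMDb IDs."""
    index = defaultdict(list)
    # QUOTE_NONE: treat quote characters inside titles as plain text.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row["titleType"] == "movie":
            index[normalize(row["primaryTitle"])].append(row["tconst"])
    return index


def match_titles(youtube_titles, index):
    """Keep only titles that map to exactly one IMDb ID; report the rest."""
    matched, unresolved = {}, []
    for title in youtube_titles:
        candidates = index.get(normalize(title), [])
        if len(candidates) == 1:
            matched[title] = candidates[0]
        else:
            # Either no match, or several movies share the same name.
            unresolved.append(title)
    return matched, unresolved


# Made-up rows in the shape of the IMDb TSV; the IDs are placeholders.
SAMPLE_TSV = (
    "tconst\ttitleType\tprimaryTitle\tstartYear\n"
    "tt9000001\tmovie\tNight Train\t1950\n"
    "tt9000002\tmovie\tLost River\t1972\n"
    "tt9000003\tmovie\tLost River\t2011\n"
    "tt9000004\ttvSeries\tNight Train\t1999\n"
)

if __name__ == "__main__":
    index = build_index(io.StringIO(SAMPLE_TSV))
    print(match_titles(["Night Train (1950)", "Lost River", "Unknown Film"], index))
```

Building the index in a single pass means each YouTube title costs one dictionary lookup instead of a scan over millions of rows. Titles that land in `unresolved` are the ambiguous cases mentioned above and would need a second signal or manual review.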
Minor Challenges.
When creating the modal for the movies to play in, I created a modal element inside the v-for loop for each card in Vue. This caused an unexpected issue
for users with AdBlock. When YouTube initializes a video on a website, it preloads YouTube ads. If the preload fails, it keeps retrying every few seconds for anywhere from half a minute
to several minutes. This does not impact performance when it’s just one video. However, since my site was initializing ~1000 videos at once, this would completely cripple
my site with thousands of requests to Google ads. To solve this, I implemented a single modal that receives the movie data as variables when a specific card is clicked.
Another challenge was pausing the YouTube video when closing the modal. The difficulty came from combining several prebuilt solutions, which made researching
and creating a fix challenging. I used a Bootstrap modal, the YouTube API, and Vue, each of which restricted the type of solution I could use. First, the only way to detect
whether a Bootstrap modal was closed was to catch the asynchronous event that it fires behind the scenes. Second, stopping the YouTube video via the API inside the iframe
was challenging in its own right. Finally, Vue limits how you can reference HTML elements from methods and computed properties.
Another issue, which I found interesting, was the result of not understanding how the key is used in a v-for loop in Vue. When ordering the movies
by one specific field and one specific sort order, four seemingly random movies with an incorrect sort order would show up at the top of the results, and if you
clicked another order-by field and switched back, the 4 movies would accumulate, over and over. I looked into MongoDB to see if I could find any pattern among these
four movies. I discovered that each of these 4 movies had another movie in the database with the same name. That’s when I realized that Vue was mixing up these
movies, or had trouble identifying them in the loop: the data passed from the Vue data to the template was in the correct order, but once rendered
through the v-for loop the order was incorrect. It turned out I had used the movie title as the key in the v-for loop. Vue relies on the key to track each element's identity, so keys must be unique,
and switching to a unique ID such as the auto-generated ObjectId from MongoDB solved the issue.
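A quick check along these lines would have surfaced the colliding keys before they reached the template. It is a small Python sketch run against made-up documents. The helper name `duplicate_keys` and the sample data are assumptions for illustration and are not part of the project.

```python
from collections import Counter


def duplicate_keys(items, key):
    """Return the values of `key` that appear on more than one item."""
    counts = Counter(item[key] for item in items)
    return sorted(value for value, n in counts.items() if n > 1)


# Made-up documents: two different movies share the title "Lost River".
MOVIES = [
    {"_id": "a1", "title": "Night Train"},
    {"_id": "a2", "title": "Lost River"},
    {"_id": "a3", "title": "Lost River"},
]

if __name__ == "__main__":
    print(duplicate_keys(MOVIES, "title"))  # titles collide
    print(duplicate_keys(MOVIES, "_id"))    # IDs do not
```

Any field that returns a non-empty list here is unsafe to use as a v-for key. A database-generated ID is the safer choice.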
Screenshot of home and movie pop-up.