So last night at around 10 PM, after a tiring day at work, I realized that it was only Tuesday and that there was no Arsenal Champions League match on. I decided to watch a movie. A good sports movie.
The good thing was that I had some good Hollywood movies on my hard disk (mostly copied from my friends' hard disks), and I had not watched most of them.
But which of them belong to the sports genre? Ahh. I was not going to manually search Google/IMDb for each movie to find its genre.
I wanted a quick solution. A small Python script would do the trick here: it would take the name of each file in my movie folder and give me the genre as well as the IMDb rating of that movie.
Not so easy, bro. First of all, the filename is not the movie name. Filenames look like We.Are.MarshallDvDrip[Eng]-aXXo or White House Down 2013 BRRip 720p x264 AAC - PRiSTINE [P2PDL] or The Sting.avi etc.
Searching IMDb directly for these names was not giving me any results, so I needed to extract the movie name from each filename. Let me write a quick regex for that. Not so tough. After some trial and error I came up with "^(.+?)\s[\(\[\d].+", which gave fair results for most cases: it keeps everything before the first space that is followed by a bracket or a digit. Python's re module handled the rest.
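To make the regex a bit more concrete, here is a minimal sketch of that title-guessing step on its own. The helper name `guess_title`, the list of video extensions and the fallback to the cleaned filename are my assumptions for illustration; they are not part of the script further down.

```python
import os
import re

# The pattern from the post: lazily capture everything before the first
# whitespace that is followed by "(", "[" or a digit.
TITLE_RE = re.compile(r"^(.+?)\s[\(\[\d].+")

# Assumed list of extensions worth stripping; extend as needed.
VIDEO_EXTENSIONS = (".avi", ".mkv", ".mp4")


def guess_title(filename):
    """Best-effort guess of a movie title from a release-style filename."""
    stem, ext = os.path.splitext(filename)
    # Only drop the suffix when it really is a video extension; otherwise
    # the "extension" is just part of a dotted release name.
    if ext.lower() not in VIDEO_EXTENSIONS:
        stem = filename
    cleaned = stem.replace(".", " ").strip()
    match = TITLE_RE.match(cleaned)
    # Assumption: when the pattern finds nothing, search with the cleaned name.
    return match.group(1) if match else cleaned


print(guess_title("White House Down 2013 BRRip 720p x264 AAC - PRiSTINE [P2PDL]"))
print(guess_title("The Sting.avi"))
print(guess_title("We.Are.MarshallDvDrip[Eng]-aXXo"))
```

The first two names come out as White House Down and The Sting. The third one only loses its dots, because there is no space before the bracket for the pattern to catch, so names like that still need some extra cleanup.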
Now comes the hectic part: fetching the output using urllib and parsing it with BeautifulSoup. Ahh!! Parsing is tiring. Especially on a Tuesday night :P. Google helped me there: there is already a package called IMDbPY. It has some problems, like it doesn't work with Python 3 and it is a little slow, but it still solved my use case.
After that I wrote a quick script which extracts the movie name from each filename in my movie folder and gets me the result. Ahha. Problem solved.
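<pre>__author__ = 'harsh'

import os
import re

import imdb  # the IMDbPY package

ia = imdb.IMDb()

# Python 2 script, hence raw_input.
inp = raw_input("enter the movie directory location: ")
print(inp)

for i in os.listdir(inp):
    # Remove a trailing .avi/.mkv extension, then turn dots into spaces.
    i = re.sub(r"\.(avi|mkv)$", "", i).replace(".", " ")
    # Keep everything before the first space followed by "(", "[" or a digit.
    m = re.search(r"^(.+?)\s[\(\[\d].+", i)
    if m:
        name = m.group(1)
        s_result = ia.search_movie(name)
        if s_result:
            x = s_result[0]  # take the top search hit
            ia.update(x)     # fetch the full record (genres, rating)
            if x.has_key('genres') and x.has_key('rating'):
                print(name + "-" + str(x['rating']) + " " + str(x['genres']))
</pre>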
It took around 10 minutes to get the results. And yeah, I chose Hoosiers, the first result with the sports genre (I always believed greedy is good :P). It was a really nice watch.
I can do a lot of other things too, like sorting the movies in decreasing order of IMDb rating. Yeah sure. Will do it later. :P
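For whenever "later" comes, the sorting part could look roughly like this sketch. It assumes the script collects its results as (title, rating, genres) tuples instead of printing them right away; the helper names `sort_by_rating` and `with_genre` are made up for illustration, and the entries below are placeholders, not real IMDb data.

```python
def sort_by_rating(movies):
    """Return the (title, rating, genres) tuples, highest rating first."""
    return sorted(movies, key=lambda movie: movie[1], reverse=True)


def with_genre(movies, genre):
    """Keep only the movies whose genre list contains the given genre."""
    return [movie for movie in movies if genre in movie[2]]


# Placeholder entries standing in for what the script would collect.
movies = [
    ("Movie A", 6.8, ["Drama", "Sport"]),
    ("Movie B", 8.1, ["Crime"]),
    ("Movie C", 7.5, ["Sport"]),
]

for title, rating, genres in sort_by_rating(with_genre(movies, "Sport")):
    print("%s - %s %s" % (title, rating, genres))
```

With that in place, picking a movie is just reading the first line of the output, which is still greedy, only slightly better informed. :P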