Over the last 3 days, I tried to work out my probable friends, and it was not as difficult as I had thought.
I have around 340 friends, and if I can get the friends list pages of all my friends, it is easy to work out the probable friends. wget is a very nice utility for downloading anything.
To authenticate the session, we need to pass the username, password, and some other information to orkut as POST data. It would be time consuming to find all the form information by hand, so I used the Live HTTP Headers extension in Firefox. While it is running, it records all the HTTP information for anything you do on a website. For example, if you start Live HTTP Headers and log in to orkut, it records every request that was sent for logging in. It takes around 4-5 requests before the home page is displayed. We can use those HTTP headers to build the wget requests. wget has a few relevant options: --load-cookies, --save-cookies, and --keep-session-cookies. With those, we can keep a proper session going with wget.
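As a rough sketch of how those wget options fit together, the Python snippet below only builds the wget command lines; it does not contact any site. The login URL, the form field names, and the file names are all hypothetical placeholders. The real values would come from the requests recorded by Live HTTP Headers, and the real login needed several requests rather than the single one shown here.

```python
from urllib.parse import urlencode

COOKIE_FILE = "cookies.txt"  # hypothetical file name for the cookie jar


def wget_login_command(login_url, form_fields, cookie_file=COOKIE_FILE):
    """Build a wget command that posts the login form and saves the cookies."""
    return [
        "wget",
        "--save-cookies=" + cookie_file,
        "--keep-session-cookies",  # also save cookies that have no expiry time
        "--post-data=" + urlencode(form_fields),
        "--output-document=login.html",
        login_url,
    ]


def wget_fetch_command(url, out_file, cookie_file=COOKIE_FILE):
    """Build a wget command that reuses the saved cookies to fetch a page."""
    return [
        "wget",
        "--load-cookies=" + cookie_file,
        "--save-cookies=" + cookie_file,
        "--keep-session-cookies",
        "--output-document=" + out_file,
        url,
    ]


# Hypothetical URL and field names, for illustration only.
login_cmd = wget_login_command(
    "https://example.com/login",
    {"user": "me@example.com", "pass": "secret"},
)
print(" ".join(login_cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) to actually run wget. Keeping the arguments in a list avoids shell quoting problems with the "&" in the POST data.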
Once authentication is done, everything else can be automated. I did it in the following way.
- The script takes my id, and downloads my profile page by wget.
- VIM parses that file and finds the number of friends, and a C program generates the URLs for all the friends. (It is also possible to generate the URLs with VIM, but I could not remember how to do it at the time.)
- wget downloads the friends list pages of all my friends.
- VIM parses all the files, and stores all the friends of friends in plain text format.
- A program written in Java takes all the friends of friends, and finds out probable friends.
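The last step can be sketched as follows. This is a minimal illustration in Python and not the Java program I actually wrote. It assumes the parsed data is available as a mapping from each of my friends to the set of ids on their friends list; the function name, the parameter names, and the sample ids are made up for the example.

```python
from collections import Counter


def probable_friends(me, friends_of, min_mutual=6):
    """Return (person, mutual_count) pairs, highest count first.

    friends_of maps each of my friends to the set of ids on their
    friends list. min_mutual=6 corresponds to "more than 5 mutual friends".
    """
    my_friends = set(friends_of)
    mutual = Counter()
    for their_friends in friends_of.values():
        for person in set(their_friends):
            # Skip myself and people who are already my friends.
            if person != me and person not in my_friends:
                mutual[person] += 1
    return [(p, n) for p, n in mutual.most_common() if n >= min_mutual]


# Tiny made-up data set: "x" is on three of my friends' lists, "y" on two.
sample = {
    "a": {"me", "b", "x", "y"},
    "b": {"me", "a", "x"},
    "c": {"me", "x", "y"},
}
print(probable_friends("me", sample, min_mutual=2))
```

Each friend of friend is counted once per friend whose list they appear on, so the count is the number of mutual friends.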
I have around 50,000 friends of friends. Around 1,000 of them have more than 5 mutual friends with me. That was far more than I expected.
After getting all this data, I wondered: if orkut added this feature, would I use it or not? Except for the "find your Gmail contacts in orkut" feature, orkut uses paging everywhere and shows 10-20 items per page. To see all 1,000 friends of friends, I might have to open 50 pages, which would be tedious. With my script, I got all the details in a single plain text file, with one line per friend of friend, so it was very easy to find my friends. If orkut had implemented this feature a week earlier, I would not have tried to build it myself. I would have browsed all 50 pages and would not have learned how to log in to a website programmatically. So, I should be thankful to orkut for delaying this feature. ;)
I would like to thank Deepak Manohar for teaching me a few VIM commands for parsing the HTML files.