Exporting Mercurial Data
16 February 2014
Yesterday, I came across a report card for GitHub users. It inspired me to mine the data from my current project, which uses Mercurial. For those of you who do not know, Mercurial is similar to git in that it is a distributed source control system. Mercurial is written in Python, which makes Python the natural choice for exporting the data I am interested in.
For now, I only want to get the revision information into MongoDB so that I can play with the data later. For this, I needed a few packages that I installed via pip.
The first package I installed is hgapi. Mercurial has a Python API that it uses internally. However, it is not an official API because (I suspect) the Mercurial team wants to keep its options open to change it. When Mercurial is installed, it also puts the API on the file system, but not where Python can find it by default. There is a workaround for that. However, to keep things simple, I opted to follow Mercurial's suggestion and used hgapi. Simply install hgapi via pip by running: pip install hgapi
Since I am putting the data into MongoDB, I also needed a MongoDB driver. I am using PyMongo.
The Mercurial API returns the time stamp as a string, and I want to parse that string into a datetime so that I can store it properly in MongoDB. For that, I also installed dateutil (the python-dateutil package on PyPI).
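To illustrate the parsing step, here is a minimal sketch. The sample string is an assumption: it follows Mercurial's "isodate" style, and the exact format you get back depends on the log template in use. Converting to UTC is shown only because MongoDB stores dates as UTC.

```python
from datetime import timezone

from dateutil import parser  # provided by the python-dateutil package

# Assumed sample time stamp in Mercurial's "isodate" style (not taken from a real repository).
raw = "2014-02-16 10:30 -0500"

# dateutil works out the format on its own and returns a timezone-aware datetime.
parsed = parser.parse(raw)

# Normalize to UTC so every stored date is directly comparable.
as_utc = parsed.astimezone(timezone.utc)

print(parsed.isoformat())
print(as_utc.isoformat())
```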
With all of the dependencies installed, it is time to put it all together. The code below opens the local Mercurial repository, loops through each of the revisions, and inserts its metadata into a MongoDB collection.
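A minimal sketch of such a script follows. Several things here are assumptions rather than a record of the original code: the repository path, the database and collection names, and the local MongoDB URL are all hypothetical, and the revision attributes (node, rev, author, branch, date, desc) are the ones I understand hgapi to expose. It also upserts on the changeset hash instead of doing a plain insert, so that the export can be re-run without creating duplicates.

```python
def revision_to_document(rev, parse_date):
    """Turn one revision object into a plain dict that MongoDB can store."""
    return {
        "_id": rev.node,                  # changeset hash doubles as the key
        "rev": rev.rev,                   # local revision number
        "author": rev.author,
        "branch": rev.branch,
        "date": parse_date(rev.date),     # string -> datetime
        "description": rev.desc,
    }


def export_revisions(revisions, collection, parse_date):
    """Upsert every revision into the collection and return how many were written."""
    count = 0
    for rev in revisions:
        doc = revision_to_document(rev, parse_date)
        # Upsert keyed on the changeset hash so re-running does not duplicate data.
        collection.replace_one({"_id": doc["_id"]}, doc, upsert=True)
        count += 1
    return count


def main():
    # Third-party imports live here so the helpers above can be used without them.
    import hgapi
    from dateutil import parser
    from pymongo import MongoClient

    repo = hgapi.Repo("/path/to/repo")                 # hypothetical path
    client = MongoClient("mongodb://localhost:27017")  # assumed local server
    collection = client["hg_stats"]["revisions"]       # hypothetical names

    # Slicing from revision 0 to "tip" is assumed to return every revision.
    written = export_revisions(repo[0:"tip"], collection, parser.parse)
    print("Exported %d revisions" % written)
```

Calling main() runs the export. Keeping the conversion in its own function makes it easy to try out against a handful of revisions before pointing it at the whole repository.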
I would love to know how the report card for GitHub users runs so fast. Looping through all of the revisions with the Mercurial API is not exactly fast. But, then again, I do not find Mercurial to be all that fast to begin with. That being said, the data is now in MongoDB, and I can use its speed to quickly map-reduce the data for reports.
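As an example of the kind of report I have in mind, here is a rough sketch that counts revisions per author. Note that it uses MongoDB's aggregation pipeline rather than map-reduce, since that is the simpler tool for a plain count. It assumes each stored document has an author field; the function names are hypothetical.

```python
def commits_per_author_pipeline(limit=10):
    """Build an aggregation pipeline that counts revisions per author."""
    return [
        # Group the documents by author and count one per revision.
        {"$group": {"_id": "$author", "commits": {"$sum": 1}}},
        # Most active authors first.
        {"$sort": {"commits": -1}},
        {"$limit": limit},
    ]


def top_authors(collection, limit=10):
    """Run the pipeline; `collection` is assumed to be a PyMongo collection."""
    return list(collection.aggregate(commits_per_author_pipeline(limit)))
```

Each result is a small document holding the author in _id and the count in commits.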
This work is licensed under a Creative Commons Attribution 4.0 International License.