Working with the Libindic Modules

After I had successfully set up libindic on my system, my next target was finding bugs and fixing them. So I started playing around with the modules.

The first module that I started with was Soundex. The module was not showing proper results, so I started to work on that. I added conditions for the inter-language and intra-language cases, and the module started to work fine after that. I made a pull request to merge the changes. I was new to open source, so initially I faced hurdles like Travis test failures. I used to check my mail almost all day to see whether the changes had been merged. After my first pull request was successfully merged, I was very happy! 😀 This gave me the motivation to explore more modules and work on them.

The next module that I worked on was Spellchecker. I realized that the dictionaries that were used didn't have many words. So I started looking for more datasets (ILCI, ILMT, WikiDump, IITB Hindi Wordnet) and started to make dictionaries out of them. Since these were huge corpora, I spent a whole week sitting in the lab day and night watching the code run.
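To give an idea of what making a dictionary out of a corpus involves, here is a minimal sketch. It is not the actual code I ran: the function name `build_dictionary`, the `min_count` cutoff and the choice of Devanagari as the script are all assumptions made for illustration. It counts every Devanagari word in the text and keeps the ones seen often enough to be unlikely typos.

```python
import re
from collections import Counter

# Runs of Devanagari characters. The range deliberately skips
# U+0964-U+0971 (danda punctuation, digits, abbreviation signs)
# so that sentence-final "।" is not glued onto the last word.
DEVANAGARI_WORD = re.compile(r"[\u0900-\u0963\u0972-\u097F]+")


def build_dictionary(lines, min_count=2):
    """Return a sorted word list from an iterable of text lines.

    Hypothetical helper: min_count is an assumed cutoff that drops
    words seen fewer than min_count times, since rare tokens in raw
    corpora are often misspellings.
    """
    counts = Counter()
    for line in lines:
        counts.update(DEVANAGARI_WORD.findall(line))
    return sorted(word for word, n in counts.items() if n >= min_count)


if __name__ == "__main__":
    sample = ["यह घर है।", "वह घर है"]
    for word in build_dictionary(sample):
        print(word)
```

Because the function takes any iterable of lines, an open file object can be passed in directly, so a large corpus is read line by line instead of being loaded into memory at once.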

The next module, and the one I enjoyed working on the most, was Scriptrender. The PDFs generated from URLs had alignment problems, so I decided to work on that. I started looking for Python libraries, and the pdfkit library solved the alignment problems, so I made a pull request. But there were some rendering problems with the Malayalam font, and complex script rendering was broken. Slowly, many people from the community got involved in helping me get better results, and I started to try different fonts. The "Meera" font solved the problem, and we got the desired results. The appreciation for my work gave me a lot of confidence. While working on this module, I realized how helpful everyone in the community was, which gave me the courage to work more.

Next was the IndicStemmer module. The community already had a stemmer for Malayalam. I made stemmers for Hindi and Punjabi. I followed research papers on the rule-based approach for the two languages and implemented them. Whenever I am bored, I add quotes to the fortune corpus for Hindi and really enjoy reading them.
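For anyone curious what a rule-based stemmer looks like, here is a rough sketch of the suffix-stripping idea. It is not the code that went into IndicStemmer and does not reproduce the rules from the papers: the suffix list `HINDI_SUFFIXES`, the function name `stem` and the `min_stem_len` guard are hypothetical and chosen only to illustrate the approach.

```python
# Illustrative only: a tiny, assumed list of common Hindi inflectional
# suffixes. A real rule set is much longer and comes from the literature.
HINDI_SUFFIXES = ["ियों", "ाओं", "ों", "ें", "ा", "ी", "े", "ो"]


def stem(word, suffixes=HINDI_SUFFIXES, min_stem_len=2):
    """Strip the longest matching suffix from word.

    min_stem_len (an assumed safeguard) keeps very short words such as
    postpositions from being reduced to a single character.
    """
    # Try longer suffixes first so "ियों" wins over "ों".
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    print(stem("किताबों"))  # prints किताब
    print(stem("का"))       # prints का (too short to strip)
```

Trying the longest suffix first matters: otherwise a short suffix could match early and leave part of a longer ending attached to the stem.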

I really enjoy contributing to the modules and will continue exploring them whenever I get time. My primary focus, though, is my GSoC project. (Yes, that's what the next blog post will be about :D)


