GSoC Phase 1 Wrapup

Hola!

It has been a wonderful GSoC season at coala. The projects have been really exciting and I have been handed the mentorship of the project to enhance cobot.

What is cobot?

cobot is a bot used by the coala community to serve various purposes:

  1. It is a way for newcomers to be easily invited to the community.
  2. It assists the maintainers with various tasks such as issue assignment and reviewing.
  3. It helps to search some documentation.
  4. It exposes some fun websites such as Wolfram and lmgtfy to generate easy links.

Since its introduction it has become an integral part of the community. Members use it very frequently across chatrooms to automate various arduous tasks such as opening issues and changing their tags.

What has been achieved in this Phase?

Even though cobot is so integral to the community, it was initially hacked together. It had no unit tests and was put together quickly and not very cleanly. So the target for this phase was a functioning bot on par with the previous one, including all of its features, plus testing.

Here are some detailed updates by my extremely delightful and awesome student Meet:

  1. https://meetmangukiya.github.io/post/phase-1-mid/
  2. https://meetmangukiya.github.io/post/phase-1-end/

He has successfully completed all targets set for this phase and now onwards to the next…

What next?

One of the most important targets for this project is the ability to search across the documentation and get back the results most relevant to the query. After much discussion about topic modeling and designing a search index, we settled on a smart manual index, because the documentation is just not dense enough for a reliable automated technique.
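A manually curated index of this kind could be as simple as a mapping from documentation pages to hand-picked keywords, ranked by how many query words match. The sketch below is only an illustration under that assumption: the entries, the example.org URLs and the `search` function are hypothetical and are not taken from cobot.

```python
# Hypothetical hand-maintained index: each entry lists the keywords a
# maintainer considers relevant for one documentation page.
MANUAL_INDEX = {
    "https://example.org/docs/writing-bears": {"bear", "write", "linter"},
    "https://example.org/docs/git-basics": {"git", "commit", "rebase"},
    "https://example.org/docs/newcomers": {"newcomer", "start", "git"},
}


def search(query, index=MANUAL_INDEX):
    """Return pages ordered by the number of keywords the query hits."""
    words = set(query.lower().split())
    scored = [(len(words & keywords), url) for url, keywords in index.items()]
    # Keep only pages with at least one hit; best score first, then by URL.
    hits = [(score, url) for score, url in scored if score > 0]
    return [url for score, url in sorted(hits, key=lambda h: (-h[0], h[1]))]
```

Because the keywords are chosen by hand, relevance depends on the maintainers rather than on how much text each page contains, which is the point of a manual index for sparse documentation.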

Cheers

Bye

 


First phase over

The first phase has finally ended.

My student has done a great job and has finished 100% of his milestone.

As always happens, his preplanning was quite rough and needed to be shaped up. Now, after that, it looks more realistic and should be enough for him to finish his whole work in the next phases.

This phase was quite easy, as it mostly consisted of mockups and design. Looking forward to Alex starting to implement the real things in phase 2!


GSoC Mentoring Starts

So this year, I am mentoring Saurav for the GSoC project Documentation Extraction and Parsing, which basically continues the work from my project last year.

Last year’s project consisted of revamping the documentation extraction API and creating a language-independent class that parses documentation.

A bear was also planned, but it never got merged because of some regressions.

Saurav’s work is to first get a working DocumentationStyleBear merged. “Working” means it should work as intended on the main coala repos. To keep it simple, only Python is supported right now.

Niklas has been a lot of help. I can’t begin to say how welcoming he has been to both of us. He has also familiarized himself with the documentation parsing codebase in just a few days and has done a lot of code reviews.

I hope, with Saurav’s commitment and Niklas’ guidance, we can get the documentation extraction and parsing working as a proof of concept for at least two languages.

GSoC 2017 Starts

Dear reader,

It has been a long time without a blog post. If you want a very quick overview of the great stuff happening at coala, GitMate and my life, just read the headings 🙂 For more details, I’m afraid you’ll actually have to read it.

coala gets 10 GSoC Students

Last year went away fast and a new GSoC is coming for coala and me.

First things first: as unbelievable as it is, coala got 10 slots for Google Summer of Code as its own organization (first year!). We received more than 50 applications, out of which only one or two were spam.

Unfortunately that means we couldn’t take a lot of students. A lot. Good students. Great. Students.

We did have some serious problems during the application phase, and there were things that could have been done better on the admin side to make the competition fairer. The truth is: we weren’t prepared for so many good applications and our processes were not decentralized enough; too much work was done by the admin team directly and not enough by the mentors. We’ve learned our lesson and will seriously iterate on how to make better processes for GSoC in 2017.

On to better parts. We now have 10 students who definitely deserved a slot, out of which I have the honour to mentor/comentor two together with my git-mate Fabian Neuschmidt:

The bonding phase has started and this year we are using GitLab milestones more strictly to track all projects. For every project we have a milestone for every phase. Check https://gitlab.com/coala/GSoC-2017/milestones/ and you’ll immediately get an impression on the progress of each and every project.

Using GitLab Burndown Charts for GSoC is Awesome

It is super crucial to know whether your student is on track with his GSoC project so you can see early if things go wrong. With GitLab we can use burndown charts to identify those issues earlier. This is the first time we’re doing this, so everyone is a bit behind schedule. We’re just trying it out for now, but as GSoC progresses we’ll get stricter about this.

I’m very happy to work with Naveen and Hemang this year. They’re totally into their projects, highly motivated and really add something to the community – we can learn a lot from each other!

The GitMate Rewrite is MIT and Getting Ready

If you visit gitmate.io now you will find a fully functional web app that allows you to configure GitHub plugins. The coala code analysis plugin isn’t ready yet but we’re working on it full steam and the first few plugins are totally ready.

With GitMate you can do any automation for PRs, issues, etc. – any events on GitHub and soon other platforms as well (if we’re good we’ll have email support soon so GitMate can automatically review linux kernel patches :)).

We’ll give you more updates about GitMate at blog.gitmate.io if we find time to actually blog. Are you thinking about using this in your OS project or company? Shoot me an email at lasse@gitmate.io!

blog.coala.io is Running!

Thanks to the great Yuki, who is constantly bored and thus lives a major part of his life in the caves below the DigitalOcean, all coala websites and GitMate are now properly deployed and maintained.

Also we’re finally getting more and more content on blog.coala.io with the GSoC students (who we’re shamelessly forcing to blog) but also the coala community team with brilliant efforts like coala recipes and other fun contests. This is awesome 🙂

PyCon(s?) Come to EuroPython!

I’ve been travelling a bit. You might have seen me if you visited some PyCon. It’s a lot of fun and I’ve been meeting many, many people! I’m looking forward to EuroPython, where we’re trying to get many coalaians together. We’re also in contact with the PyTest community to maybe do a joint sprint or so – I’m very much looking forward to that.

If you meet me at a conference, be sure to talk to me 🙂

Cheers!

Lasse

Mentorship begins!

Hello guys!

Last year was amazing; I blogged a lot about my GSoC experience as a student.

This year I come back, but as a mentor! My student is Alex (github.com/Nosferatul) and together we’re going to work on Improving the coala CLI. This surely is not going to be an easy task.

So far, Community Bonding is coming to an end and Alex is doing great. He’s already finished all his assignments and has worked on some mockups for his project.

Looking forward to working with him once the coding session begins!

Make sure to follow his blog to find out more about his project from his posts!

https://gsocsite.wordpress.com/category/gsoc/


Google Code In at coala

Dear people!

coala participated in Google Code In thanks to FOSSASIA this year.

We have always been active in engaging newcomers and teaching people about Open Source. It is only natural that we think and work towards helping pupils all over the world take this step and learn about contributing to open source. (If you are a teacher and reading this, reach out to us on coala.io/chat – we’re very interested in working with you and are also starting an initiative in Germany to connect to schools.)

So let’s get some data: we had 37 successfully completed tasks. Our mentors wrote an impressive 26 GCI tasks – some of which are multi-page step-by-step guides that are still used for non-GCI purposes.

Our star contributor Kaisar Arkhan is a GCI winner and Ridhwanul Haque made it as a finalist. We are proud of you! Kaisar Arkhan is actively helping us with our infrastructure to get status.coala.io greener every day!

An unimaginably huge part of the credit here goes to John Mark Vandenberg, who mainly administered GCI for us, mentored a huge number of students by himself and helped us write up the best possible tasks we could have. We are very thankful that we could build on his experience with the program and that we had his valuable input at every stage. Backstage, we had Mario Behling and Hong Phuc Dang from FOSSASIA working tirelessly so we could make this happen.

If you meet any of them – consider inviting them for a cup of coffee and thank them for what they are doing for our community, for FOSSASIA and for Open Source education.

A Git Workflow for Humans

Introduction

This blog post serves as a documentation for a Git workflow that I successfully use for my Open Source projects (e.g. coala) as well as my commercial clients. It’s focused on two things:

  • Code quality, because we need it. Otherwise our stuff will break.
  • Simplicity, because we’re humans and we don’t want to use something as complicated as Git flow. (I have seen a lot of people claiming to use Git flow, however when we talked about it it almost always turned out they don’t actually use it. :))

It gives general guidelines and I encourage people to change the workflow according to their special needs – however, make sure that everything you do goes towards simplicity and quality and happens for a good reason.

The following paragraphs define the simplest, most minimal approach, which is the base case of this workflow. The extensions paragraph defines some extensions which help you deal with several common use cases. You will likely end up using the base workflow with one or two extensions.

The last paragraph will recommend some tooling which allows you to run this workflow more efficiently.

Base Workflow

Branch Names

Branch names are important because they influence how we think about the workflow. The main branch for Git repositories is master. Master is supposed to be always stable and the main point for developers to start with. The respect for a branch named master is higher than for e.g. develop and you will yield higher quality results by just naming it like that.

For development you will want to go with user owned branches. If I name my branches feature/newui, the name contains less information than me naming it sils/newui, sils being my user identification. Any developer knows who to contact if there is a stale branch or any problems – the owner of that branch.

As an owner of a branch, I can also reset my branch to a new commit that has nothing to do with the previous history. It’s my branch and it’s my responsibility.

Code Review

Great. I have my owned branch, I developed a crazy new thing and I want it to be in master! How does it work?

Do the natural thing. Submit a Pull Request, Merge Request, patches on BugZilla or whatever review UI you already use.

Start reviewing: my strong recommendation is to make good commits and review every commit on its own. Make sure that every commit only changes one thing and is as small as possible. Reviewers will find more bugs and you will have saved a lot of time in the long run. “Reduce technical debt.” Of course you will also want to use continuous integration and code analysis on your project to save you lots of review time and enable people to find and fix issues earlier. You can use git rebase --interactive for fixing up your commits – don’t be afraid, after you learn it once it’ll come in handy in a lot of situations.

Many workflows would now propose to do a merge commit. I recommend doing a fast forward or implementing a semi-linear workflow – why? If you have worked with merge commits for a longer time, you have probably seen failing builds on master or other critical branches even if you had CI on all branches – merge commits are changes. If you don’t review them (and that’s a hard thing to do) they may bite you. What does this mean?

Before doing a merge you have to rebase your commits onto the latest version of master. The continuous integration will be retriggered and your builds verify your code again. You should also check manually whether the commits you added underneath your existing ones could do any harm! After doing that you can either do a fast forward (git merge --ff-only) or a merge commit (git merge --no-ff) if you want to keep the history of your PRs/MRs. I recommend doing the fast forward and thinking in changes, not in features. This purely psychological thing can change the way you develop source code. Your builds will no longer fail for deterministic reasons.
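To see why the rebase is what makes a fast forward possible, it helps to model history as a graph: a branch can be fast-forwarded into master only if master’s tip is already an ancestor of the branch tip. The following is a toy sketch of that rule, not how Git itself is implemented; the commit names and the helpers `is_ancestor` and `can_fast_forward` are made up for illustration.

```python
def is_ancestor(parents, ancestor, commit):
    """Return True if `ancestor` is reachable from `commit` via parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, []))
    return False


def can_fast_forward(parents, master_tip, branch_tip):
    """A fast forward only moves the master pointer, so master's tip
    must already be part of the branch's history."""
    return is_ancestor(parents, master_tip, branch_tip)


# Toy history: master moved on to C while the branch still starts at B.
parents = {"A": [], "B": ["A"], "C": ["B"], "feature": ["B"]}
stale = can_fast_forward(parents, "C", "feature")

# After rebasing, the feature commit sits on top of C.
parents["feature"] = ["C"]
rebased = can_fast_forward(parents, "C", "feature")
```

In this model the stale branch cannot be fast-forwarded, while the rebased one can, and the rebased state is exactly the one CI has already built.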

Releases

I recommend doing continuous releases from your master branch. Either push your website to your server or your package as a prerelease to PyPI.

If you manually want to trigger releases, set up your CI to do it for you on your command right from master. (E.g. using the “when: manual” in GitLab CI or when tagging a commit.)

If that is sufficient for you, you won’t need any other branch than master and user owned branches.

Extensions

The following paragraphs explain how you can extend your workflow.

Hotfix Branches

You may have the need to be able to fix any production issues really quickly. You will want to bypass code review. You might even want to bypass continuous integration. The solution is simple:

Just set up automatic deployment for hotfix/… branches.

The most important thing however is not to use master! Master is always stable and reviewed. You deploy a hotfix *temporarily* and pause all other development until a clean equivalent of the hotfix is merged/fastforwarded to master. This way you don’t get your master broken but you’ll be able to temporarily deploy potentially dirty hacks when needed.

Release Branches

If you want to maintain bugfix releases featuring only selected bugfix commits, you will want to branch off a release/… branch when doing a release. Usually you’ll want to name it after the major and minor version but not include the micro, as your branch will move over your micro releases. (E.g. release/0.8 is good.) Whenever you want to do a bugfix release, just cherry-pick your commits onto that branch and trigger a release when needed.

Apply the same code review policies as for master. Doing automatic prereleases may be awesome for the people using your software, being able to get the latest stuff from master in no time.
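The naming rule for release branches (major and minor, no micro) can be written down as a tiny helper, for instance for use in a release script. This is a minimal sketch; the function name `release_branch` is hypothetical and it assumes dot-separated version strings with at least a major and a minor part.

```python
def release_branch(version):
    """Map a 'major.minor[.micro]' version to its 'release/major.minor' branch."""
    # Raises ValueError if the version has no minor component.
    major, minor = version.split(".")[:2]
    return "release/{}.{}".format(major, minor)
```

With this rule, 0.8.0, 0.8.1 and 0.8.2 all map to the same release/0.8 branch, which is what lets the branch move along with the micro releases.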

Tooling

Long story short: keep away from GitHub. GitHub forces you into their workflow using merges, cluttering history, compromising your code quality (at the advantage of being a bit simpler for them to implement and for you to use).

The best tool I found so far for this is the GitLab Enterprise Edition, which is sadly not free software. The recommended setup is:

  • Protect the master branch. Nobody can push. Everybody can merge.
  • Allow merges only when builds pass.
  • Allow merges only when at least one (potentially more) nonauthor approved a merge request.
  • Set merges to fast forward only. GitLab will even offer coders a rebase button so you don’t have to do it manually every time.
  • Automatic deployment or when: manual for master/release/hotfix branches.
  • Set up GitLab CI to build your stuff and test it, if you’re deploying with docker, test in docker!
  • Use static code analysis like coala in your GitLab CI.
  • Enforce a minimal test coverage, ideally your coverage should always grow or stay. That’s a good way to handle legacy projects as well as mature well tested ones.

GUADEC 2016

Hi again!

I had the pleasure of visiting GUADEC this year again.

A lot of great things happened – as always, GUADEC with its perfect size got me to speak to a hell of a lot of new and interesting people. Thank you all for being there – it was a pleasure.

Most of all, GUADEC has brought me to consider running GitMate as a consultancy business. The decision has not been made yet, but it’s a viable option that we hadn’t really considered for some reason.

Among other things, I had the pleasure of moderating the interns’ lightning talks as well as the regular ones, and of presenting my annual coala lightning talk as well. My full talk about growing open source communities is available on YouTube and the CCC.

Looking forward to next year – cheers!


That’s it, folks!

So this is it. The end of my Google Summer of Code. An amazing 12 weeks of working on a real project with deadlines and milestones.

Thanks, awesome mentor!

First and foremost, I would like to thank my mentor Mischa Krüger for his constant guidance and support through the tenure of my project.

Thank you for clarifying my trivial issues that were way too trivial. Thank you for clearing my doubts on the design of the classes. Thank you for writing a basic layout for a prototype bear. Thank you for understanding when I was not able to meet certain deadlines. Thank you Mischa for being an awesome mentor.

The Beginning

I was first introduced to coala at the HackerEarth IndiaHacks Open Source Hackathon. I wanted to participate in it, so I took a look at the list of projects and saw coala. I jumped on their Gitter channel and said hi. Lasse hit me back instantly, introduced me to the project and asked me to choose any newcomer issue, and my first patch got accepted in no time.

As the hackathon came to an end, it was time for organisations to start thinking about Google Summer of Code. By then, I had been taking part in regular discussions and code reviews, and Lasse asked me if I’d like to do a GSoC:

I slowly pivoted to choosing language independent documentation extraction as my GSoC project as I found it having greater depth than my other choices.

I feel privileged to be contributing to coala. The project itself is awesome in its entirety. I have contributed to my fair share of open source projects and I have never found any other project that is so organized and newcomer friendly. How awesome coala is should be a post of its own.

About my project

Now to my project. As stated repeatedly in my past posts, my project was to build a language independent documentation extraction and parsing library, and use it to develop bears (static analysis routines).

How it all fits together

Most of the documentation extraction routines were already written by my mentor. Except for a couple of tiny bugs, it worked pretty well. The documentation extraction API was responsible for extracting the documentation given the language, docstyle and markers, and returning a DocumentationComment object.

The DocumentationComment class defines one documentation comment along with its language, docstyle, markers, indentation and range.

My first task was to write a language independent parsing routine that would extract metadata out of a documentation comment, i.e. description, parameter and return information. This resides inside the DocumentationComment class.

The point of this parsing library is to allow bear developers to manipulate metadata without worrying about destroying the format.
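To make the idea of extracting metadata concrete, here is a minimal sketch of such a parsing routine for one docstyle. It is not the actual coala implementation: the function `parse_docstring` and its default symbols (`:param `, `:` and `:return:`) are assumptions chosen for illustration, and it only handles single-line entries.

```python
from collections import namedtuple

Parameter = namedtuple('Parameter', 'name, desc')
ReturnValue = namedtuple('ReturnValue', 'desc')
Description = namedtuple('Description', 'desc')


def parse_docstring(text, param_start=':param ', param_end=':',
                    return_sep=':return:'):
    """Split a docstring body into Description/Parameter/ReturnValue tuples."""
    metadata = []
    for line in text.strip().splitlines():
        line = line.strip()
        if line.startswith(param_start):
            # Everything up to `param_end` is the name, the rest the text.
            name, _, desc = line[len(param_start):].partition(param_end)
            metadata.append(Parameter(name=name.strip(), desc=desc.strip()))
        elif line.startswith(return_sep):
            metadata.append(ReturnValue(desc=line[len(return_sep):].strip()))
        elif line:
            metadata.append(Description(desc=line))
    return metadata


parsed = parse_docstring("""
    Adds two numbers.

    :param a: The first number.
    :param b: The second number.
    :return: The sum.
""")
```

Because the symbols are passed in as arguments rather than hardcoded, the same routine could in principle be reused for another docstyle by supplying different keywords.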

I then had to make sure that I had support for the most popular languages. I used the unofficial coalang specification to define keywords and symbols that are used in different documentation comments. They are being loaded along with the docstyle.

Although I do not use the coalang stuff yet and still pass keywords and symbols manually, it will be used in the future.

Lastly, I had to implement a function to assemble a parsed documentation into a documentation comment.

I separated this functionality into two functions:

  • The first function would take in a list of parsed documentation comment metadata and construct a DocumentationComment object from that. The object would contain the assembled documentation comment and its other characteristics. Note that this just assembles the inside of the documentation comment, not accounting for the indentation and markers.

  • The second function takes this DocumentationComment object and assembles it into a documentation comment, as it should be, taking account of the indentation and the markers.
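The two assembly steps described above can be sketched as follows. This is an illustration only and not the code from coala: the names `assemble_body` and `assemble_comment`, the `:param`/`:return:` symbols and the three-part `markers` tuple (start, each line, end) are assumptions made for the example.

```python
from collections import namedtuple

Parameter = namedtuple('Parameter', 'name, desc')
ReturnValue = namedtuple('ReturnValue', 'desc')
Description = namedtuple('Description', 'desc')


def assemble_body(metadata):
    """Step 1: turn parsed metadata back into the inner comment text."""
    lines = []
    for item in metadata:
        if isinstance(item, Parameter):
            lines.append(':param {}: {}'.format(item.name, item.desc))
        elif isinstance(item, ReturnValue):
            lines.append(':return: {}'.format(item.desc))
        else:
            lines.append(item.desc)
    return '\n'.join(lines)


def assemble_comment(body, indent='    ', markers=('"""', '', '"""')):
    """Step 2: wrap the body in markers and re-apply the indentation."""
    start, each_line, end = markers
    wrapped = [start] + [each_line + line for line in body.splitlines()] + [end]
    return '\n'.join(indent + line for line in wrapped)
```

Keeping the steps separate means a bear can edit the metadata, rebuild the body, and leave indentation and markers to the second step.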

Difficulties faced

  • The first difficulty I faced was the design of the parsing module itself. With the help of my mentor, I was able to sort that out. We decided on using namedtuples for each of the metadata:
from collections import namedtuple

Parameter = namedtuple('Parameter', 'name, desc')
ReturnValue = namedtuple('ReturnValue', 'desc')
Description = namedtuple('Description', 'desc')
  • If I wanted to make the library completely language independent, most settings would have to be configurable to the end user. Initially I hardcoded the keywords and symbols that I used, but later the coalang specification was used to define the settings. They are yet to be used in the library.

  • While trying to use the above mentioned settings, I realized that the settings extraction didn’t work for trailing spaces. Since I had to have settings with trailing whitespace, I had to fix the extraction in the LineParser class.
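The trailing whitespace problem from the last point can be illustrated with a small stripping helper that leaves backslash-escaped whitespace alone. This is a sketch of the general idea, not the actual fix in LineParser; the function name `strip_unescaped` is hypothetical, it keeps the backslash in the result, and it does not handle an escaped backslash in front of the space.

```python
def strip_unescaped(value):
    """Strip surrounding whitespace but keep backslash-escaped trailing spaces."""
    value = value.lstrip()
    end = len(value)
    # Walk backwards over trailing whitespace that is not escaped.
    while end > 0 and value[end - 1].isspace():
        if end >= 2 and value[end - 2] == '\\':
            break
        end -= 1
    return value[:end]
```

A setting such as a keyword followed by an escaped space would then survive stripping, whereas plain trailing spaces are still removed.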

What has been done till now

coala

56e1802 DocumentationComment: Add language, docstyle param
72b6c9c DocumentationComment: Add indent param
bc4d7d0 DocumentationComment: Parse python docstrings
337b7c1 DocumentationComment: Parse python doxygen docs
99fa059 DocumentationCommentTest: Refactor
fc2e3bf DocumentationComment: Add JavaDoc parsing
12ede4f ConsoleInteraction: Fix empty line tab display
07135f5 DocumentationExtraction: Fix newline parsing
5df5932 DocumentationComment: Fix python parsing
f731ee4 DocumentationComment: Remove redundant code
e442dce TestUtils: Create load_testdata for loading docs
7de9aed LineParser: Fix stripping for escaped whitespace
31b0410 DocstyleDefinition: Add metadata param
edc67aa DocumentationExtraction: Conform to pep8
3a78aa9 DocumentationComment: Use DocstyleDefinition
dc35a0a DocumentationComment: Add from_metadata()
78ff315 DocumentationComment: Add assemble()
3c239d7 setup: Package coalang files

What lies ahead

The API still has a long way to go. A lot of things can be added/improved:

  • Maybe the use of namedtuples is not that efficient. I think classes should be used and subclassed from these namedtuples. This will allow the API to be way more flexible than it currently is, while retaining the advantages of using namedtuple.

  • A corner case in assembling #2645

  • Range is not being calculated correctly. #2646

  • The API is not using the coalang symbols/keywords. #2629

  • A lot of things are just assumed from the documentation while parsing. Related: #2143

  • Trivial: #2617, #2616

  • A lot of documentation related bears can be developed from this API.
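The first point in the list, subclassing the namedtuples, might look roughly like the sketch below. It is only one possible shape for the idea: the `render` method and its default symbols are hypothetical and not part of the existing API.

```python
from collections import namedtuple


class Parameter(namedtuple('Parameter', 'name, desc')):
    """A parameter entry that keeps tuple behaviour but can grow methods."""

    __slots__ = ()  # no per-instance dict, so it stays as light as a tuple

    def render(self, start=':param ', end=':'):
        """Format the entry using caller-supplied (assumed) symbols."""
        return '{}{}{} {}'.format(start, self.name, end, self.desc)


param = Parameter(name='x', desc='The input value.')
name, desc = param  # still unpacks like a plain namedtuple
```

The subclass still compares equal to a plain tuple and unpacks the same way, so existing code that treats the metadata as tuples would keep working.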

It has been an awesome 3 months and an even more awesome 7 months of contributing to coala. That’s it folks!

Other projects.

Also, I want to talk about the projects of other students:

  • @hypothesist did an awesome job on coala-quickstart. The time saved in using coala-quickstart vs. writing your own .coafile is huge and this will lead to more projects using coala. He has also worked on caching files to speed up coala.

  • @tushar-rishav built coala-html! It’s a web app for showing your coala results. He has also been working on a new website for coala.

  • @mr-karan did some cool documentation for the bears and implemented syntax highlighting in the terminal.

  • @Adrianzatreanu worked on the Requirements API.

  • @Redridge’s work on External Bears will help you write bears in your favourite programming language.

  • @abhsag24 worked on the coalang specification. We can finally integrate language independent bears seamlessly!

  • Thanks to @arafsheikh, you can now use coala in Eclipse.