
GitHub Discusses AI Algorithm Finding Issues in Open Repositories

GitHub has recently updated the AI algorithm behind its good first issues feature, which now surfaces issues in 70% of recommended repositories.


Microsoft-owned GitHub is now fully behind AI (artificial intelligence) as a driver for finding issues in projects. Its “good first issues” feature makes it easier to find suitable problems to work on in large open source projects, which tend to be large and have many open issues.

Good first issues arrived on GitHub on the web back in May 2019. It was designed to highlight recommended issues based on labels applied by project maintainers. The company says it updated the tool in December with more powerful AI algorithms.

Now, the good first issues feature surfaces issues in 70% of recommended repositories. In a blog post, GitHub senior machine learning engineer Tiferet Gazit explains that the company had originally developed a list of 300 label names used across popular open source projects.

However, this list of label names only surfaced issues from 40% of recommended repositories. Furthermore, the maintainers of those open source projects still had to label issues manually. The new AI recommendation algorithm works mostly automatically, without input from project maintainers.

“There is a tradeoff between coverage and accuracy, which is the typical precision and recall tradeoff found in any ML product. To prevent the feed from being swamped with false positive detections, we aim for extremely high precision at the cost of recall. This is necessary because only a tiny minority of all issues are good first issues.”
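
To make the tradeoff in that quote concrete, here is a minimal Python sketch. The scores, labels, and the helper name `precision_recall` are invented for illustration and are not taken from GitHub's system. It shows how raising a classifier's decision threshold can push precision up while recall falls.

```python
def precision_recall(scores, labels, threshold):
    """Return (precision, recall) when every score >= threshold is flagged."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    # Convention chosen for this sketch: with nothing flagged, precision is 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical classifier scores and ground truth (True = good first issue).
scores = [0.95, 0.90, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [True, True, False, True, False, False, True, False]

for threshold in (0.5, 0.85):
    p, r = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

On this toy data the stricter threshold removes the one false positive but also misses a genuine good first issue, which is the kind of trade GitHub describes accepting.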

Weighted Measurement

In use, the new AI tool on GitHub predicts, for each issue, the probability that it is a good first issue. Issues whose probability exceeds a required threshold are recommended, with a confidence score equal to the predicted probability.

“To surface issue recommendations given a trained classifier, we run inference on all qualifying open issues from non-archived public repositories. Each issue for which the classifier predicts a probability above the required threshold is slated for recommendation, with a confidence score equal to its predicted probability.”
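
The selection step in that quote can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Recommendation` type, the `recommend` function, the issue IDs, and the threshold value are all hypothetical, and sorting by confidence is an addition of this sketch that the quote does not mention.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    issue_id: int
    confidence: float


def recommend(predictions, threshold):
    """Keep issues whose predicted probability is above the threshold.

    `predictions` maps a (hypothetical) issue ID to the classifier's
    predicted probability. The confidence score is that probability.
    """
    recs = [
        Recommendation(issue_id, prob)
        for issue_id, prob in predictions.items()
        if prob > threshold  # "above the required threshold"
    ]
    # Assumption of this sketch: most confident recommendations come first.
    return sorted(recs, key=lambda r: r.confidence, reverse=True)


# Made-up probabilities for four open issues.
predictions = {101: 0.97, 102: 0.42, 103: 0.99, 104: 0.90}
for rec in recommend(predictions, threshold=0.90):
    print(rec.issue_id, rec.confidence)
```

Note that issue 104 sits exactly at the threshold and is left out, because the sketch reads “above” as strictly greater than.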

Issues detected through their labels are scored differently: “These issues are given a confidence score based on the relevance of their labels, with synonyms of ‘good first issue’ given higher confidence than synonyms of ‘documentation’.”
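
A rough Python sketch of label-based scoring might look like the following. The label names, the numeric confidence values, and the rule of taking the highest matching score are all assumptions made for illustration; the article does not give GitHub's actual label list or weights.

```python
# Hypothetical confidence table: synonyms of "good first issue" score
# higher than synonyms of "documentation". The values are invented.
LABEL_CONFIDENCE = {
    "good first issue": 1.0,
    "good-first-issue": 1.0,
    "documentation": 0.6,
    "docs": 0.6,
}


def label_confidence(issue_labels):
    """Return the highest confidence among recognized labels, or None."""
    matches = [
        LABEL_CONFIDENCE[name]
        for name in (label.strip().lower() for label in issue_labels)
        if name in LABEL_CONFIDENCE
    ]
    return max(matches) if matches else None


print(label_confidence(["Docs", "bug"]))
print(label_confidence(["Good First Issue", "documentation"]))
print(label_confidence(["bug"]))
```

Returning `None` for an issue with no recognized label leaves room for a separate path, such as the classifier described above, to decide whether to recommend it.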

Luke Jones
