Domain-specific LLM based on actual Jenkins usage using ci.jenkins.io data

Project goal: To develop a web app around an existing open-source LLM that is fine-tuned on collected Jenkins usage data to give it domain-specific Jenkins knowledge

Skills to study/improve: Python, React.js, LLM, AI/ML, Jenkins, Ollama, LangChain, UI, Infra statistics, Data Analytics

Details

Background

This full-stack project is a proof of concept (PoC) for fine-tuning an existing open-source LLM (such as Llama 2) with domain-specific Jenkins data. The contributor will compile, wrangle, and process that data as part of an AI-driven application, and will develop a minimalistic UI that lets the user interact with the LLM, so that the result is a complete end-to-end product. The main source of raw data will be the publicly available ci.jenkins.io datasets. The contributor will be involved in every step of the application development process, from data collection, wrangling, and processing to fine-tuning the model and developing the UI. Unlike the very similar GSoC 2024 LLM project, this project is heavily research-focused and data-driven, so a successful outcome will be considerably harder to achieve.
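As a rough illustration of the data-wrangling step described above, the following Python sketch turns a test report into fine-tuning records. It assumes the report has the shape of a Jenkins JUnit `testReport` JSON payload (`suites`, `cases`, `className`, `name`, `status`, `errorDetails`). The sample data, the function names, and the prompt/completion record layout are hypothetical and are not prescribed by the project.

```python
import json

# Made-up sample shaped like a Jenkins JUnit testReport payload (assumption).
SAMPLE_REPORT = {
    "suites": [
        {
            "cases": [
                {"className": "example.SampleTest", "name": "testA",
                 "status": "PASSED", "errorDetails": None},
                {"className": "example.SampleTest", "name": "testB",
                 "status": "REGRESSION",
                 "errorDetails": "expected:<1> but was:<2>"},
            ]
        }
    ]
}


def failed_cases(test_report):
    """Return one flat dict per failing test case in the report."""
    rows = []
    for suite in test_report.get("suites", []):
        for case in suite.get("cases", []):
            # Treat both statuses as failures (assumed status vocabulary).
            if case.get("status") in {"FAILED", "REGRESSION"}:
                rows.append({
                    "test": f'{case.get("className")}.{case.get("name")}',
                    "status": case["status"],
                    "error": (case.get("errorDetails") or "").strip(),
                })
    return rows


def to_training_record(row, label):
    """Serialise one labelled failure as a JSON Lines entry.

    The prompt/completion layout is a hypothetical choice; the real
    format depends on the fine-tuning tooling the contributor picks.
    """
    prompt = (
        f'Test {row["test"]} failed with: {row["error"]}\n'
        "What is the likely cause?"
    )
    return json.dumps({"prompt": prompt, "completion": label})


for row in failed_cases(SAMPLE_REPORT):
    print(to_training_record(row, "code-change"))
```

Keeping extraction and serialisation in separate functions means the labelling step can be changed or replaced without touching the parsing code.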

Summary

  • Strategy: Test failure analysis based on the test data from ci.jenkins.io

    • Help the user with failure diagnosis

      • Is the failure due to infra?

      • Is the failure due to a code change?

      • Is the failure due to an unreliable test (“flaky test”)?

    • Sample repositories and use cases

      • Jenkins core

      • Jenkins acceptance test harness

      • Jenkins plugin BOM
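The three diagnosis questions above could first be approximated with a simple rule-based baseline, which may also help produce initial labels for the fine-tuning data. The Python sketch below is only one possible starting point: the error patterns, the label names, and the rule "passed at least once on the same commit means flaky" are all assumptions for illustration, not part of the project definition.

```python
import re

# Hypothetical error fragments that hint at infrastructure trouble.
INFRA_PATTERNS = re.compile(
    r"connection refused|no space left on device|unknownhostexception",
    re.IGNORECASE,
)


def classify_failure(error_text, passed_on_same_commit):
    """Guess the cause of a test failure with simple heuristics.

    error_text: the failure message of the failing run.
    passed_on_same_commit: list of booleans, one per other run of the
        same test on the same commit (True means that run passed).
    Returns "infra", "flaky", or "code-change".
    """
    # 1. Infrastructure-looking errors take priority.
    if INFRA_PATTERNS.search(error_text):
        return "infra"
    # 2. Same code, different outcome: likely an unreliable test.
    if any(passed_on_same_commit):
        return "flaky"
    # 3. Otherwise assume the change under test caused the failure.
    return "code-change"


print(classify_failure("java.net.ConnectException: Connection refused", []))
print(classify_failure("expected:<1> but was:<2>", [True, False]))
print(classify_failure("expected:<1> but was:<2>", [False]))
```

A baseline like this also gives the fine-tuned model something concrete to be compared against during evaluation.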

Project Size

175 - 350 hours

Project Difficulty

Intermediate to Advanced

Quick Start

Familiarize yourself with the flow of the GSoC 2024 LLM project.

Potential Mentors

Project Links

Organization Links
