My Open Source Adventure

2024-12-05

Contributing to open source projects doesn't even feel like work - it feels real, and it's been amazing.

My First Hacktoberfest!

I signed up for Hacktoberfest, a month-long celebration of all things open-source!

During Hacktoberfest, software developers worldwide contribute to open-source projects, particularly to issues labeled with hacktoberfest. The goal is to have 4 approved pull requests by the end of the month, which works out to roughly one pull request per week.

For the first pull request, I looked for any issue labeled good first issue to get my foot in the door of the open-source world.

The challenge wasn't the issue itself but rather finding it. It seems many repository owners take advantage of Hacktoberfest and its labels by creating fake good first issues and labeling them with hacktoberfest, even though they are not good first issues at all and the repository was freshly created!

However, after experimenting a little with GitHub's advanced search feature, I found a good one!

I chose this issue. While it involved little work, it helped me conquer my fear of forks! It was a great way to begin my Hacktoberfest journey. The issue was straightforward: adding a new list item to the README.md. The time it took to fork the repository and clone it was longer than the time it took to complete the issue! However, the process of forking, communicating with the repository owner, and creating my first open-source pull request from my forked branch made this a good first issue indeed!

Contributing to Classmates' Projects

Next, I contributed to my peers' projects and reviewed the pull requests (PRs) that others opened against my own project.

This experience was not only exciting because I got to explore other people's projects, but also because it required me to read and understand their code deeply, figuring out exactly where to apply changes to implement new features.

All of the projects are focused on using LLM (Large Language Model) chat completions to enhance various CLI tools, with each person having the freedom to choose the specific functionality of their tool.

The common feature we needed to implement in each other's repositories was support for a -t or --token-usage flag. This flag would display the number of tokens used in the prompt and the tokens generated in the completion response.

My Project

For my project, OptimizeIt, my partner was Nonthachai Plodthong.

The Issue: The issue filed on my repository clearly outlined the feature to be implemented: adding a -t or --token_usage flag to display the token count used by the program. After some discussion, my partner and I decided to use -tu instead of -t, as there was already a -t flag that lets the user set the temperature. Additionally, we chose --token-usage for the long option name, as the hyphenated form follows the usual convention for long command-line options more closely than --token_usage.

The Pull Request: Nonthachai Plodthong opened a pull request on my project to merge their changes. We spent several hours communicating back and forth to ensure everything was covered -- all edge cases, comments, updates to the README.md, and more. This was a perfect simulation of a work environment, where the reviewer and the author constantly collaborate to ensure everything meets the required standards.

Examples of requested changes included adhering to coding standards, removing unnecessary try/catch blocks, and fixing a program crash in an edge case.

Ultimately, the PR was successfully merged after addressing over 10 requested changes, ensuring that the new feature was integrated correctly without impacting any of the existing functionality!

The Project I Contributed To

The project I contributed to is genereadme, which belongs to Cleo Buenaventura.

The Issue: The issue I filed on Cleo's repository proposed implementing the same feature mentioned above. I provided a concise and thorough description, explaining what the feature is and why it should be implemented. I also included several examples of how the feature would be used.

The Pull Request: To approach this task, I first needed to understand how the commander package worked. Cleo used this package to parse command-line arguments into options and command arguments, and to display usage errors when needed. Afterward, I analyzed Cleo's code to understand the guidelines he follows, such as variable naming conventions and overall code structure.

I then added the new command-line flags and implemented the logic to accumulate and display the token usage information based on whether these flags were present.

The requested change was to add a shorthand flag -tu that behaves the same as the --token-usage flag. This was necessary because the -t flag was already being used for the temperature setting in the program.

Once the requested change was made, Cleo successfully merged the PR into his repository!

Contributing to C++ Open Source

Following the previous week, where I did my first straightforward open source contribution, this week I wanted to tackle bigger contributions.

To start, I came across a C/C++ library that prints the alphabet as ASCII art in different fonts.

Afterward, I still wanted to do more, so I started looking for a larger project to contribute to. To my surprise, the very popular faker.js library has a C++ version in the works, faker-cxx, which I was very excited to work on!

First Contribution

For my first issue of this week, I was tasked with adding a new letter to the drpepper font in the ascii-art library.

This was a straightforward task because the file structure was very easy to navigate. There was a folder called fonts containing the supported fonts. Before I began the process of adding my new character, I examined how others had added characters to the font.

I simply followed the existing style, added my character, and then submitted my pull request.

However, I found the method of adding new characters to a font somewhat peculiar. Here's an example:

vs D()
{
    vs character = getCharGrid(4, 5);
    character[0][0] = character[0][4] = character[1][1] = character[1][3] = character[2][1] = character[2][3] = ' ';
    character[0][1] = character[0][2] = character[0][3] = character[3][1] = character[3][2] = character[3][3] = '_';
    character[1][0] = character[2][0] = character[2][2] = character[2][4] = character[3][0] = '|';
    character[1][2] = '.';
    character[1][4] = '\\';
    character[3][4] = '/';
    return character;
}

I'm not entirely sure why it was done this way, but it seems to allow for clear modifications if needed in the future.

Second Contribution

For my second issue in the faker-cxx library, I had to add a new faker function to the Date module. This function should mimic its JavaScript counterpart.

At first, I thought, "It's just one function... how hard can it be?" But I was wrong.

I had to read through a lot of code to understand how everything was structured, where things were declared, where they were exported, how tests were conducted, and how to add new ones.

The contributing guidelines emphasize that all additions must be thoroughly tested.

There were many custom types and reusable functions, so I had to familiarize myself with each one before I could start coding.

Once I identified the necessary files and became accustomed to some initially unfamiliar C++20 syntax, like:

void doSomething(const auto& param) {}

I realized this is syntactic sugar for templates! C++20 calls it an abbreviated function template: using auto in the parameter list lets the parameter type be deduced at each call, without writing out a template header.

After understanding the internal functions (used internally for development but not exposed in the API), I added two functions. One accepts two ISO dates, and the other two timestamps. Luckily, there was already an internal function to generate a random date between two given dates, so I mainly needed to manage parameter conversions and preparations.

Once the functions were ready, I documented them for the API and created and ran the necessary tests. Adding tests was initially confusing, but after examining existing ones, it became straightforward.

However, running the tests proved challenging. I needed to build the entire library using cmake, which usually wouldn't be an issue, but I kept hitting an error saying the fmt dependency could not be found.

Despite following all the steps (on both Windows and Unix), I couldn't get the cmake build to succeed. After several hours of debugging, I decided to try another build method provided by the project, using bazel, which was much simpler.

After successfully building the project and running the tests, I was delighted to see all tests pass. I then tested my newly added function in a fresh C++ project, and it worked perfectly!

Finally, my pull request was accepted without any requested changes!

It felt really great to contribute to a well-known library like faker, which I have personally used at my job!

The Hunt for the Perfect Issue

I don't really have anything specific in mind. I might contribute to projects I've already contributed to, or I might choose something new and exciting, but either way, whatever I end up doing is going to be worth it! I'm mainly looking for something fun and engaging. I'm not so fussy about what language or technology is used per se; I just want the idea or the domain to be fun!

In previous contributions, I've contributed tests, bug fixes, features, simple documentation updates, and refactoring. I can't think of anything that I haven't done, so it doesn't really matter to me what type my contribution is, as long as I feel like it meant something, I learned something from it, and it benefited the project!

I am hunting for an issue that is not too easy, is fulfilling to work on, and teaches new things!

Diving into Hugging Face's chat-ui

I decided to contribute once again to Hugging Face's chat-ui!

The issue I created was to add Markdown rendering to user messages that are sent and stored. At the time, user messages were being displayed as unrendered markdown text.

To tackle this, I created an issue detailing my planned approach. I intended to use Marked to conditionally render certain markdown elements into HTML. After diving deep into Marked's documentation, I discovered this was indeed possible - you can simply disable the elements you don't want to render into HTML!

As I started working on the implementation, I noticed something interesting: there was already a MarkdownRenderer component in place. This component was being used to render the LLM's replies from Markdown into HTML. However, it wasn't working properly for my use case.

Strangely, while the Markdown was being converted to HTML, the elements weren't displaying on-screen - they were showing up as strings instead. This behavior was puzzling, as the code appeared correct after multiple reviews. I even started doubting my understanding and created a new empty project with TailwindCSS, copying chunks of HTML and CSS from the chat-ui project to test pieces individually. Everything worked fine in isolation.

After several hours of code review, I finally spotted something unfamiliar in the MarkdownRenderer component: a function called escapeHTML was being used to escape HTML, even though DOMPurify was already being used for sanitization right after!

The solution? I deleted this function, and suddenly everything worked perfectly! The markdown was now rendering properly on screen, and I could even edit the response.

The entire process took over 30 hours collectively, and the result was deleting a single line of code! Sometimes the simplest solutions are the hardest to find.

The Achievement

I'm glad to announce that I've successfully made all user messages sent using the WYSIWYG editor compatible with the existing MarkdownRenderer component! This means that once I transitioned all user messages from textarea to use the MarkdownRenderer component, everything worked exactly as expected, and it looks amazing!

Now users can easily write Markdown in the input, send it, and view it in their chat history with the LLM. The goal was to make chat-ui behave exactly like Claude, and the mission was successful!

The Disappointment

However, there are downsides. My communication with the repo maintainer has ceased, despite my several attempts to contact them. My issues (one, two) and pull request remain open but have received no updates whatsoever.

I've followed up on the issues several times, but have received no reply. The maintainer continues to respond to other issues and pull requests, just not mine.

This has been extremely disappointing, as I put a lot of time and effort into making the requirements work exactly as expected. Seeing my work neglected like this is truly heartbreaking. What's even worse is that I discovered and resolved an issue in my pull request where LLM-sent messages weren't being converted into Markdown properly when using the MarkdownRenderer component -- which was then used by the maintainer in a commit they made to the repo (and this happened after my pull request was up).

I've truly enjoyed working on this issue and this project, but it's extremely disappointing to see such poor communication take place in the open source community. Certainly, not everything can be "flowers and roses," but this is something that shouldn't happen.

Lessons Learned

These activities underscored the importance of writing clean, well-organized, and accessible code that is easy for others to understand and build upon. I learned firsthand how critical it is to maintain a consistent coding style, adhere to best practices, and ensure that our code is structured in a way that welcomes contributions from a diverse group of developers.

Beyond the technical aspects, this experience highlighted the significance of clear and detailed communication in collaborative environments. Providing comprehensive descriptions in issues and pull requests is essential, not only to convey the purpose and context of a change but also to facilitate a smoother review process. The more context and clarity provided, the easier it is for reviewers to understand the intent, identify potential issues, and suggest meaningful improvements.

Aside from my disappointment with chat-ui, I would say this has been extremely rewarding. I learned what a What You See Is What You Get (WYSIWYG) editor is and how to create one, picked up a new framework (Svelte), and learned many other things along the way.

These contributions were incredibly rewarding and exciting. I feel more confident than ever in my ability to contribute to open source projects.

Final Thoughts

Ultimately, this has been a powerful reminder that open source is not just about code -- it's about community, collaboration, and the shared commitment to creating something better together. Each contribution, no matter how small, adds value to the collective effort.

I would certainly recommend working on open source projects to everyone out there: you meet lovely people, learn many new things, and work on amazing projects! Just maybe not chat-ui.