Large-scale source code archival, publishing, and indexing with Debsources (Stefano Zacchiroli)

For the next session of the "Codes sources" seminar, we will have the honor of welcoming Stefano Zacchiroli.

In this session, Stefano Zacchiroli will present:

"Large-scale source code archival, publishing, and indexing with Debsources"

Abstract:

The source code commons is a thing. Largely thanks to more than 30 years of Free and Open Source Software (FOSS) development, we now have at our disposal hundreds of billions of lines of publicly available source code.

When properly systematized, by providing uniform ways to access both the actual source code and related metadata, such an immense corpus is an invaluable resource for scholars in several fields of software studies, e.g., evolution, history, maintenance, and quality assurance. If properly archived, the same corpus can increase the chances that humanity will not lose a strategic part of its digital heritage.

We present our experiences in this field with Debsources, a medium-sized (~3.5 billion lines of code), curated corpus of FOSS source code extracted from more than 20 years of history of the Debian distribution. We discuss the need for, and the challenges of, bringing this experiment to the next level.

