Get it in Git
Publishing analysis code allows others to better understand what you have done, to verify that your analysis does what you claim, and to build on your work.
Use git init, git add, git commit, git remote add, and git push, as discussed in the Software Carpentry Git lesson (a minimal sketch of this workflow follows below).
Include all of the code that you have written for this analysis.
Leave out, for example, temporary copies, old backup versions, files containing secret or confidential information, and automatically generated supporting files.
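As a minimal sketch of that workflow, assuming a hypothetical analysis.py script and a placeholder remote URL:

```bash
# Start tracking the analysis code in Git
git init
git add analysis.py README.md   # file names are placeholders
git commit -m "Add initial analysis code"

# Connect the repository to a remote host and publish it
# (the URL and branch name are placeholders for your own setup)
git remote add origin https://github.com/yourname/analysis.git
git push -u origin main
```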
Structuring your repository
Put code into a specific subdirectory (or several, if there is lots of code).
Keep important metadata, such as a license, citation information, and a README, in the root of the repository.
Keep ancillary data, documentation, and other supporting files in separate subdirectories.
Use git mv to move files, so that Git knows they have moved (see the sketch below).
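For example, moving an existing script into a dedicated code/ subdirectory (a hypothetical layout), letting Git record the move:

```bash
# Create a subdirectory for code and move the script into it,
# so Git records a move rather than a delete plus an add
mkdir code
git mv analysis.py code/analysis.py
git commit -m "Move analysis script into code/"
```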
Documentation and automation
Use a README or similar file to explain the essential steps of running your analysis.
Use a shell script or similar to automate the steps you would take to perform your analysis.
Use command-line arguments or other parameters instead of having to edit lines of code by hand (see the sketch below).
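A minimal sketch of such a driver script, with hypothetical file names, taking the input file as a command-line argument rather than hard-coding it:

```bash
#!/usr/bin/env bash
# run_analysis.sh: run the full analysis on a given input file.
# Usage: ./run_analysis.sh <input-file>
set -euo pipefail

INPUT="$1"

mkdir -p results
python code/analysis.py --input "$INPUT" --output results/output.csv
```

Because the input is passed as an argument, the same script can be rerun on new data without editing any code.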
Jupyter Notebooks and automation
Jupyter Notebooks can be run in a non-linear order, and they store their output as well as their input.
Remove all output from notebooks before committing them to a pure code repository.
Test notebooks from a fresh kernel, or run them from the command line with jupyter nbconvert (see the example below).
Use environment variables to pass arguments into a notebook.
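For example, assuming a hypothetical analysis.ipynb that reads a parameter from an environment variable (e.g. via os.environ in Python):

```bash
# Strip any stored output, then execute the notebook top-to-bottom
# with a fresh kernel, passing a parameter through the environment
jupyter nbconvert --clear-output --inplace analysis.ipynb
ANALYSIS_INPUT=data/measurements.csv \
    jupyter nbconvert --to notebook --execute analysis.ipynb
```

Running a notebook this way catches cells that only worked because of earlier out-of-order execution.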
Data
Use curl to download data automatically (see the example below).
For small amounts of data, where the code exists specifically to analyse only those data, the data and code can be stored and published together.
For large datasets, or where code is used for multiple different datasets, keep the two separate.
Data can frequently be published, if there are no constraints preventing it; if the data are not published, then publishing the analysis code becomes less valuable.
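For example, fetching the input data automatically at the start of the analysis (the URL is a placeholder for wherever the data are published):

```bash
# Download the input data into a local data/ directory
mkdir -p data
curl -L -o data/measurements.csv https://example.org/dataset/measurements.csv
```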
Reproducible software environments
Different versions of packages can give different numerical results, so documenting the environment helps ensure that others get the same results from your work as you do.
pip freeze and conda env export produce plain-text files listing the packages installed in an environment, which can then be used to recreate it (see the examples below).
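A sketch of recording the current environment in either ecosystem:

```bash
# Record exact package versions for a pip-managed environment
pip freeze > requirements.txt

# Or export the full definition of the active conda environment
conda env export > environment.yml
```

Commit the resulting file alongside the analysis code so that the two are versioned together.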
Verifying your analysis
Use pip install -r and conda env create to create a new environment from a definition (see the example below).
Running your full analysis end-to-end in a clean environment will highlight most problems.
Binder services (e.g. MyBinder) will create an environment in the cloud based on your definition.
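For example, recreating the environment from the definitions exported in the previous section, then re-running the analysis end-to-end (the driver script is the hypothetical one sketched earlier):

```bash
# Recreate a pip-managed environment from its recorded definition
pip install -r requirements.txt

# Or recreate a conda environment (its name is read from the file)
conda env create -f environment.yml

# Then run the whole analysis from scratch in the clean environment
./run_analysis.sh data/measurements.csv
```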
Publishing in open science repositories
Source code hosts like GitHub are designed for the active development of software; open science repositories exist to keep versions of record for the longer term.
Services like Zenodo allow you to package a particular commit, archive it, and give it a permanent identifier.
Many services, Zenodo among them, will automatically give you a DOI for each item you archive, including repositories pulled from GitHub.
DOIs for code repositories can be cited in journal articles the same way as any other publication.