Speech:Git

=Using Git on Caesar=

Git is a distributed version control system designed to allow any number of developers to work on a single project and to branch from that project easily when needed. It was originally developed by Linus Torvalds to facilitate Linux kernel development, which has thousands of contributors across the globe; it is now used in many other software development projects in addition to the Linux kernel.

Due to the increasing number and complexity of the scripts we develop for use with Sphinx, we needed a safe way to manage versions, to facilitate development by multiple individuals during a semester, and to ease testing of new and improved scripts, while ensuring that there will always be a "stable" source of scripts for use during experiments.

Git will meet our needs:
 * It is fast and effective.
 * The most common tasks aren't difficult to learn.
 * It is widely supported on many architectures. This means that users can develop scripts on their Windows machines while retaining the ability to push and pull updates from the Linux-based Caesar & Co.
 * It requires minimal configuration. Although it can support a Subversion/CVS-like server, users only need access to the directory representing the "Master" branch. No daemons are required.
 * It is decentralized. The core idea of Git is that developers "clone" a copy of the source code from a master repository, creating a local version of that source code. Users then make their modifications, test them, and, once they confirm that the changes have no major flaws, "push" them to the master repository.

=Setting up the Git server.=

Master repository (user.git) setup
As Caesar is set up, there are two "master" repositories:

The first is a bare repository designed for users to push updates to. The difference between a bare and a non-bare repository is small: a normal repository keeps all git-related files in a .git directory inside the working directory, whereas a bare repository is essentially that .git directory used as the repository itself, with no directly accessible copies of the scripts. The full reasoning is a bit drawn out; a simple explanation is that it ensures consistency between remote developers and the master repository. The simplest explanation is that, by default, git refuses pushes to the checked-out branch of a non-bare repository, so a non-bare repository wouldn't meet our needs.

This git repository is located at /mnt/main/scripts/user.git. For now, we have focused on making this repository available locally. In the future, there are plenty of good, well-supported, and easily installed tools for browsing a git repository; one of the best, cgit, can be seen running the Linux kernel source-tree viewer.


 * Setup:
 * 1) Create the folder that will contain the repository.
 * 2) Initialize it as a bare repository.
 * Note: the --bare option is required! A normal git repository does not accept pushed updates from remote users.
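The commands for these two steps are not shown on this page, so the following is only a minimal sketch of what they might look like. It uses a scratch directory (the hypothetical SCRIPTS_DIR variable) as a stand-in for /mnt/main/scripts so that it can be tried safely anywhere.

```shell
# Scratch stand-in for /mnt/main/scripts (assumption, so the sketch runs anywhere).
SCRIPTS_DIR="$(mktemp -d)"

# 1) Create the folder that will contain the repository.
mkdir -p "$SCRIPTS_DIR/user.git"

# 2) Initialize it as a bare repository (note the --bare option).
git init --bare --quiet "$SCRIPTS_DIR/user.git"

# Ask git whether the repository is bare.
git --git-dir="$SCRIPTS_DIR/user.git" rev-parse --is-bare-repository
```

On Caesar itself, the path would be /mnt/main/scripts/user.git instead of the scratch directory.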

That's it for the git repository. Right now, it is completely empty, so we need to add the files we want to it. Normally, we would just go into the repository and add and commit the files there; however, we can't do that in a bare repository because it has no working directory. To get around this, we move the old user directory to user.old/, make it into a normal git repository, then push its contents to the bare repository. Moving this directory serves a secondary purpose: it creates a backup of the directory just in case something bad happens.

To do the above, do the following:
 * Instructions:
 * 1) Move /mnt/main/scripts/user to /mnt/main/scripts/user.old.
 * 2) Create the git repository.
 * 3) Since it isn't tracking any files yet, add the files we wish to track: git add addQueueUsers.pl convert.pl copySph.pl createTranscript.pl find.pl genFileIDs.csh genPhones.csh genT* lm_create.pl parseDecode.pl ParseTranscript2.pl run_decode.pl setup_SphinxTrain.pl updateDict.pl. A few directories and files are left untracked; these are mostly irrelevant.
 * 4) "Commit" the changes to start version tracking. When doing this, you will be brought to your configured text editor; for Caesar, I think it defaults to Vi. It will contain a list of auto-generated comments, which are commented out using hashes (#) by default. Uncomment the necessary lines, save, and quit. Once you exit the editor, git will commit the changes. Please note that if there is no commit message, either because you deleted everything within the editor or because you didn't uncomment any of the auto-generated lines, git will cancel the commit.
 * 5) Create a new 'origin' remote repository to push updates to.
 * 6) Lastly, push the update.
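Most of the commands for these steps are missing from this page, so here is a hedged sketch of the whole sequence. It runs in a scratch directory (the hypothetical SCRIPTS_DIR) with a single placeholder script, and it passes the commit message with -m and a made-up identity so that no editor or prior git configuration is needed; none of these details are confirmed by the original instructions.

```shell
# Scratch stand-in for /mnt/main/scripts (assumption, so the sketch runs anywhere).
SCRIPTS_DIR="$(mktemp -d)"
git init --bare --quiet "$SCRIPTS_DIR/user.git"            # the bare master repository
mkdir "$SCRIPTS_DIR/user"
echo '#!/usr/bin/perl' > "$SCRIPTS_DIR/user/convert.pl"    # placeholder script

# 1) Move the old directory aside; this doubles as a backup.
mv "$SCRIPTS_DIR/user" "$SCRIPTS_DIR/user.old"
cd "$SCRIPTS_DIR/user.old"

# 2) Create a normal (non-bare) repository whose first branch is named master.
git init --quiet
git symbolic-ref HEAD refs/heads/master

# 3) Add the files we wish to track (the page lists each script by name).
git add convert.pl

# 4) Commit; -m supplies the message without opening an editor.
git -c user.name="Example User" -c user.email="user@example.com" \
    commit --quiet -m "Initial import of user scripts"

# 5) Register the bare repository as the 'origin' remote.
git remote add origin "$SCRIPTS_DIR/user.git"

# 6) Push the commit to the bare master repository.
git push --quiet origin master
```

After the push, the bare repository holds the same commit as user.old, which can be checked by comparing the two commit hashes.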

Master Executable repository
The second "master" repository is located at /mnt/main/scripts/user. It is essentially an always up-to-date non-bare git repository. Since we cannot easily find scripts and executables within the user.git directory, this directory will contain an easily-accessible and always up-to-date version of the scripts for experiments.


 * Setup Executable repository:
 * 1) Within /mnt/main/scripts/, clone the master repository.

Now, we want this repository to update whenever somebody pushes a commit to the bare master repository. Git supports running external scripts, called hooks, at specific steps. We set up a script within user.git/hooks called "post-receive", which calls another script that runs the commands we want.

The new script which actually does the work is called post-receive-pushUpdate.sh and is located in the same directory. Its contents are as follows:


 #!/usr/bin/tcsh
 # Author: Eric Beikman
 # Date: August 2013
 # Description:
 #       This script is executed by git whenever the master repository is
 #       updated, pulling any changes to the master executable repository at
 #       /mnt/main/scripts/user.

 cd /mnt/main/scripts/user/

 env -i git pull origin master
 # env -i is needed to execute the command in an empty environment.
 # For some reason git will reset a necessary environment variable when executing these scripts, making it so it doesn't work.

As you can see, it's a simple script which simply pulls an update from that git repository's origin.

After that, everything works.
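To try the mechanism without touching Caesar, the following sketch rebuilds the arrangement in a scratch directory. It differs from the real setup in ways that are assumptions of this example: the hook is a single /bin/sh script rather than a post-receive wrapper around a tcsh script, it clears GIT_DIR (which git sets while running hooks) instead of using env -i, and the ROOT variable, the dev clone, and demo.txt are all hypothetical.

```shell
# Scratch stand-in for /mnt/main/scripts (assumption).
ROOT="$(mktemp -d)"
git init --bare --quiet "$ROOT/user.git"
git --git-dir="$ROOT/user.git" symbolic-ref HEAD refs/heads/master

# Seed the bare repository with one commit from a developer clone.
git clone --quiet "$ROOT/user.git" "$ROOT/dev" 2>/dev/null
cd "$ROOT/dev"
git config user.name "Example User"
git config user.email "user@example.com"
echo 'first' > demo.txt
git add demo.txt
git commit --quiet -m "Seed commit"
git push --quiet origin HEAD:master

# The master executable repository is an ordinary clone of the bare one.
git clone --quiet "$ROOT/user.git" "$ROOT/user"

# Install the hook. The unquoted EOF lets $ROOT expand while the file is written.
mkdir -p "$ROOT/user.git/hooks"
cat > "$ROOT/user.git/hooks/post-receive" <<EOF
#!/bin/sh
# git runs hooks with GIT_DIR pointing at the bare repository; clear it so
# the pull operates on the executable repository instead.
unset GIT_DIR
cd "$ROOT/user" && git pull --quiet origin master
EOF
chmod +x "$ROOT/user.git/hooks/post-receive"

# Push a second commit; the hook should update the executable repository.
echo 'second' > demo.txt
git commit --quiet -a -m "Second commit"
git push --quiet origin HEAD:master
cat "$ROOT/user/demo.txt"
```

If the hook ran, the executable copy of demo.txt now matches the developer's latest commit.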

Keep in mind that modifying files directly within /mnt/main/scripts/user is discouraged! Instead, clone a copy of the master repository (user.git), commit your changes there, and then push them. In the future, we should find a way to enforce this.

=Using Git=

Git can either be super simple, or asinine. The following instructions are the bare minimum you need to know to work.

It is recommended you check out this really helpful interactive tutorial. Keep in mind that all commands there will work here, but substitute the remote targets (the tutorial uses http links) with /mnt/main/scripts/user.git, or with origin if the repository is already created (origin is a link which points to the repository's origin).

Initial setup

 * 1) In your home directory, make sure that there isn't an existing directory called "user". If there is, either get rid of it, rename it, or go into a sub-directory within your home and continue from there.
 * 2) Go into that directory and clone the master repository, /mnt/main/scripts/user.git. DO NOT CLONE /mnt/main/scripts/user! You will not be able to push updates to it!

You now have a clone of the repository within a directory called "user". It is independent from the master repository, meaning changes within it are isolated from the "stable" code; at least, they are isolated until you push them to the master repository.
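As a rough illustration of the initial setup, the sketch below clones a bare repository into a directory called "user". The MASTER and WORK variables are hypothetical stand-ins for /mnt/main/scripts/user.git and your home directory, so nothing on Caesar is touched.

```shell
# Scratch stand-ins (assumptions): MASTER plays /mnt/main/scripts/user.git,
# WORK plays your home directory.
MASTER="$(mktemp -d)/user.git"
WORK="$(mktemp -d)"
git init --bare --quiet "$MASTER"

# 1) Make sure nothing called "user" is already in the way.
cd "$WORK"
if [ -e user ]; then
    echo "user already exists; rename it or work in a sub-directory"
fi

# 2) Clone the bare master repository, never the executable copy.
#    (The empty-repository warning is silenced; the real master is not empty.)
git clone --quiet "$MASTER" user 2>/dev/null

# 3) The clone remembers where it came from as the remote "origin".
cd user
git config --get remote.origin.url
```

The last command prints the path that was cloned, which is what origin refers to in later push and pull commands.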

Code commits
In git, committing refers to taking a 'snapshot' of the code at the current time.

By default, git will not automatically search for changed files; rather, you have two options:

 * Option 1: Manually define which files to commit using git add.
 * 1) Use git add and manually name each changed file.
 * 2) Use git commit to commit these changes.

 * Option 2: Tell git to do the searching for you using git commit -a.
 * 1) Use git commit -a to find modified files within the repository and commit them.
 * 2) Please note that git will only search files which are already being tracked. It will ignore any new files.

After you issue the git commit command using either of the above, you will be brought to your configured text editor; for Caesar, I think it defaults to Vi. It will contain a list of auto-generated comments, which are commented out using hashes (#) by default. Uncomment the necessary lines, save, and quit. Once you exit the editor, git will commit the changes. Please note that if there is no commit message, either because you deleted everything within the editor or because you didn't uncomment any of the auto-generated lines, git will cancel the commit.

I recommend uncommenting only the modified files, and the line which says "Changes to be committed:".
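The two options can be compared side by side in a throwaway repository. This is only a sketch: the file names are placeholders, the identity is made up, and -m is used to pass the commit message on the command line instead of going through the editor described above.

```shell
# Throwaway repository standing in for your clone (assumption).
cd "$(mktemp -d)"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"
echo 'v1' > convert.pl
git add convert.pl
git commit --quiet -m "Add convert.pl"

# Option 1: stage each changed file by name, then commit.
echo 'v2' > convert.pl
git add convert.pl
git commit --quiet -m "Update convert.pl"

# Option 2: let git stage every modified tracked file itself.
echo 'v3' > convert.pl
echo 'new' > notes.txt       # untracked, so -a leaves it alone
git commit --quiet -a -m "Update convert.pl again"

# notes.txt is still reported as untracked: ?? notes.txt
git status --short
```

The final status line shows why new files always need an explicit git add before they become part of a commit.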

Logs
Git keeps a very descriptive log of all previous commits, along with who committed them and when. It can be accessed by using git log.

The below is what you should expect to see:

 commit 241174fdd2ee9bd31044b1a495eb116edf19c452
 Author: Eric Beikman
 Date:   Thu Aug 1 19:39:46 2013 -0400

     Changes to be committed:
     modified:  pruneDictionary2.pl
     Changed a absolute reference to dictionary2.pl to a relative one.

 commit 1dd691364cce854dfc1144adb0909d2fff791a9f
 Author: Eric Beikman
 Date:   Thu Aug 1 17:31:02 2013 -0400

     Comments: Copied over 3 files from /mnt/main/scripts/train/scripts_pl. These are scripts which have been modified recently and would benefit from additional development. It would work better for them to be located here (/mnt/main/scripts/user) as they aren't a part of the base sphinx

SHA1 hashes are used to identify revisions.

Diff
Say we want to see the difference between the current code and a previous revision. To do so, we use git diff.

Using git log, we determine that we want to compare the current code with revision 863515f5d8547e7c81125c82425c2966f0cbc11c. To do so, we enter git diff 863515f5d8547e7c81125c82425c2966f0cbc11c.

We should get something like:

 diff --git a/pruneDictionary2.pl b/pruneDictionary2.pl
 index 2ef0008..4a7ebdc 100755
 --- a/pruneDictionary2.pl
 +++ b/pruneDictionary2.pl
 @@ -40,7 +40,7 @@ while()

  # This calls the dictionary2 script which will create a new dictionary that only contains the words in the
  # Pruned list.
 -$sysCmd = "./dictionary2.pl $temp_pruned $dict $output_file";
 +$sysCmd = "/mnt/main/scripts/train/scripts_pl/dictionary2.pl $temp_pruned $dict $output_file";
  system($sysCmd);

  # remove temporary files

The above is like the usual diff output. Each file that has a change will be listed separately, along with the changes.
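The log-then-diff routine can be rehearsed in a throwaway repository. In this sketch the file contents, commit messages, and the OLD_REV variable are placeholders; on Caesar you would copy the hash from the git log output instead of capturing it in a variable.

```shell
# Throwaway repository (assumption); the file name is borrowed from the example above.
cd "$(mktemp -d)"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"
echo 'old line' > pruneDictionary2.pl
git add pruneDictionary2.pl
git commit --quiet -m "First revision"
OLD_REV="$(git rev-parse HEAD)"     # the SHA1 you would copy from git log

echo 'new line' > pruneDictionary2.pl
git commit --quiet -a -m "Second revision"

# List the commits, newest first, with abbreviated hashes.
git log --oneline

# Compare the current code against the older revision.
git diff "$OLD_REV"
```

Lines removed since the old revision are prefixed with a minus sign and lines added since then with a plus sign.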

Reverting
The term "reverting" refers to reverting one or more files back to a previous state.

Say we made a large mistake somewhere and we don't know where it is. We want to go back to the last working version, which happens to be revision 863515f5d8547e7c81125c82425c2966f0cbc11c.
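The exact command used for this step is not preserved on this page. One common way to restore a file to an earlier revision is git checkout with a revision and a path, sketched below in a throwaway repository; treat it as an assumption rather than the original instructions. The file name and the GOOD_REV variable are placeholders.

```shell
# Throwaway repository with one good and one bad revision (assumption).
cd "$(mktemp -d)"
git init --quiet
git config user.name "Example User"
git config user.email "user@example.com"
echo 'working' > train.pl
git add train.pl
git commit --quiet -m "Working version"
GOOD_REV="$(git rev-parse HEAD)"    # the hash you would find with git log

echo 'broken' > train.pl
git commit --quiet -a -m "Introduce a mistake"

# Restore the file as it was at the good revision; the change is staged.
git checkout "$GOOD_REV" -- train.pl

# Record the restoration as a new commit so that history stays intact.
git commit --quiet -m "Restore train.pl from the last working revision"
cat train.pl
```

Git also has a separate git revert command, which creates a new commit that undoes one specific earlier commit; which of the two fits better depends on whether you know which commit introduced the mistake.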

Pushing
Once you have made all the additions and changes in your local repository and confirmed that they have no ill effects, you can push the updates up to the master repository.

To do this, use git push.

You should see something like this:

 caesar ejg58/user> git push
 Counting objects: 5, done.
 Delta compression using up to 4 threads.
 Compressing objects: 100% (4/4), done.
 Writing objects: 100% (4/4), 4.57 KiB, done.
 Total 4 (delta 1), reused 0 (delta 0)
 Unpacking objects: 100% (4/4), done.
 remote: From /mnt/main/scripts/user
 remote: * branch            master     -> FETCH_HEAD
 remote: Updating 8335f19..8c78519
 remote: Fast-forward
 remote: clone_exp.pl |  123 ++++++++++++++++++++++++++++++++++++++++
 remote: train_02.pl  |  175 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 remote: 2 files changed, 298 insertions(+), 0 deletions(-)
 remote: create mode 100755 clone_exp.pl
 remote:  create mode 100755 train_02.pl
 To /mnt/main/scripts/user.git
 8335f19..8c78519 master -> master

The lines in the middle that start with "remote:" are output from the script which updates the master executable repository. In other words, once your push has been saved, it is pulled by the executable repository, which means that all pushed updates take effect immediately.

Pull and Fetch
Pull and fetch are two tools used to update/synchronize a local repository with any commits made since the last pull, fetch, or clone. This should be done regularly while working to ensure that you are always working on the most up-to-date version of the code, minimizing the amount of merging you would need to do.


 * Pull:

Pull is probably the easiest way to do it, but it is rather dumb, meaning that it will blindly merge the updates into your source code without asking for any input. Its usage is simple: git pull.

By default, it will pull from the repository's origin, though you can also pull from other repositories.


 * Fetch

Fetch is a lazy "pull", meaning that it will download all updates, but it will not merge your code with the updated code it downloaded. You will need to run git merge afterwards.

The syntax is almost exactly like git pull: git fetch.
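The difference between fetching and merging is easiest to see with two clones of the same master. The sketch below is built on assumptions: ROOT is a scratch directory, alice and bob are hypothetical developers, and a.txt is a placeholder file.

```shell
# Scratch master repository with two developer clones (assumption).
ROOT="$(mktemp -d)"
git init --bare --quiet "$ROOT/user.git"
git --git-dir="$ROOT/user.git" symbolic-ref HEAD refs/heads/master

# alice makes the first commit and pushes it.
git clone --quiet "$ROOT/user.git" "$ROOT/alice" 2>/dev/null
cd "$ROOT/alice"
git config user.name "Alice Example"
git config user.email "alice@example.com"
echo 'one' > a.txt
git add a.txt
git commit --quiet -m "one"
git push --quiet origin HEAD:master

# bob clones, then alice pushes a second commit behind his back.
git clone --quiet "$ROOT/user.git" "$ROOT/bob"
echo 'two' > a.txt
git commit --quiet -a -m "two"
git push --quiet origin HEAD:master

# bob fetches: the update is downloaded, but his files are untouched.
cd "$ROOT/bob"
git config user.name "Bob Example"
git config user.email "bob@example.com"
git fetch --quiet origin
BEFORE_MERGE="$(cat a.txt)"

# bob merges what he fetched; only now do his files change.
git merge --quiet origin/master
AFTER_MERGE="$(cat a.txt)"
echo "before merge: $BEFORE_MERGE, after merge: $AFTER_MERGE"
```

Running git pull in bob's clone would have performed the fetch and the merge in one step.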