hapConstructor Home Page

Single gene hapConstructor (hapConstructor 1.0)

Haplotypes carry important information that can direct investigators towards underlying susceptibility variants, and hence multiple tagging-SNPs are usually studied in candidate gene association studies. However, it is often unknown which SNPs should be included together in haplotype analyses, or how the tests should be constructed for maximum power. We have developed a program, hapConstructor, which automatically builds multi-locus SNP sets to test for association in a case-control framework. The multi-SNP sets considered at any step in the process need not be contiguous; the SNP sets are built based on the significance of the SNP subsets from the preceding steps. An important feature is that missing data imputation is carried out based on the full data, for maximal information and consistency in the building process. HapConstructor is implemented in a Monte Carlo framework that provides appropriate significance testing, accounts for the construction process, and naturally extends to related individuals. Empirical false discovery rate thresholds are also available. HapConstructor is a useful tool for exploring multi-locus associations in candidate genes and regions in a valid and structured process.

Command line execution

java -jar hapConstructor1.0.jar hapConstructor rgenfile[.rgen]

.rgen parameter file

Detailed description of .rgen XML file

Example files

All hapConstructor Example Files

Python script to post-process .build files

Download the script and run it in the folder that contains the .build files. The script generates three text files: increase_risk.out, decrease_risk.out, and other.out. The two risk files contain all the test results in the .build files that correspond to their respective direction of risk. The chi-square test results are placed in the other.out file. Each .out file lists the markers across the top, followed by columns for the test model, the test statistic, the columns compared (for odds ratios), the observed statistic value, and the empirical p-value. Each line contains a test result and the haplotype or SNP that was used as the exposure variable for the test.

Download here

Zip file containing .jar file, example files, and post-processing script.

Gene-gene hapConstructor (hapConstructor 2.0.1)

An extension of hapConstructor1.0 considers multi-locus data for two genes/regions simultaneously. Our extension allows construction of multi-locus SNP sets at both genes, and also provides tests to identify joint gene-gene effects and interactions between single variants or haplotype combinations.

Command line execution

java -jar hapConstructor2.0.1.jar hapConstructor rgenfile[.rgen]

.rgen parameter file

Detailed description of .rgen XML file

Example files

hapConstructor2.0 Example Files

Python script to post-process .build files

Download the script and run it in the folder that contains the .build files. The script generates three text files: increase_risk.out, decrease_risk.out, and other.out. The two risk files contain all the test results in the .build files that correspond to their respective direction of risk. The chi-square test results are placed in the other.out file. Each .out file lists the markers across the top, followed by columns for the test model, the test statistic, the columns compared (for odds ratios), the observed statistic value, and the empirical p-value. Each line contains a test result and the haplotype or SNP that was used as the exposure variable for the test.

Download here

Zip file containing .jar file, example files, and post-processing script.

Click here for additional algorithm details.

Instructions to run hapConstructor

1. Java 1.6 JRE must be installed on your system (Download here).
   - To check whether Java is installed, go to a command prompt and type java.
   - To check which Java version is installed, go to a command prompt and type java -version.

2. Download hapConstructor jar file (see above).

3. Create .rgen and .dat files.
   - The .rgen and .dat files can be placed anywhere on your system, but take care to specify their locations correctly when you execute the program.
   - In the simplest situation, the .rgen and .dat files are in the same directory as the .jar file. In this scenario, the .rgen file would have to specify the .dat file as being in the same directory (i.e. genotypedata="GenotypeData.dat").
   - The .dat file should not contain any extra lines at the bottom. Extra lines will cause an error while the program is reading the data.

4. In a command terminal, go to the directory containing the .jar file and enter the command java -jar Genie.jar hapConstructor <.rgen file name>.
   - If the .rgen file is in another directory, specify its location on the command line.
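The caveat in step 3 about extra lines at the bottom of the .dat file can be checked before running the program. The following is a minimal Python sketch, not part of hapConstructor: the function names are hypothetical, and it assumes that "extra lines" means empty or whitespace-only lines after the last data line. The last data line and its own line ending are left untouched.

```python
from pathlib import Path


def strip_trailing_blank_lines(text):
    """Return text with empty or whitespace-only lines removed from the end.

    Assumption: these are the "extra lines" that make the data reader fail.
    """
    lines = text.splitlines(keepends=True)
    # Drop lines from the end for as long as they contain only whitespace.
    while lines and not lines[-1].strip():
        lines.pop()
    return "".join(lines)


def clean_dat_file(path):
    """Rewrite the file only if trailing blank lines were found.

    Returns True when the file was changed and False otherwise.
    """
    dat = Path(path)
    original = dat.read_text()
    cleaned = strip_trailing_blank_lines(original)
    if cleaned != original:
        dat.write_text(cleaned)
        return True
    return False
```

For example, clean_dat_file("GenotypeData.dat") could be run once in the directory that holds the data file before executing step 4. Keep a backup copy of the original file, since the sketch rewrites it in place.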

Additional Notes

Maximum number of loci - Due to memory constraints, hapConstructor can analyze a maximum of 63 loci.

Running out of heap space? - For larger datasets or a large number of Monte Carlo simulations (e.g. 80,000 to 100,000), the default Java Virtual Machine (JVM) memory allocation may not be sufficient. In this case, provided the system has enough memory, more can be allocated to the JVM by using -Xms and -Xmx when executing the program. Example: java -Xms1024m -Xmx1536m -jar Genie.jar hapConstructor <.rgen file name>. The example allocates a minimum of 1 GB and a maximum of 1.5 GB of memory for the JVM to use while executing the program. The maximum memory allocation for 32-bit systems is 2 GB.
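When runs are launched from a script, the heap options can be assembled programmatically. The sketch below is an illustration only: build_command and run_hapconstructor are hypothetical helper names, "study.rgen" is a placeholder file name, and Genie.jar is taken from the example command above. The -Xms and -Xmx flags are the standard JVM options already shown.

```python
import subprocess


def build_command(rgen_path, jar_path="Genie.jar",
                  min_heap_mb=None, max_heap_mb=None):
    """Build the argument list for a hapConstructor run.

    Heap sizes are given in megabytes; None leaves the JVM default in place.
    """
    if (min_heap_mb is not None and max_heap_mb is not None
            and min_heap_mb > max_heap_mb):
        raise ValueError("min_heap_mb must not exceed max_heap_mb")
    cmd = ["java"]
    if min_heap_mb is not None:
        cmd.append(f"-Xms{min_heap_mb}m")  # initial (minimum) heap size
    if max_heap_mb is not None:
        cmd.append(f"-Xmx{max_heap_mb}m")  # maximum heap size
    cmd += ["-jar", jar_path, "hapConstructor", rgen_path]
    return cmd


def run_hapconstructor(rgen_path, **kwargs):
    """Run the program and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(build_command(rgen_path, **kwargs), check=True)


if __name__ == "__main__":
    # Reproduces the example above: 1 GB minimum, 1.5 GB maximum.
    print(" ".join(build_command("study.rgen",
                                 min_heap_mb=1024, max_heap_mb=1536)))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when the .rgen path contains spaces.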

A note on output files - Running hapConstructor generates a number of output files. The build files, denoted with .build, are generated after each step is complete and are named after the study name specified in the .rgen file. The build files contain only the test results that passed the specified threshold for that step, while the all_obs.final file contains the results of all tests conducted during the build process. The all_obs.final file is written to continually as the program runs, so moving this file during execution could cause an error. Note also that subsequent runs do not overwrite previously generated files; new output is appended to the end of what already exists in those files. This means that if two different analyses are performed in the same folder, the all_obs.final file generated by the first run will contain results from both analyses. The same occurs with the all_sims.final file.
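One way to keep analyses separate is to move the output of a finished run into its own subfolder before starting the next one. The following Python sketch is only a suggestion, not a hapConstructor feature: the function name and the archive folder naming are hypothetical, and it assumes that the .build, all_obs.final and all_sims.final files are written to the folder passed in. It must only be used between runs, since moving files during execution could cause an error.

```python
import shutil
from datetime import datetime
from pathlib import Path

# Output files described above; assumed to sit directly in the run folder.
OUTPUT_PATTERNS = ("*.build", "all_obs.final", "all_sims.final")


def archive_previous_outputs(run_dir, archive_name=None):
    """Move output files from an earlier run into a subfolder of run_dir.

    Returns the list of new file locations (empty if nothing was found).
    """
    run_dir = Path(run_dir)
    leftovers = sorted({
        path
        for pattern in OUTPUT_PATTERNS
        for path in run_dir.glob(pattern)
        if path.is_file()
    })
    if not leftovers:
        return []
    if archive_name is None:
        # Hypothetical naming scheme: one timestamped folder per archived run.
        archive_name = "previous_run_" + datetime.now().strftime("%Y%m%d_%H%M%S")
    archive_dir = run_dir / archive_name
    archive_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in leftovers:
        target = archive_dir / path.name
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

Calling archive_previous_outputs(".") in the working folder before a new analysis leaves the .rgen and .dat files in place and moves only the earlier results.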
