hapConstructor2.0 .rgen Parameter Descriptions

This is an XML file that uses a DTD file, ge-rgen.dtd, to describe all of the data for the analysis. All elements in this file start with “ge:”. The parameter file has a root element, rgen, and a number of sub-elements and attributes. It can be described in two parts: the first part sets up the analysis parameters, and the second part defines the inheritance models to be analyzed. See the general .rgen parameter file description for all other attribute descriptions.

  • Parameters (First Part)

The following table describes all of the required attributes and their values for the root element when using a hapConstructor2.0 .rgen file. All attribute values should be enclosed in double quotes (" ").

|Attribute|Att Value|Description|
|---|---|---|
|rseed|number|Random number generator seed value. Specify rseed="random" to have the program randomly generate a seed value.|
|nsims|number|Number of simulations|
|top|classname|Use HapMCTopTogether|
|drop|classname|Use HapMCDropTogether|
|report|classname|Report options; default is the standard report (rgen_filename.report) with full tables and detailed output. Specify report="summary" for an ASCII space-delimited file (rgen_filename.summary) of results including the seed value, specified statistics, corresponding p-values, and 95% confidence intervals for odds ratios for each data file, followed by meta statistics, if requested. Specify report="both" to generate the standard and summary reports.|

The following table describes the sub-element locus and its attributes and values.

|Attribute|Att Value|Description|
|---|---|---|
|id|number|The locus id number in the data file|
|marker|name|Allows the user to attach a marker name to the locus id|
|dist|number|Allows the user to enter a recombination fraction or a distance between a marker and the preceding marker. If the dist value is ≤0.5, the value is assumed to be a recombination fraction. If the dist value is >0.5, the value is assumed to be the distance in cM between the marker and the preceding marker|
|gene|number|Gene id number specifying which gene the marker is from. This is necessary when analyzing gene-gene associations or interactions|
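The 0.5 cutoff for dist can be illustrated with a short Python sketch. This is only an illustration of the rule stated in the table, not code from hapConstructor; the function name `interpret_dist` and the returned labels are hypothetical.

```python
from typing import Tuple


def interpret_dist(dist: float) -> Tuple[str, float]:
    """Classify a locus dist value using the 0.5 cutoff described in the table.

    Hypothetical helper: values <= 0.5 are read as a recombination fraction,
    values > 0.5 as a distance in cM to the preceding marker.
    """
    if dist <= 0.5:
        return ("recombination_fraction", dist)
    return ("centimorgans", dist)


if __name__ == "__main__":
    print(interpret_dist(0.05))  # read as a recombination fraction
    print(interpret_dist(12.0))  # read as a distance in cM
```

Note that, under this rule, a distance of 0.5 cM or less cannot be entered as cM, since it would be read as a recombination fraction.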

The following table describes the sub-element datafile and its attributes and values.

|Attribute|Att Value|Description|
|---|---|---|
|studyname|name|Allows user to attach a study name to the genotype data file|
|genotypedata|name|The directory path and genotype data file name for analysis. Specify each genotype data file with a separate datafile statement|

The following table describes the sub-element param and its attributes and values.

|Attribute|Variable|Description|
|---|---|---|
|ccstat#|classname|Statistical programs. You can run multiple statistics on the same set of data. Each statistic should have a different ccstat#|
|metastat#|classname|Meta statistics for multiple study data files. Each meta statistic should have a different metastat#|
|top-sample|all/founder|Method for calculating the allele frequencies assigned to the pedigree founders for simulation. Two options: all calculates allele frequencies based on all genotyped members in the pedigree data file; founder calculates allele frequencies based on genotyped founders only. We recommend the all option if there are a large number of pedigrees and the number of genotyped founders in the resource is limited|
|hapc_threshold|0.1, 0.05, 0.005, 0.0005|A single value or a list of values specifying the p-value thresholds by which SNP sets move to the next step|
|hapc_sigtesting|true/false|Option to use the Monte Carlo framework to establish the significance of the models found by the build process on the observed data. If true, the simulated datasets go through the same build process as the observed data, and the p-values generated from all the runs are tracked to establish FDR and empirical p-values. This option is turned off by default|
|hapc_backsets|true/false|Option for testing association with SNP backsets. Backsets are the locus subsets in a set that were not tested in the previous step. This option makes the search more exhaustive and could considerably affect the run time|
|hapc_models|HAdd, HRec, HDom, MSpecRed, IntxLD, IntxOR, CG|Option for specifying the models to construct for the haplotypes. See the description page for more details about models. HAdd/HRec/HDom = haplotype additive, recessive, dominant; CG = composite genotypes (Dom and Rec combinations); MSpecRed = monotype specific reduction (specific haplotypes compared to the rest); IntxLD = interaction correlation between unlinked variants (must specify the InteractionLD statistic); IntxOR = interaction odds ratios (must specify the InteractionOddsRatios statistic)|
|hapc_check_mostsignificant|true/false|Option for specifying whether the building process stops once the most significant empirical p-value has been obtained from a test. If set to true, it checks for the most significant p-value result and stops if found; otherwise it continues to build. For example, if this option is set to true, 1,000 Monte Carlo simulations are used to establish the empirical p-values for the association tests, and a test at the first step obtains a p-value of 0.001, then the build process does not continue to the second step. The default is true|
|hapc_compositehaplotypes|true/false|Option for specifying whether haplotypes are to be built up and tested from composite models, such as composite genotypes or gene-gene tests. For example, if two single markers were tested in a composite manner and their association p-value was beyond the threshold, then the next step could form a haplotype with one of the markers and leave the other marker as a single allele|
|caseOnlyIntx|true/false|Option for testing a case-only interaction using the IntxLD design. This tests the correlation between two unlinked variants in cases only|
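The stopping rule behind hapc_check_mostsignificant can be sketched in Python. This is a minimal illustration, not hapConstructor code. It assumes that the most significant empirical p-value attainable with nsims simulations is 1/nsims, which is what the 1,000-simulation example (p = 0.001) suggests; the function names are hypothetical.

```python
def smallest_empirical_p(nsims: int) -> float:
    """Assumed floor of an empirical p-value from nsims Monte Carlo simulations."""
    return 1.0 / nsims


def build_should_stop(p_value: float, nsims: int,
                      check_mostsignificant: bool = True) -> bool:
    """Return True if the build process would stop after this test.

    Hypothetical helper: with the check enabled, building stops once a test
    reaches the smallest attainable empirical p-value.
    """
    if not check_mostsignificant:
        return False
    # Small tolerance guards against floating-point rounding in the comparison.
    return p_value <= smallest_empirical_p(nsims) + 1e-12


if __name__ == "__main__":
    print(build_should_stop(0.001, 1000))         # True: already at the floor
    print(build_should_stop(0.004, 1000))         # False: a later step could do better
    print(build_should_stop(0.001, 1000, False))  # False: check disabled
```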

  • List of available statistical programs and their class names

|Statistic|Class name|
|---|---|
|Chi Squared|ChiSquared|
|Chi Squared Trend|ChiSquaredTrend|
|Odds Ratio|OddsRatios|
|CMH Chi Squared (meta)|CMHChiSquared|
|CMH Chi Squared Trend (meta)|CMHChiSqTrend|
|Meta Odds Ratio|MetaOddsRatios|
|Interaction Odds Ratio|InteractionOddsRatios|
|Interaction Linkage Disequilibrium|InteractionLD|
|Meta Interaction Odds Ratio|MetaInteractionOR|
|Meta Interaction Linkage Disequilibrium|MetaInteractionLD|

  • Subset Analyses (Second Part)

The second part of the .rgen parameter file defines the subset analyses and the models to be analyzed. Users may enter markers to be tested separately (i.e., a single locus at a time approach, where each marker is assumed to be in linkage equilibrium with other markers), as well as testing markers jointly in a composite genotype or haplotype analysis.

The cctable element has a sub-element col, or column definition. Within col, the user can optionally assign a weight, wt, to a particular column. Thus wt is an attribute of col, and its value is a number. The col has a further sub-element g, or allele group, and g has a further sub-element a, or allele definition. The a defines the genetic pattern to be tested in PedGenie at a single locus. Each a corresponds to a locus defined in the sub-element locus. All of the a's are grouped together into a single g, and the g's are grouped together into a single col, which is optionally weighted by wt. If more than one group, g, is in the col, an “or” regular expression will apply to all of the groups for testing in the column, col.
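The nesting of col, g, and a can be modeled with a small Python sketch. It is an illustration only, not how PedGenie or hapConstructor represents the data. It assumes that a group matches when every a in it matches its locus, and, as stated above, that groups within a column are combined with “or”. The dictionary layout and function names are hypothetical.

```python
from typing import Dict, List

# One group (g): locus id -> pattern given by the a for that locus.
Group = Dict[int, str]


def group_matches(group: Group, genotypes: Dict[int, str]) -> bool:
    """Assumption: every a in the group must match the subject at its locus."""
    return all(genotypes.get(locus) == pattern for locus, pattern in group.items())


def column_matches(groups: List[Group], genotypes: Dict[int, str]) -> bool:
    """Groups in one col are alternatives: any matching group places the subject."""
    return any(group_matches(g, genotypes) for g in groups)


if __name__ == "__main__":
    # A column with weight 1 and two alternative groups over loci 1 and 2.
    col = {"wt": 1, "groups": [{1: "(1/2)", 2: "(1/1)"},
                               {1: "(2/2)", 2: "(1/1)"}]}
    subject = {1: "(2/2)", 2: "(1/1)"}
    print(column_matches(col["groups"], subject))  # True: second group matches
```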

The following table describes the element cctable and its optional attributes and values.

|Attribute|Att Value|Description|
|---|---|---|
|loci|number(s)|Allows user to specify the locus or loci for a subset analysis based on the locus id number. Default is all loci|
|stats|number(s)|Allows user to define which statistics to run for a particular subset analysis. The stats number is selected from the list of ccstat#'s. Default is all ccstat|
|meta|number(s)|Allows user to define which meta statistics to run for a particular subset analysis. The meta number is selected from the list of metastat#'s. Default is all metastat|
|model|text|Allows user to define a model for a subset analysis. The model name will be printed in the report for that analysis|
|type|text|Allows user to specify the type of analysis, Genotype or Allele, for this subset of data. If type="Allele" is specified, a single allele code should be entered as the value of the sub-element a, and each a corresponds to a locus. Default is type="Genotype"|

  • Single locus at a time analysis approach

HapConstructor begins by considering single locus analyses, then constructs and tests haplotypes based upon the p-values generated. The single locus analyses are constructed in the same way as analyses using PedGenie. One requirement is to use the correct model name for each table built. The model names are: Dom, Rec, Additive, Allele.

|Model|Wt = 0|Wt = 1|Wt = 2|
|---|---|---|---|
|Dominant|(1/1)|(1/2), (2/1), or (2/2)| |
|Recessive|(1/1), (1/2), or (2/1)|(2/2)| |
|Additive|(1/1)|(1/2) or (2/1)|(2/2)|
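The weight table can be written out as a lookup, shown here as a Python sketch. It only restates the table for a biallelic locus with alleles 1 and 2; the names `MODEL_WEIGHTS` and `genotype_weight` are hypothetical and are not part of hapConstructor.

```python
# Genotype -> column weight for each model, copied from the table above.
MODEL_WEIGHTS = {
    "Dominant":  {"(1/1)": 0, "(1/2)": 1, "(2/1)": 1, "(2/2)": 1},
    "Recessive": {"(1/1)": 0, "(1/2)": 0, "(2/1)": 0, "(2/2)": 1},
    "Additive":  {"(1/1)": 0, "(1/2)": 1, "(2/1)": 1, "(2/2)": 2},
}


def genotype_weight(model: str, genotype: str) -> int:
    """Look up the column weight a genotype falls under for the given model."""
    return MODEL_WEIGHTS[model][genotype]


if __name__ == "__main__":
    for model in MODEL_WEIGHTS:
        print(model, [genotype_weight(model, g)
                      for g in ("(1/1)", "(1/2)", "(2/1)", "(2/2)")])
```

The three models differ only in how the heterozygous and (2/2) genotypes are weighted.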

The weights may be modified to be any integer value. For programming purposes, a (1/.) indicates a genotype of 1 and any other value. Thus for this biallelic model, the code (1/.) will pull (1/1) and (1/2) genotype data. Care must be taken to ensure that this file has no errors. Please see the SingleLocus.rgen for the format of this file.
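The (1/.) wildcard can be illustrated with a short Python sketch. It assumes positional matching, where "." stands for any allele in that position; this reproduces the biallelic example above, in which (1/.) pulls (1/1) and (1/2). Whether the program also matches regardless of allele order is not stated here, so that is left out. The function names are hypothetical.

```python
from typing import Tuple


def parse_code(code: str) -> Tuple[str, str]:
    """Split a genotype code such as '(1/2)' into its two allele fields."""
    left, right = code.strip("()").split("/")
    return left, right


def pattern_matches(pattern: str, genotype: str) -> bool:
    """Positional match in which '.' in the pattern accepts any allele."""
    return all(p == "." or p == g
               for p, g in zip(parse_code(pattern), parse_code(genotype)))


if __name__ == "__main__":
    genotypes = ["(1/1)", "(1/2)", "(2/1)", "(2/2)"]
    print([g for g in genotypes if pattern_matches("(1/.)", g)])
    # ['(1/1)', '(1/2)']
```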
