hapConstructor .rgen Parameter Descriptions

This is an XML file that uses a DTD file, ge-rgen.dtd, to describe all of the data for the analysis. All elements in this file start with "ge:". The parameter file has a root element, rgen, and a number of sub-elements and attributes. It can be described in two parts: the first part sets up the analysis parameters, and the second part defines the inheritance models to be analyzed. See the general .rgen parameter file description for all other attribute descriptions.

  • Analysis Parameters (First Part)

The following table describes all of the required attributes and their values for the root element when using hapMC rgen. All attribute values should be enclosed in double quotes.

|Attribute|Att Value|Description|
|---|---|---|
|rseed|number|Random number generator seed value. Specify rseed="random" to have the program randomly generate a seed value.|
|nsims|number|Number of simulations|
|top|classname|Use HapMCTopPhase|
|drop|classname|Use HapMCDropPhase|
|report|classname|Report options; the default is the standard report (rgen_filename.report) with full tables and detailed output. Specify report="summary" for an ASCII space-delimited file (rgen_filename.summary) of results including the seed value, specified statistics, corresponding p-values, and 95% confidence intervals for odds ratios for each data file, followed by meta statistics if requested. Specify report="both" to generate both the standard and summary reports.|

The following table describes the sub-element locus and its attributes and values.

|Attribute|Att Value|Description|
|---|---|---|
|id|number|The locus id number in the data file|
|marker|name|Allows the user to attach a marker name to the locus id|
|dist|number|Allows the user to enter a recombination fraction or a distance between a marker and the preceding marker. If the dist value is ≤0.5, the value is assumed to be a recombination fraction. If the dist value is >0.5, the distance between the marker and the preceding marker is assumed to be in cM.|
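The dist rule in the table can be written as a small check. The Python sketch below is illustrative only and is not part of hapConstructor; the function name `interpret_dist` and the returned labels are hypothetical.

```python
from typing import Tuple


def interpret_dist(dist: float) -> Tuple[str, float]:
    """Classify a locus dist value following the rule in the table.

    Hypothetical helper, not part of hapConstructor.
    """
    if dist <= 0.5:
        # Values up to and including 0.5 are read as a recombination fraction.
        return ("recombination_fraction", dist)
    # Anything larger is read as a distance in cM.
    return ("centimorgans", dist)


print(interpret_dist(0.01))
print(interpret_dist(2.5))
```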

The following table describes the sub-element datafile and its attributes and values.

|Attribute|Att Value|Description|
|---|---|---|
|studyname|name|Allows the user to attach a study name to the genotype data file.|
|genotypedata|name|The directory path and genotype data file name for analysis. Specify each genotype data file with a separate datafile statement.|

The following table describes the sub-element param and its attributes and values.

|Attribute|Att Value|Variable|Description|
|---|---|---|---|
|name|ccstat#|classname|Statistical programs. You can run multiple statistics on the same set of data. Each statistic should have a different ccstat#.|
|name|metastat#|classname|Meta statistics for multiple study data files. Each meta statistic should have a different metastat#.|
|name|dumper|classname|The dumper class for dumping simulated data.|
|name|top-sample|all/founder|Method for calculating allele frequencies for assignment to the pedigree founders for simulation. Two options: all calculates allele frequencies based on all genotyped members in the pedigree data file; founder calculates allele frequencies based on genotyped founders only. We recommend the all option if there are a large number of pedigrees and the number of genotyped founders in the resource is limited.|
|name|tabletype|pseudo/original|Specifies how the cases and controls are tallied for the analysis. If pseudo is used, all possible pseudocontrols that can be formed from case parents with sufficient genotype data are used as controls. If original is used, all cases and controls with sufficient genotype data are used explicitly.|
|name|hap-partition|n,0|A list of two integer values. The first value designates the number of markers in a partition to phase. The second value indicates the overlap for adjacent partitions. The default is to have one partition with all the markers input into the analysis (n).|
|name|pedphase_downcode_thresh|1e-6,1e-10,10|Options for controlling haplotype downcoding during the haplotype phasing process. The first value is the threshold used to keep or downcode a haplotype after phasing a partition. Once a haplotype is downcoded, it is no longer considered a viable haplotype when two partitions are ligated together. This reduces the state space of the phasing algorithm, but it can also introduce an error by removing a haplotype that is the only possible configuration. The second and third values try to reduce the possibility of this error by providing a buffer of haplotypes whose estimated frequencies are below the threshold value but above the second value. The third value establishes how many haplotypes to keep in the buffer.|
|name|pedphase_logconfigs|1.0|Option for writing the pedigree phase configurations to file with the haplotype estimates. Omit this parameter from the .rgen file entirely if you do not want to write the configurations to file. A value of 1.0 outputs only the MLE configurations for each pedigree, while any value < 1.0 outputs phase configurations that are greater than this value. The output file is named ped_configs.log.|
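To make the three pedphase_downcode_thresh values more concrete, here is a minimal Python sketch of one way to read them. It is not the hapConstructor implementation. The function names, the haplotype labels, the treatment of a frequency exactly equal to the threshold, and the choice to buffer the most frequent candidates first are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def parse_downcode_thresh(value: str) -> Tuple[float, float, int]:
    """Split a value such as '1e-6,1e-10,10' into (threshold, floor, buffer_size)."""
    keep, floor, size = (part.strip() for part in value.split(","))
    return float(keep), float(floor), int(size)


def split_haplotypes(
    freqs: Dict[str, float], value: str
) -> Tuple[List[str], List[str], List[str]]:
    """Return (kept, buffered, downcoded) haplotype labels.

    Assumptions of this sketch: a frequency equal to the threshold is kept,
    and when more candidates fall between the two thresholds than the buffer
    can hold, the most frequent ones are buffered first.
    """
    keep, floor, size = parse_downcode_thresh(value)
    kept = [h for h, f in freqs.items() if f >= keep]
    candidates = sorted(
        (h for h, f in freqs.items() if floor < f < keep),
        key=lambda h: freqs[h],
        reverse=True,
    )
    buffered = candidates[:size]
    retained = set(kept) | set(buffered)
    downcoded = [h for h in freqs if h not in retained]
    return kept, buffered, downcoded


# Hypothetical haplotype labels and estimated frequencies.
example = {"1-1-2": 0.4, "1-2-2": 1e-8, "2-2-2": 1e-12}
print(split_haplotypes(example, "1e-6,1e-10,10"))
```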

List of available statistical programs and their class names

|Statistic|Class Name|
|---|---|
|Chi Squared|ChiSquared|
|Chi Squared Trend|ChiSquaredTrend|
|Odds Ratio (no Confidence Intervals)|OddsRatios|
|Odds Ratio with Confidence Intervals|OddsRatiosWithCI|
|CMH Chi Squared (meta)|CMHChiSquared|
|CMH Chi Squared Trend (meta)|CMHChiSqTrend|
|Meta Odds Ratio (no Confidence Intervals)|MetaOddsRatios|
|Meta Odds Ratio with Confidence Intervals|MetaOddsRatiosWithCI|
|Trio TDT|TrioTDT|
|Sib TDT|SibTDT|
|Combined TDT|CombTDT|
|Quantitative (difference in means test and ANOVA)|Quantitative|
|Hardy Weinberg Equilibrium|HWE|
|Q Test Odds Ratio Statistic|QTestOR|

  • Subset Analyses (Second Part)

The second part of the .rgen parameter file defines the subset analyses and the models to be analyzed. Users may enter markers to be tested separately (i.e., a single locus at a time approach, where each marker is assumed to be in linkage equilibrium with other markers), as well as testing markers jointly in a composite genotype or haplotype analysis.

cctable has a sub-element col, or column definition. Within the col, the user can optionally assign a weight, wt, to a particular column. Thus, wt is an attribute of col, and its value is a number. The col has a further sub-element g, or allele group. The g has a further sub-element a, or allele definition. The a defines the genetic pattern to be tested in PedGenie at a single locus. Each a corresponds to a locus defined in the sub-element locus. All of the a's are grouped together into a single g, and the g's are grouped together into a single col, which is optionally weighted with wt. If more than one group, g, is in the col, an "or" regular expression is applied across all of the groups when testing the column, col.
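The col / g / a nesting can be pictured with a small Python stand-in. This is a sketch of the logic only, not the XML syntax and not PedGenie code. The `Col` class and both function names are hypothetical, and requiring every a within a group to match its own locus is an assumption of this sketch; the "or" across groups follows the description above.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Col:
    wt: float                # column weight (the wt attribute)
    groups: List[List[str]]  # each g is a list of a patterns, one per locus


def in_column(col: Col, genotypes: Sequence[str]) -> bool:
    """True if any group in the column matches the observed genotypes.

    Groups are combined with "or". Requiring every a within a group to
    match its locus is an assumption of this sketch.
    """
    return any(
        len(group) == len(genotypes)
        and all(a == obs for a, obs in zip(group, genotypes))
        for group in col.groups
    )


def weight_for(columns: Sequence[Col], genotypes: Sequence[str]) -> Optional[float]:
    """Return the wt of the first column containing the genotypes, else None."""
    for col in columns:
        if in_column(col, genotypes):
            return col.wt
    return None


# Single-locus dominant model: carriers of allele 2 fall in the wt = 1 column.
dominant = [
    Col(wt=0, groups=[["1/1"]]),
    Col(wt=1, groups=[["1/2"], ["2/1"], ["2/2"]]),
]
print(weight_for(dominant, ["2/1"]))
```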

The following table describes the element cctable and its optional attributes and values.

|Attribute|Att Value|Description|
|---|---|---|
|loci|number(s)|Allows the user to specify the locus, loci, or a range of loci for a subset analysis based on the locus id number. Default is all loci. To specify a range of loci, enter the beginning locus id, followed by a "-" and the ending locus id.|
|stats|number(s)|Allows the user to define which statistics to run for a particular subset analysis. The stats number is selected from the list of ccstat#'s. Default is all ccstat.|
|metas|number(s)|Allows the user to define which meta statistics to run for a particular subset analysis. The meta number is selected from the list of metastat#'s. Default is all metastat.|
|model|text|Allows the user to define a model for a subset analysis. The model name will be printed in the report for that analysis.|
|type|text|Allows the user to specify the type of analysis, Genotype or Allele, for this subset of data. If type="Allele" is specified, a single allele code should be entered as the variable for the sub-element a, and each a corresponds to a locus. Default is type="Genotype".|
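As an illustration of the loci range form, the Python sketch below expands a loci value into individual locus ids. It is a hypothetical helper, not hapConstructor code. The "begin-end" range comes from the table; accepting several comma-separated entries is an assumption, since the separator for multiple loci is not stated here.

```python
from typing import List


def parse_loci(spec: str) -> List[int]:
    """Expand a loci value such as '3' or '2-5' into a list of locus ids.

    The 'begin-end' range form comes from the table. Accepting several
    comma-separated entries is an assumption of this sketch.
    """
    loci: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            begin, end = (int(x) for x in part.split("-", 1))
            if begin > end:
                raise ValueError(f"range start {begin} exceeds end {end}")
            loci.extend(range(begin, end + 1))
        else:
            loci.append(int(part))
    return loci


print(parse_loci("2-5"))
```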

** Single locus at a time analysis approach

Various modes of inheritance may be modeled by weighting genotypes in a particular fashion. For a biallelic marker, dominant (0,1,1), recessive (0,0,1), and additive (0,1,2) modes of inheritance may be analyzed by simply weighting the genotype data as follows:

|Model|Wt = 0|Wt = 1|Wt = 2|
|---|---|---|---|
|Dominant|(1/1)|(1/2), (2/1), or (2/2)||
|Recessive|(1/1), (1/2), or (2/1)|(2/2)||
|Additive|(1/1)|(1/2) or (2/1)|(2/2)|
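The weighting table can be reproduced by counting copies of allele 2 in each genotype. The Python sketch below is illustrative only; `genotype_weight` is a hypothetical name, and it assumes, as in the table, that allele 2 is the allele being modeled.

```python
def genotype_weight(model: str, genotype: str) -> int:
    """Weight a biallelic genotype such as '1/2' under the named model."""
    copies = genotype.split("/").count("2")  # copies of allele 2
    if model == "dominant":
        return 1 if copies >= 1 else 0
    if model == "recessive":
        return 1 if copies == 2 else 0
    if model == "additive":
        return copies
    raise ValueError(f"unknown model: {model}")


# Print the weight of every genotype under every model.
for model in ("dominant", "recessive", "additive"):
    row = {g: genotype_weight(model, g) for g in ("1/1", "1/2", "2/1", "2/2")}
    print(model, row)
```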

The weights may be modified to be any integer value. For programming purposes, a (1/.) indicates a genotype of 1 and any other value. Thus for this biallelic model, the code (1/.) will pull (1/1) and (1/2) genotype data. Care must be taken to ensure that this file has no errors. Please see the SingleLocus.rgen for the format of this file.

Multiallelic Markers: The XML code for this file is flexible enough to allow any combination or grouping of markers. For multiallelic markers, weights for a particular genotype are again used to indicate which group is the reference group and which is the comparison group. For example, given a multiallelic locus (Alleles 1, 2, and 3), a single allele (Allele 3) may be compared against all other alleles under a dominant mode of inheritance as follows:

|Model|Wt = 0|Wt = 1|
|---|---|---|
|Dominant|(1/1), (1/2), (2/1), or (2/2)|(3/.), (./3)|
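The "." wildcard used in codes such as (1/.) and (3/.) can be sketched as a positional match. This Python example is not PedGenie code, and `matches` and `carries_allele_3` are hypothetical names. Treating the pattern as positional is an assumption, inferred from the table listing both (3/.) and (./3).

```python
def matches(pattern: str, genotype: str) -> bool:
    """Positional match of a pattern such as '(3/.)' against '(3/1)'.

    A '.' in the pattern accepts any allele at that position.
    """
    p = pattern.strip("()").split("/")
    g = genotype.strip("()").split("/")
    return len(p) == len(g) and all(a == "." or a == b for a, b in zip(p, g))


def carries_allele_3(genotype: str) -> bool:
    """Wt = 1 column of the table: either group (3/.) or (./3) matches."""
    return matches("(3/.)", genotype) or matches("(./3)", genotype)


for genotype in ("(1/2)", "(3/1)", "(2/3)", "(3/3)"):
    print(genotype, carries_allele_3(genotype))
```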

** Composite Genotype and Haplotype Analyses

Analysis of composite genotype and haplotype data is similar to the single locus at a time approach, with a few exceptions. For both composite genotype and haplotype tests, haplotypes are dropped from the founders rather than alleles. The method HapFreqTopSim is entered as the Mendelian gene drop method (see above under top). The haplotype frequencies are entered into PedGenie as a separate file. PedGenie will look for a file in the same directory as the pedigree file, with the same name as the pedigree file but with the extension .hap instead of .dat. Because the simulations depend on this file, good estimates of haplotype frequencies are recommended for both the composite genotype and haplotype analyses.
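The naming rule for the haplotype frequency file can be checked with a short Python sketch. The helper name `hap_file_for` and the example path are hypothetical; only the "same directory, same name, .hap instead of .dat" rule comes from the text above.

```python
from pathlib import Path


def hap_file_for(pedigree_file: str) -> Path:
    """Return the .hap path expected alongside a pedigree file.

    Keeps the directory and base name and swaps the extension for .hap.
    """
    return Path(pedigree_file).with_suffix(".hap")


# Hypothetical pedigree file location.
print(hap_file_for("data/study1.dat"))
```

Note that `with_suffix` replaces whatever extension is present, so this sketch does not verify that the input actually ends in .dat.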

For composite genotype and haplotype analyses, linkage disequilibrium between markers should be taken into account. Under the sub-element locus, dist values indicating LD (i.e., <0.5) should be listed.

Composite Genotype: Composite genotype tests allow a user to enter multiple inheritance models for multiple loci. For example, one can test a model that requires a dominant inheritance at one SNP locus (i.e., 1/2, 2/1 or 2/2 vs. 1/1) and a recessive mode of inheritance at another locus (i.e., 2/2 vs. 1/2, 2/1 or 1/1). Weights are again used to indicate the groupings. See the PedGenieCompGenotype.rgen for examples of composite genotype tests. The various statistical tests that can be performed by PedGenie may be selected as desired to analyze the results. The advantage of using a composite genotype test is that phase information for the observed data is not required as individual genotypes are being compared rather than haplotypes. However, haplotypes are dropped from pedigree founders and LD between the markers is taken into account for the simulated data. Thus haplotype information is utilized for creation of the empirical null distribution, but statistical comparisons are made using unphased genotype data.

Haplotype Tests: For haplotype tests, phase information is required for the observed data, and haplotypes are dropped from the pedigree founders to create the empirical null distribution. Thus, assignment of phased genotype data or haplotypes to pedigree members with a high probability in the observed data is essential. Again, LD between markers is taken into account by setting the dist ≤0.5. For testing purposes, a single haplotype may be compared to all other haplotypes or to the most common haplotype. See the PedGenieHaplotype.rgen for examples of haplotype tests.
