.rgen Parameter File Definition

The .rgen parameter file is an XML file that uses a DTD file, ge-rgen.dtd, to describe all of the data for the analysis. All elements in this file start with "ge:". The file has a root element, rgen, and a number of sub-elements and attributes. It can be described in two parts: the first part sets up the analysis parameters, and the second part defines the inheritance models to be analyzed.

  • Analysis Parameters (First Part)

The following table describes all of the required attributes and their values for the root element rgen. All attribute values should be enclosed in double quotes (" ").

|Attribute|Att Value|Description|
|---|---|---|
|rseed|number|Random number generator seed value. Specify rseed="random" to have the program randomly generate a seed value.|
|nsims|number|Number of simulations.|
|top|classname|The program for generating simulated alleles or haplotypes for all of the top founders. Currently available: AlleleFreqTopSim, HapFreqTopSim, HapMCTopSeparate, HapMCTopTogether, GeneCounterTopSim, IndivWtTopSim and Conditional Gene Drop.|
|drop|classname|The program for generating alleles or haplotypes based on the parent's simulated genetic information. Currently available: DropSim, HapMCDropSeparate, HapMCDropTogether and IndivWtDropSim.|
|report|classname|Report options. The default is the standard report (rgen_filename.report) with full tables and detailed output. Specify report="summary" for an ASCII space-delimited file (rgen_filename.summary) of results including the seed value, specified statistics, corresponding p-values, and 95% confidence intervals for odds ratios for each data file, followed by meta statistics, if requested. Specify report="both" to generate both the standard and summary reports.|

The following table describes the sub-element locus and its attributes and values.

|Attribute|Att Value|Description|
|---|---|---|
|id|number|The locus id number in the data file.|
|marker|name|Allows the user to attach a marker name to the locus id.|
|dist|number|Allows the user to enter a recombination fraction or a distance between a marker and the preceding marker. If the dist value is ≤0.5, the value is assumed to be a recombination fraction. If the dist value is >0.5, the distance between the marker and the preceding marker is assumed to be in cM.|
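The dist rule in the table can be illustrated with a few lines of Python. This is only a sketch of the stated threshold, not PedGenie's own code; the function name `interpret_dist` and the returned labels are hypothetical.

```python
def interpret_dist(dist):
    """Classify a locus dist value using the rule from the locus table.

    Values <= 0.5 are read as a recombination fraction; larger values
    are read as a distance in cM. The labels are illustrative only.
    """
    if dist <= 0.5:
        return ("recombination_fraction", dist)
    return ("cM", dist)


if __name__ == "__main__":
    for value in (0.01, 0.5, 3.2):
        print(value, "->", interpret_dist(value)[0])
```

Note that the boundary value 0.5 itself falls on the recombination-fraction side of the rule.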

The following table describes the sub-element datafile and its attributes and values.

|Attribute|Att Value|Description|
|---|---|---|
|studyname|name|Allows the user to attach a study name to the genotype data file.|
|genotypedata|name|The directory path and genotype data file name for analysis. Specify each genotype data file with a separate datafile statement.|
|haplotype|name|The directory path and frequency data file name. This file allows the user to specify allele or haplotype frequencies. All frequencies should sum to 1.0.|
|linkageparameter|name|The directory path and linkage parameter file name, for the GeneCounterTopSim option only.|
|quantitative|name|The directory path and quantitative data file name, for the Quantitative statistic only.|

The following table describes the sub-element param and its attributes and values.

|Attribute|Att Value|Variable|Description|
|---|---|---|---|
|name|ccstat#|classname|Statistical programs. You can run multiple statistics on the same set of data. Each statistic should have a different ccstat#.|
|name|metastat#|classname|Meta statistics for multiple study data files. Each meta statistic should have a different metastat#.|
|name|covar#|number|The selected covariate id number in the quantitative data file.|
|name|dumper|classname|The dumper class for dumping simulated data. The TDTDumper class is used with the QTDT interface. The GenoDataDumper class dumps simulated genotype data; its output file has the same format as the Genie input genotype data file. The IndivDumper class outputs weights for genotyped individuals from the data file.|
|name|top-sample|all/founder|Method for calculating the allele frequencies assigned to the pedigree founders for simulation. Two options: all calculates allele frequencies based on all genotyped members in the pedigree data file; founder calculates allele frequencies from genotyped founders only. We recommend the all option if there are a large number of pedigrees and the number of genotyped founders in the resource is limited.|

** List of available statistical programs and their class names

|Statistic|Class Name|
|---|---|
|Chi Squared|ChiSquared|
|Chi Squared Trend|ChiSquaredTrend|
|Odds Ratio (no Confidence Intervals)|OddsRatios|
|Odds Ratio with Confidence Intervals|OddsRatiosWithCI|
|CMH Chi Squared (meta)|CMHChiSquared|
|CMH Chi Squared Trend (meta)|CMHChiSqTrend|
|Meta Odds Ratio (no Confidence Intervals)|MetaOddsRatios|
|Meta Odds Ratio with Confidence Intervals|MetaOddsRatiosWithCI|
|Trio TDT|TrioTDT|
|Sib TDT|SibTDT|
|Combined TDT|CombTDT|
|Quantitative (difference in means test and ANOVA)|Quantitative|
|Hardy Weinberg Equilibrium|HWE|
|Q Test Odds Ratio Statistic|QTestOR|

  • Subset Analyses (Second Part)

The second part of the .rgen parameter file defines the subset analyses and the models to be analyzed. Users may test markers separately (i.e., a single locus at a time approach, where each marker is assumed to be in linkage equilibrium with other markers) or jointly in a composite genotype or haplotype analysis.

The element cctable has a sub-element col, or column definition. Within a col, the user can optionally assign a weight through the attribute wt, whose value is a number. Each col has a sub-element g, or allele group, and each g has a sub-element a, or allele definition. An a defines the genetic pattern to be tested in PedGenie at a single locus, and each a corresponds to a locus defined in the sub-element locus. All of the a's are grouped together into a single g, and the g's are grouped together into a single, optionally weighted, col. If a col contains more than one group g, an "or" regular expression is applied across all of the groups when testing the column.

The following table describes the element cctable and its optional attributes and values.

|Attribute|Att Value|Description|
|---|---|---|
|loci|number(s)|Allows the user to specify the locus, loci, or loci range for a subset analysis based on the locus id number. Default is all loci. To specify a loci range, enter the beginning locus id followed by a "-" and the ending locus id.|
|stats|number(s)|Allows the user to define which statistics to run for a particular subset analysis. The stats number is selected from the list of ccstat#'s. Default is all ccstat.|
|metas|number(s)|Allows the user to define which meta statistics to run for a particular subset analysis. The meta number is selected from the list of metastat#'s. Default is all metastat.|
|model|text|Allows the user to name the model for a subset analysis. The model name is printed in the report for that analysis.|
|type|text|Allows the user to specify the type of analysis, Genotype or Allele, for this subset of data. If type="Allele" is specified, a single allele code should be entered as the variable for the sub-element a, and each a corresponds to a locus. Default is type="Genotype".|
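As an illustration of the loci range notation, the sketch below expands a value such as "2-5" into the individual locus ids. It handles only the two forms described in the table (a single id, or a begin-end range); how several separate loci are listed in one value is not covered here, and the helper name `expand_loci` is hypothetical rather than part of PedGenie.

```python
def expand_loci(spec):
    """Expand a loci value into a list of locus id numbers.

    Accepts a single id ("3") or a range written as begin-end ("2-5").
    Illustrative only; other list forms are not handled.
    """
    spec = spec.strip()
    if "-" in spec:
        begin_text, end_text = spec.split("-", 1)
        begin, end = int(begin_text), int(end_text)
        if begin > end:
            raise ValueError("range must run from a lower to a higher locus id")
        return list(range(begin, end + 1))
    return [int(spec)]


if __name__ == "__main__":
    print(expand_loci("2-5"))
    print(expand_loci("3"))
```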

** Single locus at a time analysis approach

Various modes of inheritance may be modeled by weighting genotypes in a particular fashion. For a biallelic marker, a dominant (0,1,1), a recessive (0,0,1), and an additive (0,1,2) mode of inheritance may be analyzed by simply weighting the genotype data as follows:

|Model|Wt = 0|Wt = 1|Wt = 2|
|---|---|---|---|
|Dominant|(1/1)|(1/2), (2/1), or (2/2)||
|Recessive|(1/1), (1/2), or (2/1)|(2/2)||
|Additive|(1/1)|(1/2) or (2/1)|(2/2)|

The weights may be modified to be any integer value. For programming purposes, (1/.) indicates a genotype of 1 and any other value. Thus, for this biallelic model, the code (1/.) will pull (1/1) and (1/2) genotype data. Care must be taken to ensure that this file has no errors. See SingleLocus.rgen for the format of this file.
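The weighting table for a biallelic marker can be restated as a small lookup keyed by the number of copies of allele 2. The Python below is a sketch of that table only, assuming genotypes are written as (a/b) strings; `MODEL_WEIGHTS` and `genotype_weight` are hypothetical names, not PedGenie identifiers.

```python
# Weights indexed by the number of copies of allele 2 (0, 1, or 2),
# matching the Dominant, Recessive, and Additive rows of the table.
MODEL_WEIGHTS = {
    "Dominant": (0, 1, 1),
    "Recessive": (0, 0, 1),
    "Additive": (0, 1, 2),
}


def genotype_weight(genotype, model, allele="2"):
    """Return the weight for a genotype such as "(1/2)" under a model."""
    left, right = genotype.strip("()").split("/")
    copies = int(left == allele) + int(right == allele)
    return MODEL_WEIGHTS[model][copies]


if __name__ == "__main__":
    for model in MODEL_WEIGHTS:
        row = [genotype_weight(g, model) for g in ("(1/1)", "(1/2)", "(2/1)", "(2/2)")]
        print(model, row)
```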

Multiallelic Markers: The XML code for this file is flexible enough to allow any combination or grouping of markers. For multiallelic markers, weights for a particular genotype are again used to indicate which group is the reference group and which is the comparison group. For example, given a locus that is multiallelic (Alleles 1, 2, and 3), a single allele (Allele 3) may be compared against all other alleles under a dominant mode of inheritance as follows:

|Model|Wt = 0|Wt = 1|
|---|---|---|
|Dominant|(1/1), (1/2), (2/1), or (2/2)|(3/.), (./3)|
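To make the wildcard notation concrete, the sketch below matches genotypes against patterns position by position, treating "." as "any allele", and then looks up the weight for the Allele 3 dominant model in the table. It assumes matching is positional (which is why both (3/.) and (./3) are listed); the function names and the list-of-columns layout are hypothetical and do not mirror PedGenie's internals.

```python
def matches(pattern, genotype):
    """True if a genotype like "(3/1)" fits a pattern like "(3/.)".

    Matching is positional; "." in the pattern accepts any allele.
    """
    p = pattern.strip("()").split("/")
    g = genotype.strip("()").split("/")
    return len(p) == len(g) == 2 and all(a == "." or a == b for a, b in zip(p, g))


def column_weight(genotype, columns):
    """Return the weight of the first column with a matching pattern, else None."""
    for weight, patterns in columns:
        if any(matches(p, genotype) for p in patterns):
            return weight
    return None


# Columns for the dominant Allele 3 model from the table.
ALLELE3_DOMINANT = [
    (0, ["(1/1)", "(1/2)", "(2/1)", "(2/2)"]),
    (1, ["(3/.)", "(./3)"]),
]

if __name__ == "__main__":
    for g in ("(1/2)", "(3/1)", "(2/3)", "(3/3)"):
        print(g, column_weight(g, ALLELE3_DOMINANT))
```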

** Composite Genotype and Haplotype Analyses

Analysis of composite genotype and haplotype data is similar to the single locus at a time approach, with a few exceptions. For both composite genotype and haplotype tests, haplotypes are dropped from the founders rather than alleles. The method HapFreqTopSim is entered as the Mendelian gene drop method (see above under top). The haplotype frequencies are entered into PedGenie as a separate file. PedGenie will look for a file in the same directory as the pedigree file with the same name as the pedigree file but with the extension .hap instead of .dat. Because the simulation relies on these frequencies, good estimates of haplotype frequencies are recommended for both the composite genotype and haplotype analyses.

For composite genotype and haplotype analyses, linkage disequilibrium between markers should be taken into account. Under the sub-element locus, dist values indicating LD (i.e., <0.5) should be listed.

Composite Genotype: Composite genotype tests allow a user to enter multiple inheritance models for multiple loci. For example, one can test a model that requires a dominant inheritance at one SNP locus (i.e., 1/2, 2/1 or 2/2 vs. 1/1) and a recessive mode of inheritance at another locus (i.e., 2/2 vs. 1/2, 2/1 or 1/1). Weights are again used to indicate the groupings. See the PedGenieCompGenotype.rgen for examples of composite genotype tests. The various statistical tests that can be performed by PedGenie may be selected as desired to analyze the results. The advantage of using a composite genotype test is that phase information for the observed data is not required as individual genotypes are being compared rather than haplotypes. However, haplotypes are dropped from pedigree founders and LD between the markers is taken into account for the simulated data. Thus haplotype information is utilized for creation of the empirical null distribution, but statistical comparisons are made using unphased genotype data.
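The composite example above (dominant at one locus, recessive at another) can be sketched as a column made of groups, where each group holds one pattern per locus. The sketch assumes that all patterns within a group must match their loci and that groups within a column are combined with "or"; the first assumption is not stated explicitly in this document, and all names are hypothetical.

```python
def matches(pattern, genotype):
    """Positional match of a genotype against a pattern; "." accepts any allele."""
    p = pattern.strip("()").split("/")
    g = genotype.strip("()").split("/")
    return len(p) == len(g) == 2 and all(a == "." or a == b for a, b in zip(p, g))


def group_matches(group, genotypes):
    """Assumption: every pattern in a group must match the genotype at its locus."""
    return len(group) == len(genotypes) and all(
        matches(p, g) for p, g in zip(group, genotypes)
    )


def column_matches(groups, genotypes):
    """Groups in a column are combined with "or"."""
    return any(group_matches(group, genotypes) for group in groups)


# Locus 1 dominant (1/2, 2/1, or 2/2) AND locus 2 recessive (2/2).
RISK_COLUMN = [
    ("(./2)", "(2/2)"),  # covers 1/2 and 2/2 at locus 1
    ("(2/.)", "(2/2)"),  # covers 2/1 and 2/2 at locus 1
]

if __name__ == "__main__":
    for person in (["(1/2)", "(2/2)"], ["(1/1)", "(2/2)"], ["(2/2)", "(1/2)"]):
        print(person, column_matches(RISK_COLUMN, person))
```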

Haplotype Tests: For haplotype tests, phase information is required for the observed data, and haplotypes are dropped from the pedigree founders to create the empirical null distribution. Thus, it is essential that phased genotype data or haplotypes be assigned to pedigree members in the observed data with high probability. Again, LD between markers is taken into account by setting the dist ≤0.5. For testing purposes, a single haplotype may be compared to all other haplotypes or to the most common haplotype. See the PedGenieHaplotype.rgen for examples of haplotype tests.
