The Bonferroni correction is commonly applied to multiple inferential statistical tests and controls the familywise error rate. Benjamini and Hochberg (1995) argue that this procedure is too conservative and risks Type II error, that is, failure to detect real effects. They propose an alternative procedure that controls the False Discovery Rate (FDR), the expected proportion of falsely rejected hypotheses, and that is more powerful. We therefore applied the FDR procedure to all 144 comparisons made in this study, including the comparison with the NEO-FFI, with 0.05 as the critical p-value.
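The difference between the two corrections can be illustrated with a minimal sketch. It assumes the standard Benjamini-Hochberg step-up rule; the function names and the ten p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return booleans marking hypotheses rejected at FDR level q (step-up rule)."""
    m = len(p_values)
    # Rank the hypotheses by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * q.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


def bonferroni(p_values, alpha=0.05):
    """Return booleans marking hypotheses rejected at familywise level alpha."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


# Hypothetical p-values, for illustration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(sum(benjamini_hochberg(example_p)), "rejections under FDR control")
print(sum(bonferroni(example_p)), "rejections under Bonferroni")
```

On these illustrative values the FDR rule rejects more hypotheses than Bonferroni, which is the sense in which it is the more powerful procedure.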

Pearson r correlations (two-tailed) between FTI scores and responses to three variables were carried out. The three variables were: (1) education, (2) political orientation, and (3) the extent to which one regards sex as an essential part of a successful relationship. Education level was coded as (1): Not a high school graduate; (2): High school graduate; (3): Some college; (4): Associate’s degree; (5): Bachelor’s degree; (6): Graduate school; (7): Doctorate. Participants were asked to describe their political orientation and were given the options: “Very liberal,” “Liberal,” “Conservative,” “Ultra conservative,” “Other.” To measure the degree to which one regards sex as an essential part of a successful relationship, participants rated their level of agreement with the statement, “Sex is an essential part of a successful relationship” by selecting one of four options: “Not at all,” “A little,” “Quite a bit,” “Very much so.”
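As a minimal sketch of this step, the education coding above can be written as a lookup table and correlated with a scale score. The coding follows the text; the function name, the variable names, and the five example participants are hypothetical.

```python
import math

# Education coding as described in the text.
EDUCATION_CODES = {
    "Not a high school graduate": 1,
    "High school graduate": 2,
    "Some college": 3,
    "Associate's degree": 4,
    "Bachelor's degree": 5,
    "Graduate school": 6,
    "Doctorate": 7,
}


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical responses and scale scores, for illustration only.
education = ["Some college", "Doctorate", "High school graduate",
             "Bachelor's degree", "Graduate school"]
scale_scores = [31, 40, 28, 35, 37]
coded = [EDUCATION_CODES[label] for label in education]
print(pearson_r(coded, scale_scores))
```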

T-tests were carried out to compare men and women on each dimension, and to compare “religious” and “non-religious” participants. Participants were classified as “religious” if they specified that they identified with a particular religion. Participants were classified as “non-religious” if they chose the categories “atheist,” “agnostic,” “spiritual but not religious,” or “not religious.”

When t-tests were completed, tests for homogeneity of variance were performed, and tests for unequal variance were used where applicable. The test scores for each of the four scales showed a normal distribution, with a small deviation from normality at the low end of the scores. This was not a concern because t-tests are considered to be robust with respect to the normality assumption, particularly with large samples (Sawilowsky and Blair, 1992).
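The text does not name the specific unequal-variance test that was used. One common choice is Welch's t-test, sketched here as an assumption; the function name and inputs are hypothetical. The sketch returns the test statistic and the Welch-Satterthwaite degrees of freedom, and leaves the p-value lookup to a statistics package.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Degrees of freedom shrink when the group variances differ.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df
```

When the two groups have equal size and equal variance, the degrees of freedom reduce to those of the ordinary pooled-variance t-test, so the correction only matters where the homogeneity test fails.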

The odds ratio (OR 0.5 [95% Confidence Interval]) was calculated to estimate effect size in a large population. Pearson r correlations are also a measure of effect size. Other effect sizes (η²) were calculated for raw mean score comparisons. Effect size calculations are important in a study with a large number of participants because they help assess the functional significance of statistically significant results.
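Two of these effect sizes can be sketched briefly. The text does not say how the odds ratio or its confidence interval was computed, so the 2x2-table form with a log-based (Woolf) interval below is an assumption, as are the function names. The mean-comparison effect size is taken here to be the ratio of between-group to total sum of squares.

```python
import math


def eta_squared(groups):
    """Proportion of total variance explained by group membership."""
    all_scores = [score for group in groups for score in group]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_total = sum((score - grand_mean) ** 2 for score in all_scores)
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    return ss_between / ss_total


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI for a 2x2 table.

    a, b: group 1 with / without the outcome
    c, d: group 2 with / without the outcome
    All four counts must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR), Woolf's method.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper
```

With very large samples, even a tiny value of this variance ratio can reach statistical significance, which is why the effect size is reported alongside the p-value.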

Questionnaire scores in the text are reported as mean ± SD and SE of the mean. Reporting both measures alerts the reader to the variability in the data for this large sample and to the statistical significance of the relatively small effects. The figures show mean ± SE.

To replicate our basic questionnaire clustering results with a method different from factor analysis, an Eigen analysis on standardized scores was used. Software scripts in the R programming language were used on the open access Galaxy platform (Goecks et al., 2010). A topologic algorithm was used that treats each survey item as an independent attribute (vector) and employs Eigen analysis to identify distinct topologies. Each point in space (see Figure 4) demonstrates varied combinations of temperament affinities and disaffinities. Linear regression was used to compare the relative positions of each item in each dimension. To determine the stability and reproducibility of the identified population temperament structure using this method, the same analysis was performed on two independent, randomly sorted subsets of 50,000 responses.
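The general idea of an eigen analysis on standardized scores can be sketched as follows. This is not the authors' R script or their topologic algorithm, which the text does not specify in detail. It is a simplified illustration in Python that standardizes each item, forms the item correlation matrix, and extracts only the leading eigenpair by power iteration. All function names are hypothetical.

```python
import math


def standardize(column):
    """Convert one item's responses to z-scores (sample SD, n - 1)."""
    n = len(column)
    mu = sum(column) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in column) / (n - 1))
    return [(v - mu) / sd for v in column]


def correlation_matrix(items):
    """items: list of columns, one list of responses per survey item."""
    z = [standardize(column) for column in items]
    n = len(z[0])
    return [
        [sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z]
        for zi in z
    ]


def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and its unit eigenvector, by power iteration."""
    size = len(matrix)
    # Non-uniform start vector, so it is unlikely to be orthogonal
    # to the leading eigenvector.
    vec = [float(i + 1) for i in range(size)]
    for _ in range(iterations):
        product = [sum(row[j] * vec[j] for j in range(size)) for row in matrix]
        norm = math.sqrt(sum(v * v for v in product))
        vec = [v / norm for v in product]
    # Rayleigh quotient gives the eigenvalue for the converged unit vector.
    mv = [sum(row[j] * vec[j] for j in range(size)) for row in matrix]
    eigenvalue = sum(a * b for a, b in zip(vec, mv))
    return eigenvalue, vec
```

Items whose loadings on an eigenvector share a sign tend to vary together, and items with opposite signs tend to vary inversely, which is one way to read the affinities and disaffinities described above. Repeating the calculation on independent random subsets and comparing the resulting loadings mirrors the stability check in the text.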