Repetitive DNA sequences make up a significant portion of the human genome. Short tandem repeats (STRs) are DNA motifs, typically 1 to 6 nucleotides long, that are repeated in tandem, usually 5 to 200 times. They account for 3% of the human genome. To date, more than 30 genetic disorders have been attributed to STR expansions. One of the most prominent and best documented is fragile X syndrome, a syndromic form of intellectual disability and autism caused by an expanded CGG repeat in FMR1. There is mounting evidence that CGG-repeat expansions at other loci are associated with further neurological disorders. These include Jacobsen syndrome, Baratela-Scott syndrome, and FRAXE-, DIP2B- and AFF3-associated intellectual disability.
We hypothesise that multiple as yet undiscovered CGG-repeat expansions in the human genome contribute to disease. We aim to catalogue all CGG repeats in the human genome using PCR-free whole-genome sequencing (WGS) data and recently developed STR genotyping algorithms. In doing so, we also hope to characterise the known disease-causing CGG repeats. We will then use these data to identify other repeats that are likely to be disease-causing, based on their repeat characteristics and their associated genes. Finally, we will analyse WGS data from a large cohort of patients with unexplained intellectual disability to identify novel repeat expansions and novel CGG-repeat expansion disorders.
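A catalogue of this kind typically starts by locating CGG repeats in a reference sequence. The helper `find_cgg_repeats` below is a hypothetical, minimal sketch of that step and is not the method used in this work. It rests on several assumptions. It considers only perfect (uninterrupted) repeats and applies the lower bound of 5 repeat units quoted above. It treats any rotation of CGG (forward strand) or CCG (its reverse complement) as a CGG-family motif. It reports each motif in the phase at which the run starts. It does not genotype repeat lengths in sequenced individuals, which is the task of dedicated STR genotyping algorithms.

```python
# Hypothetical sketch: locate perfect CGG-family tandem repeats in a sequence.
# Assumption: rotations of CGG (forward strand) and CCG (reverse complement)
# are all counted as CGG repeats; interrupted repeats are not merged.
CGG_FAMILY = {"CGG", "GGC", "GCG", "CCG", "CGC", "GCC"}


def find_cgg_repeats(seq: str, min_units: int = 5) -> list[tuple[int, int, str, int]]:
    """Return (start, end, motif, units) for each perfect CGG/CCG run.

    Coordinates are 0-based and half-open. A run is a stretch in which every
    base equals the base three positions earlier and whose first three bases
    form a CGG-family motif. Runs shorter than min_units are ignored.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    i = 0
    while i + 3 <= n:
        motif = seq[i:i + 3]
        if motif not in CGG_FAMILY:
            i += 1
            continue
        # Extend while the sequence stays periodic with period 3.
        j = i + 3
        while j < n and seq[j] == seq[j - 3]:
            j += 1
        units = (j - i) // 3
        if units >= min_units:
            end = i + units * 3  # drop any partial trailing unit
            hits.append((i, end, motif, units))
            i = end  # resume scanning after the recorded run
        else:
            i += 1
    return hits


if __name__ == "__main__":
    example = "AT" + "CGG" * 6 + "TA"
    print(find_cgg_repeats(example))  # [(2, 20, 'CGG', 6)]
```

Scanning for period-3 runs, rather than matching a single literal motif, ensures that a repeat is detected whichever rotation it begins with and on whichever strand it is read. Real catalogues would also need to handle imperfect repeats and to map loci to genes.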