CGG trinucleotide short-tandem repeats in unexplained intellectual disability

Study code

Lead researcher
Dr Dale Annear

Study type
Data only

Institution or company
University of Antwerp (Centre of Medical Genetics), Belgium

Researcher type

Speciality area
Genomics and Rare Diseases, Neurological Disorders

Recruitment Site
N/A (data only)


Repetitive DNA sequences make up a significant portion of the human genome. Short tandem repeats (STRs), defined as DNA motifs typically 1 to 6 nucleotides in length that are repeated, usually between 5 and 200 times, account for 3% of the total human genome. To date, more than 30 genetic disorders resulting from STR expansions have been identified. One of the most prominent and best documented of these is fragile X syndrome, a syndromic form of intellectual disability and autism caused by expansion of a CGG repeat in the FMR1 gene. There is mounting evidence that CGG STRs may be associated with other neurological disorders. These include Jacobsen syndrome, Baratela-Scott syndrome, and FRAXE-, DIP2B- and AFF3-associated intellectual disability.

We hypothesise that there are multiple as-yet uncharacterised CGG-repeat expansions in the human genome that contribute to disease. We aim to catalogue all CGG repeats within the human genome using PCR-free whole-genome sequencing (WGS) data and recently developed STR genotyping algorithms. In doing so, we also hope to identify the characteristics of the known disease-causing CGG repeats. We then aim to use these data to identify repeats that are also likely to be disease-causing, based on their repeat characteristics and their associated genes. Finally, by analysing a large cohort of WGS data from patients with unexplained intellectual disability, we aim to identify novel repeat expansions and novel disorders caused by CGG repeats.
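To make the notion of a CGG repeat concrete, the sketch below scans a DNA string for uninterrupted runs of the CGG motif. It is an illustration only and not the method used in this study, which relies on dedicated STR genotyping algorithms applied to WGS data. The function name find_cgg_repeats, the min_units threshold of 5 and the example sequence are assumptions introduced here; the scan also ignores the reverse strand (CCG) and interrupted repeats.

```python
import re


def find_cgg_repeats(sequence, min_units=5):
    """Return (start, end, units) for each uninterrupted CGG run.

    Coordinates are 0-based and end-exclusive. Only the given strand is
    searched, and runs shorter than min_units are ignored (hypothetical
    threshold chosen for illustration).
    """
    pattern = re.compile(r"(?:CGG){%d,}" % min_units)
    hits = []
    for match in pattern.finditer(sequence.upper()):
        units = (match.end() - match.start()) // 3
        hits.append((match.start(), match.end(), units))
    return hits


# Hypothetical sequence: one run of 7 units and one run of 3 units.
example = "ATG" + "CGG" * 7 + "TTA" + "CGG" * 3 + "GCA"
print(find_cgg_repeats(example))
```

With the default threshold, only the 7-unit run is reported; lowering min_units to 3 would also return the shorter run.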


Annear DJ, Vandeweyer G, Sanchis-Juan A, Raymond FL, Kooy RF (2022). 'Non-Mendelian inheritance patterns and extreme deviation rates of CGG repeats in autism.' Genome Res. 32(11-12):1967-1980. (link)


As expansions of CGG short tandem repeats (STRs) are established as the genetic etiology of many neurodevelopmental disorders, we aimed to elucidate the inheritance patterns and role of CGG STRs in autism-spectrum disorder (ASD). By genotyping 6063 CGG STR loci in a large cohort of trios and quads with an ASD-affected proband, we determined an unprecedented rate of CGG repeat length deviation across a single generation. Although the concept of repeat length being linked to deviation rate was solidified, we show how shorter STRs display greater degrees of size variation. We observed that CGG STRs did not segregate by Mendelian principles but with a bias against longer repeats, which appeared to magnify as repeat length increased. Through logistic regression, we identified 19 genes that displayed significantly higher rates and degrees of CGG STR expansion within the ASD-affected probands (P < 1 × 10⁻⁵). This study not only highlights novel repeat expansions that may play a role in ASD but also reinforces the hypothesis that CGG STRs are specifically linked to human cognition.

Annear et al. 2022
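The idea of a repeat length deviating across a single generation can be illustrated with a simple trio check. This is a minimal sketch under two assumptions: genotypes are exact repeat-unit counts, and a deviation is any child genotype that cannot be formed from one unchanged paternal allele and one unchanged maternal allele. The abstract does not give the study's own definition or tolerance, and all names and genotype values below are hypothetical.

```python
from typing import Iterable, Tuple

Genotype = Tuple[int, int]  # repeat-unit counts of the two alleles at one locus


def is_mendelian_consistent(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """True if the child could carry one unchanged allele from each parent."""
    c1, c2 = child
    return (c1 in father and c2 in mother) or (c2 in father and c1 in mother)


def deviation_rate(trios: Iterable[Tuple[Genotype, Genotype, Genotype]]) -> float:
    """Fraction of (child, father, mother) triples that deviate."""
    trios = list(trios)
    if not trios:
        return 0.0
    deviating = sum(not is_mendelian_consistent(c, f, m) for c, f, m in trios)
    return deviating / len(trios)


# Hypothetical genotypes for one locus in three families.
example_trios = [
    ((10, 12), (10, 11), (12, 12)),  # consistent: 10 from father, 12 from mother
    ((10, 14), (10, 11), (12, 12)),  # deviates: 14 is absent from both parents
    ((10, 11), (10, 11), (12, 13)),  # deviates: both alleles match the father only
]
print(deviation_rate(example_trios))
```

Real genotype calls carry uncertainty, so a practical analysis would likely need a tolerance or confidence intervals rather than exact matching.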


Annear DJ, Vandeweyer G, Elinck E, et al. (2021). 'Abundancy of polymorphic CGG repeats in the human genome suggest a broad involvement in neurological disease.' Sci Rep. 11:2515. (link)


Expanded CGG-repeats have been linked to neurodevelopmental and neurodegenerative disorders, including the fragile X syndrome and fragile X-associated tremor/ataxia syndrome (FXTAS). We hypothesized that as of yet uncharacterised CGG-repeat expansions within the genome contribute to human disease. To catalogue the CGG-repeats, 544 human whole genomes were analyzed. In total, 6101 unique CGG-repeats were detected, of which more than 93% were highly variable in repeat length. Repeats with a median size of 12 repeat units or more were always polymorphic, but shorter repeats were also often polymorphic, suggesting a potential intergenerational instability of the CGG region even for repeats with a median length of four units or less. 410 of the CGG repeats were associated with known neurodevelopmental disease genes or with strong candidate genes. Based on their frequency and genomic location, CGG repeats may thus be a currently overlooked cause of human disease.

Annear et al. 2021
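The relationship between median repeat size and polymorphism described in the abstract can be explored with a per-locus summary such as the one sketched below. It assumes that each locus has a list of repeat-unit counts pooled over all alleles in the cohort, and that a locus counts as polymorphic when more than one distinct length is observed; the abstract does not state the criterion actually used. The locus names and values are hypothetical.

```python
from statistics import median


def summarise_locus(allele_lengths):
    """Summarise repeat-unit counts pooled over all alleles at one locus."""
    distinct = set(allele_lengths)
    return {
        "median_units": median(allele_lengths),
        "distinct_alleles": len(distinct),
        # Assumption: any observed length variation counts as polymorphic.
        "polymorphic": len(distinct) > 1,
    }


# Hypothetical cohort data: locus name -> repeat-unit counts.
cohort = {
    "locus_A": [12, 12, 13, 15, 12, 14],
    "locus_B": [4, 4, 4, 4, 4, 4],
    "locus_C": [4, 4, 5, 4, 4, 4],
}

for name, lengths in cohort.items():
    print(name, summarise_locus(lengths))
```

In this toy data, locus_C shows how a repeat with a median of four units can still be polymorphic, which is the pattern the abstract highlights for short repeats.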