The protein-coding regions of eukaryotic genes are fragmented into exons that, like the genes in which they reside, can be duplicated, deleted, or reorganized. Cataloging and organizing within-gene exon similarities is necessary for a systematic study of exon evolution and its consequences. To facilitate the study of exon duplications, we present Exonize, a computational tool that identifies and classifies coding exon duplications in annotated genomes. Exonize implements a graph-based framework to handle clusters of related exons resulting from repeated rounds of exon duplication, and it classifies the interdependence between duplicated exons or groups of exons across transcripts. By identifying duplication events between exonic and intronic regions, Exonize can also detect unannotated or degenerate exons. To aid in data parsing and downstream analysis, the Python module exonize_analysis is provided. Applying Exonize to 20 eukaryote genomes identifies full-exon duplications in at least 4% of vertebrate genes, with more than 900 human genes having a full-exon duplication event.
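
As a rough illustration of the graph-based idea mentioned above, the sketch below treats exons as nodes and pairwise similarity matches as edges, then reports each connected component as one cluster of related exons. It is a minimal sketch under stated assumptions and not Exonize's implementation or API: the function name cluster_exons, the representation of exons as (start, end) tuples, and the example coordinates are all hypothetical.

```python
from collections import defaultdict


def cluster_exons(matches):
    """Group exons into clusters, one per connected component of the match graph.

    `matches` is an iterable of (exon_a, exon_b) pairs, where each exon is a
    hashable, sortable identifier such as a (start, end) tuple. This is a
    hypothetical representation chosen for illustration only.
    """
    # Build an undirected adjacency map from the pairwise matches.
    adjacency = defaultdict(set)
    for a, b in matches:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    clusters = []
    # Visit nodes in sorted order so the output is deterministic.
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Iterative depth-first traversal collects one component.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(component))
    return clusters


# Hypothetical exon coordinates within a single gene (not real data).
matches = [
    ((100, 250), (400, 550)),
    ((400, 550), (900, 1050)),
    ((1500, 1600), (2000, 2100)),
]
clusters = cluster_exons(matches)
```

In this toy input, the first two matches share an exon and therefore merge into a single three-exon cluster, which is the kind of structure that repeated rounds of duplication would be expected to produce; the third match forms a separate two-exon cluster.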