Sample selection and sequencing

We selected 44 representative potato accessions, 3 of which are publicly accessible3,13, on the basis of phylogenetic relationships of 432 accessions (PRJNA378971, PRJNA394943 and PRJNA766763; genotype information is available at; Supplementary Fig. 1). To infer the phylogeny of the 432 accessions, reads were mapped to the DM v4 reference genome using BWA (0.7.5a-r405)49, and single-nucleotide polymorphisms (SNPs) were then extracted using SAMtools (v.1.9)50 and BCFtools (v.1.9)49. Fourfold degenerate SNPs with base quality ≥ 40 and mapping quality ≥ 30 were fed into IQ-TREE v.2.0.6 (ref. 51), with parameters ‘-st DNA -m 012345 -B 1000’. In addition, two non-tuber-bearing species from the Etuberosum section PG0019 (S. etuberosum) and PG0009 (S. palustre) were chosen to be used in phylogeny inference (Supplementary Table 1). Sequencing of these 44 potato accessions was performed on the Pacific Biosciences Sequel II platform, in the circular consensus sequencing (CCS) mode, and the two Etuberosum species were sequenced on the Pacific Biosciences Sequel II platform, in the continuous long read (CLR) mode. A total of 15.9–38.1 Gb of HiFi reads was generated using CCS ( for the 41 newly sequenced potato accessions. For the construction of Hi-C libraries, DNA was extracted from in vitro seedlings, of which PG5068, PG0019 and E86-69 were digested with the restriction enzyme MboI, and PG6359 was digested with HindIII using the previously described Hi-C library preparation protocol52. These Hi-C libraries were sequenced on an Illumina HiSeq X Ten platform. The total RNA of 23 accessions (Supplementary Table 1) from the tissues of roots, stems, leaves, stolons, tubers and flowers was extracted for the library construction. These libraries were subsequently sequenced on the DNBSEQ-T7 system at Annoroad Gene Technology, which produced around 6 Gb data for each tissue in each sample.

De novo genome assembly of 44 potato and 2 Etuberosum accessions

Genome heterozygosity was estimated using a k-mer-based approach by GenomeScope2.0 (ref. 53). Genomes of the 44 HiFi sequenced accessions were assembled by hifiasm54 (, using default parameters. The initial output of hifiasm (v.0.13) yielded a pair of assemblies: (1) the primary assembly (in hifiasm named p_ctg), representing a mosaic haplotype without purging; and (2) the alternate assembly (in hifiasm named a_ctg), which represents the alternate haplotype absent from the primary one. To facilitate downstream analyses, including inter-genomic alignment and comparison of gene copy numbers, we generated ‘monoploid’ genome assemblies, accompanied by their heterozygous assembled fragments. Haplotigs from the primary assembled contigs, with haplotypes collapsed (p_ctg.*), were then excluded using the purge_dups (v.1.01) software ( to generate the heterozygous-region-purged assemblies, which were then termed as monoploid assembled contigs (MTGs), indicative of monoploid genomes. The raw alternate assemblies from hifiasm (a_ctg.*), in addition to the contigs that have been removed by purge_dups, were concatenated as the alternate assembled contigs (ATGs) to be the heterozygous genomic segments (Supplementary Fig. 2). The two Etuberosum genomes PG0019 and PG0009 were assembled using CANU v1.8 (ref. 55), and then two rounds of Pilon v.1.23 (ref. 56) were applied for genome polishing, using available resequencing data. Pseudo-chromosomes of the seven potato accessions (A6-26, E4-63, PG6359, E86-69, RH, RH10-15 and PG5068) and one Etuberosum accession (PG0019) were built with Hi-C reads, using the Juicer (v.1.5) (ref. 57) and 3D-DNA (v.180922) (ref. 58) pipeline, with parameters ‘-m haploid -I 15000 -r 0’. The assembly completeness in genic regions was evaluated using the solanales_odb10 database (for Solanaceae species) of BUSCO v.4.1.4 (ref. 18), with default parameters.

Identification and annotation of repetitive elements

Transposable elements (TEs) were identified by the Extensive De-Novo TE Annotator (EDTA)59 v.1.9.4, and the non-redundant TE libraries for each accession were passed into RepeatMasker v.1.332 ( to mask potential genomic repeats together with simple repeats and satellites, by default parameters.

Prediction of protein-coding genes

Three distinct strategies, comprising ab initio prediction, homology search and expression evidence, were combined to generate the predicted gene models. HISAT2 (v.2.0.1-beta) (ref. 60) was used to perform splice alignment of RNA-sequencing (RNA-seq) reads to the assembled genomes, with ‘–dta’ parameter. Potential transcripts were then assembled, using StringTie (v.1.3.3b) (ref. 61) with parameter ‘–rf’. BRAKER2 v.2.1.6 (ref. 62) was then run to use the transcript assemblies as hints to generate predicted gene models from AUGUSTUS (v.3.4.0) ( and to train the hidden Markov model (HMM) of GeneMark-ET (v.3.67_lic) (ref. 63). The parameters set in BRAKER2 were ‘–nocleanup –softmasking’.

Non-redundant human-curated plant homologous protein sequences, downloaded from the UniProt Swiss-Prot database (, combined with published peptide sequences of tomato and potato8,10,11,13,35, were used as homologous protein sequences. These and the assembled transcripts from StringTie (v.1.3.3b) were passed to MAKER2 (v.2.31.11) (ref. 64). Putative gene structures were then inferred and subsequently used as the training set to generate the HMM in SNAP (v.2013-02-16) ( MAKER2 was then run again, combining previously generated SNAP HMM, GeneMark-ET HMM and AUGUSTUS tuned species settings, along with the predicted gene models produced from the first round of MAKER2, to synthesize the final gene annotations. The longest transcript of each predicated gene model was considered as the representative.

For gene functional annotation, InterProScan 5.34-73.0 (ref. 65) was applied to predict potential protein domains, based on sequence signatures, with parameters ‘-cli -iprlookup -tsv -appl Pfam’.

Analyses of the protein-coding-gene-based pan-genome

All-versus-all BLASTP (v.2.2.30+) (ref. 66) results of 2,701,787 peptide sequences of protein-coding genes, annotated from 44 potato accessions and the DM v.6.1 reference genome11, were input into OrthoFinder (v.2.5.2) (ref. 67) for gene clustering, in which the MCL algorithm19 was enabled by setting the inflation factor to 1.5, resulting in 51,401 non-redundant pan-gene clusters. We classified those clusters into 4 categories: core gene clusters that were conserved in all the 45 individuals; soft-core gene clusters, which were present in 42–44 samples in our collection; shell gene clusters, which were found in 2–41 accessions; and accession-specific gene clusters, which contained genes from only 1 sample. To facilitate these analyses, if genes from the DM reference were present in one cluster, this gene was selected as the representative; otherwise, the gene with the longest encoded protein was chosen.

Simulation of pan-genome size in terms of number of protein-coding genes was performed by PanGP (v.1.0.1) (ref. 68) using the totally random algorithm, with a number of combinations, at each given number of genomes, of 500, and with the sample replication time set to 30.

Non-synonymous/synonymous substitution ratios (Ka/Ks) within core, soft-core and shell gene clusters were computed using ParaAT (v.2.0) (ref. 69), with parameters ‘-m muscle -f axt -k’. The default parameter of KaKs_Calculator was set to estimate the Ka/Ks values, which means that the Ka/Ks value was the average of the output from 15 available algorithms comprising 7 original approximate methods (NG, method from Nei and Gojobori; LWL, method from Li, Wu and Luo; MLWL, modified method from Li, Wu and Luo; LPB, method from Li, Pamilo and Bianchi; MLPB, modified method from Li, Pamilo and Bianchi; YN, method from Yang and Nielsen; MYN, modified method from Yang and Nielsen), 7 gamma-series methods (γ-NG, γ-LWL, γ-MLWL, γ-LPB, γ-MLPB, γ-YN and γ-MYN) and one maximum likelihood method (GY, method from Goldman and Yang) (ref. 70). To simplify the calculation, we randomly selected 1,500 clusters from clusters of core, soft-core and shell categories. Within each cluster, Ka/Ks values between gene pairs from 50 randomly chosen combinations of 2 accessions were estimated. The non-parametric multiple comparisons Kruskal–Wallis test was used to perform significance analyses for sample median, using the agricolae package in R v.4.0.3 (, as these data did not comply with a normal distribution. Multiple comparisons were performed, using the Fisher’s least significant difference. The level of significance used in the post-hoc test was 0.001. Functional enrichment was performed, using Fisher’s exact tests in R. Those functional classes with P < 0.05 were regarded as significantly enriched.

Whole-genome alignment of 73 Solanum species

Whole-genome alignment of 73 accessions, comprising 44 potato MTGs and the genomes of DM, 2 Etuberosum species, 24 tomato accessions (,, and 2 outgroup species of S. americanum and S. melongena (,44,71,72 were performed by ProgressCactus (v.1.2.3) (ref. 73). The tomato genomes investigated in this study were all built using the third-generation sequencing technique (PacBio CLR and Nanopore) and are all assembled into 12 chromosomes, indicative of their relatively high qualities. The guide tree used in ProgressCactus was inferred by IQ-TREE, v.2.0.6 (ref. 51). To reduce the computation requirement, genome sequences were soft-masked and contigs shorter than 100 kb were discarded. To facilitate downstream analyses, we next used PHAST toolkit v.1.5 (ref. 74) to generate 73-way multi-alignment blocks in fasta format, relative to the DM genome.

Genome comparison of 44 HiFi-assembled potato accessions

The 44 MTGs were aligned to the DM reference genome, using the nucmer program in MUMmer v.4.00rc1 (ref. 75) software with the ‘–mum’ parameter, and alignments with an identity of less than 90% and length shorter than 1,000 bp were discarded. We used a modified version of dotPlotly ( for visualization. To assess the heterozygosity distribution of 41 diploids (excluding 3 homozygous inbred lines), their MTGs and ATGs were split into 5-kb fragments and were aligned to the DM reference genome, using the same approach described above, and alignments shorter than 5 kb were discarded to reduce potential noise.

Identification of syntenic genes

To identify syntenic gene pairs, BLASTP (v.2.2.30+) was used to calculate pairwise similarities (e-value < 1 × 10−5), and MCscanX76 with default parameters was then applied.

Phylogenetic analyses

To build a super-matrix tree of 29 species (32 accessions, in which 4 are from S. tuberosum), amino-acid sequences of the longest transcripts of their annotated gene models were first extracted from the MTG genomes of 25 potatoes, 3 tomato accessions (see Supplementary Table 1 for more details)35,44,71,72, 2 Etuberosum species, S. americanum and eggplant77. All-versus-all alignments were generated using DIAMOND (v. (ref. 78). The results were then passed to OrthoFinder (v.2.5.2)67 to infer orthology. A total of 3,971 single-copy orthologues gene clusters were then generated and 32-way protein alignments for these genes were computed using MAFFT (v.7.471) (ref. 79) with default parameters. Maximum likelihood inference of phylogenetic relationships was performed using IQ-TREE v.2.0.6 (ref. 51), by automatically calculating the best-fit amino-acid substitution model via the ‘-m MFP’ parameter. The consensus tree was generated specifying the number of bootstrap replicates as 1,000 by ultrafast bootstrap approximation80. We also constructed a phylogenetic tree using an additional 20 potato (including DM) and 21 tomato genomes by applying the same approach described above.

To minimize the effect of ILS, we applied a multi-species coalescent-based method incorporated in ASTRAL (v.5.7.8) (ref. 81) to generate a species tree. ASTRAL took 3,971 single-copy gene trees as input and generated a species tree estimated by searching for the species tree that was most congruent with quartets garnered from the input gene trees.

To infer the local phylogeny among the 32 representative accessions, considering the diverse nucleotide evolution rate of coding and non-coding regions, we masked coding regions according to the gene prediction in DM using the maskFastaFromBed command embedded in BEDTools (v.2.29.2) (ref. 82), and repetitive regions were then hard-masked. We split Cactus whole-genome alignment blocks into 100-kb non-overlapping windows and inferred tree topologies for each window, using IQ-TREE51 with the parameter ‘-m GTR’. Next, we filtered the window trees with three standards: (1) fully aligned length > 10 kb; (2) missing rate < 20%; (3) mean bootstrap values > 80. After filtering, we next re-estimated the tree topologies of the retained 1,899 windows, using the selected best substitution model for each window, using ModelFinder implemented into IQ-TREE (ref. 51). To help with visualization, 500 window trees were randomly selected, with an R script modified from a previous report83 ( The consensus tree topology was generated by IQ-TREE51, using concatenated single-copy protein-coding sequences identified by OrthoFinder67.

Estimation of the divergence time

BASEML and MCMCTREE in the PAML package (v.4.9) (ref. 84) were used to estimate the divergence time. To reduce the computational burden, coding sequences (CDSs) of single-copy genes from 10 representative species (S. melongena, S. americanum, PG0019, LA716, LA2093, Heinz 1706, PG6241, PG4042, PG5068 and DM) were selected for a rough estimation of the substitution rate using BASEML with model = 7. MCMCTREE was then applied to estimate the divergence time with parameters ‘model = 7, burnin = 500,000, sampfreq = 100, nsample = 20,000’. The divergence time of potato–tomato (7.3–8.0 Ma)35,85 and potato–eggplant (13.7–15.5 Ma)85,86,87 was used for calibration. The estimation was performed for two rounds and generated very similar results.

ABBA-BABA statistics

On the basis of the genome assemblies, around 20-fold short reads of the 25 representative Petota accessions, 2 Etuberosum species, 3 tomatoes and S. americanum were simulated using WGSc (, and reads were mapped to the outgroup reference genome S. melongena using BWA-mem49 with the default parameters. Bi-allelic SNPs were then identified using SAMtools50 and BCFtools49. Setting S. melongena as the outgroup, an ABBA-BABA test was performed between all possible triplets among potato, tomato and Etuberosum species, using the Dtrios program within Dsuite (v.0.4 r28)88, with the ‘-c’ parameter. The tree topology among these four species, inferred from the whole-genome data in Newick format, was also passed into Dtrios via the ‘-t’ parameter.

Inference of ILS

The level of ILS at a given bi-allelic SNP i from the above mentioned 32-way alignment was calculated as CABBA(i) and CBABA(i) divided by the total count of segregating sites: (CBAAA(i) + CABAA(i) + CAABA(i) + 2(CBBAA(i) + CBABA(i) + CABBA(i)))/3, as described previously27. The tree topology used was (((Lycopersicon, Petota), Etuberosum), S. melongena).

To evaluate the theta value for internal branch, which reflects the level of effective population size89, we divided the mutation units by coalescent units. The mutation units were inferred by IQ-TREE (ref. 51) and the coalescent units were inferred by ASTRAL.

ILS simulation

Simulation of 20,000 gene trees with ILS among six potato accessions (S. tuberosum Group Stenotomum, S. candolleanum, Solanum lignicaule, S. chacoense, Solanum cajamarquense and Solanum bulbocastanum) were performed by DendroPy (ref. 90). The branch lengths of the estimated species tree by ASTRAL were used as an input. Frequencies between the observed and simulated gene-tree topologies from all possible four-species groups among the six potato species were plotted. The correlations were computed using the correlation function ‘cor()’ in R using the ‘pearson’ method.

Identification of SVs

Both read-mapping-based and assembly-based approaches were applied to identify SVs (≥50 bp in length). SVIM (v.1.4.2) (ref. 91) was used to identify putative SVs, consisting of insertions, deletions, inversions, duplications and translocations. SVs with quality ≥ 10 and number of supported reads ≥ 5 were kept. Assembled genomes of each accession were first aligned to the DM v.6.1 reference using the nucmer program embedded in MUMmer v4.00rc1 (ref. 75), with the following parameters: ‘–batch 1 -c 500 -b 500 -l 100’. The alignments in delta format were passed to the delta-filter program to retain highly reliable alignments with length ≥ 100 bp and identity ≥ 90%. Assemblytics (v.1.2.1) (ref. 92) was subsequently applied to identity SVs from the filtered alignments, setting the minimum SV size to 50 bp. To make the false positive rate in our SV dataset as low as possible, we only kept SVs in terms of insertions, deletions, inversions, duplications and translocations < 10 kb in size, identified by SVIM. For SV ≥ 10 kb, only insertions and deletions reported in Assemblytics were retained. The two SV datasets for each sample were then combined, using SURVIVOR (v.1.0.7)93 merge with parameters ‘0 1 1 1 0 50’.

To detect megabase-scale inversion events among the 20 landraces and 4 CND accessions, we applied ragtag (v.2.1.0) (ref. 94) with the default parameters, to order and orient the contig-level assemblies into 12 chromosomes, using the DM genome as the reference. Inversions were next identified using SyRI (v.1.4) (ref. 95) with parameters ‘-k -F S’. Only those inversions that located in a single contig were retained for downstream analyses.

Identification of hemizygous genes

To identify regions present in MTGs but absent in ATGs, we mapped HiFi reads of each accession to its corresponding MTGs using minimap2 (v.2.21-r1071) (ref. 96), and heterozygous deletions were detected using SVIM (v.1.4.2) (ref. 91) with default parameters (length ≥ 50 bp, quality ≥ 10, number of supported reads ≥ 2). To identify sequences present in ATGs but absent in MTGs, we aligned ATGs to MTGs from each accession and extracted the inserted regions using Assemblytics92 with parameters ‘unique_anchor_length = 10,000, min_variant_size = 50, max_variant_size = 10,000,000’. These results were merged as heterozygous SVs, and genes overlapping with those SVs were considered as hemizygous genes, as applied in a previous report97.

Analyses of recombination events

Breakpoints of crossing-overs were inferred based on the 624 selfing progenies of PG6359 (ref. 3), using a method described previously7.

Reannotation and classification of nucleotide-binding resistance genes

NLR-annotator (v.0.7) (ref. 98) was first applied to identify genomic segments containing putative nucleotide-binding domain and leucine-rich repeat (NLR) genes for each accession. A total of 7,007 amino-acid sequences of high-confidence NLR gene models, downloaded from resistance gene enrichment sequencing (RenSeq)-based NLR genes of 15 tomato accessions99, putative NLR genes in Arabidopsis100 (annotation version Araprot11) and experimentally validated NLR genes obtained from PRGdb 3.0 (ref. 101) and RefPlantNLR102, were used as homologous protein evidence. Training sets from SNAP and AUGUSTUS for each accession were then inputted to MAKER2, together with the assembled transcripts, in GFF3 format, and the homologous proteins, to predict and synthesize the final gene models. The reannotated NLR gene models were then integrated with the whole-genome annotation results, and originally predicted genes overlapping with our reannotated NLRs were removed to avoid redundancy.

To examine the completeness of NLR loci generated by our pipeline, we produced three NLR datasets of tomato accession ‘Heinz 1706’ from the predicted high-confidence and representative gene models (annotation version ITAG 4.0), predicted models using our pipeline, and previously reported RenSeq-derived models99, respectively. The RGAugury (v.2.2) (ref. 103) pipeline was then used to classify putative nucleotide-binding site (NB-ARC) domain-encoding genes into different subgroups, on the basis of domain and motif structures: TN (Toll/interleukin-1 receptor (TIR) and NB-ARC), CN (coiled-coil (CC) and NB-ARC), NL (NB-ARC and leucine rich repeat (LRR)), CNL (CC, NB-ARC and LRR), NB (NB-ARC), TNL (TIR, NB-ARC and LRR).

For identification of putatively expanded NLR clusters in potato, the annotated NLR loci from 45 potato, 22 tomato and 2 Etuberosum genomes were classified into clusters, using the method described in the pan-genome analysis. The NLR copy numbers in potato, tomato and Etuberosum accessions, in each cluster, were compared by Wilcoxon rank-sum test using the R package exactRankTests. The clusters with P < 0.05 were extracted as the expanded clusters. For the potato-specific NLR analyses, the NLR protein sequences from 2 Etuberosum species and 22 tomato species were merged together, as a query, to blast against those from the 45 potato species. If the best hit of a potato NLR showed an identity < 75, the NLR was defined as potato-specific. NLRs with transcripts per kilobase of exon model per million mapped reads (TPM) ≥ 1 in potato stolon or tuber were extracted and considered as expressed in these tissues.

Gene expression profiling

RNA-seq reads of 23 accessions (Supplementary Table 1) as well as DM (SRA030516) were mapped to the corresponding assembled genome, using HISAT2 (v.2.0.1-beta) (ref. 60), with parameters ‘-x –dta’. StringTie (v.1.3.3b) (ref. 61) was applied to compute the expression level for each predicted gene in terms of TPM values, using ‘-e -G’ parameters.

Analyses of CNSs

Tools embedded in the PHAST package (v.1.5) (ref. 74) were used for CNSs analyses. The msa_view program was applied to extract fourfold degenerate synonymous sites and to prepare sufficient statistics, on the basis of multiple alignments and CDS annotation of the reference genome. PhyloFit was then used to train the un-conserved model, with sufficient statistics generated by msa_view. phastCons, with the parameter ‘–most-conserved’ used to identify conserved regions and assign an odds score for each site. Finally, conserved regions containing gaps and overlapping with CDS were removed to generate CNSs shared among potato species. To further remove CNSs shared within outgroup species, we identified CNSs from genomes of 45 potato, 24 tomato and 2 Etuberosum species, using the same pipeline presented above. The potato conserved CNSs that shared sequences with tomato and Etuberosum species were removed. In addition, short sequences (<5 bp) were excluded and sequences that were located within 10 bp of each other were merged to generate the final potato-specific CNSs. ChIPseeker v.1.24.0 (ref. 104) was adopted to annotate these CNSs, in which sequences 3 kb upstream from the start codon or 3 kb downstream from the stop codon of a certain gene were defined as promoter or downstream regions. Genes possessing CNSs within their promoters, introns, upstream regions, downstream regions or UTRs were defined as CNS-associated genes. pyGenomeTracks v.3.6 was applied to visualize the CNS region105. Sequences flanking IT1 CNS regions were extracted from the 71-way multiple alignment and were imported into MView (v.1.67) (ref. 106) to generate the multiple comparisons figure.

Generation of it1 mutants

The diploid S. tuberosum Group Phureja S15-65 clone was used for gene editing in this study. The 19-nucleotide single-guide RNA sequence for IT1 from the S15-65 clone was incorporated into the CRISPR–Cas9 vector pKSE401 (ref. 107). Three-week-old plantlets were used for transformation. Agrobacterium-mediated transformation of potato internodes was conducted as previously described6: after two days of pre-culture, the explants were co-cultured with Agrobacterium containing pKSE401 with the target sequence for two days, in the presence of 2 mg l−1 α-naphthaleneacetic acid and 1 mg l−1 zeatin, followed by callus induction and regeneration mediated by 0.01 mg l−1 α-naphthaleneacetic acid and 2 mg l−1 zeatin until shoot proliferation. Positive transformants were screened on the basis of growth on the medium containing 50 mg l−1 kanamycin.

Yeast-two-hybrid library screening

The cDNA of S15-65 stolons was used to construct the cDNA library for yeast-two hybrid by using the CloneMiner II cDNA Library Construction Kit. The library was screened, with the IT1 as bait, according to the Matchmaker Gold Yeast Two-Hybrid System User Manual. To further validate the interaction between IT1 and SP6A, the CDSs of IT1 and SP6A were inserted into pGADT7 and pGBKT7 vectors, respectively, and co-transfected into the yeast strain AH109, and the yeast cells were then plated on SD/−Leu/−Trp medium and SD/−Leu/−Trp/−His medium and cultivated at 30 °C for 5 days.

Firefly luciferase complementation imaging assay

The CDSs of SP6A and IT1 were fused in the pCAMBIA-nLUC-GW and pCAMBIA-cLUC-GW vectors, respectively108. The vectors were transformed into Agrobacterium strain GV3101, and infiltrated into Nicotiana benthamiana leaves. After 3 days, the detached leaves were sprayed with 100 mM luciferin and kept in the dark for 10 min. The leaves were observed under a low-light cooled charge-coupled device imaging apparatus, Lumazone 1300B (Roper Bioscience).

Quantitative PCR analysis of SP6A expression

The seeds of potato inbred line E4-63 and Etuberosum species PG0019 were planted in soil and cultivated under long-day conditions (16-h light, 8-h dark) for one month, and then half of the plants were transferred to short-day (8-h light, 16-h dark) conditions. When flower buds developed in the long-day plants (usually two months after sowing), the fourth leaf of both long-day and short-day plants was harvested at ZT4 to investigate SP6A gene expression. The total leaf RNA was extracted using an RNAprep Pure Plant Kit (DP441), and a PrimeScript RT Reagent Kit (RR047A) was used to reversely transcribe the RNA to cDNA. Quantitative PCR was performed using SYBR Premix Ex Taq II (RR820A) on a 7500 Fast Real-Time PCR system (Applied Biosystems), according to standard instructions. EF1A was used as the internal reference. The specific primers used are listed in Supplementary Table 17.

Syntenic analyses of R3a and SP6A loci

To plot syntenic relationships around the R3a locus, collinear blocks between the given two species were identified by MCScanX (Python version)109. Syntenic genes around R3a loci were extracted and plotted using in-house R scripts. For a synteny plot of the SP6A loci, the SP6A genomic sequences from different species were extracted and aligned using MAFFT, with ‘–auto’ parameter79. In-house Python scripts were used to transfer aligned regions between two species to the BED format required by MCScanX. The micro-synteny plot between the two species was then generated. Finally, synteny plots among different species were merged using Adobe Illustrator.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Source link