### Collection of morphological data

We obtained morphological data for 36 adult specimens representing 22 species (Fig. 2a) from frozen cadavers acquired from the Cowan Tetrapod Collection at the Beaty Biodiversity Museum (University of British Columbia, Vancouver, Canada). Sample size was a function of the availability and quality of specimens from the museum as we could only rely on fully-intact, well-preserved specimens. The cadavers were inspected to ensure adequate condition and completeness, after which we measured the full body mass, wingspan, and body length. Next, we disarticulated the wing at the shoulder joint, taking care to ensure that each wing’s skin, propatagial elements, and feathers remained intact. One wing from each cadaver was used to determine wing ROM and corresponding wing shape change (see ‘Determination of the elbow and wrist ROM’). The cadaver was further dissected to obtain length and mass measurements for the head, neck, torso, wing components, legs, and tail (refer to Supplementary Methods for details on each measurement). We obtained the centre of gravity coordinates for the torso (body without head, neck, tail, wings) by manually balancing the torso and measuring the distance from the clavicle reference point to the balanced position. Note that because of the preservation of the storm petrel specimens, we estimated the mass on the basis of humerus bone length and the torso centre of gravity as being proportional to that of the gull. Finally, we individually weighed and photographed each flight feather, enabling geometric parameters to be extracted using ImageJ software^{30}. Refer to the publicly available data for details on all assumptions used for extracting the morphological measurements. Note that this study consisted of a single experimental group and thus randomization and blinding was not necessary.

### Determination of the elbow and wrist ROM

To determine the wing ROM and corresponding shape change, we actuated the cadaver wings throughout the full range of extension and flexion of the elbow and wrist joints by hand (following methods established by Baliga et al.^{11}, Fig. 1g). We tracked the location of 10 reflective markers each 4 mm in diameter (grey and white points in Fig. 1c–f, refer to Supplementary Methods for details) with automated 3D data capture at 30 frames per second using a 4- or 5-camera tracking system (OptiTrack, NaturalPoint). Using tools from NaturalPoint, each recording was calibrated to have less than 0.5 mm overall mean reprojection error. Joint angles were calculated as the interior angle defined by three key points: points 1, 2 (vertex) and 3 for the elbow, and points 2, 3 (vertex) and 4 for the wrist (Supplementary Methods).

### Developing AvInertia

We developed an open source R package (AvInertia) to calculate the centre of gravity and moment of inertia tensor (**I**) for any flying bird (Fig. 1a) in RStudio^{31} (version 1.3.1093) running R^{32} (version 4.0.3). A high-level overview of the code methodology follows in this section. Further details are provided in the Supplementary Methods, as each individual component of the avian models required specific procedures and approximations.

To allow a generalized approach, we used a common methodology from mechanics to estimate the centre of gravity and inertia components using simple geometric shapes^{9}. We elected to use as many elements as possible to allow the best resolution. For each species, we first modelled the bird’s body without the wings as a composite of five components: head, neck, torso, legs and tail. To determine the inertial properties of the wings, we aligned each wing configuration extracted from the ROM measurements so that the wrist joint was in line with the shoulder joint along the *y* and *z* axes and so that the wrist joint was aligned with the first secondary feather (S1) along the *x* axis (extended wing: Fig. 1c, d; folded wing: Fig. 1e, f). Note that this positioning results in a different shoulder angle between each wing configuration and wings with extremely low elbow angles and high wrist angles being positioned at substantially different incidence angles than the body. Each wing was then modelled as a composite of twelve components: bones (humerus, radius, ulna, carpometacarpus/digit, radiale and ulnare), muscles (brachial, antebrachial and manus groups), skin, coverts, and tertiary feathers. In addition, each primary and secondary feather was modelled and positioned individually as a composite structure of five components: calamus, rachis (cortex exterior and medullar interior), and distal and proximal vanes. AvInertia permits a variable number of flight feathers. With our methodology, a bird with 10 primaries and 10 secondaries that flies with an extended neck will be represented by a composite model with 232 individual simple geometric shapes. In our study, we investigated only symmetric wing configurations for a full bird and considered the effects of a single wing independently. We assumed that anisotropic effects such as the air space within the body would have a minimal impact on the overall centre of gravity^{33}.

To calculate the final inertial characteristics of this composite bird, each component’s shape, mass, and positioning was informed by its corresponding morphological measurements. We began by determining the centre of gravity and **I** for one of the basic geometric shapes with respect to an origin and frame of reference that simplified the formulation of the centre of gravity and **I** for that shape. Next, AvInertia computed the mass-weighted summation of the centre of gravity of each object and shifted the origin to the bird reference point, located at the centre of the spinal cord when cut at the clavicle. The centre of gravity was then transformed into the full bird frame of reference, which is defined by Fig. 1c–f. We used the parallel axis theorem and the appropriate transformation matrices to transform **I** to be defined about the final centre of gravity within the full bird frame of reference.

### Validating AvInertia

We validated our methodology by comparing the maximum rotational inertia about the roll axis for a single wing (({{I}_{{xx}}}_{{rm{wing}}}), origin at the humeral head) to data from previous experimental studies that measured ({{I}_{{xx}}}_{{rm{wing}}}) by cutting an extended wing into strips^{25,34} (Fig. 1h). Our 95% confidence intervals on the exponent of body mass marginally overlapped with Berg and Rayner’s predictions^{25} but were significantly lower than Kirkpatrick’s predictions^{34}. However, Kirkpatrick used 10 wing strips while Berg and Rayner later found that at least 15 strips were necessary to minimize systematic error^{25,34}. Next, we directly compared results for the pigeon (*Columba livia*), the only species in common between the studies, and found ({{I}_{{xx}}}_{{rm{wing}}})(×10^{4}) was between 1.42 and 1.92 kg m^{2}, which encompasses values from previous studies^{25,34,35} (1.72 and 1.83 kg m^{2}). The pigeon wing’s maximum centre of gravity position along the *y*-axis (({y}_{{bar{{rm{C}}}{rm{G}}}_{{rm{w}}{rm{i}}{rm{n}}{rm{g}}}})) was only 3% of the half span more proximal than Berg and Rayner’s measurement^{25}. We expect minor differences because strip methods enforce that all wing mass is contained within the *x–y* plane while AvInertia accounts for out-of-plane morphology (Fig. 1h).

### Agility and stability metrics

We developed a pitch agility metric that estimates the change of the angular acceleration about the *y* axis ((triangle dot{q}), known as the time rate of change of the pitch rate) due to a degree change in the angle of attack (Δ*α*) as:

$$frac{{rm{bigtriangleup }}dot{q}}{{rm{bigtriangleup }}alpha }propto frac{left[left({left(frac{{mathop{x}limits^{ sim }}_{c/4}}{{c}_{{r}_{max}}}right)}^{0.8}{c}_{{r}_{max}}right)-{x}_{{rm{C}}{rm{G}}}right]{({m}^{0.12})}^{2}{S}_{max}}{{I}_{yy}}$$

(3)

Where *m* is the body mass, *c*_{r}_{max} is the maximum root chord for the specimen, *S*_{max} is the maximum single wing area for the specimen, *x*_{CG} is the centre of gravity position on the *x*-axis measured from the humeral head, and ({widetilde{x}}_{c/4}) is the quarter chord of the standard mean chord^{37} (defined in equation (7)). This equation was derived beginning from the rigid aircraft *y* axis rotational equation of motion assuming a symmetric configuration undergoing small disturbances^{5}:

$$triangle M={I}_{{yy}}triangle dot{q}$$

(4)

From this equation, we estimated the change in pitching moment (Δ*Μ*) with a Taylor series expansion method assuming that the largest effect is due to angle of attack and then non-dimensionalized as follows^{5}:

$$Delta M=frac{{rm{partial }}M}{{rm{partial }}alpha }Delta alpha =frac{1}{2}rho {V}^{2}(2{S}_{max}){c}_{{r}_{max}}frac{{rm{partial }}{C}_{M}}{{rm{partial }}alpha }Delta alpha =frac{1}{2}rho {V}^{2}(2{S}_{max}){c}_{{r}_{max}}frac{{rm{partial }}{C}_{M}}{{rm{partial }}{C}_{L}}frac{{rm{partial }}{C}_{L}}{{rm{partial }}alpha }Delta alpha $$

(5)

Where *ρ* is air density, *V* is the freestream scalar velocity, and *C*_{M} and *C*_{L} are the coefficients of pitching moment and lift, respectively. Because the pitching moment slope ((frac{{rm{partial }}{C}_{M}}{{rm{partial }}{C}_{L}})) is proportional to static margin^{16,36}, we estimated each configuration’s neutral point (indicated by the subscript NP) using our previous morphing gull wing-body aerodynamic results (see Supplementary Methods). This analysis revealed that the neutral point for a wing-body configuration scaled with:

$$,frac{{x}_{{rm{NP}}}}{{c}_{{r}_{{rm{max }}}}}approx {left(frac{{widetilde{x}}_{c/4}}{{c}_{{r}_{{rm{max }}}}}right)}^{0.8}$$

(6)

({widetilde{x}}_{c/4}) is the quarter chord of the standard mean chord defined as^{37}:

$${widetilde{x}}_{c/4}=frac{{int }_{0}^{b/2}c(,y,){x}_{c/4}(,y,){rm{d}}y}{{int }_{0}^{b/2}c(,y,){rm{d}}y}$$

(7)

Where *b* is the wingspan and, *c* and *x*_{c/4} are the chord and quarter chord location as a function of the span position (*y*), respectively. This equation was evaluated numerically for each of the bird wings modelled with 1,000 segments. In addition, we performed a sensitivity analysis on the exponent (see Supplementary Methods). With the estimated neutral point, we calculated the static margin as:

$${rm{S}}{rm{t}}{rm{a}}{rm{t}}{rm{i}}{rm{c}},{rm{m}}{rm{a}}{rm{r}}{rm{g}}{rm{i}}{rm{n}}=-frac{{rm{partial }}{C}_{M}}{{rm{partial }}{C}_{L}}=frac{{x}_{CG}-left({left(frac{{mathop{x}limits^{ sim }}_{c/4}}{{c}_{{r}_{max}}}right)}^{0.8}{c}_{{r}_{max}}right)}{{c}_{{r}_{max}}}$$

(8)

Refer to Supplementary Methods for further details pertaining to the aerodynamic assumptions.

For the pitch agility metric, we incorporated a previously established allometric scaling^{26} of cruise velocity ((Vpropto ) *m*^{0.12}). We assumed a constant air density (*ρ*) and constant lift slope ((frac{{rm{partial }}{C}_{L}}{{rm{partial }}alpha })) across species to obtain the final proportional relationship as:

$${rm{bigtriangleup }}Mpropto left[left({left(frac{{mathop{x}limits^{ sim }}_{c/4}}{{c}_{{r}_{max}}}right)}^{0.8}{c}_{{r}_{max}}right)-{x}_{{rm{C}}{rm{G}}}right]{({m}^{0.12})}^{2}{S}_{max}{rm{bigtriangleup }}alpha $$

(9)

This result was then returned to equation (4) and rearranged to obtain the pitch agility metric as seen in equation (3).

### Phylogenetic and statistical analyses

All phylogenetically informed analyses were carried out using the time-calibrated maximum clade credibility tree from Baliga et al.^{11}, which was pruned to the 22 focal taxa in this study. To determine the linear trends with body mass, we fit first-order phylogenetic generalized linear mixed models (PGLMM) to the data using the R package MCMCglmm^{38} where the random effects are informed by the phylogeny (Extended Data Table 1). Note that the linear trend of the pitch agility range with body mass remains significant even if the storm petrels are removed from the data. All PGLMM models had priors specified with the inverse Wishart scaling parameters **V** = 1 and *ν* = 0.02 and used 1.3 × 10^{7} Markov chain Monte Carlo iterations. As visualized by Fig. 2f, we cannot definitively exclude the possibility that the lower 95% confidence interval on ({x}_{bar{{rm{C}}}{rm{G}}}) may be positive which would indicate that ({x}_{bar{{rm{C}}}{rm{G}}}) scales greater than isometric predictions. However, multiple MCMCglmm runs returned an insignificant result. To determine the significance and effect of the elbow and wrist on the centre of gravity and **I** components, we independently fit first order interactive models to each specimens’ data with a constant scaling on the elbow and wrist angle. We calculated the effect size of the elbow and wrist using the R package effectsize^{23} and independently fit first order interactive models to each specimens’ data with scaled and mean centred elbow and wrist angles.

Next, to investigate the phenotypic optimum of the pitch agility and stability traits(,) we independently fit both Brownian motion and Ornstein Uhlenbeck models to the absolute data using the R package geiger^{39}. We assumed that all species belong to the same regime and thus, fit single-peak evolutionary models. This analysis revealed that there was evidence that the Ornstein Uhlenbeck model was a better model fit for all three of our selected traits (({x}_{bar{{rm{C}}}{rm{G}}}) maximum and minimum static margin) due to a lower Akaike information criterion with correction for smaller sample sizes (AICc). Because of the smaller sample size of our study^{40}, we ran a Monte Carlo simulation (*n* = 5,000) with the R package pmc^{41} to validate that selecting the Ornstein Uhlenbeck model over the Brownian motion model was appropriate (Extended Data Fig. 2). This method returns a distribution of likelihood ratios (twice the difference of the maximum log likelihood for each model) when the traits have been simulated *n* times under each model. These distributions are then compared to the observed likelihood ratio (black dashed vertical lines in Extended Data Fig. 2). For details, refer to Boettiger et al.^{41}. We found that the likelihood ratio predicted by a Brownian motion model was more extreme than the observed ratio for the minority of simulations (({x}_{bar{{rm{C}}}{rm{G}}}):0.2%, maximum static margin: 0.1%, minimum static margin: 1%). Further we had sufficient power to differentiate the two models as the majority of the simulations under the Ornstein Uhlenbeck model fell outside of 95th percentile of the Brownian motion distribution (({x}_{bar{{rm{C}}}{rm{G}}}) :73.8%, maximum static margin: 77.2%, minimum static margin: 67.2%). 95% confidence intervals were constructed for each reported metric of each trait (Extended Data Table 2). Together these results provide confidence that the observed likelihood ratio of each trait is more likely to occur under an Ornstein Uhlenbeck model than a Brownian motion model.

### Sensitivity analysis

Because both the pitch agility and stability metrics directly depend on ({x}_{{rm{C}}{rm{G}}}), we investigated the sensitivity caused by shifting the combined torso and tail centre of gravity forwards and backwards by up to 15% of the torso. Note that for some species there was a physical limit to the ability to relocate the centre of gravity while maintaining the known morphological properties and if the shifted distance was larger than 4 cm we removed it from the analysis as that was assumed to be an overestimate. The final estimated shift of the relative maximum and minimum static margin is shown in Extended Data Fig. 4. This sensitivity analysis revealed a minor effect on the parameters.

Finally, we wanted to investigate the potential effect of error in our measured centre of gravity metric on our key evolutionary results. To this end, we used a custom bootstrapping code (*n* = 5,000) and randomly sampled (with replacements) from each specimen’s centre of gravity error range used for the sensitivity analysis to recalculate the mean value of the minimum and maximum static margin for each species. With each of these new trait distributions, we re-fit an Ornstein Uhlenbeck model and extracted the optimal phenotype (Extended Data Fig. 3). We found that even allowing for this substantial centre of gravity error, all minimum static margin cases had an unstable optimum and all maximum static margin cases had a stable optimum (Extended Data Fig. 3). Note that this analysis is equivalent to both accounting for the same magnitude shift in the neutral point with a fixed centre of gravity as well as accounting for possible inter-specific variation within the error bounds shown in Extended Data Fig. 4.

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.