On the indeterministic nature of star formation on the cloud scale

: Molecular clouds are turbulent structures whose star formation eﬀiciency (SFE) is strongly affected by internal stellar feedback processes. In this paper, we determine how sensitive the SFE of molecular clouds is to randomized inputs in the star formation feedback loop, and to what extent relationships between emergent cloud properties and the SFE can be recovered. We introduce the YULE suite of 26 radiative magnetohydrodynamic simulations of a 10 000 solar mass cloud similar to those in the solar neighbourhood. We use the same initial global properties in every simulation but vary the initial mass function sampling and initial cloud velocity structure. The final SFE lies between 6 and 23 per cent when either of these parameters are changed. We use Bayesian mixed-effects models to uncover trends in the SFE. The number of photons emitted early in the cluster’s life and the length of the cloud provide the strongest predictors of the SFE. The H II regions evolve following an analytic model of expansion into a roughly isothermal density field. The more eﬀicient feedback is at evaporating the cloud, the less the star cluster is dispersed. We argue that this is because if the gas is evaporated slowly, the stars are dragged outwards towards surviving gas clumps due to the gravitational attraction between the stars and gas. While star formation and feedback eﬀiciencies are dependent on non-linear processes, statistical models describing cloud-scale processes can be constructed. ABSTRACT Molecular clouds are turbulent structures whose star formation efﬁciency (SFE) is strongly affected by internal stellar feedback processes. In this paper, we determine how sensitive the SFE of molecular clouds is to randomized inputs in the star formation feedback loop, and to what extent relationships between emergent cloud properties and the SFE can be recovered. We introduce the YULE suite of 26 radiative magnetohydrodynamic simulations of a 10 000 solar mass cloud similar to those in the solar neighbourhood. We use the same initial global properties in every simulation but vary the initial mass function sampling and initial cloud velocity structure. The ﬁnal SFE lies between 6 and 23 per cent when either of these parameters are changed. We use Bayesian mixed-effects models to uncover trends in the SFE. The number of photons emitted early in the cluster’s life and the length of the cloud provide the strongest predictors of the SFE. The H II regions evolve following an analytic model of expansion into a roughly isothermal density ﬁeld. The more efﬁcient feedback is at evaporating the cloud, the less the star cluster is dispersed. We argue that this is because if the gas is evaporated slowly, the stars are dragged outwards towards surviving gas clumps due to the gravitational attraction between the stars and gas. While star formation and feedback efﬁciencies are dependent on non-linear processes, statistical models describing cloud-scale processes can be constructed.


INTRODUCTION
Stars form in dense, turbulent, self-gravitating clouds. The most massive stars produce large quantities of energy that drive outflows from the cloud, creating time-and space-dependent feedback loops in the cloud's star formation cycle (see reviews by, e.g. Mac Low & Klessen 2004;Ballesteros-Paredes et al. 2007;. The path a star-forming system takes between its initial and final states depends on a number of non-linear processes that are dynamically linked. The (magneto)hydrodynamic equations and the Poisson equation governing self-gravity underlying the dynamics of the cloud are non-linear. The gravitational collapse of turbulent structures in a cloud based on these equations gives rise to a statistical distribution called the Core Mass Function (IMF), which in turn leads to a stellar initial mass function (IMF). Stars produce quan-⋆ E-mail: sam.geen@uni-heidelberg.de tities of radiation and kinetic outflows dependent on the evolution of their atmospheres, and this feeds back into the cloud, affecting future structure evolution and star formation. Eventually, this feedback loop is terminated when all of the gas in the cloud is dispersed and only the young star cluster remains.
In this paper, we randomize the sampling of the stellar IMF and the turbulent velocity field of a single star-forming cloud, determine how much this affects the final mass in stars formed by the cloud, and identify whether there are any emergent linear trends between the early state of the cloud and the final mass of stars formed.
Coupling the 1D analytic and 3D hydrodynamic approaches into a single theory of star formation is somewhat difficult. Analytic models must make certain simplifications that omit features of the full 3D approach, with the ansatz that these features are unimportant in describing the full cycle of star formation in clouds. Meanwhile, in numerical simulations it is much harder to understand the underlying causal relationships between the modelled system components due to the complexity of the full set of equations being solved. Simulations are considerably more expensive than simpler models, and performing multiple simulations that capture the full parameter space of the problem is often prohibitively expensive.
The goal of this paper is to explore how sensitive star formation is to small changes in model inputs by performing multiple realizations of the same cloud with different choices of randomly sampled input parameters, and identifying relationships between input and emergent properties of the cloud and the final SFE.
The problem of statistical variation in astrophysical systems has been studied in different domains. Goodwin, Whitworth & Ward-Thompson (2004) investigate the effect of levels of turbulence on the formation of N-body star cluster simulations. Meanwhile, Girichidis et al. (2011Girichidis et al. ( , 2012a explore the role of cloud properties such as the density profile and turbulence inside the cloud on resulting cloud properties. The SLUG framework (da Silva, Fumagalli & Krumholz 2012, 2014Krumholz et al. 2015) has quantified the effect of stochastic IMF sampling on cluster properties such as star formation rate (SFR) tracers and photometry. Hoffmann et al. (2017) investigate the problem in the context of planet-forming discs. On a larger scale, Keller et al. (2018) study the effect of stochastic noise on cosmological galaxy formation simulations.
We focus in this case on the mass of massive stars sampled from the stellar IMF and the seeding of the initial turbulent velocity field in the cloud. We ask two broad questions.
One, for a given system, how much variation in the resulting global properties of the cloud is there when these two processes are sampled differently?
Two, can we naïvely recover any trends from the resulting cloud structure, or is the evolution of molecular clouds dominated by nonlinear processes?
We split this paper into the following sections. In Section 2, we present the simulations performed in this study and the numerical setup used. In Section 3, we discuss the global properties of each of our simulations and the evolution of the clouds over time. In Section 4, we identify trends in the cloud using a statistical model and suggest which emergent properties of the system can be used to explain the resulting SFE of the cloud. Finally, in Section 5, we discuss our results and provide scope for future work on this topic.

NUMERICAL SIMULATIONS
In this section, we introduce the YULE suite of simulations used in this paper. These are listed in Table 1. The suite is divided into two groups of 13 simulations, labelled STARS and TURB. 1 All of these simulations are given names with three letter codes, since there is no preferential ordering to the randomly sampled simulation parameters. For all of the simulations in this paper, we use the radiative magnetohydrodynamic (RMHD) Eulerian adaptive mesh refinement (AMR) code RAMSES (Teyssier 2002;Fromang, Hennebelle & Teyssier 2006;Rosdahl et al. 2013).

Initial conditions
Each of our simulations has an identical initial setup. Each cloud has an initial gas mass of 10 4 M ⊙ . The density distribution is identical to the 'L' cloud in Geen et al. (2017). In that paper, we show that the cloud's density structure is very similar to that of clouds in the nearby Gould Belt (Poppel 1997), < 500 pc away from the Sun. The cloud has an initially spherically symmetric density profile, with an isothermal profile up to a radius r ini = 7.65 pc. See Iffrig & Hennebelle (2015) and Geen et al. (2015bGeen et al. ( , 2016 for more detailed information about the initial density profile. A uniform sphere is placed out to 2r ini with a density 0.1 times that just inside r ini .The total length of the cubic volume simulated is 16 times r ini , or 122 pc. The cloud is initially virialized, with a turbulent velocity field. The initial field has a Kolmogorov power spectrum (P(k) ∝ k −5/3 ) with random phases. We generate 13 velocity fields, each with a different random seed. Labels identifying these fields are listed in Table 1. The global freefall time in our cloud t ff is 4.2 Myr.
The initial ratio of t ff to sound crossing time t sound on the cloud scale is 0.15 (note that for Jeans collapse, t ff < t sound ). The ratio of t ff to the crossing time of turbulent flows t rms , defined by the rms velocity V rms ,is2.
The initial ratio of t ff to Alfvén crossing time in the cloud is 0.2. The Alfvén crossing time is defined as R √ 4πρ/B, so a higher ratio for a given cloud means a larger magnetic field. The precise value of B is density dependent and evolves as gravity and HII regions compress structures in the cloud. Initially, the magnetic field is pointed along one coordinate axis with a peak strength of 457 μG, though over time the peak field strength rises to around 100 mG in the densest structures (Pellegrini et al. 2009, find magnetic field strengths of a few hundred mG around massive stars in Orion). We will explore the evolution of the magnetic field in the simulation more fully in future work.

Cloud evolution and sink formation
Each simulation is performed on an adaptively refined octree, in which each cell in a cubic grid with size 2 lmin is recursively subdivided into eight evenly sized cells up to an effective resolution of 2 lmax cells. Here, the minimum level l min = 7, giving a root grid of 128 3 cells, mainly designed to capture a large empty volume around the cloud that traces outflows. We refine a sphere of diameter 8r ini for a further two levels. For an additional three levels, up to l max = 12, any cell in the simulation volume is refined if it is either Jeans unstable by a factor of 10 over the Jeans stability limit, or more massive than 0.25 M ⊙ . The maximum spatial resolution is 0.03 pc, or 122 pc/2 12 .Att ff , our simulations typically have ∼10 6 cells at level 9 ( x = 0.24pc) and ∼20 000 cells at the highest level of refinement. Table 1. List of simulation names used in this paper. The names have no preferential ordering since the quantities being varied are sampled at random. Shortened names used for referencing are given in parentheses. The STARS simulations are run using the same turbulent initial velocity fields as in Geen et al. (2017). The TURB simulations use the same IMF sampling as in the HUR STARS simulation. Lists of stellar masses appearing in each cloud in the STARS set are given in Appendix A. The cloud is 'relaxed' by simulating without self-gravity for 0.5 t ff , so that the turbulent velocity and initially spherically symmetric density fields can couple (see Klessen, Heitsch & Mac Low 2000;Lee & Hennebelle 2016, amongst others). After this time we apply self-gravity to the cloud.
Once gas cells reach at least 0.1 of the Jeans density at the highest refinement level, we identify them in clumps. This is done by identifying contours from high to low density via the 'watershed' method, and then merging clumps identified in this first pass into larger structures by locating saddle points in the density field . If a clump exceeds the Jeans density, it forms a sink particle, which accretes 90 per cent of the mass above this density threshold . Sink particles have a variable mass since they accrete over time, but the average sink mass by the end of a simulation is between 50 and 100 M ⊙ .

Radiative transfer and cooling
Ionizing photons are propagated across the full adaptive grid using the M1 method described in Rosdahl et al. (2013). We use three 'grey' photon groups to bin the full spectrum of ionizing photons. These bins have lower bounds at the ionization energies of HI,HeI, and HeII. Each AMR grid cell tracks the photon density and flux of photons in each group. We couple the photons to the gas at every timestep and follow the ionization states of hydrogen and helium in every AMR grid cell. RAMSES requires a single photon energy and cross-section (weighted by photon energy and number) for each photon bin, so we pick representative values for each of them based on a stellar population, using identical values to Geen et al. (2017).
The number of photons we inject around each source is described in Section 2.5. We do not include photon energies lower than the ionization energy of hydrogen, or direct or scattered radiation pressure. We discuss these omissions in Section 5.7.
We use the radiative cooling module described in Geen et al. (2016Geen et al. ( , 2017. For gas in collisional ionization equilibrium, we use Audit & Hennebelle (2005)below10 4 K and a fit to Sutherland & Dopita (1993) above this limit. We perform cooling on photoionized hydrogen and helium as described in Rosdahl et al. (2013), with an additional fit to Ferland (2003) to capture photoionized metal cooling. All cells are assumed to be at solar metallicity.

Massive star formation
In Geen et al. (2017), we used a fit to a population model for ultraviolet (UV) photon emission to limit issues with stochasticity in our results. In this paper, we instead sample individual massive stars to study the role of statistical variation in the sampling of the IMF and to track individual massive stars within the cluster.
In our simulations, we track the mass distribution of all stars via sink particles. In addition, we follow the evolutionary state of stars above 8 M ⊙ in order to track how much ionizing radiation the stars produce. Below this mass stars do not produce significant quantities of ionizing radiation. The initial masses of the stars are distributed according to a Chabrier IMF (Chabrier 2003). In order to provide reproducible results, we pre-generate 13 lists of stellar masses by randomly sampling from the IMF until we reach 10 4 M ⊙ . We extract all stars above 8 M ⊙ in the order they were first sampled. Stars below this mass are accounted for as mass in sink particles. Every time the code generates a massive star, it draws from this list in order and is synchronized such that the same star is not selected by different CPUs in the same time-step. The lists of stars formed in each simulation are given in Appendix A.
Each sink particle tracks the amount of mass accreted onto it with the variable m sink , where the sum of m sink over all sink particles is m * . Every time m * exceeds 120 M ⊙ , we identify the sink particle with the largest m sink . We create a 'virtual' stellar object, representing a single massive star. This stellar object is attached to the sink particle and its position is always set to be the same as the sink's. The sink's m sink is then decremented by 120 M ⊙ and the process is repeated. We decrement 120 M ⊙ rather than the mass of the stellar object in order to account for stars below 8 M ⊙ in the mass distribution of the sinks. We use steps of 120 M ⊙ since in the IMF used there is one star above 8 M ⊙ per 120 M ⊙ of stars formed.
This method for sampling massive stars has been previously used by, e.g. Hennebelle & Iffrig (2014), Gatto et al. (2017), and Peters et al. (2017), although these authors create new stars when sinks accrete 120 M ⊙ on a per-sink basis rather than in the cluster as a whole, as we do. The reason for this is that they simulate a kpc-wide section of a galactic disc, which has more mass and lower spatial resolution than our cloud-scale simulations.
We have compared this method to, e.g. the Poisson sampling described in Sormani et al. (2016), and find properties such as the ionizing photon emission rate converge for large cluster masses. We discuss the consequences of this choice of massive star generation model in Section 5.7.

Stellar evolution and feedback model
We track the age of each massive star in the stellar objects described above, and the position of the sink it is attached to. At each timestep, we deposit photons in each energy bin described above onto the grid at the position of the sink and allow them to propagate. The emission rate of photons from a star of mass M s is calculated using a time-dependent stellar evolution model.
We extract spectra for individual stars from STARBURST99, Leitherer et al. (2014) using the rotating tracks from the Geneva model, Ekström et al. (2012) at solar metallicity. This is similar to the approach taken by da Silva et al. (2012) with more up-to-date stellar evolution tracks. Sample values for this model are given in Appendix B.
We integrate over the spectra in each photon energy bin to create a set of evolutionary tracks for each star at an interval of 5 M ⊙ , from 5 to 120 M ⊙ . In order to interpolate correctly between stellar tracks with different lifetimes, we divide the time in each track by the total lifetime of the star and interpolate between normalized stellar ages to find the photon emission rate for the star.
All of the simulations are stopped at the point that the first star in the cluster reaches the end of its lifetime (and should go supernova), or the point at which the mass in sink particles (here taken to be the cluster mass) plateaus, whichever happens earlier. We discuss the predicted contributions from the missing feedback processes to the cloud's evolution in Section 5.7.
We do not, in this first instance, implement other feedback processes such as stellar winds or supernovae, as in, e.g. Dale et al. (2014), Rey-Raposo et al. (2016), Gatto et al. (2017), or Peters et al. (2017. These processes heat the gas to high temperatures (∼10 8 -10 9 K) and significantly increase the cost of the simulations over simply including photoionization, which heats the gas to an equilibrium temperature of ∼10 4 K.

GLOBAL CLOUD PROPERTIES
In this section, we present an overview of the behaviour of each of the clouds in the YULE suite of simulations and their physical properties.

SFE versus IMF sampling
In Fig. 1, we plot the gas column density and sink particle positions for the 13 STARS simulations (in which we vary the IMF sampling used) at time 1.5 t ff ,i . e .t ff after gravity is turned on, where t ff = 4.2 Myr is the freefall time for the cloud as a whole. In all cases, there is a well-developed HII region. The white tracks represent where sinks containing stellar objects (i.e. our sampled massive stars) travel over the course of their lifetime. See Section 4.4 for discussion of which simulations have stars that travel furthest.

SFE versus initial velocity field
In the TURB set of simulations, we use the same IMF sampling but vary the random seed used to generate the initial turbulent velocity field. In Fig. 2, we plot all of these simulations t ff after the gravity is first turned on in the simulations. All of the clouds display a directionality, which is governed by the largest mode of the power spectrum used to generate this field. However, differences in the smaller modes change the small-scale structure of the cloud (see Fig. 3), and where the stars form, with some clusters being centrally concentrated and others being spread out along the longest axis of the cloud, as well as the resulting final SFE. Simulations YAG TURB and STL TURB show relatively little star formation after t ff , while in TIO TURB nearly all of the dense gas has been dispersed by ionizing radiation from the cluster.

Total star formation efficiency
In Fig. 3, we plot the total SFE of each simulation over time, which is calculated as the fraction of total initial gas mass of the cloud (10 4 M ⊙ ) that has been converted into sink particles. The sink particles represent the young star cluster. In the STARS simulations, the final SFE varies by a factor of ∼3 from 6 to 15 per cent, while in the TURB simulations, the final SFE varies from 5 to 23 per cent, though one of the lower SFE simulations is still forming stars at the point at which we stop the simulation. The time at which star formation begin varies in the TURB simulations, an effect also found in Girichidis et al. (2012a). The STARS simulations are identical before feedback begins, and so there is no variation in the time the first star is formed.
In the TURB runs, there is considerable difference between the times at which stars form and the SFR. In the STARS runs, since the initial structure of the cloud is the same and so the only difference is the number of photons emitted by the stars sampled from the IMF. The choice of stellar masses in the cluster produces less scatter in the SFE than the initial distribution of turbulent structures in the cloud. Girichidis et al. (2012a) also find that the initial gas structure is highly important in setting the SFRs of molecular clouds.
In the STARS simulations, the median SFE is 7.3 per cent ± 1.5 per cent. In the TURB simulations, the median SFE is 11.3 per cent ± 2.4 per cent (the errors here are the interquartile range). Note that these values are for individual studies changing a single randomly sampled effect, and the actual spread in SFE is likely to be larger as effects (IMF and turbulent velocity sampling) are combined. There are also some outliers, particularly BEF TURB , which has a SFE 50 per cent higher than the next highest (TIO TURB ) and 100 per cent higher than the median. It is possible that with more than 13 simulations in each set we would find much higher and lower values.
In Geen et al. (2017), where we simulate an identical cloud to the STARS set with and without feedback, the cloud without feedback tended towards a 100 per cent SFE over time. As authors such as Federrath et al. (2014) have shown, turbulence, magnetic fields and jets can suppress SFRs, however in an isolated system with no external forces, only feedback can prevent a SFE close to 100 per cent as an end state.
These results suggest that, naïvely, the SFE of a cloud similar to those in the nearby Milky Way environment (Geen et al. 2017)i s difficult to predict with reasonable accuracy. In Section 4, we determine whether linear relationships between initial cloud properties and the final SFE can be recovered.

Ionizing photon emission rate
We plot the total emission rate of ionizing photons in Fig. 4.O u r simulations implement a time-dependent ionizing photon emission rate based on stellar evolution models (see Section 2.5). As such, the number of photons emitted by a star will change over time. The global ionizing photon emission rate will also increase as new stars are formed. This can be seen in the TURB simulations shown in the right-hand panel of Fig. 4, where we use the same stellar masses whenever a new massive star is formed. A larger SFE thus results in more photons being emitted. The total ionizing photon emission rate is variable over time due to the stellar evolution model used (see Appendix B), which complicates the picture. We discuss the link between the number of photons emitted and the SFE in Section 4. Figure 1. Hydrogen column number density N H maps of each simulation in the STARS set of simulations, where the IMF sampling is varied but the initial cloud structure is the same. Each image is projected along the z-axis at t = 1.5t ff ,i.e.t ff after gravity is first turned on in the simulation. Each image is 78 pc across, zoomed into the central 70 per cent of the full box size (112 pc). White dots show the location of sink particles, with dot size proportional to sink mass. White lines show the track followed by each sink particle containing a massive star during the star's lifetime, with arrows at the end of each track.

HII region radius
In Fig. 5, we plot the radius of the HII region. We compare this to the time dependence of the analytically derived expansion rate given in Appendix C. In order to find a simple, robust HII region radius measurement, we locate all cells with hydrogen ionization fraction x HII > 0.1. We then find their ionized volume V = i V i x HII, i , where V i is the volume of the cell i. We then calculate the radius as (3V/4π ) 1/3 .
In the same figure, we overplot the slopes of the analytic model in equation (C1) assuming a flat (dotted line) and power-law (dashed line, with index −1.71) density field as fitted to Fig. 15.W es e t t = 0 as the time the first star is formed. Note that the expansion close to the edge of the box (≃60 pc) and beyond is inaccurate because material begins to leave the simulation volume (larger radii are found along directions diagonal to the Cartesian axes).
HII regions in denser clouds can stall or collapse due to pressure from gravoturbulence (see Geen et al. 2015bGeen et al. , 2016, for models of this process). This stalling is less important in the clouds modelled in this study due to the relatively low density of the simulated clouds (see Geen et al. 2017). For much denser clouds, we expect the ambient cloud medium to resist the HII region expansion more strongly.
The power-law solution for the radial expansion (dashed line) is a good match to the simulated radial expansion rate. The uniform cloud solution (dotted line) is shallower than the early Figure 2. Column density N H maps of each simulation in the TURB set of simulations, where the IMF sampling is kept the same but the initial turbulent velocity field is generated with a different random seed. The images are constructed as in Fig. 1. expansion, suggesting that indeed the expansion of HII regions should be described by considering the power-law density profile around the star(s) rather than measuring the average density of the cloud.
Note that we simply overplot the time dependence of the powerlaw solution in equation (C1), rather than a full solution for all simulations given in Geen et al. (2015a). There are some variations since the emission rate of photons changes (see Fig. 4), introducing an extra time dependence.
The ionization front reaches the edge of the cloud at 10 pc after roughly 0.3-2 Myr. However, dense clumps remain embedded and can continue to accrete. The position of the clumps in the cloud in relation to the position of ionizing radiation sources is thus key to understanding the final SFE of the cloud. We introduce models for this in Geen et al. (2015a), based on the clump evaporation models of Bertoldi & McKee (1990), though a more accurate model requires a more detailed understanding of where the clumps are as a function of time with respect to the ionizing sources.

Total momentum
All of the clusters generate considerable amounts of momentum in the cloud (up to 10 44 gc ms −1 ,s e eF i g .6) before flows leave the box and their momentum is no longer tracked. Most simulations exhibit a similar rate of momentum increase, although clusters with less efficient feedback (i.e. higher SFE) also generate less momentum.
Unlike the radius of the HII region discussed in Section 3.5, the analytically derived momentum injection rate shows less dependence on the density profile, with the fits for a uniform and  On the left are simulations in which we vary the mass of massive stars formed. On the right are simulations where the initial random seed of the turbulent velocity field is varied. We stop the simulations at the point when the first star dies, since we do not explicitly include a supernova model in these simulations. power-law profile giving similar gradients. Looking at the solutions for a uniform density field in Matzner (2002), the radial expansion of a cloud has a weaker dependence on time and photon emission rate than momentum, but a stronger density dependence. Based on this, we suggest that momentum of ionization fronts is more strongly influenced by the structure of the gas around it than the photon emission rate of the source, although this is important to obtain an accurate solution.
Much of this momentum is transferred to the ambient medium in the cloud. The whole box including the external medium has an initial mass of approximately 10 5 M ⊙ .1 0 44 gc ms −1 spread over 10 5 M ⊙ gives an average speed of 5 km s −1 . However, initially the HII region travels supersonically, since equation (C1) can givė r i >c i if w is large enough.
In this paper, we do not include momentum from direct radiation pressure. The largest photon emission rate in our study is approximately 3 × 10 49 s −1 (see Fig. 4). Assuming direct momentum deposition from photons over 4 Myr, we obtain 3 × 10 42 gcms −1 if the average photon energy is 13.6 eV (the ionization energy of hydrogen) or 10 43 gcms −1 if the average photon energy is 55 eV (the average energy in our HeII-ionizing photon bin). The maximum possible momentum deposition is thus non-negligible compared to the momentum in flows in the cloud prior to the first star being formed, but is an order of magnitude smaller than the momentum of the D-type photoionization front.
The picture so far is of HII regions that overtake their host clouds on the order of 0.3-2 Myr and escape into the external medium. At some point, these expanding HII regions stop the accretion onto The radius is calculated as follows. We remove all cells with an ionization fraction x HII lower than 0.1. We then calculate the volume of the ionized gas in these cells as V = i V i x HII, i ,w h e r eV i is the volume of the cell i and x HII, i is the ionization fraction in the cell. Finally, we calculate the radius as (3V/4π ) 1/3 .AsinFig.4, on the left are simulations in which we vary the IMF sampling. On the right are simulations where the initial random seed of the turbulent velocity field is varied. Note that above 60 pc (half the box length), flows begin to leave the box. Radii higher than this are diagonal to the Cartesian axes. We overplot the gradient of equation (C1) (i.e. the normalization is arbitrary, since it depends on various other factors). We solve equation (C1) for a flat density profile (dotted black line) and for a profile of w = 1.71, the power-law fit to Fig. 15 (dashed black line). We do this to illustrate the time dependence of the solution.  Fig. 4, on the left are simulations in which we vary the mass of massive stars formed. On the right are simulations where the initial random seed of the turbulent velocity field is varied. Note that all simulations have some momentum from turbulence before the first HII regions drive radial outflows. The drop in momentum around 10 44 gcms −1 is due to flows leaving the simulation volume. We overplot the gradient of the momentum solution derived from equation (C1) for a flat density profile (dotted black line) and for a profile of w = 1.71, the power-law fit to Fig. 15 (dashed black line), in order to illustrate the time dependence of the solution.
their host clouds and freeze out the star formation to give a final SFE. In the next section, we discuss whether or not we can identify trends in how this final SFE is obtained.

PREDICTING THE SFE OF A CLOUD
Thus far, we have established that it is difficult to accurately predict the SFE of a cloud knowing only its initial global properties such as mass, radius, or virial parameter. However, there may be some causal link between emergent properties of the cloud and cluster, and the resulting SFE. In other words, we wish to know whether a predictive, quantitative model for the feedback loop of star formation in clouds can be formulated, or whether star formation relations are the statistical combination of events that are mostly random on cloud scales. This process is complicated by the interaction between randomized model choices (i.e. IMF sampling and initial turbulent velocity field) and non-linear effects that will amplify small perturbations to the system.

Parameter selection
To understand which parts of the system are important for predicting the SFE, we must first identify which parameters are well correlated with the final SFE of each cloud, and which show no correlation. For some parameters, we must select a time at which the parameter is measured. We discuss the consequences of this choice of time in Section 4.4. We attempt to capture a large range of cloud properties that are relevant to the global cloud structure and cluster as a whole. In future work, we will revisit the data set to quantify more detailed properties of individual system components that influence the star formation feedback loop.
In the STARS set of simulations, we identify the following parameters: the most massive star (M max ), cluster compactness (R rms ,the rms distance of each sink particle from the centre of mass of the cluster after 1.5 t ff ,i . e .t ff after gravity is turned on), the mass of the first star formed (M first ), the peak photon emission rate (S * ,max ), the total number of photons emitted by the cluster (N tot ), the total number of photons emitted by the cluster in the first freefall time, i.e. 0.5 t ff after gravity is turned on (N ff ), and the mean distance travelled by a massive star during its lifetime (D track ).
In the TURB set of simulations, we focus on the global structure of the cloud. We make an ellipsoid fit to the gas density field using the algorithm described in HALOMAKER (Tweed et al. 2009). To do this, we include every cell above a density threshold n thresh and weight by the mass of the cell. We perform this fit in each simulation at 0.5, 1.0, 1.5, and 2.0 t ff . We identify the axes of the ellipsoid L E , M E, and S E , from longest to shortest. We then calcu-late the following derived structural parameters: triaxiality We chose a value of n thresh = 10 cm −3 , encompassing most of the cloud material above the background density of 1 cm −3 .Larger values led to worse correlations. Observational work (e.g. Lada, Lombardi & Alves 2010) and theoretical work (Geen et al. 2017) suggests that dense gas alone is a better predictor of recent star formation rather than the total star formation history of the cloud. We discuss this difference further in Section 5.5. We limit ourselves to the list of parameters mentioned above in the first instance, with the intention of expanding the scope of future parameter studies once we have understood the response of the system to this parameter set.
In Figs 7-9, we plot these parameters for each of the clouds against SFE. It is possible to visually identify various correlations in certain parameters, and poor correlations in others, although in all cases there is scatter in the results. In the next section, we discuss a Bayesian model used to determine which parameters have strong relationships with the SFE, and which have no clear effect on the outputs of the system.

Setup of model comparison
To identify relationships between each of these parameters and the SFE, we fit a series of generalized linear mixed models (GLMMs) using a Bayesian framework and Markov Chain Monte Carlo (MCMC) methods (Watson et al. 2018) .AG L M Mi saf o r mo f linear regression that allows for random effects in the linear predictors that do not have to follow a normal distribution. In this analysis, we assume a power-law relationship between the input variables and SFE, i.e. linear relationships in log space, where y ≡ log(SFE), I is the intercept, β is a list of fixed effects (i.e. regression coefficients corresponding to gradients in log space between SFE and each input variable, or power-law indices in linear space for each model), and v is a list of predictors or input variables in log space, such as the number of photons emitted in t ff (N ff )o r the length of the ellipsoid fit to the cloud (L E ). Each effect has a distribution of possible values, where zero indicates no relationship (i.e. the gradient between a given variable and SFE is flat). The posterior mean is the mean value of the effect, i.e. the expected regression coefcficient or intercept. The credible intervals (CIs) are here equivalent to the confidence interval, and are discussed below. See Van de Schoot & Depaoli (2014) and Harrison et al. (2018)for an introduction. The GLMMs are computed using 'RStudio', 2 a development environment for the statistical computing language 'R', 3 with the package 'MCMCglmm' (Hadfield 2010). MCMC chains were run for 10 5 iterations, with a burn-in period of 1000 iterations and a thinning interval of 10 iterations. The burn-in period is the number of iterations before any results can be sampled, to reduce the influence of the random starting point of the sampling process. The thinning interval is the interval between which samples are rejected to reduce autocorrelation between samples. All models are fit with uninformative priors, i.e. making no prior assumptions about the response of SFE to the predictor variables. To this end, we use a univariate inverse Wishart distribution with V = 1andν = 0.002, which approximates to a delta function at 0. All models use a Gaussian distribution and identity link function, i.e. using a normal distribution to link the predictor variables to the model output. Model convergence is assessed visually using trace plots of posterior distributions and acceptably low levels of autocorrelation are achieved by ensuring that all estimated parameters have an effective sample size of over 5000.
We examine variance inflation factors (VIF) to determine whether predictors have high collinearity and remove them accordingly. The VIF is the ratio of variance in a model with multiple terms and the 2 https://www.rstudio.com/ 3 https://www.r-project.org/ variance of a model with one term only. A high VIF means that a parameter can be linearly predicted from another parameter with high accuracy, and is thus unlikely to be an independent variable. In the STARS analysis, three variables (M max , N TOT ,andS * ,max )have a VIF of greater than 10 (since they can be calculated directly from the stellar evolution model for a fixed set of stellar masses) and are therefore removed from analysis. VIF for all variables in the TURB analysis are below 10.
For each analysis, we first run a 'Full' model containing all possible fixed effects (variables that SFE is thought to be dependent upon), a 'Null' model containing no fixed effects and a suite of models containing each possible combination of fixed effects to determine which configuration best predicts the data. To select the best-fitting model, we take an information-theoretic approach (see Burnham & Anderson 2003), meaning that a deviance information criterion (DIC) is computed from each model and compared. The DIC is a measure of quality for a given model, which maximizes the probability that a given model is correct using the maximum likelihood estimate for a set of parameters, while minimizing the number of parameters included in the model. A lower DIC indicates a better fit for the data. Where there is no clear best-fitting model, the most parsimonious (fewest predictors) model with the lowest DIC is chosen. In this paper, we use the Spiegelhalter et al. (2002) definition for DIC. Whether a given model fits the data better than another is determined by whether there is a substantial difference (>2) in DIC between competing models. See Table 2 for a summary, and Appendix D for model outputs and DIC values from the Full, Null, and Best-fitting models for each analysis.
In the TURB case, we carry out two further analyses. In the first, we fit a model using only R E as a predictor. In the second, we fit a model using only V E as a predictor. These factors could not be included in the previous analyses because they are derived from L E , M E ,and S E , which are already included in those models. Fixed effects are determined as having an important influence according to whether the '95 per cent CI' of their posterior distribution crossed zero. If a variable has a negligible effect, we expect its posterior distribution to be centred close to zero. An influential variable is expected to be shifted away from and not substantially overlapping zero (where zero indicates a flat gradient, i.e. the variable has negligible influence on the SFE).

Results of model comparison
In the STARS analysis, the best-fitting model contains only N ff (the number of photons emitted in the first t ff ). There was good evidence that N ff has a negative relationship with SFE (see Table 2 and Fig. 10). We do not include the mean distance travelled D track in the model. This is because it is an output parameter of the simulation rather than an initial condition such as the cluster properties or gas distribution. However, D track and SFE do show reasonable correlation, which we discuss in Section 4.5.
We perform the TURB analysis at a number of times to determine whether the evolution of the cloud affects the ability for the model to recover an SFE. At 0.5 and 1.0 t ff , the best-fitting models contain only L E as a predictor and there is good evidence that this variable has an important effect on SFE (see Table 2 and Fig. 11). At 0.5 and 1.0 t ff , in their respective models R E and V E are also found to have robust effects on SFE, since they are dependent on L E . At 1.5 and 2.0 t ff , there is no strong evidence of an effect of any parameter in the TURB set on the SFE.
We cross-check these results with a frequentist approach using the Pearson correlation coefficient and found reasonable agreement.

Physical significance
Of all the parameters derived from the IMF sampling (the STARS analysis), only the number of photons emitted in the first t ff = 4.2 Myr (i.e. 0.5 t ff after the gravity is first turned on) is a good predictor of the final SFE of the cloud. In Fig. 3, most of the star formation in the STARS set (left-hand panel) has happened before this time. Any further photon emission affects only residual star formation, or the ability of the cloud to recollapse and form more stars (Rahner et al. 2017a).
In the TURB analysis, clouds that are more compact have a higher SFE (see Fig. 11). It is curious that the longest axis L E is a better predictor than the other axis lengths M E or S E . One explanation is that our clouds are, by visual inspection and by looking at the distribution of triaxiality T, often filamentary (or close to spherical) rather than discy. This is supported by observations, where Konyves et al. (2015) find that 75 per cent of pre-stellar cores are in filamentary structures. For a fixed mass, longer filaments will have a lower concentration assuming that the width does not also strongly vary.
The fact that there is no clear relationship between the SFE and global cloud properties at later times (1.5 t ff and later) adds weight to the argument that the early evolution of clouds is most important in setting the star formation properties of the system. We discuss the significance of our results to observed measures of SFE in Section 5.5.

Distance travelled by the stars
In Fig. 8, there is a link between the SFE and the distance travelled by the stars D track . This suggests that when feedback is inefficient (i.e. SFE is high), stars travel further from the point at which they are formed. Fig. 1 supports this argument, since simulations that at 1.5t ff retain some amount of dense gas close to the cluster also show the longest distances over which the stars travel, shown as white lines. Gavagnin et al. (2017) find this effect and argue that it is due to N-body interactions between stars. Our mass resolution in this study is not sufficient to resolve the full stellar cluster N-body dynamics. Instead, as in Geen et al. (2017), we offer the following explanation. The star produces ionizing radiation that evaporates material from the gas clump. This causes the gas clump to accelerate away from the star via the rocket effect (Oort & Spitzer 1955). However, since the star and gas clump are gravitationally bound, the star also moves in the same direction. We illustrate this process with a cartoon in Fig. 12. This continues until the gas clump is completely evaporated or the rocket acceleration exceeds the gravity between the star and gas clump.
We visualize this process in the STU STARS simulations in Fig. 13, which has one of the longer track lengths in Fig. 8. Massive stars accelerate over their lifetimes in the direction of the nearest gas clumps. The second generation of stars form in clumps already moving away from the centre of the cloud due to HII regions from the first generation of stars, but this initial velocity accounts for only ∼10 per cent of the distance travelled by the stars. This picture is greatly simplified, since there is a complex interaction between a given clump-star pair and other stars and gas structures in the cloud. Table 2. Summary outputs for each reported model in Section 4.2. We determine in each case whether power laws between the input variables and SFE exist, i.e. linear relationships in log space (see equation 1). Each effect has a distribution of possible values, where zero indicates no relationship (i.e. the gradient between a given variable and SFE is flat). A pair of 95 per cent CIs that are both above or below zero indicate that there is a 95 per cent probability that the predictor has an effect on SFE. The posterior mean is the mean value of the effect, i.e. the expected regression coefficient or intercept. We sample the cloud structure variables at various times to determine whether there is a time evolution in the fit, i.e. whether the SFE can be predicted better at earlier or later times.  10. Predicted values of SFE (as a percentage of the initial gas mass converted into stars) against N ff (see Table 2). The red shaded area is the 95 per cent CI. Points show the sampled values in each of the STARS simulations.
This gives the counterintuitive result that, in the low SFE regime studied in this paper, inefficient feedback causes the stellar cluster to disperse, while efficient feedback that quickly removes most of the cloud material causes the cluster to remain closer to where it was formed. We do not resolve the full N-body dynamics of the cluster, since we do not form individual stars with masses below 1 M ⊙ ,so we cannot comment on whether this would alter our results.

DISCUSSION
In this section, we discuss the significance of our results, and how they should be interpreted in a wider context.

How predictable is the SFE?
This paper suggests that the SFE of clouds similar to those in the solar neighbourhood (by comparison of column density distributions in Geen et al. 2017) is difficult to predict. This makes a pure simulation approach to determining the final SFE of clouds across a wider parameter space of interstellar medium (ISM) properties prohibitively expensive, since the full parameter space of the problem requires multiple realizations to capture the range of values relevant to each stochastic process.
To first order, the SFE is closely linked to how many ionizing photons are emitted during the peak of star formation and how compact the cloud structure is. This is, at first glance, in agreement with existing models that include feedback processes that end star formation in a cloud (e.g. Matzner 2002;Krumholz et al. 2006;Goldbaum et al. 2011). However, it is clear from this work and others that the clouds are more complex than a spherical system with a central point source representing an accreting cluster, as   (Oort & Spitzer 1955), with the system as whole conserving momentum. If the gravitational force between the star and the clump is strong enough, the system remains bound, and the star-clump system is accelerated towards the left of the image until the clump is completely evaporated or the rocket acceleration exceeds the gravitational acceleration between the star and clump. many analytic models assume. While we recover some trends in SFE, scatter remains in our results, showing that the systems are not completely predictable, even when randomized model inputs are well constrained.
We have established that the ionizing photon emission rate at early times is linked to the SFE. In turn, the SFE sets the number of massive stars produced. Since our cloud mass is fixed at 10 4 M ⊙ , we form a new massive star every time the SFE increased by 1.2 per cent. Our median SFE in the STARS runs was 7.3 per cent, which equates to six massive stars. This means that there is a large Poisson sampling error on the results (the Poisson error for six samples is ∼40 per cent). As shown in Fig. 14,t h e stochastic effect of IMF sampling will become smaller in more massive clusters where the sampling of the high end of the IMF is more complete. For more massive clusters, Grudić et al. (2018)find a relatively clean correlation between initial surface density and SFE.
In this paper, we use a statistical model to understand to first order which properties of a cloud affect its evolution. A further stage will be to understand in more detail how these parameters feed into more predictive models to explain at what point star formation is terminated by feedback in molecular clouds. We have shown that existing theory for the expansion of HII regions in idealized conditions can be applied to explain more complex cloud systems, at least for simple aspects such as the expansion rate of and momentum injection from HII regions in non-uniform clouds.

Dispersion of ionizing photon emission rates
Stars form with masses distributed according to an IMF. In this paper, we randomly sample a Chabrier IMF 13 times to generate our STARS set of simulations. In Fig. 14, we provide some context for how this random sampling affects the ionizing photon emission rate S * . Here, we generate 100-1000 clusters for each cluster mass bin M cluster , and calculate S * for the cluster as a whole. Here, M cluster is the total mass of a cluster of stars. We do this by calculating S * for each star using the values given in Sternberg, Hoffmann & Pauldrach (2003), assuming that all stars are Class V (main sequence), and summing over the whole cluster.
We find a double power-law relationship in the median S * against M cluster . This turnover occurs around 1000 M ⊙ , and happens when the probability for the maximum stellar mass m max (=120 M ⊙ ) exceeds 0.5. The slope below 1000 M ⊙ thus depends heavily on the maximum stellar mass sampled, while the slope above 1000 M ⊙ is a linear relationship following the emission rate for a well-sampled population of stars. This turnover in m max is also found in Weidner, Kroupa & Bonnell (2009)   . Sequence of hydrogen column number density N H maps STU STARS simulation, demonstrating the motion of massive stars and nearby massive gas clumps moving outwards from the site of star formation. Each image is 78 pc across, zoomed into the central 70 per cent of the full box size (112 pc). White dots show the location of sink particles, with dot size proportional to sink mass. White lines show the track followed by each sink particle containing amassive star during the star's lifetime, with arrows at the end of each track. This is a highly simplified picture, which also contains effects from other stars and the shell around the HII region. the maximum mass of stars in a cluster of a given mass, although there is some dependence on the IMF used.
A key difference between our model assumptions and Weidner et al. (2009) is that for clusters between 1000 and 4000 M ⊙ the latter paper does not find many stars above 25 M ⊙ , suggesting a physical link between the distribution of the IMF and the host gas reservoir. Since we cannot resolve the full IMF ab initio in our simulations while producing a large sample of clouds, we cannot comment directly on whether this has a statistical or physical origin. In addition, we cannot comment on whether there is indeed a maximum stellar mass [e.g. star R136a1 has a mass over 300 M ⊙ (Crowther et al. 2016), although such stars are rare]. A closer examination of the link between the gas reservoir, the IMF and ionizing radiation from massive stars must be left to future work.
In this study, we find values for the final SFE ranging from 6 to 23 per cent, or M cluster = 600-2300 M ⊙ , which we overplot on Fig. 14. On the bottom panel of this figure, we overplot 50 and 200 per cent of the median S * for each mass bin. At the lower mass end, there is a factor of 4 range of values in the 25-75 per cent limit, with more convergence at higher masses. Precisely which cluster mass the simulation ends at depends on a complex interaction between the processes of gas accretion onto sinks and the dispersal of dense gas by HII regions driven by ionizing photons, and the steep profile of S * in this mass regime makes accurate SFE predictions difficult. Figure 14. The distribution of hydrogen-ionizing photon emission rates S * from stochastically sampled clusters of different masses. In each cluster mass bin, we generate N cluster clusters by stochastically sampling a Chabrier IMF until its mass reaches the desired cluster mass. Since there is more scatter at lower cluster masses, we generate more clusters for lower masses. To this end, N cluster is chosen heuristically using a step function where N cluster = 100 if M cluster > 500 M ⊙ , N cluster = 500 if 200 M ⊙ < M cluster < 500 M ⊙ ,and N cluster = 1000 otherwise. We calculate S * in a cluster by summing over all stars using a fit to Sternberg et al. (2003) for Class V/main-sequence stars.

Comparison to systemic variations in cloud properties
We should caution that in our analysis, we separate the IMF sampling and initial cloud structure as variables, and limit our analysis to a single cloud mass with global properties similar to clouds in the solar neighbourhood. In previous studies of photoionizing feedback, cloud mass has been found to influence properties such as the SFE, cluster unbinding (e.g. Dale et al. 2012;Dale, Ercolano & Bonnell 2013), and UV escape fraction (e.g. Howard et al. 2018). For example, in Dale et al. (2012), a very dense, massive cloud achieves an SFE close to 100 per cent, since photoionization feed-back becomes less effective. In most other cases, they find an SFE close to 10 per cent, and argue for a window in cloud surface density where photoionization feedback sets the final SFE of the cloud.
Protostellar jets can also reduce the short-term SFE of a clump by returning gas back into the cloud. Federrath et al. (2014)finda 25 per cent reduction in the SFE of a clump when jets are included, although this material is usually recycled back into the host cloud rather than being ejected from the system entirely.
In clouds formed by accretion from larger volumes, Vazquez-Semadeni et al. (2010) find an SFE from a few per cent to 20 per cent when feedback is included, which is somewhat consistent with our result, where we introduce random variation by hand into isolated clouds. Walch et al. (2011) argue that low-metallicity galaxies have low SFR and also exhibit more bursty star formation, with cold clump survival influenced by chemistry. Federrath & Klessen (2013) study the SFE of turbulent boxes by varying the form of the turbulence used. Since they do not include any processes where star formation is frozen out by feedback, they instead study the properties of clouds with a given SFE at a certain point in time, which allows comparison with observed clouds in a single state. They explore a range of 0-20 per cent SFE, with noticeable differences to the density PDFs when magnetic field strength, Mach number, and initial compressive versus solenoidal turbulence ratios are varied. This is in agreement with analytic models of SFRs given in, for example Hennebelle & Chabrier (2008), Federrath & Klessen (2012), Salim, Federrath & Kewley (2015), and Burkhart & Blakesley (2018).
It is difficult to compare our results directly to other authors since there are many systematic differences between the simulation setups and theoretical models used. A well-constrained comparison study would be needed to determine the role played by, e.g. differences in simulation methods (SPH, grids, and moving mesh), physics included, resolution, etc., as well as systematic differences in the bulk properties of the cloud. However, it seems clear that where feedback is effective (see Dale et al. 2012), the SFE freezes out at anywhere between a few to a few tens of percent, roughly on the order of our results.

Internal versus external processes
The question of how clouds are formed and their relationship to the host galaxy as a whole is an important one to address. Rey-Raposo, Dobbs & Duarte-Cabral (2015) have already raised the issue of cloud geometries, arguing that clouds extracted from a simulation of a galactic disc behave differently to isolated spheres. The importance of choice of initial turbulence has also been demonstrated by Goodwin et al. (2004) and Girichidis et al. (2011Girichidis et al. ( , 2012a. Our results clearly show a link between SFE and cloud geometry, with more elongated clouds producing fewer stars, although we cannot comment on how the galactic origin of the clouds affects their subsequent evolution. Ibáñez-Mejía et al. (2017) and Seifried et al. (2018) recently argued using simulations of kpc-wide chunks of a galactic disc that external influences, such as supernovae and tidal fields, are less important than internal processes. Dobbs & Pringle (2013) found that spiral arms play an important role in cloud formation, though.

Observational significance
We discussed in Geen et al. (2017) the relationship between the theoretical definition of SFE (the total amount of gas in a cloud Figure 15. The radial density distribution in the cloud at the time at which the first star forms in the STARS simulations, 3.38 Myr. On the left is the profile centred on the sink particle hosting the first star to form. On the right is the profile centred on the geometric centre of the simulation volume. To generate the profiles, we cast 10 4 rays through the volume with 1000 radial bins. The solid black line shows the median density in each radial bin. The dotted black lines show the 25 and 75 per cent percentiles. The grey lines show the maximum and minimum densities at each radius. The red shading is applied in bands of 2, with redder values being closer to the median. The dashed black line is the mean density profile, measured by binning 10 7 points uniformly sampled inside 12 pc around the origin. Note that our maximum spatial resolution is 0.03 pc, and values close to this radius suffer from low number statistics. converted into stars), and the observational definition (the ratio of recent star formation versus current gas mass above some column density threshold). This is significant because there is no clear link between the two definitions. An analysis of observable quantities in synthetic observations of the YULE simulations is left for future work. However, it will be valuable to extend our analysis to a comparison with observed clouds in order to obtain better statistics that allow us to interpret resolved star-forming regions in the Milky Way and Magellanic Clouds.
Observational studies typically measure SFE as the recent star formation in a system rather than an idealized total. Lada et al. (2010), Heiderman et al. (2010, Gutermuth et al. (2011), and others establish some link between local density peaks and recent star formation, with weaker or no correlation to global cloud properties. In selecting parameters in the TURB set of simulations for the statistical model in this study, we found a better relationship with total SFE when we sample the whole cloud using a low density cut-off than when we sample only the densest parts of the cloud using a higher density cut-off. One possible explanation is that star formation has trends on both short and long time-scales, and that the total SFE of a cloud is better correlated to large-scale cloud properties, while recent star formation is more closely linked to small-scale structures with shorter freefall times.
In Geen et al. (2017), we found that the observationally derived SFE as defined by Lada et al. (2010) varies in a single cloud identical to the STARS simulations from 8 to 30 per cent, a little larger than the observed variations in the nearby Gould Belt. We stress, however, that this measurement has no link to the total SFE of the cloud studied in this work and indeed varies with time, since the measurement considers only stars younger than 2 Myr and only gas above an extinction threshold of A k = 0.8.

Galactic context
Molecular clouds are embedded in the ISM of a galaxy, and the internal processes of these clouds set both the SFE and the amount of radiation and kinetic flows from the cloud as a function of time. The combined SFE of clouds in turn sets the SFE of a galaxy as a whole. The interface between molecular clouds and a galactic ISM regulates a number of additional quantities, such as galactic wind rates and ionizing photon escape fractions. Certain problems, such as the epoch of reionization, can be very sensitive to the precise behaviour of unresolved structures in cosmological galactic simulations (see e.g. Rosdahl et al. 2018), since both high escape fractions and high ionizing photon emission rates are required simultaneously.
Feedback propagating inside molecular clouds follows an analytic solution described by a power-law density field, since the star 'sees' this distribution as it sits on a density peak. This has two consequences. First, ionization fronts expand more quickly in a power-law density field than in a uniform medium, even exceeding the sound speed in the ionized gas. Secondly, however, the initial density that the front expands into is higher than the average density of the cloud. In denser clouds, this may make it harder for ionization fronts to expand if they are strongly countered by accretion onto the protostellar cores.
The SFE of our clouds has a large scatter, making naïve predictions difficult. On a Galactic scale, the large number of such clouds will smooth the distribution of SFE (e.g. Kruijssen & Longmore 2014;Leroy et al. 2016), although an accurate prediction of this distribution is still required. With ∼10 simulations in each sample, we are able to predict the efficiency of star formation given a set of initial conditions, but we cannot comment on the role of various issues such as subgrid model choices. We have also shown that some correlations can be uncovered when certain emergent parameters are measured, such as cloud geometry or photon emission rates early in the cloud's lifetime. Further research into these correlations will allow us to develop a clearer picture of how the parameter space of cloud-scale physics can be used to explain galaxy-scale SFRs and feedback efficiencies.

Limiting numerical choices
As always, we must make certain limiting choices in order to fit the problem into available computational resources. In this study, we emphasize the need for repeatability to provide statistics on cloud behaviour, which is not possible if only one simulation can be completed with the available resources. None the less, we can speculate how our results should change with a more complete physical model. This paper does not attempt to reproduce the IMF directly in the mass distribution of sink particles, as in, for example, Bate (2012). The advantage of this is that we are able to arbitrarily change the sampled IMF to test model predictions. However, there are limitations to this approach, particularly if we wish to explore whether clump evaporation by radiative feedback affects the IMF in the cloud. Gavagnin et al. (2017) suggest that ionizing radiation does indeed suppress the high mass end of the IMF. One important question is when massive stars form compared to other stars. This is still an open question, although Cyganowski et al. (2017)fi n d observational evidence of stars of all masses forming at similar times.
Another simplification is the lack of stellar winds. Winds, with a characteristic velocity of 1000 km s −1 or more, compared to 10 km s −1 for photoionized flows, significantly reduce the simulation time-step and thus increase the computational demand on the project. They can transfer significant momentum to the ISM (see, e.g. Fierlinger et al. 2015), but the interaction between winds and radiative processes is complex (e.g. Capriotti & Kozminski 2001;Rahner et al. 2017b). Dale et al. (2014) suggest that for the conditions of star formation considered here, photoionization dominates over winds on a cloud scale. On a kpc scale, Peters et al. (2017) find that the combined effect of radiation, winds, and supernovae causes a slightly lower SFR than when only winds and supernovae are included (in all cases including supernovae), although they do not discuss a case where only radiation is considered. In controlled studies around individual stars, Haid et al. (2018) find that stellar winds dominate the HII region only in conditions where the gas is already heated to ∼10 4 K, such as in the warm interstellar medium.
In addition, we omit radiation pressure. For more massive clusters, analytic models by, e.g. Rahner et al. (2017b) argue that winds, supernovae, and radiation pressure dominate the expansion of HII regions and cloud outflows. On its own, Reissl et al. (2018)findthat the radiation pressure on dust grains almost never disrupts clouds with an SFE below 50 per cent over a mass range from 10 4 to 10 7 M ⊙ .H a w o r t he ta l .( 2015) and Kim, Kim & Ostriker (2018) also argue that photoionization is indeed the dominant effect from photons in HII regions in both uniform density fields and turbulent clouds under the conditions studied in this paper.
All of our clouds are eventually destroyed and reach a plateau in SFE before the first star dies. We thus do not expect supernovae to play a role in shaping these clouds. However, for longer lived clouds such as in the Large Magellanic Cloud (e.g. Kawamura et al. 2009), some supernovae might occur within the lifetime of the cloud, affecting its evolution. In Geen et al. (2015a), we find that supernovae exploding into pre-existing HII regions transfer significantly more momentum to the ISM, since a low-density region has been carved out. However, in Geen et al. (2016), where we simulate a dense cloud and clumps remain embedded in the pre-supernova HII region, the supernova produces less momentum than in studies where gravoturbulence is less important, foexample Iffrig & Hennebelle (2015), Kim & Ostriker (2015), Martizzi, Faucher-Giguere & Quataert (2015), and Körtgen et al. (2016).
All of our models use the Geneva rotating star evolution tables. A discussion of variable stellar rotation, binary interactions, etc., is beyond the scope of our model, but will, of course, be important if we wish to understand the full parameter space of star formation.

CONCLUSIONS
In this paper, we present the YULE suite of simulations, which are 3D RMHD simulations of the same cloud performed with different IMF sampling and initial turbulent velocity seeds, using an initial density distribution similar to clouds in the solar neighbourhood. We present a model for the expansion of HII regions into molecular clouds that shows good agreement with the behaviour of our clouds.
The main result of this paper is that the SFE of the cloud (measured as the total fraction of the initial mass converted to stars) can be anywhere between 6 and 23 per cent depending on the parameters that statistically vary within a cloud with a fixed set of initial global properties. This suggests that the SFE of clouds in the solar neighbourhood is difficult to predict. In addition to capturing a large parameter space of physical properties, simulations or other theoretical models must also capture the full range of statistical variation in the input parameters.
We compute a Bayesian model that identifies relationships between the SFE and various emergent properties of the cloud and cluster. When we vary the sampling of the IMF, the most significant parameter is the total number of photons emitted in the first freefall time, or approximately 2 Myr after the first star was formed. More photons give a lower SFE.
When varying the initial velocity structure, the most important parameter is the long axis of the ellipsoid fit to the cloud, since most of our clouds evolve more filamentary structures. Derived quantities such as the mean radius and volume were also significant. The more filamentary or extended the cloud, the lower the SFE. We suggest that the length of the cloud is important because the mass in our sample is fixed, so longer clouds are on average less compact. We find that sampling the whole cloud rather than only the densest regions provides a better relationship with the total SFE. This is in contrast to observational studies, which find a correlation between density peaks and recent star formation.
There is also an apparent link between the SFE and the distance massive stars travel. We suggest that this is because if feedback is inefficient (resulting in high SFE), dense gas clumps remain embedded for longer inside the cloud, and these clumps remain gravitationally bound to nearby stars. When these stars produce ionizing radiation the surface of these clumps is evaporated, and the star-clump system is accelerated via the rocket effect, where momentum is conserved with the photoevaporated gas ejected in the opposite direction. This has the counterintuitive result that, in this regime, efficient feedback results in less cluster dispersal, while inefficient feedback causes the cluster to expand. This work has been funded by the European Research Council under the European Community's Seventh Framework Programme (FP7/2007(FP7/ -2013. SG and RSK have received funding from grant agreement no. 339177 (STARLIGHT) of this programme. RSK further acknowledges support from the Deutsche Forschungsgemeinschaft in the Collaborative Research Centre SFB 881 'The Milky Way System' (subprojects B1, B2, and B8) and in the Priority Program SPP 1573 'Physics of the Interstellar Medium' (grant nos KL 1358/18.1 and KL 1358/19.2). PH has received funding from grant agreement no. 306483. JR acknowledges support from the ORAGE project from the Agence Nationale de la Recherche, grant ANR-14-CE33-0016-03. Figure B1. Ionizing UV photon emission rates versus time for a sample of star masses in stellar evolution model (our model interpolates between tracks sampled every 5 M ⊙ from 5 to 120 M ⊙ ). Solid lines show the lowest energy bins that can ionize hydrogen, dashed lines show the next highest energy bin that can ionize helium once, and dotted lines show the highest energy bin that can ionize helium twice.  This paper has been typeset from a T E X/L A T E X file prepared by the author.