Maximally-localised Wannier functions (MLWFs) are routinely used to compute from first-principles advanced materials properties that require very dense Brillouin zone integration and to build accurate tight-binding models for scale-bridging simulations. At the same time, high-throughput (HT) computational materials design is an emergent field that promises to accelerate reliable and cost-effective design and optimisation of new materials with target properties. The use of MLWFs in HT workflows has been hampered by the fact that generating MLWFs automatically and robustly without any user intervention and for arbitrary materials is, in general, very challenging. We address this problem directly by proposing a procedure for automatically generating MLWFs for HT frameworks. Our approach is based on the selected columns of the density matrix method and we present the details of its implementation in an AiiDA workflow. We apply our approach to a dataset of 200 bulk crystalline materials that span a wide structural and chemical space. We assess the quality of our MLWFs in terms of the accuracy of the band-structure interpolation that they provide as compared to the band-structure obtained via full first-principles calculations. Finally, we provide a downloadable virtual machine that can be used to reproduce the results of this paper, including all first-principles and atomistic simulations as well as the computational workflows.

The combination of modern high-performance computing, robust and scalable software for first-principles electronic structure calculations, and the development of computational workflow management platforms, has the potential to accelerate the design and discovery of materials with tailored properties using first-principles high-throughput (HT) calculations^{1–4}.

Wannier functions (WFs) play a key role in contemporary state-of-the-art first-principles electronic structure calculations. First, they provide a means by which to bridge lengthscales by enabling the transfer of information from the atomic scale (e.g., density-functional theory and many-body perturbation theory calculations) to mesoscopic scales at the level of functional nano-devices (e.g., tight-binding calculations with a first-principles-derived WF basis)^{5,6}. Second, the compact WF representation provides a means by which advanced materials properties that require very fine sampling of electronic states in the Brillouin zone (BZ) may be computed at much lower computational cost, yet without any loss of accuracy, via Wannier interpolation^{7}.

Among several variants of WFs^{8}, maximally-localised Wannier functions (MLWFs), based on the minimisation of the Marzari–Vanderbilt quadratic spread functional Ω, are those most employed in actual calculations in the solid state^{8}. One ingredient in the canonical minimisation procedure is the specification of a set of initial guesses for the MLWFs. These are typically trial functions localised in real-space that are specified by the user, based on their experience and chemical intuition. As shall be described in more detail later, in the case of an isolated manifold of bands, the final result for the MLWFs is almost always found to be independent of the choice of initial guess^{9}. In the case of entangled bands^{10}, however, this tends not to be the case and the choice of initial guess strongly affects the quality of the final MLWFs, presenting a challenge to the development of a general-purpose approach to generating MLWFs automatically without user intervention.

Several approaches have been put forward to remove the necessity for user-intervention in generating MLWFs, including the iterative projection method of Mustafa et al.^{11}, the smooth orthonormal Bloch frames of Levitt et al.^{12}, and the automated construction of pseudo-atomic orbitals rather than WFs as the local basis to represent the target space, as described by Agapito et al.^{13–15}. In addition, some ^{16–19}.

A recently proposed algorithm by Damle et al.^{20,21}, known as the selected columns of the density matrix (SCDM) method, has shown great promise in avoiding the need for user intervention in obtaining MLWFs. Based on QR factorisation with column pivoting (QRCP) of the reduced single-particle density matrix, SCDM can be used without the need for an initial guess, making the approach ideally suited for HT calculations. The method is robust, being based on standard linear-algebra routines rather than on iterative minimisation. Moreover, the authors have proposed an efficient algorithm for the QRCP factorisation that operates on a smaller and numerically more tractable matrix than the full density matrix. Finally, SCDM is parameter-free for an isolated set of composite bands, and requires only two parameters in the case of entangled bands together with the choice of the target dimensionality for the disentangled subspace (i.e., the number of MLWFs required). We emphasise here that the SCDM method can be seen as an extension to solid-state periodic systems of the Cholesky orbitals approach of Aquilante et al.^{22}, that has been developed from a quantum-chemistry molecular perspective for finite systems. SCDM focuses instead on periodic systems, and it is based on a real-space grid discretisation of the wavefunctions. We discuss in more detail this equivalence in the “The SCDM algorithm and its physical interpretation” section and in the “Methods” section of the

In this article, we present a fully-automated protocol based on the SCDM algorithm for the construction of MLWFs, in which the two free parameters are determined automatically (in our HT approach the dimensionality of the disentangled space is fixed by the total number of states used to generate the pseudopotentials in the density functional theory (DFT) calculations). We have implemented the SCDM algorithm in the PW2WANNIER90 interface code between the Quantum ESPRESSO software package^{23} and the WANNIER90 code^{24}. We have used our implementation as the basis for a complete computational workflow for obtaining MLWFs and electronic properties based on Wannier interpolation of the BZ, starting only from the specification of the initial crystal structure. We have implemented our workflow within the AiiDA^{25,26} materials informatics platform, and we used it to perform a HT study on a dataset of 200 materials.

We anticipate here that our scheme works extremely well for our purposes, i.e., band-structure interpolation of both insulating and metallic systems with Wannier functions, but is less suitable for other applications where, for instance, a specific symmetry character of the WFs is required. It is worth mentioning that there are other approaches for constructing Wannier functions, which are based on a minimisation procedure and therefore require an initial guess^{27–29} and which could also be automated in a similar fashion. In this work however, we focus only on the automatic generation of maximally-localised Wannier functions. We also note that there exist efficient non-Wannier-based techniques for band-structure interpolation, e.g., Shirley interpolation^{30,31}. Whilst these approaches have their own advantages, they do not provide the same insight afforded by a real-space, localised description of the electronic structure, which can often be very helpful for understanding and computing advanced properties.

The manuscript is organised as follows. First, we present a summary of the background theory, starting with MLWFs for isolated and entangled bands followed by the SCDM algorithm, where we focus in particular on providing a physical interpretation of the method. In the “Results and discussions” section, we first provide a preliminary comparison, for a few well-known materials, between MLWFs obtained via the conventional method (i.e., with user-defined initial guesses) and those obtained from SCDM. We then proceed to show the validation of the SCDM method and our workflow for the valence bands of 81 insulating materials. We then discuss our automated protocol to determine the free parameters in the case of entangled bands and validate it on a dataset of 200 semiconducting and metallic materials. Finally, details on the implementation of the SCDM method in PW2WANNIER90 and of the AiiDA workflow are presented in the “Methods” section.

We summarise in this section the main concepts and notations related to maximally-localised Wannier functions that will be useful in the rest of the paper, following the notation in ref. ^{8}.

A Wannier function associated to a band ^{32}

The gauge freedom of the Bloch state under multiplication by a ^{8,9}. In order to obtain a minimal TB basis set it is therefore beneficial to select the optimal phases that minimise the total spread, so that overlaps and Hamiltonian matrix elements between different Wannier functions decay rapidly to zero as a function of the distance between their centres. Since the integral transformation in Eq. (^{33}.

For an isolated set of ^{(k)} is a unitary matrix that, at each wave vector ^{(k)} such that ^{34}). Different approaches have been put forward^{35–39} to generate well-localised WFs. In the Marzari–Vanderbilt (MV) approach^{9}^{(k)} is chosen to minimise the sum of the quadratic spreads of the WFs, given by_{n} ≡ 〈_{n0}∣ ⋅ ∣_{n0}〉 and ^{40–42} in quantum chemistry.

The total quadratic spread Ω may be separated into two positive-definite terms: ^{8,9} Ω_{I} is gauge invariant, whereas ^{(k)}). For an isolated group of bands, therefore, Ω_{I} is evaluated once and for all in the initial gauge and minimising the total spread Ω is equivalent to minimising only the gauge-dependent part

For crystalline solids with translational symmetry, it is natural to work in reciprocal space, henceforth referred as ^{33} for the representation of the position operator ^{2} in ^{9}_{b} come from the finite difference representation of the gradient operator in _{k}), and ^{(k,b)} is given by^{8,9}).

Interestingly, even though the global minimisation of Ω fixes the gauge, a certain degree of non-uniqueness may remain for instance if the minimum is very shallow or flat as in the case of LiCl^{9}. This results in different configurations to be degenerate and therefore different solutions (usually related by a global rotation of the MLWFs) can be obtained depending on the initial guess. Moreover, MLWFs are only defined modulo a lattice vector by definition.

In many applications, the group of bands of interest are “entangled”, i.e., are not separated by an energy gap from other bands throughout the whole Brillouin zone.

Souza, Marzari, and Vanderbilt^{10} (SMV) proposed a “disentanglement” strategy that involves two steps. In the first step, one defines an energy window that encompasses the states of interest and which contains _{I}. Heuristically, Ω_{I} represents the “change of character” of the states across the Brillouin zone. (For a rigorous derivation see ref. ^{9}.) The subspaces ^{dis(k)} are rectangular _{J} being the ^{dis(k)} to minimise Ω_{I}, which, as discussed earlier, is a measure of the “spillage” between neighbouring subspaces ^{10}.

In the second step, having defined a ^{8,10}.

The iterative minimisation of Ω_{I} starts with an initial guess for the subspaces ^{9} (in the absence of spin–orbit coupling, the WFs at the global spread minimum are expected to be real^{43}). For gradient-based minimisation methods, thus, the ability to reach the global minimum strongly depends on the choice of an appropriate starting point, sufficiently close to the final solution. To this aim, if one has a chemical intuition of the target _{n}(^{dis(k)} can then be obtained by orthonormalising the projected guess orbitals ^{(k)}:_{n}(_{nk}(^{(k)} are the (random) phases of the Bloch states that are computed by the ab initio code. In the case of isolated bands, even a poor initial choice such as this is often sufficient to reach the global minimum of the spread functional (with enough iterations of the minimisation algorithm). Conversely, in the case of entangled bands, the two-step “disentanglement” procedure is usually unable to reach the global minimum of the spread functional unless the initial trial orbitals are already quite close to the final solution.

This strong dependence of the SMV minimisation algorithm on the initial trial functions, and hence on the user’s intuition and intervention, has been the main obstruction in the development of fully-automated workflows for generating MLWFs for high-throughput applications.

An alternative method to the SMV approach described in the “Introduction” has recently been proposed by Damle et al.^{20,21} in the form of the aforementioned selected columns of the density matrix (SCDM) algorithm. The method uses a QR factorisation with column pivoting (QRCP)^{44} of the single-particle density matrix (DM),^{20,21} for additional details.

For clarity, we start by considering a system sampled at a single

Let us first recall that ^{45–47}. In particular, this means that for a given fixed _{0}, and that this projection is an exponentially-localised orbital.

To understand the numerical implementation of the method, we consider from now on the real-space discretised version of the DM. The _{nk}(_{G} points in real space _{G} ×

With this definition, the orthonormality condition is written as ^{†}_{J}, while the density matrix (which in discretised form is an _{G} × _{G} matrix) can be written as ^{†}, i.e.,

We can now interpret the _{j} that is zero everywhere except at the _{j}). This statement is the discretised version of the projection of a delta function in Eq. (_{j} is the discretised version of _{j}). Therefore, thanks to the near-sightedness principle, the orbitals represented by the columns of the DM are localised.

This statement is at the core of the SCDM method. In fact, when searching for Wannier functions, we are looking for a complete and orthogonal basis set of _{G} and the set of all these _{G} orbitals is redundant. In addition, these orbitals are not orthogonal—intuitively, projecting on delta functions centred at two neighbouring points will typically result in a large overlap between the projected orbitals—and not normalised (e.g., in the limiting case of a delta function centred at a position in space where there is no charge density, the resulting projection will have zero norm). Selecting any set of ^{48}). Equivalently, as every column is the projection of a delta-like test orbital centred at _{j}, we can say that the SCDM algorithm selects _{G} grid points, that define the “most representative” localised projected orbitals.

To achieve this goal, SCDM uses the standard linear algebra QRCP method^{44}, which factorises a matrix _{ij} = 1, and all of the other elements in the

QRCP (a greedy algorithm) selects columns as follows: since _{11}∣^{2} and must be the largest possible, therefore the algorithm will choose the column with the largest norm. The second column _{22}∣^{2} that, due to the properties of ^{22}, that applies to finite (non-periodic) systems and for a different basis set (a basis of atomic orbitals rather than a real-space grid discretisation). In particular, the Cholesky algorithm used in ref. ^{22} is a refined version of the original Cholesky decomposition specifically adapted for positive semi-definite matrices, i.e., Cholesky decomposition with full column pivoting (CholCP) ^{44}. Finally, the two methods use undoubtedly related ideas but they are not direct analogues since there are multiple “variants” of SCDM when using localised orbitals.

For an effective practical implementation of the method, a final step is required. In fact, the _{G} can be of the order of 100,000 or more (while ^{†} and that the original columns of Ψ are orthonormal, one can prove (see Methods section of the ^{†} (of size _{G}), with a computational cost that scales as ^{†}Π) may be used as the _{mn} projection matrix of Eq. (

Finally, it is worth noting the connection with the “canonical” approach of user-defined initial guesses (e.g., atomic-like orbitals at specified centres): the SCDM method may be thought of as using as initial guesses a set of extremely localised

We now extend the discussion to the case of ^{43,49}, and it is also proven that WFs with an exponential decay exist^{50}; numerical studies for the specific case of MLWFs have confirmed this claim for several materials^{50,51}, and recently there has been a formal proof for 2D and 3D time-reversal-invariant insulators^{43}. The SCDM method has been extended also to the case of ^{21} and named in this case “SCDM-_{k}. Reference^{21} discusses extensively how the method can be extended to a _{k=Γ} only. Then, this selection of columns can be used for all other

Finally, the extension to the entangled case (e.g., for metals or when considering also the conduction bands of insulators and semiconductors) has been proposed in ref. ^{21}. In this case, a so-called quasi-density matrix is defined,_{nk}) is an occupancy function. The isolated-bands case can be recovered by setting _{nk}) = 1 for energy values _{nk} within the energy range of the isolated bands, and zero elsewhere. For the typical cases of interest of this work (metals, and valence bands and low-energy conduction bands in semiconductors and insulators), one needs bands up to a given energy (typically slightly above the Fermi energy). Then, as suggested in ref. ^{21},

The algorithm then proceeds as in the case for isolated bands, computing the QRCP factorisation on the quasi-density-matrix or, in practice, on the matrix _{k} a diagonal matrix with matrix elements ^{(k)}, and the ^{dis(k)} matrices of Eq. (

The SCDM algorithm is able to robustly generate well-localised functions that are used to generate Wannier functions without the need for an initial guess. Whilst this makes the algorithm well-suited for direct integration within HT frameworks, the selection of the columns cannot be controlled by external parameters (at least for isolated bands), and therefore it is not possible to enforce constraints that might be desirable, such as point symmetries. On the contrary, when explicitly specifying atomic-like initial projections, these (if appropriately chosen) provide at least some degree of chemical and symmetry information. In the “SCDM vs MLWFs in well-known materials” section, we discuss how this affects the WFs obtained by the algorithm. Our aim is to leverage on the ability of SCDM to automatically generate a good set of localised functions, and to use these to seed the MV algorithm for the minimisation of the total spread functional, which will give in turn an automated protocol to generate MLWFs. Being able to automatically generate MLWFs will also allow users to seamlessly exploit the set of computational tools that have been developed in recent years for MLWFs and implemented in various codes, such as WANNIER90. In practice, this entails employing the SCDM algorithm to compute the ^{(k)} matrices of Eq. (_{n} are obtained from the first _{Γ}(_{Γ}.

It is worth noting that the SCDM method can be also combined with the SMV disentanglement procedure, as a means of seeding the initial subspace projection. However, this introduces two additional parameters associated with the SMV approach, namely _{outer}, and _{inner}, giving a total of four parameters (together with _{outer} defines the upper limit of the so-called “outer” energy window discussed in the “Introduction” section, and _{inner} defines the upper limit of a smaller energy window contained within the outer energy window. This inner window is used to “freeze” the Bloch states within during the minimisation of Ω_{I}, such that they are fully preserved within the selected subspaces ^{10} for a comprehensive description of the outer and inner energy windows). Each additional parameter makes it increasingly difficult to find a robust and automated protocol for obtaining MLWFs. Consequently, when combining SCDM with SMV disentanglement, an optimal selection of all the parameters can be achieved only in an ad hoc, non-automatic fashion (hence only for few materials). As shown in the “The SCDM algorithm and its physical interpretation” section, SCDM employs a generalised form of the density matrix, Eq. (

As a precursor to the fully-automated high-throughput study on a set of 200 materials that focuses on automatic Wannierisation and band interpolation from SCDM projections and which will be presented in the “Entangled bands” section, in this section we consider in greater depth and detail the performance of the SCDM method on a small set of simple systems with well-known Wannier representations of the electronic structure. Specifically, we compare quadratic spreads, centres and symmetries of the WFs computed from the SCDM gauge (as described in the “The SCDM algorithm and its physical interpretation” section) with the ones computed from carefully chosen initial projections. Comparative studies between SCDM localised functions and MLWFs on well-known materials have recently appeared in the literature^{21,29}. However, here we expand on different aspects, focusing in particular on the combination of the SCDM and the MV approaches (SCDM+MLWFs), to better assess its range of applicability, for instance for beyond-DFT methods, e.g., ab initio tight-binding^{52,53}, DFT+U^{54–56}, and DMFT^{57,58}, where the symmetries of the Wannier functions are important.

All DFT calculations have been carried out with Quantum ESPRESSO, using the PBE exchange-correlation functional and Vanderbilt ultrasoft pseudopotentials^{59}. MLWFs are generated from Bloch states calculated on a 10 × 10 × 10 Monkhorst–Pack grid of ^{24,60}, as explained in “Methods”. WANNIER90 is used throughout this work to generate the WFs on a real-space grid and to perform the interpolation of band structures in reciprocal space.

We consider four different schemes for generating Wannier functions: (1) Full minimisation of Ω using the SMV disentanglement algorithm to minimise Ω_{I} and the MV algorithm to minimise _{I} only, using the SMV algorithm (DIS); (3) Minimisation of

We start by studying the Wannierisation of a manifold of bands consisting of the four valence bands plus the four low-lying conduction bands in silicon, the latter being entangled with bands at higher energies. For the SCDM method, we use ^{21}, taking into account a shift in the absolute energy scale, which shifts the value of _{outer} = 17.0 eV and _{inner} = 6.5 eV.

When using initial projections onto atomic-like orbitals, we find that the spread functional Ω has three minima that are very close to each other and each of which gives eight real MLWFs. The global minimum corresponds to four ^{3}-type MLWFs per Si atom in the two-atom unit cell, oriented in a back-bonding (BB) configuration, i.e., with the major lobes of the ^{3}-type MLWFs pointing towards the tetrahedral interstitial sites. A representative example of one such BB MLWF is shown in the isosurface plots in the first row of Fig. ^{3}-type MLWFs to be in a front-bonding (FB) configuration, i.e., with the major lobes pointing towards the vertices of the tetrahedra centred on the two non-equivalent Si atoms, as shown in the isosurface plots in the second row of Fig. ^{3}-type MLWFs that are in the BB configuration on one Si atom in the unit cell and four ^{3}-type in the FB configuration on the other Si atom. At variance with what is stated in ref. ^{29}, all these cases can be found by specifying as initial projections four appropriately oriented ^{3}-type orbitals on each Si atom in the unit cell. For the BB configuration: four ^{3}-type orbitals centred on the Si atom at (0.0, 0.0, 0.0) (Si1), and four rotated ^{3}-type orbitals centred on the other Si atom (Si2) at (_{1} = (−5.10, 0.00, 5.10), _{2} = (0.00, 5.10, 5.10) and _{3} = (−5.10, 5.10, 0.00) (in _{0}). In the WANNIER90 code this can be specified in the projection block of the input file as: Si1:sp3:z = 0,0,−1:x = 0,1,0; Si2:sp3. For the FB configuration: same as above but with the labels 1 and 2 on the Si atoms interchanged.

First row: the initial subspace is defined by projecting the Bloch states ψ_{nk}(^{3}-type orbitals giving back-bonding (BB) MLWFs in all cases. Second row: as above but with different orientations for the ^{3}-type orbitals, resulting in front-bonding (FB) MLWFs in all cases. Third row: the initial subspace is obtained from the SCDM method. Here, the eight ^{3}-type WFs are in the BB configuration only when a full minimisation is performed. In all other cases, a mixture of configurations is obtained instead. The values below each WF isosurface (isovalue = ±0.45 Å^{−3/2}) is the value of the individual spread in Å^{2}.

With these initial projections, the four different minimisation options described earlier give the same qualitative results. Going from the DIS+MLWF case to DIS to MLWF to proj-ONLY, the spreads of the MLWFs increase, as expected, but the FB/BB character is consistently present (see the top two rows of Fig. ^{2}) is reported underneath each isosurface plot). Performing the SMV disentanglement step results in a reduction of Ω_{I} from 26.54 to 20.06 Å^{2} in both the FB and BB cases, showing that the initial and final selected subspaces from the two different choices of projection have the same intrinsic smoothness.

Instead, starting from SCDM to define the initial subspace, we obtain different qualitative results for the four different minimisation schemes. Wannier functions in the BB configuration are found when a full minimisation is performed (i.e., SCDM followed by SMV and MV minimisation). A representative example of one such WF is shown in the third row and first column of Fig. _{I} = 27.54 Å^{2}) than specifying atomic orbital initial projections (26.54 Å^{2}), but the final spreads are the same as in the equivalent BB case with atomic orbital initial projections. We also observed that in the case of SCDM, the minimisation of both Ω_{I} and

When looking at the interpolated band structure, however, a different picture emerges. In the case of choosing atomic orbital projections, the interpolation is very poor if no SMV disentanglement step is included in the minimisation. This shows the importance of disentangling the correct manifold and it is in agreement with what has been previously reported in the literature^{8}. On the other hand, in the case of an SCDM-generated initial subspace, the interpolation is only marginally affected by the minimisation scheme employed (see Fig.

To summarise, in silicon SCDM performs very well when combined with full spread minimisation, both in terms of the symmetries of the WFs and band interpolation (see Fig.

Copper presents a paradigmatic case of a noble metal where a set of bands (e.g., of ^{53}, DFT+U^{55}, and DMFT^{58}, which deal with strong correlation in a local subspace, e.g., the subspace spanned by ^{10}, in order to generate a faithful representation of the band structure around the Fermi level, we work with a manifold of dimension _{outer} = 38.0 eV and _{inner} = 19.0 eV. For SCDM, we set ^{10}, appropriately selected initial projections are five _{h} point group (which is isomorphic to the site-symmetry group of the origin), with the usual _{2g} and _{g} character (see Fig. _{1}) of _{d} (which is the site-symmetry group of the tetrahedral interstitial sites), as shown in Fig.

First column: three representative MLWFs obtained from using atomic orbital projections to define the initial subspace (see main text for description). Panel (_{g} character; panel (^{2} is reported. Isosurfaces are plotted with an isovalue of ±0.45 Å^{−3/2}.

When using SCDM projections, the symmetries of the _{2g}/_{g} character (this is a feature of all five

Until here, we have looked into the details of the Wannier functions that can be obtained from SCDM projections, by focusing on the paradigmatic examples of silicon and copper (see “SCDM vs MLWFs in well-known materials”). We focused on comparing Wannier functions as obtained by adopting different initial projections, given that good atomic-like projections can often be easily identified through chemical intuition. Now we take a complementary perspective, by considering any given crystal structure, where we face the problem of finding good initial projections without any prior chemical knowledge of the system. This is particularly relevant for high-throughput studies, where crystal-structure databases are systematically screened with first-principles simulations. In order to produce high-throughput Wannier functions, it is fundamental to provide an algorithm that does not require human interaction in the choice of the initial projections. In addition, such an algorithm must be able to use only information that is either contained in the crystal structure and the pseudopotential, or that can be computed by a simple first-principles simulation, such as the projected density of states. To this aim, human-specified atomic-like projections are not suitable, and we propose the SCDM method as the workhorse for the automated choice of the initial projections.

In order to ascertain the effectiveness of the SCDM method in generating well-localised Wannier functions in an automated way, we start by testing the algorithm for isolated manifolds. We compare Wannier interpolations and direct DFT calculations for the band structure of the valence bands of a set of 81 insulating bulk crystalline materials spanning a wide range of chemical and structural space, for the full list the Reader is referred to ref. ^{61}. We quantify the differences between two band structures by introducing a simple metric that is inspired by the so-called “bands distance” introduced in ref. ^{62}. Here we define the distance between DFT and Wannier-interpolated bands as:^{62}, to take into account the possibility that significant differences between band structures may occur only in sub-regions of the Brillouin zone or in small energy ranges, we also compute

For each of the 81 structures of the benchmark set, we first perform a variable-cell optimisation and we then compute the band structure on a high-symmetry path using DFT. The cell and the path are standardised using seekpath according to the prescription of ref. ^{63}. The ground-state charge density is obtained using a ^{−1} in the irreducible Brillouin zone (unless otherwise stated). Band structures are then calculated using the charge density frozen from the earlier calculation and sampling the high-symmetry path with a spacing of 0.01 Å^{−1}. Then we compute the WFs and the real-space Hamiltonian with WANNIER90, starting from a non-self-consistent field (NSCF) DFT calculation performed on a possibly different ^{64} on the same

All DFT calculations are carried out using the Quantum ESPRESSO distribution^{23}, employing the PBE functional^{65} and a beta version of the SSSP v1.0 efficiency pseudopotential library^{62,66–70}, where the norm-conserving ONCV pseudopotentials^{71} are recompiled using version 3.3.1 of the code, and the pseudopotentials for Ba and Pb are replaced by Ba.pbe-spn-kjpaw_psl.1.0.0.UPF and Pb.pbe-dn-kjpaw_psl.0.2.2.UPF of the pslibrary. In Fig. _{k} = 0.15, 0.2, 0.3, and 0.4 Å^{−1}, used in the NSCF step to construct Wannier functions. We stress that for an isolated set of bands, such as for the valence bands of an insulator, the SCDM method involves no free parameters and the only parameter to set is the _{k} of a uniform grid that is used to diagonalise the Hamiltonian. Hence it is fundamental to elaborate a strategy for the choice of _{k}, as this finally removes every free parameter from the construction of Wannier functions for isolated bands.

Top (bottom) panel: average (max) band distance _{k}. The MLWF procedure improves the interpolation accuracy, although SCDM-only Wannier functions perform already remarkably well. The histograms focus on the most relevant interval and few outliers are not shown, in particular at _{k} = 0.2 Å^{−1} 98% (79/81) of the SCDM+MLWF bands and 96% (78/81) of the SCDM-only bands exhibit

The SCDM method is found to work well for all of the 81 systems studied, with the exception of two that have very poor interpolation. Notably, these two structures (three if we consider the SCDM-only method) are the ones that exhibit the highest initial spread Ω per Wannier function. Although a large initial spread does not necessarily imply poor interpolation, it certainly correlates with a potential risk of poor Wannierisation and it could be used as a marker for triggering a check on the quality of bands interpolation within the calculation workflow. We postpone the discussion on the causes of the poor performance of the SCDM method in these systems until the end of this section, where we also provide possible solutions that can be automated.

To get a sense of the typical quality of a good SCDM+MLWF interpolation, we report in Fig. _{3}Mg_{2} (_{k} = 0.2 Å^{−1}; the direct and interpolated band structures are essentially indistinguishable (e.g., the largest difference in energy between the bands in the case of CaO is of

Wannier-interpolated (solid red) and full DFT band structure (black dots), using the MLWF procedure on SCDM projections and _{k} = 0.2 Å^{−1}. The dashed line labels the valence band maximum (VBM). _{3}Mg_{2} (

Figure _{k} = 0.2 Å^{−1} is typically sufficient to provide accurate interpolated band structures, in particular 96% of the materials (78/81) for SCDM-only and 98% (79/81) for SCDM+MLWF show

Those systems with

Number of interpolated bands showing _{k}.

| SCDM-only | SCDM+MLWF |
---|---|---|

0.15 | 3 | 2 |

0.2 | 3 | 2 |

0.3 | 6 | 2 |

0.4 | 16 | 8 |

As we will discuss shortly, the superior performance of SCDM+MLWF is linked with the increased localisation associated with the MLWF procedure. As mentioned before, localisation is also related to the poor interpolation of the outliers: at all

Figure _{k} = 0.2 Å^{−1}. The SCDM projections are found to perform better, leading to narrower distributions: 98% of the materials (79/81) show

Top (bottom) panel: average (max) band distance _{k} = 0.2 Å^{−1}. SCDM projections perform better than random projections when used in conjunction with the MLWF procedure. The histograms focus on the most relevant interval and few outliers are not shown, in particular the 96% (78/81) of the SCDM+MLWF bands and the 83% (67/81) of the random+MLWF bands exhibit an

We now elaborate on the differences between random and SCDM initial projections. First, random projections typically generate a much higher initial spread (7.5 Å^{2} per WF) compared to SCDM (1.0 Å^{2} per WF). We find that the MLWF procedure is often sufficient to localise Wannier functions even in the case of large initial spreads: for 63 out of 81 materials the MLWF procedure brings both the random projections and the SCDM projections cases to the same minimum spread value. Notably, it never happens that the spread is similar and the quality of the interpolation is very different, while the opposite happens only in the case of He, a pathological case (1 atom and 2 electrons per cell) where random projections give a poorly localised Wannier function while still being able to provide a very good interpolation. For 15 materials (16 if we include He), random projections provide a very poor starting point and the MLWF procedure remains trapped in a local minimum with large spread. In these cases, instead, SCDM projections are a good starting point with low spread and the MLWF procedure further reduces it and a higher-quality interpolation is achieved, as demonstrated by the lower _{2}Os, we have checked that excluding the semi-core states greatly improves the performance and the quality of the interpolated bands. We believe that the reason lies in the fact that, if semi-core states are present, then there are some projections, centred on the same site, that possess the same symmetry character, e.g.,

In the other case, Se_{2}Sn, there are no semi-core states. Here instead, some SCDM projections show an initial value of Ω_{D}—the sum of the diagonal elements of _{D} > 0.5 Å^{2}), which could be used as a diagnostic indicator for problematic systems. In particular, SCDM+MLWF seems to get trapped in a state in which there are a number of well-localised WFs and two that are diffuse and spread over multiple sites. This set of WF are real with a total spread of 28 Å^{2} and Ω_{D} of 2 Å^{2}. We found that a possible solution to recover a good interpolation is to add some noise (adding small random numbers to the search direction components, as implemented in WANNIER90) during the minimisation to help the algorithm escape from the unwanted local minimum.

We propose some technical solutions that could be easily added to a workflow:

Automatically detect and exclude semi-core states (if any). This is generally a safe choice as these states are not physically interesting for most applications. Alternatively, one could retain the semi-core states and increase the cutoff energy (or equivalently the density of the real-space grid).

If the problem is not in describing semi-core states, then check the value of Ω_{D}, if it is above a given threshold (e.g., >1.0 Å^{2}) for one or more initial projections, introduce some noise in the minimisation.

If none of the above work, switch to random+MLWF projections, which may give a better final result.

To study now more in detail the effect of minimising the spread, we start by comparing the total spread Ω obtained using SCDM+MLWF and SCDM-only, by computing:^{SCDM} and Ω^{MLWF} are the total spreads obtained with SCDM-only and SCDM+MLWF, respectively. As reported in Fig.

The data has been obtained considering the valence bands of our set of 81 insulators, with _{k} = 0.2 Å^{−1}. The SCDM+MLWF procedure provides Wannier functions that are moderately more localised with respect to SCDM-only, with a relative variation within 10–20% for most materials.

An interesting question is whether the difference in spread due to the MLWF procedure correlates with the difference in the quality of the interpolation. To assess this, we compute the quantity^{SCDM} and ^{MLWF} are the band distances obtained with SCDM-only and SCDM+MLWF respectively. Figure ^{MLWF}, showing that a reduction in the spread typically implies an improvement in the quality of the interpolation (Δ

The dataset consists of the 81 insulators described in the main text (only 61 out of 81 visible in the axes range). Δ^{MLWF} represent the quantitative deviation between SCDM+MLWF and SCDM-only in terms of band structures and total spreads, respectively. Maximally-localised Wannier functions give comparable and often more accurate interpolated bands.

Before discussing the case of entangled bands, we summarise here the main conclusions that can be drawn for isolated bands. All the results we obtained, displayed in Figs _{k} = 0.2 Å^{−1} as the standard protocol for producing accurate and efficient Wannier Hamiltonians describing the valence bands of bulk insulating crystals.

We now consider the case of entangled bands. With the intent of describing a fully automatic protocol, we limit ourselves to the case of Wannier interpolation of all states up to a given energy (excluding, if appropriate, manifolds of low-lying semicore states that are isolated in energy from the rest of the band structure) and we do not consider the case of computing Wannier functions for a manifold of bands of given symmetry within a narrow energy window (e.g.,

In the case of entangled bands, the SCDM method demands the choice of three free parameters: ^{10} in the SCDM method, it is not guaranteed that a subspace that includes the physically-relevant lowest-lying bands will be selected because the greedy QRCP algorithm, owing to an inappropriate choice of

To identify appropriate values of _{nk}, which measures how well each Bloch state

Similarly to Agapito et al.^{14}, we choose as our localised functions the set of _{PAO} pseudo-atomic orbitals (PAO) _{Ilm}(_{μ} is the number of lattice vectors

The projectability of each Bloch state onto _{nk} ≤1. The projections

As the first step of our protocol, we choose _{PAO} considered in the sum of Eq. (_{PAO} is a conservative choice, as the number of PAOs is usually greater or equal to the number of valence bands plus few conduction bands.

We then use the values of the projectability to inform the choice of _{nk}, as shown in Fig. _{nk} ~ 1 for low-energy states, which are well-represented by the chosen pseudo-atomic orbitals, and _{nk} ~ 0 for high-energy states that originate either from free-electron-like states or from localised states with an orbital character that is not included in the set listed in Table _{fit} and _{fit}. The core of our protocol lies on the actual choice of the _{fit} measures the typical energy spread of the bands originating from states within _{fit}, however, produces extremely poor interpolation of the bands for most of the materials that we have tested, see “High-throughput verification”. The reason is that it gives too great a weight in Eq. (_{nk} < 1). As a consequence the SCDM algorithm might select columns representing better these states rather than those with projectability close to 1 at low energy, that are essential and physically relevant to include. In these cases, the corresponding band interpolation shows large oscillations and has large errors with respect to the DFT band structure in large portions of the BZ. We need therefore to choose a smaller value _{fit}. On the other hand, however, we note that the weight of states much above

Each blue dot represents the projectability as defined in Eq. (_{fit} while the vertical green line represents the optimal value of _{opt} = _{fit} − 3_{fit}. The value of the Fermi energy is also shown for reference (black line).

We need, therefore, a general and automatic recipe for choosing an appropriate, intermediate value of _{fit} − _{fit} is guided by the consideration that states that start to have a significant component of their character outside _{fit} have more than 50% of their character outside

In order to explain better our specific choice of

Left panel: bands distance _{fit} − 3_{fit} − 3_{fit}. The smearing function to compute

In particular, the value of _{fit} − 3_{fit} − 3_{fit}, _{fit}) on this line. Our target is to have

On the other hand, also moving to the region of small _{k}_{k}, therefore removing important information needed by the method for a good interpolation.

Our choice of _{fit}, allows us to locate our choice of (^{61}.

We also emphasise here that the choice of _{I}.

In this section, we present the results of the high-throughput calculations for the general case of 200 materials that have been chosen to cover a large region of structural (12 different Bravais lattices) and chemical (67 different elements) space. The free parameters in the SCDM method have been chosen by the automatic procedure outlined in the previous section. The structure of this section parallels the one for isolated bands; in particular, we make use of the bands distance _{nk}(

To verify up to which energy the interpolation is accurate (for the number of PAOs in the pseudopotentials chosen in this work, see Table _{F} + Δ, with Δ = 1, 2, 3, 4, 5 eV, and _{nk} which significantly increases the value of the band distance. The distribution becomes much narrower and closer to _{F} + 1.0 eV, 98% of the materials have _{F} + 1.0 eV (for entangled bands) as a reliable measure of the quality of the interpolation in the energy region of interest.

The chemical potential is defined as _{F} + Δ (Δ = 1, 2, 3, 4, 5 eV) and the smearing _{k} = 0.2 Å^{−1}.

As in the case of isolated bands, the first step is to study the effect of the _{k} = 0.2 Å^{−1} is typically sufficient to provide accurate interpolated band structures: in particular, 94% of the materials (187/200) for SCDM-only and 97% (193/200) for SCDM+MLWF show _{k} to 0.2 Å^{−1} for further analysis in this section.

Top (bottom) panel: histogram of average (max) band distance _{k}. The MLWF procedure slightly worsens the accuracy of the interpolation when compared to SCDM-only Wannier functions. The histograms focus on the most relevant interval and few outliers are not shown, in particular at _{k} = 0.2 Å^{−1} 98% (196/200) of the SCDM+MLWF bands and 99.5% (199/200) of the SCDM-only bands exhibit

Figure _{3}Mg_{2} (and these can be compared with Fig.

Wannier-interpolated bands are in solid red and full DFT bands are in solid black. Panel (_{3}Mg_{2}. Panel (_{k} = 0.2 Å^{−1}. Note that, while we show all Wannier-interpolated bands, the band distance

Unlike the case of isolated bands, for entangled bands the MLWF procedure substantially increases the localisation of the resulting Wannier functions from SCDM projections, giving for instance a

Histogram of the relative variation of the total quadratic spread Ω before and after the MLWF procedure for the band structures of our set of 200 materials, obtained for _{k} = 0.2 Å^{−1}. The SCDM+MLWF procedure provides Wannier functions that are substantially more localised with respect to SCDM-only, with a relative variation between 20 and 60% for most materials.

We now look at how the difference in spread due to the MLWF procedure correlates with the difference in the quality of the interpolated band structures. Although the correlation is not as strong as in the case of isolated bands, it can be seen (Fig. ^{MLWF} > 20 meV. Moreover, for the remaining 19 (out of 182) systems the MLWF procedure improves the bands interpolation, notably yielding ^{MLWF} < 20 meV for all of them. Finally, for the remaining 18 systems (9%), the MLWF scheme worsens the results with ∣Δ^{MLWF} > 20 meV). In all these cases, a possible reason for failure might be related to the choice of columns in the SCDM algorithm, which is performed only at Γ (see discussion in “SCDM for periodic systems: SCDM-

The dataset consists of all 200 + 81 materials, with entangled bands (red dots, 148 out of 200 visible in the axes range) and with isolated bands (blue dots, 64 out of 81 visible) showing Δ^{MLWF}, that is the quantitative deviation between SCDM+MLWF and SCDM-only in terms of band structures and total spreads, respectively. Maximally-localising Wannier functions give potentially more accurate interpolated bands for valence bands only, whereas for entangled bands the trend is reversed.

We have presented an approach to generate a set of maximally localised Wannier functions in an automated way that has the advantage of being simple, robust and applicable also in the more general case of so-called entangled bands. The high sensitivity of iterative minimisation algorithms to the initial conditions, which was a long-standing problem in particular for the entangled-band case, is overcome by employing the selected columns of the density matrix^{20,21} (SCDM) algorithm to automatically choose the initial subspace. For the Wannierisation of isolated bands, SCDM is a parameter-free method, whereas for entangled bands two real numbers

To make the method available to any researcher, we have implemented the SCDM algorithm in PW2WANNIER90, part of the open-source Quantum ESPRESSO distribution, and added corresponding functionality to the open-source WANNIER90 code. We have also discussed how the full procedure is implemented as AiiDA^{25,26} workflows, encoding the knowledge that is needed to perform all steps (DFT simulations, selection of the parameters, Wannierisation) into an automated software. This enables MLWFs to be obtained and used to calculate material properties by providing the crystal structure of a material as the only input. Furthermore, we are distributing publicly and freely all codes and workflows discussed in this work within a virtual machine^{61} preconfigured with the open source codes AiiDA, Quantum ESPRESSO, and WANNIER90. This VM allows anyone to explore and reproduce straightforwardly the present results without the need to install or configure anything, and without the need of implementing again workflows and algorithms, in the true spirit of FAIR data sharing^{72} and Open Science. In addition, interested researchers are not constrained to re-run the calculations performed in this work, but can perform their own simulations, either with different parameters or on new materials. To the best of our knowledge, this is the first time that such level of reproducibility is offered accompanying a scientific paper in the field of DFT simulations.

We have demonstrated the robustness of the present approach by carrying out high-throughput calculations on a dataset of 200 bulk crystalline materials, of which 81 are insulators, spanning a wide chemical and structural space. The main metric we used to assess the results is the so-called band distance^{62}, quantifying the difference between the Wannier-interpolated band structures and the corresponding direct DFT band structures. In particular, we obtain excellent interpolations: for entangled bands, 97% of the materials show an average bands distance

We believe that this work is a significant step forward towards completely automated high-throughput calculations of advanced materials properties exploiting Wannier functions.

AiiDA^{25,26} is a python materials’ informatics platform to automate, manage and coordinate simulations and workflows, and to encourage sharing of both the resulting data and the workflow codes used to generate them. While general in its design, its plugins cover many materials science codes, including Quantum ESPRESSO^{73} and WANNIER90^{74}.

Our implementation of the SCDM method inside the open-source code Quantum ESPRESSO makes it available to any researcher. Moreover, our protocol for the choice of the SCDM parameters discussed in “Protocol” describes an effective procedure to automatically compute the Wannier functions of any material. However, the actual computation starting only from the crystal coordinates is non-trivial. The choice of numerical parameters (cutoffs,

Furthermore, AiiDA keeps track of the provenance of the data generated in the simulations in a fully automated way, in the form of a directed graph (see Fig.

The graph has been generated by running a WANNIER90 calculation using Quantum ESPRESSO as the input code for an InSe crystal, top green node (link labels have been removed for clarity). Red arrows represent caller–called relationships between a workflow and a subworkflow or a calculation; continuous lines connect calculations on a supercomputer (light blue ellipses) to their inputs and to the outputs they create, while dotted lines connect workflows (dark blue ellipses) to the data they return. Other data nodes are represented as yellow rectangles. In the top-right part of the graph, a set of workflows drive variable-cell relaxations of the initial structure via Quantum ESPRESSO; the central part contains the self-consistent, non-self-consistent and band-structure Quantum ESPRESSO calculations; in the bottom-left part are located the calculations computing the projection of the wavefunctions on a localised atomic basis set. At the bottom of the graph, we can find the WANNIER90 calculation, producing a set of output nodes that includes the Wannier-interpolated band structure (bottom green node).

The AiiDA workflows that we have written start by calling existing subworkflows available in the AiiDA-quantumespresso^{73} plug-in that, given a crystal structure, perform a variable-cell atomic relaxation to obtain the converged DFT charge density. These workflows also contain useful heuristics and recovery mechanisms to reach convergence in case of common problems (e.g., by changing the diagonalisation algorithm) as well as automatic selection of parameters, including pseudopotentials and cutoffs from the SSSP library^{62}. Once the charge density is computed, the workflow first standardises the cell using the symmetry-detection library spglib^{75} and the seekpath^{63} library that, in addition, provide a standardised band-structure path. Then, it proceeds along two parallel branches: on one side, it computes the DFT band structure along the suggested path. In parallel, it computes the Wannier functions: if first computes wavefunctions on a full uniform grid using a non-self-consistent Quantum ESPRESSO calculation, and then computes the PDOS, the projectabilities, and fits them to obtain the ^{(k)} matrices obtained with the SCDM method. Finally, the workflow drives the execution of WANNIER90 to compute the (maximally-localised) Wannier functions and produce the output quantities of interest (spreads, interpolated band structure on the same path of the DFT code, plots of the Wannier functions, etc.).

In an effort to improve the verification and dissemination of computational results, and in order to make the present work available to all, we are distributing all codes and workflows discussed here within a preconfigured virtual machine (VM)^{61} based on the Quantum Mobile VM^{76} available on the Materials Cloud^{77}. The relevant quantum codes (Quantum ESPRESSO, WANNIER90) and the informatics’ platform AiiDA come pre-installed and configured in the VM, ready to run through the workflows described above. A simple README file guides new users in the installation of the VM and in the execution of the workflow, to compute—with essentially no user intervention—the interpolated band structure of a material of choice.

V.V. acknowledges support from the European Union’s Horizon 2020 research and innovation programme under Grant Agreement No. 676531 (project E-CAM). G.P., A.M., and N.M. acknowledge support by the NCCR MARVEL of the Swiss National Science Foundation and the European Union’s Centre of Excellence MaX “Materials design at the Exascale” (Grant No. 824143). G.P., A.M., and N. M. acknowledge PRACE for awarding us simulation time on Piz Daint at CSCS (project ID 2016153543) and Marconi at CINECA (project ID 2016163963). V.V. and A.A.M. acknowledge support from the Thomas Young Centre under grant TYC-101. J.R.Y. is grateful for computational support from the UK national high performance computing service, ARCHER, for which access was obtained via the UKCP consortium and funded by EPSRC Grant Ref EP/P022561/1. V.V. acknowledges Prof. Mike Payne for support, and Prof. Lin Lin and Dr. Anil Damle for useful discussions. G.P. acknowledges Dr. Francesco Aquilante for useful discussions. A.M. acknowledges Prof. Ivo Souza for useful comments on the manuscript. The authors acknowledge Norma Rivano for testing the virtual machine and Dr. Sebastiaan P. Huber for the implementation of the AiiDA-Quantum ESPRESSO workflow for geometry relaxations.

V.V. implemented and tested the SCDM method on selected materials. G.P. and A.M. developed the automation protocols and the workflows, run the high-throughput simulations, and generated the Virtual Machine. N.M., A.A.M., and J.R.Y supervised the project. All authors analysed the results and contributed to writing the manuscript.

All data generated for this work can be obtained by downloading the publicly available Virtual Machine (VM) on the Materials Cloud (

All codes used for this work are open-source and hence available to any researcher. In particular, the latest stable version of WANNIER90 can be downloaded at

The latest stable version of Quantum ESPRESSO can be found at

Likewise, for the AiiDA code the latest stable version can be found at

The authors declare no competing interests.

_{4}Ba

_{2}Cu

_{2}O

_{10}: an ab initio Wannier function analysis

Supplementary Information