Abstract
We review detection methods that are currently in use or have been proposed to search for a stochastic background of gravitational radiation. We consider both Bayesian and frequentist searches using groundbased and spacebased laser interferometers, spacecraft Doppler tracking, and pulsar timing arrays; and we allow for anisotropy, nonGaussianity, and nonstandard polarization states. Our focus is on relevant data analysis issues, and not on the particular astrophysical or early Universe sources that might give rise to such backgrounds. We provide a unified treatment of these searches at the level of detector response functions, detection sensitivity curves, and, more generally, at the level of the likelihood function, since the choice of signal and noise models and prior probability distributions are actually what define the search. Pedagogical examples are given whenever possible to compare and contrast different approaches. We have tried to make the article as selfcontained and comprehensive as possible, targeting graduate students and new researchers looking to enter this field.
Introduction
The real voyage of discovery consists not in seeking new landscapes, but in having new eyes. Marcel Proust
It is an exciting time for the field of gravitationalwave astronomy. The observation, on September 14th, 2015, of gravitational waves from the inspiral and merger of a pair of black holes (Abbott et al. 2016e) has opened a radically new way of observing the Universe. The event, denoted GW150914, was observed simultaneously by the two detectors of the Laser Interferometer Gravitationalwave Observatory (LIGO) (Aasi et al. 2015). [LIGO consists of two 4 kmlong laser interferometers, one located in Hanford, Washington, the other in Livingston, LA.] The merger event that produced the gravitational waves occured in a distant galaxy roughly 1.3 billion light years from Earth. The initial masses of the two black holes were estimated to be \(36^{+5}_{4}\ \mathrm{M}_\odot \) and \(29^{+4}_{4}\ \mathrm{M}_\odot \), and that of the postmerger black hole as \(62^{+4}_{4}\ \mathrm{M}_\odot \) (Abbott et al. 2016f). The difference between the initial and final masses corresponds to \(3.0^{+0.5}_{0.5}\ \mathrm{M}_\odot c^2\) of energy radiated in gravitational waves, with a peak luminosity of more than ten times the combined luminosity of all the stars in all the galaxies in the visible universe! The fact that this event was observed only in gravitational waves—and not in electromagnetic waves—illustrates the complementarity and potential for new discoveries that comes with the opening of the gravitationalwave window onto the universe.
GW150914 is just the first of many gravitationalwave signals that we expect to observe over the next several years. Indeed, roughly 3 months after the detection of GW150914, a second event, GW151226, was observed by the two LIGO detectors (Abbott et al. 2016d). This event also involved the inspiral and merger of a pair of stellar mass black holes, with initial component masses \(14.2^{+8.3}_{3.7}\ \mathrm{M}_\odot \) and \(7.5^{+2.3}_{2.3}\ \mathrm{M}_\odot \), and a final black hole mass of \(20.8^{+6.1}_{1.7}\ \mathrm{M}_\odot \). The source was at a distance of roughly 1.4 billion lightyears from Earth, comparable to that of GW150914. Advanced LIGO will continue interleaving observation runs and commissioning activities to reach design sensivity around 2020 (Aasi et al. 2015), which will allow detections of signals like GW150914 and GW151226 with more than three times the signaltonoise ratio than was observed for GW150914 (which was 24). In addition, the Advanced Virgo detector (Acernese et al. 2015) (a 3 kmlong laser interferometer in Cascina, Italy) and KAGRA (Aso et al. 2013) (a 3 kmlong cryogenic laser interferometer in Kamioka mine in Japan) should both be taking data by the end of 2016. There are also plans for a third LIGO detector in India (Iyer et al. 2011). A global network of detectors such as this will allow for much improved position reconstruction and parameter estimation of the sources (Abbott et al. 2016i).
Motivation and context
GW150914 and GW151226 were single events—binary black hole mergers that were observed with both templatebased searches for compact binary inspirals and searches for generic gravitationalwave transients in the two LIGO detectors (Abbott et al. 2016e, d). The network matchedfilter signaltonoise ratio (Owen and Sathyaprakash 1999) for these two events, using relativitistic waveform models for binary black holes, was 24 and 13, respectively. The probability that these detections were due to noise alone is \({<} 2\times 10^{7}\), corresponding to a significance greater than \(5\sigma \)—the standard for socalled “goldplated” detections. But for every loud event like GW150914 or GW151226, we expect many more quiet events that are too distant to be individually detected, since the associated signaltonoise ratios are too low.
The total rate of merger events from the population of stellarmass binary black holes of which GW150914 and GW151226 are members can be estimated^{Footnote 1} by multiplying the local rate estimate of 9–240 \(\mathrm{Gpc}^{3}\, \mathrm{year}^{1}\) (Abbott et al. 2016g) by the comoving volume out to some large redshift, e.g., \(z\sim 6\). This yields a total rate of binary black hole mergers between \({\sim }1\) per minute and a few per hour. Since the duration of each merger signal in the sensitive band of a LIGOlike detector is of order a few tenths of a second to \({\sim } 1\) s, the duty cycle (the fraction of time that the signal is “on” in the data) is \({\ll } 1\). This means that the combined signal from such a population of binary black holes will be “popcornlike”, with the majority of the individual signals being too weak to individually detect. Since the arrival times of the merger signals are randomlydistributed, the combined signal from the population of binary black holes is itself random—it is an example of a stochastic background of gravitational radiation.
More generally, a stochastic background of gravitational radiation is any random gravitationalwave signal produced by a large number of weak, independent, and unresolved sources. The background doesn’t have to be popcornlike, like the expected signal from the population of binary black holes which gave rise to GW150914 and GW151226. It can be composed of individual deterministic signals that overlap in time (or in frequency) producing a “confusion” noise analogous to conversations at a cocktail party. Such a confusion noise is produced by the galactic population of compact white dwarf binaries. (For this case, the stochastic signal is so strong that it becomes a foreground, acting as an additional source of noise when trying to detect other weak gravitationalwave signals in the same frequency band). Alternatively, the signal can be intrinsically random, associated with stochastic processes in the early Universe or with unmodeled sources, like supernovae, which produce signals that are not described by deterministic waveforms.
The focus of this review article is on data analysis strategies (i.e., detection methods) that can be used to detect and ultimately characterize a stochastic gravitationalwave background. To introduce this topic and to set the stage for the more detailed discussions to follow in later sections, we ask (and start to answer) the following questions:
Why do we care about detecting a stochastic background?
Detecting a stochastic background of gravitational radiation can provide information about astrophysical source populations and processes in the very early Universe, which are inaccessible by any other means. For example, electromagnetic radiation cannot provide a picture of the Universe any earlier than the time of last of scattering (roughly 400,000 years after the Big Bang). Gravitational waves, on the other hand, can give us information all the way back to the onset of inflation, a mere \({\sim } 10^{32}~\mathrm{s}\) after the Big Bang. (See Maggiore 2000 for a detailed discussion of both cosmological and astrophysical sources of a stochastic gravitationalwave background).
Why is detection challenging?
Stochastic signals are effectively another source of noise in a single detector. So the fundamental problem is how to distinguish between gravitationalwave “noise” and instrumental noise. It turns out that there are several ways to do this, as we will discuss in the later sections of this article.
What detection methods can one use?
Crosscorrelation methods can be used whenever one has multiple detectors that respond to the common gravitationalwave background. For single detector analyses e.g., for the Laser Space Interferometer Antenna (LISA), one needs to take advantage of null combinations of the data (which act as instrument noise monitors) or use instrument noise modeling to try to distinguish the gravitationalwave signal from instrumental noise. Over the past 15 years or so, the number of detection methods for stochastic backgrounds has increased considerably. So now, in addition to the standard crosscorrelation search for a “vanilla” (Gaussianstationary, unpolarized, isotropic) background, one can search for nonGaussian backgrounds, anisotropic backgrounds, circularlypolarized backgrounds, and backgrounds with polarization components predicted by alternative (nongeneralrelativity) theories of gravity. These searches are discussed in Sects. 7 and 8.
Table 1 summarizes the basic properties of various analysis methods that have been used (or proposed) for stochastic background searches. Despite apparent differences, all analyses use a likelihood function, e.g., for defining frequentist statistics or for calculating posterior distributions for Bayesian inference (as will be described in more detail in Sect. 3), and take advantage of crosscorrelations if multiple detectors are available (as will be described in more detail in Sect. 4).
What are the prospects for detection?
The prospects for detection depend on the source of the background (i.e., astrophysical or cosmological) and the type of detector being used. For example, a spacebased interferometer like LISA is guaranteed to detect the gravitationalwave confusion noise produced by the galactic population of compact white dwarf binaries. Pulsar timing arrays, on the other hand, should be able to detect the confusion noise from supermassive black hole binaries (SMBHBs) at the centers of merging galaxies, provided the binaries are not affected by their environments in a way that severely diminishes the strength of the background (Shannon et al. 2015). Detection sensitivity curves are a very convenient way of comparing theoretical predictions of source strengths to the sensivity levels of the various detectors (as we will discuss in Sect. 10).
Searches across the gravitationalwave spectrum
The frequency band of groundbased laser interferometers like LIGO, Virgo, and KAGRA is between \({\sim }10~\mathrm{Hz}\) and a few kHz (gravity gradient and seismic noise are the limiting^{Footnote 2} noise sources below 10 Hz, and photon shot noise above a couple of kHz). Outside this band there are several other experiments—both currently operating and planned—that should also be able to detect gravitational waves. An illustration of the gravitationalwave spectrum, together with potential sources and relevant detectors, is shown in Fig. 1. We highlight a few of these experiments below.
Cosmic microwave background experiments
At the extreme lowfrequency end of the spectrum, corresponding to gravitationalwave periods of order the age of the Universe, the Planck satellite (ESA 2016c) and other cosmic microwave background (CMB) experiments, e.g., BICEP and Keck (BICEP/Keck 2016) are looking for evidence of relic gravitational waves from the Big Bang in the Bmode component of CMB polarization maps (Kamionkowski et al. 1997; Hu and White 1997; Ade et al. 2015a). In 2014, BICEP2 announced the detection of relic gravitational waves (Ade et al. 2014), but it was later shown that the observed Bmode signal was due to contamination by intervening dust in the galaxy (Flauger et al. 2014; Mortonson and Seljak 2014). So at present, these experiments have been able to only constrain (i.e., set upper limits on) the amount of gravitational waves in the very early Universe (Ade et al. 2015a). But these constraints severely limit the possibility of detecting the relic gravitationalwave background with any of the higherfrequency detection methods, unless its spectrum increases with frequency. [Note that standard models of inflation predict a relic background whose energy density is almost constant in frequency, leading to a strain spectral density that decreases with frequency.] Needless to say, the detection of a primordial gravitationalwave background is a “holy grail” of gravitationalwave astronomy.
Pulsar timing arrays
At frequencies between \({\sim }10^{9}~\mathrm{Hz}\) and \(10^{7}~\mathrm{Hz}\), corresponding to gravitationalwave periods of order decades to years, pulsar timing arrays (PTAs) can be used to search for gravitational waves. This is done by carefully monitoring the arrival times of radio pulses from an array of galactic millisecond pulsars, looking for correlated modulations in the arrival times induced by a passing gravitational wave (Detweiler 1979; Hellings and Downs 1983). The mostlikely gravitationalwave source for PTAs is a gravitationalwave background formed from the incoherent superposition of signals produced by the inspirals and mergers of SMBHBs in the centers of distant galaxies (Jaffe and Backer 2003). These searches continue to improve their sensitivity by upgrading instrument backends and discovering more millisecond pulsars that can be added to the array. These improvements have led to more constraining upper limits on the amplitude of the gravitationalwave background (Shannon et al. 2015; Arzoumanian et al. 2016), with a detection being likely before the end of this decade (Siemens et al. 2013; Taylor et al. 2016b).
Spacebased interferometers
At frequencies between \({\sim }10^{4}~\mathrm{Hz}\) and \(10^{1}~\mathrm{Hz}\), corresponding to gravitationalwave periods of order hours to minutes, proposed spacebased interferometers like LISA can search for gravitational waves from a wide variety of sources (Gair et al. 2013). These include: (i) inspirals and mergers of SMBHBs with masses of order \(10^6~\mathrm{M}_\odot \), (ii) captures of compact stellarmass objects around supermassive black holes, and (iii) the stochastic confusion noise produced by compact whitedwarf binaries in our galaxy. In fact, hundreds of binary black holes that are individually resolvable by LISA will coalesce in the aLIGO band within a 10 year period, opening up the possibility of doing multiband gravitationalwave astronomy (Sesana 2016).
The basic spacebased interferometer configuration consists of three satellites (each housing two lasers, two telescopes, and two test masses) that fly in an equilateraltriangle formation, with arm lengths of order several million km. A variant of the original LISA design was selected in February 2017 by the European Space Agency (ESA) as the 3rd large mission in its Cosmic Vision Program (ESA 2016a). The earliest launch date for LISA would be around 2030. A technologydemonstration mission, called LISA Pathfinder (ESA 2016b), was launched in December 2015, meeting or exceeding all of the requirements for an important subset of the LISA technologies (Armano et al. 2016).
Other detectors
Finally, in the frequency band between \({\sim }0.1~\mathrm {Hz}\) and \(10~\mathrm {Hz}\), there are proposals for both Earthbased detectors (Harms et al. 2013) and also secondgeneration spacebased interferometers—the BigBang Observer (BBO) (Phinney et al. 2004) and the DECIhertz interferometer Gravitationalwave Observatory (DECIGO) (Ando et al. 2010). Such detectors would be sensitive to gravitational waves with periods between \({\sim }10~\mathrm {s}\) and \(0.1~\mathrm {s}\). The primary sources in this band are intermediatemass (\(10^3\)–\(10^4~M_\odot \)) binary black holes, galactic and extragalactic neutron star binaries, and a cosmologicallygenerated stochastic background.
Goal of this article
Starting with the pioneering work of Grishchuk (1976), Detweiler (1979), Hellings and Downs (1983), and Michelson (1987), detection methods for gravitationalwave backgrounds have increased in scope and sophistication over the years, with several new developments occuring rather recently. As mentioned above, we have search methods now that target different properties of the background (e.g., isotropic or anisotropic, Gaussian or nonGaussian, polarized or unpolarized, etc.). These searches are necessarily implemented differently for different detectors, since, for example, groundbased detectors like LIGO and Virgo operate in the smallantenna (or longwavelength) limit, while pulsar timing arrays operate in the shortwavelength limit. Moreover, each of these searches can be formulated in terms of either Bayesian or frequentist statistics. The goal of this review article is to discuss these different detection methods from a perspective that attempts to unify the different treatments, emphasizing the similarities that exist when viewed from this broader perspective.
Unification
The extensive literature describing stochastic background analyses leaves the reader with the impression that highly specialized techniques are needed for groundbased, spacebased, and pulsar timing observations. Moreover, reviews of gravitationalwave data analysis leave the impression that the analysis of stochastic signals is somehow fundamentally different from that of any other signal type. Both of these impressions are misleading. The apparent differences are due to differences in terminology and perspective. By adopting a common analysis framework and notation, we are able to present a unified treatment of gravitationalwave data analysis across source classes and observation techniques.
We will provide a unified treatment of the various methods at the level of detector response functions, detection sensitivity curves, and, more generally, at the level of the likelihood function, since the choice of signal and noise models and prior probability distributions are actually what define the search. The same photon timeofflight calculation underpins the detector response functions, and the choice of prior for the gravitationalwave template defines the search. A matchedfilter search for binary mergers and a crosscorrelation search for stochastic signals are both derived from the same likelihood function, the difference being that the former uses a parameterized, deterministic template, while the latter uses a stochastic template. Hopefully, by the end of this article, the reader will see that the plethora of searches for different types of backgrounds, using different types of detectors, and using different statistical inference frameworks are not all that different after all.
Outline
The rest of the article is organized as follows: We begin in Sect. 2 by specifying the quantities that one uses to characterize a stochastic gravitationalwave background. In Sect. 3, we give an overview of statistical inference by comparing and contrasting how the Bayesian and frequentist formalisms address issues related to hypothesis testing, model selection, setting upper limits, parameter estimation, etc. We then illustrate these concepts in the context of a very simple toy problem. In Sect. 4, we introduce the key concept of correlation, which forms the basis for the majority of detection methods used for gravitationalwave backgrounds, and show how these techniques arise naturally from the standard templatebased approach. We derive the frequentist crosscorrelation statistic for a simple example. We also describe how a null channel is useful when correlation methods are not possible.
In Sect. 5, we go into more detail regarding the different types of detectors. In particular, we calculate singledetector response functions and the associated antenna patterns for groundbased and spacebased laser interferometers, spacecraft Doppler tracking, and pulsar timing measurements. (We do not discuss resonant bar detectors or CMBbased detection methods in this review article. However, current bounds from CMB observations will be reviewed in Sect. 10). By correlating the outputs of two such detectors, we obtain expressions for the correlation coefficient (or overlap reduction function) for a Gaussianstationary, unpolarized, isotropic background as a function of the separation and orientation of the two detectors. In Sect. 6, we discuss optimal filtering. Section 7 extends the analysis of the previous sections to anisotropic backgrounds. Here we describe several different analyses that produce maps of the gravitationalwave sky: (i) a frequentist gravitationalwave radiometer search, which is optimal for point sources, (ii) searches that decompose the gravitationalwave power on the sky in terms of spherical harmonics, and (iii) a phasecoherent search that can map both the amplitude and phase of a gravitationalwave background at each location on the sky. In Sect. 8, we discuss searches for: (i) nonGaussian backgrounds, (ii) circularlypolarized backgrounds, and (iii) backgrounds having nonstandard (i.e., nongeneralrelativity) polarization modes. We also briefly describe extensions of the crosscorrelation search method to look for nonstochasticbackgroundtype signals—in particular, longduration unmodelled transients and continuous (nearlymonochromatic) gravitationalwave signals from sources like Sco X1.
In Sect. 9, we discuss realworld complications introduced by irregular sampling, nonstationary and nonGaussian detector noise, and correlated environmental noise (e.g., Schumann resonances). We also describe what one can do if one has only a single detector, as is the case for LISA. Finally, we conclude in Sect. 10 by discussing prospects for detection, including detection sensitivity curves and current observational results.
We also include several appendices: In Appendix A we discuss different polarization basis tensors, and a Stokes’ parameter characterization of gravitationalwaves. In Appendices B and C, we summarize some standard statistical results for a Gaussian random variable, and then discuss how to define and test for nonstationarity and nonGaussianity. In Appendix D we describe the relationship between continuous functions of time and frequency and their discretelysampled counterparts. Appendices E, F, G are adapted from Gair et al. (2015), with details regarding spinweighted scalar, vector, and tensor spherical harmonics. Finally, Appendix H gives a “Rosetta stone” for translating back and forth between different response function conventions for gravitationalwave backgrounds.
Characterizing a stochastic gravitationalwave background
When you can measure what you are speaking about, and express it in numbers, you know something about it, when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind; it may be the beginning of knowledge, but you have scarely, in your thoughts, advanced to the stage of science. William Thomson, Baron Kelvin of Largs
In this section, we define several key quantities (e.g., fractional energy density spectrum, characteristic strain, distribution of gravitationalwave power on the sky), which are used to characterize a stochastic background of gravitational radiation. The definitions are appropriate for both isotropic and anisotropic backgrounds. Our approach is similar to that found in Allen and Romano (1999) for isotropic backgrounds and for the standard polarization basis. For the planewave decomposition in terms of tensor spherical harmonics, we follow Gair et al. (2014, 2015). Detailed derivations can be found in those papers.
When is a gravitationalwave signal stochastic?
The standard “textbook” definition of a stochastic background of gravitational radiation is a random gravitationalwave signal produced by a large number of weak, independent, and unresolved sources. To say that it is random means that it can be characterized only statistically, in terms of expectation values of the field variables or, equivalently, in terms of the Fourier components of a planewave expansion of the metric perturbations (Sect. 2.3.1). If the number of independent sources is sufficiently large, the background will be Gaussian by the central limit theorem. Knowledge of the first two moments of the distribution will then suffice to determine all higherorder moments (Appendix B). For nonGaussian backgrounds, third and/or higherorder moments will also be needed.
Although there is general agreement with the above definition, there has been some confusion and disagreement in the literature (Rosado 2011; Regimbau and Mandic 2008; Regimbau and Hughes 2009; Regimbau 2011) regarding some of the defining properties of a stochastic background. This is because terms like weak and unresolved depend on details of the observation (e.g., the sensitivity of the detector, the total observation time, etc.), which are not intrinsic properties of the background. So the answer to the question “When is a gravitationalwave signal stochastic?” is not as simple or obvious as it might initially seem.
In Cornish and Romano (2015), we addressed this question in the context of searches for gravitationalwave backgrounds produced by a population of astrophysical sources. We found that it is best to give operational definitions for these properties, framed in the context of Bayesian inference. We will discuss Bayesian inference in more detail in Sect. 3, but for now the most important thing to know is that by using Bayesian inference we can calculate the probabilities of different signalplusnoise models, given the observed data. The signalplusnoise model with the largest probability is the preferred model, i.e., the one that is most consistent with the data. This is the essence of Bayesian model selection.
So we define a signal to be stochastic if a Bayesian model selection calculation prefers a stochastic signal model over any deterministic signal model. We also define a signal to be resolvable if it can be decomposed into separate (e.g., nonoverlapping in either time or frequency) and individually detectable signals, again in a Bayesian model selection sense.^{Footnote 3} If the background is associated with the superposition of signals from many astrophysical sources—as we expect for the population of binary black holes which gave rise to GW150914 and GW151226—then we should subtract out any bright deterministic signals that standout above the loweramplitude background, leaving behind a residual nondeterministic signal whose statistical properties we would like to determine. In the context of Bayesian inference, this ‘subtraction’ is done by allowing hybrid signal models, which consist of both parametrized deterministic signals and nondeterministic backgrounds. By using such hybrid models we can investigate the statistical properties of the residual background without the influence of the resolvable signals.
We will return to these ideas in Sect. 8.1, when we discuss searches for nonGaussian backgrounds in more detail.
Planewave expansions
Gravitational waves are timevarying perturbations to the spacetime metric, which propagate at the speed of light. In transversetraceless coordinates, the metric perturbations \(h_{ab}(t,\vec {x})\) corresponding to a gravitationalwave background can be written as a superposition of sinusoidal plane waves having frequency f, and coming from different directions \(\hat{n}\) on the sky:^{Footnote 4}
For a stochastic background, the metric perturbations \(h_{ab}(t,{\vec {x}})\) and hence the Fourier coefficients \(h_{ab}(f,\hat{n})\) are random variables, whose probability distributions define the statistical properties of the background.
Polarization basis
Typically, one expands the Fourier coefficients \(h_{ab}(f,\hat{n})\) in terms of the standard \(+\) and \(\times \) polarization tensors:
where
and \(\hat{l}\), \(\hat{m}\) are the standard angular unit vectors tangent to the sphere:
(See Fig. 2). Searches for stochastic backgrounds having alternative polarization modes, as predicted by modified (metric) theories of gravity, will be discussed in Sect. 8.3.
Tensor spherical harmonic basis
It is also possible to expand the Fourier coefficients \(h_{ab}(f,\hat{n})\) in terms of the gradient and curl tensor spherical harmonics (Gair et al. 2014):
where
In the above expressions, a semicolon denotes covariant differentiation, \(g_{ab}\) is the metric tensor on the sphere, and \(\epsilon _{ab}\) is the LeviCivita antisymmetric tensor. In standard spherical coordinates \((\theta ,\phi )\),
The normalization constant
was chosen so that \(\{Y^G_{(lm)ab}(\hat{n}), Y^C_{(lm)ab}(\hat{n})\}\) is a set of orthonormal functions (with respect to the multipole indices l and m) on the 2sphere. Appendix G contains additional details regarding gradient and curl spherical harmonics.
Note that we have adopted the notational convention used in the CMB literature, e.g., Kamionkowski et al. (1997), by putting parentheses around the lm indices to distinguish them from the spatial tensor indices a, b, etc. In addition, summations over l and m start at \(l=2\), and not \(l=0\) as would be the case for the expansion of a scalar field on the 2sphere in terms of ordinary (i.e., undifferentiated) spherical harmonics. In what follows, we will use \(\sum _{(lm)}\) as shorthand notation for \(\sum _{l=2}^\infty \sum _{m=l}^l\) unless indicated otherwise.
Relating the two expansions
The gradient and curl spherical harmonics have been used extensively in the CMB community for decomposing CMBpolarization maps in terms of Emodes and Bmodes (corresponding to the gradient and curl spherical harmonics). The most relevant property of the gradient and curl spherical harmonics is that they transform like combinations of spinweight \(\pm 2\) fields with respect to rotations of an orthonormal basis at points on the 2sphere. Explicitly,
where \({}_{\pm 2}Y_{lm}(\hat{n})\) are the spinweight \(\pm 2\) spherical harmonics (Appendix E). Using this relationship between the tensor spherical harmonic and \((+,\times )\) polarization bases, one can show (Gair et al. 2014):
or, equivalently,
These two expressions allow us to go back and forth between the expansion coefficients for the two different bases.
Statistical properties
The statistical properties of a stochastic gravitationalwave background are specified in terms of the probability distribution or moments (Appendix B) of the metric perturbations:
or similar expressions in terms of the Fourier coefficients \(h_A(f,\hat{n})\), where \(A\equiv \{+,\times \}\) labels the standard polarization modes of general relativity, or \(a^P_{(lm)}(f)\), where \(P\equiv \{G,C\}\) and (lm) label the multipole components for the gradient and curl tensor spherical harmonic decomposition. Without loss of generality we can assume that the background has zero mean:
We will also assume that the background is stationary (Appendix C). This means that all statistical quantities constructed from the metric perturbations at times t, \(t'\), etc., depend only on the difference between times, e.g., \(tt'\), and not on the choice of time origin. We expect this to be true given that the age of the universe is roughly 9 orders of magnitude larger than realistic observation times, \({\sim }10~\mathrm {year}\). It is thus unlikely that a stochastic gravitationalwave background has statistical properties that vary over the time scale of the observation.
For Gaussian backgrounds we need only consider quadratic expectation values, since all higherorder moments are either zero or can be written in terms of the quadratic moments (Appendix B). For nonGaussian backgrounds (Sect. 8.1), third and/or higher order moments will also be needed.
Beyond our assumption of stationarity, the specific form of the expectation values will depend, in general, on the source of the background. For example, a cosmological background produced by the superposition of a large number of independent gravitationalwave signals from the early Universe is expected to be Gaussian (via the central limit theorem), as well as isotropicallydistributed on the sky. Contrast this with the superposition of gravitational waves produced by unresolved Galactic whitedwarf binaries radiating in the LISA band (\(10^{4}~\mathrm{Hz}\) to \(10^{1}~\mathrm{Hz}\)). Although this confusionlimited astrophysical foreground is also expected to be Gaussian and stationary, it will have an anisotropic distribution, following the spatial distribution of the Milky Way. The anistropy will be encoded as a modulation in the LISA output, due to the changing antenna pattern of the LISA constellation in its yearly orbit around the Sun. Hence, different sources will give rise to different statistical distributions, which we will need to consider when formulating our data analysis strategies.
Quadratic expectation values for Gaussianstationary backgrounds
The simplest type of stochastic background will be Gaussianstationary, unpolarized, and spatially homogenous and isotropic. The quadratic expectation values for such a background are then
or, equivalently,
The numerical factors out front have been included so that \(S_h(f)\) has the interpretation of being the onesided gravitationalwave strain power spectral density function (units of \(\mathrm{strain}^2/\mathrm{Hz}\)), summed over both polarizations and integrated over the sky. The factor of \(\delta (ff')\) arises due to our assumption of stationarity; the factor of \(\delta _{AA'}\) (or \(\delta ^{PP'}\)) is due to our assumption that the polarization modes are statistically independent of one another and have no preferred component; and the factor of \(\delta ^2(\hat{n},\hat{n}')\) (or \(\delta _{ll'}\delta _{mm'}\)) is due to our assumption of spatial homogeneity and isotropy.
Anisotropic, unpolarized, Gaussianstationary backgrounds, whose radiation from different directions on the sky are uncorrelated with one another, are also simply represented in terms of the quadratic expectation values:
The function \(\mathcal{P}(f,\hat{n})\) describes the spatial distribution of gravitationalwave power on the sky at frequency f. It is related to \(S_h(f)\) via
The corresponding expectation values in terms of the tensor spherical harmonic expansion coefficients \(a^P_{(lm)}(f)\) are more complicated, since an individual mode in this basis corresponds to a gravitationalwave background whose radiation is correlated between different angular directions on the sky. (See Gair et al. (2014) for a discussion of backgrounds that have such correlations). We will discuss searches for anisotropic backgrounds in more detail in Sect. 7.
More general Gaussianstationary backgrounds (e.g., polarized, statistically isotropic but with correlated radiation, etc.) can be represented by appropriately changing the righthandside of the quadratic expectation values. However, for the remainder of this section and for most of the article, we will consider “vanilla” isotropic backgrounds, whose quadratic expectation values (2.14) or (2.15) are completely specified by the power spectral density \(S_h(f)\).
Fractional energy density spectrum
The gravitationalwave strain power spectral density \(S_h(f)\) is simply related to the fractional energy density spectrum in gravitational waves \(\Omega _\mathrm{gw}(f)\), see e.g., Allen and Romano (1999):
where
Here \(d\rho _\mathrm{gw}\) is the energy density in gravitational waves contained in the frequency interval f to \(f+df\), and \(\rho _c\equiv 3c^2 H_0^2/8\pi G\) is the critical energy density need to close the universe. The total energy density in gravitational waves normalized by the critical energy density is thus
where \(f_\mathrm {max}\) is some maximum cutoff frequency (e.g., associated with the Planck scale), beyond which our current understanding of gravity breaks down. \(\Omega _\mathrm {gw}\) can be compared, for example, to the total fractional energy density \(\Omega _\mathrm {b}\), \(\Omega _{\Lambda }\), in baryons, dark energy, etc. Since \(\rho _\mathrm {c}\) involves the Hubble constant, one sometimes writes \(H_0=h_0\,100\mathrm {\ km\ s^{1}\ Mpc^{1}}\), and then absorbs a factor of \(h_0^2\) in \(\Omega _\mathrm {gw}(f)\). The quantity \(h_0^2\,\Omega _\mathrm {gw}(f)\) is then independent of the value of the Hubble constant. However, since recent measurements by Planck (Ade et al. 2015b; ESA 2016c) have shown that \(h_0=0.68\) to a high degree of precision, we have assumed this value in this review article and quote limits directly on \(\Omega _\mathrm {gw}(f)\) (Sect. 10). The specific functional form for \(\Omega _\mathrm {gw}(f)\) depends on the source of the background, as we shall see explicitly below.
Characteristic strain
Although the fractional energy density spectrum \(\Omega _\mathrm{gw}(f)\) completely characterizes the statistical properties of a Gaussianstationary isotropic background, it is often convenient to work with the (dimensionless) characteristic strain amplitude \(h_c(f)\) defined by
It is related to \(\Omega _\mathrm{gw}(f)\) via:
Several theoretical models of gravitationalwave backgrounds predict characteristic strains that have a powerlaw form
where \(\alpha \) is spectral index and \(f_\mathrm{ref}\) is typically set to \(1/\mathrm{year}\). (There is no sum over \(\alpha \) in the above expression, and no sum over \(\beta \) in the following expression). Using Eqs. (2.22) and (2.23) it follows that
where
For inflationary backgrounds relevant for cosmology, it is often assumed that \(\Omega _\mathrm{gw}(f)=\mathrm{const}\), for which \(\beta =0\) and \(\alpha =1\). For a background arising from binary coalescence, \(\Omega _\mathrm{gw}(f) \propto f^{2/3}\), for which \(\beta = 2/3\) and \(\alpha =2/3\). This powerlaw dependence is applicable to supermassive blackhole binary (SMBHB) coalescences targeted by pulsar timing observations as well as to compact binary coalescences relevant for groundbased and spacebased detectors.
Statistical inference
If your experiment needs statistics, you ought to have done a better experiment. Ernest Rutherford
In this section, we review statistical inference from both the Bayesian and frequentist perspectives. Our discussion of frequentist and Bayesian upper limits, and the example given in Sect. 3.5 comparing Bayesian and frequentist analyses is modelled in part after Röver et al. (2011). Readers interested in more details about Bayesian statistical inference should see, e.g., Howson and Urbach (1991), Howson and Urbach (2006), Jaynes (2003), Gregory (2005) and Sivia and Skilling (2006). For a description of frequentist statistics, we recommend Helstrom (1968), Wainstein and Zubakov (1971) and Feldman and Cousins (1998).
Introduction to Bayesian and frequentist inference
Statistical inference can be used to answer questions such as “Is a gravitationalwave signal present in the data?” and, if so, “What are the physical characteristics of the source?” These questions are addressed using the techniques of classical (also known as frequentist) inference and Bayesian inference. Many of the early theoretical studies and observational papers in gravitationalwave astronomy followed the frequentist approach, but the use of Bayesian inference is growing in popularity. Moreover, many contemporary analyses cannot be classified as purely frequentist or Bayesian.
The textbook definition states that the difference between the two approaches comes down to their different interpretations of probability: for frequentists, probabilities are fundamentally related to frequencies of events, while for Bayesians, probabilities are fundamentally related to our own knowledge about an event. For example, when inferring the mass of a star, the frequentist interpretation is that the star has a true, fixed (albeit unknown) mass, so it is meaningless to talk about a probability distribution for it. Rather, the uncertainty is in the data, and the relevant probability is that of observing the data d, given that the star has mass m. This probability distribution is the likelihood, denoted \(p(d\vert m)\). In contrast, in the Bayesian interpretation the data are known (after all, it is what is measured!), and the mass of the star is what we are uncertain about,^{Footnote 5} so the relevant probability is that the mass has a certain value, given the data. This probability distribution is the posterior, \(p(m \vert d)\). The likelihood and posterior are related via Bayes’ theorem:
where p(m) is the prior probability distribution for m, and the normalization constant,
is the marginalized likelihood, or evidence. For uniform (flat) priors the frequentist confidence intervals for the parameters will coincide with the Bayesian credible intervals, but the interpretation remains quiet distinct.
The choice of prior probability distributions is a source of much consternation and debate, and is often cited as a weakness of the Bayesian approach. But the choice of probability distribution for the likelihood (which is also important for the frequentist approach) is often no less fraught. The prior quantifies what we know about the range and distribution of the parameters in our model, while the likelihood quantifies what we know about our measurement apparatus, and, in particular, the nature of the measurement noise. The choice of prior is especially problematic in a new field where there is little to guide the choice. For example, electromagnetic observations and population synthesis models give some guidance about black hole masses, but the mass range and distribution is currently not well constrained. The choice of likelihood can also be challenging when the measurement noise deviates from the stationary, Gaussian ideal. More details related to the choice of likelihood and choice of prior will be given in Sect. 3.6.
In addition to parameter estimation, statistical inference is used to select between competing models, or hypotheses, such as, “is there a gravitationalwave signal in the data or not?” Thanks to GW150914 and GW151226, we know that gravitationalwave signals are already present in existing data sets, but most are at levels where we are unable to distinguish them from noise processes. For detection we demand that a model for the data that includes a gravitationalwave signal be favored over a model having no gravitationalwave signal. In Bayesian inference a detection might be announced when the odds ratio between models with and without gravitationalwave signals gets sufficiently large, while in frequentist inference a detection might be announced when the pvalue for some test statistic is less than some prescribed threshold. These different approaches to deciding whether or not to claim a detection (e.g., Bayesian model selection or frequentist hypothesis testing), as well as differences in regard to parameter estimation, are described in the following subsections. Table 2 provides an overview of the key similarities and differences between frequentist and Bayesian inference, to be described in detail below.
Frequentist statistics
As mentioned above, classical or frequentist statistics is a branch of statistical inference that interprets probability as the “longrun relative occurrence of an event in a set of identical experiments.” Thus, for a frequentist, probabilities can only be assigned to propositions about outcomes of (in principle) repeated experiments (i.e., random variables) and not to hypotheses or parameters describing the state of nature, which have fixed but unknown values. In this interpretation, the measured data are drawn from an underlying probability distribution, which assumes the truth of a particular hypothesis or model. The probability distribution for the data is just the likelihood function, which we can write as p(dH), where d denotes the data and H denotes an hypothesis.
Statistics play an important role in the frequentist framework. These are random variables constructed from the data, which typically estimate a signal parameter or indicate how well the data fit a particular hypothesis. Although it is common to construct statistics from the likelihood function (e.g., the maximumlikelihood statistic for a particular parameter, or the maximumlikelihood ratio to compare a signalplusnoise model to a noiseonly model), there is no a priori restriction on the form of a statistic other than it be some function of the data. Ultimately, it is the goal of the analysis and the cleverness of the data analyst that dictate which statistic (or statistics) to use.
To make statistical inferences in the frequentist framework requires knowledge of the probability distribution (also called the sampling distribution) of the statistic. The sampling distribution can either be calculated analytically (if the statistic is sufficiently simple) or via Monte Carlo simulations, which effectively construct a histogram of the values of the statistic by simulating many independent realizations of the data. Given a statistic and its sampling distribution, one can then calculate either confidence intervals for parameter estimation or pvalues for hypothesis testing. (These will be discussed in more detail below). Note that a potential problem with frequentist statistical inference is that the sampling distribution depends on data values that were not actually observed, which is related to how the experiment was carried out or might have been carried out. The socalled stopping problem of frequentist statistics is an example of such a problem (Howson and Urbach 2006).
Frequentist hypothesis testing
Suppose, as a frequentist, you want to test the hypothesis \(H_1\) that a gravitationalwave signal, having some fixed but unknown amplitude \(a>0\), is present in the data. Since you cannot assign probabilities to hypotheses or to parameters like a as a frequentist, you need to introduce instead an alternative (or null) hypothesis \(H_0\), which, for this example, is the hypothesis that there is no gravitationalwave signal in the data (i.e., that \(a=0\)). You then argue for \(H_1\) by arguing against \(H_0\), similar to proof by contradiction in mathematics. Note that \(H_1\) is a composite hypothesis since it depends on a range of values of the unknown parameter a. It can be written as the union, \(H_1=\cup _{a>0} H_a\), of a set of simple hypotheses \(H_a\) each corresponding to a single fixed value of the parameter a.
To rule either in favor or against \(H_0\), you construct a statistic \(\Lambda \), called a test or detection statistic, on which the statistical test will be based. As mentioned above, you will need to calculate analytically or via Monte Carlo simulations the sampling distribution for \(\Lambda \) under the assumption that the null hypothesis is true, \(p(\Lambda H_0)\). If the observed value of \(\Lambda \) lies far out in the tails of the distribution, then the data are most likely not consistent with the assumption of the null hypothesis, so you reject \(H_0\) (and thus accept \(H_1\)) at the \(p*100\)% level, where
This is the socalled pvalue (or significance) of the test; it is illustrated graphically in Fig. 3. The pvalue required to reject the null hypothesis determines a threshold \(\Lambda _*\), above which you reject \(H_0\) and accept \(H_1\) (e.g., claim a detection). It is related to the false alarm probability for the test as we explain below.
The above statistical test is subject to two types of errors: (i) type I or false alarm errors, which arise if the data are such that you reject the null hypothesis (i.e., \(\Lambda _\mathrm{obs}>\Lambda _*\)) when it is actually true, and (ii) type II or false dismissal errors, which arise if the data are such that you accept the null hypothesis (i.e., \(\Lambda _\mathrm{obs}<\Lambda _*\)) when it is actually false. The false alarm probability \(\alpha \) and false dismissal probability \(\beta (a)\) are given explicitly by
where a is the amplitude of the gravitationalwave signal, assumed to be present under the assumption that \(H_1\) is true. To calculate the false dismissal probability \(\beta (a)\), one needs the sampling distribution of the test statistic assuming the presence of a signal with amplitude a.
Different test statistics are judged according to their false alarm and false dismissal probabilities. Ideally, you would like your statistical test to have false alarm and false dismissal probabilities that are both as small as possible. But these two properties compete with one another as setting a larger threshold value to minimize the false alarm probability will increase the false dismissal probability. Conversely, setting a smaller threshold value to minimize the false dismissal probability will increase the false alarm probability.
In the context of gravitationalwave data analysis, the gravitationalwave community is (at least initially) reluctant to falsely claim detections. Hence the false alarm probability is set to some very low value. The best statistic then is the one that minimizes the false dismissal probability (i.e., maximizes detection probability) for fixed false alarm. This is the Neyman–Pearson criterion. For medical diagnosis, on the other hand, a doctor is very reluctant to falsely dismiss an illness. Hence the false dismissal probability will be set to some very low value. The best statistic then is the one which minimizes the false alarm probability for fixed false dismissal.
Frequentist detection probability
The value \(1\beta (a)\) is called the detection probability or power of the test. It is the fraction of times that the test statistic \(\Lambda \) correctly identifies the presence of a signal of amplitude a in the data, for a fixed false alarm probability \(\alpha \) (which sets the threshold \(\Lambda _*\)). A plot of detection probability versus signal strength is often used to show how strong a signal has to be in order to detect it with a certain probability. Since detection probability does not depend on the observed data—it depends only on the sampling distribution of the test statistic and a choice for the false alarm probability—detection probability curves are often used as a figureofmerit for proposed search methods for a signal. Figure 4 shows a detection probability curve, with the value of a needed to be detectable with 90% frequentist probability indicated by the dashed vertical line. We will denote this value of a by \(a^{90\%,\mathrm{DP}}\). Note that as the signal amplitude goes to zero, the detection probability reduces to the false alarm probability \(\alpha \), which for this example was chosen to be 0.10.
Frequentist upper limits
In the absence of a detection (i.e., if the observed value of the test statistic is less than the detection threshold \(\Lambda _*\)), one can still set a bound (called an upper limit) on the strength of the signal that one was trying to detect. The upper limit depends on the observed value of the test statistic, \(\Lambda _\mathrm{obs}\), and a choice of confidence level, \(\mathrm {CL}\), interpreted in the frequentist framework as the longrun relative occurrence for a set of repeated identical experiments. For example, one defines the 90% confidencelevel upper limit \(a^{90\%,\mathrm{UL}}\) as the minimum value of a for which \(\Lambda \ge \Lambda _\mathrm{obs}\) at least 90% of the time:
In other words, if the signal has an amplitude \(a^{90\%,\mathrm{UL}}\) or higher, we would have detected it in at least 90% of repeated observations. A graphical representation of a frequentist upper limit is given in Fig. 5.
Frequentist parameter estimation
The frequentist prescription for estimating the value of a particular parameter a, like the amplitude of a gravitationalwave signal, is slightly different than the method used to claim a detection. You need to first construct a statistic (called an estimator) \(\hat{a}\) of the parameter a you are interested in. (This might be a maximumlikelihood estimator of a, but other estimators can also be used). You then calculate its sampling distribution \(p(\hat{a}a, H_a)\). Note that statements like
which one constructs from \(p(\hat{a}a,H_a)\) make sense in the frequentist framework, since \(\hat{a}\) is a random variable. Although the above inequality can be rearranged to yield
this should not be interpreted as a statement about the probability of a lying within a particular interval \([\hat{a}\Delta ,\hat{a}+\Delta ]\), since a is not a random variable. Rather, it should be interpreted as a probabilistic statement about the set of intervals \(\{[\hat{a}\Delta ,\hat{a}+\Delta ]\}\) for all possible values of \(\hat{a}\). Namely, in a set of many repeated experiments, 0.95 is the fraction of the intervals that will contain the true value of the parameter a. Such an interval is called a \(95\%\) frequentist confidence interval. This is illustrated graphically in Fig. 6.
It is important to point out that an estimator can sometimes take on a value of the parameter that is not physically allowed. For example, if the parameter a denotes the amplitude of a gravitationalwave signal (so physically \(a\ge 0\)), it is possible for \(\hat{a} <0\) for a particular realization of the data. Note that there is nothing mathematically wrong with this result. Indeed, the sampling distribution for \(\hat{a}\) specifies the probability of obtaining such values of \(\hat{a}\). It is even possible to have a confidence interval \([\hat{a}\Delta , \hat{a}+\Delta ]\) all of whose values are unphysical, especially if one is trying to detect a weak signal in noise. Again, this is mathematically allowed, but it is a little awkward to report a frequentist confidence interval that is completely unphysical. We shall see that within the Bayesian framework unphysical intervals and unphysical posteriors never arise, as a simple consequence of including a prior distribution on the parameter that requires \(a > 0\).
Unified approach for frequentist upper limits and confidence intervals
Frequentists also have a way of avoiding unphysical or empty confidence intervals, which at the same time unifies the treatment of upper limits for null results and twosided intervals for nonnull results. This procedure, developed by Feldman and Cousins (1998), also solves the problem that the choice of an upper limit or twosided confidence interval leads to intervals that do not have the proper coverage (i.e., the probability that an interval contains the true value of a parameter does not match the stated confidence level) if the choice of reporting an upper limit or twosided confidence interval is based on the data and not decided upon before performing the experiment.
The basic idea underlying this unified approach to frequentist intervals is a new specification (or ordering) of the values of the random variable to include in the acceptance intervals for an unknown parameter. If we let a denote the parameter whose value we are trying to determine, and \(\hat{a}\) be an estimator of a with sampling distribution \(p(\hat{a}a,H_a)\), then the choice of acceptance intervals becomes, for each value of a, how do we choose \([\hat{a}_1, \hat{a}_2]\) such that
where \(\mathrm {CL}\) is the confidence level, e.g., \(\mathrm {CL}=0.95\). The ordering principle proposed by Feldman and Cousins (1998) is based on the ranking function
where \(a_\mathrm{best}\) is the value of the parameter a that maximizes the sampling distribution \(p(\hat{a}a,H_a)\) for a given value of \(\hat{a}\). The prescription then for constructing the acceptance intervals is to find, for each allowed value of a, values of \(\hat{a}_1\) and \(\hat{a}_2\) such that \(R(\hat{a}_1a)=R(\hat{a}_2a)\) and for which (3.9) is satisfied. The set of all such acceptance intervals for different values of a forms a confidence belt in the \(\hat{a}a\)plane, which is then used to construct an upper limit or a twosided confidence interval for a particular observed value of the estimator \(\hat{a}\), as explained below and illustrated in Fig. 7.
As a specific example, let us suppose that \(\hat{a}\) is Gaussiandistributed about a with variance \(\sigma ^2\):
and that the unknown parameter a represents the amplitude of a signal, so that \(a > 0\). (Recall that it is possible, however, for the estimator \(\hat{a}\) to take on negative values). Then \(a_\mathrm{best}=\hat{a}\) if \(\hat{a} > 0\), while \(a_\mathrm{best} = 0\) if \(\hat{a} \le 0\), for which
and
The confidence belt constructed from this ranking function is shown in Fig. 7. The solid horizontal line at \(a=2\) shows the corresponding 95% confidencelevel acceptance interval for this ranking function. The two dashed vertical lines correspond to two different observed values for the estimator \(\hat{a}\), leading to a 95% confidencelevel upper limit and twosided interval, respectively.
Bayesian inference
In the following subsections, we again describe parameter estimation and hypothesis testing, but this time from the perspective of Bayesian inference.
Bayesian parameter estimation
In Bayesian inference, a parameter, e.g., a, is estimated in terms of its posterior distribution, p(ad), in light of the observed data d. As discussed in the introduction to this section, the posterior p(ad) can be calculated from the likelihood p(da) and the prior probability distribution p(a) using Bayes’ theorem
The posterior distribution tells you everything you need to know about the parameter, although you might sometimes want to reduce it to a few numbers—e.g., its mode, mean, standard deviation, etc.
Given a posterior distribution p(ad), a Bayesian confidence interval (often called a credible interval given the Bayesian interpretation of probability as degree of belief, or state of knowledge, about an event) is simply defined in terms of the area under the posterior between one parameter value and another. This is illustrated graphically in Fig. 8, for the case of a 95% symmetric credible interval, centered on the mode of the distribution \(a_\mathrm{mode}\). If the posterior distribution depends on two parameters a and b, but you really only care about a, then you can obtain the posterior distribution for a by marginalizing the joint distribution p(a, bd) over b:
where the second equality follows from the relationship between joint probabilities and conditional probabilities, e.g., \(p(ab,d) p(b) = p(a,bd)\). Variables that you don’t particularly care about (e.g., the variance of the detector noise as opposed to the strength of a gravitationalwave signal) are called nuisance parameters. Although nuisance parameters can be handled in a straightforward manner using Bayesian inference, they are problematic to deal with (i.e., they are a nuisance!) in the context of frequentist statistics. The problem is that marginalization doesn’t make sense to a frequentist, for whom parameters cannot be assigned probability distributions.
The interpretation of Bayes’ theorem (3.14) is that our prior knowledge is updated by what we learn from the data, as measured by the likelihood, to give our posterior state of knowledge. The amount learned from the data is measured by the information gain
Using a natural logarithm gives the information in nats, while using a base 2 logarithm gives the information in bits. If the data tells us nothing about the parameter, then \(p(d\vert a) = \mathrm{constant}\), which implies \(p(a\vert d)=p(a)\) and thus \(I=0\).
Bayesian upper limits
A Bayesian upper limit is simply a Bayesian credible interval for a parameter with the lower end point of the interval set to the smallest value that the parameter can take. For example, the Bayesian 90% upper limit on a parameter \(a> 0\) is defined by:
where probability is interpreted as degree of belief, or state of knowledge, that the parameter a has a value in the indicated range. One usually sets an upper limit on a parameter when the mode of the distribution for the parameter being estimated is not sufficiently displaced from zero, as shown in Fig. 9.
Bayesian model selection
Bayesian inference can easily be applied to multiple models or hypotheses, each with a different set of parameters. In what follows, we will denote the different models by \(\mathcal{M}_\alpha \), where the index \(\alpha \) runs over the different models, and the associated set of parameters by the vector \(\mathbf {\theta }_\alpha \). The joint posterior distribution for the parameters \(\mathbf {\theta }_\alpha \) is given by
and the model evidence is given by
where we marginalize over the parameter values associated with that model. The posterior probability for model \(\mathcal{M}_\alpha \) is given by Bayes’ theorem as
where the normalization constant p(d) involves a sum over all possible models:
Since the space of all possible models is generally unknown, the sum is usually taken over the subset of models being considered. The normalization can be avoided by considering the posterior odds ratio between two models:
The first ratio on the righthand side of the above equation is the prior odds ratio for models \(\alpha ,\beta \), while the second term is the evidence ratio, or Bayes factor,
The prior odds ratio is often taken to equal unity, but this is not always justified. For example, the prior odds that a signal is described by general relativity versus some alternative theory of gravity should be much larger than unity given the firm theoretical and observational footing of Einstein’s theory.
While the foundations of Bayesian inference were laid out by Laplace in the 1700s, it did not see widespread use until the late twentieth century with the advent of practical implementation schemes and the development of fast electronic computers. Today, Monte Carlo sampling techniques, such as Markov Chain Monte Carlo (MCMC) and Nested Sampling, are used to sample the posterior and estimate the evidence (Skilling 2006; Gair et al. 2010). Successfully applying these techniques is something of an art, but in principle, once the likelihood and prior have been written down, the implementation of Bayesian inference is purely mechanical. Calculating the likelihood and choosing a prior will be discussed in some detail in Sect. 3.6.
Relating Bayesian and frequentist detection statements
It is interesting to compare the Bayesian model selection calculation discussed above to frequentist hypothesis testing based on the maximumlikelihood ratio. For concreteness, let us assume that we have two models \(\mathcal{M}_0\) (noiseonly) and \(\mathcal{M}_1\) (noise plus gravitationalwave signal), with parameters \(\mathbf {\theta }_n\) and \(\{\mathbf {\theta }_n,\mathbf {\theta }_h\}\), respectively. The frequentist detection statistic will be defined in terms of the ratio of the maxima of the likelihood functions for the two models:
As described above, the Bayes factor calculation also involves a ratio of two quantities, the model evidences \(p(d\vert \mathcal{M}_1)\) and \(p(d\vert \mathcal{M}_0)\), but instead of maximizing over the parameters, we marginalize over the parameters:
These two expressions can be related using Laplace’s approximation to individually approximate the model evidences \(p(d\vert \mathcal{M}_1)\) and \(p(d\vert \mathcal{M}_0)\). This approximation is valid when the data are informative—i.e., when the likelihood functions are peaked relative to the joint prior probability distributions of the parameters. For an arbitrary model \(\mathcal{M}\) with parameters \(\mathbf {\theta }\), the Laplace approximation yields:
where \(\mathbf {\theta }_\mathrm{ML}\equiv \mathbf {\theta }_\mathrm{ML}(d)\) maximizes the likelihood with respect to variations of \(\mathbf {\theta }\) given the data d; \(\Delta V_\mathcal{M}\) is the characteristic spread of the likelihood function around its maximum (the volume of the uncertainty ellipsoid for the parameters); and \(V_\mathcal{M}\) is the total parameter space volume of the model parameters. Applying this approximation to models \(\mathcal{M}_0\) and \(\mathcal{M}_1\) in (3.25), we obtain
or, equivalently,
The second term on the righthand side of the above equation is negative and penalizes models that require a larger parameter space volume than necessary to fit the data. This is basically an Occam penalty factor, which prefers the simpler of two models that fit the data equally well. The first term has the interpretation of being the squared signaltonoise ratio of the data, assuming an additive signal in Gaussianstationary noise, and it can be used as an alternative frequentist detection statistic in place of \(\Lambda _\mathrm{ML}\).
Table 3 from Kass and Raftery (1995) gives a range of Bayes factors and their interpretation in terms of the strength of the evidence in favor of one model relative to another. The precise levels at which one considers the evidence to be “strong” or “very strong” is rather subjective. But recent studies (Cornish and Sampson 2016; Taylor et al. 2016a) in the context of pulsar timing have been trying to make this correspondence a bit firmer, using sky and phase scrambles to effectively destroy signalinduced spatial correlations between pulsars while retaining the statistical properties of each individual dataset. This is similar to doing timeslides for LIGO analyses, which are used to assess the significance of a detection.
Taylor et al. (2016a) even go so far as to perform a hybrid frequentistBayesian analysis, doing Monte Carlo simulations: (i) over different noiseonly realizations, and (ii) over different sky and phase scrambles, which null the correlated signal. These simulations produce different null distributions for the Bayes factor, similar to a nullhypothesis distribution for a frequentist detection statistic (in this case, the log of the Bayes factor). The significance of the measured Bayes factor is then its corresponding pvalue with respect to one of these null distributions. The utility of such a hybrid analysis is its ability to better assess the significance of a detection claim, especially when there might be questions about the suitability of one of the models (e.g., the noise model) used in the construction of a likelihood function.
Simple example comparing Bayesian and frequentist analyses
To further illustrate the relationship between Bayesian and frequentist analyses, we consider in this section a very simple example—a constant signal with amplitude \(a>0\) in white, Gaussian noise (zero mean, variance \(\sigma \)):
where the index i labels the individual samples of the data. The likelihood functions for the noiseonly and signalplusnoise models \(\mathcal{M}_0\) and \(\mathcal{M}_1\) are thus simple Gaussians:
We will assume that the value of \(\sigma \) is known a priori. Thus, the noise model has no free parameters, while the signal model has just one parameter, which is the amplitude of the signal that we are trying to detect. We will choose our prior on a to be flat over the interval \((0,a_\mathrm{max}]\), so \(p(a)=1/a_\mathrm{max}\).
It is straightforward exercise to check that the maximumlikelihood estimator of the amplitude a is given by the sample mean of the data:
This is an unbiased estimator of a and has variance \(\sigma _{\hat{a}}^2 =\sigma ^2/N\) (the familiar variance of the sample mean). Thus, the sampling distribution of \(\hat{a}\) is simply
where \(\hat{a}\) can take on either positive or negative values (even though \(a>0\)).
To compute the posterior distribution \(p(ad,\mathcal{M}_1)\) for the Bayesian analysis, we first note that
The model evidence \(p(d  \mathcal{M}_1)\) is then given by
and the posterior distribution is given by
Note that this is simply a truncated Gaussian on the interval \(a\in (0,a_\mathrm{max}]\), with mean \(\hat{a}\) and variance \(\sigma _{\hat{a}}^2\).
The above calculation shows that \(\hat{a}\) is a sufficient statistic for a. This means that the posterior distribution for a can be written simply in terms of \(\hat{a}\), in lieu of the individual samples \(d\equiv \{d_1, d_2, \ldots , d_N\}\). The Bayes factor
is given by
In the limit where \(\hat{a}\) is tightly peaked away from 0 and \(a_\mathrm{max}\), the Bayes factor simplifies to
If we take the frequentist detection statistic to be twice the log of the maximumlikelihood ratio, \(\Lambda (d) \equiv 2\ln \Lambda _\mathrm{ML}(d)\), then
which is just the squared signaltonoise ratio of the data. Furthermore, taking twice the log of the approximate Bayes factor in (3.39) gives
where the first term is just the frequentist detection statistic and second term expresses the Occam penalty. This last result is consistent with the general relation (3.28) discussed in the previous subsection.
The statistical distribution of the frequentist detection statistic can be found in closed form for this simple example. Since a linear combination of Gaussian random variables is also Gaussiandistributed, \(\Lambda \) is the square of a (single) Gaussian random variable \(\rho =\bar{d}\sqrt{N}/\sigma \). Moreover, since \(\rho \) has mean \(\mu \equiv a\sqrt{N}/\sigma \) and unit variance, the sampling distribution for \(\Lambda \) in the presence of a signal is a noncentral chisquared distribution with one degree of freedom and noncentrality parameter \(\lambda \equiv \mu ^2 = a^2 N/\sigma ^2\):
where \(I_{1/2}\) is a modified Bessel function of the first kind of order \(1/2\). In the absence of a signal (i.e., when a and hence \(\lambda \) are equal to zero), \(\Lambda \) is given by an (ordinary) chisquared distribution with one degree of freedom:
where \(\Gamma \) is the gamma function. Substituting explicit expressions for \(I_{1/2}(\sqrt{\lambda \Lambda })\) and \(\Gamma (1/2)\), we find:
An equalprobability contour plot of the sampling distribution of the detection statistic is shown in Fig. 10. The fact that we are able to write down analytic expressions for the sampling distributions for the detection statistic \(\Lambda \) is due to the simplicity of the signal and noise models. For more complicated realworld problems, these distributions would need to be generated numerically using fake signal injections and timeshifts to produce many different realizations of the data (signal plus noise) from which one can build up the distributions.
It is also important to point out that \(\Lambda \) is not a sufficient statistic for a, due to the fact that \(\Lambda \) involves the square of the maximumlikelihood estimate \(\hat{a}\)—i.e., \(\Lambda = \hat{a}^2 N/\sigma ^2\). Thus, we cannot take \(p(\Lambda a,\mathcal{M}_1)\) conditioned on \(\Lambda \) (assuming a flat prior on a from \([0,a_\mathrm{max}]\)) to get the posterior distribution for a given d, since we would be missing out on data samples that give negative values for \(\hat{a}\). Another way to see this is to start with \(p(\Lambda a,\mathcal{M}_1)\) given by (3.45), and then make a change of variables from \(\Lambda \) to \(\hat{a}\) using the general transformation relation
This leads to
which is properly normalized for \(\hat{a}>0\), but differs from (3.33) due to the second term involving \(\hat{a}+a\). Thus, we need to construct p(ad) from (3.33)—and not from (3.47)—if we want the posterior to have the proper dependence on a.
Simulated data
For our example, we will take \(N=100\) samples, \(\sigma =1\), and \(a_\mathrm{max} =1.0\). We also simulate data with injected signals having amplitudes \(a_0=0.05\) and 0.3, respectively. Since the expected signaltonoise ratio, \(a\sqrt{N}/\sigma \), is given by 0.5 and 3.0, these injections correspond to weak and (moderately) strong signals. Single realizations of the data for the two different injections are shown in Fig. 11. The noise realization is the same for the two injections.
Frequentist analysis
Given the values for N, \(\sigma \), and the probability distributions (3.44) and (3.45) for the frequentist detection statistic \(\Lambda \), we can calculate the detection threshold for fixed false alarm probability \(\alpha \) (which we will take to equal 10%), and the corresponding detection probability as a function of the amplitude a. The detection threshold turns out to equal \(\Lambda _* = 2.9\) (so 10% of the area under the probability distribution \(p(\Lambda \mathcal{M}_0)\) is for \(\Lambda \ge \Lambda _*\)). The value of the amplitude a needed for 90% confidence detection probability with 10% false alarm probability is given by \(a^{90\%,\mathrm{DP}}=0.30\). (These results for the detection threshold and detection probability do not depend on the particular realizations of the simulated data). The corresponding curves are shown in Fig. 12.
The sample mean of the data for the two simulations are given by \(\bar{d} = 0.085\) and 0.335, respectively. Since \(\hat{a} = \bar{d}\), these are also the values of the maximumlikelihood estimator of the amplitude a. The corresponding values of the detection statistic are \(\Lambda _\mathrm{obs} = 0.72\) and 11.2 for the two injections, and have pvalues equal to 0.45 and \(9.0\times 10^{4}\), as shown in Fig. 13. The 95% frequentist confidence interval is given simply by \([\hat{a}2\sigma _{\hat{a}},\hat{a}+2\sigma _{\hat{a}}]\), since \(\hat{a}\) is Gaussiandistributed, and has values \([0.11,0.29]\) and [0.14, 0.54], respectively. These intervals contain the true value of the amplitudes for the two injections, \(a_0=0.05\) and 0.3.
The 90% confidencelevel frequentist upper limits are \(a^{90\%,\mathrm{UL}}= 0.20\) and 0.46, respectively. Figure 14 shows the probability distributions for the detection statistic \(\Lambda \) conditioned on these upper limit values for which the probability of obtaining \(\Lambda \ge \Lambda _\mathrm{obs}\) is equal to 0.90.
Bayesian analysis
The results of the Bayesian analysis for the two different injections are summarized in Fig. 15. The plots show the posterior distribution for the amplitude a given the value of the maximumlikelihood estimator \(\hat{a}\), which (as we discussed earlier) is a sufficient statistic for the data d. Recall that the posterior for a for this example is simply a truncated Gaussian from 0 to \(a_\mathrm{max}\) centered on \(\hat{a}\), which could be negative, see (3.36). The left two panels show the graphical construction of the Bayesian 90% upper limit and 95% credible interval for the amplitude a for the weak injection, \(a^{90\%,\mathrm{UL}}=0.23\) and [0, 0.26]. The right two panels show similar plots for the strong injection, \(a^{90\%,\mathrm{UL}}=0.46\) and [0.14, 0.54].
Finally, the Bayes factor for the signalplusnoise model \(\mathcal{M}_1\) relative to the noiseonly model \(\mathcal{M}_0\) can be calculated by taking the ratio of the marginalized likelihood \(p(d\mathcal{M}_1)\) given by (3.35) to \(p(d\mathcal{M}_0)\) given by (3.30). Doing this, we find 2 ln \(B_{10} = 2.2\) and 9.2 for the weak and strong signal injections, respectively. The Laplace approximation to this quantity is given by (3.41), with values \(2.0\) and 8.5, respectively.
Comparison summary
Table 4 summarizes the numerical results for the frequentist and Bayesian analyses. We see that the frequentist and Bayesian 90% upper limits and 95% intervals numerically agree for the strong injection, but differ slightly for the weak injection. The interpretation of these results is different, of course, for a frequentist and a Bayesian, given their different definitions of probability. But for a moderately strong signal in noisy data, we expect both approaches to yield a confident detection as they have for this simple example.
Likelihoods and priors for gravitationalwave searches
To conclude this section on statistical inference, we discuss some issues related to calculating the likelihood and choosing a prior in the context of searches for gravitationalwave signals using a network of gravitationalwave detectors.
Calculating the likelihood
Defining the likelihood function (for either a frequentist or Bayesian analysis) involves understanding the instrument response and the instrument noise. The data collected by gravitationalwave detectors comes in a variety of forms. For groundbased interferometers such as LIGO and Virgo, the data comes from the error signal in the differential armlength control system, which is nonlinearly related to the laser phase difference, which in turn is linearly related to the gravitationalwave strain. For pulsar timing arrays, the data comes from the arrival times of radio pulses (derived from the folded pulse profiles), which must be corrected using a complicated timing model that takes into account the relative motion of the telescopes and the pulsars, along with the spindown of the pulsars, in addition to a variety of propagation effects. The timing residuals formed by subtracting the timing model from the raw arrival times contain perturbations due to gravitational waves integrated along the line of sight to the pulsar. For future spacebased gravitationalwave detectors such as LISA, the data will be directly read out from phase meters that perform a heterodyne measurement of the laser phase. Synthetic combinations of these phase read outs (chosen to cancel laser phase noise) are then linearly proportional to the gravitationalwave strain.
Since gravitational waves can be treated as small perturbations to the background geometry, the time delays or laser phase/frequency shifts caused by a gravitational wave can easily be computed. These idealized calculations have then to be related to the actual observations, either by propagating the effects through an instrument response model, or, alternatively, inverting the response model to convert the measured data to something proportional to the gravitationalwave strain. (For example, most LIGO analyses work with the calibrated strain, rather than the raw differential error signal). If we assume that the gravitationalwave signal and the instrument noise are linearly independent, then the data taken at time t can be written as
where h(t) is shorthand for the gravitationalwave metric perturbations \(h_{ab}(t,{\vec {x}})\) convolved with the instrument response function and converted into the appropriate quantity—phase shift, time delay, differential arm length error, etc. (A detailed calculation of h(t) and the associated detector response functions will be given in Sect. 5.2). As mentioned above, the data d(t) may be the quantity that is measured directly, or, more commonly, some quantity that is derived from the measurements such as timing residuals or calibrated strain. In any analysis, it is important to marginalize over the model parameters used to make the conversion from the raw data.
The likelihood of observing d(t) is found by demanding that the residual
be consistent with a draw from the noise distribution \(p_n(x)\):
Here \(\bar{h}(t)\) is our model^{Footnote 6} for the gravitationalwave signal. The likelihood of observing a collection of discretelysampled data \(d \equiv \{ d_1, d_2, \ldots , d_N\}\), where \(d_i\equiv d(t_i)\), is then given by \(p( d\vert \bar{h}) = p_n(r)\), where \(r\equiv \{r_1, r_2,\ldots , r_N\}\) with \(r_i\equiv r(t_i)\). Since instrument noise is due to a large number of small disturbances combined with counting noise in the largenumber limit, the central limit theorem suggests that the noise distribution can be approximated by a multivariate normal (Gaussian) distribution:
where \(C_n\) is the noise correlation matrix, with components
If the noise is stationary, then the correlation matrix only depends on the lag \(\vert t_i  t_j\vert \), and the matrix \(C_n\) can be (approximately) diagonalized by transforming to the Fourier domain, where \(r_i\) should then be interpreted as \(\tilde{r}(f_i)\) (see Appendix D.6 for a more careful treatment of discrete probability distributions in the time and frequency domain). In practice, the noise observed in most gravitationalwave experiments is neither stationary nor Gaussian (Sect. 9 and Appendix C), but (3.51) still serves as a good starting point for more sophisticated treatments. The Gaussian likelihood (3.51) immediately generalizes for a network of detectors:
where I, J labels the detector, and i, j labels the discrete time or frequency sample for the corresponding detector. Note here that the parameters \(\mathbf {\theta }\) appearing in (3.18) are the individual time or frequency samples \(\bar{h}_i\).
Choosing a prior
For Bayesian inference, it is also necessary to define a model \(\mathcal{M}\) for the gravitationalwave signal, which is done by placing a prior \(p(\bar{h} \vert \mathcal{M})\) on the samples \(\bar{h}_i\). In some cases, a great deal is known about the signal model, such as when approximate solutions to Einstein’s equations provide waveform templates. In that case the prior can be written as
Marginalizing over \(\bar{h}\) converts the posterior \(p(\bar{h} \vert d)\) to a posterior distribution for the signal parameters \(p(\mathbf {\theta }\vert d, \mathcal{M})\). In other cases, such as for shortduration bursts associated with certain violent astrophysical events, much less is known about the possible signals and weaker priors have to be used. Models using wavelets, which have finite timefrequency support, and priors that favor connected concentrations of power in the timefrequency plane are commonly used for these “unmodeled burst” searches. At the other end of the spectrum from deterministic point sources are the statisticallyisotropic stochastic backgrounds that are thought to be generated by various processes in the early Universe, or through the superposition of a vast number of weak astrophysical sources. In the case of Gaussian stochastic signals, the prior for a signal \(\bar{h}=(\bar{h}{}_+(\hat{n}),\bar{h}{}_\times (\hat{n}))\) coming from direction \(\hat{n}\) direction has the form
where \(S_h\) is the power spectrum of the background. As we shall show in Sect. 4, marginalizing over \(\bar{h}\) converts the posterior \(p(\bar{h} \vert d)\) to a posterior \(p(S_h \vert d, \mathcal{M})\) for \(S_h\).
Correlations
Correlation is not cause, it is just a ‘music of chance’. Siri Hustvedt
Stochastic gravitational waves are indistinguishable from unidentified instrumental noise in a single detector, but are correlated between pairs of detectors in ways that differ, in general, from instrumental noise. Crosscorrelation methods basically use the random output of one detector as a template for the other, taking into account the physical separation and relative orientation of the two detectors. In this section, we introduce crosscorrelation methods in the context of both frequentist and Bayesian inference, analyzing in detail a simple toy problem (the data are “white” and we ignore complications that come from the separation and relative orientation of the detectors—this we discuss in detail in Sect. 5). We also briefly discuss possible alternatives to crosscorrelation methods, e.g., using a null channel as a noise calibrator.
The basic idea of using crosscorrelation to search for stochastic gravitationalwaves can be found in several early papers (Grishchuk 1976; Hellings and Downs 1983; Michelson 1987; Christensen 1990, 1992; Flanagan 1993). The derivation of the likelihood function in Sect. 4.2 follows that of Cornish and Romano (2013); parts of Sect. 4.4 are also discussed in Allen et al. (2003) and Drasco and Flanagan (2003).
Basic idea
The key property that allows one to distinguish a stochastic gravitationalwave background from instrumental noise is that the gravitationalwave signal is correlated across multiple detectors while instrumental noise typically is not. To see this, consider the simplest possible example, i.e., a single sample of data from two colocated and coaligned detectors:
Here h denotes the common gravitationalwave signal and \(n_1\), \(n_2\) the noise in the two detectors. To cross correlate the data, we simply form the product of the two samples, \(\hat{C}_{12}\equiv d_1 d_2\). The expected value of the correlation is then
since the gravitationalwave signal and the instrumental noise are uncorrelated. If the instrumental noise in the two detectors are also uncorrelated, then
which implies
This is just the variance (or power) of the stochastic gravitationalwave signal. So by crosscorrelating data in two (or more) detectors, we can extract the common gravitationalwave component.
We have assumed here that there is no crosscorrelated noise (instrumental or environmental). If there is correlated noise, then the simple procedure describe above needs to be augmented. This will be discussed in more detail in Sect. 9.6.
Relating correlations and likelihoods
The crosscorrelation approach arises naturally from a standard likelihood analysis if we adopt a Gaussian stochastic template for the signal. Revisiting the example from the previous section, let’s assume that the detector noise is Gaussiandistributed with variances \(S_{n_1}\) and \(S_{n_2}\). Then the likelihood function for the data \(d\equiv (d_1,d_2)\) for the noiseonly model \(\mathcal{M}_0\) is simply
For the signalplusnoise model \(\mathcal{M}_1\), we have
where the gravitationalwave signal \(\bar{h}\) is assumed to be a Gaussian random deviate with probability distribution
In most applications we are not interested in the value of \(\bar{h}\), but rather the power \(S_h\). Marginalizing over \(\bar{h}\), the likelihood takes the form
where
Maximizing the likelihood with respect to \(S_h\), \(S_{n_1}\) and \(S_{n_2}\) yields the maximumlikelihood estimators
Thus, the crosscorrelation statistic \(\hat{C}_{12}\) is the maximumlikelihood estimator for a Gaussian stochastic gravitational wave template with zero mean and variance \(S_h\).
Extension to multiple data samples
The extension to multiple data samples
is fairly straightforward. In the following two subsections, we consider the cases where the detector noise and stochastic signal are either: (i) both white (i.e., the data are uncorrelated between time samples) or (ii) both colored (i.e., allowing for correlations in time). The white noise example will be analyzed in more detail in Sects. 4.4–4.6.
White noise and signal
If the detector noise and stochastic signal are both white, then the likelihood functions for the data \(d\equiv \{d_{1i};d_{2i}\}\), are simply products of the likelihoods (4.5) and (4.8) for the individual data samples. We can write these product likelihoods as single multivariate Gaussian distributions:
where
The arguments in the exponential have the form
and similarly for \(d^T C^{1} d\). The maximumlikelihood estimators for this case are:
Note that these are just averages of the singledatum estimators (4.10) over the N independent data samples.
A couple of remarks are in order: (i) It is easy to show that the expectation values of the estimators are the true values of the parameters \(S_h\), \(S_{n_1}\), \(S_{n_2}\). It is also fairly straightforward to calculate the variances of the estimators. In particular,
Note that this expression reduces to \(\mathrm{Var}(\hat{S}_h)\approx S_{n_1}S_{n_2}/N\) in the weaksignal limit, \(S_h\ll S_{n_I}\), for \(I=1,2\). (ii) If we simply maximized the likelihood with respect to variations of \(S_h\), treating the noise variances \(S_{n_1}\) and \(S_{n_2}\) as known parameters, then the frequentist estimator of \(S_h\) would also include autocorrelation terms for each detector:
In practice, however, the noise variances are not known well enough to be able to extract useful information from the autocorrelation terms; they actually worsen the performance of the simple crosscorrelation estimator when the uncertainty in \(S_{n_1}\) or \(S_{n_2}\) is greater than or equal to \(S_h\).
Colored noise and signal
For the case where the detector noise and stochastic signal are colored, it simplest to work in the frequency domain, since the Fourier components are independent of one another. (This assumes that the data are stationary, so that there is no preferred origin of time). Assuming multivariate Gaussian distributions as before, the variances \(S_{n_1}\), \(S_{n_2}\), and \(S_h\) generalize to power spectral densities, which are functions of frequency defined by
where \(I=1,2\) and tilde denotes Fourier transform.^{Footnote 7} The factor of 1 / 2 in (4.20) is for onesided power spectra, for which the integral of the power spectrum over positive frequencies equals the variance of the data:
This is just the continuous version of Parseval’s theorem, see e.g., (D.40). For N samples of discretelysampled data from each of two detectors \(I=1,2\) (total duration T), the likelihood function for a Gaussian stochastic signal template becomes (Allen et al. 2002; Cornish and Romano 2013):
where
Here \(k=0,1,\ldots , N/21\) labels the discrete positive frequencies. There is no square root of the determinant in the denominator of (4.22) since the volume element for the probability density involves both the real and imaginary parts of the Fourier transformed data (Appendix D.6).
We do not bother to write down the maximumlikelihood estimators of the signal and noise power spectral densities for this particular example. We will return to this problem in Sect. 6, where we discuss the optimallyfiltered crosscorrelation statistic for isotropic stochastic backgrounds. There one assumes a particular spectral shape for the gravitationalwave power spectral density, and then simply estimates its overall amplitude. That simplifies the analysis considerably.
Maximumlikelihood detection statistic
Let’s return to the example discussed in Sect. 4.3.1, which consists of N samples of data in each of two detectors, having uncorrelated white noise and a common white stochastic signal. As described in Sect. 3.4, one can calculate a frequentist detection statistic based on the maximumlikelihood ratio:
Substituting (4.12) and (4.13) for the likelihood functions and performing the maximizations yields
where
Note that the these estimators involve only autocorrelations of the data. In the absence of a signal, they are maximumlikelihood estimators of the noise variances \(S_{n_1}\) and \(S_{n_2}\). But in the presence of a signal, they are maximumlikelihood estimators of the combined variances \(S_1\equiv S_{n_1}+S_h\) and \(S_2\equiv S_{n_2}+S_h\).
Recall that for comparison with Bayesian model selection calculations, it is convenient to define the frequentist statistic \(\Lambda (d)\) as twice the logarithm of the maximumlikelihood ratio:
In the limit that the stochastic gravitationalwave signal is weak compared to the detector noise—i.e., \(S_h\ll S_{n_I}\), for \(I=1,2\)—the above expression reduces to
This is just the squared signaltonoise ratio of the crosscorrelation statistic. Note also that \(\hat{S}_h^2/\hat{S}_1\hat{S}_2\) is the normalized crosscorrelation (i.e., coherence) of the data from the two detectors. It is a measure of how well the data in detector 2 matches that in detector 1.
From (4.17), we see that \(\Lambda (d)\) is a ratio of the square of a sum of products of Gaussian random variables to the product of a sum of squares of Gaussian random variables. This is a sufficiently complicated expression that we will estimate the distribution of \(\Lambda (d)\) numerically, doing fake signal injections into many realizations of simulated noise to build up the sampling distribution. We do this explicitly in Sect. 4.6, when we compare the frequentist and Bayesian correlation methods for this example.
Bayesian correlation analysis
Compared to the frequentist crosscorrelation analysis described above, a Bayesian analysis is conceptually much simpler. One simply needs the likelihood functions \(p(dS_{n_1},S_{n_2},\mathcal{M}_0)\) and \(p(dS_{n_1},S_{n_2},S_h,\mathcal{M}_1)\) given by (4.12) and (4.13), and joint prior probability distributions for the signal and noise parameters. For our example, we will assume that the signal and noise parameters are statistically independent of one another so that the joint prior distributions factorize into a product of priors for the individual parameters. We use Jeffrey’s priors for the individual noise variances:
and a flat^{Footnote 8} prior for the signal variance:
Then, using Bayes’ theorem (3.18), we obtain the joint posterior distribution:
where \(p(d\mathcal{M}_1)\) is the evidence (or marginalized likelihood) for the signalplusnoise model \(\mathcal{M}_1\). (Similar expressions can be written down for the noiseonly model \(\mathcal{M}_0\)). The marginalized posterior distributions for the signal and noise parameters are given by marginalizing over the other parameters. For example,
for the signal variance \(S_h\).
Correlations enter the Bayesian analysis via the covariance matrix C that appears in the likelihood function \(p(dS_{n_1},S_{n_2}, S_h,\mathcal{M}_1)\). The covariance matrix for the data includes the crossdetector signal correlations, as we saw in (4.15). So although one does not explicitly construct a crosscorrelation statistic in the Bayesian framework, cross correlations do play an important role in the calculations.
Comparing frequentist and Bayesian crosscorrelation methods
To explicitly compare the frequentist and Bayesian methods for handling crosscorrelations, we simulate data for the white noise, white signal example that we have been discussing in the previous subsections. The particular realization of data that we generate has \(N=100\) samples with \(S_{n_1} =1\), \(S_{n_2}=1.5\), and \(S_h=0.3\). Plots of the simulated data in the two detectors are given in Fig. 16.
Frequentist analysis
The frequentist maximumlikelihood estimators (4.17) are very easy to calculate. For the simulated data they have values:
In addition
The weaksignal approximation to \(\Lambda (d)\), given by (4.28), is significantly larger (having a value of 14), since the injected stochastic signal for this case was relatively strong, with the injected \(S_h\) equal to \(0.3 S_{n_1}\) and \(0.2 S_{n_2}\). In addition, for this realization of data, the signal variance was overestimated while both noise variances were underestimated, leading to a much larger value than the nominal squared signaltonoise ratio of 6.
As mentioned previously, the form (4.27) of the detection statistic \(\Lambda (d)\) is sufficiently complicated that it was simplest to resort to numerical simulations to estimate its sampling distribution, \(p(\Lambda S_{n_1}, S_{n_2}, S_h, \mathcal{M}_1)\). We took 50 values for each of \(S_{n_1}\), \(S_{n_2}\), and \(S_h\) in the interval [0, 3], and then for each of the corresponding \(50^3\) points in parameter space, we generated \(10^4\) realizations of the data, yielding \(10^4\) values of \(\Lambda (d)\). By histogramming these values for each point in parameter space, we were able to estimate the probability density function (and also the cumulative distribution function) for \(\Lambda \).
Figure 17 shows the frequentist 90% confidencelevel exclusion and inclusion regions for our simulated data with \(\Lambda _\mathrm{obs}=7.6\). The 90% confidencelevel exclusion region \(\mathcal{E}_{90\%}\) lies above the red surface; it consists of points \((S_{n_1},S_{n_2},S_h)\) satisfying
The region below the red surface is the 90% confidencelevel inclusion region \(\mathcal{I}_{90\%}\). Note that construction of these regions is such that the true values of the parameters \(S_{n_1}\), \(S_{n_2}\), and \(S_h\) have a 90% frequentist probability of lying in \(\mathcal{I}_{90\%}\). This generalizes, to multiple parameters, the definition of the frequentist 90% confidencelevel upperlimit for a single parameter, which was discussed in detail in Sect. 3.2.3. Note that it is not correct to simply “cut” the surface using the maximumlikelihood point estimates \(\hat{S}_{n_1}= 0.78\) and \(\hat{S}_{n_2}=1.46\) to obtain a single value for \(S_h^{90\%, \mathrm{UL}}\). One needs to include the whole region in order to get the correct frequentist coverage.
A similar procedure can be used to estimate sampling distributions for the frequentist maximumlikelihood estimators \(\hat{S}_{n_1}\), \(\hat{S}_{n_2}\), and \(\hat{S}_h\). From these distributions, one can then calculate e.g., frequentist 95% confidencelevel exclusion and inclusion regions for the given point estimates. For example, \((S_{n_1}, S_{n_2}, S_h)\in \mathcal{I}_{95\%}\) for the observed point estimate \(\hat{S}_{h,\mathrm{obs}}\) if and only if \(\hat{S}_{h,\mathrm{obs}}\) is contained in the symmetric 95% confidencelevel interval centered on the mode of the probability distribution \(p(\hat{S}_hS_{n_1}, S_{n_2}, S_h, \mathcal{M}_1)\). These regions again generalize to multiple parameters the definition of a frequentist confidence interval for a single parameter, which was discussed in detail in Sect. 3.2.4. They will be different, in general, for the different maximumlikelihood estimators. But in order to move on to the Bayesian analysis for this example, we will leave the explicit construction of these regions to the interested reader.
Bayesian analysis
For the Bayesian analysis of this example, we limit ourselves to calculating the Bayes factor \(2 \ln \mathcal{B}_{10}(d)\) comparing the noiseonly and signalplusnoise models \(\mathcal{M}_0\) and \(\mathcal{M}_1\), as well as the posterior distributions for the three parameters \(S_h\), \(S_{n_1}\), and \(S_{n_2}\). Following the procedure described above in Sect. 4.5 we find, for this particular realization of data,
This Bayes factor corresponds to positive evidence (see Table 3) in favor of a correlated stochastic signal in the data.
Figure 18 shows the marginalized posterior \(p(S_hd,\mathcal{M}_1)\) for the stochastic signal variance given the data d and signalplusnoise model \(\mathcal{M}_1\). The peak of the posterior lies close the frequentist maximumlikelihood estimator \(\hat{S}_h=0.40\) (blue dotted vertical line), and easily contains the injected value in its 95% Bayesian credible interval (grey shaded region). Figure 19 shows similar plots for the marginalized posteriors for the noise variances \(S_{n_1}\) and \(S_{n_2}\) for both the signalplus noise model \(\mathcal{M}_1\) (blue curves) and the noiseonly model \(\mathcal{M}_0\) (green curves). For comparison, the frequentist maximumlikelihood estimators \(\hat{S}_{n_1}, \hat{S}_{n_2}=0.78, 1.46\) and 1.18, 1.86 for the two models are shown by the corresponding (blue and green) dotted vertical lines. Again, the peaks of the Bayesian posterior distributions lie close to these values. The 95% Bayesian credible intervals for \(S_{n_1}\) and \(S_{n_2}\) for the signalplusnoise model \(\mathcal{M}_1\) are also shown (grey shaded region). These intervals easily contain the injected values for these two parameters.
What to do when crosscorrelation methods aren’t available
Crosscorrelation methods can be used whenever one has two or more detectors that respond to a common gravitationalwave signal. The beauty of such methods is that even though a stochastic background is another source of “noise” in a single detector, the common signal components in multiple detectors combine coherently when the data from pairs of detectors are multiplied together and summed, as described in Sect. 4.1. But with only a single detector, searches for a stochastic background need some other way to distinguish the signal from the noise—e.g., a difference between the spectra of the noise and the gravitationalwave signal, or the modulation of an anisotropic signal due to the motion of the detector (as is expected for the confusionnoise from galactic compact white dwarf binaries for LISA). Without some way of differentiating instrumental noise from gravitationalwave “noise”, there is no hope of detecting a stochastic background.
As a simple example, suppose that we have N samples of data from each of two detectors \(I=1,2\) (which we will call channels in what follows), but let’s assume that the second channel is insensitive to the gravitationalwave signal:
Then if we make the same assumptions as before for the signal and the noise, it follows that the likelihood function for the data \(d\equiv \{d_{1i};d_{2i}\}\) is given by
where
is the covariance matrix of the data. Since the offdiagonal blocks of the covariance matrix are identically zero, it is clear that we will not be able to use the crosscorrelation methods developed in the previous sections. So we need to do something else if we are going to extract the gravitationalwave signal from the noise.
Singledetector excess power statistic
If we knew \(S_{n_1}\) a priori, then we could construct an excess power statistic from the autocorrelated data to estimate the signal variance:
(This is effectively how Penzias and Wilson (1965) discovered the CMB; they observed excess antenna noise that they could not attribute to any other known source of noise). But as mentioned at the end of Sect. 4.3.1, typically we do not know the detector noise well enough to use such a statistic, since the uncertainty in \(S_{n_1}\) is much greater than the variance of the gravitationalwave signal that we are trying to detect. This is definitely the case for groundbased detectors like LIGO, Virgo, etc. An exception to this “rule” will probably be the predicted foreground signal from galactic whitedwarf binaries in the LISA band. For frequencies below a few mHz, the gravitationalwave confusion noise from these binaries is expected to dominate the LISA instrument noise (Hils et al. 1990; Bender and Hils 1997; Hils and Bender 2000; Nelemans et al. 2001).
Null channel method
If it were possible to make an offsource measurement using detector 1, then we could estimate the noise variance \(S_{n_1}\) directly from the detector output, free of contamination from gravitational waves. Using this noise estimate, \(\hat{S}_{n_1}\), we could then define our excess power statistic as
Unfortunately, such offsource measurements are not possible, since you cannot shield a gravitationalwave detector from gravitational waves. However, in certain cases one can construct a particular combination of the data (called a null channel) for which the response to gravitational waves is strongly suppressed. The symmetrized Sagnac combination of the data for LISA (Tinto et al. 2001; Hogan and Bender 2001) is one such example.
So let us assume that channel 2 for our example is such a null channel, and let us also assume that there is some relationship between the noise in the two channels—e.g., \(S_{n_1} = aS_{n_2}\), with \(a>0\). (For colored noise, the variances would be replaced by power spectra and a would be replaced by a function of frequency—i.e., a transfer function relating the noise in the two channels). To begin with, we will also assume that a is known. Then the data from the second channel can be used as a noise calibrator for the first channel. The frequentist estimators for this scenario are:
These are the maximumlikelihood estimators of the signal and noise parameters, derived from the likelihood (4.38) with \(S_{n_1}\) replaced by \(aS_{n_2}\). In the Bayesian framework, the relation \(S_{n_1}= a S_{n_2}\) is encoded in the joint prior probability distribution
which eliminates \(S_{n_1}\) as an independent variable. The marginalized posterior distribution for the signal variance \(S_h\), assuming a flat prior \(p_h(S_h)=\mathrm{const}\), is then
In the more realistic scenario where the transfer function a is not known a priori, but is described by its own prior probability distribution \(p_a(a)\), we have
and
This integral can be done numerically given priors for \(S_{n_2}\) and a.
To help illustrate the above discussion, Fig. 20 shows plots of several different posterior distributions for \(S_h\), corresponding to different choices for the prior distribution \(p_a(a)\). For these plots, we chose a Jeffrey’s prior for \(S_{n_2}\):
and a lognormal prior for a:
The different curves correspond to different values of \(\mu \) and \(\sigma \):
where \(a_0\) denotes the nominal (true) value of a. Note that \(A=0.67a_0\) and \(1.5a_0\) correspond to priors for a that are biased away from its true value \(a=a_0\). Note also that 68% of the prior distribution is contained in the region \(a\in [A/\Sigma , A\Sigma ]\) (so \(\Sigma =1\) corresponds to a deltafunction prior—i.e., no uncertainty in a). The particular realization that we used consisted of \(N=100\) samples of data (4.37) with \(S_h=1\), \(S_{n_2}=1\), and \(S_{n_1} = a_0 S_{n_2}\) with \(a_0=1\). Note that for the biased priors for a (associated with the dashed and dotted curves in Fig. 20), an under (over) estimate in a corresponds to over (under) estimate in \(S_h\), as \(S_h\) is effectively the difference between the estimated variance in channel 1 and a times the estimated variance in channel 2. For this particular realization of the data, the mode of the “0%, unbiased” posterior for \(S_h\) is about 20% less than the injected value, \(S_h=1\). On average, they would agree.
Geometrical factors
There is geometry in the humming of the strings, there is music in the spacing of the spheres. Pythagoras
In the previous sections, we ignored many details regarding detector response and detector geometry. We basically assumed that the detectors were isotropic, responding equally well to all gravitational waves, regardless of the waves’ directions of propagation, frequency content, and polarization. We also ignored any loss in sensitivity in the correlations between data from two or more detectors, due to the separation and relative orientation of the detectors. But these details are important if we want to design optimal (or nearoptimal) data analysis algorithms to search for gravitational waves. To specify the likelihood function, for example, requires models not only for the gravitationalwave signal and instrument noise, but also for the response of the detectors to the waves that a source produces.
In this section, we fill in these details. We first discuss the response of a single detector to an incident gravitational wave. We then show how these nontrivial detector responses manifest themselves in the correlation between data from two or more detectors. The results are first derived in a general setting making no assumption, for example, about the wavelength of a gravitational wave to the characteristic size of a detector. The general results are then specialized, as appropriate, to the case of groundbased and spacebased laser interferometers, spacecraft Doppler tracking, and pulsar timing arrays. We conclude this section by discussing how the motion of a detector relative to the gravitationalwave source affects the detector response.
The approach we take in this section is similar in spirit to that of Hellings (1991), attempting to unify the treatment of detector response functions and correlation functions across different gravitationalwave detectors. Readers interested in more details about the effect of detector geometry on the correlation of data from two or more detectors should see the original papers by Hellings and Downs (1983) for pulsar timing arrays, and Flanagan (1993) and Christensen (1990, 1992) for groundbased laser interferometers.
Detector response
Gravitational waves are timevarying perturbations to the background geometry of spacetime. Since gravitational waves induce timevarying changes in the separation between two freelyfalling objects (socalled test masses), gravitationalwave detectors are designed to be as sensitive as possible to this changing separation. For example, a resonant bar detector acts like a giant tuning fork, which is set into oscillation when a gravitational wave of the natural frequency of the bar is incident upon it. These oscillations produce a stress against the equilibrium electromagnetic forces that exist within the bar. The stress (or oscillation) is measured by a strain gauge (or accelerometer), indicating the presence of a gravitational wave. The response for a bar detector is thus the fractional change in length of the bar, \(h(t) = \Delta l(t)/l\), induced by the wave. Since the length of the bar is typically much smaller than the wavelength of a gravitational wave at the bar’s resonant frequency, the response is most easily computed using the geodesic deviation equation (Misner et al. 1973) for the timevarying tidal field.
In this article, we will focus our attention on beam detectors, which use electromagnetic radiation to monitor the separation of two or more freelyfalling objects. Spacecraft Doppler tracking, pulsar timing arrays, and ground and spacebased laser interferometers (e.g., LIGOlike and LISAlike detectors) are all examples of beam detectors, which can be used to search for gravitational waves (see, e.g., Section 4.2 in Sathyaprakash and Schutz 2009).
Spacecraft Doppler tracking
For spacecraft Doppler tracking, pulses of electromagnetic radiation are sent from one test mass (e.g., a radio transmitting tower on Earth) to another (e.g., the Cassini probe), and then bounced back (or coherently transponded) from the second test mass to the first. From the arrival times of the returning pulses, one can calculate the fractional change in the frequency of the emitted pulses induced by a gravitational wave. The detector response for such a measurement is thus
where \(\Delta T(t)\) is the deviation of the roundtrip travel time of a pulse away from the value it would have had at time t in the absence of the gravitational wave. A schematic representation of \(\Delta T(t)\) for spacecraft Doppler tracking is given in Fig. 21.
Pulsar timing
Pulsar timing is even simpler in the sense that we only have oneway transmission of electromagnetic radiation (i.e., radio pulses are emitted by a pulsar and received by a radio antenna on Earth). The response for such a system is simply the timing residual
which is the difference between the measured time of arrival of a radio pulse and the expected time of arrival of the pulse (as determined from a detailed timing model for the pulsar) due to the presence of a gravitational wave. A schematic representation of \(\Delta T(t)\) for a pulsar timing measurement is given in Fig. 22.
Laser interferometers
For laser interferometers like LIGO or LISA, the detector response is the phase difference in the laser light sent down and back the two arms of the interferometer. Again, the phase difference can be calculated in terms of the change in the roundtrip travel time of the laser light from one test mass (e.g., the beam splitter) to another (e.g., one of the end test masses). If we consider an equalarm Michelson interferometer with unit vectors \(\hat{u}\) and \(\hat{v}\) pointing from the beam splitter to the end masses in each of the arms, then
where \(\Delta T(t)\equiv T_{{\hat{u}},\mathrm{rt}}(t) T_{{\hat{v}},\mathrm{rt}}(t)\) is the difference of the roundtrip travel times, and \(\nu _0\) is the frequency of the laser light. (See Fig. 23). Alternatively, one often writes the interferometer response as a strain measurement in the two arms
where \(\Delta L(t)\equiv L_{\hat{u}}(t)L_{\hat{v}}(t)\) is the difference of the proper lengths of the two arms (having unperturbed length L), and \(\Delta T(t)\) is the difference in roundtrip travel times as before. Thus, interferometer phase and strain response are simply related to one another.
Calculation of \(\Delta T(t)\) for beam detectors is most simply carried out in the transversetraceless gauge^{Footnote 9} (Misner et al. 1973; Schutz 1985; Hartle 2003) since the unperturbed separation L of the two test masses can be larger than or comparable to the wavelength \(\lambda \equiv c/f\) of an incident gravitational wave having frequency f. This is definitely the case for pulsar timing where L is of order a few kpc, and for spacecraft Doppler tracking where L is of order tens of AU. It is also the case for spacebased detectors like LISA (\(L=5\times 10^6 \mathrm{\ km}\)) for gravitational waves with frequencies around a tenth of a Hz. On the other hand, for Earthbased detectors like LIGO (\(L=4~\mathrm{km}\)), \(L\ll \lambda \) is a good approximation below a few kHz. Thus, the approach that we will take in the following subsections is to calculate the detector response in general, not making any approximation a priori regarding the relative sizes of \(\lambda =c/f\) and L. To recover the standard expressions (i.e., in the longwavelength or smallantenna limit) for Earthbased detectors like LIGO will be a simple matter of taking the limit fL / c to zero. For reference, Table 5 summarizes the characteristic properties (i.e., size, characteristic frequency, sensitivity band, etc.) of different beam detectors.
Calculation of response functions and antenna patterns
Gravitational waves are weak. Thus, the detector response is linear in the metric perturbations \(h_{ab}(t,{\vec {x}})\) describing the wave, and can be written as the convolution of the metric perturbations \(h_{ab}(t,{\vec {x}})\) with the impulse response \(R^{ab}(t,{\vec {x}})\) of the detector:
where \({\vec {x}}\) is the location of the measurement at time t. In terms of a planewave expansion (2.1) of the metric perturbations, we have
or, in the frequency domain,
where^{Footnote 10}
Further specification of the response function depends on the choice of gravitationalwave detector as well as on the basis tensors used to expand \(h_{ab}(f,\hat{n})\), as we shall see below and in the following subsections.
For example, if we work in the polarization basis, with expansion coefficients \(h_A(f,\hat{n})\), where \(A=\{+,\times \}\), then
with
If we work instead in the tensor spherical harmonic basis, with expansion coefficients \(a^P_{(lm)}(f)\), where \(P=\{G,C\}\), then
with
Note that in the polarization basis the response function \(R^A(f,\hat{n})\) is the detector response to a sinusoidal planewave with frequency f, coming from direction \(\hat{n}\), and having polarization \(A=+,\times \). Plots of \(R^A(f,\hat{n})\) for fixed frequency f are antenna beam patterns for gravitational waves with polarization A. A plot of
for fixed frequency f is the beam pattern for an unpolarized gravitational wave—i.e., a wave having statistically equivalent \(+\) and \(\times \) polarization components.
Since the previous subsection showed that the response of all beam detectors can be written rather simply in terms of the change in the lighttravel time of an electromagnetic wave propagating between two test masses, we now calculate \(\Delta T(t)\) in various scenarios and use the resulting expressions to readoff the response functions \(R^{ab}(f,\hat{n})\) for the different detectors. We also make plots of various antenna patterns.
Oneway tracking
Consider two test masses located at position vectors \(\vec {r}_1\) and \(\vec {r}_2=\vec {r}_1+L\hat{u}\), respectively, in the presence of a plane gravitational wave propagating in direction \(\hat{k} = \hat{n}\), as shown in Fig. 24. Then the change in the lighttravel time for a photon emitted at \(\vec {r}_1\) and received at \(\vec {r}_2\) at time t is given by Estabrook and Wahlquist (1975):
where the 0thorder expression for the photon trajectory can be used in \(h_{ab}\):
Since \(h_{ab}(t,{\vec {x}}) = h_{ab}(t+\hat{n}\cdot {\vec {x}}/c)\) for a plane wave, it is relatively easy to do the integral. The result is
where we factored out \(e^{i 2\pi f (t+\hat{n}\cdot \vec {r}_2/c)}\), corresponding to the time and location of the measurement, to get the last line. Note that the two terms in square brackets in (5.16) correspond to sampling the gravitationalwave phase at photon reception (location \(\vec {r}_2\) at time \(t_2\equiv t\)) and photon emission (location \(\vec {r}_1\) at time \(t_1\equiv tL/c\)), respectively. In the context of pulsar timing, these two terms are called the Earth term and pulsar term, respectively.
From Eq. (5.17), we can readoff the response function for a timing residual measurement, \(h_\mathrm{timing}(t)\equiv \Delta T(t)\). It is
where
is the timing transfer function for oneway photon propagation along \({\vec {u}}=L\hat{u}\). (Here \(\mathrm{sinc}\,x \equiv \sin x/x\)). If we choose \(\vec {r}_2\) to be the origin of coordinates, then \(\mathcal{T}_{{\vec {u}}}(f,\hat{n}\cdot \hat{u})\) contains all the frequencydependence of the timing response. For example, for normal incidence of the gravitational wave (\(\hat{n}\cdot \hat{u}=0\)), \(\mathcal{T}_{{\vec {u}}}(f,0) = (L/c)\,\mathrm{sinc}(\pi fL/c)\). Figure 25 is a plot of \(\mathcal{T}_{{\vec {u}}}(f,0)\) versus frequency on a logarithmic frequency scale.
If we choose instead to measure the fractional Doppler frequency shift of the incoming photons, then we need to differentiate the timing response with respect to t as indicated in (5.1). This simply pullsdown a factor of \(i2\pi f\) from the exponential in \(\Delta T(t)\), leading to
Thus, the frequencydependence of the Doppler frequency response is \(i2\pi f\) times the timing transfer function \(\mathcal{T}_{{\vec {u}}}(f,\hat{n}\cdot \hat{u})\). All of the above remarks are relevant for pulsar timing and oneway spacecraft Doppler tracking.
In Fig. 26 we plot the antenna beam pattern (5.13) for unpolarized gravitational waves for a oneway tracking Doppler frequency measurement (e.g., pulsar timing) with \(\hat{u}=\hat{z}\). For this calculation, we chose \(\vec {r}_2=0\) and ignored the exponential (i.e., ‘pulsar’) term in the timing transfer function, which yields
for the \(A=+,\times \) polarization modes. Setting \(\hat{u}=\hat{z}\) and taking the gravitational waves to propagate inward (toward the origin), we find
which is axially symmetric around \(\hat{u}\). The response is maximum when the photon and the gravitational wave both propagate in the same direction.
Figure 27 shows plots of the real parts of the individual polarization basis response functions (5.21), represented as color bar plots on a Mollweide projection of the sky. For this plot we chose the pulsar to be located in the direction \((\theta ,\phi ) = (50^\circ ,60^\circ )\). (The direction \(\hat{p}\) to the pulsar is given by \(\hat{p} = \hat{u}\)). The imaginary parts of both response functions are identically zero, so are not shown in the figure.
Making the same approximations as above, we can also calculate the corresponding Dopplerfrequency response functions for the gradient and curl tensor spherical harmonic components \(\{a^G_{lm}(f),a^C_{lm}(f)\}\) by performing the integration in (5.12). As shown in Gair et al. (2014), this leads to^{Footnote 11}
where \({}^{(2)}\!N_l\) is given by (2.8) and \(\hat{p} = \hat{u}\) is the direction on the sky to a pulsar. Note, somewhat surprisingly, that the curl response is identically zero. We will discuss the consequences of this result in more detail in Sect. 7.5.6, in the context of phasecoherent mapping of anisotropic gravitationalwave backgrounds.
Twoway tracking
To calculate \(\Delta T(t)\) for twoway spacecraft Doppler tracking, we need to generalize the calculation of the previous subsection to include a return trip of the photon from \(\vec {r}_2\) back to \(\vec {r}_1\). This can be done by simply summing the expressions for the oneway timing residuals:
where the subscripts on the \(\Delta T\)’s on the righthand side of the above equation indicate the direction of oneway photon propagation (e.g., 12 indicates photon propagation from test mass 1 to test mass 2), and the arguments of \(\Delta T_{12}\) and \(\Delta T_{21}\) indicate when the photon arrived at test mass 2 and test mass 1, respectively. Doing this calculation leads to the following expression for the timing residual:
which has three terms corresponding to the final reception of the photon at \(\vec {r}_1\) at time t, the reflection of the photon at \(\vec {r}_2\) at time \(tL/c\), and the emission of the photon at \(\vec {r}_1\) at time \(t2L/c\). The timing response function is given by
where
is the timing transfer function for twoway (or roundtrip) photon propagation along \({\vec {u}}\) and back. For normal incidence, the magnitude of the timing transfer function is given by \(\mathcal{T}_{{{\vec {u}}},\mathrm{rt}}(f,0)= (2L/c)\mathrm{sinc}(2\pi fL/c)\), which is identical to the expression for oneway tracking with L / c replaced by 2L / c. We also note that if we choose the origin of coordinates to be at \(\vec {r}_1\) (which we can always do for a single detector), and if the frequency f is such that \(fL/c\ll 1\), then the timing response simplifies to
We will use the terminology smallantenna limit (instead of longwavelength limit) for this type of limit, since it avoids an ambiguity that might arise if we want to compare three or more length scales. For example, if we have two detectors that are physically separated and the wavelength of a gravitational wave is large compared to the size of each detector but small compared to the separation of the detectors, we would be in the longwavelength limit with respect to detector size but in the shortwavelength limit with respect to detector separation. (This is actually the case for the current network of groundbased interferometers). The terminology smallantenna, largeseparation limit is more appropriate for this case.
Michelson interferometer
For an equalarm Michelson interferometer, the timing residual that we calculate is the difference in the roundtrip lighttravel times down and back each of the arms. (See Fig. 28). If we let \({\vec {u}}\) and \({\vec {v}}\) denote the vectors pointing from e.g., the beam splitter to the two end mirrors for LIGO, or from one spacecraft to the other two spacecraft for LISA, then^{Footnote 12}
where the last equality is valid for an equalarm interferometer. But we just calculated these singlearm roundtrip \(\Delta T\)’s in the previous section. Thus, the timing response of an equalarm Michelson is simply
where we have chosen the origin of coordinates to be at the vertex of the interferometer. The phase and strain responses of a Michelson are related to the timing response by constant multiplicative factors, cf. (5.3) and (5.4), so that
where \(\nu _0\) is the frequency of the laser. Note that in the smallantenna limit, which is valid for the LIGO detectors below a few kHz, the strain response is given by
Plots of the antenna patterns for the strain response to \(A=+,\times \) polarized gravitational waves are given in Fig. 29, for both the smallantenna limit (where we simply set \(f=0\)) and at the freespectral range of the interferometer, \(f = f_\mathrm{fsr}\equiv c/(2L)\). Similar plots of the antenna patterns for unpolarized gravitational waves are given in Fig. 30. In Fig. 31 we show colorbar plots of the antenna patterns for the strain response to unpolarized gravitational waves for the LIGO Hanford and Virgo interferometers (located in Hanford, WA and Cascina, Italy, respectively), again evaluated in the smallantenna limit.
We can also calculate the strain response of an interferometer to the gradient and curl tensor spherical harmonic components \(\{a^G_{(lm)}(f), a^C_{(lm)}(f)\}\) by performing the integration in (5.12). As shown in Appendix E of Gair et al. (2014), this leads to
for an interferometer in the smallantenna limit, where the vertex is at the origin of coordinates, and \(\hat{u}\), \(\hat{v}\) are unit vectors pointing in the direction of the interferometer arms. Similar to (5.23) for pulsar timing, the curl response is again identically zero. We will discuss the consequences of this result in more detail in Sect. 7.5.7, in the context of phasecoherent mapping of anisotropic gravitationalwave backgrounds.
Overlap functions
As mentioned in Sect. 4, a stochastic gravitationalwave background manifests itself as a nonvanishing correlation between the data taken by two or more detectors. This correlation differs, in general, from that due to instrumental noise, allowing us to distinguish between a stochastic gravitationalwave signal and other noise sources. In this section, we calculate the expected correlation due to a gravitationalwave background, allowing for nontrivial detector response functions and nontrivial detector geometry. Interested readers can find more details in Hellings and Downs (1983), Christensen (1990, 1992), Flanagan (1993), and Finn et al. (2009).
Definition
Let \(d_I\) and \(d_J\) denote the data taken by two detectors labeled by I and J. In the presence of a gravitational wave, these data will have the form
where \(h_{I,J}\) denote the response of detectors I, J to the gravitational wave, and \(n_{I,J}\) denote the contribution from instrumental noise. If the instrumental noise in the two detectors are uncorrelated with one another, it follows that the expected correlation of the data is just the expected correlation of the detector responses, \(\langle d_I d_J\rangle = \langle h_I h_J\rangle \). If we also assume that the gravitational wave is due to a stationary, Gaussian, isotropic, and unpolarized stochastic background, then
where \(S_h(f)\) is the onesided strain power spectral density of the gravitationalwave background, computed from the expectation values of the Fourier components of the metric perturbations (2.14), and
is the socalled overlap function for the two detectors I, J written in terms of the polarizationbasis response function \(R^A_{I,J}(f,\hat{n})\),^{Footnote 13} where \(A=\{+,\times \}\). In terms of the tensor spherical harmonicbasis response functions \(R^P_{I,J(lm)}(f)\), we would have
where \(P=\{G,C\}\) for the gradient and curl tensor spherical harmonic components.
Interpretation
The overlap function \(\Gamma _{IJ}(f)\) quantifies the reduction in sensitivity of the crosscorrelation to a stochastic gravitationalwave background due to the nontrivial response of the detectors and their separation and orientation relative to one another. This meaning of the overlap function is most easily seen in the frequency domain, where (5.35) becomes
This implies
where \(\tilde{C}_{h_I h_J}(f)\) is the (onesided) crossspectrum of the response in the two detectors. Thus, \(\Gamma _{IJ}(f)\) can be interpreted as the transfer function between gravitationalwave strain power \(S_h(f)\) and detector response crosspower \(\tilde{C}_{h_I h_J}(f)\).
Expression (5.36) for the overlap function involves four length scales: the lengths of the two detectors, \(L_I\) and \(L_J\), which appear in the response functions \(R^A_{I,J}(f,\hat{n})\); the separation of the detectors, \(s\equiv {\vec {x}}_I{\vec {x}}_J\), which appears in the exponential factor; and the wavelength of the gravitational waves, \(\lambda = c/f\). In general, one has to evaluate the integral in (5.36) numerically, due to the nontrivial frequency dependence of the response functions. However, as we shall see in Sect. 5.4, in certain limiting cases of the ratio of these length scales, we can do the integral analytically and obtain relatively simple expressions for the overlap function in terms of spherical Bessel or trigonometric functions. This is the case for groundbased interferometers, which operate in the smallantenna limit—i.e., \(fL/c\ll 1\) for both detectors, even though the separation can be large compared to the wavelength, \(fs/c \gtrsim 1\). It is also the case for pulsar timing arrays, which operate in the largeantenna, smallseparation limit, since \(fL/c\gg 1\) for each pulsar and \(fs/c\ll 1\) for different radio receivers on Earth. (The Earth effectively resides at the solar system barycenter relative to the wavelength of the gravitational waves relevant for pulsar timing).
Normalization
It is often convenient to define a normalized overlap function \(\gamma _{IJ}(f)\propto \Gamma _{IJ}(f)\) by requiring that \(\gamma _{IJ}(0)=1\) for two detectors that are colocated and coaligned. For the strain response of two identical equalarm Michelson interferometers, this leads to the relation
where \(\beta \) is the opening angle between the two arms (\(\pi /2\) for LIGO and \(\pi /3\) for LISA).
Autocorrelated response
To obtain the autocorrelated response of a single detector, we can simply set \(I=J\) in the previous expressions. This means that the gravitationalwave strain power \(S_h(f)\) and the detector response power \(P_{h_I}(f)\) in detector I are related by
where
Note that \(\Gamma _{II}(f)\) is just the square of the antenna pattern for the response to unpolarized gravitational waves integrated over the whole sky. A plot of the normalized transfer function \(\gamma _{II}(f)\) for the strain response of an equalarm Michelson interferometer is shown in Fig. 32. Compared to Fig. 25 for the timing transfer function \(\mathcal{T}_{{\vec {u}}}(f,0)\) for oneway photon propagation evaluated at normal incidence of the gravitational wave, we see that the relevant frequency scale for an equalarm Michelson is c / (2L) (as opposed to c / L) due to the roundtrip motion of the photons. Also, the hard nulls in Fig. 25 have been softened into dips due to averaging of the waves over the whole sky. The highfrequency ‘bumps’ for \(\gamma _{II}(f)\) are lower than those for \(\mathcal{T}_{{\vec {u}}}(f,0)\) due to the squaring of \(R^A_I(f,\hat{n})\) which enters into the definition of \(\Gamma _{II}(f)\) (and \(\gamma _{II}(f)\)). Figure 33 is an extended version of Fig. 32, with the appropriate frequency ranges for groundbased interferometers (like LIGO), spacebased interferometers (like LISA), spacecraft Doppler tracking, and pulsar timing searches indicated on the plot. See also Table 5 for more details.
Examples of overlap functions
LHOLLO overlap function
As mentioned above, Earthbased interferometers like LIGO operate in the smallantenna limit where \(fL/c\ll 1\). This implies that the associated response functions are wellapproximated by the expression in (5.32). If we denote the unit vectors along the two arms of one Earthbased interferometer by \(\hat{u}_1\) and \(\hat{v}_1\), and the corresponding unit vectors of a second Earthbased interferometer by \(\hat{u}_2\) and \(\hat{v}_2\), then the strain responses in the two interferometers are simply
where
and \({\vec {x}}_1\) and \({\vec {x}}_2\) denote the vertices of the two interferometers. The tensors \(D_1^{ab}\), \(D_2^{ab}\) defined above are called detector tensors; they are symmetric and tracefree with respect to their ab indices. In terms of the detector tensors, the overlap function becomes
where
and \(\Delta {\vec {x}}\equiv {\vec {x}}_2{\vec {x}}_1\) is the separation vector connecting the two vertices. We will also define:
Thus, in the smallantenna limit, the orientationdependence of the overlap function \(\Gamma _{12}(f)\) is encoded in the detector tensors \(D_1^{ab}\), \(D_2^{ab}\), while the separationdependence is encoded in \(\Gamma _{abcd}(\Delta {\vec {x}})\).
Note that \(\Gamma _{abcd}\) is a tensor which is symmetric under the interchanges \(a\leftrightarrow b\), \(c\leftrightarrow d\), and \(ab\leftrightarrow cd\); it is also tracefree with respect to the ab and cd index pairs. The most general expression that we construct for \(\Gamma _{abcd}(\Delta {\vec {x}})\) given \(\delta _{ab}\), \(s_a\), and its symmetry properties is:
By contracting the above expression with tensors of the form \(\delta ^{ab}\delta ^{cd}\), \((\delta ^{ac}\delta ^{bd} + \delta ^{bc}\delta ^{ad})\), \(\ldots \), \(s^a s^b s^c s^d\), we obtain a linear system of equations for \(A, B,\ldots , E\), which we can solve in terms of scalar integrals involving contractions of the products of the polarization tensors, \(e^A_{ab}(\hat{n})e^A_{cd}(\hat{n})\), with various combinations of \(\delta ^{ab}\) and \(s^a\). As shown in Flanagan (1993) and Allen and Romano (1999), these integrals can be done analytically, leading to
where \(j_0(\alpha )\), \(j_1(\alpha )\), and \(j_2(\alpha )\) are the standard spherical Bessel functions (Abramowitz and Stegun 1972). With these explicit expressions for \(A, B,\ldots , E\) in hand, all that is left to do is to contract the righthand side of (5.48) with \(D_1^{ab} D_2^{cd}\) to obtain \(\Gamma _{12}(f)\). If we only assume that the detector tensors are symmetric,^{Footnote 14} then all terms contribute (Coughlin and Harms 2014):
For symmetric, tracefree detector tensors, as is the case for groundbased interferometers, there is no contribution from the A and C terms. Thus, in the smallantenna limit, the overlap function for the strain response of two equalarm Michelson interferometers can be written as a sum of the first three spherical Bessel functions with coefficients that depend on the product of the frequency and separation of the two detectors. (The analytic expression for the overlap function can also be derived using (5.37), which involves the tensor spherical harmonic response functions. A detailed derivation using these response functions is given in Romano et al., 2015).
Figure 34 is a plot of the normalized overlap function for the strain response of the 4km LIGO interferometers in Hanford, WA and Livingston, LA. There are several things to note about the plot: (i) The overlap function is negative as \(f\rightarrow 0\). This is because the arms of the Hanford and Livingston interferometers are rotated by \(90^\circ \) with respect to one another. (ii) The magnitude of the overlap function at \(f=0\) is less than unity—i.e., \(\gamma _{HL}(0)=0.89\), even though the overlap function was normalized. This is because the planes of the Hanford and Livingston interferometers are not identical; these two detectors are separated by \(27.2^\circ \) as seen from the center of the Earth. (iii) The first zero of the overlap function occurs just above 60 Hz. This is roughly equal to \(c/(2s)=50~\mathrm{Hz}\), where \(s= 3000~\mathrm{km}\) is the separation between the two interferometers. Note that \(f=c/(2s)\) is the frequency of a gravitational wave that has a wavelength equal to twice the separation of the two sites. For lower frequencies, the two interferometers will be driven (on average) by the same positive (or negative) part of the incident gravitational wave. For slightly higher frequencies, one interferometer will be driven by the positive (or negative) part of the incident wave, while the other interferometer will be driven by the negative (or positive) part. The zeros of the overlap function correspond to the transitions between the inphase and outofphase excitations of the two interferometers.
Bigbang observer overlap function
As a second example, we consider the overlap function between two LISAlike constellations oriented in a hexagram (i.e., ‘sixpointed star’) configuration as shown in Fig. 35. This is one of the configurations being considered for the BigBang Observer (BBO), which is a proposed space mission designed to detect or put stringent limits on a cosmologicallygenerated gravitationalwave background (Phinney et al. 2004). The arm lengths of the two interferometers, with vertices \({\vec {x}}_1\) and \({\vec {x}}_2\), are taken to be \(L=5\times 10^6~\mathrm{km}\). The opening angle for the two interferometers is \(\beta = 60^\circ \). For this example, we calculate the normalized overlap function for strain response numerically, since the smallantenna limit is not valid for the highfrequency end of the sensitivity band. A plot of the normalized overlap function is given in Fig. 36.
Pulsar timing overlap function (Hellings and Downs curve)
As our final example, we consider the overlap function for timing residual measurements from an array of N pulsars, labeled by index \(I=1,2,\ldots , N\). Each pulsar defines a oneway tracking beam detector with the position of pulsar I at \(\vec {p}_{I}\) and the position of detector I (i.e., a radio receiver on Earth) by \({\vec {x}}_{I}\). For convenience, we will take the origin of coordinates to lie at the solar system barycenter. Since the diameter of the Earth (\({\sim } 10^4~\mathrm{km}\)) and its distance from the Sun (\({\sim } 10^8~\mathrm{km}\)) are both small compared to the wavelength of gravitational waves relevant for pulsar timing (\(\lambda = c/f\sim 10^{13}~\mathrm{km}\)), we can effectively set \({\vec {x}}_{I}\approx {\vec {x}}_J\approx {\vec {0}}\) in the argument of the exponential term that enters expression (5.36) for the overlap function. Thus,
where the unit vectors \(\hat{u}_I\), \(\hat{u}_J\) are defined by \({\vec {x}}_I= \vec {p}_I + L_I\hat{u}_I\), where \(L_I\) is the distance to pulsar I. But since \({\vec {x}}_I\approx {\vec {0}}\), it follows that \(\hat{u}_I\) and \(\hat{u}_J\) are just unit vectors pointing from the location of pulsars I and J toward the solar system barycenter. For distinct pulsars (\(I\ne J\)), we can ignore the exponential terms in the square brackets, since \(fL/c\gg 1\) for \(L\sim 1~\mathrm{kpc}\ ({=}3\times 10^{16}~\mathrm{km})\) implies that \(e^{i 2\pi fL_I(1+\hat{n}\cdot \hat{u}_I)/c}\) and its product with the corresponding term for pulsar J are rapidly varying functions of \(\hat{n}\) and do not contribute significantly when integrated over the whole sky (Hellings and Downs 1983; Anholm et al. 2009). (For a single pulsar (\(I=J\)), the product of the two exponential terms equals 1 and hence cannot be ignored). With these simplifications, the integral can be done analytically (Hellings and Downs 1983; Anholm et al. 2009; Jenet and Romano 2015). The result is
where
and \(\zeta _{IJ}\) is the angle between the two pulsars I and J relative to the solar system barycenter. (For Doppler frequency measurements, the overlap function is independent of frequency, \(\Gamma _{IJ}=\chi (\zeta _{IJ})/3\)). \(\chi (\zeta )\) is the Hellings and Downs function (Hellings and Downs 1983); it depends only on the angular separation of a pair of pulsars. The normalization was chosen so that for a single pulsar, \(\chi (0)=1\) (for two distinct pulsars occupying the same angular position on the sky, \(\chi (0)=0.5\)). A plot of the Hellings and Downs curve is given in Fig. 37.
A couple of remarks are in order: (i) The Hellings and Downs curve is independent of frequency; it is a function of the angle \(\zeta \) between different pulsar pairs. This contrasts with the overlap functions for the two LIGO interferometers and for BBO given in Figs. 34 and 36. These overlap functions were calculated for a fixed pair of detectors; they are functions instead of the frequency of the gravitational wave. (ii) The value of the Hellings and Downs function \(\chi (\zeta _{IJ})\) for a pair of pulsars I, J can be written as a Legendre series in the cosine of the angle between the two pulsars. This follows immediately if one uses (5.37) for the overlap function and (5.23) for the pulsar timing response functions in the tensor spherical harmonic basis. As shown in Gair et al. (2014):
where \(\hat{p}_I\) and \(\hat{p}_J\) are unit vectors that point in the directions to the two pulsars. A Legendre series expansion out to \(l_\mathrm{max}=4\) (i.e., only three terms) gives very good agreement with the exact expression for the Hellings and Downs function, except for very small angular separations. This is illustrated in Fig. 38.
Moving detectors
So far, we have ignored any timedependence in the detector response introduced by the motion of the detectors relative to the gravitationalwave source. In general, this relative motion produces a modulation in both the amplitude and the phase of the response of a detector to a monochromatic, planefronted gravitational wave (Cutler 1998). For Earthbased interferometers like LIGO, the modulation is due to both the Earth’s daily rotation and yearly orbital motion around the Sun. For spacebased interferometers like LISA, the modulation is due to the motion of the individual spacecraft as they orbit the Sun with a period of 1 year. For example, for the original LISA design, three spacecraft fly in an equilateraltriangle configuration around the Sun. The centerofmass (or guiding center) of the configuration moves in a circular orbit of radius 1 AU, at an angle of \(20^\circ \) behind Earth, while the configuration ‘cartwheels’ in retrograde motion about the guiding center, also with a period of 1 year (see Fig. 39).
Monochromatic plane waves
The phase modulation of a monochromatic plane wave will have contributions from both the timevarying orientation of the detector as well as the detector’s translational motion relative the source. The timevarying orientation leads to changes in the response of the detector to the \(+\) and \(\times \) polarization components of the wave, \(R^+ h_+\) and \(R^\times h_\times \). The translational motion leads to a Doppler shift in the observed frequency of the wave, which is proportional to v / c times the nominal frequency, where v is velocity of the detector relative to the source:
For example, for a monochromatic source with \(f=100~\mathrm{Hz}\) observed by groundbased detectors like LIGO, the Earth’s daily rotational motion (\(v\approx 500~\mathrm{m/s}\)) produces a Doppler shift of order \({\sim } 10^{4}~\mathrm{Hz}\), while the Earth’s yearly orbital motion (\(v\approx 3\times 10^4~\mathrm{m/s}\)), produces a shift of order \({\sim } 10^{2}~\mathrm{Hz}\). A matchedfilter search for a sinusoid must take this latter modulation into account, as the frequency shift is larger than the width of a frequency bin for a typical search for such a signal.
Stochastic backgrounds
For stochastic gravitationalwave backgrounds, things are slightly more complicated as the signal is an incoherent sum of sinusoidal plane waves having different amplitudes, frequencies, and phases, and coming from different directions on the sky (2.1). But since the signal is broadband, the Doppler shift associated with the phase modulation of the individual component plane waves is not important, as the gravitationalwave signal power is (at worst) shuffled into nearby bins.^{Footnote 15} On the other hand, the amplitude modulation of the signal, due to the timevarying orientation of a detector, can be significant if the background is anisotropic—i.e., stronger coming from certain directions on the sky than from others. (We will discuss searches for anisotropic backgrounds in detail in Sect. 7). As the lobes of the antenna pattern sweep through the “hot” and “cold” spots of the anisotropic background, the amplitude of the signal is modulated in time.
Figure 40 shows the expected timedomain output of a particular Michelson combination, X(t), of the LISA data over a 5year period. The combined signal (red) consists of both detector noise (black) and the confusionlimited gravitationalwave signal from the galactic population of compact whitedwarf binaries. At frequencies \({\sim } 10^{4}  10^{3}~\mathrm{Hz}\), which corresponds to the lower end of LISA’s sensitivity band, the contribution from these binaries dominates the detector noise. The modulation of the detector output is clearly visible in the figure. The peaks in amplitude are more than 50% larger than the minimima; they repeat on a 6 month time scale, as expected from LISA’s yearly orbital motion around the Sun (Fig. 39).
Figure 41 is a single frame of an animation showing the time evolution of the LISA antenna pattern, represented as a colorbar plot on a Mollweide projection of the sky in ecliptic coordinates. The peaks in the detector output that we saw earlier in Fig. 40 correspond to those times when the maxima of the antenna pattern point in the general direction of the galactic center, \((\mathrm{lon},\mathrm{lat})=(93.3^\circ , 5.6^\circ )\) in ecliptic coordinates.^{Footnote 16} The motion of the LISA constellation was taken from Cutler (1998), and the antenna pattern was calculated for the XMichelson combination of the LISA data, assuming the smallantenna approximation for the interferometer response functions. The full animation corresponds to LISA’s orbital period of 1 year. Go to http://dx.doi.org/10.1007/s4111401700041 to view the animation.
Rotational and orbital motion of Earthbased detectors
As mentioned above, given the broadband nature of a stochastic signal, the Doppler shift associated with the motion of a detector does not play an important role for stochastic background searches. This means that we can effectively ignore the velocity of a detector, and treat its motion as quasistatic. So, for example, the motion of a single Earthbased detector like LIGO can be thought of as synthesizing a set of static virtual detectors located along an approximately circular ring 1 AU from the solar system barycenter (Romano et al. 2015). Each virtual detector in this set observes the gravitationalwave background from a different spatial location and with a different orientation.
As described in Romano et al. (2015), the relevant timescale for a set of virtual detectors is the time over which measurements made by the different virtual detectors are correlated with one another. Basically, we want two neighboring virtual detectors to be spaced far enough apart that they provide independent information about the background. For a gravitational wave of frequency f, the minimal separation corresponds to \(\Delta {\vec {x}}\approx \lambda /2\), where \(\lambda = c/f\) is the wavelength of the gravitational wave. For smaller separations, the two detectors will be driven in coincidence (on average), as discussed in item (iii) at the very end of Sect. 5.4.1. Writing \(\Delta {\vec {x}}= v\Delta t\) and solving for \(\Delta t\) yields
where \(t_\mathrm{corr}\) is the correlation timescale. For \(\Delta t \lesssim t_\mathrm{corr}\), the measurements taken by the two virtual detectors will be correlated with one another; for \(\Delta t\gtrsim t_\mathrm{corr}\) the measurements will be uncorrelated with one another.
As a concrete example, let us consider a gravitational wave having frequency \(f=100~\mathrm{Hz}\), and calculate the correlation time scale for the Earth’s rotational and orbital motion, treated independently. Since \(v\approx 500~\mathrm{m/s}\) for daily rotation and \(v\approx 3\times 10^4~\mathrm{m/s}\) for orbital motion, we get
Thus, the orbital motion of the Earth around the Sun will more rapidly synthesize a large network of independent detectors from the motion of a single detector, compared to just rotational motion.
We can confirm these approximate results by plotting the overlap function at \(f=100~\mathrm{Hz}\) for two virtual interferometers synthesized by the Earth’s rotational and orbital motion as function of time. This is done in Fig. 42, assuming an isotropic and unpolarized stochastic background, and using the smallantenna approximation to calculate the detector response functions. The lefthand plot is for a set of virtual interferometers synthesized by the daily rotation of a detector located on the Earth’s equator, with no orbital motion. The center of the Earth is fixed at the solar system barycenter, and the virtual interferometers have one arm pointing North and the other pointing East. One sees from the plot that the virtual interferometers decorrelate on a timescale of roughly an hour, consistent with (5.57), and recorrelate after 24 h when the original detector returns to its starting position. The righthand plot is for a set of virtual interferometers at \(1~\mathrm{AU}\) from the solar system barycenter, associated with Earth’s yearly orbital motion. There is no rotational motion for this case, as the interferometers are located at the center of the Earth in its orbit around the Sun, with the orientation of the interferometer arms unchanged by the orbital motion. Here we see that the virtual interferometers decorrelate on a timescale of roughly 1 min, again consistent with (5.57). They will recorrelate only after 1 year (not shown on the plot). Since the orbital velocity of the Earth is much larger than the velocity of a detector on the surface of the Earth due to the Earth’s daily rotational motion, the virtual interferometers associated with orbital motion build up a larger separation and decorrelate on a much shorter time scale.
We will return to this idea of using the motion of a detector to synthesize a set of static virtual detectors when we discuss a phasecoherent approach for mapping anisotropic gravitationalwave backgrounds in Sect. 7.5.
Optimal filtering
Filters are for cigarettes and coffee. Cassandra Clare
Optimal filtering, in its most simple form, is a method of combining data so as to extremize some quantity of interest. The optimality criterion depends on the particular application, but for signal processing, one typically wants to: (i) maximize the detection probability for a fixed rate of false alarms, (ii) maximize the signaltonoise ratio of some test statistic, or (iii) find the minimal variance, unbiased estimator of some quantity. Finding such optimal combinations plays a key role in both Bayesian and frequentist approaches to statistical inference (Sect. 3), and it is an important tool for every data analyst. For a Bayesian, the optimal combinations are often implicitly contained in the likelihood function, while for a frequentist, optimal filtering is usually more explicit, as there is much more freedom in the construction of a statistic.
In this section, we give several simple examples of optimal (or matched) filtering for deterministic signals, and we then show how the standard optimallyfiltered crosscorrelation statistic (Allen 1997; Allen and Romano 1999) for an Gaussianstationary, unpolarized, isotropic gravitationalwave background can be derived as a matchedfilter statistic for the expected crosscorrelation. This derivation of the optimallyfiltered crosscorrelation statistic differs from the standard derivation given, e.g., in Allen (1997), but it illustrates a connection between searches for deterministic and stochastic signals, which is one of the goals of this review article.
Optimal combination of independent measurements
As a simple explicit example, suppose we have N independent measurements
where a is some astrophysical parameter that we want to estimate and \(n_i\) are (independent) noise terms. Assuming the noise has zero mean and known variance \(\sigma _i^2\) (which can be different from measurement to measurement), it follows that
The goal is to find a linear combination of the data
that is optimal in the sense of being an unbiased, minimal variance estimator of a. Unbiased (i.e., \(\langle \hat{a}\rangle =a\)) implies
while minimum variance implies
Since (6.4) is a constraint that must hold when we minimize the variance, we can use Lagrange’s method of undetermined multipliers (Boas 2006) and minimize instead
with respect to both \(\lambda _i\) and \(\Lambda \). The final result is:
so that
Thus, the linear combination is a weighted average that gives less weight to the noiser measurements (i.e., those with large variance \(\sigma _i^2\)). The variance of the optimal combination is
If the individual variances happen to be equal (i.e., \(\sigma _i^2\equiv \sigma ^2\)), then the above expressions reduce to \(\hat{a} = N^{1}\sum _i d_i\) and \(\sigma _{\hat{a}}^2 = \sigma ^2/N\), which are the standard formulas for the sample mean and the reduction in the variance for N independent and identicallydistributed measurements as we saw in Sect. 3.5.
The above results can also be derived by maximizing the likelihood function
with respect to the signal parameter a, assuming that the noise terms \(n_i\) are Gaussiandistributed and independent of one another. In fact, similar to what we showed in Sect. 3.5, one can rewrite the argument of the exponential so that
where \(\hat{a}\) and \(\sigma _{\hat{a}}^2\) are given by (6.8) and (6.9), respectively. From this expression, it immediately follows that \(\hat{a}\) maximizes the likelihood, and also the posterior distribution of a, if the prior for a is flat.
Correlated measurements
Suppose the N measurements \(d_i\) are correlated, so that the covariance matrix C has nonzero elements
when \(i\ne j\). Again, we want to find a linear combination (6.3) that is unbiased and has minimum variance
By following the same Lagrange multiplier procedure described in the previous subsection, one can show that the optimal estimator is
Thus, the weighting factors \(1/\sigma _i^2\) of the previous subsection are replaced by \(\sum _j (C^{1})_{ij}\). Note that for uncorrelated measurements, \(C_{ij}=\delta _{ij}\sigma _i^2\), so the above expression for \(\hat{a}\) reduces to that found previously in (6.8).
Note that although (6.14) shows how to optimally combine data that are correlated with one another, it turns out that for most practical purposes one can get by using expressions like (6.8) and (6.18) below, which are valid for uncorrelated data. This is because the values of the Fourier transform of a stationary random process are uncorrelated for different frequency bins. Basically, the Fourier transform is a rotation in data space to a basis in which the covariance matrix is diagonal; this is called a Karhunen–Loeve transformation. (See also Appendix D.6). This is one of the reasons why much of signal processing is done in the frequency domain.
Matched filter
Suppose that the astrophysical signal is not constant but also has a ‘shape’ \(h_i\) so that
We will assume that the \(h_i\) are known, so that the only unknown signal parameter is a. We will also assume that the different measurements are independent, as will be the case for a stationary random process in the frequency domain. Since \(\langle d_i\rangle = a h_i\) is not a constant, the analysis of the previous subsection does not immediately apply. However, if we simply rescale \(d_i\) by \(h_i\), we obtain a new set of measurements
for which
so that the previous analysis is now valid. Thus,
is the optimal estimator of a.
The above expression for \(\hat{a}\) is often called a matched filter (Wainstein and Zubakov 1971) since the data \(d_i\) are projected onto the expected signal shape \(h_i\) (as well as weighted by the inverse of the noise variance \(\sigma _i^2\)). The particular combination
multiplying \(d_i\) is the optimal filter for this analysis.^{Footnote 17} When there are many possible candidate signal shapes, one constructs a template bank—i.e., a collection of possible shapes against which the data compared. By normalizing each of the templates so that \(\sum _i (h_i^2/\sigma _i^2)=1\), the signaltonoise ratio of the matched filter
or its square, can be used as a frequentist detection statistic. That is, the maximum value of \(\hat{\rho }(h)\) over the space of templates \(\{h_i\}\) is compared against some threshold \(\rho _*\) (chosen so that the false alarm probability is below some acceptable value). If the maximum signaltonoise ratio exceeds the threshold, then one claims detection of the signal with a certain level of confidence. The shape of the detected signal is that which corresponds to the maximum matchedfilter signaltonoise ratio.
Optimal filtering for a stochastic background
As noted by Fricke (2006), the above results can be used to derive the optimal crosscorrelation statistic for the stochastic background search. (A more standard derivation can be found, e.g., in Allen 1997). To see this, consider a crosscorrelation search for a Gaussianstationary, unpolarized, isotropic gravitationalwave background using two detectors having uncorrelated noise. Let T be the total observation time of the measurement. In the frequency domain, the measurements are given by the values of the complexvalued crosscorrelation
where \(\tilde{d}_I(f)\), \(I=1,2\) are the Fourier transforms of the timeseries output of the two detectors:
The x(f) for different frequencies correspond to the measurements \(d_i\) of the previous subsections. Since we are assuming uncorrelated detector noise,
where \(S_h(f)\) is the power spectral density of the stochastic background signal, and \(\Gamma _{12}(f)\) is the overlap function for the two detectors.^{Footnote 18} In the weaksignal limit, the covariance matrix is dominated by the diagonal terms:
where \(P_{n_I}(f)\) are the 1sided power spectral densities of the noise in the two detectors:
Thus, in this approximation
Now, suppose we are searching for a stochastic background with a powerlaw spectrum
whose amplitude \(\Omega _\beta \) we would like to estimate. Then, according to (2.18),
where
Using the above form of \(S_h(f)\) and (6.23), we see that
is the expected signal ‘shape’ \(h_i\) in the notation of the previous subsection. Given (6.26) and (6.30), it is now a simple matter to show that
where
The variance and expected signaltonoise ratio of the estimator \(\hat{\Omega }_\beta \) are:
and
The combination
multiplying \(\tilde{d}_1(f)\tilde{d}_2^*(f)\) in (6.31) is the standard optimal filter (see, e.g., Allen 1997; Allen and Romano 1999), which was derived in those references for a flat spectrum, \(\beta =0\). The optimallyfiltered crosscorrelation statistic, denoted S in Allen (1997) and Allen and Romano (1999), is given by \(S=\hat{\Omega }_0 T\).
Optimal estimators for individual frequency bins
As shown in Aasi et al. (2015), we can also construct estimators of the amplitude \(\Omega _\beta \) of a powerlaw spectrum using crosscorrelation data for individual frequency bins, of width \(\Delta f\), centered at each (positive) frequency f:
Note that these estimators are just the measured values of the crossspectrum divided by the expected spectral shape of the crosscorrelation due to a gravitationalwave background with spectral index \(\beta \). In the above expression, T is the duration of the data segments used in calculating the Fourier transforms \(\tilde{d}_1(f)\), \(\tilde{d}_2(f)\); and \(\Gamma _{12}(f)\) is the overlap function for the two detectors.
In the absence of correlated noise, the above estimators are optimal in the sense that they are unbiased estimators of \(\Omega _\beta \) and have minimal variance for a single bin:
where we assumed the weaksignal limit to obtain the approximate equality for the variance. For a frequency band consisting of many bins of width \(\Delta f\), we can optimally combine the individual estimators \(\hat{\Omega }_\beta (f)\) using the standard \(1/\sigma ^2\)weighting discussed earlier:
The expressions for \(\hat{\Omega }_\beta \) and \(\sigma ^2_{\hat{\Omega }_\beta }\) obtained in this way reproduce the standard optimal filter expressions (6.31) and (6.33) in the limit where \(\Delta f\rightarrow df\) and the sums are replaced by integrals.
More general parameter estimation
The analyses in the previous two subsections take as given the spectral shape of an isotropic stochastic background, and then construct estimators of its overall amplitude. But it is also possible to construct estimators of both the amplitude and spectral index of the background. One simply treats these as free parameters in the signal model e.g., when constructing the likelihood function. Interested readers should see Mandic et al. (2012) for details.
Anisotropic backgrounds
Sameness is the mother of disgust, variety the cure. Francesco Petrarch
An anisotropic background of gravitational radiation has preferred directions on the sky—the associated signal is stronger coming from certain directions (“hot” spots) than from others (“cold” spots). The anisotropy is produced primarily by sources that follow the local distribution of matter in the universe (e.g., compact whitedwarf binaries in our galaxy), as opposed to sources at cosmological distances (e.g., cosmic strings or quantum fluctuations in the gravitational field amplified by inflation Allen, 1997; Maggiore, 2000), which would produce an isotropic background. This means that the measured distribution of gravitationalwave power on the sky can be used to discriminate between cosmologicallygenerated backgrounds, produced in the very early Universe, and astrophysicallygenerated backgrounds, produced by more recent populations of astrophysical sources. In addition, an anisotropic distribution of power may allow us to detect the gravitationalwave signal in the first place; as the lobes of the antenna pattern of a detector sweep across the “hot” and “cold” spots of the anisotropic distribution, the amplitude of the signal is modulated in time, while the detector noise remains unaffected (Adams and Cornish 2010).
In this section, we describe several different approaches for searching for anisotropic backgrounds of gravitational waves: The first approach (described in Sect. 7.2) looks for modulations in the correlated output of a pair of detectors, at harmonics of the rotational or orbital frequency of the detectors (e.g., daily rotational motion for groundbased detectors like LIGO, Virgo, etc., or yearly orbital motion for spacebased detectors like LISA). This approach assumes a known distribution of gravitationalwave power \(\mathcal{P}(\hat{n})\), and filters the data so as to maximize the signaltonoise ratio of the harmonics of the correlated signal. The second approach (Sect. 7.3) constructs maximumlikelihood estimates of the gravitationalwave power on the sky based on crosscorrelated data from a network of detectors. This approach produces sky maps of \(\mathcal{P}(\hat{n})\), analogous to sky maps of temperature anisotropy in the cosmic microwave background radiation. The third approach (Sect. 7.4) constructs frequentist detection statistics for either an unknown or an assumed distribution of gravitationalwave power on the sky. The fourth and final approach we describe (Sect. 7.5) attempts to measure both the amplitude and phase of the gravitationalwave background at each point on the sky, making minimal assumptions about the statistical properties of the signal. This latter approach produces sky maps of the real and imaginary parts of the random fields \(h_+(f,\hat{n})\) and \(h_\times (f,\hat{n})\), from which the power in the background \(\mathcal{P}(\hat{n}) = h_+^2 + h_\times ^2\) is just one of many quantities that can be estimated from the measured data.
Numerous papers have been written over the last \({\approx }20\) years on the problem of detecting anisotropic stochastic backgrounds, starting with the seminal paper by Allen and Ottewill (1997), which laid the foundation for much of the work that followed. Readers interested in more details should see Allen and Ottewill (1997) regarding modulations of the crosscorrelation statistic at harmonics of the Earth’s rotational frequency; Ballmer (2006a, b), Mitra et al. (2008), Thrane et al. (2009), Mingarelli et al. (2013) and Taylor and Gair (2013) for maximumlikelihood estimates of gravitationalwave power; Thrane et al. (2009) and Talukder et al. (2011) for maximumlikelihood ratio detection statistics; and Gair et al. (2014), Cornish and van Haasteren (2014) and Romano et al. (2015) regarding phasecoherent mapping. For results of actual analyses of initial LIGO data and pulsar timing data for anisotropic backgrounds, see Abadie et al. (2011) and Taylor et al. (2015) and Sect. 10.2.5.
Note that we will not discuss in any detail methods to detect anisotropic backgrounds using spacebased interferometers like LISA or the BigBang Observer (BBO). As mentioned in Sect. 5.5.2, the confusion noise from the galactic population of compact white dwarf binaries is a guaranteed source of anisotropy for such detectors. At low frequencies, measurements made using a single LISA will be sensitive to only the \(l=0,2,4\) components of the background, while crosscorrelating data from two independent LISAtype detectors (as in BBO) will allow for extraction of the full range of multipole moments. The proposed data analysis methods are similar to those that we will discuss in Sects. 7.2 and 7.3, but using the synthesized A, E, and T data channels for a single LISA (see Sect. 9.7). Readers should see Giampieri and Polnarev (1997), Cornish (2001), Ungarelli and Vecchio (2001), Seto (2004), Seto and Cooray (2004), Kudoh and Taruya (2005), Edlund et al. (2005) and Taruya and Kudoh (2005) for details.
Preliminaries
Quadratic expectation values
For simplicity, we will restrict our attention to Gaussianstationary, unpolarized, anisotropic backgrounds with quadratic expectation values given by (2.16):
where
We will also assume that \(\mathcal{P}(f,\hat{n})\) factorizes
so that the angular distribution of power on the sky is independent of frequency. We will chose our normalization so that \(\bar{H}(f_\mathrm{ref})=1\), where \(f_\mathrm{ref}\) is a reference frequency, typically taken to equal 100 Hz for groundbased detectors. We will also assume that the spectral shape \(\bar{H}(f)\) is known, so that we only need to recover \(\mathcal{P}(\hat{n})\). If we expand the power \(\mathcal{P}(\hat{n})\) in terms of spherical harmonics,
then this normalization choice is equivalent to \(\mathcal{P}_{00} = S_h(f_\mathrm{ref})/\sqrt{4\pi }\), and has units of \((\mathrm{strain})^2 \, \mathrm{Hz}^{1}\, \mathrm{sr}^{1}\), where \(\mathrm{sr} \equiv \mathrm{rad}^2\) is one steradian. Thus, \(\mathcal{P}_{00}\) is a measure of the isotropic component of the background, and sets the overall normalization of the strain power spectral density \(S_h(f)\).
Shortterm Fourier transforms
Since the response of a detector changes as its antenna pattern sweeps across the “hot” and “cold” spots of an anisotropic distribution, we will need to split the data taken by the detectors into chunks of duration \(\tau \), where \(\tau \) is much greater than the lighttravel time between any pair of detectors, but small enough that the detector response functions do not change appreciably over that interval. (For Earthbased interferometers like LIGO, \(\tau \sim 100~\mathrm{s}\) to 1000 s is appropriate). Each chunk of data \([t\tau /2,t+\tau /2]\) will then be Fourier transformed over the duration \(\tau \), yielding
This operation is often called a shortterm Fourier transform. Note that, in this notation, t labels a particular time chunk, and is not a variable that is subsequently Fourier transformed.
Crosscorrelations
For many of the approaches that map the distribution of gravitationalwave power, it is convenient to work with crosscorrelated data from two detectors, evaluated at the same time chunk t and frequency f:
The factor of 2 is a convention consistent with the choice of onesided power spectra. Assuming uncorrelated detector noise and using expectation values given in (7.1), we find
where
Note that up to a factor of \(1/(4\pi )\), the function \(\gamma _{IJ}(t;, f,\hat{n})\) is just the integrand of the isotropic overlap function \(\Gamma _{IJ}(f)\) given by (5.36). In what follows, we will drop the detector labels IJ from both \(\hat{C}_{IJ}(t;f)\) and \(\gamma _{IJ}(t;f,\hat{n})\) when there is no chance for confusion.
Figure 43 shows maps of the real and imaginary parts of \(\gamma (t; f, \hat{n})\) (appropriately normalized) for the strain response of the 4km LIGO Hanford and LIGO Livingston interferometers evaluated at \(f=0~\mathrm{Hz}\) (top two plots) and \(f=200~\mathrm{Hz}\) (bottom two plots). (In the Earthfixed frame, the detectors don’t move so there is no time dependence to worry about). Note the presence of oscillations or ‘lobes’ for the \(f=200~\mathrm{Hz}\) plots, which come from the exponential factor \(e^{i2\pi f\hat{n}\cdot \Delta {\vec {x}}/c}\) of the product of the two response functions (5.43). For \(f=0\), this factor is unity.
Figure 44 is a similar plot, showing Mollweide projections of \(\gamma (t; f, \hat{n})\) for the Earthtermonly Doppler frequency response (5.21) of pairs of pulsars separated on the sky by \(\zeta =0^\circ \), \(45^\circ \), \(90^\circ \), \(135^\circ \), \(180^\circ \). (There is no time dependence nor frequency dependence for these functions). The bottom panel is a plot of the Hellings and Downs curve as a function of the angular separation between a pair of Earthpulsar baselines. By integrating the top plots over the whole sky (appropriately normalized), one obtains the values of the Hellings and Downs curve for those angular separations.
Spherical harmonic components of \(\gamma (t; f, \hat{n})\)
As first noted in Allen and Ottewill (1997), the functions \(\gamma (t; f,\hat{n})\) defined above (7.8) play a very important role in searches for anisotropic backgrounds. For a fixed pair of detectors at a fixed time t and for fixed frequency f, these functions are scalar fields on the unit 2sphere and hence can be expanded in terms of the ordinary spherical harmonics \(Y_{lm}(\hat{n})\):
or, equivalently,
Note that this definition differs from (7.4) for \(\mathcal{P}_{lm}\) by a complex conjugation, but agrees with the convention used in Allen and Ottewill (1997). In terms of the spherical harmonic components, it follows that
as a consequence of the orthogonality of the \(Y_{lm}(\hat{n})\). This expression enters (7.7) for the expected crosscorrelation of the output in two detectors. As explained in Allen and Ottewill (1997) and Thrane et al. (2009), the time dependence of \(\gamma _{lm}(t;f)\) is particularly simple:
where \(T_\mathrm{mod}\) is the relevant modulation period associated with the motion of the detectors. For example, for groundbased detectors like LIGO and Virgo, \(T_\mathrm{mod}=1\) sidereal day, since the displacement vector \(\Delta {\vec {x}}(t) \equiv {\vec {x}}_2(t){\vec {x}}_1(t)\) connecting the vertices of the two interferometers (and which enters the expression for the overlap function) traces out a cone on the sky with a period of one sidereal day. If there is no time dependence, as is the case for pulsar timing, \(T_\mathrm{mod}\) is infinite.
Example: Earthbased interferometers
As was also shown in Allen and Ottewill (1997), one can derive analytic expressions for \(\gamma _{lm}(t;f)\) for a pair of Earthbased interferometers in the shortantenna limit. If we set \(t=0\), then \(\gamma _{lm}(0;f)\) can be written as a linear combination^{Footnote 19} involving spherical Bessel functions, \(j_n(x)/x^n\) (for l even) and \(j_n(x)/x^{n1}\) (for l odd), where x depends on the relative separation of the two detectors, \(x \equiv 2\pi f\Delta {\vec {x}}/c\). The coefficients of the expansions are complex numbers that depend on the relative orientation of the detectors. Explicit expression for the first few spherical harmonic components for the LIGO Hanford–LIGO Livingston pair are given below:
Note that the above numerical coefficients do not agree with those in Allen and Ottewill (1997) due to an overall normalization factor of \(4\pi /5\) and phase \(e^{im\phi }\), where \(\phi = 38.52^\circ \) is the angle between the separation vector between the vertices of the LIGOHanford and LIGOLivingston interferometers and the Greenwich meridian (Thrane et al. 2009). Plots of the real and imaginary parts of \(\gamma _{lm}(0;f)\) for \(l=0\), 1, 2, 3, 4 and \(m\ge 0\) for the LIGO HanfordLIGO Livingston detector pair are given in Fig. 45. For \(m<0\), one can use the relation
which follows from the properties of the spherical harmonics \(Y_{lm}(\hat{n})\) (see Appendix E). Note that up to an overall normalization factor of \(5/\sqrt{4\pi }\), the real part of \(\gamma _{00}(0;f)\) is the HanfordLivingston overlap function for an unpolarized, isotropic stochastic background, shown in Fig. 34.
Example: Pulsar timing arrays
In Fig. 46, we show plots of the spherical harmonic components of \(\gamma (t;f, \hat{n})\) calculated using the Earthtermonly Dopplerfrequency response functions (5.21) for pulsar timing. Since there is no frequency or timedependence for these response functions, the spherical harmonic components of \(\gamma (\hat{n})\) depend only of the angular separation \(\zeta \) between the two pulsars that define the detector pair. As shown in Mingarelli et al. (2013) and Gair et al. (2014), these functions can be calculated analytically for all values of l and m. A detailed derivation with all the relevant formulae can be found in Appendix E of Gair et al. (2014); there the calculation is done in a ‘computational’ frame, where one of the pulsars is located along the zaxis and the other is in the xzplane, making an angle \(\zeta \) with respect to the first. In this computational frame, all of the components \(\gamma _{lm}(\zeta )\) are real. Note that up to an overall normalization factor^{Footnote 20} of \(3/\sqrt{4\pi }\), the function \(\gamma _{00}(\zeta )\) is just the Hellings and Downs function for an unpolarized, isotropic stochastic background, shown in Fig. 37.
Modulations in the correlated output of two detectors
For groundbased detectors like LIGO and Virgo, an anisotropic gravitationalwave background will modulate the correlated output of a pair of detectors at harmonics of the Earth’s rotational frequency. It turns out that for an unpolarized, anisotropic background, the contribution to the mth harmonic of the correlation has a frequency dependence proportional to
where \(\mathcal{P}_{lm}\) are the spherical harmonic components of the gravitationalwave power on the sky \(\mathcal{P}(\hat{n})\). (We are assuming here that the spherical harmonic decomposition of \(\mathcal{P}(\hat{n})\) is with respect to a coordinate system whose zaxis points along the Earth’s rotational axis). In this section, we derive the above result following the presentation in Allen and Ottewill (1997) and construct an optimal filter for the crosscorrelation that maximizes the signaltonoise ratio for the mth harmonic. This was the first concrete approach that was proposed for detecting an anisotropic stochastic background.
Timedependent crosscorrelation
We start by writing down an expression (in the frequency domain) for the correlated output of two groundbased detectors (e.g., LIGO Hanford and LIGO Livingston):
where \(\tilde{d}_{1,2}(t;f)\) are (shortterm) Fourier transforms (7.5) centered around t, and where we have included a filter function \(\tilde{Q}(t;f)\), whose specific form we will specify later. Since the crosscorrelation is periodic with a period \(T_\mathrm{mod} = 1\) sidereal day (due to the motion of the detectors attached to the surface of the Earth), we can expand \(\hat{C}(t)\) as a Fourier series:
Here T is the total observation time, e.g., 1 sidereal year, which we will assume for simplicity is an integer multiple of \(T_\mathrm{mod}\).
Assuming as usual that the detector noise is uncorrelated across detectors, and using the expectation values (7.1) for an unpolarized, anisotropic background, we find
where \(\gamma _{lm}(t;f)\) are the spherical harmonic components of \(\gamma _{12}(t;f,\hat{n})\). (We have dropped the 12 indices to simplify the notation). Similarly, if we assume that the gravitationalwave signal is weak compared to the detector noise, and that the duration \(\tau \) is also much larger than the correlation time of the detectors, then
where \(P_{n_1}(t;f)\) is the onesided power spectral density for the noise in detector \(I=1,2\) centered around t. These two results can now be cast in terms of the Fourier components \(\hat{C}_m\) using (7.17). Since (7.12) implies
we immediately obtain
where we used
Similarly,
for the covariance of the estimators.
Calculation of the optimal filter
To determine the optimal form of the filter \(\tilde{Q}(t;f)\) for the mth harmonic \(\hat{C}_m\), we maximize the (squared) signaltonoise:
The above expression can be written in a more suggestive form if we introduce an inner product on the space of complexvalued functions (Allen 1997):
In terms of this inner product,
But now the maximization problem is trivial, as it has been cast as a simple problem in vector algebra—namely to find the vector \(\tilde{Q}\) that maximizes the ratio \((\tilde{Q}, A)^2/(\tilde{Q},\tilde{Q})\) for a fixed vector A. But since this ratio is proportional to the squared cosine of the angle between \(\tilde{Q}\) and A, it is maximized by choosing \(\tilde{Q}\) proportional to A. Thus,
is the form of the filter function that maximizes the SNR for the mth harmonic.
Note that this expression reduces to the standard form of the optimal filter (6.35) for an isotropic background, \(\mathcal{P}_{lm} = \delta _{l0}\delta _{m0}\mathcal{P}_{00}\). Note also that the optimal filter assumes knowledge of both the spectral shape \(\bar{H}(f)\) and the angular distribution of gravitationalwave power on the sky, \(\mathcal{P}_{lm}\). So if one has some model for the expected anisotropy (e.g., a dipole in the same direction as the cosmic microwave background), then one can filter the crosscorrelated data to be optimally sensitive to the harmonics \(\hat{C}_m\) induced by that anisotropy.
Inverse problem
In Allen and Ottewill (1997), there was no attempt to solve the inverse problem—that is, given the measured values of the correlation harmonics, how can one infer (or estimate) the components \(\mathcal{P}_{lm}\)? The first attempt to solve the inverse problem was given in Cornish (2001), in the context of correlation measurements for both groundbased and spacebased interferometers. Further developments in solving the inverse problem were given in subsequent papers, e.g., Ballmer (2006a, b), Mitra et al. (2008) and Thrane et al. (2009), which we explain in more detail in the following subsections. Basically, these latter methods constructed frequentist maximumlikelihood estimators for the \(\mathcal{P}_{lm}\), using singularvalue decomposition to ‘invert’ the Fisher matrix (or point spread function), which maps the true gravitationalwave power distribution to the measured distribution on the sky.
Maximumlikelihood estimates of gravitationalwave power
In this section, we describe an approach for constructing maximumlikehood estimates of the gravitationalwave power distribution \(\mathcal{P}(\hat{n})\). It is a solution to the inverse problem discussed at the end of the previous subsection. But since a network of gravitationalwave detectors typically does not have perfect coverage of the sky, the inversion requires some form of regularization, which we describe below. The gravitationalwave radiometer and spherical harmonic decomposition methods (Sect. 7.3.6) are the two main implementations of this approach, and have been used to analyze LIGO science data (Abadie et al. 2011; Abbott et al. 2016a).
Likelihood function and maximumlikelihood estimators
As shown in Sect. 7.1.3 the crosscorrelated data from two detectors
has expectation values
We can write this relation abstractly as a matrix equation
where \(M_{IJ}\equiv \bar{H}(f)\gamma _{IJ}(t;f,\hat{n})\) and the matrix product is summation over directions \(\hat{n}\) on the sky. The covariance matrix for the crosscorrelated data is given by
where we have assumed as before that there is no crosscorrelated detector noise, and that the gravitationalwave signal is weak compared to the detector noise.
If we treat the detector noise and the gravitationalwave spectral shape \(\bar{H}(f)\) as known quantities (or if we estimate the detector noise from the autocorrelated output of each detector), then we can write down a likelihood function for the crosscorrelated data given the signal model (7.30). Assuming a Gaussianstationary distribution for the noise, we have
where we have temporarily dropped the IJ indices for notational convenience.^{Footnote 21} Since the gravitationalwave power distribution \(\mathcal{P}\) enters quadratically in the exponential of the likelihood, we can immediately write down the maximumlikelihood estimators of \(\mathcal{P}\):
where
The (square) matrix F is called the Fisher information matrix. It is typically a singular matrix, since the response matrix \(M=\bar{H}\gamma \) usually has null directions (i.e., anisotropic distributions of gravitationalwave power that are mapped to zero by the detector response). Inverting F therefore requires some sort of regularization, such as singularvalue decomposition (Press et al. 1992) (Sect. 7.3.5). The vector X is the socalled dirty map, as it represents the gravitationalwave sky as ‘seen’ by a pair of detectors. If the spectral shape \(\bar{H}(f)\) that we used for our signal model exactly matches that of the observed background, then
Thus, even in the absence of noise, a point source \(\mathcal{P}(k) = \delta ^2(\hat{n},\hat{n}_0)\) does not map to a point source by the response of the detectors, but it maps instead to \(F_{\hat{n}\hat{n}_0}\). This ‘blurring’ or ‘spreading’ of point sources is represented by a point spread function, which is a characteristic feature of any imaging system. We give plots of point spread functions for both pulsar timing arrays and groundbased interferometers in Sect. 7.3.4.
Extension to a network of detectors
The above results generalize to a network of detectors. One simply replaces X and F in (7.33) by their network expressions, which are simply sums of the dirty maps and Fisher matrices for each distinct detector pair:
Explicit expressions for the dirty map and Fisher matrix for a network of detectors are:
and
Note that including more detectors in the network is itself a form of regularization, as adding more detectors typically means better coverage of the sky. This tends to ‘soften’ the singularities that may exist when trying to deconvolve (i.e., invert) the detector response.
Error estimates
Using (7.35) it follows that \(\hat{\mathcal{P}}\) is an unbiased estimator of \(\mathcal{P}\):
Similarly, in the weaksignal approximation,
Thus, F is the covariance matrix for the dirty map X, while \(F^{1}\) is the covariance matrix of the clean map \(\hat{\mathcal{P}}\). We will see below (Sect. 7.3.5) that regularization necessarily changes these results as one cannot recover modes of \(\mathcal{P}\) to which the detector network is insensitive. This introduces a bias in \(\hat{\mathcal{P}}\), and changes the corresponding elements of the covariance matrix for \(\hat{\mathcal{P}}\).
Point spread functions
As discussed in the previous section, the point spread function for mapping gravitationalwave power is given by the components of the Fisher information matrix:
Here \(\hat{n}_0\) is the direction to the point source and \(\hat{n}\) is an arbitrary point on the sky. In the following three figures (Figs. 47, 48, 49) we shows plots of point spread functions for both pulsar timing arrays and the LIGO Hanford–LIGO Livingston detector pair.
Example: Pulsar timing arrays
Figure 47 shows plots of point spread functions for pulsar timing arrays consisting of \(N=2\), 5, 10, 20, 25, 50 pulsars. The point source is located at the center of the maps, indicated by a black dot. The pulsar locations (indicated by white stars) were randomlydistributed on the sky, and we used equalnoise weighting for calculating the point spread function. One can see that the point spread function becomes tighter as the number of pulsars in the array increases. Figure 48 are similar plots for an actual array of \(N=20\) pulsars given in Table 6. Note that the pulsar locations are concentrated in the direction of the galactic center, \((\mathrm{ra},\mathrm{dec}) = (6^\mathrm{h}15^\mathrm{m}, 29^\circ )\) in equatorial coordinates. The point source is again located at the center of the maps, indicated by a black dot. The left panel shows the point spread function calculated using equalnoise weighting, while the right panel shows the point spread function calculated using actualnoise weighting, based on the timing noise values given in the second column of Table 6. Note that this latter plot is similar to the smallN plots in Fig. 47, being dominated by pulsars with low timing noise—in this particular case, J0437−4715 and J2124−3358, which have the lowest and thirdlowest timing noise.
Example: Earthbased interferometers
In Fig. 49 we plot point spread functions for gravitationalwave power for the LIGO HanfordLIGO Livingston pair of detectors. The lefthand plot is for a point source located at the center of the map, \((\theta ,\phi )=(90^\circ , 0^\circ )\), while the righthand plot is for a point source located at \((\theta ,\phi )=(60^\circ , 0^\circ )\) (indicated by black dots). We assumed equal whitenoise power spectra for the two detectors, and we combined the contributions from 100 discrete frequencies between 0 and 100 Hz, and 100 discrete time chunks over the course of one sidereal day. The point spread functions for the two different point source locations are shaped, respectively, like a figureeight with a bright region at the center of the figureeight pattern, and a tear drop with a bright region near the top of the drop. These results are in agreement with Mitra et al. (2008) (see e.g., Fig. 1 in that paper). Provided one combines data over a full sidereal day, the point spread function is independent of the right ascension (i.e., azimuthal) angle of the source. Readers should see Mitra et al. (2008) for more details, including a stationary phase approximation for calculating the point spread function.
Angular resolution estimates
There are “rules of thumb” that can be used to estimate the angular resolution \(\Delta \theta \) (or size of a point spread function) for an anisotropic stochastic background search. For crosscorrelations using groundbased interferometers like LIGO, Virgo, etc., the angular resolution of the detector network can be estimated from the diffraction limit (Monnier 2003):
where f is gravitationalwave frequency and D is separation between a pair of detectors. Thus, the larger the separation between detectors and the higher frequencies searched for, the better the angular resolution. For a pulsar timing array consisting of N pulsars, the corresponding estimate is given by
where \(l_\mathrm{max}\) is the maximum value of l for a spherical harmonic decomposition of the background having angular features of size \(\Delta \theta \). The last approximate equality follows from the fact that, at each frequency, one can extract at most N (complex) pieces of information about the gravitationalwave background using an Npulsar array (Boyle and Pen 2012; Cornish and van Haasteren 2014; Gair et al. 2014); and those N pieces of information correspond to the number of spherical harmonic components (lm) out to \(l_\mathrm{max}\), so \(N\sim l_\mathrm{max}^2\). (We will discuss this again in Sect. 7.5.4, in the context of basis skies for a phasecoherent search for anisotropic backgrounds). Note that if we knew the distances to the pulsars in the array and used information from the pulsarterm contribution to the timing residuals (5.17), then \(\Delta \theta \) for a pulsar timing array would have the same form as (7.42), but with D now representing the Earthpulsar distance. See Boyle and Pen (2012) for details.
Singularvalue decomposition
Expression (7.33) for the maximumlikelihood estimator \(\hat{\mathcal{P}}\) involves the inverse of the Fisher matrix F. But this is just a formal expression, as F is typically a singular matrix, requiring some sort of regularization to invert. Here we describe how singularvalue decomposition (Press et al. 1992) can be used to ‘invert’ F. Since this a general procedure, we will frame our discussion in terms of an arbitrary matrix S.
Singular value decomposition factorizes an \(n\times m\) matrix S into the product of three matrices:
where U and V are \(n\times n\) and \(m\times m\) unitary matrices, and \(\Sigma \) is an \(n\times m\) rectangular matrix with (real, nonnegative) singular values \(\sigma _k\) along its diagonal, and with zeros everywhere else. We will assume, without loss of generality, that the singular values are arranged from largest to smallest along the diagonal. We define the pseudoinverse \(S^+\) of S as
where \(\Sigma ^+\) is obtained by taking the reciprocal of each nonzero singular value of \(\Sigma \), leaving all the zeros in place, and then transposing the resulting matrix. Note that when S is a square matrix with nonzero determinant, then the pseudoinverse \(S^+\) is identical to the ordinary matrix inverse \(S^{1}\). Thus, the pseudoinverse of a matrix generalizes the notion of ordinary inverse to nonsquare or singular matrices.
As a practical matter, it is important to note that if the nonzero singular values of \(\Sigma \) vary over several orders of magnitude, it is usually necessary to first set to zero (by hand) all nonzero singular values \(\le \) some minimum threshold value \(\sigma _\mathrm{min}\) (e.g., \(10^{5}\) times that of the largest singular value). Alternatively, we can set those very small singular values equal to the threshold value \(\sigma _\mathrm{min}\). This procedure helps to reduce the noise in the maximumlikelihood estimates, which is dominated by the modes to which we are least sensitive.
Returning to the gravitationalwave case, the above discussion means that all of the previous expressions for the inverse of the Fisher matrix, \(F^{1}\), should actually be written in terms of the pseudoinverse \(F^+\). Thus,
which then implies
So \(\hat{\mathcal{P}}\) is actually a biased estimator of \(\mathcal{P}\) if \(F^+\ne F^{1}\), as was discussed in Thrane et al. (2009).
Figure 50 is a plot of the singular values of typical Fisher matrices for different groundbased interferometer detector pairs (Hanford–Livingston, Hanford–Virgo, Livingston–Virgo) and a multibaseline detector network (Hanford–Livingston–Virgo). For these examples, we chose to expand the gravitationalwave power on the sky \(\mathcal{P}(\hat{n})\) and the integrand of the overlap functions \(\gamma _{IJ}(t;f,\hat{n})\) in terms of spherical harmonics out to \(l_\mathrm{max}=20\). (See Sect. 7.3.6 for more details about the spherical harmonic decomposition method). This yields \((l_\mathrm{max}+1)^2 = 441\) modes of gravitationalwave sky that we would like to recover. Note how the inclusion of more detectors to the network reduces the dynamic range of the singular values of F, hence making the matrix less singular without any external form of regularization.
Radiometer and spherical harmonic decomposition methods
The gravitationalwave radiometer (Ballmer 2006a, b; Mitra et al. 2008) and spherical harmonic decomposition methods (Thrane et al. 2009; Abadie et al. 2011) are two different ways of implementing the maximumlikelihood approach for mapping gravitationalwave power \(\mathcal{P}(\hat{n})\). They differ primarily in their choice of signal model, and their approach for deconvolving the detector response from the underlying (true) distribution of power on the sky.
Gravitationalwave radiometer
The radiometer method takes as its signal model a point source characterized by a direction \(\hat{n}_0\) and amplitude \(\mathcal{P}_{\hat{n}_0}\):
It is applicable to an anisotropic gravitationalwave background dominated by a limited number of widelyseparated point sources. As the number of point sources increases or if two point sources are sufficiently close to one another, the point spread function for the detector network will cause the separate signals to interfere with one another. Thus, the radiometer method is not appropriate for diffuse backgrounds. Moreover, by assuming that the signal is pointlike, the radiometer method ignores correlations between neighboring pixels on the sky, effectively sidestepping the deconvolution problem. Explicitly, the inverse of the Fisher matrix that appears in the maximumlikelihood estimator \(\hat{\mathcal{P}}=F^{1} X\) is replaced by the inverse of the diagonal element \(F_{\hat{n}\hat{n}}\) to obtain an estimate of the pointsource amplitude at \(\hat{n}\):
where X is the dirty map (7.34). Thus, the radiometer method estimates the strength of point sources at different points on the sky, ignoring any correlations between neighboring pixels.
Note that for a single pair of detectors IJ the above estimator (7.49) is equivalent to an appropriately normalized crosscorrelation statistic:
with filter function
where \(\gamma _{IJ}\) is given by (7.8). For a network of detectors, one recovers the estimator \(\hat{\mathcal{P}}_{\hat{n}}\) by summing the individualbaseline statistics (7.50) over both time and distinct detector pairs, weighted by the inverse variances of the individualbaseline statistics. See e.g., Ballmer (2006a, b) and Mitra et al. (2008) for more details.
Spherical harmonic decomposition
The spherical harmonic decomposition method is appropriate for extended anisotropic distributions on the sky, assuming a signal model for gravitationalwave power that includes spherical harmonic components up to some specified value of \(l_\mathrm{max}\):
The cutoff in the expansion at \(l_\mathrm{max}\) corresponds to an angular scale \(\Delta \theta \simeq 180^\circ /l_\mathrm{max}\). The diffraction limit (Monnier 2003):
where f is the maximum gravitationalwave frequency and D is the separation between a pair of detectors, sets an upper limit on the size of \(l_\mathrm{max}\), since the detector network is not able to resolve features having smaller angular scales. For example, for the LIGO Hanford–LIGO Livingston detector pair (\(D= 3000\ \mathrm{km}\)) and a stochastic background having contributions out to \(f\sim 500~\mathrm{Hz}\), we find \(l_\mathrm{max} \lesssim 30\). Alternatively, one can use Bayesian model selection to determine the value of \(l_\mathrm{max}\) that is most consistent with the data.
Since the spherical harmonic method targets extended distributions of gravitationalwave power on the sky, correlations between neighboring pixels or, equivalently, between different spherical harmonic components must be taken into account. This is addressed by using singularvalue decomposition as described in Sect. 7.3.5 to ‘invert’ the Fisher matrix. By effectively ignoring those modes to which the detector network is insensitive, we can construct the pseudoinverse \(F^+\) to perform the deconvolution. In terms of \(F^+\), we have
for the spherical harmonic components of the maximumlikelihood estimators \(\hat{\mathcal{P}}\). The sky map constructed from the \(\hat{\mathcal{P}}_{lm}\) is called a ‘clean’ map, since the inversion removes the detector response from the ‘dirty’ map X.
Figure 51 shows clean maps produced by the spherical harmonic decomposition method for a simulated anisotropic background distributed along the galactic plane (Thrane et al. 2009). The injected map is the bottom plot in the figure. (All sky maps are in equatorial coordinates). The four maps shown in the top two rows of the figure correspond to analyses with different interferometer detector pairs (Hanford–Livingston, Hanford–Virgo, and Livingston–Virgo) and a multibaseline detector network (Hanford–Livingston–Virgo). Consistent with our findings in Fig. 50, we see that the recovered map is best for the multibaseline network, whose Fisher matrix has singular values with the smallest dynamic range. For the reconstructed maps, \(F^+\) was calculated by keeping 2/3 of all the eigenmodes (those with the largest singular values), setting the remaining singular values equal to the minimum value \(\sigma _\mathrm{min}\) of the modes that were kept. For all cases, \(l_\mathrm{max}=20\). The anisotropic background was injected into simulated LIGO and Virgo detector noise (initial design sensitivity) whose power spectra are shown in Fig. 52. The overall amplitude of the signal was chosen to be large enough that it was easily detectable in 1 sidereal day’s worth of simulated data. For additional details see Thrane et al. (2009).
Frequentist detection statistics
As discussed in Sects. 3.4 and 4.4, one can construct a frequentist detection statistic \(\Lambda _\mathrm{ML}(d)\) by taking the ratio of the maxima of the likelihood functions for the signalplusnoise model to the noiseonly model. The logarithm,
is the squared signaltonoise ratio of the data. If we calculate this quantity for an anisotropic background \(\mathcal{P}(\hat{n})\) using (7.32) for the signalplusnoise model, we find
where \(\hat{\mathcal{P}}\) are the maximumlikelihood estimators of \(\mathcal{P}\). As described in Sect. 3.2.1, one can use this statistic to do frequentist hypothesis testing, comparing its observed value \(\Lambda _\mathrm{obs}\) to a threshold \(\Lambda _*\) to decide whether or not to claim detection of a signal.
The above detection statistic can be written in several alternative forms:
where X is the ‘dirty’ map, which is related to \(\hat{\mathcal{P}}\) via \(\hat{\mathcal{P}} = F^{1} X\). The last form suggests a standard matchedfilter statistic:
where \(\bar{\mathcal{P}}_\mathrm{model}\) is an assumed distribution of gravitationalwave power on the sky, normalized such that
The above normalization is chosen so that if the true gravitationalwave background has the same spectral shape \(\bar{H}(f)\) and the same angular distribution \(\bar{\mathcal{P}}_\mathrm{model}\), then \(\lambda (d)\) is an estimator of the overall amplitude of the background. In the absence of a signal, \(\lambda (d)\) has zero mean and unit variance.
Such a matchedfilter statistic was proposed in Appendix C of Thrane et al. (2009) and studied in detail in Talukder et al. (2011). One nice property of this statistic is that it does not require inverting the Fisher matrix. Hence it avoids the inherent bias (7.47) and introduction of other uncertainties associated with the deconvolution process. Indeed, if we are given a model of the expected anisotropy, \(\lambda (d)\) is the optimal statistic for detecting its presence. Thus, \(\lambda (d)\) is especially good at detecting weak anisotropic signals. See Talukder et al. (2011) for more details.
Phasecoherent mapping
Phasecoherent mapping is an approach that constructs estimates of both the amplitude and phase of the gravitationalwave background at each point of the sky (Cornish and van Haasteren 2014; Gair et al. 2014; Romano et al. 2015). In some sense, it can be thought of as the “square root” of the approaches described in the previous subsections, which attempt to measure the distribution of gravitationalwave power \(\mathcal{P}(\hat{n}) = h_+^2 + h_\times ^2\). The gravitationalwave signal can be characterized in terms of either the standard polarization basis components \(\{h_+(f,\hat{n}), h_\times (f,\hat{n})\}\) or the tensor spherical harmonic components \(\{a^G_{(lm)}(f), a^C_{(lm)}(f)\}\). In what follows we will restrict our attention the polarization basis components, although a similar analysis can be carried out in terms of the spherical harmonic components (Gair et al. 2014).
Maximumlikelihood estimators and Fisher matrix
Unlike the previous approaches, which target gravitationalwave power and hence use crosscorrelations (7.6) as their fundamental data product, phasecoherent mapping works directly with the data from the individual detectors. In terms of the shortterm Fourier transforms defined in Sect. 7.1.2, we can write
where I labels the different detectors, and \(\tilde{n}_I(t;f)\) denotes the corresponding detector noise. Given our assumption (7.3) that the spectral and angular dependence of the background factorize with known spectral function \(\bar{H}(f)\), we can rewrite the above equation as
so that the only unknowns are \(\{h_+(\hat{n}), h_\times (\hat{n})\}\) at different locations on the sky. We will write this equation abstractly as a matrix equation
where
The matrix multiplication corresponds to a sum over polarizations A and directions \(\hat{n}\) on the sky.
Assuming that the noise is uncorrelated across detectors, the noise covariance matrix is given by:
where \(P_{n_{I}}(t;f)\) is the onesided power spectral density of the noise in detector I at time t. Thus, we can write down a likelihood function for the data \(d\equiv \{\tilde{d}_I(t;f)\}\) given a:
where the multiplications inside the exponential are matrix multiplications, involving summations over detectors I, times t, and frequencies f, or summations over polarizations A and directions \(\hat{n}\) on the sky. Note that (7.65) has exactly the same form as (7.32), so the same general remarks made in Sect. 7.3.1 apply here as well. Namely, the maximumlikelihood estimators of a are
where
are the Fisher matrices and ‘dirty’ maps for this analysis. (The definitions of M, N here are different, of course, from those in Sect. 7.3.1). Explicit expression for X and F are given below:
and
Note that these expressions have an extra polarization index A, compared to the corresponding expressions, (7.37) and (7.38), for gravitationalwave power.
Point spread functions
The point spread function for the above analysis can now be obtained by fixing values for both \(A'\) and \(\hat{n}'\), and letting A and \(\hat{n}\) vary. Since there are two polarization modes (\(+\) and \(\times \)), there are actually four different point spread functions for each direction \(\hat{n}'\) on the sky:
These correspond to the \(A=+,\times \) responses to the \(A'=+,\times \)polarized point sources located in direction \(\hat{n}'\).
To illustrate the above procedure, we calculate point spread functions for phasecoherent mapping, for pulsar timing arrays consisting of \(N=1\), 2, 5, 10, 25, 50, 100 pulsars. Figure 53 show plots of these point spread functions. The pulsars are randomly distributed over the sky (indicated by white stars), and the point source is located at the center of the maps (indicated by a black dot). For simplicity, we assumed a single frequency bin, and used equalnoise weighting for calculating the point spread functions. (In addition, there is no time dependence as the directions to the pulsars are fixed on the sky). Different rows in the figure correspond to different numbers of pulsars in the array. Different columns correspond to different choices for A and \(A'\): columns 1, 2 correspond to the \(A=+,\times \) response of the pulsar timing array to an \(A'=+\)polarized point source; columns 3, 4 correspond to the \(A=+,\times \) response of the pulsar timing array to an \(A'=\times \)polarized point source. Note that for \(N=1\), the point spread functions are proportional to either \(R_I^+(\hat{n})\) or \(R_I^\times (\hat{n})\) for that pulsar, producing maps similar to those shown in Fig. 27. As N increases the \(++\) and \(\times \times \) point spread functions (columns 1 and 4) become tighter around the location of the point source, which is at the center of the maps. But since the \(+\) and \(\times \) polarizations are orthogonal, the \(\times +\) and \(+\times \) point spread functions (columns 2 and 3) have values close to zero around the location of the point source.
Singular value decomposition
Just as we had to deconvolve the detector response in order to obtain the estimators \(\hat{\mathcal{P}}\) for gravitationalwave power, we need to do the same for the estimators \(\hat{a}\) for the phasecoherent mapping approach. Although we could use singularvalue decomposition for the Fisher matrix F given by (7.69), we will first whiten the data, which leads us directly to pseudoinverse of the whitened response matrix M, (7.63). This is the approach followed in Cornish and van Haasteren (2014) and Romano et al. (2015), and it leads to some interesting results regarding skymap basis vectors, which we will describe in more detail in Sect. 7.5.4. An alternative approach involving the pseudoinverse of the unwhitened response matrix is given in Gair et al. (2014) and Appendix B of Romano et al. (2015).
To whiten the data, we start by finding the Cholesky decomposition of the inverse noise covariance matrix \(N^{1} = LL^\dagger \), where L is a lower triangular matrix. The whitened data are then given by \(\bar{d} = L^\dagger d\) (since this has unit covariance matrix), and the whitened response matrix is given by \(\bar{M} = L^\dagger M\). In terms of these whitened quantities,
implying
The last equality is a formal expression for the pseudoinverse \(\bar{M}^+\) since \(\bar{M}^\dagger \bar{M}\) is not necessarily invertible. But as shown in Sect. 7.3.5 it is always possible to define the pseudoinverse of a matrix in terms of its singularvalue decomposition. Thus, given the singularvalue decomposition:
we have
where \(\Sigma ^+\) is defined by the procedure described in Sect. 7.3.5. Thus,
This is the expression we need to compute to calculate the maximumlikelihood estimators \(\hat{a}\) for the phasecoherent mapping approach.
Basis skies
The singularvalue decomposition of \(\bar{M}\) also has several nice geometrical properties. For example, from (7.75), we see that the columns of V corresponding to the nonzero singular values of \(\Sigma \) are basis vectors (which we will call basis skies) in terms of which \(\hat{a}\) can be written as a linear combination. Similarly, if write the whitened response to the gravitationalwave background as
then we see that the columns of U corresponding to the nonzero singular values of \(\Sigma \) can be interpreted as range vectors for the response. To be more explicit, let \(u_{(k)}\) and \(v_{(k)}\) denote the kth columns of U and V, and let r be the number of nonzero singular values of \(\Sigma \). Then
where the dot product of two vectors a and b is defined as \(a\cdot b = a^\dagger b\). If we further expand \(\bar{d} = \bar{M} a + \bar{n}\) in the first of these equations, then
This last expression involves the projection of the true gravitationalwave sky a onto the basis skies \(v_{(k)}\) for only the nonzero singular values of \(\Sigma \).
In Fig. 54, we show plots of the real parts of the \(+\) and \(\times \)polarization basis skies for a pulsar timing array consisting of \(N=5\) pulsars randomly distributed on the sky. The imaginary components of the basis skies are identically zero, and hence are not shown in the figure. The basis skies are shown in decreasing size of their singular values, from top to bottom. In general, if N is the number of pulsars in the array, then the number of basis skies is 2N (the factor of 2 corresponding to the two polarizations, \(+\) and \(\times \)). This means that one can extract at most 2N real pieces of information about the gravitationalwave background with an Npulsar array. This is typically fewer than the number of modes of the background that we would like to recover.
Underdetermined reconstructions
More generally, let’s consider the case where the total number of measured data points n is less than the number of modes m that we are trying to recover (so \(n<m\)), or where there are certain modes of the gravitationalwave background (e.g., null skies) that our detector network is simply insensitive to. Then, for both of these cases, the linear system of equations that we are trying to solve, \(\bar{d} = \bar{M} a\), is underdetermined—i.e., there exist multiple solutions for a, which differ from (7.75) by terms of the form
where \(a_\mathrm{arb}\) is an arbitrary gravitationalwave background. (Note that \(a_\mathrm{null}\) is an element of the null space of \(\bar{M}\) as it maps to zero under the action of \(\bar{M}\)). Our solution for \(\hat{a}\) given in (7.75) sets to zero those modes that we are insensitive to. Our solution also sets to zero the variance of these modes.
In a Bayesian formulation of the problem, one needs to specify prior probability distributions for the signal parameters, in addition to specifying the likelihood function (7.65). For a mode of the background to which our detector network is insensitive, the marginalized posterior for that mode will be the same as the prior, since the data are uniformative about this mode. This is what one would expect for a mode that is unconstrained by the data, in contrast to setting the variance equal to zero as we do with our maximumlikelihood reconstruction. Basically, our maximumlikelihood reconstruction does not attempt to say anything about the modes of the background for which we have no information.
Pulsar timing arrays
The phasecoherent mapping approach was first developed in the context of pulsar timing arrays (Cornish and van Haasteren 2014; Gair et al. 2014). In Cornish and van Haasteren (2014), the analysis was done in terms of the standard polarization components \(a\equiv \{h_+(f,\hat{n}),h_\times (f,\hat{n})\}\), similar to what we described above. In Gair et al. (2014), the analysis was done in terms of the tensor spherical harmonic components \(a\equiv \{a_{(lm)}^G(f), a_{(lm)}^C(f)\}\). Now recall from (5.23) that the Earthtermonly, Dopplerfrequency response functions are given by
where \(\hat{p}\) is the direction to an arbitrary pulsar. Thus, the pulsar response to curl modes is identically zero. This means that a pulsar timing array is blind to half of all possible modes of a gravitationalwave background, regardless of how many pulsars there are in the array. Note that this statement is not restricted to the tensor spherical harmonic analysis; it is also true in terms of the standard \((+,\times )\) polarization components, since \(a^G_{(lm)}(f)\) and \(a^C_{(lm)}(f)\) are linear combinations of \(h_+(f,\hat{n})\) and \(h_\times (f,\hat{n})\), see (2.11). It is just that the insensivity of a pulsar timing array to half of the gravitationalwave modes is manifest in the gradient and curl spherical harmonic basis for which (7.80) is valid.
To explicitly demonstrate that a pulsar timing array is insensitive to the curlcomponent of a gravitationalwave background, Gair et al. (2014) constructed maximumlikelihood estimates of a simulated background containing both gradient and curl modes. The total simulated background and its gradient and curl components are shown in the top row (panels a–c) of Fig. 55. (Note that this is for a noiseless simulation so as not to confuse the lack of reconstructing the curl component with the presence of detector noise). Panel e shows the maximumlikelihood recovered map for a pulsar timing array consisting of \(N=100\) pulsars randomly distributed on the sky. Panels d and f are residual maps obtained by subtracting the maximumlikelihood recovered map from the gradient component and the total simulated background, respectively. Note that the maximumlikelihood recovered map resembles the gradient component of the background, consistent with the fact that a pulsar timing array is insenstive to the curl component of a gravitationalwave background. The residual map for the gradient component (panel d) is much cleaner than the residual map for the total simulated background (panel f), which has angular structure that closely resembles the curl component of the background.
Groundbased interferometers
The phasecoherent mapping approach can also be applied to data taken by a network of groundbased interferometers (Romano et al. 2015). Again the analysis can be performed in terms of either the standard \(+\), \(\times \) polarization components or the gradient and curl spherical harmonic components. Recall from (5.33) that
for a groundbased interferometer in the smallantenna limit, with its vertex at the origin, and with unit vectors \(\hat{u}\), \(\hat{v}\) pointing in the direction of the interferometer arms. At first, one might think that these expressions imply that a network of groundbased interferometers is also blind to the curl component of a gravitationalwave background. But (7.81) are valid only for interferometers with their vertices at the origin of coordinates. Since a translation mixes gradient and curl components, the response functions for an interferometer displaced from the origin by \(\hat{x}_0\) are given by Romano et al. (2015):
where \(\alpha \equiv 2\pi f{\vec {x}}_0/c\) and \(j_L(\alpha )\) are spherical Bessel functions of order L. Here
is shorthand for the particular combination of spherical harmonics that enter the expression for \(R^G_{(lm)}(f)\) in (7.81). The two expressions in parentheses \((\ )\) for each response function are Wigner 3j symbols (see, e.g., Wigner 1959; Messiah 1962). Note that the curl response is now nonzero, and both response functions depend on frequency via the quantity \(\alpha \), which equals \(2\pi \) times the number of radiation wavelengths between the origin and the vertex of the interferometer. These expressions are valid in an arbitrary translated and rotated coordinate system, provided the angles for \(\hat{u}\), \(\hat{v}\), and \(\hat{x}_0\) are calculated in the rotated frame.
Thus, the spatial separation of a network of groundbased interferometers, or of a single interferometer at different times during its daily rotational and yearly orbital motion around the Sun (Sect. 5.5.3), allows for recovery of both the gradient and curl components of a gravitationalwave background. This is in contrast to a pulsar timing array, which is insensitive to the curl component, because one vertex of all the pulsar baselines are ‘pinned’ to the solar system barycenter. To illustrate this difference, we show in Fig. 56, maximumlikelihood recovered sky maps for simulated gradonly and curlonly anistropic backgrounds injected into noise for a 3detector network of groundbased interferometers (Hanford–Livingston–Virgo). The gradonly and curlonly backgrounds are the same as those used for the simulated maps in Fig. 55. In contrast to the recovered maps shown in that figure for the pulsar timing array, the maximumlikelihood maps (bottom row) for the network of groundbased interferometers reproduce the general angular structure of both the gradonly and curlonly injected maps (shown in the top row). (The noise for these injections degrades the recovery compared to the noiseless injections in Fig. 55). See Romano et al. (2015) for more details and related simulations.
Searches for other types of backgrounds/signals
No idea is so outlandish that it should not be considered with a searching but at the same time a steady eye. Winston Churchill
Since stochastic gravitationalwave backgrounds come in many different “flavors”, one needs additional search methods that go beyond the standard “vanilla” crosscorrelation search for a Gaussianstationary, unpolarized, isotropic signal (Sects. 4, 5) to extract the relevant information from the more exotic backgrounds. In Sect. 7, we discussed how to search for anisotropic signals, which are stronger coming from certain directions on the sky than from others. In this section, we discuss search methods for nonGaussian signals (Sect. 8.1), circularly polarized backgrounds (Sect. 8.2), and additional polarization modes predicted by alternative (nongeneralrelativity) metric theories of gravity (Sects. 8.3, 8.4, 8.5). In Sect. 8.6, we also briefly mention searches for other types of gravitationalwave signals, which are not really stochastic backgrounds, but nonetheless can be searched for using the basic idea of crosscorrelation, which we developed in Sect. 4. The majority of the search methods that we will describe here have been implemented “across the band”—i.e., for groundbased interferometers, spacebased interferometers, and pulsar timing arrays. For these methods, we will highlight any significant differences in the implementations for the different detectors, if there are any.
Of course, we do not have enough time or space in this section to do justice for all of these methods. As such, readers are strongly encouraged to read the original papers for more details. For nonGaussian backgrounds, see Drasco and Flanagan (2003), Seto (2009), Thrane (2013), Martellini and Regimbau (2014) and Cornish and Romano (2015); for circular polarization, see Seto and Taruya (2007, 2008) and Kato and Soda (2016); for polarization modes in alternative theories of gravity, see Lee et al. (2008), Nishizawa et al. (2009), Chamberlin and Siemens (2012) and Gair et al. (2015); and for the other types of signals, see Thrane et al. (2011) and Messenger et al. (2015).
NonGaussian backgrounds
In Sect. 2.1, we asked the question “when is a gravitationalwave signal stochastic” to highlight the practical distinction between searches for deterministic and stochastic signals. From an operational perspective, a signal is stochastic if it is best searched for using a stochastic signal model (i.e., one defined in terms of probability distributions), even if the signal is intrinsically deterministic, e.g., a superposition of sinusoids. This turns out to be the case if the signals are: (i) sufficiently weak that they are individually unresolvable in a single detector, and hence can only be detected by integrating their correlated contribution across multiple detectors over an extended period of time, or (ii) they are sufficiently numerous that they overlap in timefrequency space, again making them individually unresolvable, but producing a confusion noise that can be detected by crosscorrelation methods. If the rate of signals is large enough, the confusion noise will be Gaussian thanks to the central limit theorem. But if the rate or dutycycle is small, then the resulting stochastic signal will be nonGaussian and “popcornlike”, as we discussed in Sect. 1.1. This is the type of signal that we expect from the population of binary black holes that produced GW150914 and GW151226; and it is the type of signal that we will focus on in the following few subsections.
Figure 57 illustrates the above statements in the context of a simple toymodel signal consisting of simulated sineGaussian bursts (each with a width \(\sigma _t= 1~\mathrm{s}\)) having different rates or duty cycles. The left two panels correspond to the case where there is 1 burst every 10 seconds (on average). The probability distribution of the signal samples h (estimated by the histogram in the lowerlefthand panel) is far from Gaussian for this case. The right two panels correspond to 100 bursts every second (on average), for which the probability distribution is approximately Gaussiandistributed, as expected from the central limit theorem.
NonGaussian search methods: overview
There are basically two different approaches that one can take to search for nonGaussian stochastic signals: (i) The first is to incorporate the nonGaussianity of the signal into the likelihood function by marginalizing over the appropriate signal model (Sect. 8.1.2). Then given the likelihood, one can construct frequentist detection statistics and estimators from the maximumlikelihood ratio (3.24), or do Bayesian model selection in the usual way (Sect. 3). (ii) The second approach is to construct specific frequentist statistics that targets the higherorder moments of the nonGaussian distribution, and then use these statistics to do standard frequentist hypothesis testing and parameter estimation. This approach is most simply cast in terms of the skewness and (excess) kurtosis of the distribution, which are the third and fourthorder cumulants, defined as follows: If X is a random variable with probability distribution \(p_X(x)\), then the moments are defined by (Appendix B):
and the cumulants by
Note that \(c_1\) and \(c_2\) are just the mean \(\mu \) and variance \(\sigma ^2\) of the distribution. For a Gaussian distribution, \(c_3=0, c_4=0, \ldots \). For a distribution with zero mean, the above formulas simplify to \(c_1=0\), \(c_2=\mu _2\), \(c_3=\mu _3\), and \(c_4 = \mu _4  3\mu _2^2\). The higherordermoment approach requires 3rd or 4thorder correlation measurements (Sect. 8.1.5).
Likelihood function approach for nonGaussian backgrounds
Fundamentally, searching for nonGaussian stochastic signals is no different than searching for a Gaussian stochastic signal. In both cases one must: (i) specify a signal model, (ii) incorporate that signal model into a likelihood function or frequentist detection statistic/estimator, and (iii) then analyze the data to determine how likely it is that a signal is present. It is the choice of signal model, of course, that determines what type of signal is being searched for.
The signal model is incorporated into the likelihood via marginalization over the signal samples as discussed in Sect. 3.6.2. Assuming Gaussianstationary noise^{Footnote 22} with covariance matrix \(C_n\), the probability of observing data d in a network of detectors given signal model \(\bar{h}\) is (3.53):
where
are the residuals in detector I. (The subscript i labels either a time or frequency sample for the analysis, whichever is being used). Since one is often not interested in the particular values of \(\bar{h}\), but rather the values of the parameters \(\mathbf {\theta }_h\) that describe the signal, one marginalizes over \(\bar{h}\):
This yields a likelihood function that depends on the signal and noise parameters \(\mathbf {\theta }_h\), \(\mathbf {\theta }_n\equiv C_n\). It is this likelihood function that we then use for our statistical analysis.
Several different signal priors, which have been proposed in the literature, are given below. For simplicity, we will consider the case where the detectors are colocated and coaligned, and have isotropic antenna patterns, so that the contribution from the signal is the same in each detector, and is independent of direction on the sky. For real analyses, these simplifications will need to be dropped, as is done e.g., in Thrane (2013).
Gaussian signal prior
This is the standard prior that one uses for describing a Gaussianstochastic signal, and leads to the usual Gaussianstochastic crosscorrelation detection statistic (Sect. 4.4).
Drasco and Flanagan (2003) nonGaussian signal prior
This prior corresponds to Gaussian bursts occuring with probability \(0\le \xi \le 1\) and with rootmeansquare (rms) amplitude \(\alpha \).
MixtureGaussian signal prior
The mixtureGaussian signal prior is a nonGaussian distribution, which reduces to the Gaussian signal prior in the limit \(\xi \rightarrow 1\). It reduces to the Drasco and Flanagan signal prior in the limit \(\beta \rightarrow 0\).
Martellini and Regimbau (2014) nonGaussian signal prior
where
is the 4thorder Edgeworth expansion (Martellini and Regimbau 2014) of a nonGaussian distribution with third and fourthorder cumulants \(c_3\) and \(c_4\). (\(H_n(x)\) denotes a Hermite polynomial of order n). The Edgeworth expansion is referenced off a Gaussian probability distribution, and is thus said to be a semiparametric representation of a nonGaussian distribution. This prior reduces to the Drasco and Flanagan signal prior when \(c_3=0\), \(c_4=0\).
Multisinusoid signal prior
This is a deterministic signal prior, corresponding to the superposition of M sinusoids with unknown amplitudes, frequencies, and phases, \(\mathbf {\theta }_h = \{A_I, f_I, \varphi _I\vert I=1,2,\ldots , M\}\). This was one of the signal models used in Cornish and Romano (2015) to investigate the question of when is a signal stochastic.
Superposition of finiteduration deterministic signals
Here, \(\mathcal{T}(t\mathbf {\theta }_\mathcal{T})\) is a normalized waveform (template) for some deterministic signal (e.g., a chirp from an inspiralling binary, a sineGaussian burst, a ringdown signal, \(\ldots \)) described by parameters \(\mathbf {\theta }_\mathcal{T}\) (e.g., chirp mass, correlation time, frequency, \(\ldots \)). \(A_I\) is the amplitude of the Ith signal and \(t_I\) is its arrival time. Note that these signal waveforms can be extended in time, having a characteristic duration \(\tau \). Thus, this signal model is intermediate between the singlesample burst and multisinusoid signal models.
Generic likelihood for unresolvable signals
In Thrane (2013), Thrane writes down a generic likelihood function for a nonGaussian background formed from the superposition of signals which are individually unresolvable in a single detector. The likelihood function:
is defined for a pair of detectors I, J, and takes as its fundamental data vector estimates of the signaltonoise ratio of the crosscorrelated power in the two detectors:
where
Here \(\tau \) is the duration of the shortterm Fourier transforms and \(\delta f\) is the frequency resolution. (Note that \(\delta f\) can be greater than \(1/\tau \) if one averages together neighboring frequency bins). The product over i is over timefrequency pixels tf. The functions S and B are probability distributions for \(\hat{\rho }_i\) for the signal and noise models, respectively. These distributions are generic in the sense that they are to be estimated using Monte Carlo simulations with injected signals for the signal model S, and via timeslides on real data for the noise model B. They need not be Gaussian for either the signal or the detector noise. The vectors \(\mathbf {\theta }_h\) and \(\mathbf {\theta }_n\) denote parameters specific to the signal and noise models. Although the above likelihood function was not obtained by explicitly marginalizing over \(\bar{h}\), mathematically there is some signal prior and noise model which yields this likelihood upon marginalization.
Frequentist detection statistic for nonGaussian backgrounds
As discussed in Sect. 3.4, given likelihood functions for the signalplusnoise and noiseonly models, we can construct a frequentist detection statistic from either the maximumlikelihood ratio \(\Lambda _\mathrm{ML}(d)\) given by (3.24), or twice its logarithm, \(\Lambda (d) \equiv 2\ln (\Lambda _\mathrm{ML}(d))\), which has the interpretation of being the squared signaltonoise ratio of the relevant data. For a white Gaussian stochastic signal in white Gaussian detector noise (assuming a pair of colocated and coaligned detectors), we showed in Sect. 4.4:
where N is the number of samples, and where the last approximate equality assumes that the gravitationalwave signal is weak compared to the detector noise. We have added the superscript G to indicate that this is for a Gaussianstochastic signal model.
We can perform exactly the same calculations, making the same assumptions, for the likelihood functions constructed from any of the nonGaussian signal priors given above (in Sect. 8.1.2). These calculations have already been done for the Drasco–Flanagan and Martellini–Regimbau signal priors (Drasco and Flanagan 2003; Martellini and Regimbau 2014). The expressions that they find for the maximumlikelihood ratios \(\Lambda _\mathrm{ML}^\mathrm{NG}(d)\) for their nonGaussian signal models are rather long and not particularly informative, so we do not bother to write them down here (interested readers should see (1.8) in Drasco and Flanagan 2003, and the last equation in Martellini and Regimbau 2014). The values of the parameters that maximize the likelihood ratio are estimators of \(\xi \), \(\alpha \), \(S_{n_1}\), \(S_{n_2}\) for the Drasco and Flanagan signal model, and estimators of \(\xi \), \(\alpha \), \(c_3\), \(c_4\), \(S_{n_1}\), \(S_{n_2}\) for the Martellini and Regimbau signal model.
To illustrate the performance of a nonGaussian detection statistic, we plot in Fig. 58 the minimum value of \(\Omega _\mathrm{gw}\) (\(S_h\) in the notation above) necessary for detection as a function of the duty cycle \(\xi \). (The signal becomes Gaussian as \(\xi \rightarrow 1\)). The solid line is the theoretical prediction for the Drasco and Flanagan nonGaussian maximumlikelihood statistic, while the dashed line is the theoretical prediction for the standard Gaussianstochastic crosscorrelation statistic. The dotted line is the theoretical prediction for a singledetector burst statistic, which is just the maximum of the absolute value of the data samples in e.g., detector 1: \(\Lambda ^\mathrm{B}(d) = \max _{i} d_{1i}\). The false alarm and false dismissal probabilities were both chosen to equal 0.01 for this calculation. From the figure one sees that for \(\xi \,{\gtrsim }10^{3}\), the Gaussianstochastic crosscorrelation statistic performs best. For smaller values of \(\xi \), the nonGaussian statistic is better. In particular, for \(\xi \sim 10^{4}\). there is a factor of \({\sim }2\) improvement in the minimum detectable signal amplitude if one uses the nonGaussian maximumlikelihood detection statistic.
Figure 59 is taken from Thrane (2013) and shows posterior distributions for the duty cycle \(\xi \) calculated for Monte Carlo simulations corresponding to pure background \(\xi =0\) (dashdot blue), pure signal \(\xi =1\) (solid red), and an even mixture \(\xi =0.5\) (dashed green). These curves illustrate that the formalism in Thrane (2013) can provide estimates of the duty cycle \(\xi \) of the nonGaussian background. See Thrane (2013) for more details.
Bayesian model selection
As an alternative to using frequentist detection statistics and estimators to search for potentially nonGaussian signals, one can use Bayesian model selection to compare the noiseonly model \(\mathcal{M}_0\) to different signalplusnoise models \(\mathcal{M}_1, \mathcal{M}_2, \ldots \). This is a general procedure for Bayesian inference, which was discussed in Sect. 3.3.3. As shown there, the posterior odds ratio between two different models \(\mathcal{M}_\alpha \) and \(\mathcal{M}_\beta \) can be written as
where the first ratio on the righthand side is the prior odds for the two models, while the second term is the Bayes factor:
which is a ratio of model evidences:
and similarly for \(p(d\mathcal{M}_\beta )\). If one assumes equal prior odds, then the posterior odds ratio is just the Bayes factor, and we can use its value to rule in favor of one model or another (see Table 3).
The idea of using Bayesian model selection in the context of searches for nonGaussian stochastic backgrounds was proposed by us in Cornish and Romano (2015). We considered a simple toyproblem consisting of simulated data in two colocated and coaligned detectors, having uncorrelated white Gaussian detector noise plus a gravitationalwave signal formed from the superposition of sinusoids having amplitudes drawn from an astrophysical population of sources. Such a signal is effectively the frequencydomain version of the shortduration timedomain bursts discussed in the previous subsections. Five different models were considered:

\(\mathcal{M}_0\): noiseonly model, consisting of uncorrelated white Gaussian noise in two detectors with unknown variances \(\sigma _1^2\), \(\sigma _2^2\).

\(\mathcal{M}_1\): noise plus the Gaussianstochastic signal model defined by (8.6).

\(\mathcal{M}_2\): noise plus the mixtureGaussian stochastic signal model defined by (8.8).

\(\mathcal{M}_3\): noise plus the deterministic multisinusoid model defined by (8.11).

\(\mathcal{M}_4\): noise plus the deterministic multisinusoid signal model plus the Gaussianstochastic signal model. This is a hybrid signal model that allows for both stochastic and deterministic components for the signal.
Simulated data were generated by coadding sinusoidal signals with amplitudes drawn from an astrophysical model (Sesana 2013), and phases and frequencies drawn uniformly across the range spanned by the data. Gaussiandistributed white noise for the two detectors were then added to the signal data. The amplitude of the signals were scaled so as to produce a specified matched filter signaltonoise ratio per frequency bin. Markov Chain Monte Carlo analyses were run to compare the noiseonly model \(\mathcal{M}_0\) to each of the four signalplusnoise models \(\mathcal{M}_1,\ldots , \mathcal{M}_4\). Quantile intervals for the Bayes factors were estimated from 256 independent realizations of the simulated data for each set of parameter values. These intervals capture the fluctuation in the Bayes factors that come from different realizations of the data; they are not uncertainties in the Bayes factors associated with different Monte Carlo simulations for a single realization, which were \({\lesssim }10\%\).
Figure 60 is a representative plot taken from Cornish and Romano (2015), comparing the different models. The left panel shows the Bayes factors for the four different signalplusnoise models relative to the noiseonly model plotted as a function of the average number of sources per bin. The right panel shows the fraction of time that the different models had the largest Bayes factor for the different simulations. The total number of bins was set to 32 for these simulations and the SNR per bin was fixed at 2. From these and other similar plots in Cornish and Romano (2015), one can draw the general conclusion that deterministic models are generally favored for small source densities, a nonGaussian stochastic model is preferred for intermediate source densities, and a Gaussianstochastic model is preferred for large source densities. Given the large fluctuations in the Bayes factors associated with different signal realizations, the boundaries between these three regimes is rather fuzzy. The hybrid model, which has a deterministic component for the bright signals and a Gaussianstochastic component for the remaining confusion background, is the best model for the majority of cases.
Fourthorder correlation approach for nonGaussian backgrounds
In this section, we briefly describe a fourthorder correlation approach for detecting nonGaussian stochastic signals, originally proposed in Seto (2009). The key idea is that by forming a particular combination of data from 4 detectors (the excess kurtosis), one can separate the nonGaussian contribution to the background from any Gaussiandistributed component. This approach requires that the noise in the four detectors be uncorrelated with one another, but it does not require that the noise be Gaussian. Here we sketch out the calculation for 4 colocated and coaligned detectors, which we will assume have isotropic antenna patterns, so that the contribution from the gravitationalwave signal is the same in each detector, and is independent of direction on the sky. These simplifying assumptions are not essential for this approach; the calculation for separated and misalinged detectors with nonisotropic response functions can also be done (Seto 2009).
Let’s begin then by denoting the output of the four detectors \(I=1,2,3,4\) in the Fourier domain by
where \(\tilde{n}_I\) denotes the noise in detector I and \(\tilde{h}\) denotes the total gravitationalwave contribution, which has a Gaussianstochastic component \(\tilde{g}\), and a nonGaussian component formed from the superposition of shortduration burst signals \(\tilde{b}_i\), \(i=1,2,\ldots , n\). We assume that the noise in the detectors are uncorrelated with one another and with the gravitationalwave signals, and that the individual gravitationalwave signals are also uncorrelated amongst themselves. The (random) number of bursts present in a particular segment of data is determined by a Poisson distribution
where
is the expected number of bursts in segment duration \(T_\mathrm{seg}\). The 4thorder combination of data that we consider is
where angle brackets \(\langle \ \rangle \) can be thought of as either expectation value (i.e., ensemble average) or as an average over the Fourier components of the data, i.e., as an estimator of the expected correlations. Since the noise in the detectors are uncorrelated with everything, the only contributions to \(\mathcal{K}\) will come from expectation values of products of \(\tilde{h} = \tilde{g} +\sum _i \tilde{b}_i\) with itself. Calculating the quadratic terms that enter (8.23), we find:
where we used
which assumes that all the bursts have the same meansquare value, \(\langle \tilde{b}_i\tilde{b}_i\rangle \equiv \langle \tilde{b}\tilde{b}\rangle \). For the 4thorder term, we find:
Substituting these results back into expression (8.23) yields:
where we used
for the Gaussianstochastic signal component \(\tilde{g}\). Thus, both the detector noise and the Gaussianstochastic component of the signal have dropped out of the expression for \(\mathcal{K}\), leaving only the contribution from the nonGaussian component of the background.
As mentioned already, the above calculation can be extended to the case of separated and misaligned detectors (Seto 2009). In so doing, one obtains expressions for generalized (4thorder) overlap functions, which are skyaverages of the product of the response functions for four different detectors. The expected value of the 4thorder detection statistic for this more general analysis involves generalized overlap functions for both the (squared) overall intensity and circular polarization components of the nonGaussian background. We will discuss circular polarization in the following section, but in the simpler context of Gaussianstationary isotropic backgrounds. Readers should see Seto (2009) for more details regarding circular polarization in the context of nonGaussian stochastic signals discussed above.
Circular polarization
Up until now, we have only considered unpolarized stochastic backgrounds. That is, we have assumed that the gravitationalwave power in the \(+\) and \(\times \) polarization modes are equal (on average) and are statistically independent of one another (i.e., there are no correlations between the \(+\) and \(\times \) polarization modes). It is possible, however, for some processes in the early Universe to give rise to parity violations (Alexander et al. 2006), which would manifest themselves as an asymmetry in the amount of right and left circularly polarized gravitational waves. Following Seto and Taruya (2007, 2008), we now describe how to generalize our crosscorrelation methods to look for evidence of circular polarization in a stochastic background.
Polarization correlation matrix
Let us start by writing down the quadratic expectation values for the Fourier components \(h_{ab}(f,\hat{n})\) of the metric perturbations \(h_{ab}(t,{\vec {x}})\) for a polarized anisotropic Gaussianstationary background. (We will restrict attention to isotropic backgrounds later on). It turns out that these expectation values can be written in terms of the Stokes’ parameters I, Q, U, and V, which are defined for a monochromatic plane gravitational wave in Appendix A. If we expand \(h_{ab}(f,\hat{n})\) in terms of the linear polarization basis tensors \(e^A_{ab}(\hat{n})\), where \(A=\{+,\times \}\), we have
where
If instead we expand \(h_{ab}(f,\hat{n})\) in terms of the circular polarization basis tensors \(e^C_{ab}(\hat{n})\), where \(C=\{R,L\}\), then
where
This second representation of the polarization correlation matrix is sometimes more convenient when one is searching for evidence of circular polarization in the background, as V is a measure of a possible asymmetry between the right and left circular polarization components:
The factor of 1/2 on the righthand side of the above equation, as compared to (A.16), is for onesided power spectra.
As discussed in Appendix A, the Stokes’ parameters I and V are ordinary scalar (spin 0) fields on the sphere, while Q and U transform like spin 4 fields under a rotation of the unit vectors \(\{\hat{l},\hat{m}\}\) tangent to the sphere. Thus, I and V can be written as linear combinations of the ordinary spherical harmonics \(Y_{lm}(\hat{n})\):
while \(Q\pm iU\) can be written as linear combination of the spinweighted \(\pm 4\) spherical harmonics \({}_{\pm 4}Y_{lm}(\hat{n})\):
Note that the expansions for \(Q\pm i U\) start at \(l=4\), which means that the Q, U components of the polarization correlation matrix vanish if the background is isotropic (i.e., has only a contribution from the monopole \(l=0\), \(m=0\)). So for simplicity, we will restrict our attention to polarized isotropic backgrounds, for which the circular polarization correlation matrix becomes diagonal and the quadratic exprectation values reduce to:
where
Note that
which is just the total strain power spectral density for the gravitationalwave background.
Overlap functions
Given (8.36), we are now in a position to calculate the expected value of the product of the Fourier transforms of the response of two detectors I and J to such a background. Similar to (5.9), we can write the response of detector I as
where R, L label the right and left circular polarization states for both the Fourier components and the detector response functions. Writing down a similar expression for the response of detector J, and using (8.36) to evaluate the expected value of the product of the responses, we find
where
are the overlap functions for the I and V Stokes parameters for a polarized isotropic stochastic background. Using
we can also write the above overlap functions as
Note that \(\Gamma ^{(I)}_{IJ}(f)\) is identical to the ordinary overlap function \(\Gamma _{IJ}(f)\) for an isotropic background (5.36).
Figure 61 show plots of the I and V overlap functions for the LIGOVirgo detector pairs, using the smallantenna limit for the strain response functions. The overlap functions have been normalized (5.40) so that \(\gamma ^{(I)}_{IJ}(f)=1\) for colocated and coaligned detectors. Similar plots can be made for other interferometer pairs, by simply using the appropriate response functions for those detectors.
Note that for pulsar timing, one can show that \(\Gamma ^{(V)}_{IJ}(f) = 0\) for any pair of pulsars. This means that one cannot detect the presence of a circularly polarized stochastic background using a pulsar timing array if one restricts attention to just the isotropic component of the background. One must include higherorder multipoles in the analysis—i.e., do an anisotropic search as discussed in Sect. 7. Such an analysis for anisotropic polarized backgrounds using pulsar timing arrays is given in Kato and Soda (2016). In that paper, they extend the analysis of Mingarelli et al. (2013) to include circular polarization. See Kato and Soda (2016) for additional details.
Component separation: ML estimates of I and V
As shown in Seto and Taruya (2007, 2008), in order to separate the I(f) and V(f) contributions to a polarized isotropic background at each frequency f, we will need to analyze data from at least two independent baselines (so three or more detectors). In what follows, we will use the notation \(\alpha =1,2,\ldots , N_b\) to denote the individual baselines (detector pairs) and \(\alpha _1\), \(\alpha _2\) to denote the two detectors that constitute that baseline. The formalism we adopt here is similar to that for constructing maximumlikelihood estimators of gravitationalwave power for unpolarized anisotropic backgrounds (Sect. 7.3). For a general discussion of component separation for isotropic backgrounds, see Parida et al. (2016).
As usual, we begin by crosscorrelating the data from pairs of detectors for the independent baselines:
where
are the Fourier transforms of the timedomain data \(d_{\alpha _I}(t)\), and T is the duration of the data. Assuming that the noise in the individual detectors are uncorrelated with one another, we can easily calculate the expected value of \(\hat{C}_\alpha (f)\) using our previous result (8.40). The result is
We will write this equation abstractly as a matrix equation
where
In this notation, \(\hat{C}\) is an \(N_f N_b\times 1\) data vector, M is an \(N_f N_b\times 2 N_f\) detector network response matrix, and \(\mathcal{S}\) is an \(2N_f\times 1\) vector containing the unknown Stokes’ parameters, which we want to estimate from the data.^{Footnote 23}
We also need an expression for the noise covariance matrix \(\mathcal{N}\) for the crosscorrelated data \(\hat{C}\). In the weaksignal limit, the covariance matrix is approximately diagonal with matrix elements
where \(P_{n_{\alpha _I}}(f)\) are the onesided power spectral densities of the noise in the detectors. If we treat the noise power spectra as known quantities (or if we estimate the noise power spectra from the autocorrelated output of each detector), we can write down a likelihood function for the crosscorrelated data given the signal model (8.47). Assuming a Gaussianstationary distribution for the noise, we have
This likelihood has exactly the same form as that in (7.32), so the maximumlikehood estimators for the Stokes’ parameters \(\mathcal{S} = [I, V]^T\) also have the same form:
where
with M and \(\mathcal{N}\) given above. As before, inverting F may require some sort of regularization, e.g., using singularvalue decomposition (Sect. 7.3.5). If that’s the case then \(F^{1}\) should be replaced in the above formula by its pseudoinverse \(F^+\). The uncertainty in the maximum likelihood recovered values is given by the covariance matrix
where we are again assuming the weaksignal limit.
Example: component separation for two baselines
As an explicit example, we now write down the maximumlikelihood estimators for the Stokes’ parameters \(\mathcal{S} = [I, V]^T\) for a detector network consisting of two baselines \(\alpha \) and \(\beta \). For this case, the detector network response matrix M is a square \(2 N_f\times 2 N_f\) matrix, which we assume has nonzero determinant. Then it follows simply from the definitions (8.52) of F and X that
for which
The marginalized uncertainties in these estimates are obtained by taking the diagonal elements of the inverse of the Fisher matrix:
where \(N_\alpha \), \(N_\beta \), defined by \(N_\alpha (f)\equiv P_{n_{\alpha _1}}(f) P_{n_{\alpha _2}}(f)\) (and similarly for \(N_\beta \)), is a diagonal element of the noise covariance matrix \(\mathcal{N}\) (8.49).
Effective overlap functions for I and V for multiple baselines
The above expressions for the uncertainties in the estimates of I and V can easily be extended to the case of an arbitrary number of baselines \(\alpha = 1,2,\ldots , N_b\). For multiple baselines with noise spectra \(N_\alpha (f)\equiv P_{n_{\alpha _1}}(f) P_{n_{\alpha _2}}(f)\), one can show that
Let us assume that the determinant of the \(2\times 2\) matrices for each frequency (which we will denote by \(\bar{F}\)) are not equal to zero. Then
Following Seto and Taruya (2008), we can now define effective overlap functions for I and V associated with a multibaseline detector network by basically inverting the above uncertainties. For simplicity, we will assume that the noise power spectra for the detectors are equal to one another so that \(N_\alpha \equiv N\) can be factored out of the above expressions. We then define
These quantities give us an indication of how sensitive the multibaseline network is to extracting the I and V components of the background. Plots of \(\Gamma _\mathrm{eff}^{(I)}(f)\) and \(\Gamma _\mathrm{eff}^{(V)}(f)\) are shown in Fig. 62 for the multibaseline network formed from the LIGO Hanford, LIGO Livingston, and Virgo detectors. Recall that the overlap functions for the individual detectors pairs are shown in Fig. 61. Dips in sensitivity correspond to frequencies where the determinant of \(\bar{F}\) is zero (or close to zero).
NonGR polarization modes: preliminaries
In a general metric theory of gravity, there are six possible polarization modes: The standard \(+\) and \(\times \) tensor modes predicted by general relativity (GR); two vector (or “shear”) modes, which we will denote by X and Y; and two scalar modes: a “breathing” mode B and a pure longitudinal mode L (see, e.g., Nishizawa et al. 2009). The tensor and breathing modes are transverse to the direction of propagation, while the two vector modes and the scalar longitudinal mode have longitudinal components (parallel to the direction of propagation). See Fig. 63.
In terms of the orthonormal vectors \(\{\hat{n},\hat{l},\hat{m}\}\) defined by (2.4), the polarization basis tensors for the six different polarization modes are:
We will denote these collectively as \(e^A_{ab}(\hat{n})\), where \(A=\{+,\times ,X,Y,B,L\}\). In a coordinate system where \(\hat{n}\) points along the zaxis, and \(\hat{l}\) and \(\hat{m}\) point along the x and y axes, the polarization tensors can be represented by the following \(3\times 3\) matrices:
Transformation of the polarization tensors under a rotation about \(\hat{n}\)
We have already seen (Appendix A) that under a rotation of the unit vectors \(\{\hat{l}, \hat{m}\}\) by an angle \(\psi \) around \(\hat{n}\), the polarization tensors \(e^+_{ab}(\hat{n})\), \(e^\times _{ab}(\hat{n})\) transform to:
This reflects the spin 2 nature of the tensor modes \(+\), \(\times \) in general relativity. Similarly, under the same rotation, the polarization tensors \(e^X_{ab}(\hat{n})\), \(e^Y_{ab}(\hat{n})\) transform to:
while \(e^B_{ab}(\hat{n})\), \(e^L_{ab}(\hat{n})\) are left unchanged:
These last two transformations correspond to the spin 1 nature of the vector modes X, Y, and the spin 0 nature of the scalar modes B, L.
Polarization and spherical harmonic basis expansions
For the tensor modes \(+\), \(\times \), we found (Sect. 2.2.2) that it was convenient to expand the Fourier components \(h_{ab}(f,\hat{k})\) of the metric perturbations \(h_{ab}(t,{\vec {x}})\) in terms of either the polarization basis tensors:
or the gradient and curl tensor spherical harmonics:
Recall that \(Y^G\) and \(Y^C\) are related to spinweight \(\pm 2\) spherical harmonics as described in Appendices G and E. For the vector and scalar modes we can perform similar expansions, provided we use appropriately defined tensor spherical harmonics, which transform properly under rotations. For the vector modes X, Y, we need to use the vectorgradient and vectorcurl spherical harmonics \(Y^{V_G}\), \(Y^{V_C}\), which are defined in terms of spinweight \(\pm 1\) spherical harmonics (Appendices F and E). For the scalar modes, we can use
which are defined in terms of ordinary (scalar) spherical harmonics. In terms of these definitions, we can write the expansions in compact form
or
where \(A=\{+,\times , X, Y, B, L\}\) and \(P=\{G,C, V_G, V_C, B, L\}\) or some subsets thereof. Note that \(\sum _{(lm)}\) is shorthand for
for the tensor, vector, and scalar modes, respectively.
Detector response
The detector response functions corresponding to the above two expansions (8.68) and (8.69) are:
and
In terms of these response functions, the detector response (in the frequency domain) to a gravitationalwave background (2.1) is:
or
Searches for nonGR polarizations using different detectors
Evidence for nonGR polarization modes can show up in searches for either deterministic or stochastic gravitationalwave signals. Whether these alternative polarization modes are first discovered from the observation of gravitational waves from a resolvable source (like a binary black hole merger) or from a stochastic background depends in part on the type and number of detectors making the observations. For example, individual binary black holes mergers (GW150914 and GW151226) have already been observed by advanced LIGO. But it was not possible to extract information about the polarization of the waves, since the two LIGO interferometers are effectively coaligned (and hence see the same polarization). Adding Virgo, KAGRA, and LIGOIndia to the global network will eventually allow for the extraction of this polarization information. Pulsar timing arrays, on the other hand, are expected to first detect a stochastic background from the inspirals of SMBHBs in the centers of distant galaxies (Rosado et al. 2015). So if evidence of alternative polarization modes are discovered by pulsar timing, it will mostlikely first come from stochastic background observations.
In the following sections, we describe stochastic background search methods for nonGR polarization modes using both groundbased interferometers (Sect. 8.4) and pulsar timing arrays (Sect. 8.5). We will calculate antenna patterns, overlap functions, and discuss component separation for the tensor, vector, and scalar polarization modes. For groundbased interferometers, our discussion will be based on Nishizawa et al. (2009). For pulsar timing arrays, see Lee et al. (2008), Chamberlin and Siemens (2012) and Gair et al. (2015).
Searches for nonGR polarizations using groundbased detectors
We now describe crosscorrelation searches for nonGR polarization modes using a network of groundbased laser interferometers. For additional details, see Nishizawa et al. (2009).
Response functions
For groundbased interferometers in the small antenna limit, the strain response functions \(R^A(f,\hat{n})\) for the different polarization modes \(A=\{+,\times , X,Y, B, L\}\) are given by
where \(\hat{u}\), \(\hat{v}\) are unit vectors pointing in the direction of the arms of the interferometer, and where we have chosen the origin of coordinates to be at the vertex of the interferometer. Note that there is no frequency dependence of the response function in the smallantenna limit. Assuming a \(90^\circ \) opening angle between the interferometer arms, and choosing a coordinate system such that \(\hat{u}\) and \(\hat{v}\) point in the \(\hat{x}\) and \(\hat{y}\) direction, we find
where we used (2.4) for our definition of \(\{\hat{n},\hat{l},\hat{m}\}\).
From these expressions, we see that the response functions for the breathing and longitudinal modes differ only by a constant multiplicative factor of \(\sqrt{2}\). This degeneracy means that we will not be able to distinguish these two polarization modes using groundbased interferometers. Plots of the antenna patterns \(R^A(\hat{n})\) for the six different polarization modes are shown in Fig. 64. Note that the overall magnitude of the response gets smaller as one moves from tensor, to vector, to scalar polarization modes. In Fig. 65, we plot the “peanut” antenna patterns for the response to unpolarized gravitational waves for the tensor, vector, and scalar modes, respectively. By unpolarized we simply mean that the incident gravitational waves have equal power in the \(+\) and \(\times \) polarizations for the tensor modes; equal power in the X and Y polarizations for the vector modes, and equal power in the B and L polarizations for the scalar modes.
Overlap functions
Similar to what we did in Sect. 8.2.2, let us assume that the stochastic background is independently polarized, but is otherwise Gaussianstationary and isotropic. This means that the quadratic expectation values of the Fourier components of the metric perturbations can be written as
where \(A=\{+,\times ,X,Y,B,L\}\). The functions \(S_h^A(f)\) are such that
are the onesided strain spectral densities for the tensor, vector, and scalar modes individually. For simplicity, we will also assume that the tensor, vector, and scalar modes are individually unpolarized so that \(S_h^+(f)=S_h^\times (f)\), \(S_h^X(f)=S_h^Y(f)\), etc. All of these assumptions together define the stochastic signal model for this example.
The above expectation values (8.77) can now be used to calculate the expected value of the correlated response of two detectors to such a background. Writing the response of detector I as
it follows (as we have done many times before) that
where
are the corresponding overlap functions for the tensor, vector, and scalar modes \(\{T,V,S\}\). Note that \(\Gamma ^{(T)}_{IJ}(f)\) is identical to the ordinary overlap function \(\Gamma _{IJ}(f)\) for an isotropic background (5.36).
Figure 66 show plots of the tensor, vector, and scalar overlap functions for the three different LIGOVirgo detector pairs. The overlap functions have been normalized so that they equal 1 for colocated and coaligned detectors. This requires multiplying \(\Gamma _{IJ}(f)\) by a factor of 5 for the tensor and vector overlap functions (5.40), but by a factor of 10 for the scalar overlap functions.
Component separation: ML estimates of \(S_h^{(T)}\), \(S_h^{(V)}\), and \(S_h^{(S)}\)
Proceeding along the same lines as in Sect. 8.2.3, we now describe a method for separating the tensor, vector, and scalar contributions to the total strain spectral density. As shown in Nishizawa et al. (2009), we will need to analyze data from at least three independent baselines (so at least three detectors) to separate the tensor, vector, and scalar contributions at each frequency f. As before, we will adopt the notation \(\alpha =1,2,\ldots , N_b\) to denote the individual baselines (detector pairs) and \(\alpha _1\), \(\alpha _2\) to denote the two detectors that constitute that baseline.
Our starting point is again the crosscorrelated data from pairs of detectors in the network:
where
Assuming that the noise in the individual detectors are uncorrelated with one another, it follows that
In addition,
where \(P_{n_{\alpha _I}}(f)\) are the onesided power spectral densities of the noise in the detectors, and where we have assumed again that the gravitationalwave signal is weak compared to the detector noise. As we did in Sect. 8.2.3 we can write down a likelihood function for the crosscorrelated data given the signal model (8.84):
Here we have adopted the matrix notation:
Since \(\mathcal{A}\) enters quadratically in the exponential, we have the usual expression for the maximumlikehood estimators:
where
with M and \(\mathcal{N}\) given above, and with the standard proviso about possibly having to use singularvalue decomposition to invert F. The uncertainty in the maximumlikelihood recovered values is given by the covariance matrix
which we will use below to define effective overlap functions for the tensor, vector, and scalar modes for a multibaseline network of detectors.
Effective overlap functions for multiple baselines
For a multibaseline network of detectors, one has
where \(N_\alpha (f)\equiv P_{n_{\alpha _1}}(f) P_{n_{\alpha _2}}(f)\). Let us assume that the determinant of the \(3\times 3\) matrices for each frequency (which we will denote by \(\bar{F}\)) are not equal to zero. Then the uncertainties in the estimators of \(S_h^{(T)}\), \(S_h^{(V)}\), and \(S_h^{(S)}\) can be written as
Following Nishizawa et al. (2009), we can now define the effective overlap functions for the tensor, vector, and scalar modes, associated with a multibaseline detector network. As we did in Sect. 8.2.5, we will assume for simplicity that the noise power spectra for the detectors are equal to one another so that \(N_\alpha \equiv N\) can be factored out of the above expressions. We then define
Plots of \(\Gamma _\mathrm{eff}^{(T)}(f)\), \(\Gamma _\mathrm{eff}^{(V)}(f)\), and \(\Gamma _\mathrm{eff}^{(S)}(f)\) are shown in Fig. 67 for the multibaseline network formed from the LIGO Hanford, LIGO Livingston, and Virgo detectors. Dips in sensitivity correspond to frequencies where the determinant of \(\bar{F}\) is zero (or close to zero).
Searches for nonGR polarizations using pulsar timing arrays
As discussed in Sect. 8.3.4 it is also possible to search for nonGR polarizations using a pulsar timing array. Although the general concepts are the same as those for groundbased interferometers, there are some important differences, as the vector and scalar longitudinal polarization modes require keeping the pulsar term in the response functions to avoid possible singularities. We shall see below that the sensitivity to the vector and scalar longitudinal modes increases dramatically when crosscorrelating data from pairs of pulsars with small angular separations. For additional details, see Lee et al. (2008), Chamberlin and Siemens (2012) and Gair et al. (2015).
Polarization basis response functions
For pulsar timing, the response functions for Doppler frequency measurements for the different polarization modes \(A=\{+,\times , X,Y,B,L\}\) are given by
where \(\hat{p}\) points in the direction to the pulsar and L is its distance from Earth (see Sect. 5.2.1 with \(\hat{p}=\hat{u}\)). Without loss of generality, we have assumed that the location of the measurement is at the origin of coordinates. Note that we have kept the pulsar term (the second term in the square brackets) since, as we shall see below, it is needed to get finite expressions for the response and overlap functions for the vector and scalar longitudinal modes.
Choosing our coordinate system so that \(\hat{z}\) points along \(\hat{p}\), we find:
where we used (2.4) for our definitions of \(\{\hat{n},\hat{l},\hat{m}\}\). Note that the response functions for the breathing mode B and the tensor \(+\) mode have the same form for our particular choice of \(\{\hat{l},\hat{m}\}\). This is not a problem, however, as we can still distinguish these modes due to their different behavior under rotations. The difference between the breathing and tensor modes becomes more apparent in terms of the spherical harmonic basis response functions \(R^B_{(lm)}(f)\) and \(R^G_{(lm)}(f)\), which are given in (8.98).
If we did not include the pulsar terms in the above expressions, then the response functions for both the vector and scalar longitudinal modes would become singular at \(\theta = 0\) (i.e., \(\cos \theta =1\)).^{Footnote 24} The factor of \(\sin \theta \) in the numerator for \(R^X(f,\hat{n})\) “softens” the \((1\cos \theta )^{1}\) singularity to \((1\cos \theta )^{1/2}\), so that it becomes integrable when calculating the vector longitudinal overlap functions (Lee et al. 2008; Chamberlin and Siemens 2012; Gair et al. 2015). (We will discuss this in more detail in Sect. 8.5.3). By keeping the pulsar term we remove these singularities as can be seen by expanding the full expressions in (8.95) for \(\theta \ll 1\):
where \(y\equiv 2\pi fL/c\), and we have assumed that \(y\theta ^2\) is also sufficiently small that we could Taylor expand the exponential. Since the typical distance to a pulsar is a few kiloparsecs and \(f= 3 \times 10^{9}~\mathrm{Hz}\) for 10 yr of observation, we have \(y~\sim 10^4\), which means \(\theta \lesssim 10^{2}\) for the above expansions to be valid. Thus, for small angular separations between the direction to the pulsar and the direction to the gravitational wave, the response to the scalarlongitudinal modes will be more than an orderofmagnitude larger than that for the vector modes, and several ordersofmagnitude larger than that for both the tensor and breathing modes. This increased sensitivity of the scalar longitudinal and vector longitudinal modes will also become apparent when we calculate the overlap functions for a pair of pulsars (see Sect. 8.5.3; Fig. 68).
Spherical harmonic basis response functions
It is also interesting to calculate the Dopplerfrequency response functions for the tensor spherical harmonic components \(P=\{G, C, V_G, V_C, B, L\}\). The general expression is given by:
As shown in Gair et al. (2015), the above integral can be evaluated and then simplified by taking the limit \(y \gg 1\), which as we mentioned above is valid for typical pulsars. The final results (taken from that paper) are:
where \({}^{(1)}\!N_l\) and \({}^{(2)}\!N_l\) are constants defined by (F.3) and (G.2), and
There are several important features to highlight about these expressions: (i) All of the response functions depend in the same way on the angular position of the pulsar, which is simply \(Y_{lm}(\hat{p})\). (ii) Just as we saw earlier (5.23) that the response to the tensor curl mode is zero, so too is the response to the vector curl mode. Thus, pulsar timing arrays are also insensitive to the curl component of the vectorlongitudinal modes. (iii) In the limit \(y\gg 1\), only the response to the scalarlongitudinal mode has frequency dependence (via y). (iv) The response to the breathing mode has nonzero contributions only from \(l=0\) and \(l=1\). In terms of power (which is effectively the square of the response), this means that pulsar timing observations will be insensitive to anisotropies in power in the breathing mode beyond quadrupole (i.e., \(l=2\)).
Overlap functions
To calculate the overlap functions for nonGR polarization modes for pulsar timing arrays, we will proceed as we did in Sect. 8.4.2, assuming that the stochastic background is independently polarized, but is otherwise Gaussianstationary and isotropic. (Extensions to anisotropic backgrounds will be briefly mentioned in Sect. 8.5.4. Details can be found in Gair et al. 2015). Making these assumptions, the quadratic expectation values of the Fourier coefficients \(h_A(f,\hat{n})\) take the form
where \(S^A_h(f)\) are the onesided strain spectral densities for the individual polarization modes. The overlap functions can then be calculated in the usual way, leading to
where
Note the factor of \(1/4\pi \) as compared to \(1/8\pi \) in (8.81), and that there is no summation over A on the righthand side of this expression.
For simplicity we will also assume as before that the tensor modes \(\{+,\times \}\) and the vectorlongitudinal modes \(\{X, Y\}\) are unpolarized, so that
Then we can define:
for the unpolarized tensor and vectorlongitudinal components. But we will keep the breathing and scalarlongitudinal overlap functions separate:
given the complications that arise when trying to explicitly calculate \(\Gamma ^L_{IJ}(f)\).
As noted in Sect. 5.4.3, the overlap function for the tensor modes can be calculated analytically (Hellings and Downs 1983), without needing to include the pulsar term in the response functions:
where \(\zeta _{IJ}\) is the angle between two Earthpulsar baselines, i.e., \(\cos \zeta _{IJ} = \hat{p}_I\cdot \hat{p}_J\). The above expression differs from (5.53) by an overall normalization. The overlap functions for the breathing mode and for the vector longitudinal modes can be also be calculated analytically, again without needing to include the pulsar term in the response. For the breathing mode we have
For the vectorlongitudinal modes we have (Lee et al. 2008; Gair et al. 2015)
where we have assumed here that the angular separation \(\zeta _{IJ}\) is not too small. In the limit \(\zeta _{IJ}\rightarrow 0\), the above expression for \(\Gamma ^{(V)}_{IJ}\) diverges, which means that we need to include the pulsar terms in the response functions to handle that case. Doing so results in an expression that is finite, but depends on the frequency f via the distances to the pulsars, \(2\pi fL_I/c\) and \(2\pi fL_J/c\). (See Appendix J of Gair et al. (2015) for an analytic expression for \(\Gamma ^{(V)}_{IJ}(f)\) in the limit \(\zeta _{IJ}\rightarrow 0\).)
Finally, for the scalar longitudinal overlap function \(\Gamma ^L_{IJ}(f)\), there is no known analytic expression for the integral in (8.105), except in the limit of codirectional (\(\zeta _{IJ}=0\)) and antidirectional (\(\zeta _{IJ}=\pi \)) pulsars (Lee et al. 2008; Chamberlin and Siemens 2012; Gair et al. 2015). The pulsar terms need to be included in the scalarlongitudinal response functions for all cases to obtain a finite result, which again depend on the frequency f via the distances to the pulsars. A semianalytic expression for \(\Gamma ^L_{IJ}(f)\) is derived in Gair et al. (2015), which is valid in the \(2\pi fL/c \gg 1\) limit. The semianalytic expression effectively replaces the double integral over directions on the sky \(\hat{n}=(\theta ,\phi )\) with just a single numerical integration over \(\theta \). See Gair et al. (2015) for additional details regarding that calculation.
Plots of the normalized overlap functions for the tensor, vectorlongitudinal, breathing and scalarlongitudinal modes are shown in Fig. 68, plotted as functions of the angular separation \(\zeta \) between pairs of pulsars. The normalization is the same for each overlap function, chosen so that the tensor overlap function agrees with the normalized Hellings and Downs curve (5.53). The plots for the tensor, vectorlongitudinal, and breathing modes are all real and do not depend on frequency; the plot for the scalarlongitudinal modes has both real and imaginary components (imaginary shown in red), and depends on frequency via the distances to the pulsars. For the scalarlongitudinal overlap function, we chose \(y_1=1000\) and \(y_2=2000\) for all pulsar pairs, where \(y\equiv 2\pi fL/c\), and we did the integration numerically over both \(\theta \) and \(\phi \). Note the different vertical scales for the vectorlongitudinal and scalarlongitudinal overlap functions, compared to those for the tensor and breathing modes. For small angular separations, the sensitivity to vectorlongitudinal modes is roughly an order of magnitude larger than that for the tensor and breathing modes, while the sensitivity to the scalarlongitudinal mode is several ordersofmagnitude larger. This is consistent with what we found for the response functions, as discussed at the end of Sect. 8.5.1.
Component separation and anisotropic backgrounds
As shown in Gair et al. (2015), the above calculations for nonGR polarization modes can be extended to anisotropic stochastic backgrounds. The spherical harmonic components of the overlap functions
can be calculated analytically for the tensor, vector, and breathing polarization modes for all values of l and m, while the components of the scalar longitudinal overlap function admit only semianalytic expressions. (This is similar to what we described in the previous section in the context of an isotropic background). Plots of the first few spherical harmonics components, as a function of the angular separation \(\zeta _{IJ}\) between a pair of pulsars, are given in Figures 1, 5, 2, and 3 of Gair et al. (2015).
The ability to separate the contributions to the background from the different polarization modes depends crucially on the form of the spherical harmonic basis response functions \(R^P_{(lm)}(f)\), where \(P=\{G,C, V_G, V_C, B,L\}\). These were defined in (8.97) and have the \(y\equiv 2\pi fL/c \gg 1\) limiting expressions given in (8.98). Recall that the (lm) indices here correspond to an expansion of the Fourier components of the metric perturbations in terms of tensor (spin 2), vector (spin 1), and scalar (spin 0) spherical harmonics:
for which
is the response of pulsar I to the background. The expansion coefficients \(a_{(lm)}^P(f)\) give the contributions of the different polarization modes to the background, and \(R^P_{I(lm)}(f)\) are the response functions for those particular coefficients. For an angular resolution of order \(180^\circ /l_\mathrm{max}\), the total number of modes that are (in principle) accessible to a pulsar timing array with a sufficient number of pulsars is
This expression uses the result that the response to the curl modes for both the tensor and vector components are identically zero, as is the response to the breathing modes for \(l\ge 2\). Since a pulsar timing array having \(N_p\) pulsars can measure at most \(2N_p\) real components of the background (as discussed in Sect. 7.5.4), we see that at least \(N_p = N_m\) pulsars are required to measure the \(N_m\) (complex) components.
But as noted in Sect. 8.5.2, all of the response functions \(R^P_{(lm)}(f)\) depend on the direction \(\hat{p}\) to the pulsar in exactly the same way, being proportional to \(Y_{lm}(\hat{p})\). This degeneracy complicates the extraction of the different polarization modes. For the tensor and breathing modes, the degeneracy is broken since pulsar timing arrays typically operate in a regime where \(y\gg 1\), for which the pulsar term can be ignored in the response functions for these modes. In that limit, a pulsar timing array is only sensitive to breathing modes with \(l=0, 1\), while the tensor modes are nonzero only for \(l\ge 2\). On the other hand, the scalarlongitudinal and vectorlongitudinal modes can only be distinguished from the tensor and breathing modes if there are multiple pulsars along the same line of sight, or if there is a known correlation between the expansion coefficients \(a^P_{(lm)}(f)\) at different frequencies, e.g., a powerlaw spectrum. For either of these two cases, we can exploit the frequency dependence of the pulsar term, which is more significant for the longitudinal modes of the background. Keeping all of the frequencydependent terms (Gair et al. 2015):
and
for the scalarlongitudinal and vectorlongitudinal response functions, where \(j_l(y)\) denotes a spherical Bessel functions of order l and \(y\equiv 2\pi fL/c\). If we take the \(y\gg 1\) limit of these equations, we recover the approximate expressions given in (8.98). But to separate the various components of the background, we need to use these more complicated expressions to break the angulardirection degeneracy.
A quantitative analysis of the sensitivity of a phasecoherent mapping search (Sect. 7.5) to the different components \(a^P_{(lm)}(f)\) of a stochastic background is given in Gair et al. (2015). The results of that analysis are summarized in Table 7, which is taken from that paper. The entries in the table show how the uncertainties in our measurements change as we search for: (i) only the tensor modes, (ii) both tensor and breathing modes, (iii) tensor, breathing, and scalarlongitudinal modes, and (iv) all possible modes. The uncertainties were obtained by taking the square root of the diagonal elements of the inverse of the Fisher matrix, following the general prescription described in Sect. 7.5.1. For this calculation, 30 pulsars were distributed randomly on the sky, with distances chosen at random, uniformly between 1 and 10 kpc. There was only a single frequency component, \(f_0=3\times 10^{9}~\mathrm{Hz}\), and the measurement uncertainty (associated with pulse time of arrivals) was assumed to be the same for all the pulsars in the array. The background was also assumed to contain modes with equal intrinsic amplitudes up to \(l_\mathrm{max}=2\), so that the total number of modes \(N_m=26\) was less than the number of pulsars in the array. This gave a fullydetermined system of equations that needed to be solved.
The entries in the table reflect our expectations for recovering the different modes of the background. Namely, there is little change in our ability to recover the tensor modes when the breathing modes are also included in the analysis. This is because the tensor modes are nonzero only for \(l\ge 2\), while the response to the breathing modes is nonzero only for \(l=0,1\). Adding the scalarlongitudinal modes to the analysis worsens the recovery of the tensor and breathing modes by about an order of magnitude, as the scalarlongitudinal modes can also have nonzero values for all values of l. (There are simply more parameters to recover). But one is still able to break the degeneracy as the response to the scalarlongitudinal modes depends strongly on the distances to the pulsars. The uncertainity in the recovery of the scalarlongitudinal modes is about an order of magnitude less than that for the tensor and breathing modes, since the analysis assumes equal intrinsic amplitudes for all the modes, while the correlated response to the scalarlongitudinal modes is much larger for small angular separations between the pulsars (Sect. 8.5.3; Fig. 68). Finally, adding the vectorlongitudinal modes to the analysis weakens the recovery of the scalarlongitudinal modes by about an order of magnitude, again because more parameters need to be recovered. However, it severely worsens the recovery of all the other modes, because of the degeneracy in the response on the angular direction to the pulsars. There is some dependence on frequency for the vectorlongitudinal response, as indicated in (8.114), but it is much weaker than the frequency dependence of the scalarlongitudinal modes. So the degeneracy is not broken nearly as strongly for these modes. See Gair et al. (2015) for more details.
Other searches
It is also possible to use the general crosscorrelation techniques described in Sect. 4 to search for signals that don’t really constitute a stochastic gravitationalwave background. Using a stochasticbased crosscorrelation method to search for such signals is not optimal, but it still gives valid results for detection statistics or estimators of signal parameters, with error bars that properly reflect the uncertainty in these quantities. It is just that these error bars are larger than those for an optimal (minimum variance) search, which is better “tuned” for the signal. Below we briefly describe how the general crosscorrelation method can be used to search for (i) longduration unmodelled transients and (ii) persistent (or continuous) gravitational waves from targeted sources.
Searches for longduration unmodelled transients
The Stochastic Transient Analysis Multidetector Pipeline (Thrane et al. 2011) (STAMP for short) is a crosscorrelation search for unmodelled longduration transient signals (“bursts”) that last on order a few seconds to several hours or longer. The duration of these transients are long compared to the typical merger signal from inspiralling binaries (tens of milliseconds to a few seconds), but short compared to the persistent quasimonochromatic signals that one expects from e.g., rotating (nonaxisymmetric) neutron stars. STAMP was developed in the context of groundbased interferometers, but the general method, which we briefly describe below, is also valid for other types of gravitationalwave detectors.
STAMP is effectively an adapted gravitationalwave radiometer search (Sect. 7.3.6), which crosscorrelates data from pairs of detectors (7.6), weighted by the inverse of the integrand of the overlap function \(\gamma _{IJ}(t;f,\hat{n})\) for a particular direction \(\hat{n}\) on the sky:
where \(\tau \) is the duration of the segments defining the shortterm Fourier transforms. The weighting by the inverse of \(\gamma _{IJ}(t;f,\hat{n})\) is used so that the expected value of \(\tilde{s}_{IJ}(t;f,\hat{n})\) is just the gravitationalwave power in pixel (t; f) for a point source in direction \(\hat{n}\), which follows from (7.7). The data \(\tilde{s}_{IJ}(t;f,\hat{n})\) for a single direction \(\hat{n}\) define a timefrequency map. For a typical analysis using the LIGO Hanford and LIGO Livingston interferometer, a single map has a frequency range from about 50 to \({\sim }1000~\mathrm{Hz}\), and a time duration of a couple hundred seconds (or whatever the expected duration of the transient might be). A strong burst signal shows up as cluster or track of bright pixels in the timefrequency map, which stands out above the noise. The data analysis problem thus becomes a pattern recognition problem.
The procedure for deciding whether or not a signal is present in the data can be broken down into three steps: (i) determine if a statistically significant clump or track of bright pixels is present in a timefrequency map, which requires using some form of patternrecognition or clustering algorithm (see Thrane et al. 2011 and relevant references cited therein); (ii) calculate the value of the detection statistic \(\Lambda \), obtained from a weighted sum of the power in the pixels for each cluster determined by the previous step; (iii) compare the observed value of the detection statistic to a threshold value \(\Lambda _*\), which depends on the desired false alarm rate. This threshold is typically calculated by timeshifting the data to empirically determine the sampling distribution of \(\Lambda \) in the absence of a signal. If \(\Lambda _\mathrm{obs}>\Lambda _*\), then reject the null hypothesis and claim detection as discussed in 3.2.1. (Actually, in practice, this last step is a bit more complicated, as one typically does followup investigations using auxiliary instrumental and environmental channels, and data quality indicators. This provides additional confidence that the gravitationalwave candidate is not some spurious instrumental or environmental artefact.)
Figure 69 is an example of a timefrequency map with a simulated longduration gravitationalwave signal injected into simulated initial LIGO detector noise. This particular signal is an accretion disk instability waveform, based on a model by van Putten (2001); van Putten and Levinson (2003); van Putten (2008). The signal is a (inverse) “chirp” in gravitational radiation having an exponentially decaying frequency. (The magnitude of the signal increases with time as the frequency decreases.) The injected signal is strong enough to be seen by eye in the raw timefrequency map (left panel). After applying a clustering algorithm, the fluctuations in the detector noise have been noticeably reduced (right panel).
Readers should see Thrane et al. (2011) for many more details regarding STAMP, and Abbott et al. (2016c) and Aasi et al. (2013) for results from analyses of LIGO data taken during their 5th and 6th science runs—the first paper describes an allsky search for longduration gravitationalwave transients; the second, a triggeredsearch for longduration gravitationaltransients coincident with long duration gammaray bursts.
Searches for targetedsources of continuous gravitational waves
The gravitationalwave radiometer method (Sect. 7.3.6) can also be used to look for gravitational waves from persistent (continuous) sources at known locations on the sky, e.g., the galactic center, the location of SN 1987A, or from lowmass Xray binaries like Sco X1 (Abadie et al. 2011; Messenger et al. 2015; Abbott et al. 2016a). For example, Sco X1 is expected to emit gravitational waves from the (suspected) rotating neutron star at its core, having nonaxisymmetric distortions produced by the accretion of matter from the lowmass companion. The parameters of this system that determine the phase evolution of the gravitational radiation are not wellconstrained: (i) Since the neutron star at the core has not been observed to emit pulsations in the radio or any electromagnetic band, the orbital parameters of the binary are estimated instead from optical observations of the lowmass companion (Steeghs and Casares 2002; Galloway et al. 2014). These observations do not constrain the orbital parameters as tightly as being able to directly monitor the spin frequency of the neutron star. (ii) The intrinsic spin evolution of the neutron star also has large uncertainties due to the high rate of accretion from the lowmass companion star. Both of these features translate into a large parameter space volume over which to search, making fullycoherent matchedfilter searches for the gravitationalwave signal computationally challenging (Messenger et al. 2015).
Nonetheless, for such sources, one can perform a narrowband, targeted radiometer search, crosscorrelating data from a pair of detectors with a filter function proportional to the integrand \(\gamma _{IJ}(t;f,\hat{n}_0)\) of the overlap function evaluated at the direction \(\hat{n}_0\) to the source on the sky:
where
The search is narrowband in the sense that one doesn’t integrate over the whole frequency band of the detectors, but looks instead for evidence of a gravitational wave in narrow frequency bins that span the sensitive band of the detector. The weighted crosscorrelations are summed over time, to build up signaltonoise ratio, since the source is assumed to be persistent. The frequentist detection statistic is the squared signaltonoise ratio of the crosscorrelated power contained in each narrow frequency band: