Connecting galaxy theory and observables using generative deep learning

SKA research at
Zurich University of Applied Sciences (ZHAW)

Centre for Artificial Intelligence (CAI)
Institute for Business Information Technology (IWI)
Aug 27, 2025
contact_qr.png

Philipp Denzel, Yann Billeter, Frank-Peter Schilling, Elena Gavagnin

Generative AI "shortcomings"

Sora demo video

Central Question

  • How do we measure physical plausibility?
  • What are plausible galaxies?

Generative AI for galaxy (component) emulation

domain_translation_scheme.png

Outlook

How to build galaxies

You start with 1% of this:

see lecture by Van den Bosch

The Cosmic Microwave Background

cmb2D_1e0.png

Figure 1: 2006, Credit: ESA/Planck

CMB anisotropies

cmb2D_5e-4.png

Figure 2: 2006, Credit: ESA/Planck

Dark matter takes over

dark_web.png

Figure 3: KIPAC/Stanford; Credit: O. Hahn and T. Abel (2016)

Ingredients for galaxies

recipe_galaxy.jpg

Figure 4: recipe for galaxies as imagined by GPT-5

Simulations as an expression of theory

  • complex, realistic models
  • self-consistent dynamics
  • physics: on a wide range of scales
  • implicit models:
    • what if we want to sample more
      of those galaxy models?

TNG300_compilation_with_radio_halos_2k.png

Figure 5: IllustrisTNG simulations

The cost of IllustrisTNG

Stated usage from Nelson et al. (2017):

  • CPU core time: 55 million core hours
  • on Hazel Hen (Cray XC40: typically 0.5 kW per 24-core node)

So, approx. 2.29M node hours @ 0.5 kW \(\rightarrow\) 1+ GWh (570'000 kg CO2e)
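
The estimate above can be retraced with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the emission factor of roughly 0.5 kg CO2e per kWh is not stated on the slide, it is an assumption inferred from the quoted totals, and the variable names are illustrative.

```python
# Back-of-the-envelope energy estimate from the figures quoted on this slide.
CORE_HOURS = 55e6        # stated CPU core time
CORES_PER_NODE = 24      # Cray XC40 node
NODE_POWER_KW = 0.5      # typical power draw per node

# Assumption (not stated in the talk): ~0.5 kg CO2e per kWh,
# the factor implied by the quoted energy and emission totals.
KG_CO2E_PER_KWH = 0.5

node_hours = CORE_HOURS / CORES_PER_NODE
energy_kwh = node_hours * NODE_POWER_KW
energy_gwh = energy_kwh / 1e6
co2e_kg = energy_kwh * KG_CO2E_PER_KWH

print(f"{node_hours / 1e6:.2f} M node hours")
print(f"{energy_gwh:.2f} GWh")
print(f"{co2e_kg:,.0f} kg CO2e")
```

Under these assumptions the result lands slightly above 1.1 GWh and close to the rounded 570'000 kg CO2e quoted above.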

(A)I can do better

Our model suite ran on a mix of Nvidia V100/A100/H100/H200 GPUs

  • GAN-based models required 140.25 kWh for training (70 kg CO2e)
    • inference: ~1 kWh
  • diffusion-based models required 520.25 kWh for training (260 kg CO2e)
    • inference: about double that of the GAN-based models

Multi-domain galaxy image dataset

  • projected Illustris TNG50-1 galaxies
  • 7 domains: dark-matter, stars, gas,
    HI, temperature, magnetic field, 21cm
  • ∼ 2'000+ galaxies, 6 snapshots,
    5 rotations in 3D, ∼ 504'000 images
  • each galaxy \(\ge\) 10'000 particles
  • scale: 2 baryonic half-mass radii

domains_directions.png

Generative Deep Learning Models

  • conditional GANs (generative adversarial networks)
  • diffusion-based models
  • combination of both

conditional GANs

pix2pix_schema.png

DDPM

skais_diffusion_schema.png

Sampling from the models

(input, simulation, AI generated)

GasDM.inference_batch.0027.png

Figure 6: Gas ⟶ DM

(input, simulation, AI generated)

GasStar.inference_batch.0023.png

Figure 7: Gas ⟶ Stars

(input, simulation, AI generated)

GasHI.inference_batch.0023.png

Figure 8: Gas ⟶ HI

(input, simulation, AI generated)

Gas21cm.inference_batch.0023.png

Figure 9: Gas ⟶ mock 21cm brightness temperature

(input, simulation, AI generated)

GasTemp.inference_batch.0027.png

Figure 10: Gas ⟶ temperature

(input, simulation, AI generated)

GasBF.inference_batch.0029.png

Figure 11: Gas ⟶ magnetic field strength

Measuring plausibility?

  • Pixel-level CV metrics do not work well for this:
    • MSE (mean squared error): \[ \text{MSE}\left(x, \hat{x}\right) = \frac{1}{N} \sum_{i=1}^{N} \left(x_i - \hat{x}_i\right)^2 \]
    • PSNR (peak signal-to-noise ratio), where \(\text{c}\) is the maximum possible pixel value: \[ \text{PSNR}\left(x, \hat{x}\right) = 10 \cdot \log_{10} \left( \frac{\text{c}^2}{\text{MSE}\left(x, \hat{x}\right)} \right) \]
    • SSIM (structural similarity index measure): \[ \text{SSIM}\left(x, \hat{x}\right) = \frac{\left(2\mu_x\mu_{\hat{x}} + k_1\right)\left(2\sigma_{x\hat{x}} + k_2\right)}{\left(\mu_x^2 + \mu_{\hat{x}}^2 + k_1\right)\left(\sigma_x^2 + \sigma_{\hat{x}}^2 + k_2\right)} \]
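
For reference, the three pixel-level metrics above can be sketched in NumPy as follows. This is a minimal illustration, not the evaluation code used for the talk: the SSIM here is evaluated once over the whole image (library implementations usually average it over sliding windows), and the default constants k1 = (0.01 c)^2 and k2 = (0.03 c)^2 are an assumption based on common convention.

```python
import numpy as np

def mse(x, x_hat):
    """Mean squared error over all pixels."""
    x, x_hat = np.asarray(x, dtype=float), np.asarray(x_hat, dtype=float)
    return float(np.mean((x - x_hat) ** 2))

def psnr(x, x_hat, c=1.0):
    """Peak signal-to-noise ratio in dB; c is the maximum possible pixel value."""
    err = mse(x, x_hat)
    if err == 0.0:
        return float("inf")
    return float(10.0 * np.log10(c**2 / err))

def ssim_global(x, x_hat, c=1.0, k1=None, k2=None):
    """SSIM evaluated once over the full image (no sliding window)."""
    x, x_hat = np.asarray(x, dtype=float), np.asarray(x_hat, dtype=float)
    # Assumed defaults for the stabilising constants (common convention).
    k1 = (0.01 * c) ** 2 if k1 is None else k1
    k2 = (0.03 * c) ** 2 if k2 is None else k2
    mu_x, mu_y = x.mean(), x_hat.mean()
    var_x, var_y = x.var(), x_hat.var()
    cov_xy = np.mean((x - mu_x) * (x_hat - mu_y))
    num = (2 * mu_x * mu_y + k1) * (2 * cov_xy + k2)
    den = (mu_x**2 + mu_y**2 + k1) * (var_x + var_y + k2)
    return float(num / den)

# Usage on a synthetic image and a uniformly offset copy.
img = np.linspace(0.0, 1.0, 64).reshape(8, 8)
print(mse(img, img + 0.1), psnr(img, img + 0.1), ssim_global(img, img + 0.1))
```

A uniform offset changes every pixel yet leaves the structure untouched, which illustrates why such metrics say little about physical plausibility on their own.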

Perceptual metrics

  • Fréchet Inception Distance: \[ \|\mu_r - \mu_g\|^2 + \text{Tr}\left(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2}\right) \]
    • where \(\mu\) and \(\Sigma\) are the mean and covariance matrix
      of features extracted from a neural network (InceptionV3),
      for real (\(r\)) and generated (\(g\)) images
  • or LPIPS (Learned Perceptual Image Patch Similarity)
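
Given two sets of feature vectors, the Fréchet distance above can be sketched in plain NumPy. This is an illustrative implementation under stated assumptions: the features are taken as given (extracting them with InceptionV3 is not shown), and the trace of the matrix square root is computed from the eigenvalues of \(\Sigma_r \Sigma_g\), which are real and non-negative for covariance matrices up to round-off. The function name is hypothetical.

```python
import numpy as np

def frechet_distance(feat_r, feat_g):
    """Frechet distance between two feature sets of shape (n_samples, n_features >= 2)."""
    feat_r = np.asarray(feat_r, dtype=float)
    feat_g = np.asarray(feat_g, dtype=float)
    mu_r, mu_g = feat_r.mean(axis=0), feat_g.mean(axis=0)
    sigma_r = np.cov(feat_r, rowvar=False)
    sigma_g = np.cov(feat_g, rowvar=False)
    # Tr((Sigma_r Sigma_g)^(1/2)) as the sum of square roots of the eigenvalues;
    # small negative or imaginary parts from round-off are discarded.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_r - mu_g) ** 2)
    return float(mean_term + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_sqrt)

# Usage with random stand-in features (real use would pass network activations).
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 4))
fake = real + 1.0  # same covariance, mean shifted by 1 in each of 4 dimensions
print(frechet_distance(real, real), frechet_distance(real, fake))
```

When only the mean differs, the covariance terms cancel and the distance reduces to the squared mean offset.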

Astronomical/astrophysical metrics

  • structural astronomical CAS parameters by Conselice (2003)
    • Concentration: ratio of the radii enclosing 80% and 20% of the total flux
    • Asymmetry: compare original and 180-degree-rotated image
    • Smoothness/Clumpiness: compare original and Gaussian-blurred image
  • Centre of mass drift
  • Radially averaged profiles
  • Integrated quantities
  • Power spectra
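
Several of these measures are simple to compute on a 2D map. The sketch below shows one possible NumPy version of the asymmetry, the centre-of-mass drift, and a radially averaged profile. It is a simplified illustration, not the evaluation code behind the figures: the asymmetry omits the background term and the centre optimisation of the full CAS definition, normalisation conventions differ between authors, and all function names are hypothetical.

```python
import numpy as np

def asymmetry(img):
    """Sum of |I - I_180| over sum of |I|, rotating about the image centre."""
    img = np.asarray(img, dtype=float)
    rotated = np.rot90(img, 2)  # 180-degree rotation
    return float(np.abs(img - rotated).sum() / np.abs(img).sum())

def centre_of_mass(img):
    """Intensity-weighted centroid (row, column) in pixel units."""
    img = np.asarray(img, dtype=float)
    yy, xx = np.indices(img.shape)
    total = img.sum()
    return float((yy * img).sum() / total), float((xx * img).sum() / total)

def com_drift(img_a, img_b):
    """Euclidean distance between the centroids of two maps."""
    ya, xa = centre_of_mass(img_a)
    yb, xb = centre_of_mass(img_b)
    return float(np.hypot(ya - yb, xa - xb))

def radial_profile(img, n_bins=16):
    """Mean pixel value in concentric annuli around the image centre."""
    img = np.asarray(img, dtype=float)
    yy, xx = np.indices(img.shape)
    cy, cx = (img.shape[0] - 1) / 2.0, (img.shape[1] - 1) / 2.0
    r = np.hypot(yy - cy, xx - cx)
    edges = np.linspace(0.0, r.max(), n_bins + 1)
    # Assign each pixel to an annulus; the outermost pixel goes into the last bin.
    idx = np.clip(np.digitize(r.ravel(), edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=img.ravel(), minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    return edges, sums / np.maximum(counts, 1)

# Usage on a synthetic Gaussian blob standing in for a galaxy map.
yy, xx = np.indices((33, 33))
blob = np.exp(-((yy - 16) ** 2 + (xx - 16) ** 2) / (2 * 3.0**2))
shifted = np.roll(blob, 3, axis=1)
print(asymmetry(blob), com_drift(blob, shifted), radial_profile(blob)[1][:3])
```

Applied to pairs of simulated and generated maps, the per-galaxy differences of such quantities give deviation statistics of the kind shown on the following slides.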

Asymmetry deviation (simulations vs AI-generated)

Gas21cm_asymmetry.png

Figure 12: Mean asymmetry deviation of the evaluation set (mock 21cm brightness temperature)

GasStar_asymmetry.png

Figure 13: Mean asymmetry deviation of the evaluation set (stellar mass)

Clumpiness deviation (simulations vs AI-generated)

GasDM_40_clumpiness.png

Figure 14: Mean clumpiness deviation of the evaluation set (Gas ⟶ DM)

DMGas_30_clumpiness.png

Figure 15: Mean clumpiness deviation of the evaluation set (DM ⟶ Gas)

Statistics of integrated quantities

HI

GasHI_map_total.png

DM

GasDM_map_total.png

Star

GasStar_map_total.png

21cm

Gas21cm_map_total.png

Temperature

GasTemp_map_total.png

B-Field

GasBF_map_total.png

Conclusion & Findings

  • Pixel-based metrics work to a degree
  • Perceptual metrics (such as FID) correlate much more strongly with the astrophysical metrics
  • Updated and tuned GAN architecture matches performance of diffusion models

Contact

https://phdenzel.github.io/

talk_qr.png



Email: philipp.denzel@zhaw.ch


Created by phdenzel.