publications
Publications by category in reverse chronological order. Generated by jekyll-scholar.
2025
- [CMS-PAS] A new method for correcting the substructure of multi-prong jets using Lund jet plane reweighting in the CMS experiment. CMS Collaboration. CMS PAS JME-23-001, 2025
Many analyses at the CERN LHC employ techniques exploiting the substructure of large-radius jets. These techniques aim to identify large-radius jets originating from heavy resonances produced with high momenta that decay into multiple quarks or gluons. The large momentum of the resonance results in all N quarks or gluons from the decay being reconstructed into a single jet with an N-prong substructure. Because of shortcomings in the simulation of these jets, substructure observables are typically calibrated using data samples of large-radius jets originating from decays of boosted W bosons or top quarks. However, this approach cannot be readily applied to jets with four or more prongs because no similar proxies exist in the data. This note presents a new technique for correcting the substructure of simulated large-radius jets from multi-prong decays. The data correspond to an integrated luminosity of 138 fb$^{-1}$ collected by the CMS experiment between 2016 and 2018 at a center-of-mass energy of 13 TeV. The technique is based on reclustering the jet constituents into several subjets such that each subjet represents a single prong, and separately correcting the radiation pattern in the Lund jet plane of each subjet using a correction derived from data. The correction procedure improves the agreement between data and simulation in several different substructure observables of multi-prong jets. This technique establishes, for the first time, a robust calibration for the substructure of jets with four or more prongs, enabling their usage in future measurements and searches for new phenomena.
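As an illustration of the per-subjet correction described in the abstract above, the following minimal Python sketch shows one way a binned data-to-simulation ratio in the Lund jet plane could be turned into a per-jet weight. It is not the CMS implementation. The function names `find_bin` and `jet_weight`, the choice of coordinates (ln(1/Δ), ln(kT)), the binning, the clamping of out-of-range emissions to the edge bins, and the multiplication of one ratio per emission are all assumptions made for this example.

```python
import bisect


def find_bin(edges, value):
    """Return the bin index of `value` for ascending `edges`.

    Values outside the histogram range are clamped to the first or
    last bin (an assumption of this sketch, not a documented choice).
    """
    idx = bisect.bisect_right(edges, value) - 1
    return min(max(idx, 0), len(edges) - 2)


def jet_weight(subjets, x_edges, y_edges, ratio):
    """Compute a hypothetical per-jet weight from Lund-plane ratios.

    subjets: list of subjets, each a list of emissions given as
             (ln(1/Delta), ln(kT)) pairs.
    x_edges, y_edges: ascending bin edges for the two coordinates.
    ratio:   2D list, ratio[ix][iy] is the data/simulation ratio
             in that Lund-plane bin.
    """
    weight = 1.0
    for emissions in subjets:
        for ln_inv_delta, ln_kt in emissions:
            ix = find_bin(x_edges, ln_inv_delta)
            iy = find_bin(y_edges, ln_kt)
            # Each emission contributes one multiplicative factor.
            weight *= ratio[ix][iy]
    return weight
```

Calling `jet_weight` with one emission list per reclustered subjet returns a single multiplicative weight for the simulated jet. A jet with no recorded emissions keeps a weight of 1.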
2024
- [Sub. to Rept. Prog. Phys.] Model-agnostic search for dijet resonances with anomalous jet substructure in proton-proton collisions at $\sqrt{s}$ = 13 TeV. CMS Collaboration. 2024
This paper presents a model-agnostic search for narrow resonances in the dijet final state in the mass range 1.8–6 TeV. The signal is assumed to produce jets with substructure atypical of jets initiated by light quarks or gluons, with minimal additional assumptions. Search regions are obtained by utilizing multivariate machine-learning methods to select jets with anomalous substructure. A collection of complementary anomaly detection methods, based on unsupervised, weakly supervised, and semisupervised algorithms, is used to maximize the sensitivity to unknown new physics signatures. These algorithms are applied to data corresponding to an integrated luminosity of 138 fb$^{-1}$, recorded by the CMS experiment at the LHC, at a center-of-mass energy of 13 TeV. No significant excesses above background expectations are seen. Exclusion limits are derived on the production cross section of benchmark signal models varying in resonance mass, jet mass, and jet substructure. Many of these signatures have not been previously sought, making several of the limits reported on the corresponding benchmark models the first ever. When compared to benchmark inclusive and substructure-based search strategies, the anomaly detection methods are found to significantly enhance the sensitivity to a variety of models.
- [Sub. to ML Sci. Tech.] Aspen Open Jets: Unlocking LHC Data for Foundation Models in Particle Physics. Oz Amram, and others. Dec 2024
Foundation models are deep learning models that are pre-trained on large amounts of data and are capable of generalizing to multiple datasets and/or downstream tasks. This work demonstrates how data collected by the CMS experiment at the Large Hadron Collider can be useful in pre-training foundation models for HEP. Specifically, we introduce the AspenOpenJets dataset, consisting of approximately 180M high-$p_T$ jets derived from CMS 2016 Open Data. We show how pre-training the OmniJet-α foundation model on AspenOpenJets improves performance on generative tasks with significant domain shift: generating boosted top and QCD jets from the simulated JetClass dataset. In addition to demonstrating the power of pre-training of a jet-based foundation model on actual proton-proton collision data, we provide the ML-ready derived AspenOpenJets dataset for further public use.
- [Sub. to Rept. Prog. Phys.] CaloChallenge 2022: A Community Challenge for Fast Calorimeter Simulation. Claudius Krause, and others. Oct 2024
We present the results of the "Fast Calorimeter Simulation Challenge 2022", the CaloChallenge. We study state-of-the-art generative models on four calorimeter shower datasets of increasing dimensionality, ranging from a few hundred voxels to a few tens of thousands of voxels. The 31 individual submissions span a wide range of current popular generative architectures, including Variational AutoEncoders (VAEs), Generative Adversarial Networks (GANs), Normalizing Flows, Diffusion models, and models based on Conditional Flow Matching. We compare all submissions in terms of quality of generated calorimeter showers, as well as shower generation time and model size. To assess the quality we use a broad range of different metrics including differences in 1-dimensional histograms of observables, KPD/FPD scores, AUCs of binary classifiers, and the log-posterior of a multiclass classifier. The results of the CaloChallenge provide the most complete and comprehensive survey of cutting-edge approaches to calorimeter fast simulation to date. In addition, our work provides a uniquely detailed perspective on the important problem of how to evaluate generative models. As such, the results presented here should be applicable for other domains that use generative AI and require fast and faithful generation of samples in a large phase space.
- [CMS-PAS] Search for $t$-channel scalar and vector leptoquark exchange in the high mass dimuon and dielectron spectrum in proton-proton collisions at $\sqrt{s} = 13~\mathrm{TeV}$. CMS Collaboration. CMS PAS, Oct 2024
A search for $t$-channel exchange of leptoquarks (LQs) is performed using proton-proton collision data collected at $\sqrt{s} = 13$ TeV with the CMS detector at the CERN LHC. The data correspond to an integrated luminosity of 138 fb$^{-1}$. The search spans scenarios with scalar and vector LQs that couple up and down quarks to electrons and muons. Dielectron and dimuon final states are considered, with dilepton invariant masses above 500 GeV. The differential distributions of dilepton events are fit to templates built from reweighted samples of simulated standard model events. This method is able to probe higher LQ masses than previous pair-production and single-production searches. Limits are set on LQ-fermion coupling strengths for LQ masses up to 5 TeV. Based on the results, scalar LQs are excluded for masses up to 5 TeV for a coupling strength of 1.2, and vector LQs are excluded for masses up to 5 TeV for a coupling strength of 1.5.
@article{CMS:LQ,
  author = {Collaboration, CMS},
  collaboration = {CMS},
  title = {{Search for $t$-channel scalar and vector leptoquark exchange in the high mass dimuon and dielectron spectrum in proton-proton collisions at $\sqrt{s} = 13~\mathrm{TeV}$}},
  institution = {CERN},
  reportnumber = {CMS-PAS-EXO-22-013},
  journal = {CMS PAS},
  year = {2024},
}
2023
- [Phys. Rev. D] Denoising diffusion models with geometry adaptation for high fidelity calorimeter simulation. Oz Amram and Kevin Pedro. Phys. Rev. D, Oct 2023
Simulation is crucial for all aspects of collider data analysis, but the available computing budget in the High Luminosity LHC era will be severely constrained. Generative machine learning models may act as surrogates to replace physics-based full simulation of particle detectors, and diffusion models have recently emerged as the state of the art for other generative tasks. We introduce CaloDiffusion, a denoising diffusion model trained on the public CaloChallenge datasets to generate calorimeter showers. Our algorithm employs 3D cylindrical convolutions, which take advantage of symmetries of the underlying data representation. To handle irregular detector geometries, we augment the diffusion model with a new geometry latent mapping (GLaM) layer to learn forward and reverse transformations to a regular geometry that is suitable for cylindrical convolutions. The showers generated by our approach are nearly indistinguishable from the full simulation, as measured by several different metrics.
@article{Amram:2023onf,
  author = {Amram, Oz and Pedro, Kevin},
  title = {{Denoising diffusion models with geometry adaptation for high fidelity calorimeter simulation}},
  eprint = {2308.03876},
  archiveprefix = {arXiv},
  eid = {arxiv:2308.03876},
  primaryclass = {physics.ins-det},
  reportnumber = {FERMILAB-PUB-23-384-CSAID-PPD},
  doi = {10.1103/PhysRevD.108.072014},
  journal = {Phys. Rev. D},
  volume = {108},
  number = {7},
  pages = {072014},
  year = {2023},
}
2022
- [JHEP] Measurement of the Drell-Yan forward-backward asymmetry at high dilepton masses in proton-proton collisions at $\sqrt{s}$ = 13 TeV. CMS Collaboration. JHEP, Oct 2022
A measurement of the forward-backward asymmetry of pairs of oppositely charged leptons (dimuons and dielectrons) produced by the Drell-Yan process in proton-proton collisions is presented. The data sample corresponds to an integrated luminosity of 138 fb$^{-1}$ collected with the CMS detector at the LHC at a center-of-mass energy of 13 TeV. The asymmetry is measured as a function of lepton pair mass for masses larger than 170 GeV and compared with standard model predictions. An inclusive measurement across both channels and the full mass range yields an asymmetry of 0.612 ± 0.005 (stat) ± 0.007 (syst). As a test of lepton flavor universality, the difference between the dimuon and dielectron asymmetries is measured as well. No statistically significant deviations from standard model predictions are observed. The measurements are used to set limits on the presence of additional gauge bosons. For a Z’ boson in the sequential standard model the observed (expected) 95% confidence level lower limit on the Z’ mass is 4.4 (3.7) TeV.
@article{CMS:2022uul,
  author = {Collaboration, CMS},
  collaboration = {CMS},
  title = {{Measurement of the Drell-Yan forward-backward asymmetry at high dilepton masses in proton-proton collisions at $\sqrt{s}$ = 13 TeV}},
  eprint = {2202.12327},
  archiveprefix = {arXiv},
  eid = {arxiv:2202.12327},
  primaryclass = {hep-ex},
  reportnumber = {CMS-SMP-21-002, CERN-EP-2022-013},
  doi = {10.1007/JHEP08(2022)063},
  journal = {JHEP},
  volume = {2022},
  number = {08},
  pages = {063},
  year = {2022},
}
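For readers unfamiliar with the observable in the Drell-Yan entry above, a forward-backward asymmetry is conventionally defined by counting: A_FB = (N_F − N_B) / (N_F + N_B), where an event is "forward" if the cosine of the lepton scattering angle is positive. The Python sketch below implements only this textbook counting definition with its binomial statistical uncertainty. It does not reproduce the measurement procedure of the paper. The function name and the assumption that the angle has already been computed in a suitable reference frame (for example the Collins-Soper frame) belong to this example alone.

```python
import math


def forward_backward_asymmetry(cos_theta_values):
    """Counting-based forward-backward asymmetry.

    cos_theta_values: iterable of cos(theta) per event, assumed to be
    already evaluated in the chosen reference frame.
    Returns (A_FB, statistical uncertainty).
    """
    values = list(cos_theta_values)
    n_forward = sum(1 for c in values if c > 0)
    n_backward = sum(1 for c in values if c < 0)
    n_total = n_forward + n_backward
    if n_total == 0:
        raise ValueError("need at least one forward or backward event")
    afb = (n_forward - n_backward) / n_total
    # Binomial uncertainty on the asymmetry for unweighted events.
    stat_unc = math.sqrt((1.0 - afb * afb) / n_total)
    return afb, stat_unc
```

Events with cos θ exactly equal to zero are ignored in this sketch, since they are neither forward nor backward.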
2021
- [Rept. Prog. Phys.] The LHC Olympics 2020: a community challenge for anomaly detection in high energy physics. Gregor Kasieczka, and others. Rept. Prog. Phys., Oct 2021
A new paradigm for data-driven, model-agnostic new physics searches at colliders is emerging, and aims to leverage recent breakthroughs in anomaly detection and machine learning. In order to develop and benchmark new anomaly detection methods within this framework, it is essential to have standard datasets. To this end, we have created the LHC Olympics 2020, a community challenge accompanied by a set of simulated collider events. Participants in these Olympics have developed their methods using an R&D dataset and then tested them on black boxes: datasets with an unknown anomaly (or not). This paper will review the LHC Olympics 2020 challenge, including an overview of the competition, a description of methods deployed in the competition, lessons learned from the experience, and implications for data analyses with future datasets as well as future colliders.
@article{Kasieczka:2021xcg,
  author = {Kasieczka, Gregor and others},
  title = {{The LHC Olympics 2020 a community challenge for anomaly detection in high energy physics}},
  eprint = {2101.08320},
  archiveprefix = {arXiv},
  eid = {arxiv:2101.08320},
  primaryclass = {hep-ph},
  doi = {10.1088/1361-6633/ac36b9},
  journal = {Rept. Prog. Phys.},
  volume = {84},
  number = {12},
  pages = {124201},
  year = {2021},
}
- [JHEP] Tag N’ Train: a technique to train improved classifiers on unlabeled data. Oz Amram and Cristina Mantilla Suarez. JHEP, Oct 2021
There has been substantial progress in applying machine learning techniques to classification problems in collider and jet physics. But as these techniques grow in sophistication, they are becoming more sensitive to subtle features of jets that may not be well modeled in simulation. Therefore, relying on simulations for training will lead to sub-optimal performance in data, but the lack of true class labels makes it difficult to train on real data. To address this challenge we introduce a new approach, called Tag N’ Train (TNT), that can be applied to unlabeled data that has two distinct sub-objects. The technique uses a weak classifier for one of the objects to tag signal-rich and background-rich samples. These samples are then used to train a stronger classifier for the other object. We demonstrate the power of this method by applying it to a dijet resonance search. By starting with autoencoders trained directly on data as the weak classifiers, we use TNT to train substantially improved classifiers. We show that Tag N’ Train can be a powerful tool in model-agnostic searches and discuss other potential applications.
@article{Amram:2020ykb,
  author = {Amram, Oz and Suarez, Cristina Mantilla},
  title = {{Tag N\textquoteright{} Train: a technique to train improved classifiers on unlabeled data}},
  eprint = {2002.12376},
  archiveprefix = {arXiv},
  primaryclass = {hep-ph},
  doi = {10.1007/JHEP01(2021)153},
  journal = {JHEP},
  volume = {01},
  pages = {153},
  year = {2021},
}
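The tagging step described in the Tag N’ Train abstract above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `tag_n_train_labels`, the use of score quantiles to define the signal-rich and background-rich samples, and the default fractions are hypothetical choices. The output is a weakly labeled training set for the second object, which could then be passed to any classifier.

```python
def tag_n_train_labels(weak_scores, other_objects, sig_frac=0.2, bkg_frac=0.4):
    """Build weak labels for the second object of each event.

    weak_scores[i]:   weak-classifier score of the first object in
                      event i (higher means more signal-like).
    other_objects[i]: features of the second object in the same event.
    Events in the top `sig_frac` of scores are tagged signal-rich
    (label 1), events in the bottom `bkg_frac` are tagged
    background-rich (label 0), and the remaining events are dropped.
    """
    if len(weak_scores) != len(other_objects):
        raise ValueError("inputs must have the same length")
    if sig_frac < 0 or bkg_frac < 0 or sig_frac + bkg_frac > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")

    # Event indices ordered from least to most signal-like.
    order = sorted(range(len(weak_scores)), key=lambda i: weak_scores[i])
    n_events = len(order)
    n_bkg = int(n_events * bkg_frac)
    n_sig = int(n_events * sig_frac)

    bkg_idx = order[:n_bkg]
    sig_idx = order[n_events - n_sig:] if n_sig > 0 else []

    features = [other_objects[i] for i in bkg_idx + sig_idx]
    labels = [0] * len(bkg_idx) + [1] * len(sig_idx)
    return features, labels
```

The labels are noisy by construction: the signal-rich sample is only enriched in signal, not pure, so the classifier trained on it learns to separate mixtures rather than true classes.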
- [ApJ] Merger or Not: Accounting for Human Biases in Identifying Galactic Merger Signatures. Erini Lambrides, and others. The Astrophysical Journal, Sep 2021
Significant galaxy mergers throughout cosmic time play a fundamental role in theories of galaxy evolution. The widespread usage of human classifiers to visually assess whether galaxies are in merging systems remains a fundamental component of many morphology studies. Studies that employ human classifiers usually construct a control sample, and rely on the assumption that the bias introduced by using humans will be evenly applied to all samples. In this work, we test this assumption and develop methods to correct for it. Using the standard binomial statistical methods employed in many morphology studies, we find that the merger fraction, error, and the significance of the difference between two samples are dependent on the intrinsic merger fraction of any given sample. We propose a method of quantifying merger biases of individual human classifiers and incorporate these biases into a full probabilistic model to determine the merger fraction and the probability of an individual galaxy being in a merger. Using 14 simulated human responses and accuracies, we are able to correctly label a galaxy as "merger" or "isolated" to within 1% of the truth. Using 14 real human responses on a set of realistic mock galaxy simulation snapshots, our model is able to recover the pre-coalesced merger fraction to within 10%. Our method can not only increase the accuracy of studies probing the merger state of galaxies at cosmic noon, but also can be used to construct more accurate training sets in machine learning studies that use human-classified datasets.
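The "standard binomial statistical methods" mentioned in the abstract above can be illustrated with a short Python sketch. It shows the usual binomial estimate of a merger fraction, its error, and the significance of the difference between two samples. It does not implement the probabilistic bias-correction model proposed in the paper. The function names and the simple quadrature combination of errors are assumptions of this example.

```python
import math


def merger_fraction(n_mergers, n_galaxies):
    """Binomial merger fraction and its standard error."""
    if n_galaxies <= 0:
        raise ValueError("sample must contain at least one galaxy")
    if not 0 <= n_mergers <= n_galaxies:
        raise ValueError("n_mergers must lie between 0 and n_galaxies")
    fraction = n_mergers / n_galaxies
    error = math.sqrt(fraction * (1.0 - fraction) / n_galaxies)
    return fraction, error


def difference_significance(k1, n1, k2, n2):
    """Significance (in sigma) of the difference between two
    merger fractions, combining the binomial errors in quadrature."""
    f1, e1 = merger_fraction(k1, n1)
    f2, e2 = merger_fraction(k2, n2)
    combined = math.sqrt(e1 * e1 + e2 * e2)
    if combined == 0.0:
        raise ValueError("both samples have zero binomial error")
    return (f1 - f2) / combined
```

Because the binomial error depends on the fraction itself, two samples with the same classifier bias but different intrinsic merger fractions do not have that bias cancel in the comparison, which is the assumption the abstract sets out to test.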