The 20 Papers That Define the Field
These studies form the structural backbone of the review. Each scored a perfect 1.00 in our relevance pipeline. They are not merely highly cited -- they are the papers that established the concepts, classifications, and paradigms that every other study in this collection builds upon or responds to.
If the final manuscript cites only 20 references, these are the 20.
Genomic Foundations
These five studies established the molecular taxonomy of lung cancer that the entire field now operates within.
| # | Citation | Journal | Why it matters |
|---|---|---|---|
| 1 | TCGA Network, 2014 -- Comprehensive molecular profiling of lung adenocarcinoma | Nature | The definitive LUAD molecular atlas. Integrated mRNA, miRNA, methylation, protein, and somatic mutation data from 230 resected tumors. Established EGFR, KRAS, and ALK as the principal actionable drivers. 5,053 citations. Every subsequent LUAD multi-omics study references this work. |
| 2 | TCGA Network, 2012 -- Comprehensive genomic characterization of squamous cell lung cancers | Nature | Proved that LUSC is a fundamentally different molecular disease from LUAD, with a distinct spectrum of TP53 mutations, FGFR1/SOX2 amplifications, and CDKN2A inactivation. 3,734 citations. Ended the era of treating all NSCLC as one disease. |
| 3 | George et al., 2015 -- Comprehensive genomic profiles of small cell lung cancer | Nature | First large-scale SCLC genomics. Revealed near-universal TP53/RB1 biallelic inactivation and recurrent alterations in NOTCH, MYC family, and chromatin remodelers. 1,990 citations. Set the foundation for the molecular subtyping that followed. |
| 4 | Rudin et al., 2019 -- Molecular subtypes of small cell lung cancer: a synthesis | Nat Rev Cancer | Proposed the ASCL1/NEUROD1/POU2F3/YAP1 transcription factor taxonomy for SCLC that has become the standard classification. Each subtype has distinct biology and therapeutic vulnerabilities. Changed how SCLC clinical trials are designed. |
| 5 | Skoulidis & Heymach, 2019 -- Co-occurring genomic alterations in NSCLC | Nat Rev Cancer | Demonstrated that co-mutations -- especially KRAS/STK11 and KRAS/KEAP1 -- define biologically and clinically distinct NSCLC subgroups. Showed that co-alteration context determines immunotherapy sensitivity. |
The Proteogenomics Revolution
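Three Cell papers published on the same day in July 2020, plus a 2024 follow-up in SCLC, established that proteins tell a fundamentally different story than genes. The final entry is the immunohistochemistry study that carried the SCLC subtype taxonomy into clinical practice.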
| # | Citation | Journal | Why it matters |
|---|---|---|---|
| 6 | Gillette et al., 2020 -- CPTAC LUAD proteogenomic characterization | Cell | The CPTAC flagship. Integrated genomic, proteomic, and phosphoproteomic profiling of LUAD. Identified druggable kinase activities completely invisible to DNA sequencing. 621 citations. Proved that multi-omics is not a luxury -- it reveals actionable biology that genomics alone misses. |
| 7 | Chen et al., 2020 -- East Asia non-smoking LUAD proteogenomics | Cell | Mapped the proteogenomic landscape of the population with the highest never-smoker LUAD burden globally. Revealed unique EGFR-pathway rewiring patterns and population-specific therapeutic opportunities. |
| 8 | Xu et al., 2020 -- Integrative proteomic characterization of human LUAD | Cell | Delineated proteomic subtypes with distinct immune infiltration patterns and metabolic features. 471 citations. Showed that proteomic subtypes crosscut but do not perfectly align with genomic subtypes -- the layers tell complementary stories. |
| 9 | Liu et al., 2024 -- SCLC proteogenomic characterization | Cell | Extended the proteogenomic paradigm to SCLC. Identified subtype-specific phosphoproteomic signatures and candidate drug targets that genomic profiling had missed. The most recent anchor in this category. |
| 10 | Baine et al., 2020 -- SCLC subtypes by ASCL1, NEUROD1, POU2F3, YAP1 | JTO | Clinical validation that the Rudin transcription factor taxonomy can be implemented via immunohistochemistry on routine clinical specimens. Bridged SCLC molecular biology to pathology practice. |
AI/ML and Computational Oncology
The papers that proved machines can read biology from tissue and data.
| # | Citation | Journal | Why it matters |
|---|---|---|---|
| 11 | Coudray et al., 2018 -- NSCLC classification and mutation prediction from histopathology via deep learning | Nat Med | The study that launched computational pathology in lung cancer. Demonstrated that a CNN could classify LUAD vs. LUSC at pathologist-level accuracy AND predict STK11, EGFR, and TP53 mutations directly from H&E-stained tissue. 2,407 citations. Established the paradigm that histopathology images encode molecular information decodable by computation. |
| 12 | Chen et al., 2022 -- Pan-cancer integrative histology-genomic analysis via multimodal deep learning | Cancer Cell | Extended histopathology AI to multi-modal integration, fusing histology images with genomic data. Showed that multi-modal deep learning outperforms either modality alone for survival prediction across cancer types. 500 citations. |
| 13 | Argelaguet et al., 2020 -- MOFA+: multi-omics factor analysis framework | Genome Biol | The standard tool for Bayesian multi-omics integration. Handles missing data, batch effects, and single-cell resolution. 736 citations. Widely adopted in both pan-cancer and lung cancer multi-omics studies. |
Population-Specific Biology
The studies that proved lung cancer biology differs by sex, smoking status, ancestry, and environment.
| # | Citation | Journal | Why it matters |
|---|---|---|---|
| 14 | Zhang et al., 2021 -- Genomic and evolutionary classification of lung cancer in never-smokers | Nat Genet | Defined the piano/mezzo-forte/forte molecular subtypes of never-smoker LUAD. Proved that never-smoker lung cancer is not one disease but a family of genomically defined entities with distinct evolutionary trajectories and clinical behaviors. The single most important paper for the review's thesis. |
| 15 | Hill et al., 2023 -- Lung adenocarcinoma promotion by air pollutants | Nature | Paradigm-shifting evidence that PM2.5 promotes EGFR-driven LUAD in never-smokers through inflammatory IL-1β signaling that expands pre-existing EGFR-mutant clones -- rather than causing new mutations. Rewrites the environmental carcinogenesis model. |
| 16 | Conforti et al., 2018 -- Sex and immunotherapy efficacy meta-analysis | Lancet Oncol | The study that ignited the sex-stratified immunotherapy debate. Meta-analysis across tumor types showing greater benefit from immune checkpoint inhibitors (anti-CTLA-4 and anti-PD-1) in men than in women. 747 citations. Subsequent studies have been inconsistent, making this a live controversy. |
| 17 | Chen et al., 2020 -- East Asian LUAD genomic landscape | Nat Genet | Mapped ancestry-specific genomic architecture of lung adenocarcinoma. Demonstrated that the driver mutation spectrum, mutational signatures, and clonal evolution patterns differ between East Asian and European-ancestry LUAD. |
Immuno-Oncology
The papers that built the biomarker framework for checkpoint immunotherapy.
| # | Citation | Journal | Why it matters |
|---|---|---|---|
| 18 | Rizvi et al., 2015 -- Mutational landscape determines sensitivity to PD-1 blockade in NSCLC | Science | Established tumor mutational burden (TMB) as a genomic biomarker for immunotherapy response. The paper that linked neoantigen load to PD-1 sensitivity. 6,521 citations -- the most cited paper in this entire collection. |
| 19 | Thorsson et al., 2018 -- The Immune Landscape of Cancer | Immunity | Defined six immune subtypes across 33 cancer types based on immune cell composition, cytokine profiles, and survival associations. 4,561 citations. Provided the pan-cancer taxonomy that contextualizes every lung cancer immunoprofiling study. |
| 20 | Bagaev et al., 2021 -- Conserved pan-cancer microenvironment subtypes predict response to immunotherapy | Cancer Cell | Identified four conserved TME archetypes with independent predictive value for immunotherapy response. Showed that microenvironment architecture carries information beyond tumor-intrinsic molecular features. 947 citations. |