The Scale of This Search

"Better to over-collect and triage than under-collect and redo."

We ran 213 structured queries against three major literature databases -- PubMed, Europe PMC, and Semantic Scholar -- covering the lung cancer multi-omics and AI/ML landscape.

The Funnel

        +-----------------------------------------+
        |                                         |
        |          78,775  Raw API Hits           |
        |                                         |
        +-------------------+---------------------+
                            |
                     -11,896 duplicates
                            |
        +-------------------v---------------------+
        |                                         |
        |          66,879  Unique Papers          |
        |                                         |
        +------+----------+----------+------------+
               |          |          |
          +----v---+ +----v----+ +---v-----+       +----------+
          | Tier A | | Tier B  | | Tier C  |       | Retracted|
          |   49   | |  2,861  | | 63,969  |       |   208    |
          | Must   | | Strong  | | Back-   |       | Excluded |
          | Cite   | | Support | | ground  |       |          |
          +--------+ +---------+ +---------+       +----------+

By the Numbers

Tier  Metric                                        Value
----  --------------------------------------------  -------
-     Total raw hits                                78,775
-     Unique after deduplication                    66,879
-     Non-retracted (master RIS)                    66,671
A     Must-cite papers (score >= 0.7)               49
B     Supporting evidence (0.4 <= score < 0.7)      2,861
C     Background reference (score < 0.4)            63,969
-     Bridge papers spanning 3+ themes              1,928
-     Retracted papers flagged and excluded         208
-     Anchor PMIDs recovered                        34 / 35
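
The tier thresholds and funnel counts above can be checked with a few lines of Python. This is a minimal sketch, not the pipeline's own code: the function and constant names are hypothetical, and the lower bound of Tier B is assumed to be inclusive. The counts are copied from the figures reported above. The three tier counts sum to the 66,879 unique papers rather than to the 66,671 non-retracted ones, which suggests (but does not prove) that tiers were assigned before retracted records were removed.

```python
def assign_tier(score: float) -> str:
    """Map a relevance score to a tier using the thresholds reported above."""
    if score >= 0.7:
        return "A"  # must-cite
    if score >= 0.4:  # assumption: 0.4 itself falls in Tier B
        return "B"  # supporting evidence
    return "C"      # background reference


# Counts copied from the funnel and the summary figures.
RAW_HITS = 78_775
DUPLICATES = 11_896
UNIQUE = 66_879
RETRACTED = 208
NON_RETRACTED = 66_671
TIER_COUNTS = {"A": 49, "B": 2_861, "C": 63_969}


def funnel_is_consistent() -> bool:
    """Check that the reported counts agree with each other."""
    return (
        RAW_HITS - DUPLICATES == UNIQUE
        and UNIQUE - RETRACTED == NON_RETRACTED
        and sum(TIER_COUNTS.values()) == UNIQUE
    )
```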

Search Architecture

Primary source: PubMed (NCBI E-utilities) -- up to 800 results per query, relevance-sorted
Supplementary: Europe PMC -- 3 broadest queries per theme, up to 200 results each
Enrichment: Semantic Scholar -- anchor PMIDs + ~30 top papers per theme for citation counts
Deduplication: PMID-based primary, DOI secondary, fuzzy title match (Levenshtein distance <= 3) tertiary
Scoring: Title/abstract keyword density + journal tier + citation count + recency + article type
Caching: All API responses cached for reproducible re-runs
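
The three-level deduplication described above (PMID, then DOI, then fuzzy title match) might look roughly like the sketch below. It is illustrative only: the `Record` class, `normalize_title`, and `deduplicate` are hypothetical names, title normalization (lowercasing, stripping punctuation) is an assumption, and so is the choice to apply the title fallback to every record that passes the identifier checks. Only the Levenshtein threshold of 3 comes from the description above.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    title: str
    pmid: Optional[str] = None
    doi: Optional[str] = None


def levenshtein(a: str, b: str) -> int:
    """Edit distance via row-by-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalize_title(title: str) -> str:
    """Assumed normalization: lowercase, collapse non-alphanumerics to spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records: List[Record], max_distance: int = 3) -> List[Record]:
    """Keep the first occurrence of each paper: PMID, then DOI, then title."""
    seen_pmids, seen_dois, seen_titles = set(), set(), []
    unique = []
    for rec in records:
        doi = rec.doi.lower() if rec.doi else None
        title = normalize_title(rec.title)
        is_dup = bool(rec.pmid and rec.pmid in seen_pmids)
        is_dup = is_dup or bool(doi and doi in seen_dois)
        if not is_dup:
            # Length pre-check skips pairs that cannot be within the threshold.
            is_dup = any(
                abs(len(title) - len(t)) <= max_distance
                and levenshtein(title, t) <= max_distance
                for t in seen_titles
            )
        # Register identifiers even for duplicates so later copies still match.
        if rec.pmid:
            seen_pmids.add(rec.pmid)
        if doi:
            seen_dois.add(doi)
        if not is_dup:
            seen_titles.append(title)
            unique.append(rec)
    return unique
```

The title comparison here is quadratic in the number of unique records, so at the scale reported above a real pipeline would likely need blocking or indexing on titles; the sketch leaves that out.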

