About
seqout searches public sequencing datasets from GEO, SRA, ENA, and ArrayExpress. It indexes over 1 million projects and 40 million samples and provides relevance-ranked search, consolidated experiment and sample tables, enriched annotations, similarity graphs, and download scripts. seqout is the web companion to pysradb, a Python package for querying next-generation sequencing metadata and data from the NCBI Sequence Read Archive.
Features
Frequently Asked Questions
Where does seqout fetch its datasets from?
We maintain a local mirror of the metadata for all publicly available datasets on NCBI's FTP servers, covering all SRA and GEO datasets. We also index ArrayExpress and ENA metadata from EBI. We do not own or modify the original data.
Does seqout download sequencing data?
No. seqout only indexes and serves metadata. It does not download or host raw sequencing files such as FASTQ or BAM. Project pages do provide bash scripts for downloading FASTQ/SRA files from NCBI, AWS S3, and Google Cloud Storage.
How is seqout different from browsing NCBI directly?
seqout combines GEO, SRA, ENA & ArrayExpress metadata into one interface with relevance-ranked search and consolidated tabular views. NCBI spreads this across multiple pages. seqout also adds enriched metadata, similarity graphs, citation counts, and download scripts.
Is seqout suitable for large-scale searches?
Yes. The backend handles low-latency queries over millions of records. You can filter and compare across studies without waiting.
Who is seqout intended for?
We built seqout for researchers who explore public sequencing metadata and want faster, more structured ways to find datasets.
How often is seqout updated?
We refresh the metadata index on a regular schedule to stay in sync with NCBI and EBI. The last full refresh was on January 17, 2026. New datasets appear within a few days of their public release.
Can I use seqout programmatically?
Yes. seqout offers a free REST API with no authentication required. All endpoints return JSON and support cursor-based pagination. Rate limits are 60 requests/minute for most endpoints, 30/minute for search, and 10/minute for bulk operations. See the API Reference for full documentation.
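The sketch below shows one way a client could respect the quoted per-minute limits while following cursor-based pagination. It is not seqout's client code. The endpoint call is left as a hypothetical `fetch_page(cursor)` callable, and the page shape, including the field names `results` and `next_cursor`, is an assumption. Check the API Reference for the real paths and field names.

```python
import time
from collections import deque
from typing import Callable, Iterator, Optional

# Per-minute limits quoted in the FAQ; the category keys are hypothetical labels.
RATE_LIMITS = {"default": 60, "search": 30, "bulk": 10}


class SlidingWindowLimiter:
    """Blocks until another request fits inside a rolling time window."""

    def __init__(self, max_requests: int, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_requests = max_requests
        self.window_s = window_s
        self.clock = clock
        self.sleep = sleep
        self.sent: deque = deque()  # timestamps of requests still inside the window

    def acquire(self) -> None:
        now = self.clock()
        # Forget requests that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window_s:
            self.sent.popleft()
        if len(self.sent) >= self.max_requests:
            # Wait until the oldest request leaves the window, then reuse its slot.
            self.sleep(self.window_s - (now - self.sent[0]))
            self.sent.popleft()
            now = self.clock()
        self.sent.append(now)


def paginate(fetch_page: Callable[[Optional[str]], dict],
             limiter: SlidingWindowLimiter) -> Iterator[dict]:
    """Yield every result, following the cursor until none is returned.

    fetch_page is a hypothetical callable returning a parsed JSON page shaped
    like {"results": [...], "next_cursor": "..." or None}.
    """
    cursor: Optional[str] = None
    while True:
        limiter.acquire()
        page = fetch_page(cursor)
        yield from page.get("results", [])
        cursor = page.get("next_cursor")
        if not cursor:
            break
```

A search crawl would create `SlidingWindowLimiter(RATE_LIMITS["search"])` and pass a `fetch_page` that issues the HTTP request. Injecting the clock and sleep functions keeps the limiter testable without waiting in real time.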
What is enriched metadata?
For many projects, we run small language models (SLMs) over free-text sample descriptions to extract structured fields like tissue, cell type, disease, sex, and age. The extractions may contain errors, so treat them as a starting point rather than ground truth. Enriched columns appear in the sample table with a purple AI badge.
What is the MCP server?
seqout exposes a remote Model Context Protocol (MCP) server. LLM clients like Claude Desktop can connect to it and search datasets through chat. The URL is https://seqout.org/api/mcp. See the MCP page for setup instructions.
How does the similarity graph work?
We embed each project into a vector space based on its metadata and precompute nearest-neighbor relationships. The similarity graph renders these as an interactive 3D force-directed layout. You can filter by organism and click through clusters of related studies.
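The precomputed nearest-neighbor step can be illustrated with a minimal sketch. It assumes each project already has an embedding vector and that similarity is measured by cosine similarity. seqout's actual embedding model, distance metric, and index are not described here. The toy vectors and project IDs below are hypothetical. At roughly a million projects, a brute-force O(n²) scan like this would likely be replaced by an approximate nearest-neighbor index.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_graph(embeddings: Dict[str, List[float]],
              k: int) -> Dict[str, List[Tuple[str, float]]]:
    """Brute-force k-nearest-neighbor lists, ranked by cosine similarity."""
    graph: Dict[str, List[Tuple[str, float]]] = {}
    for pid, vec in embeddings.items():
        # Score every other project against this one.
        scored = [(other, cosine(vec, ovec))
                  for other, ovec in embeddings.items() if other != pid]
        scored.sort(key=lambda item: item[1], reverse=True)
        graph[pid] = scored[:k]  # keep only the k most similar projects
    return graph


# Hypothetical 2-D embeddings for three projects.
toy = {"P1": [1.0, 0.0], "P2": [0.9, 0.1], "P3": [0.0, 1.0]}
neighbors = knn_graph(toy, k=1)
```

Each entry in `neighbors` becomes a set of edges in the graph. A force-directed layout then pulls strongly linked projects into visible clusters.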
What is the 2D accession map?
The Map page shows a 2D embedding of roughly 1 million datasets, where proximity reflects metadata similarity. You can zoom, pan, filter by country, and click individual points to navigate to project pages. The browser loads data in a binary format for fast rendering.
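The binary format used by the Map page is not specified here. As a hypothetical illustration, the sketch packs 2D coordinates as consecutive little-endian float32 pairs (x0, y0, x1, y1, ...). Under that assumed layout, a browser can wrap the bytes in a typed array without parsing text. One million points would take 1,000,000 × 2 × 4 = 8,000,000 bytes before compression.

```python
import struct
from typing import List, Tuple


def pack_points(points: List[Tuple[float, float]]) -> bytes:
    """Pack (x, y) pairs as little-endian float32 values: x0, y0, x1, y1, ..."""
    flat = [coord for point in points for coord in point]
    return struct.pack(f"<{len(flat)}f", *flat)


def unpack_points(buf: bytes) -> List[Tuple[float, float]]:
    """Inverse of pack_points; float32 rounding applies to non-representable values."""
    n = len(buf) // 4  # 4 bytes per float32
    flat = struct.unpack(f"<{n}f", buf)
    return list(zip(flat[0::2], flat[1::2]))


pts = [(0.5, -1.25), (3.0, 2.0)]
buf = pack_points(pts)
```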
How do I cite seqout?
Aniruddha Mukherjee and Saket Choudhary. seqout.org.
Is seqout open source?
Yes. The frontend source code lives on GitHub at github.com/saketlab/seqout.
What browsers are supported?
Chrome, Firefox, Safari, and Edge all work. The 3D similarity graph and deck.gl maps require WebGL.
Data Sources
seqout indexes publicly available metadata from these sources. We thank the teams behind these repositories for making sequencing data public. We do not host or redistribute raw sequencing data.
NCBI GEO: Gene Expression Omnibus. ncbi.nlm.nih.gov/geo
NCBI SRA: Sequence Read Archive. ncbi.nlm.nih.gov/sra
EMBL-EBI ENA: European Nucleotide Archive. ebi.ac.uk/ena
EMBL-EBI ArrayExpress: Functional Genomics Data. ebi.ac.uk/arrayexpress
Feedback & Contact
Found a bug or have a feature request? Open an issue on GitHub.
© Saket Lab, 2026