Question 1

Where does seqout fetch its datasets from?

Accepted Answer

We maintain a local mirror of the metadata for all publicly available datasets on NCBI's FTP servers, covering both SRA and GEO. We also index ArrayExpress and ENA metadata from EBI. We do not own or modify the original data.

Question 2

Does seqout download sequencing data?

Accepted Answer

No. seqout only indexes and serves metadata. It does not download or host raw sequencing files such as FASTQ or BAM. Project pages do provide bash scripts for downloading FASTQ/SRA files from NCBI, AWS S3, and Google Cloud Storage.

Question 3

How is seqout different from browsing NCBI directly?

Accepted Answer

seqout combines GEO, SRA, ENA, and ArrayExpress metadata into one interface with relevance-ranked search and consolidated tabular views. NCBI spreads this information across multiple pages. seqout also adds enriched metadata, similarity graphs, citation counts, and download scripts.

Question 4

Is seqout suitable for large-scale searches?

Accepted Answer

Yes. The backend handles low-latency queries over millions of records, so you can filter and compare across studies interactively.

Question 5

Who is seqout intended for?

Accepted Answer

We built seqout for researchers who explore public sequencing metadata and want faster, more structured ways to find datasets.

Question 6

How often is seqout updated?

Accepted Answer

We refresh the metadata index on a regular schedule to stay in sync with NCBI and EBI. The last full refresh was on January 17, 2026. New datasets appear within a few days of their public release.

Question 7

Can I use seqout programmatically?

Accepted Answer

Yes. seqout offers a free REST API with no authentication required. All endpoints return JSON and support cursor-based pagination. Rate limits are 60 requests/minute for most endpoints, 30/minute for search, and 10/minute for bulk operations. See the API Reference for full documentation.
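
A client that follows cursor-based pagination while staying under a per-minute limit could look roughly like the sketch below. It is not taken from the API Reference: the `results` and `next_cursor` field names are assumptions, and `fetch_page` is a hypothetical callable that you would write to wrap the real HTTP request and return the decoded JSON for one page.

```python
import time
from typing import Callable, Iterator, Optional


class Throttle:
    """Space calls evenly so roughly `per_minute` calls happen per minute."""

    def __init__(self, per_minute: int, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_allowed: Optional[float] = None

    def wait(self) -> None:
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            # Too early: sleep until the next slot opens.
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


def iter_records(
    fetch_page: Callable[[Optional[str]], dict],
    throttle: Throttle,
) -> Iterator[dict]:
    """Yield records page by page until the server stops returning a cursor.

    Assumed (hypothetical) page shape: {"results": [...], "next_cursor": str | None}.
    """
    cursor: Optional[str] = None
    while True:
        throttle.wait()
        page = fetch_page(cursor)
        yield from page.get("results", [])
        cursor = page.get("next_cursor")
        if not cursor:
            break
```

For a search endpoint limited to 30 requests/minute, `Throttle(30)` spaces requests two seconds apart. Passing the page fetcher in as a callable keeps the pagination loop independent of any particular HTTP library.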

Question 8

What is enriched metadata?

Accepted Answer

For many projects, we run small language models (SLMs) over free-text sample descriptions to extract structured fields like tissue, cell type, disease, sex, and age. The extractions may contain errors, so treat them as a starting point rather than ground truth. Enriched columns appear in the sample table with a purple AI badge.
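
One way to treat enriched values as a starting point is to keep their provenance attached when combining them with submitter-provided metadata. The sketch below is only an illustration: the field keys, the two input dictionaries, and the provenance labels are hypothetical and do not reflect how seqout stores or returns these columns.

```python
from typing import Dict, Optional, Tuple

# Hypothetical keys for the structured fields named above.
FIELDS = ("tissue", "cell_type", "disease", "sex", "age")


def resolve_field(
    submitted: Optional[str], enriched: Optional[str]
) -> Tuple[Optional[str], str]:
    """Prefer the submitter's value; fall back to the model extraction, flagged."""
    if submitted and submitted.strip():
        return submitted.strip(), "submitter"
    if enriched and enriched.strip():
        return enriched.strip(), "enriched-unverified"
    return None, "missing"


def resolve_sample(
    submitted: Dict[str, str], enriched: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], str]]:
    """Return {field: (value, provenance)} for one sample."""
    return {f: resolve_field(submitted.get(f), enriched.get(f)) for f in FIELDS}
```

Downstream analysis can then filter on the provenance label, for example by reviewing every `enriched-unverified` value by hand before using it for cohort selection.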

Question 9

What is the MCP server?

Accepted Answer

seqout exposes a remote Model Context Protocol (MCP) server. LLM clients like Claude Desktop can connect to it and search datasets through chat. The URL is https://seqout.org/api/mcp. See the MCP page for setup instructions.

Question 10

How does the similarity graph work?

Accepted Answer

We embed each project into a vector space based on its metadata and precompute nearest-neighbor relationships. The similarity graph renders these as an interactive 3D force-directed layout. You can filter by organism and click through clusters of related studies.
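
The precomputation step can be pictured with a small brute-force sketch. It assumes cosine similarity as the distance measure and uses toy vectors; the actual embedding model, metric, and indexing method are not described here, and a collection of this size would more plausibly use an approximate nearest-neighbor index than this O(n²) loop.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(
    embeddings: Dict[str, Sequence[float]], k: int
) -> Dict[str, List[str]]:
    """For each project ID, list the IDs of its k most similar other projects."""
    neighbors: Dict[str, List[str]] = {}
    for pid, vec in embeddings.items():
        scored = [
            (cosine(vec, other), oid)
            for oid, other in embeddings.items()
            if oid != pid
        ]
        # Highest similarity first; ties broken by ID for a stable result.
        scored.sort(key=lambda s: (-s[0], s[1]))
        neighbors[pid] = [oid for _, oid in scored[:k]]
    return neighbors
```

The resulting mapping is an edge list: each project links to its k neighbors, which is the kind of structure a force-directed layout can draw.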

Question 11

What is the 2D accession map?

Accepted Answer

The Map page shows a 2D embedding of roughly 1 million datasets, where proximity reflects metadata similarity. You can zoom, pan, filter by country, and click individual points to navigate to project pages. The browser loads data in a binary format for fast rendering.
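
To see why a binary payload helps at this scale, consider the sketch below. The layout is an assumption made for illustration (a little-endian uint32 point count followed by float32 x, y pairs), not the format the Map page actually uses.

```python
import struct
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def pack_points(points: Sequence[Point]) -> bytes:
    """Encode points as: uint32 count, then count * (float32 x, float32 y)."""
    flat = [coord for point in points for coord in point]
    return struct.pack(f"<I{len(flat)}f", len(points), *flat)


def unpack_points(buf: bytes) -> List[Point]:
    """Decode a buffer produced by pack_points."""
    (count,) = struct.unpack_from("<I", buf, 0)
    flat = struct.unpack_from(f"<{2 * count}f", buf, 4)
    return list(zip(flat[0::2], flat[1::2]))
```

Under this layout each point costs 8 bytes, so about 1 million points fit in roughly 8 MB with no text parsing, and a browser could read the coordinates directly into a typed array.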

Question 12

How do I cite seqout?

Accepted Answer

Aniruddha Mukherjee and Saket Choudhary. seqout.org.

Question 13

Is seqout open source?

Accepted Answer

Yes. The frontend source code lives on GitHub at github.com/saketlab/seqout.

Question 14

What browsers are supported?

Accepted Answer

Chrome, Firefox, Safari, and Edge all work. The 3D similarity graph and deck.gl maps require WebGL.