Professor K. Sekar

Biological Databases
Protein Sequence Databases

PIR- The Protein Information Resource (PIR) is an integrated public bioinformatics resource to support genomic, proteomic and systems biology research and scientific studies.

UniProt- The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and annotation data.

UniProtKB- The UniProt Knowledgebase (UniProtKB) is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation.

Swiss-Prot (UniProtKB)- Swiss-Prot is a high-quality, manually annotated and non-redundant protein sequence database that brings together experimental results, computed features and scientific conclusions.

TrEMBL (UniProtKB)- UniProtKB/TrEMBL contains high-quality, computationally analyzed records that are enriched with automatic annotation and classification.

Protein (NCBI)- The Protein database is a collection of sequences from several sources, including translations from annotated coding regions in GenBank, RefSeq and TPA, as well as records from Swiss-Prot, PIR, PRF, and PDB.

Motif, Domain & Protein Families Databases

InterPro- InterPro is a resource that provides functional analysis of protein sequences by classifying them into families and predicting the presence of domains and important sites.

PROSITE- PROSITE is a database of protein families and domains.

Conserved Domain Database (CDD)- CDD is a protein annotation resource that consists of a collection of well-annotated multiple sequence alignment models for ancient domains and full-length proteins.

Pfam- The Pfam database is a large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs).

PRINTS- Protein Motif Fingerprint Database
PRINTS is a compendium of protein fingerprints. A fingerprint is a group of conserved motifs used to characterize a protein family.
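
To make the fingerprint idea concrete, here is a minimal Python sketch. It assumes a fingerprint can be approximated as an ordered list of short motifs written as regular expressions, and that a sequence counts as a family member only when every motif is found in that order. The motifs, the sequences and the function name match_fingerprint are hypothetical and are not taken from PRINTS; real fingerprint scanning is more elaborate than this.

```python
import re

# Hypothetical fingerprint: three short motifs written as regular expressions.
# These patterns are invented for illustration and do not come from PRINTS.
FINGERPRINT = ["G.GK[ST]", "D..G", "NK.D"]


def match_fingerprint(sequence, motifs):
    """Return the start position of each motif if all occur in order, else None."""
    positions = []
    search_from = 0
    for motif in motifs:
        # Look for the next motif only after the end of the previous hit.
        hit = re.compile(motif).search(sequence, search_from)
        if hit is None:
            return None  # one missing motif means the fingerprint does not match
        positions.append(hit.start())
        search_from = hit.end()
    return positions


# Made-up sequences: the first contains all three motifs in order,
# the second contains them in a different order.
member = "MAGSGKSTLLAAADTAGQEEYNKCDL"
non_member = "MAGSGKSTLLAAANKCDLDTAGQ"

print(match_fingerprint(member, FINGERPRINT))      # [2, 13, 21]
print(match_fingerprint(non_member, FINGERPRINT))  # None
```

Requiring several motifs together, rather than a single one, is what lets a fingerprint separate a family from unrelated sequences that happen to share one short pattern.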

DOMO- A database of aligned homologous protein domain families. Each entry of the database corresponds to one family of homologous domains.

HOMologous STRucture Alignment Database (HOMSTRAD)- A curated database of structure-based alignments for homologous protein families.

ProDom- ProDom is a comprehensive database of protein domain families generated from the global comparison of all available protein sequences.

SBASE- SBASE is a database of annotated protein domain homologies based on sequence database searches.

Protein Structural Databases

SCOP- Structural Classification Of Proteins
The SCOP database, created by manual inspection and abetted by a battery of automated methods, aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known.

CATH- Class, Architecture, Topology, Homologous
The CATH database is a hierarchical domain classification of protein structures in the Protein Data Bank. Protein structures are classified using a combination of automated and manual procedures.
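
The four levels named in the acronym can be read straight off a CATH-style identifier. The Python sketch below assumes the identifier is a dotted string of four integers in the order Class.Architecture.Topology.Homologous superfamily, with "3.40.50.300" used as an example in that style. The names CathCode, parse_cath_code and lineage are hypothetical helpers, not part of any CATH software.

```python
from typing import List, NamedTuple


class CathCode(NamedTuple):
    """One node per level of the hierarchy (field names are illustrative)."""
    klass: int                   # C: Class
    architecture: int            # A: Architecture
    topology: int                # T: Topology
    homologous_superfamily: int  # H: Homologous superfamily


def parse_cath_code(code: str) -> CathCode:
    """Split a dotted identifier such as '3.40.50.300' into its four levels."""
    parts = code.strip().split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated integers, got {code!r}")
    return CathCode(*(int(p) for p in parts))


def lineage(code: str) -> List[str]:
    """List the identifier of every ancestor level, from Class downwards."""
    parts = [str(n) for n in parse_cath_code(code)]
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


print(parse_cath_code("3.40.50.300"))
print(lineage("3.40.50.300"))  # ['3', '3.40', '3.40.50', '3.40.50.300']
```

Two domains that share a longer prefix of this lineage sit closer together in the classification.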

Protein Data Bank (PDB)- The single repository of experimentally determined structures of proteins, nucleic acids and complex biomolecular assemblies as managed by the Worldwide Protein Data Bank.

Protein Data Bank in Europe (PDBe)- PDBe is the European resource for the collection, organisation and dissemination of data on biological macromolecular structures.

PDBsum- PDBsum is a pictorial database that provides an at-a-glance overview of the contents of each 3D structure deposited in the Protein Data Bank (PDB). It shows the molecule(s) that make up the structure (i.e. protein chains, DNA, ligands and metal ions) and schematic diagrams of their interactions.

Molecular Modeling Database (MMDB)- MMDB is a database of experimentally determined three-dimensional biomolecular structures, and is also referred to as the Entrez Structure database.

3Dee- 3Dee (Database of Protein Domain Definitions) contains structural domain definitions for all protein chains in the Protein Data Bank (PDB), curated by the EBI and the RCSB.

Genome Databases

Ensembl- The web server of the European eukaryotic genome resource is developed for accessing and comparing genome-scale data from species of scientific interest from across the taxonomy.

UCSC Genome Information- The genome browser website containing the reference sequence and working draft assemblies for a large collection of genomes at the University of California at Santa Cruz (UCSC).

NCBI Genome- The Genome database contains sequence and map data from the whole genomes of over 1000 species or strains.

GOLD- Genomes Online Database
GOLD is a World Wide Web resource for comprehensive access to information regarding genome and metagenome sequencing projects, and their associated metadata, around the world.

VISTA- VISTA is a comprehensive suite of programs and databases for comparative analysis of genomic sequences.

Databases of Model Organisms

MGI- MGI is the international database resource for the laboratory mouse, providing integrated genetic, genomic, and biological data to facilitate the study of human health and disease.

RGD- The goal of RGD is to collect, consolidate, and integrate data generated from ongoing rat genetic and genomic research efforts and make these data widely available to the scientific community.

Xenbase- Xenbase's mission is to provide the international research community with a comprehensive, integrated and easy-to-use web-based resource that gives access to the diverse and rich genomic, expression and functional data available from Xenopus research.

ZFIN- The zebrafish model organism database, which provides integrated zebrafish genetic, genomic and developmental information.

FlyBase- A comprehensive database of Drosophila genes and genomes maintained by Indiana University.

WormBase- WormBase is an international consortium of biologists and computer scientists dedicated to providing the research community with accurate, current, accessible information concerning the genetics, genomics and biology of C. elegans and related nematodes.

SGD- The Saccharomyces Genome database (SGD) provides comprehensive integrated biological information for the budding yeast Saccharomyces cerevisiae.

Plant Genome Databases

Phytozome- Phytozome facilitates comparative genomic studies amongst green plants.

Gramene- A curated open-source data resource for comparative genome analysis in the grasses, including rice, maize, wheat, barley and sorghum, as well as other plants including Arabidopsis, poplar and grape.

TAIR- The Arabidopsis Information Resource (TAIR) maintains a database of genetic and molecular biology data for the model higher plant Arabidopsis thaliana.

AtENSEMBL- A genome browser for the commonly studied plant model organism Arabidopsis thaliana.

Oryzabase- A comprehensive rice science database that aims to gather as much knowledge as possible, ranging from classical rice genetics to recent genomics and from fundamental information to hot topics.

MaizeGDB- A community-oriented, long-term, federally funded informatics service to researchers focused on the crop plant and model organism Zea mays.

SoyBase- Integrating Genetics and Molecular Biology for Soybean Researchers.

SGN- The SOL Genomics Network (SGN) is a Clade Oriented Database (COD) containing genomic, genetic, phenotypic and taxonomic information for plant genomes.

ICuGI- The web portal for the International Cucurbit Genomics Initiative, covering melon, cucumber, watermelon, pumpkin, etc.

Bacterial Genome Databases

PATRIC- The Bacterial Bioinformatics Resource Center, an information system designed to support the biomedical research community's work on bacterial infectious diseases via integration of vital pathogen information with rich data and analysis tools.

GenoList- Integrated environment for comparative exploration of microbial genomes.

CyanoBase- A genome database for cyanobacteria that provides an easy way of accessing the sequences and all-inclusive annotation data on the structures of the cyanobacterial genomes.

Virus Genome Databases

Viral Genomes- NCBI viral genome information resource provides viral and viroid genome sequence data and related information.

GISAID- Global Initiative on Sharing Avian Influenza Data platform improves the sharing of influenza data.

OpenFlu- A database for human and animal influenza viruses that contains genomic and protein sequences as well as epidemiological data from more than 25,000 isolates.

NCBI Flu- NCBI Influenza Virus Resource with influenza genomic data, analysis tools for flu sequence analysis, annotation and submission to GenBank.

Plant Viruses- This site provides a central source of information about viruses, viroids and satellites of plants, fungi and protozoa.

Nucleotide Sequence Databases

GenBank (NCBI)- GenBank, part of the International Nucleotide Sequence Database Collaboration (comprising the DNA Data Bank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI), contains an annotated collection of all publicly available DNA sequences.

ENA (EMBL)- The European Nucleotide Archive (ENA), developed and maintained at EMBL-EBI, captures and presents information relating to experimental workflows that are based around nucleotide sequencing.

RefSeq (NCBI)- The Reference Sequence (RefSeq) collection provides a comprehensive, integrated, non-redundant, well-annotated set of sequences, including genomic DNA, transcripts, and proteins. RefSeq sequences form a foundation for medical, functional, and diversity studies.

UniGene (NCBI)- UniGene computationally identifies transcripts from the same locus; analyzes expression by tissue, age, and health status; and reports related proteins (protEST) and clone resources.

DNA Data Bank of Japan (DDBJ)- The DDBJ Center collects nucleotide sequence data as a member of the INSDC (International Nucleotide Sequence Database Collaboration) and provides freely available nucleotide sequence data and a supercomputer system to support research activities in life science.

dbSNP (NCBI)- The Single Nucleotide Polymorphism Database serves as a central repository for both single-base nucleotide substitutions and short deletion and insertion polymorphisms.

Nucleotide Structure Databases

Nucleic Acid Database (NDB)- The goal of the Nucleic Acid Database Project (NDB) is to archive and distribute structural information about nucleic acids. The NDB contains information about experimentally determined nucleic acids and complex assemblies.