
Biological Sequences




Biological sequences are a fundamental form of 1D biomedical data. They represent the linear order of molecular building blocks, such as nucleotides in DNA/RNA or amino acids in proteins. Analyzing these sequences is essential for understanding the structure, function, and evolution of living systems.


Genomic Sequences (DNA, RNA)

Genomic sequences encode the genetic blueprint of an organism. DNA (deoxyribonucleic acid) is the long-term store of genetic information, while RNA (ribonucleic acid) plays critical roles in gene expression and regulation.

DNA sequences use four bases: adenine (A), thymine (T), cytosine (C), and guanine (G). RNA replaces thymine (T) with uracil (U). Commonly used data formats include:

  • FASTA / FASTQ – for raw sequences (FASTQ also stores per-base quality scores)

  • SAM / BAM / CRAM – for aligned reads

  • GFF / GTF – for gene annotations
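
As an illustration of the sequence representation described above, the following is a minimal Python sketch, not part of the platform itself, that parses FASTA-formatted text and converts each DNA record to its RNA form by replacing T with U. The function names `parse_fasta` and `transcribe` and the example records are hypothetical, chosen only for this example; production pipelines would typically use an established library instead.

```python
from typing import Dict, Iterable, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record_id: sequence} mapping."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: use the first whitespace-delimited token as the ID.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may be wrapped, so concatenate them.
            records[current_id] += line.upper()
    return records


def transcribe(dna: str) -> str:
    """Return the RNA form of a DNA sequence by replacing T with U."""
    return dna.upper().replace("T", "U")


# Hypothetical example input; the sequences are made up for illustration.
example = """>seq1 hypothetical example record
ATGCGT
TAGC
>seq2
ggatcc
"""

records = parse_fasta(example.splitlines())
rna = {name: transcribe(seq) for name, seq in records.items()}
print(rna)
```

The parser keeps whole sequences in memory, which is fine for small files but not for genome-scale data, where a streaming reader would be a better fit.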

Data Support

Genomic data is large, complex, and sensitive. Our platform provides a secure and scalable infrastructure for storing, sharing, and analyzing such datasets. Using Equivariant Encryption (EE), we enable:

  • Encrypted genomic data repositories

  • Collaborative analysis workflows

  • Compliance with privacy and bioethics standards

Model Support

Foundation models have dramatically improved genomic analysis by enabling:

  • Gene function prediction

  • Regulatory element detection

  • Variant impact modeling

We support encrypted evaluation and fine-tuning of genomics foundation models.


Proteomic Sequences

Proteins are chains of amino acids that fold into 3D structures to carry out cellular functions. There are 20 standard amino acids, and their linear order determines a protein's folding and function.
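
To make the 20-letter alphabet concrete, here is a small Python sketch, assuming the conventional one-letter amino acid codes, that checks whether a protein sequence contains only standard residues and counts its composition. The helper names `invalid_residues` and `composition` and the example peptide are hypothetical and used only for illustration.

```python
from collections import Counter
from typing import Set

# One-letter codes for the 20 standard amino acids.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def invalid_residues(sequence: str) -> Set[str]:
    """Return the characters in `sequence` that are not standard amino acids."""
    return set(sequence.upper()) - STANDARD_AMINO_ACIDS


def composition(sequence: str) -> Counter:
    """Count how often each residue occurs in `sequence`."""
    return Counter(sequence.upper())


# Made-up peptide used only as an example input.
peptide = "MKTAYIAKQR"

print(invalid_residues(peptide))   # empty set when all residues are standard
print(composition(peptide).most_common(3))
```

Real datasets may also contain ambiguity codes such as X or B; this sketch simply reports them as non-standard rather than trying to resolve them.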

Proteomic sequences are typically stored in:

  • FASTA – for raw sequences

  • mzML / mzXML – for raw mass spectrometry data

  • mzIdentML – for peptide and protein identification results

Data Support

Proteomic data often includes sensitive therapeutic and diagnostic information. Our platform supports secure storage and encrypted processing of proteomic sequences, enabling:

  • Secure data sharing between labs

  • Privacy-preserving collaboration on biomarker discovery

  • Scalable proteomics pipelines using EE

Model Support

Modern deep learning models are capable of:

  • Predicting 3D protein structure

  • Modeling protein interactions

  • Designing novel therapeutics

We support secure inference and fine-tuning for proteomics foundation models.


Example Models

  • scGPT – A generative pre-trained model for single-cell biology. It handles imputation, denoising, and synthetic state generation for gene expression data.

  • DNABERT – A BERT-style model trained on human and bacterial genomes. It excels at tasks like promoter detection, splice site prediction, and regulatory element discovery.

  • AlphaFold – A groundbreaking deep learning model that predicts the 3D structure of proteins from amino acid sequences with near-experimental accuracy. Its use cases span drug discovery, structural biology, and protein design.