# Biological Sequences

Biological sequences are a fundamental form of 1D biomedical data. They represent the linear order of molecular building blocks, such as nucleotides in DNA and RNA or amino acids in proteins.\
Analyzing these sequences is essential for understanding the structure, function, and evolution of living systems.

![](https://lh7-rt.googleusercontent.com/slidesz/AGV_vUfqLWj0NlNFmehEai_VLhtixxC9cSaB-3QFfjmpIercyFch7k4hnk0UQHAnP7zLq3c6ec8d9vNN07Ve63ypwMgHjSL-mcace7t7ca5oDgdys8v4OPZrATAfH0uuVbZYlpF9XJHnZw=s2048?key=_t06K9zDHrDgeHDIPqC4uaCR)

***

#### Genomic Sequences (DNA, RNA)

Genomic sequences encode the genetic blueprint of an organism. DNA (deoxyribonucleic acid) is the long-term store of genetic information, while RNA (ribonucleic acid) plays critical roles in gene expression and regulation.

DNA sequences use the bases: **A**, **T**, **C**, and **G**, while RNA replaces thymine (T) with **U** (uracil). Data formats commonly used include:

* **FASTA / FASTQ** – for raw sequences (FASTA) and sequencing reads with per-base quality scores (FASTQ)
* **SAM / BAM / CRAM** – for aligned reads
* **GFF / GTF** – for gene annotations
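
To make the FASTA format and the T-to-U substitution concrete, here is a minimal sketch in plain Python. The function names `parse_fasta` and `transcribe` and the example records are hypothetical illustrations, not part of the platform's API. Production pipelines would more likely use an established parser such as Biopython.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record_id: sequence}."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: use the first whitespace-delimited token as the ID.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else f"record_{len(records) + 1}"
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may wrap; concatenate them per record.
            records[current_id] += line.upper()
    return records


def transcribe(dna: str) -> str:
    """Return the RNA form of a coding-strand DNA sequence (T replaced by U)."""
    return dna.upper().replace("T", "U")


# Hypothetical example input with two records.
example = """>seq1 hypothetical example record
ATGCGT
ACGT
>seq2
ttgaca
"""

records = parse_fasta(example.splitlines())
rna = {name: transcribe(seq) for name, seq in records.items()}
print(rna)
```

The parser keeps only the first token of each header as the record ID and joins wrapped sequence lines, which is the usual convention for multi-line FASTA files.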

**Data Support**

Genomic data is large, complex, and sensitive.\
Our platform provides **a secure and scalable infrastructure** for storing, sharing, and analyzing such datasets. Using **Equivariant Encryption (EE)**, we enable:

* Encrypted genomic data repositories
* Collaborative analysis workflows
* Compliance with privacy and bioethics standards

**Model Support**

Foundation models have dramatically improved genomic analysis by enabling:

* Gene function prediction
* Regulatory element detection
* Variant impact modeling

We support **encrypted evaluation and fine-tuning of genomics foundation models**.

* [**scGPT**](https://github.com/bowang-lab/scGPT)\
  A generative pre-trained transformer for single-cell biology. It operates on gene expression profiles rather than raw nucleotide sequences and supports tasks such as cell type annotation, batch integration, and perturbation response prediction.
* [**DNABERT-2**](https://github.com/MAGICS-LAB/DNABERT_2)\
  A BERT-style model pre-trained on multi-species genomes. It is commonly applied to tasks such as promoter detection, splice site prediction, and regulatory element discovery.

***

#### Proteomic Sequences

Proteins are chains of amino acids that fold into 3D structures to carry out cellular functions.\
There are 20 standard amino acids, and their linear order determines a protein's folding and function.
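
As a small illustration of the 20-letter alphabet, the sketch below checks a protein sequence for non-standard residue codes and computes its amino acid composition. The helper names `nonstandard_residues` and `composition` and the example peptide are hypothetical and chosen for this page only.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids.
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def nonstandard_residues(protein: str) -> set:
    """Return the set of letters that are not standard amino acid codes."""
    return set(protein.upper()) - STANDARD_AMINO_ACIDS


def composition(protein: str) -> dict:
    """Return the fraction of each residue in the sequence, keyed by letter."""
    counts = Counter(protein.upper())
    total = sum(counts.values())
    return {aa: n / total for aa, n in sorted(counts.items())}


# Hypothetical example peptide.
peptide = "MKTAYIAKQR"
print(nonstandard_residues(peptide))  # empty set when all residues are standard
print(composition(peptide))
```

Ambiguity codes such as `X` or `B` appear in some FASTA files. A check like this flags them so they can be handled explicitly before a sequence is passed to a model.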

Proteomic sequences are typically stored in:

* **FASTA** – for raw sequences
* **mzML / mzXML / mzIdentML** – for mass spectrometry spectra (mzML, mzXML) and peptide and protein identification results (mzIdentML)

**Data Support**

Proteomic data often includes sensitive therapeutic and diagnostic information.\
Our platform supports **secure storage and encrypted processing** of proteomic sequences, enabling:

* Secure data sharing between labs
* Privacy-preserving collaboration on biomarker discovery
* Scalable proteomics pipelines using EE

**Model Support**

Modern deep learning models are capable of:

* Predicting 3D protein structure
* Modeling protein interactions
* Designing novel therapeutics

We support **secure inference and fine-tuning** for proteomics foundation models.

* [**AlphaFold**](https://github.com/google-deepmind/alphafold)\
  A deep learning model that predicts the 3D structure of proteins from amino acid sequences, often with near-experimental accuracy. Its use cases span drug discovery, structural biology, and protein design.

***

