Sung Research Group

Computational Precision Medicine & Surgery Laboratory

Source Codes & Datasets

8. Gut Microbiome Wellness Index 2 (GMWI2) (Chang and Gupta et al., Nature Communications, 2024)

A command-line tool for computing the GMWI2 score of a stool metagenome from its corresponding raw .fastq sequence file can be installed via Anaconda (https://anaconda.org/bioconda/GMWI2).
The source code for the tool, processed datasets (including the taxonomic profiles of all metagenome samples analyzed in this study), and code notebooks essential to reproduce all results presented in our study, as well as complete instructions for installation and usage, are freely available online at https://github.com/danielchang2002/GMWI2.

7. TaxiBGC: a Taxonomy-guided Approach for Profiling Experimentally Characterized Microbial Biosynthetic Gene Clusters and Secondary Metabolite Production Potential in Metagenomes (Gupta et al., mSystems, 2022)

TaxiBGC is a command-line tool implemented in Python and Bash. To avoid dependency conflicts, TaxiBGC can be installed via Anaconda: https://anaconda.org/danielchang2002/TaxiBGC.
Source code, MIBiG BGC gene sequences, the TaxiBGC reference database, and full instructions for installation and usage are freely available at https://github.com/danielchang2002/TaxiBGC_2022.

6. Patients with ACPA-positive and ACPA-negative Rheumatoid Arthritis Show Different Serological Autoantibody Repertoires and Autoantibody Associations with Disease Activity (Cunningham et al., Scientific Reports, 2023)

The source code and datasets used to generate the results presented in this study are available on our lab’s GitHub page.

5. Global Transcriptomic Profiling Identifies Differential Gene Expression Signatures between Inflammatory and Non-inflammatory Aortic Aneurysms (Hur et al., Arthritis & Rheumatology, 2022)

The source code and datasets used to generate the results presented in this study are available on our lab’s GitHub page.

4. Plasma Metabolomic Profiling in Patients with Rheumatoid Arthritis Identifies Biochemical Features Indicative of Quantitative Disease Activity (Hur et al., Arthritis Res Ther., 2021)

Data and Code Availability: Raw metabolomic datasets, as well as source codes used to reproduce all results of this study, are available on our lab’s GitHub page.

3. A Predictive Index for Health Status Using Species-level Gut Microbiome Profiling (Gupta et al., Nat. Comm., 2020)

Data Availability:
- Raw sequencing data accession IDs of all publicly available stool metagenomes used in our study:
  - (discovery cohort) Supplementary Data 1
  - (validation cohort) Supplementary Data 4
- Sequences for rheumatoid arthritis (RA) stool metagenomes (.fastq files from 49 patients with RA) used for GMHI validation have been deposited at NCBI’s Sequence Read Archive (SRA):
  - BioProject number PRJNA598446
Code Availability:
- R scripts demonstrating how to reproduce all findings shown in the main figures of our paper, as well as how to calculate GMHI for a given stool metagenome sample, are available on our lab’s GitHub page.

2. Metabolic Influence between Ordered Pairs of Microbial Entities (Sung et al., Nat. Comm., 2017)

In complex microbial ecosystems, a microbial entity can provide nutrients to another entity via interspecies cross-feeding of metabolic byproducts and/or release of macromolecule degradation products. This positive impact may promote microbial growth. In contrast, a microbial entity can limit another entity’s access to nutrients via competition for the same metabolites. This negative impact may inhibit microbial growth. Accordingly, we can leverage information from our microbial metabolite transport network (NJS16) to formulate and quantify the net metabolic influence of a given microbial entity on another entity. This approach allows us to construct a community-scale network of positive and negative metabolic influences between pairs of microbial entities that are differentially abundant or scarce in a given context, e.g., gut microbiomes of T2D patients vs. non-diabetic controls. Note: code written in C++.
- DOCUMENTATION
- DOWNLOAD (example dataset and scripts for Linux/Mac users)
- DOWNLOAD (example dataset and scripts for Windows users)
To provide a global framework for understanding community metabolism within the human gut, we present NJS16, the first literature-curated, community-level network of the human gut microbiota organized through metabolite transport. The network is a compilation of 4,483 annotated transport or degradation reactions (from about 400 research articles, reviews, and textbooks) between 244 metabolic compounds (229 small molecules and 15 macromolecules) and 570 microbial species and human cell types (511 bacteria, 56 archaea, and 3 host cells). Specifically, our network shows how individual microbes interact with their chemical environment (via metabolite import, export, and macromolecule degradation), and thereby with other microbes (via resource competition, interspecies cross-feeding, and release of macromolecule degradation products as public goods).
- DOCUMENTATION
- DOWNLOAD (NJS16 in .xlsx, .txt, & .xml formats)
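The idea of reading metabolic influence off a transport network can be illustrated with a small Python sketch. It is not the published C++ implementation and does not reproduce the formula used in the paper. It assumes a toy scoring rule (+1 for each metabolite one entity exports and the other imports, -1 for each metabolite both import), and every entity name, metabolite name, and function name below is hypothetical.

```python
from collections import defaultdict

# Toy transport records: (entity, metabolite, direction).
# All names here are made up for illustration; they are not NJS16 entries.
TRANSPORT = [
    ("microbe_A", "acetate", "export"),
    ("microbe_A", "glucose", "import"),
    ("microbe_B", "acetate", "import"),
    ("microbe_B", "glucose", "import"),
    ("microbe_C", "starch", "degrade"),  # macromolecule degradation
    ("microbe_C", "glucose", "export"),  # degradation product released
]


def build_index(records):
    """Group metabolites by entity and by transport direction."""
    index = defaultdict(lambda: defaultdict(set))
    for entity, metabolite, direction in records:
        index[entity][direction].add(metabolite)
    return index


def net_influence(index, source, target):
    """Toy score for the influence of `source` on `target` (an assumption,
    not the paper's definition).

    +1 for each metabolite that `source` exports and `target` imports
    (cross-feeding), -1 for each metabolite that both import (competition).
    """
    provided = index[source]["export"] & index[target]["import"]
    contested = index[source]["import"] & index[target]["import"]
    return len(provided) - len(contested)


index = build_index(TRANSPORT)
for source, target in [("microbe_A", "microbe_B"),
                       ("microbe_B", "microbe_A"),
                       ("microbe_C", "microbe_A")]:
    print(source, "->", target, net_influence(index, source, target))
```

Because the score is defined on ordered pairs, the influence of microbe_A on microbe_B need not equal the influence of microbe_B on microbe_A. In this toy data the first is 0 (one cross-fed metabolite offsets one contested metabolite) and the second is -1 (competition only).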

1. Identification of Structured Signatures and Classifiers (ISSAC) (Sung et al., PLoS Comp. Bio., 2013)

The identification of molecular signatures from blood, saliva, or urine that accurately reflect major pathologies of a unique organ system will be a significant advance in molecular cancer diagnostics. ISSAC is a machine-learning algorithm that stratifies multiple clinical phenotypes simultaneously based on relative expression of biological features (e.g., genes, proteins). ISSAC uses a data-driven, hierarchical approach to first organize phenotypes into a global hierarchy, and then learns the corresponding (binary) classifiers on training data. More specifically, it first constructs a tree-structured hierarchy of disease phenotypes based on agglomerative clustering, and then learns binary classifiers corresponding to the nodes and edges of the classification hierarchy. These classifiers are based on comparing ranked expression values of gene-pair sets. The genes appearing in the hierarchy of decision rules can then be accumulated into a panel of biomarkers, which can then direct disease stratification down a classification tree towards a particular phenotype. Note: code written in MATLAB.
- DOCUMENTATION
- DOWNLOAD (example dataset and scripts)
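How gene-pair comparisons can steer a sample down a classification tree is sketched below in Python. This is only an illustration of the prediction step under simplifying assumptions. It is not the MATLAB code distributed above, it omits the clustering and classifier-learning stages entirely, and the majority-vote rule, the tree layout, and all gene and phenotype names are hypothetical.

```python
def pair_vote(sample, gene_pairs):
    """Return True if more than half of the pairs (a, b) satisfy
    sample[a] > sample[b].

    Comparing two values within one sample is equivalent to comparing
    their within-sample ranks, so no normalization is needed here.
    """
    votes = sum(sample[a] > sample[b] for a, b in gene_pairs)
    return votes > len(gene_pairs) / 2


def classify(sample, node):
    """Walk down the tree until a leaf (a phenotype label string) is reached."""
    while isinstance(node, dict):
        branch = "left" if pair_vote(sample, node["pairs"]) else "right"
        node = node[branch]
    return node


# Hypothetical hierarchy: each internal node holds a gene-pair set,
# each leaf is a phenotype label. In practice both would be learned
# from training data.
TREE = {
    "pairs": [("g1", "g2"), ("g3", "g4"), ("g5", "g1")],
    "left": "phenotype_A",
    "right": {
        "pairs": [("g2", "g4")],
        "left": "phenotype_B",
        "right": "phenotype_C",
    },
}

sample = {"g1": 5.0, "g2": 1.0, "g3": 2.0, "g4": 3.0, "g5": 0.5}
print(classify(sample, TREE))
```

The set of genes that appear in the `pairs` entries along the tree plays the role of the biomarker panel described above.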