Software and Data
Tools that I build or contribute to, which you may also find useful.
Software
formatters. [R] We provide a framework for rendering complex tables to ASCII, and a set of formatters for transforming values or sets of values into ASCII-ready display strings.
rtables. [R] Reporting tables often have structure that goes beyond simple rectangular data. The ‘rtables’ package provides a framework for declaring complex multi-level tabulations and then applying them to data. This framework models both tabulation and the resulting tables as hierarchical, tree-like objects which support sibling sub-tables, arbitrary splitting or grouping of data in row and column dimensions, cells containing multiple values, and the concept of contextual summary computations. A convenient pipe-able interface is provided for declaring table layouts and the corresponding computations, and then applying them to data.
rlistings. [R] Listings are often part of the submission of clinical trial data in regulatory settings. We provide a framework for the specific formatting features often used when displaying large datasets in that context.
tern. [R] Tables, Listings, and Graphs (TLG) library for common outputs used in clinical trials.
teal.modules.clinical. [R] This package contains a set of standard teal modules to be used with CDISC data in order to generate many of the standard outputs used in clinical trials.
DEploid. [c++, R] Deconvolves mixed Plasmodium falciparum sequence data and reports the mixture proportions. An R version is available on CRAN.
Hybrid-lambda. [c++] A simulation tool for the lambda coalescent within a given species network. The population structure (species tree/network) is expressed as a Newick string.
Hybrid-coal. [c++] Computes gene tree probabilities given a species network under the coalescent process, using dynamic programming.
smcsmc. [c++] This software package infers demographic events and rates from multiple-sample whole-genome sequence data.
scrm. [c++] A Kingman coalescent simulator suitable for large-scale whole-genome sequence simulations.
Chamaeleo. [python] Chamaeleo is currently the only collection focused on different codec methods for DNA storage. The kit is mainly developed and operated by BGI-Research (Shenzhen). It lets you use classical DNA encoding and decoding methods to save files as DNA sequences or load them back from DNA sequences.
I have also contributed to the code base of the following programs:
GA4GH-server. [python] Reference implementation of the APIs defined in ga4gh-schemas.
msprime. [c, python] A reimplementation of Hudson’s classical ms simulator for modern data sets.
TreeWAS. [R] Uses tree-structured healthcare data to perform genetic association studies with UK Biobank data.
Data
Pf3k. Deconvoluted Plasmodium falciparum haplotypes.