
GWAVA, Genome Wide Annotation of VAriants, is a tool aiming to predict the functional impact of non-coding genetic variants. It consists of two parts:

  1. a procedure of variant annotation
  2. a random-forest classifier using variant annotations to predict functional variants vs non-functional ones

GWAVA was published in 2014. As you can see from the README from its FTP site, it depends on some older versions of libs:

  • numpy==1.7.0
  • scipy==0.11.0
  • pandas==0.12.0
  • scikit-learn==0.14.1
  • pybedtools==0.6.4
  • tabix (0.2.5)
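
To see at a glance how far an environment drifts from these pins, a small check like the one below may help. It is only a sketch: the helper names `find_mismatches` and `installed_versions` are made up for illustration, and `importlib.metadata` needs Python 3.8+, so on the Python 2.7 setup described in this post you would have to fill the `installed` dict some other way (by hand, for instance). tabix is left out because it is not a Python package.

```python
# Pinned versions copied from the GWAVA README list above.
PINNED = {
    "numpy": "1.7.0",
    "scipy": "0.11.0",
    "pandas": "0.12.0",
    "scikit-learn": "0.14.1",
    "pybedtools": "0.6.4",
}


def find_mismatches(installed, pinned=PINNED):
    """Return {name: (pinned, installed)} for packages that differ or are missing.

    `installed` maps package name -> version string (None if absent).
    """
    mismatches = {}
    for name, wanted in pinned.items():
        found = installed.get(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


def installed_versions(names):
    """Look up installed versions (hypothetical helper; Python 3.8+ only)."""
    from importlib import metadata

    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    report = find_mismatches(installed_versions(PINNED))
    for name, (wanted, found) in sorted(report.items()):
        print("%s: README pins %s, found %s" % (name, wanted, found))
```

A mismatch is not necessarily fatal (see Issue 1 below), but it tells you where to look first when something breaks.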

The source code can be found in the src folder on its FTP, or in the Supplementary Information section on its Nature Methods page.

Of course, it’s better to set up a virtualenv specifically for running the GWAVA code. The project folder structure looks like this:

  • {PROJECT}
    • src (download from FTP)
    • annotated (Ditto)
    • paper_data (Ditto)
    • source_data (Ditto. Caution: >22G; plan ahead.)
    • training_sets (Ditto)
    • tmp (just mkdir a new folder; it’s required but won’t be created automatically by GWAVA)
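
Since the missing tmp folder is easy to forget, the layout above can be checked with a few lines of Python. This is a minimal sketch, not part of GWAVA: `prepare_project` is a hypothetical helper, and the folder names are simply the ones listed above.

```python
import os

# Folder names taken from the layout above; "tmp" is the only one that is
# not downloaded from the FTP site and has to be created by hand.
DOWNLOADED = ["src", "annotated", "paper_data", "source_data", "training_sets"]


def prepare_project(project_dir):
    """Create tmp/ if needed and return the downloaded folders still missing."""
    tmp_dir = os.path.join(project_dir, "tmp")
    if not os.path.isdir(tmp_dir):
        os.makedirs(tmp_dir)
    return [d for d in DOWNLOADED
            if not os.path.isdir(os.path.join(project_dir, d))]
```

Calling `prepare_project("/path/to/PROJECT")` returns an empty list once everything has been downloaded.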

Issue 1: numpy compatibility

My current Python 2 version, 2.7.12, is not compatible with such an old numpy, so I installed numpy==1.11.1, the newest release at the time, instead.

Issue 2: tabix installation

You can go with apt install tabix and probably get tabix (1.2.1-2ubuntu1).

To install the specific version manually, download it from the SAMtools site and then run:

tar jxvf tabix-0.2.5.tar.bz2
cd tabix-0.2.5
make
cp bgzip tabix /YourBinFolder

Update 2018-04-23

I ran into a make error today:

erik:tabix-0.2.5$ make
make[1]: Entering directory '/home/erik/Downloads/tabix-0.2.5'
gcc -g -Wall -O2 -fPIC  -o tabix main.o -lm  -lz -L. -ltabix
./libtabix.a(bgzf.o): In function `deflate_block':
/home/erik/Downloads/tabix-0.2.5/bgzf.c:311: undefined reference to `deflate'
/home/erik/Downloads/tabix-0.2.5/bgzf.c:313: undefined reference to `deflateEnd'
/home/erik/Downloads/tabix-0.2.5/bgzf.c:305: undefined reference to `deflateInit2_'
/home/erik/Downloads/tabix-0.2.5/bgzf.c:329: undefined reference to `deflateEnd'
/home/erik/Downloads/tabix-0.2.5/bgzf.c:345: undefined reference to `crc32'
/home/erik/Downloads/tabix-0.2.5/bgzf.c:346: undefined reference to `crc32'
./libtabix.a(bgzf.o): In function `inflate_block':
/home/erik/Downloads/tabix-0.2.5/bgzf.c:380: undefined reference to `inflateInit2_'
/home/erik/Downloads/tabix-0.2.5/bgzf.c:385: undefined reference to `inflate'
/home/erik/Downloads/tabix-0.2.5/bgzf.c:391: undefined reference to `inflateEnd'
/home/erik/Downloads/tabix-0.2.5/bgzf.c:387: undefined reference to `inflateEnd'
./libtabix.a(bedidx.o): In function `ks_getuntil':
/home/erik/Downloads/tabix-0.2.5/bedidx.c:11: undefined reference to `gzread'
./libtabix.a(bedidx.o): In function `bed_read':
/home/erik/Downloads/tabix-0.2.5/bedidx.c:103: undefined reference to `gzdopen'
/home/erik/Downloads/tabix-0.2.5/bedidx.c:138: undefined reference to `gzclose'
./libtabix.a(bedidx.o): In function `ks_getc':
/home/erik/Downloads/tabix-0.2.5/bedidx.c:11: undefined reference to `gzread'
./libtabix.a(bedidx.o): In function `bed_read':
/home/erik/Downloads/tabix-0.2.5/bedidx.c:103: undefined reference to `gzopen64'
collect2: error: ld returned 1 exit status
Makefile:41: recipe for target 'tabix' failed
make[1]: *** [tabix] Error 1
make[1]: Leaving directory '/home/erik/Downloads/tabix-0.2.5'
Makefile:18: recipe for target 'all-recur' failed
make: *** [all-recur] Error 1
erik:tabix-0.2.5$ make
make[1]: Entering directory '/home/erik/Downloads/tabix-0.2.5'
gcc -g -Wall -O2 -fPIC  -o tabix main.o -lm  -L. -ltabix -lz
gcc -c -g -Wall -O2 -fPIC  -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE  bgzip.c -o bgzip.o

I am not a C++ expert but I found a possible cause mentioned by djcsdy:

The problem was that the dynamic linking policy has changed in recent versions of GNU ld.

More details from djcsdy’s comments:

This is required due to changes in DSO linking policy in recent versions of GNU ld.

Previously ld would automatically link transitive dependencies, but it no longer does so.

libpng depends on zlib, so we must now also explicitly link in zlib.

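The workaround is described in “Undefined reference to _gzopen etc”. The linker resolves symbols in command-line order, so -lz has to come after -ltabix, the library that actually uses zlib. In my case, simply moving the -lz option to the end of the command on line 41 of the Makefile was enough: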

tabix:lib $(AOBJS)
		$(CC) $(CFLAGS) -o $@ $(AOBJS) -lm $(LIBPATH) -L. -ltabix -lz
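
If you would rather not edit the Makefile by hand, the same reordering can be scripted. The sketch below is only an illustration: `move_lz_last` is a hypothetical helper, and the `before` string is my reconstruction of the original recipe from the failing gcc command in the log above, not a verbatim copy of the Makefile.

```python
def move_lz_last(recipe_line):
    """Return the recipe line with -lz moved after every other token."""
    # Keep the leading tabs, which make requires in front of recipe lines.
    indent = recipe_line[:len(recipe_line) - len(recipe_line.lstrip())]
    tokens = recipe_line.split()
    if "-lz" not in tokens:
        return recipe_line
    others = [t for t in tokens if t != "-lz"]
    return indent + " ".join(others + ["-lz"])


# Assumed original recipe, reconstructed from the failing gcc command in the log.
before = "\t\t$(CC) $(CFLAGS) -o $@ $(AOBJS) -lm $(LIBPATH) -lz -L. -ltabix"
print(move_lz_last(before))
```

The printed line should match the fixed recipe shown above, with -lz following -ltabix.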

Issue 3: pybedtools is only a Python interface and needs the bedtools binaries behind it

The latest bedtools available via apt, 2.25.0, is not backward compatible and raises errors like:

***** ERROR: Unrecognized parameter: -ops *****
***** ERROR: Unrecognized parameter: freqdesc *****

even though -ops and freqdesc are “legal” options listed in its man page… (Jesus!)

The title of a post on the bedtools-discuss Google Group, “bedtools 2.18.2 and pybedtools 0.6.4”, written by Aaron Quinlan, one of the developers of bedtools, suggests that bedtools 2.18.2 is the version to pair with pybedtools 0.6.4.

bedtools was originally hosted on Google Code - bedtools, but now lives on GitHub - arq5x/bedtools2. Follow this document to install it:

tar -zxvf bedtools-2.18.2.tar.gz
cd bedtools-2.18.2
make
cp ./bin/bedtools /usr/local/bin

If you don’t want to mess up your /usr/local/bin directory, add the following line to gwava_annotate.py:

# pybedtools.set_bedtools_path("~/Downloads/bedtools-2.18.2/bin")  # WRONG. See update 2018-11-13

You can get versions of bedtools other than 2.25.0 via apt, but I am not sure whether those versions are compatible. See Ubuntu - bedtools package for more details.
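
With several bedtools builds on one machine, it is worth confirming which version actually runs. The following is a minimal sketch under one assumption: that `bedtools --version` prints something like `bedtools v2.18.2` to stdout. Both function names are hypothetical helpers, not part of pybedtools.

```python
import re
import subprocess


def parse_bedtools_version(text):
    """Extract (major, minor, patch) from text such as 'bedtools v2.18.2'."""
    match = re.search(r"v(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version found in %r" % text)
    return tuple(int(part) for part in match.groups())


def bedtools_version(executable="bedtools"):
    """Run `<executable> --version` and parse its output (assumed format)."""
    output = subprocess.check_output([executable, "--version"])
    return parse_bedtools_version(output.decode("utf-8", "replace"))
```

For example, `bedtools_version("/usr/local/bin/bedtools") == (2, 18, 2)` would confirm that the manually built binary is the one being called.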

Update 2018-11-13

Python does not expand ~ to the home directory on its own (that is a shell feature), so use either the full path:

pybedtools.set_bedtools_path("/home/erik/Downloads/bedtools-2.18.2/bin")

or os.path.expanduser:

import os

pybedtools.set_bedtools_path(os.path.expanduser("~/Downloads/bedtools-2.18.2/bin"))
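
To catch this kind of path mistake early, the directory can be resolved and checked before it is handed to pybedtools. This is just a sketch; `resolve_bedtools_dir` and `has_bedtools` are hypothetical helpers introduced for illustration.

```python
import os


def resolve_bedtools_dir(path):
    """Expand a leading ~ and return an absolute directory path."""
    return os.path.abspath(os.path.expanduser(path))


def has_bedtools(directory):
    """True if an executable file named bedtools sits in the directory."""
    candidate = os.path.join(directory, "bedtools")
    return os.path.isfile(candidate) and os.access(candidate, os.X_OK)
```

If `has_bedtools(resolve_bedtools_dir("~/Downloads/bedtools-2.18.2/bin"))` is false, the path is wrong or bedtools has not been built there yet.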

Issue 4: nobody ever told me that samtools is required…

… and which version?

Luckily, the Availability and Requirements section of a 2014 paper, SPANDx: a genomics pipeline for comparative analysis of large haploid whole genome re-sequencing datasets, indicated:

Other requirements: … SAMtools 0.1.19, BEDTools 2.18.2…

Well, let’s use samtools 0.1.19. (SMH…) And this time finally we can just use apt install samtools! Yikes! See Ubuntu - samtools package for more details.
