Over the last few years a number of people have asked for permutation tests for the proximity coefficient to be re-implemented. They were in the original software but were left out of our revamped Java version because of other priorities. Below is a version in Perl. Copy and paste it into a text editor and save it as proxcalc_permute.pl. It is written so that you don't need to download any extra packages; you just need Perl, which you already have if you're using OS X and can download for free (e.g., ActiveState Perl) if you're on Windows.

Once Perl is installed, open your command-line prompt, navigate to the appropriate folder (using "cd folder…") and type "perl proxcalc_permute.pl". The script processes every .txt file found in that folder, so save your data files in the same folder as proxcalc_permute.pl. A proper data file has one code per line, in time order.

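The script produces two things. The first is a results file (named results_ followed by the input file name, with a .text extension) containing the proximity (P) coefficient and p-value for each contingency. The second is a folder, data, holding the raw permutation data and R code. If you drop the R code into the R console, it draws a histogram of the proximity coefficients that came out of every permutation, with a dashed line marking the value found for your actual data. The R code reads its data file by name, so R's working directory needs to be the data folder.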

#!/usr/bin/perl
use strict;
use List::Util qw(shuffle);
use Cwd;

my $file;
my $handle;
my @sequence;
my %key;
my %data;
my $permutations = 1000; #Change this value to increase or decrease the number of permutations
my $give_me_plots = 1; #Set to 1 to write the raw data plus an R script for plotting; set to 0 to skip this

opendir (DIR, ".") or die "Cannot open current directory: $!\n";

while ($file = readdir(DIR)){

    #These lines just ensure only .txt files in the folder are examined
    $handle = substr ($file,-4,4);
    if ($handle eq ".txt"){
        open IN, "<", $file or die "Cannot open input file $file: $!\n";

        #Start each file with an empty sequence, then load the .txt file, adding each non-blank line onto the array
        @sequence = ();
        while (<IN>){
            s/\r\n|\r/\n/g;
            chomp;
            next if /^$/;
            push @sequence,$_;
        }
        close IN;

        #Here's the main code. Loops the same calculations $permutations + 1 times (1001 by default). The first run gets the actual result
        for (my $runs = 1; $runs <= ($permutations + 1); $runs++){

            #Variable declarations. length = sequence length; count_d = summed distance between cue and response for each contingency; number_d = number of times each contingency occurs; nomore = stops a response code being counted again after its first occurrence following the current cue
            my $length = scalar @sequence;
            my %count_d;
            my %number_d;
            my %nomore;

            print "Progress: $runs\n";

            #Now run through the sequence using two 'pointers' p and q, where q is always ahead of p, looking for the response to each cue represented by p. Every time p changes we must reset nomore
            for (my $p = 0; $p <= ($length - 1); $p++){

                undef %nomore;

                for (my $q = $p + 1; $q <= ($length - 1); $q++){
                    next if exists $nomore{$sequence[$q]};
                    $count_d{$sequence[$p].$sequence[$q]} += (($q - $p) - 1);
                    $number_d{$sequence[$p].$sequence[$q]}++;
                    $nomore{$sequence[$q]} = 1;
                }
            }

            #Push the result into a hash of arrays. Contingencies that only arise in a shuffled sequence are skipped, so the first entry is always the 'actual' result
            foreach my $key (keys %count_d){
                next if $runs > 1 && !exists $data{$key};
                my $proximity = 1 - ($count_d{$key} / ($number_d{$key}*($length - 2)));
                push @{$data{$key}},$proximity;
            }

            undef %count_d;
            undef %number_d;

            @sequence = shuffle @sequence;
        }

        #OK, now let's write some output
        my $result = 0;
        my $above = 0;
        my $finding;

        #The .text extension keeps result files from being read as input on a later run
        open OUT, ">", "results_$file.text" or die "Cannot open output file: $!\n";
        print OUT "Cont.\tP coeff.\tAbs. p-value\n";

        #Here, for each contingency, we take the 'actual' result (the first entry) and compare it to the permuted runs
        foreach my $key (sort keys %data){
            $above = 0;
            my @permuted = @{$data{$key}};
            $finding = shift @permuted;

            foreach (@permuted){
                $above++ if ($_ >= $finding);
            }

            $result = ($above / $permutations);
            print OUT "$key\t$finding\t$result\n";
        }
        close OUT;

        #Raw data and R code go into the data folder. Files are named by contingency only, so a later input file overwrites an earlier one's plot files
        if ($give_me_plots == 1){
            mkdir "data" unless -d "data";
            foreach my $key (keys %data){
                my @permuted = @{$data{$key}};
                $finding = shift @permuted;

                open DATA, ">", "data/${key}_data.dat" or die "Cannot open output file: $!\n";
                open R, ">", "data/${key}_rplot_code.R" or die "Cannot open output file: $!\n";

                foreach (@permuted){
                    print DATA "$_\n";
                }
                close DATA;
                print R "${key} = read.table(\"${key}_data.dat\")\n";
                print R "hist(${key}\$V1)\n";
                print R "abline(v = $finding, col=2, lty=2, lwd=2)\n";
                close R;
            }
        }

        undef %data;
    }
}
closedir DIR;

If you have no idea what the above means but know that you want to determine whether a particular proximity coefficient in your sequence is higher than might be expected by chance, then get in touch; you're on the right lines.
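
If it helps to see the calculation on something small, here is a minimal sketch of the proximity coefficient for a single toy sequence, with no permutations. It follows the same counting rule as the script (for each cue, only the first later occurrence of each response code counts), but the proximity subroutine, the "->" separator in the contingency names, and the five-code sequence are made up for illustration and are not part of proxcalc_permute.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: returns a hash of proximity
# coefficients keyed by "cue->response" for one sequence of codes.
# Assumes the sequence has at least three codes (length - 2 is a divisor).
sub proximity {
    my @seq    = @_;
    my $length = scalar @seq;
    my (%count_d, %number_d);

    for my $p (0 .. $length - 1) {
        my %seen;    # response codes already counted for this cue position
        for my $q ($p + 1 .. $length - 1) {
            next if $seen{ $seq[$q] };
            my $pair = "$seq[$p]->$seq[$q]";
            $count_d{$pair} += $q - $p - 1;    # codes lying between cue and response
            $number_d{$pair}++;                # how often this contingency occurred
            $seen{ $seq[$q] } = 1;
        }
    }

    my %prox;
    for my $pair (keys %count_d) {
        $prox{$pair} = 1 - $count_d{$pair} / ($number_d{$pair} * ($length - 2));
    }
    return %prox;
}

# Toy sequence (made up): one code per element, in time order
my %prox = proximity(qw(A B C A B));
printf "%s\t%.3f\n", $_, $prox{$_} for sort keys %prox;
```

In this toy sequence B follows A immediately both times, so A->B comes out at 1.000, while C is one step removed from the first A, giving A->C a value of 0.667. The full script repeats this calculation on shuffled copies of the sequence and reports how often a shuffled value is at least as large as the actual one.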