[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Sheflug] Weird algorithm question



On Fri, Jun 25, 2010 at 04:57:39PM +0100, Alex Hudson wrote:
> On 25/06/10 16:37, Adam Funk wrote:
>> I also posted this to the corpora mailing list (related to natural
>> language processing) and got some interesting suggestions about graph
>> search algorithms.
>>    
>
> Yeah; thinking about it, it's not really travelling salesman because  
> you're looking for the longest route, not the longest route visiting all  
> nodes (as I understand the problem, anyway).
>
> Thinking about this, there ought to be some MapReduce magic you could  
> work here. If there are "loops" where you have start and end nodes with  
> multiple "routes" between them but only two vertices per node, e.g.:
>
> --A--C--E--T--B--
>     \----F--G----/
>
> .. then you can automatically throw away the A->F->G->B route because  
> it's never going to be part of any longest route.
>
> I don't know how much of that kind of pruning you can actually do, but I  
> would have thought that could help cut down the search space pretty  
> usefully.
>
> It's still going to be a pretty crazy complex problem to solve tho ;)

On the other hand, the OP's dataset sounds like more like a simple tree,
in which case a simple O(N) algorithm should work:

    #!/usr/bin/perl -w

    my %parent;
    my %nonterminal;
    while (<DATA>) {
	chomp;
	my ($child,$parent) = split /,/, $_;
	$parent{$child} = $parent;
	$nonterminal{$parent} = 1;
    }


    for my $base (keys %parent) {
	next if $nonterminal{$base};
	my $n = $base;
	while (defined $n) {
	    print "$n,";
	    $n = $parent{$n};
	}
	print "\n";
    }

    __END__
    100001930,100001740
    100002137,100001740
    100002452,100001930
    100002684,100001930
    100003553,100002684
    100003993,100003553
    100004258,100003553
    100004475,100004258
    100005787,100004475
    100005930,100004475

outputs

100002452,100001930,100001740,
100005930,100004475,100004258,100003553,100002684,100001930,100001740,
100005787,100004475,100004258,100003553,100002684,100001930,100001740,
100002137,100001740,
100003993,100003553,100002684,100001930,100001740,



-- 
Nothing ventured, nothing lost.

_______________________________________________
Sheffield Linux User's Group
http://sheflug.org.uk/mailman/listinfo/sheflug_sheflug.org.uk
FAQ at: http://www.sheflug.org.uk/mailfaq.html

GNU - The Choice of a Complete Generation