[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Sheflug] Weird algorithm question



I'm trying to figure out a good algorithm for some NLP work, but I don't
know what the algorithm or computational problem is called.  I'd
appreciate any pointers in the right direction.

I have a file of pairs of WordNet synset numbers that represent
hypernym-hyponym pairs (words' definitions that represent superclasses
and subclasses, such as organism-animal, organism-plant, animal-vertebrate).

100001930,100001740
100002137,100001740
100002452,100001930
100002684,100001930
100003553,100002684
100003993,100003553
100004258,100003553
100004475,100004258
100005787,100004475
100005930,100004475

and I want to produce all the longest possible chains that can be made
by joining any overlapping chains together and discarding any chains
that are completely contained in other ones.  The output for that list
(the head of my real dataset) is as follows:

100002137,100001740
100002452,100001930,100001740
100003993,100003553,100002684,100001930,100001740
100005787,100004475,100004258,100003553,100002684,100001930,100001740
100005930,100004475,100004258,100003553,100002684,100001930,100001740

I have a program that does this correctly for small samples of the data
I'm supposed to process, but I've estimated (using a log plot) that it
will take about 10^595 seconds to run over the whole input file  (about
89000 pairs), so I think I'm doing it wrong.  (I think the sun will
expand to earth's orbit sooner than that.)

_______________________________________________
Sheffield Linux User's Group
http://sheflug.org.uk/mailman/listinfo/sheflug_sheflug.org.uk
FAQ at: http://www.sheflug.org.uk/mailfaq.html

GNU - The Choice of a Complete Generation