inflect

2024-03-28 11:31 | Source: compiled from the web

NAME

inflect.py - Correctly generate plurals, singular nouns, ordinals, and indefinite articles; convert numbers to words.

SYNOPSIS

    import inflect

    p = inflect.engine()

    # METHODS:

    # plural plural_noun plural_verb plural_adj singular_noun no num
    # compare compare_nouns compare_verbs compare_adjs
    # a an
    # present_participle
    # ordinal number_to_words
    # join
    # inflect classical gender
    # defnoun defverb defadj defa defan

    # UNCONDITIONALLY FORM THE PLURAL
    print("The plural of ", word, " is ", p.plural(word))

    # CONDITIONALLY FORM THE PLURAL
    print("I saw", cat_count, p.plural("cat", cat_count))

    # FORM PLURALS FOR SPECIFIC PARTS OF SPEECH
    print(
        p.plural_noun("I", N1),
        p.plural_verb("saw", N1),
        p.plural_adj("my", N2),
        p.plural_noun("saw", N2),
    )

    # FORM THE SINGULAR OF PLURAL NOUNS
    print("The singular of ", word, " is ", p.singular_noun(word))

    # SELECT THE GENDER OF SINGULAR PRONOUNS
    print(p.singular_noun("they"))  # 'it'
    p.gender("f")
    print(p.singular_noun("they"))  # 'she'

    # DEAL WITH "0/1/N" -> "no/1/N" TRANSLATION:
    print("There ", p.plural_verb("was", errors), p.no(" error", errors))

    # USE DEFAULT COUNTS:
    print(
        p.num(N1, ""),
        p.plural("I"),
        p.plural_verb(" saw"),
        p.num(N2),
        p.plural_noun(" saw"),
    )
    print("There ", p.num(errors, ""), p.plural_verb("was"), p.no(" error"))

    # COMPARE TWO WORDS "NUMBER-INSENSITIVELY":
    if p.compare(word1, word2):
        print("same")
    if p.compare_nouns(word1, word2):
        print("same noun")
    if p.compare_verbs(word1, word2):
        print("same verb")
    if p.compare_adjs(word1, word2):
        print("same adj.")

    # ADD CORRECT "a" OR "an" FOR A GIVEN WORD:
    print("Did you want ", p.a(thing), " or ", p.an(idea))

    # CONVERT NUMERALS INTO ORDINALS (i.e. 1->1st, 2->2nd, 3->3rd, etc.)
    print("It was", p.ordinal(position), " from the left\n")

    # CONVERT NUMERALS TO WORDS (i.e. 1->"one", 101->"one hundred and one", etc.)
    # RETURNS A SINGLE STRING...
    words = p.number_to_words(1234)
    # "one thousand, two hundred and thirty-four"
    words = p.number_to_words(p.ordinal(1234))
    # "one thousand, two hundred and thirty-fourth"

    # GET BACK A LIST OF STRINGS, ONE FOR EACH "CHUNK"...
    words = p.number_to_words(1234, wantlist=True)
    # ("one thousand","two hundred and thirty-four")

    # OPTIONAL PARAMETERS CHANGE TRANSLATION:
    words = p.number_to_words(12345, group=1)
    # "one, two, three, four, five"
    words = p.number_to_words(12345, group=2)
    # "twelve, thirty-four, five"
    words = p.number_to_words(12345, group=3)
    # "one twenty-three, forty-five"
    words = p.number_to_words(1234, andword="")
    # "one thousand, two hundred thirty-four"
    words = p.number_to_words(1234, andword=", plus")
    # "one thousand, two hundred, plus thirty-four"
    # TODO: I get no comma before plus: check perl
    words = p.number_to_words(555_1202, group=1, zero="oh")
    # "five, five, five, one, two, oh, two"
    words = p.number_to_words(555_1202, group=1, one="unity")
    # "five, five, five, unity, two, oh, two"
    words = p.number_to_words(123.456, group=1, decimal="mark")
    # "one two three mark four five six"
    # TODO: DOCBUG: perl gives commas here as do I

    # LITERAL STYLE ONLY NAMES NUMBERS LESS THAN A CERTAIN THRESHOLD...
    words = p.number_to_words(9, threshold=10)  # "nine"
    words = p.number_to_words(10, threshold=10)  # "ten"
    words = p.number_to_words(11, threshold=10)  # "11"
    words = p.number_to_words(1000, threshold=10)  # "1,000"

    # JOIN WORDS INTO A LIST:
    mylist = p.join(("apple", "banana", "carrot"))
    # "apple, banana, and carrot"
    mylist = p.join(("apple", "banana"))
    # "apple and banana"
    mylist = p.join(("apple", "banana", "carrot"), final_sep="")
    # "apple, banana and carrot"

    # REQUIRE "CLASSICAL" PLURALS (EG: "focus"->"foci", "cherub"->"cherubim")
    p.classical()  # USE ALL CLASSICAL PLURALS

    p.classical(all=True)  # USE ALL CLASSICAL PLURALS
    p.classical(all=False)  # SWITCH OFF CLASSICAL MODE

    p.classical(zero=True)  # "no error" INSTEAD OF "no errors"
    p.classical(zero=False)  # "no errors" INSTEAD OF "no error"

    p.classical(herd=True)  # "2 buffalo" INSTEAD OF "2 buffalos"
    p.classical(herd=False)  # "2 buffalos" INSTEAD OF "2 buffalo"

    p.classical(persons=True)  # "2 chairpersons" INSTEAD OF "2 chairpeople"
    p.classical(persons=False)  # "2 chairpeople" INSTEAD OF "2 chairpersons"

    p.classical(ancient=True)  # "2 formulae" INSTEAD OF "2 formulas"
    p.classical(ancient=False)  # "2 formulas" INSTEAD OF "2 formulae"

    # INTERPOLATE "plural()", "plural_noun()", "plural_verb()", "plural_adj()", "singular_noun()",
    # "a()", "an()", "num()" AND "ordinal()" WITHIN STRINGS:
    print(p.inflect("The plural of {0} is plural('{0}')".format(word)))
    print(p.inflect("The singular of {0} is singular_noun('{0}')".format(word)))
    print(p.inflect("I saw {0} plural('cat',{0})".format(cat_count)))
    print(
        p.inflect(
            "plural('I',{0}) "
            "plural_verb('saw',{0}) "
            "plural('a',{1}) "
            "plural_noun('saw',{1})".format(N1, N2)
        )
    )
    print(
        p.inflect(
            "num({0}, False)plural('I') "
            "plural_verb('saw') "
            "num({1}, False)plural('a') "
            "plural_noun('saw')".format(N1, N2)
        )
    )
    print(p.inflect("I saw num({0}) plural('cat')\nnum()".format(cat_count)))
    print(p.inflect("There plural_verb('was',{0}) no('error',{0})".format(errors)))
    print(p.inflect("There num({0}, False)plural_verb('was') no('error')".format(errors)))
    print(p.inflect("Did you want a('{0}') or an('{1}')".format(thing, idea)))
    print(p.inflect("It was ordinal('{0}') from the left".format(position)))

    # ADD USER-DEFINED INFLECTIONS (OVERRIDING INBUILT RULES):
    p.defnoun("VAX", "VAXen")  # SINGULAR => PLURAL
    p.defverb(
        "will",  # 1ST PERSON SINGULAR
        "shall",  # 1ST PERSON PLURAL
        "will",  # 2ND PERSON SINGULAR
        "will",  # 2ND PERSON PLURAL
        "will",  # 3RD PERSON SINGULAR
        "will",  # 3RD PERSON PLURAL
    )
    p.defadj("hir", "their")  # SINGULAR => PLURAL
    p.defa("h")  # "AY HALWAYS SEZ 'HAITCH'!"
    p.defan("horrendous.*")  # "AN HORRENDOUS AFFECTATION"

DESCRIPTION

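The methods of the class engine in the module inflect.py provide plural inflections, singular noun inflections, "a"/"an" selection for English words, and manipulation of numbers as words.

Plural forms of all nouns, most verbs, and some adjectives are provided. Where appropriate, "classical" variants (for example: "brother" -> "brethren", "dogma" -> "dogmata", etc.) are also provided.

Singular forms of nouns are also provided. The gender of singular pronouns can be chosen (for example "they" -> "it" or "she" or "he" or "they").

Pronunciation-based "a"/"an" selection is provided for all English words and for most initialisms.

It is also possible to inflect numerals (1, 2, 3) to ordinals (1st, 2nd, 3rd) and to English words ("one", "two", "three").

In generating these inflections, inflect.py follows the Oxford English Dictionary and the guidelines in Fowler's Modern English Usage, preferring the former where the two disagree.

The module is built around standard British spelling, but is designed to cope with common American variants as well. Slang, jargon, and other English dialects are not explicitly catered for.

Where two or more inflected forms exist for a single word (typically a "classical" form and a "modern" form), inflect.py prefers the more common form (typically the "modern" one), unless "classical" processing has been specified (see MODERN VS CLASSICAL INFLECTIONS).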

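FORMING PLURALS AND SINGULARS

Inflecting Plurals and Singulars

All of the plural... inflection methods take the word to be inflected as their first argument and return the corresponding inflection. Note that all such methods expect the singular form of the word. The results of passing a plural form are undefined (and unlikely to be correct). Similarly, the si... singular inflection method expects the plural form of the word.

The plural... methods also take an optional second argument, which indicates the grammatical "number" of the word (or of another word with which the word being inflected must agree). If the "number" argument is supplied and is not 1 (or "one" or "a", or some other adjective that implies the singular), the plural form of the word is returned. If the "number" argument does indicate singularity, the (uninflected) word itself is returned. If the number argument is omitted, the plural form is returned unconditionally.

The si... method takes a second argument in a similar fashion. If it is some form of the number 1, or is omitted, the singular form is returned. Otherwise the plural is returned unaltered.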

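The various methods of inflect.engine are:

plural_noun(word, count=None)

The method plural_noun() takes a singular English noun or pronoun and returns its plural. Pronouns in the nominative ("I" -> "we") and accusative ("me" -> "us") cases are handled, as are possessive pronouns ("mine" -> "ours").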

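plural_verb(word, count=None)

The method plural_verb() takes the singular form of a conjugated verb (that is, one which is already in the correct "person" and "mood") and returns the corresponding plural conjugation.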

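plural_adj(word, count=None)

The method plural_adj() takes the singular form of certain types of adjectives and returns the corresponding plural form. Adjectives that are correctly handled include: "numerical" adjectives ("a" -> "some"), demonstrative adjectives ("this" -> "these", "that" -> "those"), and possessives ("my" -> "our", "cat's" -> "cats'", "child's" -> "children's", etc.)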

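plural(word, count=None)

The method plural() takes a singular English noun, pronoun, verb, or adjective and returns its plural form. Where a word has more than one inflection depending on its part of speech (for example, the noun "thought" inflects to "thoughts", the verb "thought" to "thought"), the (singular) noun sense is preferred to the (singular) verb sense.

Hence plural("knife") will return "knives" ("knife" having been treated as a singular noun), whereas plural("knifes") will return "knife" ("knifes" having been treated as a 3rd person singular verb).

The inherent ambiguity of such cases suggests that, where the part of speech is known, plural_noun, plural_verb, and plural_adj should be used in preference to plural.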

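singular_noun(word, count=None)

The method singular_noun() takes a plural English noun or pronoun and returns its singular. Pronouns in the nominative ("we" -> "I") and accusative ("us" -> "me") cases are handled, as are possessive pronouns ("ours" -> "mine"). When third person singular pronouns are returned they take the neuter gender by default ("they" -> "it"), not ("they" -> "she") nor ("they" -> "he"). This can be changed with gender().

Note that all these methods ignore any whitespace surrounding the word being inflected, but preserve that whitespace when the result is returned. For example, plural(" cat  ") returns " cats  ".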

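gender(genderletter)

The third person plural pronoun takes the same form for the feminine, masculine, and neuter (for example "they"). The singular, however, depends upon gender (for example "she", "he", "it", and "they", with "they" being the gender-neutral form). By default singular_noun returns the neuter form ("it"), but the gender can be selected with the gender method. Pass the first letter of the gender to gender to return the f(eminine), m(asculine), n(euter), or t(hey) form of the singular. For example, gender('f') followed by singular_noun('themselves') returns 'herself'.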

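Numbered plurals

The plural... methods return only the inflected word, not the count that was used to inflect it. Thus, in order to produce "I saw 3 ducks", it is necessary to use:

    print("I saw", N, p.plural_noun(animal, N))

Since the usual purpose of producing a plural is to make it agree with a preceding count, inflect.py provides a method (no(word, count)) which, given a word and an optional count, returns the count followed by the correctly inflected word. Hence the previous example can be rewritten:

    print("I saw", p.no(animal, N))

In addition, if the count is zero (or some other term which implies zero, such as "zero", "nil", etc.) the count is replaced by the word "no". Hence, if N had the value zero, the previous example would print (the somewhat more elegant):

    I saw no animals

rather than:

    I saw 0 animals

Note that the name of the method is a pun: the method returns either a number (a No.) or a "no", in front of the inflected word.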

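Reducing the number of counts required

In some contexts, the need to supply an explicit count to the various plural... methods makes for tiresome repetition. For example:

    print(
        p.plural_adj("This", errors),
        p.plural_noun(" error", errors),
        p.plural_verb(" was", errors),
        " fatal.",
    )

inflect.py therefore provides a method (num(count=None, show=None)) which may be used to set a persistent "default number" value. If such a value is set, it is subsequently used whenever an optional second "number" argument is omitted. The default value thus set can subsequently be removed by calling num() with no arguments. Hence we could rewrite the previous example:

    p.num(errors)
    print(p.plural_adj("This"), p.plural_noun(" error"), p.plural_verb(" was"), "fatal.")
    p.num()

Normally, num() returns its first argument, so that it may also be "inlined" in contexts like:

    print(p.num(errors), p.plural_noun(" error"), p.plural_verb(" was"), " detected.")
    if severity > 1:
        print(
            p.plural_adj("This"), p.plural_noun(" error"), p.plural_verb(" was"), "fatal."
        )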

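However, in certain contexts (see INTERPOLATING INFLECTIONS IN STRINGS) it is preferable that num() return an empty string. Hence num() provides an optional second argument. If that argument is supplied (that is, if it is defined) and evaluates to false, num returns an empty string instead of its first argument. For example:

    print(p.num(errors, 0), p.no("error"), p.plural_verb(" was"), " detected.")
    if severity > 1:
        print(
            p.plural_adj("This"), p.plural_noun(" error"), p.plural_verb(" was"), "fatal."
        )

Number-insensitive equality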

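inflect.py also provides a solution to the problem of comparing words of differing plurality through the methods compare(word1, word2), compare_nouns(word1, word2), compare_verbs(word1, word2), and compare_adjs(word1, word2). Each of these methods takes two strings and compares them using the corresponding plural-inflection method (plural(), plural_noun(), plural_verb(), and plural_adj() respectively).

The comparison returns true if:

the strings are equal, or

one string is equal to a plural form of the other, or

the strings are two different plural forms of the one word.

Hence all of the following return true:

    p.compare("index", "index")  # RETURNS "eq"
    p.compare("index", "indexes")  # RETURNS "s:p"
    p.compare("index", "indices")  # RETURNS "s:p"
    p.compare("indexes", "index")  # RETURNS "p:s"
    p.compare("indices", "index")  # RETURNS "p:s"
    p.compare("indices", "indexes")  # RETURNS "p:p"
    p.compare("indexes", "indices")  # RETURNS "p:p"
    p.compare("indices", "indices")  # RETURNS "eq"

As indicated by the comments in the previous example, the actual value returned by the various compare methods encodes which of the three equality rules succeeded: "eq" is returned if the strings were identical, "s:p" if the strings were singular and plural respectively, "p:s" for plural and singular, and "p:p" for two distinct plurals. Inequality is indicated by returning an empty string.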

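It should be noted that two distinct singular words which happen to take the same plural form are not considered equal, nor are cases where one (singular) word's plural is the other (plural) word's singular. Hence all of the following return false:

    p.compare("base", "basis")  # ALTHOUGH BOTH -> "bases"
    p.compare("syrinx", "syringe")  # ALTHOUGH BOTH -> "syringes"
    p.compare("she", "he")  # ALTHOUGH BOTH -> "they"
    p.compare("opus", "operas")  # ALTHOUGH "opus" -> "opera" -> "operas"
    p.compare("taxi", "taxes")  # ALTHOUGH "taxi" -> "taxis" -> "taxes"

Note too that, although the comparison is "number-insensitive", it is not case-insensitive (that is, compare("time", "Times") returns false). To obtain both number and case insensitivity, use the lower() method on both strings (that is, compare("time".lower(), "Times".lower()) returns true).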
