Douglas Biber (1988, 1995) discusses a multidimensional approach to register variation.
The question is:
How can these linguistic features be searched for using different software?
Enclosed here are the search algorithms designed by the authors of "Corpus-based language studies: an advanced resource book".
http://cw.routledge.com/textbooks/0415286239/Resources/default.html
Search Algorithms
These search algorithms are designed to extract the 58 linguistic features from CLAWS tagged corpora (C7) for use in a multi-feature/multi-dimensional analysis. A detailed discussion of the functions of these linguistic features can be found in Biber (1988: 211-245). File-based search patterns can be downloaded below. After downloading, extract these compressed text files into c:\wsmith. These algorithms are designed for use with WordSmith Tools version 3.
After starting WordSmith, go to ‘Settings – Tags’ and activate ‘Tags to ignore’ (<*>). This makes the program ignore everything enclosed in angle brackets (metadata, comments, etc.) in the corpus files. Copy and paste the search patterns into the ‘Search word or phrase’ text box. Where an individual algorithm specifies them, adjust ‘Context words & Context search horizons’ (left and right) accordingly.
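For readers working outside WordSmith, the preprocessing step above can be approximated in a few lines of Python. This is only a rough sketch: it assumes the corpus stores each token as word_TAG separated by whitespace (as patterns such as it_PPH1 suggest), and the names TAG_TO_IGNORE and tokenize are hypothetical helpers, not part of WordSmith or CLAWS.

```python
import re

# Rough stand-in for the 'Tags to ignore' (<*>) setting: anything in angle brackets.
TAG_TO_IGNORE = re.compile(r"<[^>]*>")

def tokenize(text):
    """Drop <...> markup, then split 'word_TAG' tokens into (word, tag) pairs."""
    cleaned = TAG_TO_IGNORE.sub(" ", text)
    pairs = []
    for token in cleaned.split():
        # Split on the LAST underscore so the tag is always the final part.
        word, sep, tag = token.rpartition("_")
        if sep:  # skip anything that is not in word_TAG form
            pairs.append((word, tag))
    return pairs

# Invented sample line in the assumed format.
sample = "<s n=1> It_PPH1 is_VBZ n't_XX here_RL ._. </s>"
print(tokenize(sample))
```

Splitting on the last underscore keeps punctuation tokens such as ._. intact as a (word, tag) pair.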
Factor 1 (28 linguistic features):
(1) private verbs: c:\wsmith\privatev.txt
(2) THAT deletion: c:\wsmith\thatdel1.txt – c:\wsmith\thatdel8.txt
(3) contraction: *'*
Context 1L 2R =~*_GE/~"_"/~*_NP*/~*_NN*/~*_MC*/~*_RA/~*_UH*/~*_FO/~'_"
(4) present tense verbs: c:\wsmith\present.txt
(5) 2nd person pronouns: *_PPY/your_APPGE/yourself_PPX1/yourselves_PPX2/yours_PPGE
(6) DO as pro-verb: *_VD*
Context 0L 4R =~*_XX/~*_PPY/~*_PP?S*/~*_V?I
(7) analytic negation: *_XX
(8) demonstrative pronouns: this_DD1/that_DD1/these_DD2/those_DD2
Context 0L 3R=~*_NN*/~*_NP*/~*_PN1
(9) general emphatics: c:\wsmith\emphatic.txt
(10) 1st person pronouns: *_PPI*/my_APPGE/our_APPGE/myself_PPX1/ourselves_PPX2/mine_PPGE/ours_PPGE
(11) pronoun IT: it_PPH1
(12) BE as main verb: *_VB*
Context 0L 3R =*_D*/*_A*/*_NNB/*_I*/*_J*/~*_V?G/~*_V?N
(13) causative subordination: because_CS
(14) discourse markers:
a) well_* context 1L 0R = ~AS_*/~FEEL*_V*/~FELT_V*;
b) now_*/anyway*_*/anyhow_*
Context 2L 0R =?_?/AND_*/BUT_*/*_UH/~*_V*/~RIGHT_*
(15) indefinite pronouns: none_PN/*_PN1
(16) general hedges: c:\wsmith\hedge.txt
(17) amplifiers: c:\wsmith\amplify.txt
(18) sentence relatives: ,_, which_DDQ
(19) WH questions: ?_? WHAT_DDQ/?_? *_RRQ
Context 0L 3R =*_VD*/*_VB*/*_VH*/*_VM*
(20) possibility modals: can_VM/ca_VM/could_VM/may*_VM/might_VM
(21) non-phrasal coordination:
a) ,_, AND_CC IT_P*/,_, AND_CC SO_*/,_, AND_CC THEN_*/,_, AND_CC YOU_PPY*
b) ,_, AND_CC YOU_PPY/,_, AND_CC THERE_EX *_VB*
c) ,_, AND_CC TH*_DD1/,_, AND_CC TH*_DD2/,_, AND_CC *_PP?S*
(22) WH clauses: c:\wsmith\pps.txt context 0L 3R= *_DDQ/~?_?/~*_I*
(23) final prepositions: *_I* context 0L 2R=?_?/~(_(
(24) other nouns: *_NN*/*_NP*/*_ND1
Context 0L 0R = ~*TION*_N*/~*MENT*_N*/~*NESS*_N*/~*ITY_N*/~*ITIES_N*
(25) word length: (WordSmith wordlist function: average word length)
(26) prepositions: *_I*
(27) type/token ratio: (WordSmith wordlist function: standardized type/token ratio)
(28) attributive adjectives: *_JJ *_NN*/*_JJ *_JJ
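The pattern notation used in the list above can be imitated outside WordSmith. The sketch below is an approximation under stated assumptions, not WordSmith's actual matching engine: "/" separates alternatives, "*" matches any run of characters, "?" matches one character, matching ignores case, a context item prefixed with "~" must not occur within the horizon, and (if any unprefixed items are given) at least one of them must occur. The function names matches and search are hypothetical.

```python
from fnmatch import fnmatchcase

def matches(token, pattern):
    """Case-insensitive wildcard match: '*' = any run of characters, '?' = one."""
    return fnmatchcase(token.lower(), pattern.lower())

def search(tokens, node, context=None, left=0, right=0):
    """Return indices of tokens matching `node` that satisfy the context conditions."""
    node_alts = node.split("/")
    items = [c for c in (context or "").split("/") if c]
    required = [c for c in items if not c.startswith("~")]
    excluded = [c[1:] for c in items if c.startswith("~")]
    hits = []
    for i, tok in enumerate(tokens):
        if not any(matches(tok, p) for p in node_alts):
            continue
        # Context horizon: `left` tokens before and `right` tokens after the node.
        window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
        if any(matches(w, p) for w in window for p in excluded):
            continue
        if required and not any(matches(w, p) for w in window for p in required):
            continue
        hits.append(i)
    return hits

# Feature (8), demonstrative pronouns, on an invented tagged sentence pair.
demo = "This_DD1 is_VBZ fine_JJ ._. That_DD1 idea_NN1 works_VVZ ._.".split()
print(search(demo, "this_DD1/that_DD1/these_DD2/those_DD2",
             "~*_NN*/~*_NP*/~*_PN1", left=0, right=3))

# Feature (12), BE as main verb, on another invented sentence.
be = "She_PPHS1 is_VBZ a_AT1 teacher_NN1 ._.".split()
print(search(be, "*_VB*", "*_D*/*_A*/*_NNB/*_I*/*_J*/~*_V?G/~*_V?N", left=0, right=3))
```

In the first call only the sentence-initial "This" should be returned, because "That" is followed by a noun within three tokens and so counts as a demonstrative determiner rather than a pronoun.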
Factor 2 (6 linguistic features):
(29) past tense verbs: *_V?D*
(30) 3rd person pronouns: c:\wsmith\3persprn.txt
(31) perfect aspect verbs: c:\wsmith\perf_asp.txt
(32) public verbs: c:\wsmith\publicv.txt
(33) synthetic negation: no_AT/neither_*/nor_*
(34) present participial clauses: ,_, *_V?G *_I*/,_, *_V?G *_D*/,_, *_V?G *_P*/,_, *_V?G *_R*
Context 3L 0R= ~*_VB*
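Feature (34) differs from the single-token searches in that its node is a sequence of three tokens. A minimal Python sketch of such a multi-token search follows. It reads the context line as a horizon of three tokens to the left, and it assumes that space-separated parts of a pattern must match consecutive tokens; match_sequence and present_participial_clauses are hypothetical names.

```python
from fnmatch import fnmatchcase

def match_sequence(tokens, start, pattern):
    """True if the space-separated parts of `pattern` match consecutive tokens at `start`."""
    parts = pattern.lower().split()
    window = tokens[start:start + len(parts)]
    return len(window) == len(parts) and all(
        fnmatchcase(tok.lower(), part) for tok, part in zip(window, parts)
    )

def present_participial_clauses(tokens):
    """Indices of the comma that opens a candidate present participial clause."""
    alternatives = [",_, *_V?G *_I*", ",_, *_V?G *_D*",
                    ",_, *_V?G *_P*", ",_, *_V?G *_R*"]
    hits = []
    for i in range(len(tokens)):
        if not any(match_sequence(tokens, i, alt) for alt in alternatives):
            continue
        # Exclude hits with a form of BE in the three tokens to the left.
        left = tokens[max(0, i - 3):i]
        if any(fnmatchcase(tok.lower(), "*_vb*") for tok in left):
            continue
        hits.append(i)
    return hits

# Invented example sentences.
kept = "He_PPHS1 left_VVD ,_, looking_VVG at_II me_PPIO1 ._.".split()
dropped = "He_PPHS1 was_VBDZ ,_, looking_VVG at_II me_PPIO1 ._.".split()
print(present_participial_clauses(kept), present_participial_clauses(dropped))
```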
Factor 3 (7 linguistic features):
(35) WH relative clauses: *_NN* *_PNQ*/WHICH*_DDQ*/WHOSE_DDQGE
Context 1L 0R= ~ASK*_V*/~TELL*_V*/~TOLD_V*/~*_I*/~?_?
(36) pied piping constructions: *_NN* *_PNQ*/WHICH*_DDQ*/WHOSE_DDQGE
Context 1L 0R =*_I*
(37) phrasal coordination: *_R* and_CC *_R*/*_J* and_CC *_J*/*_V* and_CC *_V*/*_N* and_CC *_N*
(38) nominalizations: *tion_N*/*tions_N*/*ment_N*/*ments_N*/*ness_N*/*nesses_N*/*ity_N*/*ities_N*
(39) time adverbials: *_RT*
(40) place adverbials: *_RL*
(41) other adverbs: *_R* minus all totals of hedges, amplifiers, downtoners, place adverbials and time adverbials
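Feature (41) is obtained by subtraction rather than by a search of its own. The arithmetic can be sketched as below; the function name other_adverbs is hypothetical and the counts are invented purely for illustration.

```python
def other_adverbs(all_adverbs, hedges, amplifiers, downtoners, place, time):
    """Total *_R* hits minus the adverb classes that are counted separately."""
    return all_adverbs - (hedges + amplifiers + downtoners + place + time)

# Made-up counts for a single text, not taken from any real corpus.
print(other_adverbs(all_adverbs=500, hedges=20, amplifiers=35,
                    downtoners=15, place=60, time=70))
```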
Factor 4 (6 linguistic features):
(42) infinitives: to_TO *_V?I/to_TO *_R* *_V?I/to_TO *_R* *_R* *_V?I
(43) prediction modals: will_VM/wo_VM/shall_VM/sha_VM/'ll_VM/would_VM/'d_VM
(44) suasive verbs: c:\wsmith\suasivev.txt
(45) conditional subordination: if_CS/unless_CS
(46) necessity modals: ought_VM*/should_VM/must_VM
(47) split auxiliaries: c:\wsmith\splitaux.txt
Factor 5 (6 linguistic features):
(48) conjuncts: c:\wsmith\conjunct.txt
(49) agentless passives: c:\wsmith\agtlspsv.txt
Context 0L 6R=~by_II
(50) past participial clauses: ?_? *_V?N *_I*/?_? *_V?N *_R*
(51) BY-passives: c:\wsmith\by_psv.txt
Context 0L 6R=by_II
(52) past participial WHIZ deletions: c:\wsmith\whizdel.txt
Context 2L 0R= ~GET*_V*/~GOT_V*/~*_VH*
(53) other adverbial subordinators: c:\wsmith\otheradv.txt
Factor 6 (4 linguistic features):
(54) THAT clauses as verb complements: *_V* that_CST
(55) demonstratives: THESE_DD2/THOSE_DD2/THIS_DD1/THAT_DD1
Context 0L 3R= *_NN*/*_NP*/*_PN1
(56) THAT relative clauses: *_NN* THAT_CST
Context 0L 4R= *_AT*/*_D*/*_NP*/*_PP*/*_N*2*
(57) THAT clauses as adjective complements: *_JJ that_CST
Context 1L 0R= ~so_*
Factor 7 (1 linguistic feature):
(58) SEEM/APPEAR: seem*_V*/appear*_V*