15 Commits

Author | SHA1 | Message | Date
nemerle | 97f093feaa | This build requires LLVM, does not need ncurses - modify CMakeLists.txt to match | 2016-02-13 15:14:14 +01:00
Artur K. | 3561de6e12 | Merge pull request #5 from Arthur2e5/patch-1 | 2015-10-20 06:18:10 +00:00
    README: Recognizing code segments
Mingye Wang | e84d09b97c | README: Recognizing code segments | 2015-10-20 01:15:59 -04:00
Artur K. | d8a4fe1c04 | Merge pull request #4 from Arthur2e5/patch-1 | 2015-10-20 05:15:49 +00:00
    README: tweak formatting by a bit
Mingye Wang | e4e6ad6415 | README: tweak formatting by a bit | 2015-10-20 01:11:34 -04:00
    Trying to get a nice balance between Markdown rendering and plain text readability. And I think I got it.
nemerle | 2543617930 | Remove llvm as a build requirement | 2015-08-13 20:46:54 +02:00
nemerle | bc5784a8f2 | Fix #1 - just use QFileInfo. | 2015-05-28 15:13:43 +02:00
Artur K | 842687726f | Update the dcc tools code | 2015-04-28 14:59:00 +02:00
nemerle | c5c9196561 | Fix for functional tests when running on clean checkout | 2015-02-10 17:31:57 +01:00
nemerle | a697ad05c0 | Add original dcc tools to repository | 2015-02-10 17:28:50 +01:00
    * makedsig has been integrated with makedstp, it should handle both LIB and TPL files
    * other tools have not been modified
Artur K. | d8c66e7791 | Update Readme.md | 2014-06-05 15:01:12 +02:00
nemerle | 337a6c44aa | Added original readme | 2014-05-25 12:36:39 +02:00
nemerle | cde4484821 | Remove unused local | 2014-05-25 12:33:18 +02:00
nemerle | 36b063c183 | Working towards gui integration with exetoc_qt | 2014-05-24 17:08:05 +02:00
nemerle | 3603877f42 | Qt5 command options processing | 2014-03-07 20:01:36 +01:00
69 changed files with 10638 additions and 840 deletions

68
3rd_party/libdisasm/INTEL_BUGS vendored Normal file

@@ -0,0 +1,68 @@
PMOVMSKB        Gd, Pq1H
PMOVMSKB (66)   Gd, Vdq1H
should be
PMOVMSKB        Gd, Qq1H
PMOVMSKB (66)   Gd, Wdq1H
The instruction represented by this opcode expression does not allow any
operand to be a memory location.

MASKMOVQ        Pq, Pq1H
MASKMOVDQU (66) Vdq, Vdq1H
should be
MASKMOVQ        Pq, Pq1H
MASKMOVDQU (66) Vdq, Wdq1H

MOVMSKPS        Gd, Vps1H
MOVMSKPD (66)   Gd, Vpd1H
should be
MOVMSKPS        Gd, Wps1H
MOVMSKPD (66)   Gd, Wpd1H

The opcode table entries for LFS, LGS, and LSS
L[FGS]S         Mp
should be
L[FGS]S         Gv,Mp

MOVHLPS         Vps, Vps
MOVLHPS         Vps, Vps
should be
MOVHLPS         Vps, Wps
MOVLHPS         Vps, Wps
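The notes above use the operand codes of Intel's opcode map. G, P and V operands come from the ModR/M `reg` field. E, Q and W operands come from the `r/m` field, which can name either a register or a memory location. Most of the corrections move the second operand to an r/m form. Where the note says an instruction accepts no memory operand, a decoder must also require `mod == 11b`. The sketch below shows that check. `ModRM`, `decodeModRM` and `rmIsRegisterOnly` are hypothetical helpers, not libdisasm functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the three fields of an x86 ModR/M byte.
struct ModRM {
    uint8_t mod; // bits 7..6: addressing mode (3 == register-direct)
    uint8_t reg; // bits 5..3: register operand (G, P, V operand codes)
    uint8_t rm;  // bits 2..0: register-or-memory operand (E, Q, W operand codes)
};

ModRM decodeModRM(uint8_t byte) {
    return ModRM{
        static_cast<uint8_t>((byte >> 6) & 0x3),
        static_cast<uint8_t>((byte >> 3) & 0x7),
        static_cast<uint8_t>(byte & 0x7)};
}

// True when the r/m field names a register rather than a memory location.
// For instructions such as PMOVMSKB, which take no memory operand, an
// encoding for which this returns false would be treated as invalid.
bool rmIsRegisterOnly(uint8_t byte) {
    return decodeModRM(byte).mod == 3;
}
```

For example, `0xC1` (mod 11b) selects a register and passes the check. `0x01` (mod 00b) selects a memory operand and fails it.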

137
3rd_party/libdisasm/LICENSE vendored Normal file

@@ -0,0 +1,137 @@
The "Clarified Artistic License"
Preamble
The intent of this document is to state the conditions under which a
Package may be copied, such that the Copyright Holder maintains some
semblance of artistic control over the development of the package,
while giving the users of the package the right to use and distribute
the Package in a more-or-less customary fashion, plus the right to make
reasonable modifications.
Definitions:
"Package" refers to the collection of files distributed by the
Copyright Holder, and derivatives of that collection of files
created through textual modification.
"Standard Version" refers to such a Package if it has not been
modified, or has been modified in accordance with the wishes
of the Copyright Holder as specified below.
"Copyright Holder" is whoever is named in the copyright or
copyrights for the package.
"You" is you, if you're thinking about copying or distributing
this Package.
"Distribution fee" is a fee you charge for providing a copy of this
Package to another party.
"Freely Available" means that no fee is charged for the right to use
the item, though there may be fees involved in handling the item.
1. You may make and give away verbatim copies of the source form of the
Standard Version of this Package without restriction, provided that you
duplicate all of the original copyright notices and associated disclaimers.
2. You may apply bug fixes, portability fixes and other modifications
derived from the Public Domain, or those made Freely Available, or from
the Copyright Holder. A Package modified in such a way shall still be
considered the Standard Version.
3. You may otherwise modify your copy of this Package in any way, provided
that you insert a prominent notice in each changed file stating how and
when you changed that file, and provided that you do at least ONE of the
following:
a) place your modifications in the Public Domain or otherwise make them
Freely Available, such as by posting said modifications to Usenet or
an equivalent medium, or placing the modifications on a major archive
site allowing unrestricted access to them, or by allowing the Copyright
Holder to include your modifications in the Standard Version of the
Package.
b) use the modified Package only within your corporation or organization.
c) rename any non-standard executables so the names do not conflict
with standard executables, which must also be provided, and provide
a separate manual page for each non-standard executable that clearly
documents how it differs from the Standard Version.
d) make other distribution arrangements with the Copyright Holder.
e) permit and encourge anyone who receives a copy of the modified Package
permission to make your modifications Freely Available in some specific
way.
4. You may distribute the programs of this Package in object code or
executable form, provided that you do at least ONE of the following:
a) distribute a Standard Version of the executables and library files,
together with instructions (in the manual page or equivalent) on where
to get the Standard Version.
b) accompany the distribution with the machine-readable source of
the Package with your modifications.
c) give non-standard executables non-standard names, and clearly
document the differences in manual pages (or equivalent), together
with instructions on where to get the Standard Version.
d) make other distribution arrangements with the Copyright Holder.
e) offer the machine-readable source of the Package, with your
modifications, by mail order.
5. You may charge a distribution fee for any distribution of this Package.
If you offer support for this Package, you may charge any fee you choose
for that support. You may not charge a license fee for the right to use
this Package itself. You may distribute this Package in aggregate with
other (possibly commercial and possibly nonfree) programs as part of a
larger (possibly commercial and possibly nonfree) software distribution,
and charge license fees for other parts of that software distribution,
provided that you do not advertise this Package as a product of your own.
If the Package includes an interpreter, You may embed this Package's
interpreter within an executable of yours (by linking); this shall be
construed as a mere form of aggregation, provided that the complete
Standard Version of the interpreter is so embedded.
6. The scripts and library files supplied as input to or produced as
output from the programs of this Package do not automatically fall
under the copyright of this Package, but belong to whoever generated
them, and may be sold commercially, and may be aggregated with this
Package. If such scripts or library files are aggregated with this
Package via the so-called "undump" or "unexec" methods of producing a
binary executable image, then distribution of such an image shall
neither be construed as a distribution of this Package nor shall it
fall under the restrictions of Paragraphs 3 and 4, provided that you do
not represent such an executable image as a Standard Version of this
Package.
7. C subroutines (or comparably compiled subroutines in other
languages) supplied by you and linked into this Package in order to
emulate subroutines and variables of the language defined by this
Package shall not be considered part of this Package, but are the
equivalent of input as in Paragraph 6, provided these subroutines do
not change the language in any way that would cause it to fail the
regression tests for the language.
8. Aggregation of the Standard Version of the Package with a commercial
distribution is always permitted provided that the use of this Package is
embedded; that is, when no overt attempt is made to make this Package's
interfaces visible to the end user of the commercial distribution.
Such use shall not be construed as a distribution of this Package.
9. The name of the Copyright Holder may not be used to endorse or promote
products derived from this software without specific prior written permission.
10. THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.
The End

12
3rd_party/libdisasm/NAMESPACE.TXT vendored Normal file

@@ -0,0 +1,12 @@
The rewritten libdisasm code uses the following namespaces:
Prefix   Namespace
----------------------------------------------------
x86_     Global 'libdisasm' namespace
ia32_    Internal IA32 ISA namespace
ia64_    Internal IA64 ISA namespace
ix64_    Internal X86-64 ISA namespace
Note that the 64-bit ISAs are not yet supported/written.

2
3rd_party/libdisasm/README vendored Normal file

@@ -0,0 +1,2 @@
This is a cut-up version of libdisasm originally from the bastard project http://bastard.sourceforge.net/

43
3rd_party/libdisasm/TODO vendored Normal file

@@ -0,0 +1,43 @@
x86_format.c
------------
intel: jmpf -> jmp, callf -> call
att: jmpf -> ljmp, callf -> lcall
opcode table
------------
finish typing instructions
fix flag clear/set/toggle types
ix64 stuff
----------
document output file formats in web page
features doc: register aliases, implicit operands, stack mods,
ring0 flags, eflags, cpu model/isa
ia32_handle_* implementation
fix operand 0F C2
CMPPS
* sysenter, sysexit as CALL types -- preceded by MSR writes
* SYSENTER/SYSEXIT stack : overwrites SS, ESP
* stos, cmps, scas, movs, ins, outs, lods -> OP_PTR
* OP_SIZE in implicit operands
* use OP_SIZE to choose reg sizes!
DONE?? :
implicit operands: provide action ?
e.g. add/inc for stack, write, etc
replace table numbers in opcodes.dat with
#defines for table names
replace 0 with INSN_INVALID (or maybe FF for invalid and 00 for Not Applicable)
no wait that is only for prefix tables -- n/p
if (prefix) only use if insn != invalid
these should cover all the wacky disasm exceptions
for the rep one we can cheat, match only a 0x90
todo: privilege | ring
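The first TODO item asks the formatter to rename far jumps and calls according to the output syntax. Below is a minimal table-style sketch of that rename. The `AsmSyntax` enum and the `farMnemonic` function are hypothetical; they are not part of libdisasm's x86_format.c.

```cpp
#include <cassert>
#include <string>

// Hypothetical output syntaxes; libdisasm's own format enum is not used here.
enum class AsmSyntax { Intel, Att };

// Rename the internal far-transfer mnemonics as listed in the TODO:
//   intel: jmpf -> jmp,  callf -> call
//   att:   jmpf -> ljmp, callf -> lcall
// Any other mnemonic is returned unchanged.
std::string farMnemonic(const std::string &mnemonic, AsmSyntax syntax) {
    if (mnemonic == "jmpf")
        return syntax == AsmSyntax::Intel ? "jmp" : "ljmp";
    if (mnemonic == "callf")
        return syntax == AsmSyntax::Intel ? "call" : "lcall";
    return mnemonic;
}
```

Keeping the rename in one function lets the rest of the formatter stay syntax-neutral. Only this mapping has to know about AT&T's `l` prefix.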

36
3rd_party/libdisasm/ia32_fixup.cpp vendored Normal file

@@ -0,0 +1,36 @@
#include <stdio.h>
static const char * mem_fixup[256] = {
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* 00 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* 08 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* 10 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* 18 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* 20 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* 28 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* 30 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* 38 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* 40 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* 48 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* 50 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* 58 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* 60 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* 68 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* 70 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* 78 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* 80 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* 88 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* 90 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* 98 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* A0 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* A8 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* B0 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* B8 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* C0 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* C8 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* D0 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* D8 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* E0 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* E8 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, /* F0 */
NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL /* F8 */
};

3206
3rd_party/libdisasm/ia32_opcode.dat vendored Normal file

File diff suppressed because it is too large

49
3rd_party/libdisasm/libdisasm.def vendored Normal file

@@ -0,0 +1,49 @@
;libdisasm.def : Declares the module parameters
LIBRARY "libdisasm.dll"
DESCRIPTION "libdisasm exported functions"
EXPORTS
x86_addr_size @1
x86_cleanup @2
x86_disasm @3
x86_disasm_forward @4
x86_disasm_range @5
x86_endian @6
x86_format_header @7
x86_format_insn @8
x86_format_mnemonic @9
x86_format_operand @10
x86_fp_reg @11
x86_get_branch_target @12
x86_get_imm @13
x86_get_options @14
x86_get_raw_imm @15
x86_get_rel_offset @16
x86_imm_signsized @17
x86_imm_sized @18
x86_init @19
x86_insn_is_tagged @20
x86_insn_is_valid @21
x86_invariant_disasm @22
x86_ip_reg @23
x86_max_insn_size @24
x86_op_size @25
x86_operand_1st @26
x86_operand_2nd @27
x86_operand_3rd @28
x86_operand_count @29
x86_operand_foreach @30
x86_operand_new @31
x86_operand_size @32
x86_oplist_free @33
x86_reg_from_id @34
x86_report_error @35
x86_set_insn_addr @36
x86_set_insn_block @37
x86_set_insn_function @38
x86_set_insn_offset @39
x86_set_options @40
x86_set_reporter @41
x86_size_disasm @42
x86_sp_reg @43
x86_tag_insn @44


@@ -1,5 +1,8 @@
PROJECT(dcc_original)
CMAKE_MINIMUM_REQUIRED(VERSION 2.8)
cmake_minimum_required(VERSION 2.8.9)
set(CMAKE_INCLUDE_CURRENT_DIR ON)
set(CMAKE_AUTOMOC ON)
find_package(Qt5Core)
OPTION(dcc_build_tests "Enable unit tests." OFF)
#SET(LIBRARY_OUTPUT_PATH ${PROJECT_SOURCE_DIR})
@@ -9,31 +12,41 @@ IF(CMAKE_BUILD_TOOL MATCHES "(msdev|devenv|nmake)")
ADD_DEFINITIONS(/W4)
ELSE()
#-D_GLIBCXX_DEBUG
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall --std=c++0x")
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall -std=c++11")
SET(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} " ) #--coverage
ENDIF()
SET(CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/CMakeScripts;${CMAKE_MODULE_PATH})
SET(EXECUTABLE_OUTPUT_PATH ${PROJECT_SOURCE_DIR})
include(cotire)
FIND_PACKAGE(LLVM)
FIND_PACKAGE(Boost)
IF(dcc_build_tests)
enable_testing()
FIND_PACKAGE(GMock)
ENDIF()
ADD_SUBDIRECTORY(3rd_party)
llvm_map_components_to_libraries(REQ_LLVM_LIBRARIES jit native mc support tablegen)
find_package(LLVM REQUIRED CONFIG)
llvm_map_components_to_libnames(REQ_LLVM_LIBRARIES native mc support tablegen)
INCLUDE_DIRECTORIES(
3rd_party/libdisasm
include
include/idioms
common
${Boost_INCLUDE_DIRS}
${LLVM_INCLUDE_DIRS}
)
ADD_SUBDIRECTORY(3rd_party)
ADD_SUBDIRECTORY(common)
ADD_SUBDIRECTORY(tools)
set(dcc_LIB_SOURCES
src/CallConvention.cpp
src/ast.cpp
src/backend.cpp
src/bundle.cpp
@@ -42,9 +55,9 @@ set(dcc_LIB_SOURCES
src/control.cpp
src/dataflow.cpp
src/disassem.cpp
src/DccFrontend.cpp
src/error.cpp
src/fixwild.cpp
src/frontend.cpp
src/graph.cpp
src/hlicode.cpp
src/hltype.cpp
@@ -63,7 +76,6 @@ set(dcc_LIB_SOURCES
src/locident.cpp
src/liveness_set.cpp
src/parser.cpp
src/perfhlib.cpp
src/procs.cpp
src/project.cpp
src/Procedure.cpp
@@ -73,7 +85,7 @@ set(dcc_LIB_SOURCES
src/symtab.cpp
src/udm.cpp
src/BasicBlock.cpp
src/CallConvention.cpp
src/dcc_interface.cpp
)
set(dcc_SOURCES
src/dcc.cpp
@@ -82,6 +94,7 @@ set(dcc_HEADERS
include/ast.h
include/bundle.h
include/BinaryImage.h
include/DccFrontend.h
include/dcc.h
include/disassem.h
include/dosdcc.h
@@ -100,7 +113,7 @@ set(dcc_HEADERS
include/idioms/shift_idioms.h
include/idioms/xor_idioms.h
include/locident.h
include/perfhlib.h
include/CallConvention.h
include/project.h
include/scanner.h
include/state.h
@@ -109,20 +122,23 @@ set(dcc_HEADERS
include/Procedure.h
include/StackFrame.h
include/BasicBlock.h
include/CallConvention.h
include/dcc_interface.h
)
SOURCE_GROUP(Source FILES ${dcc_SOURCES})
SOURCE_GROUP(Headers FILES ${dcc_HEADERS})
ADD_LIBRARY(dcc_lib STATIC ${dcc_LIB_SOURCES} ${dcc_HEADERS})
qt5_use_modules(dcc_lib Core)
#cotire(dcc_lib)
ADD_EXECUTABLE(dcc_original ${dcc_SOURCES} ${dcc_HEADERS})
ADD_DEPENDENCIES(dcc_original dcc_lib)
TARGET_LINK_LIBRARIES(dcc_original LLVMSupport dcc_lib disasm_s ${REQ_LLVM_LIBRARIES} LLVMSupport)
TARGET_LINK_LIBRARIES(dcc_original dcc_lib dcc_hash disasm_s ${REQ_LLVM_LIBRARIES} LLVMSupport)
qt5_use_modules(dcc_original Core)
#ADD_SUBDIRECTORY(gui)
if(dcc_build_tests)
ADD_SUBDIRECTORY(src)
endif()

File diff suppressed because it is too large

127
Readme.md Normal file

@@ -0,0 +1,127 @@
I've fixed many issues in this codebase, among them memory reallocation during decompilation.
To reflect those fixes, I've edited the original readme a bit.
* * *
dcc Distribution
================
The code provided in this distribution is copyright (C) by its authors:
- Cristina Cifuentes (most of dcc code)
- Mike van Emmerik (signatures and prototype code)
- Jeff Ledermann (some disassembly code)
and is provided "as is". A list of additional contributors is available
[on GitHub](https://github.com/nemerle/dcc/graphs/contributors).
The following files are included in the dccoo.tar.gz distribution:
- dcc.zip (dcc.exe DOS program, 1995)
- dccsrc.zip (source code *.c, *.h for dcc, 1993-1994)
- dcc32.zip (dcc_oo.exe 32 bit console (Win95/Win-NT) program, 1997)
- dccsrcoo.zip (source code *.cpp, *.h for "oo" dcc, 1993-1997)
- dccbsig.zip (library signatures for Borland C compilers, 1994)
- dccmsig.zip (library signatures for Microsoft C compilers, 1994)
- dcctpsig.zip (library signatures for Turbo Pascal compilers, 1994)
- dcclibs.dat (prototype file for C headers, 1994)
- test.zip (sample test files: *.c *.exe *.b, 1993-1996)
- makedsig.zip (creates a .sig file from a .lib C file, 1994)
- makedstp.zip (creates a .sig file from a Pascal library file, 1994)
- readsig.zip (reads signatures in a .sig file, 1994)
- dispsrch.zip (displays the name of a function given a signature, 1994)
- parsehdr.zip (generates a prototype file (dcclibs.dat) from C *.h files, 1994)
Note that the dcc_oo.exe program (in dcc32.zip) is a 32 bit program,
so it won't work under Windows 3.1. Also, it is a console mode program,
meaning that it has to be run in the "Command Prompt" window (sometimes
known as the "Dos Box"). It is not a GUI program.
The following files are included in the test.zip file: fibo,
benchsho, benchlng, benchfn, benchmul, byteops, intops, longops,
max, testlong, matrixmu, strlen, dhamp.
The version of dcc included in this distribution (dccsrcoo.zip and
dcc32.exe) is a bit better than the first release, but it is still
broken in some cases, and we do not have the time to work on this
project at present, so we cannot provide any changes.
Comments on individual files:
- fibo (fibonacci): the small model (fibos.exe) decompiles correctly;
the large model (fibol.exe) expects an extra argument for
`scanf()`. This argument is the segment and is not displayed.
- benchsho: the first `scanf()` takes loc0 as an argument. This is
part of a long variable, but dcc does not have any clue at that
stage that the stack offset pushed on the stack is to be used
as a long variable rather than an integer variable.
- benchlng: as part of the `main()` code, `LO(loc1) | HI(loc1)` should
be displayed instead of `loc3 | loc9`. These two integer variables
are equivalent to the one long loc1 variable.
- benchfn: see benchsho.
- benchmul: see benchsho.
- byteops: decompiles correctly.
- intops: the du analysis for `DIV` and `MOD` is broken. dcc currently
generates code for a long and an integer temporary register that
were used as part of the analysis.
- longops: decompiles correctly.
- max: decompiles correctly.
- testlong: this example decompiles correctly given the algorithms
implemented in dcc. However, it shows that when long variables
are defined and used as integers (or long) without giving dcc
any hint that this is happening, the variable will be treated as
two integer variables. This is because the assembly code works
with integer registers, and long registers are not available on
the 80286, so a long variable is held in two integer registers.
dcc only learns of this through idioms such as adding two long
variables.
- matrixmu: decompiles correctly. Shows that arrays are not supported
in dcc.
- strlen: decompiles correctly. Shows that pointers are partially
supported by dcc.
- dhamp: this program has far more data types than what dcc recognizes
at present.
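The testlong and benchlng notes come down to how a 16-bit CPU handles a 32-bit `long`. The value lives in two 16-bit registers (typically DX:AX). Adding two longs compiles to `ADD` on the low words followed by `ADC` (add with carry) on the high words. That instruction pair is one of the idioms that lets a decompiler fuse two integer variables back into one long. The sketch below simulates the pair in C++. `Long16Pair`, `split`, `join` and `addLong` are illustrative names, not dcc code.

```cpp
#include <cassert>
#include <cstdint>

// A 32-bit long as a 16-bit CPU holds it: two integer registers.
struct Long16Pair {
    uint16_t lo; // e.g. AX
    uint16_t hi; // e.g. DX
};

Long16Pair split(uint32_t v) {
    return {static_cast<uint16_t>(v & 0xFFFF), static_cast<uint16_t>(v >> 16)};
}

// Reassemble the two 16-bit halves into one 32-bit value.
uint32_t join(Long16Pair p) {
    return (static_cast<uint32_t>(p.hi) << 16) | p.lo;
}

// Simulates the idiom  ADD lo, lo2 / ADC hi, hi2
Long16Pair addLong(Long16Pair a, Long16Pair b) {
    uint32_t lowSum = static_cast<uint32_t>(a.lo) + b.lo;     // ADD: may carry out of bit 15
    uint16_t carry = static_cast<uint16_t>(lowSum >> 16);      // the carry flag (0 or 1)
    uint16_t hi = static_cast<uint16_t>(a.hi + b.hi + carry);  // ADC: wraps modulo 2^16
    return {static_cast<uint16_t>(lowSum), hi};
}
```

Without the carry link between the two halves, each half looks like an independent integer operation. That matches what testlong shows when nothing ties the halves together.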
Our thanks to Gary Shaffstall for some debugging work. Current bugs
are:
- [ ] if the code generated for a single line is too long, the (static)
buffer used for that line is clobbered. Solution: make the buffer
larger (currently 200 chars).
- [ ] the large memory model problem & `scanf()`
- [ ] dcc's error message lists a `p` option that doesn't exist,
and doesn't list the `i` option, which does exist.
- [x] there is a nasty problem whereby some arrays can get reallocated
to a new address, and some pointers can become invalid. This mainly
tends to happen to larger executable files. A major rewrite will
probably be required to fix this.
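The last (now fixed) bug is the classic hazard of holding raw pointers into a growable array. Once the array reallocates, every pointer into the old storage dangles. The note at the top of this file says the problem was fixed but does not say how. One common remedy, sketched here under that caveat, is to hand out indices instead of pointers. Another is to append to a `std::deque`, which keeps references to existing elements valid. `Icode` and `ProcTable` are hypothetical stand-ins, not dcc's actual types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a decoded instruction record.
struct Icode {
    int opcode;
};

// Stores instructions in a std::vector and hands out indices, not pointers.
// push_back may reallocate the vector, which would invalidate any Icode*
// taken earlier; an index stays meaningful across reallocations.
class ProcTable {
public:
    std::size_t add(int opcode) {
        code_.push_back(Icode{opcode});
        return code_.size() - 1;
    }
    Icode &at(std::size_t index) { return code_.at(index); }
    std::size_t size() const { return code_.size(); }

private:
    std::vector<Icode> code_;
};
```

The cost is an extra indirection through the owning table on every access. In return, growth of the table can never leave a stale reference behind.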
For more information refer to the thesis "Reverse Compilation
Techniques" by Cristina Cifuentes, Queensland University of
Technology, 1994, and the dcc home page:
http://www.it.uq.edu.au/groups/csm/dcc_readme.html
Please note that the executable version of dcc provided in this
distribution does not necessarily match the source code provided,
because some changes were made without us keeping track of every change.
Using dcc
---------
Here is a very brief summary of switches for dcc:
* `a1`, `a2`: assembler output, before and after re-ordering of input code
* `c`: Attempt to follow control through indirect call instructions
* `i`: Enter interactive disassembler
* `m`: Memory map
* `s`: Statistics summary
* `v`, `V`: verbose (and Very verbose)
* `o` filename: Use filename as assembler output file
If dcc encounters illegal instructions, it will attempt to enter the so-called
interactive disassembler. The idea of this was to allow commands to fix the
problem so that dcc could continue, but no such changes are implemented
as yet. (Note: the Unix versions do not have the interactive disassembler). If
you get into this, you can get out of it by pressing `^X` (control-X). Once dcc
has entered the interactive disassembler, however, there is little chance that
it will recover and produce useful output.
If dcc loads the signature file `dccxxx.sig`, this means that it has not
recognised the compiler library used. You can keep the signatures in a
directory other than the one you are working in if you set the `DCC`
environment variable to point to their path. Note that if dcc can't find its
signature files, it will be severely handicapped.
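The signature lookup described above amounts to "prefix the file name with the `DCC` directory if the variable is set". Below is a minimal sketch of that rule, assuming a plain directory-plus-separator join; dcc's actual path handling may differ. `signaturePath` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Build the path of a signature file such as "dccxxx.sig".
// If the DCC environment variable is set (and non-empty), the file is
// looked up in that directory; otherwise the bare name is returned, so
// the current working directory is searched.
std::string signaturePath(const std::string &sigName) {
    const char *dir = std::getenv("DCC");
    if (dir == nullptr || *dir == '\0')
        return sigName;
    std::string path(dir);
    if (path.back() != '/' && path.back() != '\\')
        path += '/';
    return path + sigName;
}
```

Accepting either `/` or `\` as an existing trailing separator keeps the sketch usable for both Unix-style and DOS/Windows-style values of `DCC`.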


@@ -2,5 +2,6 @@
#cd bld
#make -j5
#cd ..
mkdir -p tests/outputs
./test_use_base.sh
./regression_tester.rb ./dcc_original -s -c 2>stderr >stdout; diff -wB tests/prev/ tests/outputs/

7
common/CMakeLists.txt Normal file

@@ -0,0 +1,7 @@
set(SRC
perfhlib.cpp
perfhlib.h
PatternCollector.h
)
add_library(dcc_hash STATIC ${SRC})

82
common/PatternCollector.h Normal file

@@ -0,0 +1,82 @@
#ifndef PATTERNCOLLECTOR
#define PATTERNCOLLECTOR
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <vector>
#define SYMLEN 16 /* Number of chars in the symbol name, incl null */
#define PATLEN 23 /* Number of bytes in the pattern part */
struct HASHENTRY
{
char name[SYMLEN]; /* The symbol name */
uint8_t pat [PATLEN]; /* The pattern */
uint16_t offset; /* Offset (needed temporarily) */
};
struct PatternCollector {
uint8_t buf[100], bufSave[7]; /* Temp buffer for reading the file */
uint16_t readShort(FILE *f)
{
uint8_t b1, b2;
if (fread(&b1, 1, 1, f) != 1)
{
printf("Could not read\n");
exit(11);
}
if (fread(&b2, 1, 1, f) != 1)
{
printf("Could not read\n");
exit(11);
}
return (b2 << 8) + b1;
}
void grab(FILE *f,int n)
{
if (fread(buf, 1, n, f) != (size_t)n)
{
printf("Could not read\n");
exit(11);
}
}
uint8_t readByte(FILE *f)
{
uint8_t b;
if (fread(&b, 1, 1, f) != 1)
{
printf("Could not read\n");
exit(11);
}
return b;
}
uint16_t readWord(FILE *fl)
{
uint8_t b1, b2;
b1 = readByte(fl);
b2 = readByte(fl);
return b1 + (b2 << 8);
}
/* Called by PerfectHash::map() and DFS(). Return the pattern of key i (0-based) */
uint8_t *getKey(int i)
{
return keys[i].pat;
}
/* Display key i */
void dispKey(int i)
{
printf("%s", keys[i].name);
}
std::vector<HASHENTRY> keys; /* array of keys */
virtual ~PatternCollector() = default; /* subclasses may be deleted through a base pointer */
virtual int readSyms(FILE *f)=0;
};
#endif // PATTERNCOLLECTOR
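`readShort()` and `readWord()` above both assemble a 16-bit little-endian value, low byte first, which is the x86 byte order of the library files these tools read. When the bytes are already in `buf` (after `grab()`), the same decoding can be done in memory without further `fread` calls. Below is a sketch with hypothetical helpers `le16` and `le32`; they are not part of PatternCollector.

```cpp
#include <cassert>
#include <cstdint>

// Decode a little-endian 16-bit value from p[0] (low byte) and p[1] (high
// byte); the same result readWord() produces, but reading from memory.
uint16_t le16(const uint8_t *p) {
    return static_cast<uint16_t>(p[0] | (p[1] << 8));
}

// 32-bit variant built from two 16-bit halves, low half first.
uint32_t le32(const uint8_t *p) {
    return static_cast<uint32_t>(le16(p)) |
           (static_cast<uint32_t>(le16(p + 2)) << 16);
}
```

For the bytes `34 12 78 56`, `le16` yields `0x1234` and `le32` yields `0x56781234`.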

440
common/perfhlib.cpp Normal file

@@ -0,0 +1,440 @@
/*
*$Log: perfhlib.c,v $
* Revision 1.5 93/09/29 14:45:02 emmerik
* Oops, didn't do the casts last check in
*
* Revision 1.4 93/09/29 14:41:45 emmerik
* Added casts to mod instructions to keep the SVR4 compiler happy
*
*
* Perfect hashing function library. Contains functions to generate perfect
* hashing functions
*/
#include "perfhlib.h"
#include "PatternCollector.h"
#include <stdio.h>
#include <cassert>
#include <stdlib.h>
#include <string.h>
/* Private data structures */
//static int NumEntry; /* Number of entries in the hash table (# keys) */
//static int EntryLen; /* Size (bytes) of each entry (size of keys) */
//static int SetSize; /* Size of the char set */
//static char SetMin; /* First char in the set */
//static int NumVert; /* c times NumEntry */
//static uint16_t *T1base, *T2base; /* Pointers to start of T1, T2 */
static uint16_t *T1, *T2; /* Pointers to T1[i], T2[i] */
static int *graphNode; /* The array of edges */
static int *graphNext; /* Linked list of edges */
static int *graphFirst;/* First edge at a vertex */
static int numEdges; /* An edge counter */
static bool *visited; /* Array of bools: whether visited */
static bool *deleted; /* Array of bools: whether deleted */
void PerfectHash::setHashParams(int _NumEntry, int _EntryLen, int _SetSize, char _SetMin,
int _NumVert)
{
/* These parameters are stored in member variables so as to obviate the need
for passing all of them (or dereferencing pointers) for every call to hash()
*/
NumEntry = _NumEntry;
EntryLen = _EntryLen;
SetSize = _SetSize;
SetMin = _SetMin;
NumVert = _NumVert;
/* Allocate the variable sized tables etc */
if ((T1base = (uint16_t *)malloc(EntryLen * SetSize * sizeof(uint16_t))) == 0)
{
goto BadAlloc;
}
if ((T2base = (uint16_t *)malloc(EntryLen * SetSize * sizeof(uint16_t))) == 0)
{
goto BadAlloc;
}
if ((graphNode = (int *)malloc((NumEntry*2 + 1) * sizeof(int))) == 0)
{
goto BadAlloc;
}
if ((graphNext = (int *)malloc((NumEntry*2 + 1) * sizeof(int))) == 0)
{
goto BadAlloc;
}
if ((graphFirst = (int *)malloc((NumVert + 1) * sizeof(int))) == 0)
{
goto BadAlloc;
}
if ((g = (short *)malloc((NumVert+1) * sizeof(short))) == 0)
{
goto BadAlloc;
}
if ((visited = (bool *)malloc((NumVert+1) * sizeof(bool))) == 0)
{
goto BadAlloc;
}
if ((deleted = (bool *)malloc((NumEntry+1) * sizeof(bool))) == 0)
{
goto BadAlloc;
}
return;
BadAlloc:
printf("Could not allocate memory\n");
hashCleanup();
exit(1);
}
void PerfectHash::hashCleanup(void)
{
/* Free the storage for variable sized tables etc. free(NULL) is a no-op,
and resetting each pointer makes a repeated call harmless */
free(T1base); T1base = nullptr;
free(T2base); T2base = nullptr;
free(graphNode); graphNode = nullptr;
free(graphNext); graphNext = nullptr;
free(graphFirst); graphFirst = nullptr;
free(g); g = nullptr;
free(visited); visited = nullptr;
free(deleted); deleted = nullptr;
}
void PerfectHash::map(PatternCollector *collector)
{
m_collector = collector;
assert(nullptr!=collector);
int i, j, c;
uint16_t f1, f2;
bool cycle;
uint8_t *keys;
c = 0;
do
{
initGraph();
cycle = false;
/* Randomly generate T1 and T2 */
for (i=0; i < SetSize*EntryLen; i++)
{
T1base[i] = rand() % NumVert;
T2base[i] = rand() % NumVert;
}
for (i=0; i < NumEntry; i++)
{
f1 = 0; f2 = 0;
keys = m_collector->getKey(i);
for (j=0; j < EntryLen; j++)
{
T1 = T1base + j * SetSize;
T2 = T2base + j * SetSize;
f1 += T1[keys[j] - SetMin];
f2 += T2[keys[j] - SetMin];
}
f1 %= (uint16_t)NumVert;
f2 %= (uint16_t)NumVert;
if (f1 == f2)
{
/* A self loop. Reject! */
printf("Self loop on vertex %d!\n", f1);
cycle = true;
break;
}
addToGraph(numEdges++, f1, f2);
}
if (cycle || (cycle = isCycle())) /* OK - is there a cycle? */
{
printf("Iteration %d\n", ++c);
}
else
{
break;
}
}
while (/* there is a cycle */ 1);
}
/* Initialise the graph */
void PerfectHash::initGraph()
{
int i;
for (i=1; i <= NumVert; i++)
{
graphFirst[i] = 0;
}
for (i= -NumEntry; i <= NumEntry; i++)
{
/* No need to init graphNode[] as they will all be filled by successive
calls to addToGraph() */
graphNext[NumEntry+i] = 0;
}
numEdges = 0;
}
/* Add an edge e between vertices v1 and v2 */
/* e, v1, v2 are 0 based */
void PerfectHash::addToGraph(int e, int v1, int v2)
{
e++; v1++; v2++; /* So much more convenient */
graphNode[NumEntry+e] = v2; /* Insert the edge information */
graphNode[NumEntry-e] = v1;
graphNext[NumEntry+e] = graphFirst[v1]; /* Insert v1 to list of alphas */
graphFirst[v1]= e;
graphNext[NumEntry-e] = graphFirst[v2]; /* Insert v2 to list of omegas */
graphFirst[v2]= -e;
}
bool PerfectHash::DFS(int parentE, int v)
{
int e, w;
/* Depth first search of the graph, starting at vertex v, looking for
cycles. parent and v are origin 1. Note parent is an EDGE,
not a vertex */
visited[v] = true;
/* For each e incident with v .. */
for (e = graphFirst[v]; e; e = graphNext[NumEntry+e])
{
uint8_t *key1;
if (deleted[abs(e)])
{
/* A deleted key. Just ignore it */
continue;
}
key1 = m_collector->getKey(abs(e)-1);
w = graphNode[NumEntry+e];
if (visited[w])
{
/* Did we just come through this edge? If so, ignore it. */
if (abs(e) != abs(parentE))
{
/* There is a cycle in the graph. There is some subtle code here
to work around the distinct possibility that there may be
duplicate keys. Duplicate keys will always cause unit
cycles, since f1 and f2 (used to select v and w) will be the
same for both. The edges (representing an index into the
array of keys) are distinct, but the key values are not.
The logic is as follows: for the candidate edge e, check to
see if it terminates in the parent vertex. If so, we test
the keys associated with e and the parent, and if they are
the same, we can safely ignore e for the purposes of cycle
detection, since edge e adds nothing to the cycle. Cycles
involving v, w, and e0 will still be found. The parent
edge was not similarly eliminated because at the time when
it was a candidate, v was not yet visited.
We still have to remove the key from further consideration,
since each edge is visited twice, but with a different
parent edge each time.
*/
/* We save some stack space by calculating the parent vertex
for these relatively few cases where it is needed */
int parentV = graphNode[NumEntry-parentE];
if (w == parentV)
{
uint8_t *key2;
key2=m_collector->getKey(abs(parentE)-1);
if (memcmp(key1, key2, EntryLen) == 0)
{
printf("Duplicate keys with edges %d and %d (",
e, parentE);
m_collector->dispKey(abs(e)-1);
printf(" & ");
m_collector->dispKey(abs(parentE)-1);
printf(")\n");
deleted[abs(e)] = true; /* Wipe the key */
}
else
{
/* A genuine (unit) cycle. */
printf("There is a unit cycle involving vertex %d and edge %d\n", v, e);
return true;
}
}
else
{
/* We have reached a previously visited vertex not the
parent. Therefore, we have uncovered a genuine cycle */
printf("There is a cycle involving vertex %d and edge %d\n", v, e);
return true;
}
}
}
else /* Not yet seen. Traverse it */
{
if (DFS(e, w))
{
/* Cycle found deeper down. Exit */
return true;
}
}
}
return false;
}
bool PerfectHash::isCycle(void)
{
int v, e;
for (v=1; v <= NumVert; v++)
{
visited[v] = false;
}
for (e=1; e <= NumEntry; e++)
{
deleted[e] = false;
}
for (v=1; v <= NumVert; v++)
{
if (!visited[v])
{
if (DFS(-32767, v))
{
return true;
}
}
}
return false;
}
void PerfectHash::traverse(int u)
{
int w, e;
visited[u] = true;
/* Find w, the neighbours of u, by searching the edges e associated with u */
e = graphFirst[1+u];
while (e)
{
w = graphNode[NumEntry+e]-1;
if (!visited[w])
{
g[w] = (abs(e)-1 - g[u]) % NumEntry;
if (g[w] < 0) g[w] += NumEntry; /* Keep these positive */
traverse(w);
}
e = graphNext[NumEntry+e];
}
}
void PerfectHash::assign(void)
{
int v;
for (v=0; v < NumVert; v++)
{
g[v] = 0; /* g is sparse; leave the gaps 0 */
visited[v] = false;
}
for (v=0; v < NumVert; v++)
{
if (!visited[v])
{
g[v] = 0;
traverse(v);
}
}
}
int PerfectHash::hash(uint8_t *string)
{
uint16_t u, v;
int j;
u = 0;
for (j=0; j < EntryLen; j++)
{
T1 = T1base + j * SetSize;
u += T1[string[j] - SetMin];
}
u %= NumVert;
v = 0;
for (j=0; j < EntryLen; j++)
{
T2 = T2base + j * SetSize;
v += T2[string[j] - SetMin];
}
v %= NumVert;
return (g[u] + g[v]) % NumEntry;
}
#if 0
void dispRecord(int i);
void
duplicateKeys(int v1, int v2)
{
int i, j;
uint8_t *keys;
int u, v;
v1--; v2--; /* These guys are origin 1 */
printf("Duplicate keys:\n");
for (i=0; i < NumEntry; i++)
{
getKey(i, &keys);
u = 0;
for (j=0; j < EntryLen; j++)
{
T1 = T1base + j * SetSize;
u += T1[keys[j] - SetMin];
}
u %= NumVert;
if ((u != v1) && (u != v2)) continue;
v = 0;
for (j=0; j < EntryLen; j++)
{
T2 = T2base + j * SetSize;
v += T2[keys[j] - SetMin];
}
v %= NumVert;
if ((v == v2) || (v == v1))
{
printf("Entry #%d key: ", i+1);
for (j=0; j < EntryLen; j++) printf("%02X ", keys[j]);
printf("\n");
dispRecord(i+1);
}
}
exit(1);
}
#endif

common/perfhlib.h Normal file

@@ -0,0 +1,37 @@
#include <stdint.h>
/** Perfect hashing function library. Contains functions to generate perfect
hashing functions */
struct PatternCollector;
struct PerfectHash {
uint16_t *T1base;
uint16_t *T2base; /* Pointers to start of T1, T2 */
short *g; /* g[] */
int NumEntry; /* Number of entries in the hash table (# keys) */
int EntryLen; /* Size (bytes) of each entry (size of keys) */
int SetSize; /* Size of the char set */
char SetMin; /* First char in the set */
int NumVert; /* c times NumEntry */
/** Set the parameters for the hash table */
void setHashParams(int _numEntry, int _entryLen, int _setSize, char _setMin, int _numVert);
public:
void map(PatternCollector * collector); /* Part 1 of creating the tables */
void hashCleanup(); /* Frees memory allocated by setHashParams() */
void assign(); /* Part 2 of creating the tables */
int hash(uint8_t *string); /* Hash the string to an int 0 .. NUMENTRY-1 */
const uint16_t *readT1(void) const { return T1base; }
const uint16_t *readT2(void) const { return T2base; }
const uint16_t *readG(void) const { return (uint16_t *)g; }
uint16_t *readT1(void){ return T1base; }
uint16_t *readT2(void){ return T2base; }
uint16_t *readG(void) { return (uint16_t *)g; }
private:
void initGraph();
void addToGraph(int e, int v1, int v2);
bool isCycle();
bool DFS(int parentE, int v);
void traverse(int u);
PatternCollector *m_collector; /* used to retrieve the keys */
};

@@ -1,3 +1,4 @@
#!/bin/bash
mkdir -p tests/outputs
./test_use_all.sh
./regression_tester.rb ./dcc_original -s -c 2>stderr >stdout; diff -wB tests/prev/ tests/outputs/

@@ -1,5 +1,6 @@
#pragma once
#include <stdint.h>
#include <vector>
struct PROG /* Loaded program image parameters */
{
int16_t initCS;
@@ -8,15 +9,17 @@ struct PROG /* Loaded program image parameters */
uint16_t initSP;
bool fCOM; /* Flag set if COM program (else EXE)*/
int cReloc; /* No. of relocation table entries */
uint32_t * relocTable; /* Ptr. to relocation table */
std::vector<uint32_t> relocTable; /* Relocation table (image offsets) */
uint8_t * map; /* Memory bitmap ptr */
int cProcs; /* Number of procedures so far */
int offMain; /* The offset of the main() proc */
uint16_t segMain; /* The segment of the main() proc */
bool bSigs; /* True if signatures loaded */
int cbImage; /* Length of image in bytes */
const uint8_t *image() const {return Imagez;}
uint8_t * Imagez; /* Allocated by loader to hold entire program image */
int addressingMode;
public:
const uint8_t *image() const {return Imagez;}
void displayLoadInfo();
};

include/CallGraph.h Normal file

@@ -0,0 +1,19 @@
#pragma once
#include "Procedure.h"
/* CALL GRAPH NODE */
struct CALL_GRAPH
{
ilFunction proc; /* Pointer to procedure in pProcList */
std::vector<CALL_GRAPH *> outEdges; /* array of out edges */
public:
void write();
CALL_GRAPH()
{
}
public:
void writeNodeCallGraph(int indIdx);
bool insertCallGraph(ilFunction caller, ilFunction callee);
bool insertCallGraph(Function *caller, ilFunction callee);
void insertArc(ilFunction newProc);
};
//extern CALL_GRAPH * callGraph; /* Pointer to the head of the call graph */

include/DccFrontend.h Normal file

@@ -0,0 +1,17 @@
#pragma once
#include <QObject>
class Project;
class DccFrontend : public QObject
{
Q_OBJECT
void LoadImage();
void parse(Project &proj);
std::string m_fname;
public:
explicit DccFrontend(QObject *parent = 0);
bool FrontEnd(); /* frontend.c */
signals:
public slots:
};

@@ -9,6 +9,7 @@
#include <utility>
#include <algorithm>
#include <bitset>
#include <QtCore/QString>
#include "Enums.h"
#include "types.h"
@@ -26,7 +27,7 @@ extern bundle cCode; /* Output C procedure's declaration and code */
/**** Global variables ****/
extern char *asm1_name, *asm2_name; /* Assembler output filenames */
extern QString asm1_name, asm2_name; /* Assembler output filenames */
typedef struct { /* Command line option flags */
unsigned verbose : 1;
@@ -37,7 +38,7 @@ typedef struct { /* Command line option flags */
unsigned Stats : 1;
unsigned Interact : 1; /* Interactive mode */
unsigned Calls : 1; /* Follow register indirect calls */
char filename[80]; /* The input filename */
QString filename; /* The input filename */
} OPTION;
extern OPTION option; /* Command line options */
@@ -71,22 +72,11 @@ extern STATS stats; /* Icode statistics */
/**** Global function prototypes ****/
class DccFrontend
{
void LoadImage(Project &proj);
void parse(Project &proj);
std::string m_fname;
public:
DccFrontend(const std::string &fname) : m_fname(fname)
{
}
bool FrontEnd(); /* frontend.c */
};
void udm(void); /* udm.c */
void freeCFG(BB * cfg); /* graph.c */
BB * newBB(BB *, int, int, uint8_t, int, Function *); /* graph.c */
void BackEnd(char *filename, CALL_GRAPH *); /* backend.c */
void BackEnd(CALL_GRAPH *); /* backend.c */
extern char *cChar(uint8_t c); /* backend.c */
eErrorId scan(uint32_t ip, ICODE &p); /* scanner.c */
void parse (CALL_GRAPH * *); /* parser.c */

include/dcc_interface.h Normal file

@@ -0,0 +1,25 @@
#pragma once
#include "Procedure.h"
#include <QtCore/QObject>
#include <QtCore/QDir>
#include <llvm/ADT/ilist.h>
class IXmlTarget;
struct IDcc {
static IDcc *get();
virtual void BaseInit()=0;
virtual void Init(QObject *tgt)=0;
virtual lFunction::iterator GetFirstFuncHandle()=0;
virtual lFunction::iterator GetCurFuncHandle()=0;
virtual void analysis_Once()=0;
virtual void load(QString name)=0; // load and preprocess -> find entry point
virtual void prtout_asm(IXmlTarget *,int level=0)=0;
virtual void prtout_cpp(IXmlTarget *,int level=0)=0;
virtual size_t getFuncCount()=0;
virtual const lFunction &validFunctions() const =0;
virtual void SetCurFunc_by_Name(QString )=0;
virtual QDir installDir()=0;
virtual QDir dataDir(QString kind)=0;
};

@@ -1,38 +0,0 @@
#pragma once
/* Perfect hashing function library. Contains functions to generate perfect
hashing functions
* (C) Mike van Emmerik
*/
#include <stdint.h>
/* Prototypes */
void hashCleanup(void); /* Frees memory allocated by hashParams() */
void map(void); /* Part 1 of creating the tables */
/* The application must provide these functions: */
void getKey(int i, uint8_t **pKeys);/* Set *keys to point to the i+1th key */
void dispKey(int i); /* Display the key */
class PatternHasher
{
uint16_t *T1base, *T2base; /* Pointers to start of T1, T2 */
int NumEntry; /* Number of entries in the hash table (# keys) */
int EntryLen; /* Size (bytes) of each entry (size of keys) */
int SetSize; /* Size of the char set */
char SetMin; /* First char in the set */
int NumVert; /* c times NumEntry */
int *graphNode; /* The array of edges */
int *graphNext; /* Linked list of edges */
int *graphFirst;/* First edge at a vertex */
public:
uint16_t *readT1(void); /* Returns a pointer to the T1 table */
uint16_t *readT2(void); /* Returns a pointer to the T2 table */
uint16_t *readG(void); /* Returns a pointer to the g table */
void init(int _NumEntry, int _EntryLen, int _SetSize, char _SetMin,int _NumVert); /* Set the parameters for the hash table */
void cleanup();
int hash(unsigned char *string); //!< Hash the string to an int 0 .. NUMENTRY-1
};
extern PatternHasher g_pattern_hasher;
/* Macro reads a LH uint16_t from the image regardless of host convention */
#ifndef LH
#define LH(p) ((int)((uint8_t *)(p))[0] + ((int)((uint8_t *)(p))[1] << 8))
#endif

@@ -8,22 +8,25 @@
#include <boost/icl/interval_map.hpp>
#include <boost/icl/split_interval_map.hpp>
#include <unordered_set>
#include <QtCore/QString>
#include "symtab.h"
#include "BinaryImage.h"
#include "Procedure.h"
class QString;
class SourceMachine;
struct CALL_GRAPH;
class IProject
{
virtual PROG *binary()=0;
virtual const std::string & project_name() const =0;
virtual const std::string & binary_path() const =0;
virtual const QString & project_name() const =0;
virtual const QString & binary_path() const =0;
};
class Project : public IProject
{
static Project *s_instance;
std::string m_fname;
std::string m_project_name;
QString m_fname;
QString m_project_name;
QString m_output_path;
public:
typedef llvm::iplist<Function> FunctionListType;
@@ -41,9 +44,12 @@ typedef FunctionListType lFunction;
Project(); // default constructor,
public:
void create(const std::string & a);
const std::string &project_name() const {return m_project_name;}
const std::string &binary_path() const {return m_fname;}
void create(const QString &a);
bool load();
const QString &output_path() const {return m_output_path;}
const QString &project_name() const {return m_project_name;}
const QString &binary_path() const {return m_fname;}
QString output_name(const char *ext);
ilFunction funcIter(Function *to_find);
ilFunction findByEntry(uint32_t entry);
ilFunction createFunction(FunctionType *f,const std::string &name);
@@ -60,6 +66,7 @@ public:
PROG * binary() {return &prog;}
SourceMachine *machine();
const FunctionListType &functions() const { return pProcList; }
protected:
void initialize();
void writeGlobSymTable();

@@ -36,21 +36,13 @@ struct SYM : public SymbolCommon
struct STKSYM : public SymbolCommon
{
typedef int16_t tLabel;
Expr *actual; /* Expression tree of actual parameter */
AstIdent *regs; /* For register arguments only */
tLabel label; /* Immediate off from BP (+:args, -:params) */
uint8_t regOff; /* Offset is a register (e.g. SI, DI) */
bool hasMacro; /* This type needs a macro */
Expr *actual=0; /* Expression tree of actual parameter */
AstIdent *regs=0; /* For register arguments only */
tLabel label=0; /* Immediate off from BP (+:args, -:params) */
uint8_t regOff=0; /* Offset is a register (e.g. SI, DI) */
bool hasMacro=false; /* This type needs a macro */
std::string macro; /* Macro name */
bool invalid; /* Boolean: invalid entry in formal arg list*/
STKSYM()
{
actual=0;
regs=0;
label=0;
regOff=0;
invalid=hasMacro = false;
}
bool invalid=false; /* Boolean: invalid entry in formal arg list*/
void setArgName(int i)
{
char buf[32];

BIN
prototypes/dcclibs.dat Normal file

Binary file not shown.

@@ -14,9 +14,9 @@ def perform_test(exepath,filepath,outname,args)
filepath=path_local(filepath)
joined_args = args.join(' ')
printf("calling:" + "#{exepath} -a1 #{joined_args} -o#{output_path}.a1 #{filepath}\n")
STDERR << "Errors for : #{filepath}"
result = `#{exepath} -a1 -o#{output_path}.a1 #{filepath}`
result = `#{exepath} -a2 #{joined_args} -o#{output_path}.a2 #{filepath}`
STDERR << "Errors for : #{filepath}\n"
result = `#{exepath} -a 1 -o#{output_path}.a1 #{filepath}`
result = `#{exepath} -a 2 #{joined_args} -o#{output_path}.a2 #{filepath}`
result = `#{exepath} #{joined_args} -o#{output_path} #{filepath}`
puts result
p $?

BIN
sigs/dccb2s.sig Normal file

Binary file not shown.

@@ -28,11 +28,11 @@ BB *BB::Create(const rCODE &r,eBBKind _nodeType, Function *parent)
pnewBB->loopHead = pnewBB->caseHead = pnewBB->caseTail =
pnewBB->latchNode= pnewBB->loopFollow = NO_NODE;
pnewBB->instructions = r;
int addr = pnewBB->begin()->loc_ip;
/* Mark the basic block to which the icodes belong to, but only for
* real code basic blocks (ie. not interval bbs) */
if(parent)
{
int addr = pnewBB->begin()->loc_ip;
//setInBB should automatically handle if our range is empty
parent->Icode.SetInBB(pnewBB->instructions, pnewBB);
@@ -40,10 +40,10 @@ BB *BB::Create(const rCODE &r,eBBKind _nodeType, Function *parent)
parent->m_ip_to_bb[addr] = pnewBB;
parent->m_actual_cfg.push_back(pnewBB);
pnewBB->Parent = parent;
}
if ( r.begin() != parent->Icode.end() ) /* Only for code BB's */
stats.numBBbef++;
}
return pnewBB;
}
@@ -90,7 +90,7 @@ void BB::displayDfs()
dfsFirstNum, dfsLastNum,
immedDom == MAX ? -1 : immedDom);
printf("loopType = %s, loopHead = %d, latchNode = %d, follow = %d\n",
s_loopType[loopType],
s_loopType[(int)loopType],
loopHead == MAX ? -1 : loopHead,
latchNode == MAX ? -1 : latchNode,
loopFollow == MAX ? -1 : loopFollow);
@@ -136,12 +136,14 @@ void BB::displayDfs()
*/
ICODE* BB::writeLoopHeader(int &indLevel, Function* pProc, int *numLoc, BB *&latch, bool &repCond)
{
if(loopType == eNodeHeaderType::NO_TYPE)
return nullptr;
latch = pProc->m_dfsLast[this->latchNode];
std::ostringstream ostr;
ICODE* picode;
switch (loopType)
{
case WHILE_TYPE:
case eNodeHeaderType::WHILE_TYPE:
picode = &this->back();
/* Check for error in while condition */
@@ -169,15 +171,16 @@ ICODE* BB::writeLoopHeader(int &indLevel, Function* pProc, int *numLoc, BB *&lat
picode->invalidate();
break;
case REPEAT_TYPE:
case eNodeHeaderType::REPEAT_TYPE:
ostr << "\n"<<indentStr(indLevel)<<"do {\n";
picode = &latch->back();
picode->invalidate();
break;
case ENDLESS_TYPE:
case eNodeHeaderType::ENDLESS_TYPE:
ostr << "\n"<<indentStr(indLevel)<<"for (;;) {\n";
picode = &latch->back();
break;
}
cCode.appendCode(ostr.str());
stats.numHLIcode += 1;
@@ -209,10 +212,7 @@ void BB::writeCode (int indLevel, Function * pProc , int *numLoc,int _latchNode,
/* Check for start of loop */
repCond = false;
latch = nullptr;
if (loopType)
{
picode=writeLoopHeader(indLevel, pProc, numLoc, latch, repCond);
}
/* Write the code for this basic block */
if (repCond == false)
@@ -227,12 +227,12 @@ void BB::writeCode (int indLevel, Function * pProc , int *numLoc,int _latchNode,
return;
/* Check type of loop/node and process code */
if ( loopType ) /* there is a loop */
if ( loopType!=eNodeHeaderType::NO_TYPE ) /* there is a loop */
{
assert(latch);
if (this != latch) /* loop is over several bbs */
{
if (loopType == WHILE_TYPE)
if (loopType == eNodeHeaderType::WHILE_TYPE)
{
succ = edges[THEN].BBptr;
if (succ->dfsLastNum == loopFollow)
@@ -248,7 +248,7 @@ void BB::writeCode (int indLevel, Function * pProc , int *numLoc,int _latchNode,
/* Loop epilogue: generate the loop trailer */
indLevel--;
if (loopType == WHILE_TYPE)
if (loopType == eNodeHeaderType::WHILE_TYPE)
{
std::ostringstream ostr;
/* Check if there is need to repeat other statements involved
@@ -260,9 +260,9 @@ void BB::writeCode (int indLevel, Function * pProc , int *numLoc,int _latchNode,
ostr <<indentStr(indLevel)<< "} /* end of while */\n";
cCode.appendCode(ostr.str());
}
else if (loopType == ENDLESS_TYPE)
else if (loopType == eNodeHeaderType::ENDLESS_TYPE)
cCode.appendCode( "%s} /* end of loop */\n",indentStr(indLevel));
else if (loopType == REPEAT_TYPE)
else if (loopType == eNodeHeaderType::REPEAT_TYPE)
{
string e = "//*failed*//";
if (picode->hl()->opcode != HLI_JCOND)

src/DccFrontend.cpp Normal file

@@ -0,0 +1,413 @@
#include "dcc.h"
#include "DccFrontend.h"
#include "project.h"
#include "disassem.h"
#include "CallGraph.h"
#include <QtCore/QFileInfo>
#include <QtCore/QDebug>
#include <cstdio>
class Loader
{
bool loadIntoProject(IProject *);
};
struct PSP { /* PSP structure */
uint16_t int20h; /* interrupt 20h */
uint16_t eof; /* segment, end of allocation block */
uint8_t res1; /* reserved */
uint8_t dosDisp[5]; /* far call to DOS function dispatcher */
uint8_t int22h[4]; /* vector for terminate routine */
uint8_t int23h[4]; /* vector for ctrl+break routine */
uint8_t int24h[4]; /* vector for error routine */
uint8_t res2[22]; /* reserved */
uint16_t segEnv; /* segment address of environment block */
uint8_t res3[34]; /* reserved */
uint8_t int21h[6]; /* opcode for int21h and far return */
uint8_t res4[6]; /* reserved */
uint8_t fcb1[16]; /* default file control block 1 */
uint8_t fcb2[16]; /* default file control block 2 */
uint8_t res5[4]; /* reserved */
uint8_t cmdTail[0x80]; /* command tail and disk transfer area */
};
static struct MZHeader { /* EXE file header */
uint8_t sigLo; /* .EXE signature: 0x4D 0x5A */
uint8_t sigHi;
uint16_t lastPageSize; /* Size of the last page */
uint16_t numPages; /* Number of pages in the file */
uint16_t numReloc; /* Number of relocation items */
uint16_t numParaHeader; /* # of paragraphs in the header */
uint16_t minAlloc; /* Minimum number of paragraphs */
uint16_t maxAlloc; /* Maximum number of paragraphs */
uint16_t initSS; /* Segment displacement of stack */
uint16_t initSP; /* Contents of SP at entry */
uint16_t checkSum; /* Complemented checksum */
uint16_t initIP; /* Contents of IP at entry */
uint16_t initCS; /* Segment displacement of code */
uint16_t relocTabOffset; /* Relocation table offset */
uint16_t overlayNum; /* Overlay number */
} header;
#define EXE_RELOCATION 0x10 /* EXE images relocated to above PSP */
//static void LoadImage(char *filename);
static void displayMemMap(void);
/****************************************************************************
* displayLoadInfo - Displays low level loader type info.
***************************************************************************/
void PROG::displayLoadInfo(void)
{
int i;
printf("File type is %s\n", (fCOM)?"COM":"EXE");
if (! fCOM) {
printf("Signature = %02X%02X\n", header.sigLo, header.sigHi);
printf("File size %% 512 = %04X\n", LH(&header.lastPageSize));
printf("File size / 512 = %04X pages\n", LH(&header.numPages));
printf("# relocation items = %04X\n", LH(&header.numReloc));
printf("Offset to load image = %04X paras\n", LH(&header.numParaHeader));
printf("Minimum allocation = %04X paras\n", LH(&header.minAlloc));
printf("Maximum allocation = %04X paras\n", LH(&header.maxAlloc));
}
printf("Load image size = %04" PRIiPTR "\n", cbImage - sizeof(PSP));
printf("Initial SS:SP = %04X:%04X\n", initSS, initSP);
printf("Initial CS:IP = %04X:%04X\n", initCS, initIP);
if (option.VeryVerbose && cReloc)
{
printf("\nRelocation Table\n");
for (i = 0; i < cReloc; i++)
{
printf("%06X -> [%04X]\n", relocTable[i],LH(image() + relocTable[i]));
}
}
printf("\n");
}
/*****************************************************************************
* fill - Fills line for displayMemMap()
****************************************************************************/
static void fill(int ip, char *bf)
{
PROG &prog(Project::get()->prog);
static uint8_t type[4] = {'.', 'd', 'c', 'x'};
uint8_t i;
for (i = 0; i < 16; i++, ip++)
{
*bf++ = ' ';
*bf++ = (ip < prog.cbImage)? type[(prog.map[ip >> 2] >> ((ip & 3) * 2)) & 3]: ' ';
}
*bf = '\0';
}
/*****************************************************************************
* displayMemMap - Displays the memory bitmap
****************************************************************************/
static void displayMemMap(void)
{
PROG &prog(Project::get()->prog);
char c, b1[33], b2[33], b3[33];
uint8_t i;
int ip = 0;
printf("\nMemory Map\n");
while (ip < prog.cbImage)
{
fill(ip, b1);
printf("%06X %s\n", ip, b1);
ip += 16;
for (i = 3, c = b1[1]; i < 32 && c == b1[i]; i += 2)
; /* Check if all same */
if (i > 32)
{
fill(ip, b2); /* Skip until next two are not same */
fill(ip+16, b3);
if (! (strcmp(b1, b2) || strcmp(b1, b3)))
{
printf(" :\n");
do
{
ip += 16;
fill(ip+16, b1);
} while (! strcmp(b1, b2));
}
}
}
printf("\n");
}
DccFrontend::DccFrontend(QObject *parent) :
QObject(parent)
{
}
/*****************************************************************************
* FrontEnd - invokes the loader, parser, disassembler (if asm1), icode
* rewriter, and displays any useful information.
****************************************************************************/
bool DccFrontend::FrontEnd ()
{
/* Do depth first flow analysis building call graph and procedure list,
* and attaching the I-code to each procedure */
parse (*Project::get());
if (option.asm1)
{
qWarning() << "dcc: writing assembler file "<<asm1_name<<'\n';
}
/* Search through code looking for impure references and flag them */
Disassembler ds(1);
for(Function &f : Project::get()->pProcList)
{
f.markImpure();
if (option.asm1)
{
ds.disassem(&f);
}
}
if (option.Interact)
{
interactDis(&Project::get()->pProcList.front(), 0); /* Interactive disassembler */
}
/* Converts jump target addresses to icode offsets */
for(Function &f : Project::get()->pProcList)
{
f.bindIcodeOff();
}
/* Print memory bitmap */
if (option.Map)
displayMemMap();
return(true); // we no longer own proj !
}
struct DosLoader {
protected:
void prepareImage(PROG &prog,size_t sz,QFile &fp) {
/* Allocate a block of memory for the program. */
prog.cbImage = sz + sizeof(PSP);
prog.Imagez = new uint8_t [prog.cbImage];
prog.Imagez[0] = 0xCD; /* Fill in PSP int 20h location */
prog.Imagez[1] = 0x20; /* for termination checking */
/* Read in the image past where a PSP would go */
if (sz != fp.read((char *)prog.Imagez + sizeof(PSP),sz))
fatalError(CANNOT_READ, fp.fileName().toLocal8Bit().data());
}
};
struct ComLoader : public DosLoader {
bool canLoad(QFile &fp) {
fp.seek(0);
char sig[2];
if(2==fp.read(sig,2)) {
return not (sig[0] == 0x4D && sig[1] == 0x5A);
}
return false;
}
bool load(PROG &prog,QFile &fp) {
fp.seek(0);
/* COM file
* In this case the load module size is just the file length
*/
auto cb = fp.size();
/* COM programs start off with an ORG 100H (to leave room for a PSP)
* This is also the implied start address so if we load the image
* at offset 100H addresses should all line up properly again.
*/
prog.initCS = 0;
prog.initIP = 0x100;
prog.initSS = 0;
prog.initSP = 0xFFFE;
prog.cReloc = 0;
prepareImage(prog,cb,fp);
/* Set up memory map */
cb = (prog.cbImage + 3) / 4;
prog.map = (uint8_t *)malloc(cb);
memset(prog.map, BM_UNKNOWN, (size_t)cb);
return true;
}
};
struct ExeLoader : public DosLoader {
bool canLoad(QFile &fp) {
if(fp.size()<sizeof(header))
return false;
MZHeader tmp_header;
fp.seek(0);
fp.read((char *)&tmp_header, sizeof(header));
if(not (tmp_header.sigLo == 0x4D && tmp_header.sigHi == 0x5A))
return false;
/* This is a typical DOS kludge! */
if (LH(&tmp_header.relocTabOffset) == 0x40) /* check the header just read, not the global one */
{
qDebug() << "Don't understand new EXE format";
return false;
}
return true;
}
bool load(PROG &prog,QFile &fp) {
/* Read rest of header */
fp.seek(0);
if (fp.read((char *)&header, sizeof(header)) != sizeof(header))
return false;
/* Calculate the load module size.
* This is the number of pages in the file
* less the length of the header and reloc table
* less the number of bytes unused on last page
*/
uint32_t cb = (uint32_t)LH(&header.numPages) * 512 - (uint32_t)LH(&header.numParaHeader) * 16;
if (header.lastPageSize)
{
cb -= 512 - LH(&header.lastPageSize);
}
/* We quietly ignore minAlloc and maxAlloc since for our
* purposes it doesn't really matter where in real memory
* the program would end up. EXE programs can't really rely on
* their load location so setting the PSP segment to 0 is fine.
* Certainly programs that prod around in DOS or BIOS are going
* to have to load DS from a constant so it'll be pretty
* obvious.
*/
prog.initCS = (int16_t)LH(&header.initCS) + EXE_RELOCATION;
prog.initIP = (int16_t)LH(&header.initIP);
prog.initSS = (int16_t)LH(&header.initSS) + EXE_RELOCATION;
prog.initSP = (int16_t)LH(&header.initSP);
prog.cReloc = (int16_t)LH(&header.numReloc);
/* Allocate the relocation table */
if (prog.cReloc)
{
prog.relocTable.resize(prog.cReloc);
fp.seek(LH(&header.relocTabOffset));
/* Read in seg:offset pairs and convert to Image ptrs */
uint8_t buf[4];
for (int i = 0; i < prog.cReloc; i++)
{
fp.read((char *)buf,4);
prog.relocTable[i] = LH(buf) + (((int)LH(buf+2) + EXE_RELOCATION)<<4);
}
}
/* Seek to start of image */
uint32_t start_of_image= LH(&header.numParaHeader) * 16;
fp.seek(start_of_image);
/* Allocate a block of memory for the program. */
prepareImage(prog,cb,fp);
/* Set up memory map */
cb = (prog.cbImage + 3) / 4;
prog.map = (uint8_t *)malloc(cb);
memset(prog.map, BM_UNKNOWN, (size_t)cb);
/* Relocate segment constants */
for(uint32_t v : prog.relocTable) {
uint8_t *p = &prog.Imagez[v];
uint16_t w = (uint16_t)LH(p) + EXE_RELOCATION;
*p++ = (uint8_t)(w & 0x00FF);
*p = (uint8_t)((w & 0xFF00) >> 8);
}
return true;
}
};
/*****************************************************************************
* LoadImage
****************************************************************************/
bool Project::load()
{
// addTask(loaderSelection,PreCond(BinaryImage))
// addTask(applyLoader,PreCond(Loader))
QByteArray fname_bytes = binary_path().toLocal8Bit(); /* keep the bytes alive while fname is in use */
const char *fname = fname_bytes.constData();
QFile finfo(binary_path());
/* Open the input file */
if(!finfo.open(QFile::ReadOnly)) {
fatalError(CANNOT_OPEN, fname);
}
/* Read in first 2 bytes to check EXE signature */
if (finfo.size()<=2)
{
fatalError(CANNOT_READ, fname);
}
ComLoader com_loader;
ExeLoader exe_loader;
if(exe_loader.canLoad(finfo)) {
prog.fCOM = false;
return exe_loader.load(prog,finfo);
}
if(com_loader.canLoad(finfo)) {
prog.fCOM = true;
return com_loader.load(prog,finfo);
}
return false;
}
uint32_t SynthLab;
/* Parses the program, builds the call graph, and returns the list of
* procedures found */
void DccFrontend::parse(Project &proj)
{
PROG &prog(proj.prog);
STATE state;
/* Set initial state */
state.setState(rES, 0); /* PSP segment */
state.setState(rDS, 0);
state.setState(rCS, prog.initCS);
state.setState(rSS, prog.initSS);
state.setState(rSP, prog.initSP);
state.IP = ((uint32_t)prog.initCS << 4) + prog.initIP;
SynthLab = SYNTHESIZED_MIN;
/* Check for special settings of initial state, based on idioms of the
startup code */
state.checkStartup();
Function *start_proc;
/* Make a struct for the initial procedure */
if (prog.offMain != -1)
{
start_proc = proj.createFunction(0,"main");
start_proc->retVal.loc = REG_FRAME;
start_proc->retVal.type = TYPE_WORD_SIGN;
start_proc->retVal.id.regi = rAX;
/* We know where main() is. Start the flow of control from there */
start_proc->procEntry = prog.offMain;
/* In medium and large models, the segment of main may (will?) not be
the same as the initial CS segment (of the startup code) */
state.setState(rCS, prog.segMain);
state.IP = prog.offMain;
}
else
{
start_proc = proj.createFunction(0,"start");
/* Create initial procedure at program start address */
start_proc->procEntry = (uint32_t)state.IP;
}
/* The state info is for the first procedure */
start_proc->state = state;
/* Set up call graph initial node */
proj.callGraph = new CALL_GRAPH;
proj.callGraph->proc = start_proc;
/* This proc needs to be called to set things up for LibCheck(), which
checks a proc to see if it is a known C (etc.) library */
SetupLibCheck();
//BUG: proj and g_proj are 'live' at this point !
/* Recursively build entire procedure list */
start_proc->FollowCtrl(proj.callGraph, &state);
/* This proc needs to be called to clean things up from SetupLibCheck() */
CleanupLibCheck();
}

@@ -4,6 +4,8 @@
* Purpose: Back-end module. Generates C code for each procedure.
* (C) Cristina Cifuentes
****************************************************************************/
#include <QDir>
#include <QFile>
#include <cassert>
#include <string>
#include <boost/range.hpp>
@@ -167,13 +169,13 @@ void Project::writeGlobSymTable()
/* Writes the header information and global variables to the output C file
* fp. */
static void writeHeader (std::ostream &_ios, char *fileName)
static void writeHeader (std::ostream &_ios, const std::string &fileName)
{
PROG &prog(Project::get()->prog);
/* Write header information */
cCode.init();
cCode.appendDecl( "/*\n");
cCode.appendDecl( " * Input file\t: %s\n", fileName);
cCode.appendDecl( " * Input file\t: %s\n", fileName.c_str());
cCode.appendDecl( " * File type\t: %s\n", (prog.fCOM)?"COM":"EXE");
cCode.appendDecl( " */\n\n#include \"dcc.h\"\n\n");
@@ -341,22 +343,21 @@ static void backBackEnd (CALL_GRAPH * pcallGraph, std::ostream &_ios)
/* Invokes the necessary routines to produce code one procedure at a time. */
void BackEnd (char *fileName, CALL_GRAPH * pcallGraph)
void BackEnd(CALL_GRAPH * pcallGraph)
{
std::ofstream fs; /* Output C file */
/* Get output file name */
std::string outNam(fileName);
outNam = outNam.substr(0,outNam.rfind("."))+".b"; /* b for beta */
QString outNam(Project::get()->output_name("b")); /* b for beta */
/* Open output file */
fs.open(outNam);
fs.open(outNam.toStdString());
if(!fs.is_open())
fatalError (CANNOT_OPEN, outNam.c_str());
printf ("dcc: Writing C beta file %s\n", outNam.c_str());
fatalError (CANNOT_OPEN, outNam.toStdString().c_str());
std::cout<<"dcc: Writing C beta file "<<outNam.toStdString()<<"\n";
/* Header information */
writeHeader (fs, option.filename);
writeHeader (fs, option.filename.toStdString());
/* Initialize total Icode instructions statistics */
stats.totalLL = 0;
@@ -367,7 +368,7 @@ void BackEnd (char *fileName, CALL_GRAPH * pcallGraph)
/* Close output file */
fs.close();
printf ("dcc: Finished writing C beta file\n");
std::cout << "dcc: Finished writing C beta file\n";
}

@@ -5,18 +5,17 @@
* (C) Mike van Emmerik
*/
#include <stdio.h>
#include <stdlib.h>
#ifdef __BORLAND__
#include <mem.h>
#else
#include <memory.h>
#endif
#include <string.h>
#include "dcc.h"
#include "project.h"
#include "perfhlib.h"
#include "dcc_interface.h"
#include <QDir>
#include <stdio.h>
#include <stdlib.h>
#include <memory.h>
#include <string.h>
PerfectHash g_pattern_hasher;
#define NIL -1 /* Used like NULL, but 0 is valid */
/* Hash table structure */
@@ -68,7 +67,6 @@ void readFileSection(uint16_t* p, int len, FILE *_file);
void cleanup(void);
void checkStartup(STATE *state);
void readProtoFile(void);
void fixNewline(char *s);
int searchPList(char *name);
void checkHeap(char *msg); /* For debugging */
@@ -301,10 +299,11 @@ void SetupLibCheck(void)
PROG &prog(Project::get()->prog);
uint16_t w, len;
int i;
if ((g_file = fopen(sSigName, "rb")) == nullptr)
IDcc *dcc = IDcc::get();
QString fpath = dcc->dataDir("sigs").absoluteFilePath(sSigName);
if ((g_file = fopen(qPrintable(fpath), "rb")) == nullptr)
{
printf("Warning: cannot open signature file %s\n", sSigName);
printf("Warning: cannot open signature file %s\n", qPrintable(fpath));
return;
}
@@ -332,7 +331,7 @@ void SetupLibCheck(void)
/* Initialise the perfhlib stuff. Also allocates T1, T2, g, etc */
/* Set the parameters for the hash table */
g_pattern_hasher.init(
g_pattern_hasher.setHashParams(
numKeys, /* The number of symbols */
PatLen, /* The length of the pattern to be hashed */
256, /* The character set of the pattern (0-FF) */
@@ -639,7 +638,6 @@ void STATE::checkStartup()
char chModel = 'x';
char chVendor = 'x';
char chVersion = 'x';
char *pPath;
char temp[4];
startOff = ((uint32_t)prog.initCS << 4) + prog.initIP;
@@ -830,21 +828,6 @@ void STATE::checkStartup()
gotVendor:
/* Use the DCC environment variable to set where the .sig files will
be found. Otherwise, assume current directory */
pPath = getenv("DCC");
if (pPath)
{
strcpy(sSigName, pPath); /* Use path given */
if (sSigName[strlen(sSigName)-1] != '/')
{
strcat(sSigName, "/"); /* Append a slash if necessary */
}
}
else
{
strcpy(sSigName, "./"); /* Current directory */
}
strcat(sSigName, "dcc");
temp[1] = '\0';
temp[0] = chVendor;
@@ -867,45 +850,29 @@ gotVendor:
*/
void readProtoFile(void)
{
IDcc *dcc = IDcc::get();
QString szProFName = dcc->dataDir("prototypes").absoluteFilePath(DCCLIBS); /* Full name of dcclibs.lst */
FILE *fProto;
char *pPath; /* Point to the environment string */
char szProFName[81]; /* Full name of dcclibs.lst */
int i;
/* Use the DCC environment variable to set where the dcclibs.lst file will
be found. Otherwise, assume current directory */
pPath = getenv("DCC");
if (pPath)
if ((fProto = fopen(qPrintable(szProFName), "rb")) == nullptr)
{
strcpy(szProFName, pPath); /* Use path given */
if (szProFName[strlen(szProFName)-1] != '/')
{
strcat(szProFName, "/"); /* Append a slash if necessary */
}
}
else
{
strcpy(szProFName, "./"); /* Current directory */
}
strcat(szProFName, DCCLIBS);
if ((fProto = fopen(szProFName, "rb")) == nullptr)
{
printf("Warning: cannot open library prototype data file %s\n", szProFName);
printf("Warning: cannot open library prototype data file %s\n", qPrintable(szProFName));
return;
}
grab(4, fProto);
if (strncmp(buf, "dccp", 4) != 0)
{
printf("%s is not a dcc prototype file\n", szProFName);
printf("%s is not a dcc prototype file\n", qPrintable(szProFName));
exit(1);
}
grab(2, fProto);
if (strncmp(buf, "FN", 2) != 0)
{
printf("FN (Function Name) subsection expected in %s\n", szProFName);
printf("FN (Function Name) subsection expected in %s\n", qPrintable(szProFName));
exit(2);
}
@@ -932,7 +899,7 @@ void readProtoFile(void)
grab(2, fProto);
if (strncmp(buf, "PM", 2) != 0)
{
printf("PM (Parameter) subsection expected in %s\n", szProFName);
printf("PM (Parameter) subsection expected in %s\n", qPrintable(szProFName));
exit(2);
}


@@ -168,7 +168,7 @@ static void findNodesInLoop(BB * latchNode,BB * head,Function * pProc,queue &int
(inList (loopNodes, head->edges[THEN].BBptr->dfsLastNum) &&
inList (loopNodes, head->edges[ELSE].BBptr->dfsLastNum)))
{
head->loopType = REPEAT_TYPE;
head->loopType = eNodeHeaderType::REPEAT_TYPE;
if (latchNode->edges[0].BBptr == head)
head->loopFollow = latchNode->edges[ELSE].BBptr->dfsLastNum;
else
@@ -177,7 +177,7 @@ static void findNodesInLoop(BB * latchNode,BB * head,Function * pProc,queue &int
}
else
{
head->loopType = WHILE_TYPE;
head->loopType = eNodeHeaderType::WHILE_TYPE;
if (inList (loopNodes, head->edges[THEN].BBptr->dfsLastNum))
head->loopFollow = head->edges[ELSE].BBptr->dfsLastNum;
else
@@ -186,7 +186,7 @@ static void findNodesInLoop(BB * latchNode,BB * head,Function * pProc,queue &int
}
else /* head = anything besides 2-way, latch = 2-way */
{
head->loopType = REPEAT_TYPE;
head->loopType = eNodeHeaderType::REPEAT_TYPE;
if (latchNode->edges[THEN].BBptr == head)
head->loopFollow = latchNode->edges[ELSE].BBptr->dfsLastNum;
else
@@ -196,12 +196,12 @@ static void findNodesInLoop(BB * latchNode,BB * head,Function * pProc,queue &int
else /* latch = 1-way */
if (latchNode->nodeType == LOOP_NODE)
{
head->loopType = REPEAT_TYPE;
head->loopType = eNodeHeaderType::REPEAT_TYPE;
head->loopFollow = latchNode->edges[0].BBptr->dfsLastNum;
}
else if (intNodeType == TWO_BRANCH)
{
head->loopType = WHILE_TYPE;
head->loopType = eNodeHeaderType::WHILE_TYPE;
pbb = latchNode;
thenDfs = head->edges[THEN].BBptr->dfsLastNum;
elseDfs = head->edges[ELSE].BBptr->dfsLastNum;
@@ -222,7 +222,7 @@ static void findNodesInLoop(BB * latchNode,BB * head,Function * pProc,queue &int
* loop, so it is safer to consider it an endless loop */
if (pbb->dfsLastNum <= head->dfsLastNum)
{
head->loopType = ENDLESS_TYPE;
head->loopType = eNodeHeaderType::ENDLESS_TYPE;
findEndlessFollow (pProc, loopNodes, head);
break;
}
@@ -234,7 +234,7 @@ static void findNodesInLoop(BB * latchNode,BB * head,Function * pProc,queue &int
}
else
{
head->loopType = ENDLESS_TYPE;
head->loopType = eNodeHeaderType::ENDLESS_TYPE;
findEndlessFollow (pProc, loopNodes, head);
}


@@ -143,7 +143,6 @@ void Function::elimCondCodes ()
//auto reversed_instructions = pBB->range() | reversed;
for (useAt = pBB->rbegin(); useAt != pBB->rend(); useAt++)
{
ICODE &useIcode(*useAt);
llIcode useAtOp = llIcode(useAt->ll()->getOpcode());
use = useAt->ll()->flagDU.u;
if ((useAt->type != LOW_LEVEL) || ( ! useAt->valid() ) || ( 0 == use ))
@@ -159,7 +158,6 @@ void Function::elimCondCodes ()
continue;
notSup = false;
LLOperand *dest_ll = defIcode.ll()->get(DST);
LLOperand *src_ll = defIcode.ll()->get(SRC);
if ((useAtOp >= iJB) && (useAtOp <= iJNS))
{
iICODE befDefAt = (++riICODE(defAt)).base();


@@ -5,23 +5,10 @@
****************************************************************************/
#include <cstring>
#include "dcc.h"
#include "project.h"
#include "CallGraph.h"
/* Global variables - extern to other modules */
extern char *asm1_name, *asm2_name; /* Assembler output filenames */
extern SYMTAB symtab; /* Global symbol table */
extern STATS stats; /* cfg statistics */
//PROG prog; /* programs fields */
extern OPTION option; /* Command line options */
//Function * pProcList; /* List of procedures, topologically sort */
//Function * pLastProc; /* Pointer to last node in procedure list */
//FunctionListType pProcList;
//CALL_GRAPH *callGraph; /* Call graph of the program */
static char *initargs(int argc, char *argv[]);
static void displayTotalStats(void);
#include <iostream>
#include <QtCore/QCoreApplication>
#include <QCommandLineParser>
#ifdef LLVM_EXPERIMENTAL
#include <llvm/Support/raw_os_ostream.h>
#include <llvm/Support/CommandLine.h>
#include <llvm/Support/TargetSelect.h>
@@ -33,14 +20,29 @@ static void displayTotalStats(void);
#include <llvm/Target/TargetInstrInfo.h>
#include <llvm/MC/MCAsmInfo.h>
#include <llvm/CodeGen/MachineInstrBuilder.h>
#include <llvm/TableGen/Main.h>
#include <llvm/TableGen/TableGenBackend.h>
#include <llvm/TableGen/Record.h>
#endif
#include <QtCore/QFile>
#include "dcc.h"
#include "project.h"
#include "CallGraph.h"
#include "DccFrontend.h"
/* Global variables - extern to other modules */
extern QString asm1_name, asm2_name; /* Assembler output filenames */
extern SYMTAB symtab; /* Global symbol table */
extern STATS stats; /* cfg statistics */
extern OPTION option; /* Command line options */
static char *initargs(int argc, char *argv[]);
static void displayTotalStats(void);
/****************************************************************************
* main
***************************************************************************/
#include <iostream>
#ifdef LLVM_EXPERIMENTAL
using namespace llvm;
bool TVisitor(raw_ostream &OS, RecordKeeper &Records)
{
@@ -65,63 +67,128 @@ bool TVisitor(raw_ostream &OS, RecordKeeper &Records)
// rec = Records.getDef("CCR");
// if(rec)
// rec->dump();
for(auto val : Records.getDefs())
{
//std::cout<< "Def "<<val.first<<"\n";
}
// for(auto val : Records.getDefs())
// {
// //std::cout<< "Def "<<val.first<<"\n";
// }
return false;
}
int testTblGen(int argc, char **argv)
{
using namespace llvm;
sys::PrintStackTraceOnErrorSignal();
PrettyStackTraceProgram(argc,argv);
cl::ParseCommandLineOptions(argc,argv);
return llvm::TableGenMain(argv[0],TVisitor);
InitializeNativeTarget();
Triple TheTriple;
std::string def = sys::getDefaultTargetTriple();
std::string MCPU="i386";
std::string MARCH="x86";
InitializeAllTargetInfos();
InitializeAllTargetMCs();
InitializeAllAsmPrinters();
InitializeAllAsmParsers();
InitializeAllDisassemblers();
std::string TargetTriple("i386-pc-linux-gnu");
TheTriple = Triple(Triple::normalize(TargetTriple));
MCOperand op=llvm::MCOperand::CreateImm(11);
MCAsmInfo info;
raw_os_ostream wrap(std::cerr);
op.print(wrap,&info);
wrap.flush();
std::cerr<<"\n";
std::string lookuperr;
TargetRegistry::printRegisteredTargetsForVersion();
const Target *t = TargetRegistry::lookupTarget(MARCH,TheTriple,lookuperr);
TargetOptions opts;
std::string Features;
opts.PrintMachineCode=1;
TargetMachine *tm = t->createTargetMachine(TheTriple.getTriple(),MCPU,Features,opts);
std::cerr<<tm->getInstrInfo()->getName(97)<<"\n";
const MCInstrDesc &ds(tm->getInstrInfo()->get(97));
const MCOperandInfo *op1=ds.OpInfo;
uint16_t impl_def = ds.getImplicitDefs()[0];
std::cerr<<lookuperr<<"\n";
// using namespace llvm;
// sys::PrintStackTraceOnErrorSignal();
// PrettyStackTraceProgram(argc,argv);
// cl::ParseCommandLineOptions(argc,argv);
// return llvm::TableGenMain(argv[0],TVisitor);
// InitializeNativeTarget();
// Triple TheTriple;
// std::string def = sys::getDefaultTargetTriple();
// std::string MCPU="i386";
// std::string MARCH="x86";
// InitializeAllTargetInfos();
// InitializeAllTargetMCs();
// InitializeAllAsmPrinters();
// InitializeAllAsmParsers();
// InitializeAllDisassemblers();
// std::string TargetTriple("i386-pc-linux-gnu");
// TheTriple = Triple(Triple::normalize(TargetTriple));
// MCOperand op=llvm::MCOperand::CreateImm(11);
// MCAsmInfo info;
// raw_os_ostream wrap(std::cerr);
// op.print(wrap,&info);
// wrap.flush();
// std::cerr<<"\n";
// std::string lookuperr;
// TargetRegistry::printRegisteredTargetsForVersion();
// const Target *t = TargetRegistry::lookupTarget(MARCH,TheTriple,lookuperr);
// TargetOptions opts;
// std::string Features;
// opts.PrintMachineCode=1;
// TargetMachine *tm = t->createTargetMachine(TheTriple.getTriple(),MCPU,Features,opts);
// std::cerr<<tm->getInstrInfo()->getName(97)<<"\n";
// const MCInstrDesc &ds(tm->getInstrInfo()->get(97));
// const MCOperandInfo *op1=ds.OpInfo;
// uint16_t impl_def = ds.getImplicitDefs()[0];
// std::cerr<<lookuperr<<"\n";
exit(0);
// exit(0);
}
#endif
void setupOptions(QCoreApplication &app) {
//[-a1a2cmsi]
QCommandLineParser parser;
parser.setApplicationDescription("dcc");
parser.addHelpOption();
//parser.addVersionOption();
//QCommandLineOption showProgressOption("p", QCoreApplication::translate("main", "Show progress during copy"));
QCommandLineOption boolOpts[] {
QCommandLineOption {"v", QCoreApplication::translate("main", "verbose")},
QCommandLineOption {"V", QCoreApplication::translate("main", "very verbose")},
QCommandLineOption {"c", QCoreApplication::translate("main", "Follow register indirect calls")},
QCommandLineOption {"m", QCoreApplication::translate("main", "Print memory maps of program")},
QCommandLineOption {"s", QCoreApplication::translate("main", "Print stats")}
};
for(QCommandLineOption &o : boolOpts) {
parser.addOption(o);
}
QCommandLineOption assembly("a", QCoreApplication::translate("main", "Produce assembly"),"assembly_level");
// A boolean option with multiple names (-f, --force)
//QCommandLineOption forceOption(QStringList() << "f" << "force", "Overwrite existing files.");
// An option with a value
QCommandLineOption targetFileOption(QStringList() << "o" << "output",
QCoreApplication::translate("main", "Place output into <file>."),
QCoreApplication::translate("main", "file"));
parser.addOption(targetFileOption);
parser.addOption(assembly);
//parser.addOption(forceOption);
// Process the actual command line arguments given by the user
parser.addPositionalArgument("source", QCoreApplication::translate("main", "Dos Executable file to decompile."));
parser.process(app);
const QStringList args = parser.positionalArguments();
if(args.empty()) {
parser.showHelp();
}
// source is args.at(0), destination is args.at(1)
option.verbose = parser.isSet(boolOpts[0]);
option.VeryVerbose = parser.isSet(boolOpts[1]);
if(parser.isSet(assembly)) {
option.asm1 = parser.value(assembly).toInt()==1;
option.asm2 = parser.value(assembly).toInt()==2;
}
option.Map = parser.isSet(boolOpts[3]);
option.Stats = parser.isSet(boolOpts[4]);
option.Interact = false;
option.Calls = parser.isSet(boolOpts[2]);
option.filename = args.first();
if(parser.isSet(targetFileOption))
asm1_name = asm2_name = parser.value(targetFileOption);
else if(option.asm1 || option.asm2) {
asm1_name = option.filename+".a1";
asm2_name = option.filename+".a2";
}
}
int main(int argc, char **argv)
{
/* Extract switches and filename */
strcpy(option.filename, initargs(argc, argv));
QCoreApplication app(argc,argv);
QCoreApplication::setApplicationVersion("0.1");
setupOptions(app);
/* Front end reads in EXE or COM file, parses it into I-code while
* building the call graph and attaching appropriate bits of code for
* each procedure.
*/
DccFrontend fe(option.filename);
Project::get()->create(option.filename);
DccFrontend fe(&app);
if(!Project::get()->load()) {
return -1;
}
if (option.verbose)
Project::get()->prog.displayLoadInfo();
if(false==fe.FrontEnd ())
return -1;
if(option.asm1)
@@ -138,98 +205,16 @@ int main(int argc, char **argv)
* analysis, data flow etc. and outputs it to output file ready for
* re-compilation.
*/
BackEnd(asm1_name ? asm1_name:option.filename, Project::get()->callGraph);
BackEnd(Project::get()->callGraph);
Project::get()->callGraph->write();
if (option.Stats)
displayTotalStats();
/*
freeDataStructures(pProcList);
*/
return 0;
}
/****************************************************************************
* initargs - Extract command line arguments
***************************************************************************/
static char *initargs(int argc, char *argv[])
{
char *pc;
while (--argc > 0 && (*++argv)[0] == '-')
{
for (pc = argv[0]+1; *pc; pc++)
switch (*pc)
{
case 'a': /* Print assembler listing */
if (*(pc+1) == '2')
option.asm2 = true;
else
option.asm1 = true;
if (*(pc+1) == '1' || *(pc+1) == '2')
pc++;
break;
case 'c':
option.Calls = true;
break;
case 'i':
option.Interact = true;
break;
case 'm': /* Print memory map */
option.Map = true;
break;
case 's': /* Print Stats */
option.Stats = true;
break;
case 'V': /* Very verbose => verbose */
option.VeryVerbose = true;
case 'v':
option.verbose = true; /* Make everything verbose */
break;
case 'o': /* assembler output file */
if (*(pc+1)) {
asm1_name = asm2_name = pc+1;
goto NextArg;
}
else if (--argc > 0) {
asm1_name = asm2_name = *++argv;
goto NextArg;
}
default:
fatalError(INVALID_ARG, *pc);
return *argv;
}
NextArg:;
}
if (argc == 1)
{
if (option.asm1 || option.asm2)
{
if (! asm1_name)
{
asm1_name = strcpy((char*)malloc(strlen(*argv)+4), *argv);
pc = strrchr(asm1_name, '.');
if (pc > strrchr(asm1_name, '/'))
{
*pc = '\0';
}
asm2_name = (char*)malloc(strlen(asm1_name)+4) ;
strcat(strcpy(asm2_name, asm1_name), ".a2");
unlink(asm2_name);
strcat(asm1_name, ".a1");
}
unlink(asm1_name); /* Remove asm output files */
}
return *argv; /* filename of the program to decompile */
}
fatalError(USAGE);
return *argv; // does not reach this.
}
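The removed initargs() derived the listing names from the input file name. It dropped the extension, but only when the last '.' comes after the last '/', and then appended ".a1" and ".a2". The new setupOptions() appends the suffixes to the full file name instead. The rule the old code applied can be sketched as follows; asmOutputNames is a hypothetical helper, not part of dcc.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: strip the extension only when the last '.' belongs to
// the file name (not to a directory), then derive the .a1/.a2 listing names.
std::pair<std::string, std::string> asmOutputNames(const std::string &input)
{
    std::string base = input;
    const auto dot = base.rfind('.');
    const auto slash = base.rfind('/');
    if (dot != std::string::npos && (slash == std::string::npos || dot > slash))
        base.erase(dot);
    return {base + ".a1", base + ".a2"};
}
```

With this rule "prog.exe" becomes "prog.a1"/"prog.a2", while "dir.v1/prog" keeps its full name and becomes "dir.v1/prog.a1".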
static void
displayTotalStats ()
/* Displays final statistics for the complete program */

61
src/dcc_interface.cpp Normal file

@@ -0,0 +1,61 @@
#include "dcc_interface.h"
#include "dcc.h"
#include "project.h"
struct DccImpl : public IDcc{
// IDcc interface
public:
void BaseInit()
{
}
void Init(QObject *tgt)
{
}
ilFunction GetFirstFuncHandle()
{
}
ilFunction GetCurFuncHandle()
{
}
void analysis_Once()
{
}
void load(QString name)
{
option.filename = name;
Project::get()->create(name);
}
void prtout_asm(IXmlTarget *, int level)
{
}
void prtout_cpp(IXmlTarget *, int level)
{
}
size_t getFuncCount()
{
}
const lFunction &validFunctions() const
{
return Project::get()->functions();
}
void SetCurFunc_by_Name(QString)
{
}
QDir installDir() {
return QDir(".");
}
QDir dataDir(QString kind) { // return directory containing decompilation helper data -> signatures/includes/etc.
QDir res(installDir());
res.cd(kind);
return res;
}
};
IDcc* IDcc::get() {
static IDcc *v=0;
if(!v)
v = new DccImpl;
return v;
}
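IDcc::get() creates its DccImpl lazily through a static pointer. dataDir(kind) resolves a subdirectory of installDir(), which is currently hard-wired to ".". The sketch below shows the same two ideas with std::filesystem, under hypothetical names. A function-local static gives the same lazy, single instance, and its initialisation is thread-safe since C++11. The data directory is simply installDir() / kind. Unlike QDir::cd(), which leaves the QDir unchanged when the target directory does not exist, this sketch does not check for existence.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical stand-in for IDcc, modelling only the directory helpers.
class DataDirs {
public:
    static DataDirs &get()
    {
        static DataDirs instance; // created on first use, once
        return instance;
    }
    DataDirs(const DataDirs &) = delete;
    DataDirs &operator=(const DataDirs &) = delete;

    fs::path installDir() const { return fs::path("."); }

    // Directory holding helper data of the given kind ("sigs", "prototypes", ...).
    fs::path dataDir(const std::string &kind) const { return installDir() / kind; }

private:
    DataDirs() = default;
};
```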


@@ -150,10 +150,10 @@ void Disassembler::disassem(Function * ppProc)
if (pass != 3)
{
auto p = (pass == 1)? asm1_name: asm2_name;
m_fp.open(p,ios_base::app);
m_fp.open(p.toStdString(),ios_base::app);
if (!m_fp.is_open())
{
fatalError(CANNOT_OPEN, p);
fatalError(CANNOT_OPEN, p.toStdString().c_str());
}
}
/* Create temporary code array */


@@ -82,7 +82,7 @@ bool DccFrontend::FrontEnd ()
if (option.asm1)
{
printf("dcc: writing assembler file %s\n", asm1_name);
printf("dcc: writing assembler file %s\n", asm1_name.c_str());
}
/* Search through code looking for impure references and flag them */


@@ -3,7 +3,12 @@
* (C) Cristina Cifuentes
****************************************************************************/
#include <llvm/Support/PatternMatch.h>
//#include <llvm/Config/llvm-config.h>
//#if( (LLVM_VERSION_MAJOR==3 ) && (LLVM_VERSION_MINOR>3) )
//#include <llvm/IR/PatternMatch.h>
//#else
//#include <llvm/Support/PatternMatch.h>
//#endif
#include <boost/iterator/filter_iterator.hpp>
#include <cstring>
#include <deque>


@@ -27,16 +27,16 @@ bool Idiom14::match(iICODE pIcode)
return false;
m_icodes[0]=pIcode++;
m_icodes[1]=pIcode++;
LLInst * matched [] = {m_icodes[0]->ll(),m_icodes[1]->ll()};
LLInst * matched [] {m_icodes[0]->ll(),m_icodes[1]->ll()};
/* Check for regL */
m_regL = m_icodes[0]->ll()->m_dst.regi;
if (not m_icodes[0]->ll()->testFlags(I) && ((m_regL == rAX) || (m_regL ==rBX)))
m_regL = matched[0]->m_dst.regi;
if (not matched[0]->testFlags(I) && ((m_regL == rAX) || (m_regL ==rBX)))
{
/* Check for XOR regH, regH */
if (m_icodes[1]->ll()->match(iXOR) && not m_icodes[1]->ll()->testFlags(I))
if (matched[1]->match(iXOR) && not matched[1]->testFlags(I))
{
m_regH = m_icodes[1]->ll()->m_dst.regi;
if (m_regH == m_icodes[1]->ll()->src().getReg2())
m_regH = matched[1]->m_dst.regi;
if (m_regH == matched[1]->src().getReg2())
{
if ((m_regL == rAX) && (m_regH == rDX))
return true;
@@ -49,14 +49,11 @@ bool Idiom14::match(iICODE pIcode)
}
int Idiom14::action()
{
int idx;
AstIdent *lhs;
Expr *rhs;
idx = m_func->localId.newLongReg (TYPE_LONG_SIGN, LONGID_TYPE(m_regH,m_regL), m_icodes[0]);
lhs = AstIdent::LongIdx (idx);
int idx = m_func->localId.newLongReg (TYPE_LONG_SIGN, LONGID_TYPE(m_regH,m_regL), m_icodes[0]);
AstIdent *lhs = AstIdent::LongIdx (idx);
m_icodes[0]->setRegDU( m_regH, eDEF);
rhs = AstIdent::id (*m_icodes[0]->ll(), SRC, m_func, m_icodes[0], *m_icodes[0], NONE);
Expr *rhs = AstIdent::id (*m_icodes[0]->ll(), SRC, m_func, m_icodes[0], *m_icodes[0], NONE);
m_icodes[0]->setAsgn(lhs, rhs);
m_icodes[1]->invalidate();
return 2;
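Idiom14::match() looks for a two-instruction pair. The first is a non-immediate load into AX or BX; it is assumed here to be a MOV, because the opcode check lies outside the visible hunk. The second XORs the corresponding high register with itself. action() then folds the pair into one long assignment to regH:regL. Below is a simplified sketch of the matching step on a hypothetical instruction struct. The hunk only shows the AX/DX pairing, so the BX/CX case is an assumption about the elided lines.

```cpp
#include <cassert>

// Hypothetical register and opcode sets, reduced to what the sketch needs.
enum Reg { rAX, rBX, rCX, rDX, rSI, rDI };
enum Op { MOV, XOR, ADD };

// Hypothetical simplified instruction; `immediate` stands in for testFlags(I).
struct Insn {
    Op op;
    Reg dst;
    Reg src;
    bool immediate;
};

// MOV regL, src ; XOR regH, regH  =>  regH:regL = src (16 -> 32 bit)
// Only the dx:ax and cx:bx pairings are accepted.
bool matchLongAssign(const Insn &first, const Insn &second)
{
    if (first.op != MOV || first.immediate)
        return false;
    if (second.op != XOR || second.immediate || second.dst != second.src)
        return false;
    const Reg regL = first.dst;
    const Reg regH = second.dst;
    return (regL == rAX && regH == rDX) || (regL == rBX && regH == rCX);
}
```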


@@ -20,70 +20,8 @@ static void setBits(int16_t type, uint32_t start, uint32_t len);
static void process_MOV(LLInst &ll, STATE * pstate);
static SYM * lookupAddr (LLOperand *pm, STATE * pstate, int size, uint16_t duFlag);
void interactDis(Function * initProc, int ic);
static uint32_t SynthLab;
extern uint32_t SynthLab;
/* Parses the program, builds the call graph, and returns the list of
* procedures found */
void DccFrontend::parse(Project &proj)
{
PROG &prog(proj.prog);
STATE state;
/* Set initial state */
state.setState(rES, 0); /* PSP segment */
state.setState(rDS, 0);
state.setState(rCS, prog.initCS);
state.setState(rSS, prog.initSS);
state.setState(rSP, prog.initSP);
state.IP = ((uint32_t)prog.initCS << 4) + prog.initIP;
SynthLab = SYNTHESIZED_MIN;
// default-construct a Function object !
/*auto func = */;
/* Check for special settings of initial state, based on idioms of the
startup code */
state.checkStartup();
Function *start_proc;
/* Make a struct for the initial procedure */
if (prog.offMain != -1)
{
start_proc = proj.createFunction(0,"main");
start_proc->retVal.loc = REG_FRAME;
start_proc->retVal.type = TYPE_WORD_SIGN;
start_proc->retVal.id.regi = rAX;
/* We know where main() is. Start the flow of control from there */
start_proc->procEntry = prog.offMain;
/* In medium and large models, the segment of main may (will?) not be
the same as the initial CS segment (of the startup code) */
state.setState(rCS, prog.segMain);
state.IP = prog.offMain;
}
else
{
start_proc = proj.createFunction(0,"start");
/* Create initial procedure at program start address */
start_proc->procEntry = (uint32_t)state.IP;
}
/* The state info is for the first procedure */
start_proc->state = state;
/* Set up call graph initial node */
proj.callGraph = new CALL_GRAPH;
proj.callGraph->proc = start_proc;
/* This proc needs to be called to set things up for LibCheck(), which
checks a proc to see if it is a known C (etc) library */
SetupLibCheck();
//BUG: proj and g_proj are 'live' at this point !
/* Recursively build entire procedure list */
start_proc->FollowCtrl(proj.callGraph, &state);
/* This proc needs to be called to clean things up from SetupLibCheck() */
CleanupLibCheck();
}
/* Returns the size of the string pointed by sym and delimited by delim.
* Size includes delimiter. */


@@ -1,101 +0,0 @@
/*
* Perfect hashing function library. Contains functions to generate perfect
* hashing functions
* (C) Mike van Emmerik
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "perfhlib.h"
/* Private data structures */
static uint16_t *T1, *T2; /* Pointers to T1[i], T2[i] */
static short *g; /* g[] */
//static int numEdges; /* An edge counter */
//static bool *visited; /* Array of bools: whether visited */
/* Private prototypes */
//static void initGraph(void);
//static void addToGraph(int e, int v1, int v2);
//static bool isCycle(void);
//static void duplicateKeys(int v1, int v2);
PatternHasher g_pattern_hasher;
void PatternHasher::init(int _NumEntry, int _EntryLen, int _SetSize, char _SetMin, int _NumVert)
{
/* These parameters are stored in statics so as to obviate the need for
passing all these (or dereferencing pointers) for every call to hash()
*/
NumEntry = _NumEntry;
EntryLen = _EntryLen;
SetSize = _SetSize;
SetMin = _SetMin;
NumVert = _NumVert;
/* Allocate the variable sized tables etc */
T1base = new uint16_t [EntryLen * SetSize];
T2base = new uint16_t [EntryLen * SetSize];
graphNode = new int [NumEntry*2 + 1];
graphNext = new int [NumEntry*2 + 1];
graphFirst = new int [NumVert + 1];
g = new short [NumVert + 1];
// visited = new bool [NumVert + 1];
return;
}
void PatternHasher::cleanup(void)
{
/* Free the storage for variable sized tables etc */
delete [] T1base;
delete [] T2base;
delete [] graphNode;
delete [] graphNext;
delete [] graphFirst;
delete [] g;
// delete [] visited;
}
int PatternHasher::hash(uint8_t *string)
{
uint16_t u, v;
int j;
u = 0;
for (j=0; j < EntryLen; j++)
{
T1 = T1base + j * SetSize;
u += T1[string[j] - SetMin];
}
u %= NumVert;
v = 0;
for (j=0; j < EntryLen; j++)
{
T2 = T2base + j * SetSize;
v += T2[string[j] - SetMin];
}
v %= NumVert;
return (g[u] + g[v]) % NumEntry;
}
uint16_t * PatternHasher::readT1(void)
{
return T1base;
}
uint16_t *PatternHasher::readT2(void)
{
return T2base;
}
uint16_t * PatternHasher::readG(void)
{
return (uint16_t *)g;
}


@@ -27,7 +27,6 @@ const char *indentStr(int indLevel) // Indentation according to the depth of the
* not exist. */
void CALL_GRAPH::insertArc (ilFunction newProc)
{
CALL_GRAPH *pcg;
/* Check if procedure already exists */
@@ -35,7 +34,7 @@ void CALL_GRAPH::insertArc (ilFunction newProc)
if(res!=outEdges.end())
return;
/* Include new arc */
pcg = new CALL_GRAPH;
CALL_GRAPH *pcg = new CALL_GRAPH;
pcg->proc = newProc;
outEdges.push_back(pcg);
}
@@ -49,13 +48,10 @@ bool CALL_GRAPH::insertCallGraph(ilFunction caller, ilFunction callee)
insertArc (callee);
return true;
}
else
{
for (CALL_GRAPH *edg : outEdges)
if (edg->insertCallGraph (caller, callee))
return true;
return (false);
}
return false;
}
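After the change, CALL_GRAPH::insertCallGraph() is a plain depth-first search. If this node is the caller, it adds the arc; insertArc() skips duplicates. Otherwise it tries each child and reports whether any of them accepted the arc. The same shape is sketched here on a hypothetical node type keyed by procedure name:

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical call-graph node keyed by procedure name.
struct CallNode {
    std::string proc;
    std::vector<std::unique_ptr<CallNode>> outEdges;

    explicit CallNode(std::string p) : proc(std::move(p)) {}

    // Add an arc to `callee` unless this node already has one.
    void insertArc(const std::string &callee)
    {
        const bool exists = std::any_of(outEdges.begin(), outEdges.end(),
            [&](const std::unique_ptr<CallNode> &e) { return e->proc == callee; });
        if (!exists)
            outEdges.push_back(std::make_unique<CallNode>(callee));
    }

    // Depth-first search for `caller`; attach `callee` under the first match.
    bool insertCallGraph(const std::string &caller, const std::string &callee)
    {
        if (proc == caller) {
            insertArc(callee);
            return true;
        }
        for (auto &edge : outEdges)
            if (edge->insertCallGraph(caller, callee))
                return true;
        return false;
    }
};
```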
bool CALL_GRAPH::insertCallGraph(Function *caller, ilFunction callee)
@@ -333,7 +329,6 @@ void STKFRAME::adjustForArgType(size_t numArg_, hlType actType_)
{
hlType forType;
STKSYM * psym, * nsym;
int off;
/* If formal argument does not exist, do not create new ones, just
* ignore actual argument
*/
@@ -341,7 +336,7 @@ void STKFRAME::adjustForArgType(size_t numArg_, hlType actType_)
return;
/* Find stack offset for this argument */
off = m_minOff;
int off = m_minOff;
size_t i=0;
for(STKSYM &s : *this) // walk formal arguments up to numArg_
{
@@ -353,7 +348,6 @@ void STKFRAME::adjustForArgType(size_t numArg_, hlType actType_)
/* Find formal argument */
//psym = &at(numArg_);
//i = numArg_;
//auto iter=std::find_if(sym.begin(),sym.end(),[off](STKSYM &s)->bool {s.off==off;});
auto iter=std::find_if(begin()+numArg_,end(),[off](STKSYM &s)->bool {return s.label==off;});
if(iter==end()) // symbol not found
@@ -361,15 +355,16 @@ void STKFRAME::adjustForArgType(size_t numArg_, hlType actType_)
psym = &(*iter);
forType = psym->type;
if (forType != actType_)
{
if (forType == actType_)
return;
switch (actType_) {
case TYPE_UNKNOWN: case TYPE_BYTE_SIGN:
case TYPE_BYTE_UNSIGN: case TYPE_WORD_SIGN:
case TYPE_WORD_UNSIGN: case TYPE_RECORD:
break;
case TYPE_LONG_UNSIGN: case TYPE_LONG_SIGN:
case TYPE_LONG_UNSIGN:
case TYPE_LONG_SIGN:
if ((forType == TYPE_WORD_UNSIGN) ||
(forType == TYPE_WORD_SIGN) ||
(forType == TYPE_UNKNOWN))
@@ -395,6 +390,5 @@ void STKFRAME::adjustForArgType(size_t numArg_, hlType actType_)
default:
fprintf(stderr,"STKFRAME::adjustForArgType unhandled actType_ %d \n",actType_);
} /* eos */
}
}


@@ -1,10 +1,13 @@
#include <QtCore/QString>
#include <QtCore/QDir>
#include <utility>
#include "dcc.h"
#include "CallGraph.h"
#include "project.h"
#include "Procedure.h"
using namespace std;
//Project g_proj;
char *asm1_name, *asm2_name; /* Assembler output filenames */
QString asm1_name, asm2_name; /* Assembler output filenames */
SYMTAB symtab; /* Global symbol table */
STATS stats; /* cfg statistics */
//PROG prog; /* programs fields */
@@ -19,19 +22,17 @@ void Project::initialize()
delete callGraph;
callGraph = nullptr;
}
void Project::create(const string &a)
void Project::create(const QString &a)
{
initialize();
QFileInfo fi(a);
m_fname=a;
string::size_type ext_loc=a.find_last_of('.');
string::size_type slash_loc=a.find_last_of('/',ext_loc);
if(slash_loc==string::npos)
slash_loc=0;
else
slash_loc++;
if(ext_loc!=string::npos)
m_project_name = a.substr(slash_loc,(ext_loc-slash_loc));
else
m_project_name = a.substr(slash_loc);
m_project_name = fi.completeBaseName();
m_output_path = fi.path();
}
QString Project::output_name(const char *ext) {
return m_output_path+QDir::separator()+m_project_name+"."+ext;
}
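Project::create() now takes the project name from QFileInfo::completeBaseName() and the output directory from QFileInfo::path(). The removed string version searched for '/' only before the last '.', so a path such as "dir.v1/prog" produced the name "dir". A rough std::filesystem analogue follows; ProjectNames is hypothetical. path::parent_path() returns an empty path for a bare file name, whereas QFileInfo::path() returns ".", so edge cases may differ.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical analogue of Project::create()/output_name(): the project name
// is the file name without its last extension, and outputs go next to the input.
struct ProjectNames {
    std::string projectName;
    fs::path outputPath;

    explicit ProjectNames(const fs::path &input)
        : projectName(input.stem().string()), outputPath(input.parent_path()) {}

    // <output dir>/<project name>.<ext>
    fs::path outputName(const std::string &ext) const
    {
        return outputPath / (projectName + "." + ext);
    }
};
```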
bool Project::valid(ilFunction iter)
{

1
tools/CMakeLists.txt Normal file

@@ -0,0 +1 @@
add_subdirectory(makedsig)

248
tools/dispsrch/dispsig.cpp Normal file

@@ -0,0 +1,248 @@
/* Quick program to copy a named signature to a small file */
#include <stdio.h>
#include <stdlib.h>
#include <memory.h>
#include <string.h>
#include "perfhlib.h"
/* statics */
byte buf[100];
int numKeys; /* Number of hash table entries (keys) */
int numVert; /* Number of vertices in the graph (also size of g[]) */
int PatLen; /* Size of the keys (pattern length) */
int SymLen; /* Max size of the symbols, including null */
FILE *f; /* File being read */
FILE *f2; /* File being written */
static word *T1base, *T2base; /* Pointers to start of T1, T2 */
static word *g; /* g[] */
/* prototypes */
void grab(int n);
word readFileShort(void);
void cleanup(void);
#define SYMLEN 16
#define PATLEN 23
/* Hash table structure */
typedef struct HT_tag
{
char htSym[SYMLEN];
byte htPat[PATLEN];
} HT;
HT ht; /* One hash table entry */
void
main(int argc, char *argv[])
{
word w, len;
int i;
if (argc <= 3)
{
printf("Usage: dispsig <SigFilename> <FunctionName> <BinFileName>\n");
printf("Example: dispsig dccm8s.sig printf printf.bin\n");
exit(1);
}
if ((f = fopen(argv[1], "rb")) == NULL)
{
printf("Cannot open %s\n", argv[1]);
exit(2);
}
if ((f2 = fopen(argv[3], "wb")) == NULL)
{
printf("Cannot write to %s\n", argv[3]);
exit(2);
}
/* Read the parameters */
grab(4);
if (memcmp("dccs", buf, 4) != 0)
{
printf("Not a dccs file!\n");
exit(3);
}
numKeys = readFileShort();
numVert = readFileShort();
PatLen = readFileShort();
SymLen = readFileShort();
/* Initialise the perfhlib stuff. Also allocates T1, T2, g, etc */
hashParams( /* Set the parameters for the hash table */
numKeys, /* The number of symbols */
PatLen, /* The length of the pattern to be hashed */
256, /* The character set of the pattern (0-FF) */
0, /* Minimum pattern character value */
numVert); /* Specifies C, the sparseness of the graph.
See Czech, Havas and Majewski for details
*/
T1base = readT1();
T2base = readT2();
g = readG();
/* Read T1 and T2 tables */
grab(2);
if (memcmp("T1", buf, 2) != 0)
{
printf("Expected 'T1'\n");
exit(3);
}
len = PatLen * 256 * sizeof(word);
w = readFileShort();
if (w != len)
{
printf("Problem with size of T1: file %d, calc %d\n", w, len);
exit(4);
}
if (fread(T1base, 1, len, f) != len)
{
printf("Could not read T1\n");
exit(5);
}
grab(2);
if (memcmp("T2", buf, 2) != 0)
{
printf("Expected 'T2'\n");
exit(3);
}
w = readFileShort();
if (w != len)
{
printf("Problem with size of T2: file %d, calc %d\n", w, len);
exit(4);
}
if (fread(T2base, 1, len, f) != len)
{
printf("Could not read T2\n");
exit(5);
}
/* Now read the function g[] */
grab(2);
if (memcmp("gg", buf, 2) != 0)
{
printf("Expected 'gg'\n");
exit(3);
}
len = numVert * sizeof(word);
w = readFileShort();
if (w != len)
{
printf("Problem with size of g[]: file %d, calc %d\n", w, len);
exit(4);
}
if (fread(g, 1, len, f) != len)
{
printf("Could not read g[]\n");
exit(5);
}
/* This is now the hash table */
grab(2);
if (memcmp("ht", buf, 2) != 0)
{
printf("Expected 'ht'\n");
exit(3);
}
w = readFileShort();
if (w != numKeys * (SymLen + PatLen + sizeof(word)))
{
printf("Problem with size of hash table: file %d, calc %d\n", w, (int)(numKeys * (SymLen + PatLen + sizeof(word))));
exit(6);
}
for (i=0; i < numKeys; i++)
{
if (fread(&ht, 1, SymLen + PatLen, f) != (size_t)(SymLen + PatLen))
{
printf("Could not read pattern %d from %s\n", i, argv[1]);
exit(7);
}
if (stricmp(ht.htSym, argv[2]) == 0)
{
/* Found it! */
break;
}
}
fclose(f);
if (i == numKeys)
{
printf("Function %s not found!\n", argv[2]);
exit(2);
}
printf("Function %s index %d\n", ht.htSym, i);
for (i=0; i < PatLen; i++)
{
printf("%02X ", ht.htPat[i]);
}
fwrite(ht.htPat, 1, PatLen, f2);
fclose(f2);
printf("\n");
}
void
cleanup(void)
{
/* Free the storage for variable sized tables etc */
if (T1base) free(T1base);
if (T2base) free(T2base);
if (g) free(g);
}
void grab(int n)
{
if (fread(buf, 1, n, f) != (size_t)n)
{
printf("Could not read\n");
exit(11);
}
}
word
readFileShort(void)
{
byte b1, b2;
if (fread(&b1, 1, 1, f) != 1)
{
printf("Could not read\n");
exit(11);
}
if (fread(&b2, 1, 1, f) != 1)
{
printf("Could not read\n");
exit(11);
}
return (b2 << 8) + b1;
}
/* Following two functions not needed unless creating tables */
void getKey(int i, byte **keys)
{
}
/* Display key i */
void
dispKey(int i)
{
}
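dispsig starts by reading a fixed header: the 4-byte magic "dccs", then numKeys, numVert, PatLen and SymLen. Each of the four is a 16-bit little-endian word assembled by readFileShort(). Here is a minimal sketch of the same parse over an in-memory buffer. The names are hypothetical, and it throws an exception where dispsig calls exit().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Header fields of a .sig file as read by dispsig (names shortened).
struct SigHeader {
    uint16_t numKeys; // number of hash table entries
    uint16_t numVert; // number of graph vertices (size of g[])
    uint16_t patLen;  // pattern length in bytes
    uint16_t symLen;  // maximum symbol length, including the terminating NUL
};

// Read a 16-bit little-endian word, low byte first, like readFileShort().
static uint16_t readLE16(const std::vector<uint8_t> &buf, std::size_t &pos)
{
    if (pos + 2 > buf.size())
        throw std::runtime_error("truncated signature file");
    const uint16_t value = static_cast<uint16_t>(buf[pos] | (buf[pos + 1] << 8));
    pos += 2;
    return value;
}

// Check the "dccs" magic and decode the four header words; on return `pos`
// points at the first section tag ("T1").
SigHeader parseSigHeader(const std::vector<uint8_t> &buf, std::size_t &pos)
{
    if (buf.size() < 4 || std::memcmp(buf.data(), "dccs", 4) != 0)
        throw std::runtime_error("not a dccs file");
    pos = 4;
    SigHeader h{};
    h.numKeys = readLE16(buf, pos);
    h.numVert = readLE16(buf, pos);
    h.patLen = readLE16(buf, pos);
    h.symLen = readLE16(buf, pos);
    return h;
}
```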


@@ -0,0 +1,11 @@
CFLAGS = -Zi -c -AL -W3 -D__MSDOS__
dispsig.exe: dispsig.obj perfhlib.obj
link /CO dispsig perfhlib;
dispsig.obj: dispsig.c dcc.h perfhlib.h
cl $(CFLAGS) $*.c
perfhlib.obj: perfhlib.c dcc.h perfhlib.h
cl $(CFLAGS) $*.c

221
tools/dispsrch/dispsrch.txt Normal file

@@ -0,0 +1,221 @@
DISPSIG and SRCHSIG
===================
1 What are DispSig and SrchSig?
2 How do I use DispSig?
3 How do I use SrchSig?
4 What can I do with the binary pattern file from DispSig?
5 How can I create a binary pattern file for SrchSig?
1 What are DispSig and SrchSig?
-------------------------------
SrchSig is a program to display the name of a function, given a
signature (pattern).
DispSig is a program to display a signature, given a function name.
DispSig also writes the signature to a binary file, so you can
disassemble it, or use it with SrchSig to see whether some other
signature file has the same pattern.
2 How do I use DispSig?
-----------------------
Just type
DispSig <SignatureFileName> <FunctionName> <BinaryFileName>
For example:
dispsig dccb2s.sig strcmp strcmp.bin
Function strcmp index 58
55 8B EC 56 57 8C D8 8E C0 FC 33 C0 8B D8 8B 7E 06 8B F7 32 C0 B9 F4
This tells us that strcmp is the 59th function in the signature
file, so its signature hashes to 58 (decimal; indices start at 0).
We can see that it is a standard C function, since it
starts with "55 8B EC", which is the standard C function prologue.
The rest of it is a bit hard to follow, but fortunately we have also
written the pattern to a binary file, strcmp.bin. See section 4 on
how to disassemble this pattern.
If I type
dispsig dcct4p.sig writeln wl.bin
I get
Function writeln not found!
In fact, no single function implements writeln; instead there are
functions such as WriteString, WriteInt, CrLf (carriage return, line
feed), and so on. Dispsig is case insensitive, so:
dispsig dcct4p.sig writestring wl.bin
produces
Function WriteString index 53
55 8B EC C4 7E 0C E8 F4 F4 75 25 C5 76 08 8B 4E 06 FC AC F4 F4 2B C8
3 How do I use SrchSig?
-----------------------
Just type
srchsig <SignatureFileName> <BinaryFileName>
where BinaryFileName contains a pattern. See section 5 for how to
create one of these. For now, we can use the pattern file from the
first example:
srchsig dccb2s.sig strcmp.bin
Pattern:
55 8B EC 56 57 8C D8 8E C0 FC 33 C0 8B D8 8B 7E 06 8B F7 32 C0 B9 F4
Pattern hashed to 58 (0x3A), symbol strcmp
Pattern matched
Note that the pattern reported above need not be exactly the same as
the one we provided in <BinaryFileName>. The pattern displayed is the
wildcarded and chopped version of the pattern provided; it will have
F4s (wildcards) and possibly zeroes at the end; see the file
makedstp.txt for a simple explanation of wildcarding and chopping.
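To make wildcarding and chopping concrete, here is a deliberately simplified C++ sketch. It is not the real fixWildCards() from fixwild.cpp, which decodes much more of the 8086 opcode map. The name toyFixWildCards is hypothetical, and the sketch only handles push/pop of registers, MOV reg,#16, CALL rel16 and RET, stopping at any other opcode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

const size_t kPatLen = 23;   // PATLEN in the dcc sources
const uint8_t kWild = 0xF4;  // WILD in fixwild.cpp

// Toy version of the idea behind fixWildCards(): only a handful of
// 8086 opcodes are decoded; scanning simply stops at anything else.
void toyFixWildCards(uint8_t pat[kPatLen])
{
    size_t pc = 0;
    while (pc < kPatLen) {
        uint8_t op = pat[pc++];
        if (op >= 0x50 && op <= 0x5F) {
            continue;  // push/pop reg: 1 byte, nothing to wildcard
        } else if ((op >= 0xB8 && op <= 0xBF) || op == 0xE8) {
            // mov reg,#16 or call rel16: the operand may be relocated,
            // so replace its two bytes with wildcards
            for (int i = 0; i < 2 && pc < kPatLen; ++i)
                pat[pc++] = kWild;
        } else if (op == 0xC3) {
            // ret: whatever follows is not part of this function; chop it
            memset(&pat[pc], 0, kPatLen - pc);
            return;
        } else {
            return;  // opcode not handled by this toy
        }
    }
}
```

The immediate and call operands become F4 F4, and everything after the RET is zeroed, which is why SrchSig may display a pattern that differs from the bytes in the file.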
If we type
srchsig dccb2s.sig wl.bin
we get
Pattern:
55 8B EC C4 7E 0C E8 F4 F4 75 25 C5 76 08 8B 4E 06 FC AC F4 F4 2B C8
Pattern hashed to 0 (0x0), symbol _IOERROR
Pattern mismatch: found following pattern
55 8B EC 56 8B 76 04 0B F6 7C 14 83 FE 58 76 03 BE F4 F4 89 36 F4 F4
300
The pattern often hashes to zero when the pattern is unknown, due to
the sparse nature of the tables used in the hash function. The first
pattern in dccb2s.sig happens to be _IOERROR, and its pattern is
completely different, apart from the first three bytes. The "300" at
the end is actually a running count of signatures searched linearly,
in case there is a problem with the hash function.
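The search strategy described above can be sketched as follows: first try the hash slot, then fall back to a linear scan. This is an illustration only. Entry and findPattern are hypothetical names, and the perfect hash function from perfhlib is not reproduced (the slot h is passed in). Unlike pattSearch() in srchsig.cpp, which reports every match, this sketch returns the first one.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Pattern = std::array<uint8_t, 23>;  // PATLEN bytes

struct Entry {          // loosely mirrors the HT struct in srchsig.cpp
    std::string sym;
    Pattern pat;
};

// Check the slot the hash points at first; if it holds a different
// pattern, fall back to a linear scan of the whole table.
// Returns the matching index, or -1 if the pattern is not present.
int findPattern(const std::vector<Entry> &table, int h, const Pattern &pat)
{
    if (h >= 0 && h < (int)table.size() && table[h].pat == pat)
        return h;
    for (size_t i = 0; i < table.size(); ++i)
        if (table[i].pat == pat)
            return (int)i;
    return -1;
}
```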
4 What can I do with the binary pattern file from DispSig?
----------------------------------------------------------
You can feed it into SrchSig; this might make sense if you wanted to
know whether, e.g., the signature for printf was the same for version 2 as
it is for version 3. In this case, you would use DispSig on the
version 2 signature file, and SrchSig on the version 3 file.
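When comparing two versions like this, it can help to see exactly which bytes differ. This is a minimal sketch. It assumes both patterns have already been wildcarded and are PATLEN (23) bytes long. The helper differingOffsets is hypothetical and not part of DispSig or SrchSig.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Pattern = std::array<uint8_t, 23>;

// Return the offsets at which two (already wildcarded) patterns differ.
// An empty result corresponds to the byte-for-byte memcmp() that
// SrchSig performs before printing "Pattern matched".
std::vector<size_t> differingOffsets(const Pattern &a, const Pattern &b)
{
    std::vector<size_t> diffs;
    for (size_t i = 0; i < a.size(); ++i)
        if (a[i] != b[i])
            diffs.push_back(i);
    return diffs;
}
```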
You can also disassemble it, using debug (it comes with MS-DOS). For
example
debug strcmp.bin
-u100 l 17
1754:0100 55 PUSH BP
1754:0101 8BEC MOV BP,SP
1754:0103 56 PUSH SI
1754:0104 57 PUSH DI
1754:0105 8CD8 MOV AX,DS
1754:0107 8EC0 MOV ES,AX
1754:0109 FC CLD
1754:010A 33C0 XOR AX,AX
1754:010C 8BD8 MOV BX,AX
1754:010E 8B7E06 MOV DI,[BP+06]
1754:0111 8BF7 MOV SI,DI
1754:0113 32C0 XOR AL,AL
1754:0115 B9F42B MOV CX,2BF4
-q
Note that the "2B" at the end is actually past the end of the
signature. (Signatures are 23 bytes (17 in hex) long, so only
addresses 100-116 are valid.) Remember that most 16-bit operands will
be "wildcarded", so don't believe the resultant addresses.
5 How can I create a binary pattern file for SrchSig?
-----------------------------------------------------
Again, you can use debug. Suppose you have found an interesting piece
of code at address 05BE (this example comes from a hello world
program):
-u 5be
15FF:05BE 55 PUSH BP
15FF:05BF 8BEC MOV BP,SP
15FF:05C1 83EC08 SUB SP,+08
15FF:05C4 57 PUSH DI
15FF:05C5 56 PUSH SI
15FF:05C6 BE1E01 MOV SI,011E
15FF:05C9 8D4606 LEA AX,[BP+06]
15FF:05CC 8946FC MOV [BP-04],AX
15FF:05CF 56 PUSH SI
15FF:05D0 E8E901 CALL 07BC
15FF:05D3 83C402 ADD SP,+02
15FF:05D6 8BF8 MOV DI,AX
15FF:05D8 8D4606 LEA AX,[BP+06]
15FF:05DB 50 PUSH AX
15FF:05DC FF7604 PUSH [BP+04]
-mcs:5be l 17 cs:100
-u100 l 17
15FF:0100 55 PUSH BP
15FF:0101 8BEC MOV BP,SP
15FF:0103 83EC08 SUB SP,+08
15FF:0106 57 PUSH DI
15FF:0107 56 PUSH SI
15FF:0108 BE1E01 MOV SI,011E
15FF:010B 8D4606 LEA AX,[BP+06]
15FF:010E 8946FC MOV [BP-04],AX
15FF:0111 56 PUSH SI
15FF:0112 E8E901 CALL 02FE
15FF:0115 83C41F ADD SP,+1F
-nfoo.bin
-rcx
CX 268A
:17
-w
Writing 0017 bytes
-q
c>dir foo.bin
foo.bin 23 3-25-94 12:04
c>
The binary file has to be exactly 23 bytes long; that's why we
changed CX to 17 (hex 17 = decimal 23). Debug takes the length of the
write from BX:CX, so if you are studying a large file (> 64K),
remember to set BX to 0 as well. The m (block move) command moves the
code of interest to cs:100, which is where debug writes the file
from. The "rcx" command changes the length of the save, and
"nfoo.bin" sets the name of the file to be saved. Now we can feed this
into srchsig:
srchsig dccb2s.sig foo.bin
Pattern:
55 8B EC 83 EC 08 57 56 BE F4 F4 8D 46 06 89 46 FC 56 E8 F4 F4 83 C4
Pattern hashed to 278 (0x116), symbol sleep
Pattern mismatch: found following pattern
55 8B EC 83 EC 04 56 57 8D 46 FC 50 E8 F4 F4 59 80 7E FE 5A 76 05 BF
300
Hmmm. Not a Borland C version 2 small model signature. Perhaps it's a
Microsoft Version 5 signature:
Pattern:
55 8B EC 83 EC 08 57 56 BE F4 F4 8D 46 06 89 46 FC 56 E8 F4 F4 83 C4
Pattern hashed to 31 (0x1F), symbol printf
Pattern matched
Yes, it was good old printf. Of course, there is no need for you to
guess; DCC will figure out the vendor, version number, and model for you.
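If debug is not at hand, a small program can produce the same 23-byte file. The sketch below is one possible alternative and is not one of the dcc tools. The names extractPattern and writePattern are hypothetical. The offset is a file offset: for a .COM file that is the debug address minus 100h, and for an .EXE you also have to allow for the header and load segment. A short read is padded with zeroes, which is an assumption of this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

using Pattern = std::array<uint8_t, 23>;  // SrchSig insists on exactly 23 bytes

// Copy 23 bytes starting at 'offset' in a loaded file image. If the image
// ends early, the remainder stays zero (an assumption; debug would simply
// write whatever follows in memory).
Pattern extractPattern(const std::vector<uint8_t> &image, size_t offset)
{
    Pattern pat{};
    for (size_t i = 0; i < pat.size() && offset + i < image.size(); ++i)
        pat[i] = image[offset + i];
    return pat;
}

// Write the pattern to an already opened binary stream.
bool writePattern(FILE *fp, const Pattern &pat)
{
    return fwrite(pat.data(), 1, pat.size(), fp) == pat.size();
}
```

For a .COM image, something like `writePattern(fp, extractPattern(image, 0x5BE - 0x100))`, with `fp` opened via `fopen("foo.bin", "wb")`, would correspond to the debug session above.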

tools/dispsrch/srchsig.cpp Normal file

@@ -0,0 +1,287 @@
/* Quick program to see if a pattern is in a sig file. Pattern is supplied
in a small .bin or .com style file */
#include <stdio.h>
#include <stdlib.h>
#include <memory.h>
#include "perfhlib.h"
/* statics */
byte buf[100];
int numKeys; /* Number of hash table entries (keys) */
int numVert; /* Number of vertices in the graph (also size of g[]) */
int PatLen; /* Size of the keys (pattern length) */
int SymLen; /* Max size of the symbols, including null */
FILE *f; /* Sig file being read */
FILE *fpat; /* Pattern file being read */
static word *T1base, *T2base; /* Pointers to start of T1, T2 */
static word *g; /* g[] */
#define SYMLEN 16
#define PATLEN 23
typedef struct HT_tag
{
/* Hash table structure */
char htSym[SYMLEN];
byte htPat[PATLEN];
} HT;
HT *ht; /* Declare a pointer to a hash table */
/* prototypes */
void grab(int n);
word readFileShort(void);
void cleanup(void);
void fixWildCards(byte pat[]); /* In fixwild.cpp */
void pattSearch(void);
int
main(int argc, char *argv[])
{
word w, len;
int h, i;
int patlen;
if (argc <= 2)
{
printf("Usage: srchsig <SigFilename> <PattFilename>\n");
printf("Searches the signature file for the given pattern\n");
printf("e.g. %s dccm8s.sig mypatt.bin\n", argv[0]);
exit(1);
}
if ((f = fopen(argv[1], "rb")) == NULL)
{
printf("Cannot open signature file %s\n", argv[1]);
exit(2);
}
if ((fpat = fopen(argv[2], "rb")) == NULL)
{
printf("Cannot open pattern file %s\n", argv[2]);
exit(2);
}
/* Read the parameters */
grab(4);
if (memcmp("dccs", buf, 4) != 0)
{
printf("Not a dccs file!\n");
exit(3);
}
numKeys = readFileShort();
numVert = readFileShort();
PatLen = readFileShort();
SymLen = readFileShort();
/* Initialise the perfhlib stuff. Also allocates T1, T2, g, etc */
hashParams( /* Set the parameters for the hash table */
numKeys, /* The number of symbols */
PatLen, /* The length of the pattern to be hashed */
256, /* The character set of the pattern (0-FF) */
0, /* Minimum pattern character value */
numVert); /* Specifies C, the sparseness of the graph.
See Czech, Havas and Majewski for details
*/
T1base = readT1();
T2base = readT2();
g = readG();
/* Read T1 and T2 tables */
grab(2);
if (memcmp("T1", buf, 2) != 0)
{
printf("Expected 'T1'\n");
exit(3);
}
len = PatLen * 256 * sizeof(word);
w = readFileShort();
if (w != len)
{
printf("Problem with size of T1: file %d, calc %d\n", w, len);
exit(4);
}
if (fread(T1base, 1, len, f) != len)
{
printf("Could not read T1\n");
exit(5);
}
grab(2);
if (memcmp("T2", buf, 2) != 0)
{
printf("Expected 'T2'\n");
exit(3);
}
w = readFileShort();
if (w != len)
{
printf("Problem with size of T2: file %d, calc %d\n", w, len);
exit(4);
}
if (fread(T2base, 1, len, f) != len)
{
printf("Could not read T2\n");
exit(5);
}
/* Now read the function g[] */
grab(2);
if (memcmp("gg", buf, 2) != 0)
{
printf("Expected 'gg'\n");
exit(3);
}
len = numVert * sizeof(word);
w = readFileShort();
if (w != len)
{
printf("Problem with size of g[]: file %d, calc %d\n", w, len);
exit(4);
}
if (fread(g, 1, len, f) != len)
{
printf("Could not read g[]\n");
exit(5);
}
/* This is now the hash table */
/* First allocate space for the table */
if ((ht = (HT *)malloc(numKeys * sizeof(HT))) == 0)
{
printf("Could not allocate hash table\n");
exit(1);
}
grab(2);
if (memcmp("ht", buf, 2) != 0)
{
printf("Expected 'ht'\n");
exit(3);
}
w = readFileShort();
if (w != numKeys * (SymLen + PatLen + sizeof(word)))
{
printf("Problem with size of hash table: file %d, calc %d\n", w, (int)(numKeys * (SymLen + PatLen + sizeof(word))));
exit(6);
}
for (i=0; i < numKeys; i++)
{
if ((int)fread(&ht[i], 1, SymLen + PatLen, f) != SymLen + PatLen)
{
printf("Could not read\n");
exit(11);
}
}
/* Read the pattern to buf */
if ((patlen = fread(buf, 1, 100, fpat)) == 0)
{
printf("Could not read pattern\n");
exit(11);
}
if (patlen != PATLEN)
{
printf("Error: pattern length is %d, should be %d\n", patlen, PATLEN);
exit(12);
}
/* Fix the wildcards */
fixWildCards(buf);
printf("Pattern:\n");
for (i=0; i < PATLEN; i++)
printf("%02X ", buf[i]);
printf("\n");
h = hash(buf);
printf("Pattern hashed to %d (0x%X), symbol %s\n", h, h, ht[h].htSym);
if (memcmp(ht[h].htPat, buf, PATLEN) == 0)
{
printf("Pattern matched\n");
}
else
{
printf("Pattern mismatch: found following pattern\n");
for (i=0; i < PATLEN; i++)
printf("%02X ", ht[h].htPat[i]);
printf("\n");
pattSearch(); /* Look for it the hard way */
}
cleanup();
free(ht);
fclose(f);
fclose(fpat);
}
void pattSearch(void)
{
int i;
for (i=0; i < numKeys; i++)
{
if ((i % 100) == 0) printf("\r%d ", i);
if (memcmp(ht[i].htPat, buf, PATLEN) == 0)
{
printf("\nPattern matched offset %d (0x%X)\n", i, i);
}
}
printf("\n");
}
void
cleanup(void)
{
/* Free the storage for variable sized tables etc */
if (T1base) free(T1base);
if (T2base) free(T2base);
if (g) free(g);
}
void grab(int n)
{
if (fread(buf, 1, n, f) != (size_t)n)
{
printf("Could not read\n");
exit(11);
}
}
word
readFileShort(void)
{
byte b1, b2;
if (fread(&b1, 1, 1, f) != 1)
{
printf("Could not read\n");
exit(11);
}
if (fread(&b2, 1, 1, f) != 1)
{
printf("Could not read\n");
exit(11);
}
return (b2 << 8) + b1;
}
/* Following two functions not needed unless creating tables */
void getKey(int i, byte **keys)
{
}
/* Display key i */
void
dispKey(int i)
{
}


@@ -0,0 +1,14 @@
CFLAGS = -Zi -c -AL -W3 -D__MSDOS__
srchsig.exe: srchsig.obj perfhlib.obj fixwild.obj
link /CO srchsig perfhlib fixwild;
srchsig.obj: srchsig.c dcc.h perfhlib.h
cl $(CFLAGS) $*.c
perfhlib.obj: perfhlib.c dcc.h perfhlib.h
cl $(CFLAGS) $*.c
fixwild.obj: fixwild.c dcc.h
cl $(CFLAGS) $*.c


@@ -0,0 +1,11 @@
set(SRC
makedsig
fixwild.cpp
LIB_PatternCollector.cpp
LIB_PatternCollector.h
TPL_PatternCollector.cpp
TPL_PatternCollector.h
)
add_executable(makedsig ${SRC})
target_link_libraries(makedsig dcc_hash)
qt5_use_modules(makedsig Core)


@@ -0,0 +1,237 @@
#include "LIB_PatternCollector.h"
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <cstring>
/** \note there is an untested assumption that the *first* segment definition
with class CODE will be the one containing all useful functions in the
LEDATA records. Functions such as _exit() have more than one segment
declared with class CODE (MSC8 libraries) */
extern void fixWildCards(uint8_t pat[]);
void readNN(int n, FILE *fl)
{
if (fseek(fl, (long)n, SEEK_CUR) != 0)
{
printf("Could not seek file\n");
exit(2);
}
}
void LIB_PatternCollector::readString(FILE *fl)
{
uint8_t len;
len = readByte(fl);
if (fread(buf, 1, len, fl) != len)
{
printf("Could not read string len %d\n", len);
exit(2);
}
buf[len] = '\0';
offset += len;
}
int LIB_PatternCollector::readSyms(FILE *fl)
{
int i;
int count = 0;
int firstSym = 0; /* First symbol this module */
uint8_t b, c, type;
uint16_t w, len;
codeLNAMES = NONE; /* Invalidate indexes for code segment */
codeSEGDEF = NONE; /* Else won't be assigned */
offset = 0; /* For diagnostics, really */
if ((leData = (uint8_t *)malloc(0xFF80)) == 0)
{
printf("Could not malloc 64k bytes for LEDATA\n");
exit(10);
}
while (!feof(fl))
{
type = readByte(fl);
len = readWord(fl);
/* Note: uncommenting the following generates a *lot* of output */
/*printf("Offset %05lX: type %02X len %d\n", offset-3, type, len);//*/
switch (type)
{
case 0x96: /* LNAMES */
while (len > 1)
{
readString(fl);
++lnum;
if (strcmp((char *)buf, "CODE") == 0)
{
/* This is the class name we're looking for */
codeLNAMES= lnum;
}
len -= strlen((char *)buf)+1;
}
b = readByte(fl); /* Checksum */
break;
case 0x98: /* Segment definition */
b = readByte(fl); /* Segment attributes */
if ((b & 0xE0) == 0)
{
/* Alignment field is zero. Frame and offset follow */
readWord(fl);
readByte(fl);
}
w = readWord(fl); /* Segment length */
b = readByte(fl); /* Segment name index */
++segnum;
b = readByte(fl); /* Class name index */
if ((b == codeLNAMES) && (codeSEGDEF == NONE))
{
/* This is the segment defining the code class */
codeSEGDEF = segnum;
}
b = readByte(fl); /* Overlay index */
b = readByte(fl); /* Checksum */
break;
case 0x90: /* PUBDEF: public symbols */
b = readByte(fl); /* Base group */
c = readByte(fl); /* Base segment */
len -= 2;
if (c == 0)
{
w = readWord(fl);
len -= 2;
}
while (len > 1)
{
readString(fl);
w = readWord(fl); /* Offset */
b = readByte(fl); /* Type index */
if (c == codeSEGDEF)
{
char *p;
HASHENTRY entry;
p = (char *)buf;
if (buf[0] == '_') /* Leading underscore? */
{
p++; /* Yes, remove it*/
}
i = std::min(size_t(SYMLEN-1), strlen(p));
memcpy(entry.name, p, i);
entry.name[i] = '\0';
entry.offset = w;
/*printf("%04X: %s is sym #%d\n", w, keys[count].name, count);//*/
keys.push_back(entry);
count++;
}
len -= strlen((char *)buf) + 1 + 2 + 1;
}
b = readByte(fl); /* Checksum */
break;
case 0xA0: /* LEDATA */
{
b = readByte(fl); /* Segment index */
w = readWord(fl); /* Offset */
len -= 3;
/*printf("LEDATA seg %d off %02X len %Xh, looking for %d\n", b, w, len-1, codeSEGDEF);//*/
if (b != codeSEGDEF)
{
readNN(len,fl); /* Skip the data */
break; /* Next record */
}
if (fread(&leData[w], 1, len-1, fl) != len-1)
{
printf("Could not read LEDATA length %d\n", len-1);
exit(2);
}
offset += len-1;
maxLeData = std::max<uint16_t>(maxLeData, w+len-1);
readByte(fl); /* Checksum */
break;
}
default:
readNN(len,fl); /* Just skip the lot */
if (type == 0x8A) /* Mod end */
{
/* Now find all the patterns for public code symbols that
we have found */
for (i=firstSym; i < count; i++)
{
uint16_t off = keys[i].offset;
if (off == (uint16_t)-1)
{
continue; /* Ignore if already done */
}
if (keys[i].offset > maxLeData)
{
printf(
"Warning: no LEDATA for symbol #%d %s "
"(offset %04X, max %04X)\n",
i, keys[i].name, off, maxLeData);
/* To make things consistant, we set the pattern for
this symbol to nulls */
memset(&keys[i].pat, 0, PATLEN);
continue;
}
/* Copy to temp buffer so don't overrun later patterns.
(e.g. when chopping a short pattern).
Beware of short patterns! */
if (off+PATLEN <= maxLeData)
{
/* Available pattern is >= PATLEN */
memcpy(buf, &leData[off], PATLEN);
}
else
{
/* Short! Only copy what is available (and malloced!) */
memcpy(buf, &leData[off], maxLeData-off);
/* Set rest to zeroes */
memset(&buf[maxLeData-off], 0, PATLEN-(maxLeData-off));
}
fixWildCards((uint8_t *)buf);
/* Save into the hash entry. */
memcpy(keys[i].pat, buf, PATLEN);
keys[i].offset = (uint16_t)-1; // Flag it as done
//printf("Saved pattern for %s\n", keys[i].name);
}
while (readByte(fl) == 0);
readNN(-1,fl); /* Unget the last byte (= type) */
lnum = 0; /* Reset index into lnames */
segnum = 0; /* Reset index into snames */
firstSym = count; /* Remember index of first sym this mod */
codeLNAMES = NONE; /* Invalidate indexes for code segment */
codeSEGDEF = NONE;
memset(leData, 0, maxLeData); /* Clear out old junk */
maxLeData = 0; /* No data read this module */
}
else if (type == 0xF1)
{
/* Library end record */
free(leData);
return count;
}
}
}
free(leData);
return count;
}


@@ -0,0 +1,25 @@
#pragma once
#include "PatternCollector.h"
struct LIB_PatternCollector : public PatternCollector
{
protected:
unsigned long offset;
uint8_t lnum = 0; /* Count of LNAMES so far */
uint8_t segnum = 0; /* Count of SEGDEFs so far */
uint8_t codeLNAMES; /* Index of the LNAMES for "CODE" class */
uint8_t codeSEGDEF; /* Index of the first SEGDEF that has class CODE */
#define NONE 0xFF /* Improbable segment index */
uint8_t *leData; /* Pointer to 64K of alloc'd data. Some .lib files
have the symbols (PUBDEFs) *after* the data
(LEDATA), so you need to keep the data here */
uint16_t maxLeData = 0; /* How much data we have in there */
/* read a length then string to buf[]; make it an asciiz string */
void readString( FILE *fl);
public:
/* Read the .lib file, and put the keys into the array *keys[]. Returns the count */
int readSyms(FILE *fl);
};


@@ -0,0 +1,300 @@
#include "TPL_PatternCollector.h"
#include <cstdlib>
#include <cstring>
/** \note Fundamental problem: there seems to be no information linking the names
in the system unit ("V" category) with their routines, except trial and
error. I have entered a few. There is no guarantee that the same pmap
offset will map to the same routine in all versions of turbo.tpl. They
seem to match so far in version 4 and 5.0 */
#define roundUp(w) (((w) + 0x0F) & 0xFFF0)
extern void fixWildCards(uint8_t pat[]);
void TPL_PatternCollector::enterSym(FILE *f, const char *name, uint16_t pmapOffset)
{
uint16_t pm, cm, codeOffset, pcode;
uint16_t j;
/* Enter a symbol with given name */
allocSym(count + 1); /* Make room for keys[count] */
strcpy(keys[count].name, name);
pm = pmap + pmapOffset; /* Pointer to the 4 byte pmap structure */
fseek(f, unitBase+pm, SEEK_SET);/* Go there */
cm = readShort(f); /* CSeg map offset */
codeOffset = readShort(f); /* How far into the code segment is our rtn */
j = cm / 8; /* Index into the cmap array */
pcode = csegBase+csegoffs[j]+codeOffset;
fseek(f, unitBase+pcode, SEEK_SET); /* Go there */
grab(f,PATLEN); /* Grab the pattern to buf[] */
fixWildCards(buf); /* Fix the wild cards */
memcpy(keys[count].pat, buf, PATLEN); /* Copy to the key array */
count++; /* Done one more */
}
void TPL_PatternCollector::allocSym(int count)
{
keys.resize(count);
}
void TPL_PatternCollector::readCmapOffsets(FILE *f)
{
uint16_t cumsize, csize;
uint16_t i;
/* Read the cmap table to find the start address of each segment */
fseek(f, unitBase+cmap, SEEK_SET);
cumsize = 0;
csegIdx = 0;
for (i=cmap; i < pmap; i+=8)
{
readShort(f); /* Always 0 */
csize = readShort(f);
if (csize == 0xFFFF) continue; /* Ignore the first one... unit init */
csegoffs[csegIdx++] = cumsize;
cumsize += csize;
grab(f,4);
}
}
void TPL_PatternCollector::enterSystemUnit(FILE *f)
{
/* The system unit is special. The association between keywords and
pmap entries is not stored in the .tpl file (as far as I can tell).
So we hope that they are constant pmap entries.
*/
fseek(f, 0x0C, SEEK_SET);
cmap = readShort(f);
pmap = readShort(f);
fseek(f, offStCseg, SEEK_SET);
csegBase = roundUp(readShort(f)); /* Round up to next 16 bdry */
printf("CMAP table at %04X\n", cmap);
printf("PMAP table at %04X\n", pmap);
printf("Code seg base %04X\n", csegBase);
readCmapOffsets(f);
enterSym(f,"INITIALISE", 0x04);
enterSym(f,"UNKNOWN008", 0x08);
enterSym(f,"EXIT", 0x0C);
enterSym(f,"BlockMove", 0x10);
unknown(f,0x14, 0xC8);
enterSym(f,"PostIO", 0xC8);
enterSym(f,"UNKNOWN0CC", 0xCC);
enterSym(f,"STACKCHK", 0xD0);
enterSym(f,"UNKNOWN0D4", 0xD4);
enterSym(f,"WriteString", 0xD8);
enterSym(f,"WriteInt", 0xDC);
enterSym(f,"UNKNOWN0E0", 0xE0);
enterSym(f,"UNKNOWN0E4", 0xE4);
enterSym(f,"CRLF", 0xE8);
enterSym(f,"UNKNOWN0EC", 0xEC);
enterSym(f,"UNKNOWN0F0", 0xF0);
enterSym(f,"UNKNOWN0F4", 0xF4);
enterSym(f,"ReadEOL", 0xF8);
enterSym(f,"Read", 0xFC);
enterSym(f,"UNKNOWN100", 0x100);
enterSym(f,"UNKNOWN104", 0x104);
enterSym(f,"PostWrite", 0x108);
enterSym(f,"UNKNOWN10C", 0x10C);
enterSym(f,"Randomize", 0x110);
unknown(f,0x114, 0x174);
enterSym(f,"Random", 0x174);
unknown(f,0x178, 0x1B8);
enterSym(f,"FloatAdd", 0x1B8); /* A guess! */
enterSym(f,"FloatSub", 0x1BC); /* disicx - dxbxax -> dxbxax*/
enterSym(f,"FloatMult", 0x1C0); /* disicx * dxbxax -> dxbxax*/
enterSym(f,"FloatDivide", 0x1C4); /* disicx / dxbxax -> dxbxax*/
enterSym(f,"UNKNOWN1C8", 0x1C8);
enterSym(f,"DoubleToFloat",0x1CC); /* dxax to dxbxax */
enterSym(f,"UNKNOWN1D0", 0x1D0);
enterSym(f,"WriteFloat", 0x1DC);
unknown(f,0x1E0, 0x200);
}
void TPL_PatternCollector::readString(FILE *f)
{
uint8_t len;
len = readByte(f);
grab(f,len);
buf[len] = '\0';
}
void TPL_PatternCollector::unknown(FILE *f, unsigned j, unsigned k)
{
/* Mark calls j to k (not inclusive) as unknown */
unsigned i;
for (i=j; i < k; i+= 4)
{
sprintf((char *)buf, "UNKNOWN%03X", i);
enterSym(f,(char *)buf, i);
}
}
void TPL_PatternCollector::nextUnit(FILE *f)
{
/* Find the start of the next unit */
uint16_t dsegBase, sizeSyms, sizeOther1, sizeOther2;
fseek(f, unitBase+offStCseg, SEEK_SET);
dsegBase = roundUp(readShort(f));
sizeSyms = roundUp(readShort(f));
sizeOther1 = roundUp(readShort(f));
sizeOther2 = roundUp(readShort(f));
unitBase += dsegBase + sizeSyms + sizeOther1 + sizeOther2;
fseek(f, unitBase, SEEK_SET);
if (fread(buf, 1, 4, f) == 4)
{
buf[4]='\0';
printf("Start of unit: found %s\n", buf);
}
}
void TPL_PatternCollector::setVersionSpecifics()
{
version = buf[3]; /* The x of TPUx */
switch (version)
{
case '0': /* Version 4.0 */
offStCseg = 0x14; /* Offset to the LL giving the Cseg start */
charProc = 'T'; /* Indicates a proc in the dictionary */
charFunc = 'U'; /* Indicates a function in the dictionary */
skipPmap = 6; /* Bytes to skip after Func to get pmap offset */
break;
case '5': /* Version 5.0 */
offStCseg = 0x18; /* Offset to the LL giving the Cseg start */
charProc = 'T'; /* Indicates a proc in the dictionary */
charFunc = 'U'; /* Indicates a function in the dictionary */
skipPmap = 1; /* Bytes to skip after Func to get pmap offset */
break;
default:
printf("Unknown version %c!\n", version);
exit(1);
}
}
void TPL_PatternCollector::savePos(FILE *f)
{
if (positionStack.size() >= 20)
{
printf("Overflowed filePosn array\n");
exit(1);
}
positionStack.push_back(ftell(f));
}
void TPL_PatternCollector::restorePos(FILE *f)
{
if (positionStack.empty())
{
printf("Underflowed filePosn array\n");
exit(1);
}
fseek(f, positionStack.back(), SEEK_SET);
positionStack.pop_back();
}
void TPL_PatternCollector::enterUnitProcs(FILE *f)
{
uint16_t i, LL;
uint16_t hash, hsize, dhdr, pmapOff;
char cat;
char name[40];
fseek(f, unitBase+0x0C, SEEK_SET);
cmap = readShort(f);
pmap = readShort(f);
fseek(f, unitBase+offStCseg, SEEK_SET);
csegBase = roundUp(readShort(f)); /* Round up to next 16 bdry */
printf("CMAP table at %04X\n", cmap);
printf("PMAP table at %04X\n", pmap);
printf("Code seg base %04X\n", csegBase);
readCmapOffsets(f);
fseek(f, unitBase+pmap, SEEK_SET); /* Go to first pmap entry */
if (readShort(f) != 0xFFFF) /* FFFF means none */
{
sprintf(name, "UNIT_INIT_%d", ++unitNum);
enterSym(f,name, 0); /* This is the unit init code */
}
fseek(f, unitBase+0x0A, SEEK_SET);
hash = readShort(f);
//printf("Hash table at %04X\n", hash);
fseek(f, unitBase+hash, SEEK_SET);
hsize = readShort(f);
//printf("Hash table size %04X\n", hsize);
for (i=0; i <= hsize; i+= 2)
{
dhdr = readShort(f);
if (dhdr)
{
savePos(f);
fseek(f, unitBase+dhdr, SEEK_SET);
do
{
LL = readShort(f);
readString(f);
strcpy(name, (char *)buf);
cat = readByte(f);
if ((cat == charProc) || (cat == charFunc))
{
grab(f,skipPmap); /* Skip to the pmap */
pmapOff = readShort(f); /* pmap offset */
printf("pmap offset for %13s: %04X\n", name, pmapOff);
enterSym(f,name, pmapOff);
}
//printf("%13s %c ", name, cat);
if (LL)
{
//printf("LL seek to %04X\n", LL);
fseek(f, unitBase+LL, SEEK_SET);
}
} while (LL);
restorePos(f);
}
}
}
int TPL_PatternCollector::readSyms(FILE *f)
{
grab(f,4);
if ((strncmp((char *)buf, "TPU0", 4) != 0) && ((strncmp((char *)buf, "TPU5", 4) != 0)))
{
printf("Not a Turbo Pascal version 4 or 5 library file\n");
fclose(f);
exit(1);
}
setVersionSpecifics();
enterSystemUnit(f);
unitBase = 0;
do
{
nextUnit(f);
if (feof(f)) break;
enterUnitProcs(f);
} while (1);
return count;
}


@@ -0,0 +1,38 @@
#ifndef TPL_PATTERNCOLLECTOR_H
#define TPL_PATTERNCOLLECTOR_H
#include "PatternCollector.h"
#include <stdio.h>
#include <stdint.h>
#include <vector>
struct TPL_PatternCollector : public PatternCollector {
protected:
uint16_t cmap, pmap, csegBase, unitBase;
uint16_t offStCseg, skipPmap;
int count = 0;
int cAllocSym = 0;
int unitNum = 0;
char version, charProc, charFunc;
uint16_t csegoffs[100];
uint16_t csegIdx;
std::vector<long int> positionStack;
void enterSym(FILE *f,const char *name, uint16_t pmapOffset);
void allocSym(int count);
void readCmapOffsets(FILE *f);
void enterSystemUnit(FILE *f);
void readString(FILE *f);
void unknown(FILE *f,unsigned j, unsigned k);
void nextUnit(FILE *f);
void setVersionSpecifics(void);
void savePos(FILE *f);
void restorePos(FILE *f);
void enterUnitProcs(FILE *f);
public:
/* Read the .tpl file, and put the keys into the array *keys[]. Returns the count */
int readSyms(FILE *f);
};
#endif // TPL_PATTERNCOLLECTOR_H

tools/makedsig/fixwild.cpp Normal file

@@ -0,0 +1,525 @@
/*
*$Log: fixwild.c,v $
* Revision 1.10 93/10/28 11:10:10 emmerik
* Addressing mode [reg+nnnn] is now wildcarded
*
* Revision 1.9 93/10/26 13:40:11 cifuente
* op0F(byte pat[])
*
* Revision 1.8 93/10/26 13:01:29 emmerik
* Completed the odd opcodes, like 0F XX and F7. Result: some library
* functions that were not recognised before are recognised now.
*
* Revision 1.7 93/10/11 11:37:01 cifuente
* First walk of HIGH_LEVEL icodes.
*
* Revision 1.6 93/10/01 14:36:21 emmerik
* Added $ log, and made independant of dcc.h
*
*
*/
/* * * * * * * * * * * * *\
* *
* Fix Wild Cards Code *
* *
\* * * * * * * * * * * * */
#include <memory.h>
#include <stdint.h>
#ifndef PATLEN
#define PATLEN 23
#endif
#ifndef WILD
#define WILD 0xF4
#endif
static int pc; /* Indexes into pat[] */
/* prototypes */
static bool ModRM(uint8_t pat[]); /* Handle the mod/rm byte */
static bool TwoWild(uint8_t pat[]); /* Make the next 2 bytes wild */
static bool FourWild(uint8_t pat[]); /* Make the next 4 bytes wild */
void fixWildCards(uint8_t pat[]); /* Main routine */
/* Handle the mod/rm case. Returns true if pattern exhausted */
static bool ModRM(uint8_t pat[])
{
uint8_t op;
/* A standard mod/rm byte follows opcode */
op = pat[pc++]; /* The mod/rm byte */
if (pc >= PATLEN) return true; /* Skip Mod/RM */
switch (op & 0xC0)
{
case 0x00: /* [reg] or [nnnn] */
if ((op & 0xC7) == 6)
{
/* Uses [nnnn] address mode */
pat[pc++] = WILD;
if (pc >= PATLEN) return true;
pat[pc++] = WILD;
if (pc >= PATLEN) return true;
}
break;
case 0x40: /* [reg + nn] */
if ((pc+=1) >= PATLEN) return true;
break;
case 0x80: /* [reg + nnnn] */
/* Possibly just a long constant offset from a register,
but often will be an index from a variable */
pat[pc++] = WILD;
if (pc >= PATLEN) return true;
pat[pc++] = WILD;
if (pc >= PATLEN) return true;
break;
case 0xC0: /* reg */
break;
}
return false;
}
/* Change the next two bytes to wild cards */
static bool TwoWild(uint8_t pat[])
{
pat[pc++] = WILD;
if (pc >= PATLEN) return true; /* Pattern exhausted */
pat[pc++] = WILD;
if (pc >= PATLEN) return true;
return false;
}
/* Change the next four bytes to wild cards */
static bool FourWild(uint8_t pat[])
{
if (TwoWild(pat)) return true; /* Don't write past PATLEN */
return TwoWild(pat);
}
/* Chop from the current point by wiping with zeroes. Can't rely on anything
after this point */
static void chop(uint8_t pat[])
{
if (pc >= PATLEN) return; /* Could go negative otherwise */
memset(&pat[pc], 0, PATLEN - pc);
}
static bool op0F(uint8_t pat[])
{
/* The two byte opcodes */
uint8_t op = pat[pc++];
if (pc >= PATLEN) return true; /* Pattern exhausted */
switch (op & 0xF0)
{
case 0x00: /* 00 - 0F */
if (op >= 0x06) /* Clts, Invd, Wbinvd */
return false;
else
{
/* Grp 6, Grp 7, LAR, LSL */
return ModRM(pat);
}
case 0x20: /* Various funnies, all with Mod/RM */
return ModRM(pat);
case 0x80:
pc += 2; /* Word displacement cond jumps */
return false;
case 0x90: /* Byte set on condition */
return ModRM(pat);
case 0xA0:
switch (op)
{
case 0xA0: /* Push FS */
case 0xA1: /* Pop FS */
case 0xA8: /* Push GS */
case 0xA9: /* Pop GS */
return false;
case 0xA3: /* Bt Ev,Gv */
case 0xAB: /* Bts Ev,Gv */
return ModRM(pat);
case 0xA4: /* Shld EvGbIb */
case 0xAC: /* Shrd EvGbIb */
if (ModRM(pat)) return true;
pc++; /* The #num bits to shift */
return false;
case 0xA5: /* Shld EvGb CL */
case 0xAD: /* Shrd EvGb CL */
return ModRM(pat);
default: /* CmpXchg, Imul */
return ModRM(pat);
}
case 0xB0:
if (op == 0xBA)
{
/* Grp 8: bt/bts/btr/btc Ev,#nn */
if (ModRM(pat)) return true;
pc++; /* The #num bits to shift */
return false;
}
return ModRM(pat);
case 0xC0:
if (op <= 0xC1)
{
/* Xadd */
return ModRM(pat);
}
/* Else BSWAP */
return false;
default:
return false; /* Treat as double byte opcodes */
}
}
/* Scan through the instructions in pat[], looking for opcodes that may
have operands that vary with different instances. For example, load and
store from statics, calls to other procs (even relative calls; they may
call procs loaded in a different order, etc).
Note that this procedure is architecture specific, and assumes the
processor is in 16 bit address mode (real mode).
PATLEN bytes are scanned.
*/
void fixWildCards(uint8_t pat[])
{
uint8_t op, quad, intArg;
pc=0;
while (pc < PATLEN)
{
op = pat[pc++];
if (pc >= PATLEN) return;
quad = op & 0xC0; /* Quadrant of the opcode map */
if (quad == 0)
{
/* Arithmetic group 00-3F */
if ((op & 0xE7) == 0x26) /* First check for the odds */
{
/* Segment prefix: treat as 1 byte opcode */
continue;
}
if (op == 0x0F) /* 386 2 byte opcodes */
{
if (op0F(pat)) return;
continue;
}
if (op & 0x04)
{
/* All these are constant. Work out the instr length */
if (op & 2)
{
/* Push, pop, other 1 byte opcodes */
continue;
}
else
{
if (op & 1)
{
/* Word immediate operands */
pc += 2;
continue;
}
else
{
/* Byte immediate operands */
pc++;
continue;
}
}
}
else
{
/* All these have mod/rm bytes */
if (ModRM(pat)) return;
continue;
}
}
else if (quad == 0x40)
{
if ((op & 0x60) == 0x40)
{
/* 0x40 - 0x5F -- these are inc, dec, push, pop of general
registers */
continue;
}
else
{
/* 0x60 - 0x7F */
if (op & 0x10)
{
/* 70-7F 2 byte jump opcodes */
pc++;
continue;
}
else
{
/* Odds and sods */
switch (op)
{
case 0x60: /* pusha */
case 0x61: /* popa */
case 0x64: /* overrides */
case 0x65:
case 0x66:
case 0x67:
case 0x6C: /* insb DX */
case 0x6E: /* outsb DX */
continue;
case 0x62: /* bound */
pc += 4;
continue;
case 0x63: /* arpl */
if (TwoWild(pat)) return;
continue;
case 0x68: /* Push byte */
case 0x6A: /* Push byte */
case 0x6D: /* insb port */
case 0x6F: /* outsb port */
/* 2 byte instr, no wilds */
pc++;
continue;
}
}
}
}
else if (quad == 0x80)
{
switch (op & 0xF0)
{
case 0x80: /* 80 - 8F */
/* All have a mod/rm byte */
if (ModRM(pat)) return;
/* These also have immediate values */
switch (op)
{
case 0x80:
case 0x83:
/* One byte immediate */
pc++;
continue;
case 0x81:
/* Immediate 16 bit values might be constant, but
also might be relocatable. Have to make them
wild */
if (TwoWild(pat)) return;
continue;
}
continue;
case 0x90: /* 90 - 9F */
if (op == 0x9A)
{
/* far call */
if (FourWild(pat)) return;
continue;
}
/* All others are 1 byte opcodes */
continue;
case 0xA0: /* A0 - AF */
if ((op & 0x0C) == 0)
{
/* mov al/ax to/from [nnnn] */
if (TwoWild(pat)) return;
continue;
}
else if ((op & 0xFE) == 0xA8)
{
/* test al,#byte or test ax,#word */
if (op & 1) pc += 2;
else pc += 1;
continue;
}
case 0xB0: /* B0 - BF */
{
if (op & 8)
{
/* mov reg, #16 */
/* Immediate 16 bit values might be constant, but also
might be relocatable. For now, make them wild */
if (TwoWild(pat)) return;
}
else
{
/* mov reg, #8 */
pc++;
}
continue;
}
}
}
else
{
/* In the last quadrant of the op code table */
switch (op)
{
case 0xC0: /* 386: Rotate group 2 ModRM, byte, #byte */
case 0xC1: /* 386: Rotate group 2 ModRM, word, #byte */
if (ModRM(pat)) return;
/* Byte immediate value follows ModRM */
pc++;
continue;
case 0xC3: /* Return */
case 0xCB: /* Return far */
chop(pat);
return;
case 0xC2: /* Ret nnnn */
case 0xCA: /* Retf nnnn */
pc += 2;
chop(pat);
return;
case 0xC4: /* les Gv, Mp */
case 0xC5: /* lds Gv, Mp */
if (ModRM(pat)) return;
continue;
case 0xC6: /* Mov ModRM, #nn */
if (ModRM(pat)) return;
/* Byte immediate value follows ModRM */
pc++;
continue;
case 0xC7: /* Mov ModRM, #nnnn */
if (ModRM(pat)) return;
/* Word immediate value follows ModRM */
/* Immediate 16 bit values might be constant, but also
might be relocatable. For now, make them wild */
if (TwoWild(pat)) return;
continue;
case 0xC8: /* Enter Iw, Ib */
pc += 3; /* Constant word, byte */
continue;
case 0xC9: /* Leave */
continue;
case 0xCC: /* Int 3 */
continue;
case 0xCD: /* Int nn */
intArg = pat[pc++];
if ((intArg >= 0x34) && (intArg <= 0x3B))
{
/* Borland/Microsoft FP emulations */
if (ModRM(pat)) return;
}
continue;
case 0xCE: /* Into */
continue;
case 0xCF: /* Iret */
continue;
case 0xD0: /* Group 2 rotate, byte, 1 bit */
case 0xD1: /* Group 2 rotate, word, 1 bit */
case 0xD2: /* Group 2 rotate, byte, CL bits */
case 0xD3: /* Group 2 rotate, word, CL bits */
if (ModRM(pat)) return;
continue;
case 0xD4: /* Aam */
case 0xD5: /* Aad */
case 0xD7: /* Xlat */
continue;
case 0xD8:
case 0xD9:
case 0xDA:
case 0xDB: /* Esc opcodes */
case 0xDC: /* i.e. floating point */
case 0xDD: /* coprocessor calls */
case 0xDE:
case 0xDF:
if (ModRM(pat)) return;
continue;
case 0xE0: /* Loopne */
case 0xE1: /* Loope */
case 0xE2: /* Loop */
case 0xE3: /* Jcxz */
pc++; /* Short jump offset */
continue;
case 0xE4: /* in al,nn */
case 0xE6: /* out nn,al */
pc++;
continue;
case 0xE5: /* in ax,nn */
case 0xE7: /* out nn,ax */
pc += 2;
continue;
case 0xE8: /* Call rel */
if (TwoWild(pat)) return;
continue;
case 0xE9: /* Jump rel, unconditional */
if (TwoWild(pat)) return;
chop(pat);
return;
case 0xEA: /* Jump abs */
if (FourWild(pat)) return;
chop(pat);
return;
case 0xEB: /* Jmp short unconditional */
pc++;
chop(pat);
return;
case 0xEC: /* In al,dx */
case 0xED: /* In ax,dx */
case 0xEE: /* Out dx,al */
case 0xEF: /* Out dx,ax */
continue;
case 0xF0: /* Lock */
case 0xF2: /* Repne */
case 0xF3: /* Rep/repe */
case 0xF4: /* Halt */
case 0xF5: /* Cmc */
case 0xF8: /* Clc */
case 0xF9: /* Stc */
case 0xFA: /* Cli */
case 0xFB: /* Sti */
case 0xFC: /* Cld */
case 0xFD: /* Std */
continue;
case 0xF6: /* Group 3 byte test/not/mul/div */
case 0xF7: /* Group 3 word test/not/mul/div */
case 0xFE: /* Inc/Dec group 4 */
if (ModRM(pat)) return;
continue;
case 0xFF: /* Group 5 Inc/Dec/Call/Jmp/Push */
/* Most are like standard ModRM */
if (ModRM(pat)) return;
continue;
default: /* Rest are single byte opcodes */
continue;
}
}
}
}

175
tools/makedsig/makedsig.cpp Normal file

@@ -0,0 +1,175 @@
/* Program for making the DCC signature file */
#include "LIB_PatternCollector.h"
#include "TPL_PatternCollector.h"
#include "perfhlib.h" /* Symbol table prototypes */
#include <QtCore/QCoreApplication>
#include <QtCore/QStringList>
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <memory.h>
#include <string.h>
#include <algorithm>
/* Symbol table constants */
#define C 2.2 /* Sparseness of graph. See Czech, Havas and Majewski for details */
/* prototypes */
void saveFile(FILE *fl, const PerfectHash &p_hash, PatternCollector *coll); /* Save the info */
int numKeys; /* Number of useful codeview symbols */
static void printUsage(bool longusage) {
if(longusage)
printf(
"This program is to make 'signatures' of known c and tpl library calls for the dcc program.\n"
"It needs as the first arg the name of a library file, and as the second arg, the name "
"of the signature file to be generated.\n"
"Example: makedsig CL.LIB dccb3l.sig\n"
" or makedsig turbo.tpl dcct4p.sig\n"
);
else
printf("Usage: makedsig <libname> <signame>\n"
"or makedsig -h for help\n");
}
int main(int argc, char *argv[])
{
QCoreApplication app(argc,argv);
FILE *f2; // output file
FILE *srcfile; // .lib file
int s;
if(app.arguments().size()<2) {
printUsage(false);
return 0;
}
QString arg2 = app.arguments()[1];
if (arg2.startsWith("-h") || arg2.startsWith("-?"))
{
printUsage(true);
return 0;
}
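if(app.arguments().size()<3) {
/* Both <libname> and <signame> are needed from here on */
printUsage(false);
return 1;
}
PatternCollector *collector = nullptr;
if(arg2.endsWith(".tpl", Qt::CaseInsensitive)) {
collector = new TPL_PatternCollector;
} else if(arg2.endsWith(".lib", Qt::CaseInsensitive)) {
collector = new LIB_PatternCollector;
} else {
/* Without this branch collector would be used uninitialised */
printf("Unknown library type (expected .lib or .tpl): %s\n", argv[1]);
exit(2);
}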
if ((srcfile = fopen(argv[1], "rb")) == NULL)
{
printf("Cannot read %s\n", argv[1]);
exit(2);
}
if ((f2 = fopen(argv[2], "wb")) == NULL)
{
printf("Cannot write %s\n", argv[2]);
exit(2);
}
fprintf(stderr, "Seed: ");
if (scanf("%d", &s) != 1)
{
printf("Expected a numeric seed\n");
exit(2);
}
srand(s);
PerfectHash p_hash;
numKeys = collector->readSyms(srcfile); /* Read the keys (symbols) */
printf("Num keys: %d; vertices: %d\n", numKeys, (int)(numKeys*C));
/* Set the parameters for the hash table */
p_hash.setHashParams( numKeys, /* The number of symbols */
PATLEN, /* The length of the pattern to be hashed */
256, /* The character set of the pattern (0-FF) */
0, /* Minimum pattern character value */
numKeys*C); /* C is the sparseness of the graph. See Czech,
Havas and Majewski for details */
/* The following two PerfectHash methods are declared in perfhlib.h */
p_hash.map(collector); /* Perform the mapping. This will call getKey() repeatedly */
p_hash.assign(); /* Generate the function g */
saveFile(f2,p_hash,collector); /* Save the resultant information */
fclose(srcfile);
fclose(f2);
}
/* * * * * * * * * * * * *\
* *
* S a v e t h e s i g f i l e *
* *
\* * * * * * * * * * * * */
void writeFile(FILE *fl,const char *buffer, int len)
{
if ((int)fwrite(buffer, 1, len, fl) != len)
{
printf("Could not write to file\n");
exit(1);
}
}
void writeFileShort(FILE *fl,uint16_t w)
{
uint8_t b;
b = (uint8_t)(w & 0xFF);
writeFile(fl,(char *)&b, 1); /* Write a short little endian */
b = (uint8_t)(w>>8);
writeFile(fl,(char *)&b, 1);
}
void saveFile(FILE *fl, const PerfectHash &p_hash, PatternCollector *coll)
{
int i, len;
const uint16_t *pTable;
writeFile(fl,"dccs", 4); /* Signature */
writeFileShort(fl,numKeys); /* Number of keys */
writeFileShort(fl,(short)(numKeys * C)); /* Number of vertices */
writeFileShort(fl,PATLEN); /* Length of key part of entries */
writeFileShort(fl,SYMLEN); /* Length of symbol part of entries */
/* Write out the tables T1 and T2, with their sig and byte lengths in front */
writeFile(fl,"T1", 2); /* "Signature" */
pTable = p_hash.readT1();
len = PATLEN * 256;
writeFileShort(fl,len * sizeof(uint16_t));
for (i=0; i < len; i++)
{
writeFileShort(fl,pTable[i]);
}
writeFile(fl,"T2", 2);
pTable = p_hash.readT2();
writeFileShort(fl,len * sizeof(uint16_t));
for (i=0; i < len; i++)
{
writeFileShort(fl,pTable[i]);
}
/* Write out g[] */
writeFile(fl,"gg", 2); /* "Signature" */
pTable = p_hash.readG();
len = (short)(numKeys * C);
writeFileShort(fl,len * sizeof(uint16_t));
for (i=0; i < len; i++)
{
writeFileShort(fl,pTable[i]);
}
/* Now the hash table itself */
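writeFile(fl,"ht", 2); /* "Signature" */
/* Byte length. Note that it counts SYMLEN + PATLEN + 2 per entry (the
HASHENTRY offset field), although only SYMLEN + PATLEN bytes per entry
are written below; readsig checks for this same value. */
writeFileShort(fl,numKeys * (SYMLEN + PATLEN + sizeof(uint16_t)));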
for (i=0; i < numKeys; i++)
{
writeFile(fl,(char *)&coll->keys[i], SYMLEN + PATLEN);
}
}

188
tools/makedsig/makedsig.txt Normal file

@@ -0,0 +1,188 @@
MAKEDSIG
1 What is MakeDsig?
2 How does it work?
3 How do I use MakeDsig?
4 What's in a signature file?
5 What other tools are useful for signature work?
1 What is MakeDsig?
-------------------
MakeDsig is a program that reads a library (.lib) file from a
compiler, and generates a signature file for use by DCC. Without
signature files, dcc cannot recognise library functions; it will
attempt to decompile them, and cannot give them names. This makes the
resultant decompiled code bulkier and harder to understand.
2 How does it work?
-------------------
Library files contain complete functions, relocation information,
function names, and more. MakeDsig reads a library file, and for each
function found, it saves the name, and creates a signature. These
are stored in an array. When all functions are done, tables for the
perfect hashing function are generated. During this process,
duplicate keys (functions that produce identical signatures) may be
detected; if so, one of the keys will be zeroed.
The signature file contains information needed by dcc to hash the
signatures, as well as the symbols and signatures. Dcc reads the various
sections of the signature file to be able to hash signatures. The
signatures, not the symbols, are hashed, since dcc gets a signature
from the executable file, and needs to know quickly if there is a
symbolic name for it.
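The lookup that dcc performs can be sketched in C as below. This is a minimal sketch, assuming the Czech, Havas and Majewski scheme in which two random functions f1 and f2 (defined by the T1 and T2 tables) each map a signature to a graph vertex, and the hash is the sum of those two vertices' g[] labels modulo the number of keys. The name chmHash, the row-per-position table layout, and the accumulator width are assumptions for illustration; perfhlib holds the arithmetic dcc really uses.

```c
#include <stdint.h>

/* Hypothetical CHM-style perfect hash lookup (not the perfhlib API).
 * Assumes:
 *   T1, T2  patLen x 256 tables, row j holding the entries for byte
 *           position j (the size of the "T1"/"T2" sections),
 *   g       numVert entries (the "gg" section).
 * Accumulator width and reduction order are assumptions. */
static int chmHash(const uint8_t *pat, int patLen,
                   const uint16_t *T1, const uint16_t *T2,
                   const uint16_t *g, unsigned numVert, unsigned numKeys)
{
    unsigned long u = 0, v = 0;
    for (int j = 0; j < patLen; j++) {
        u += T1[j * 256 + pat[j]];   /* first random function f1 */
        v += T2[j * 256 + pat[j]];   /* second random function f2 */
    }
    u %= numVert;                    /* both land on graph vertices */
    v %= numVert;
    /* The perfect hash is the sum of the two vertex labels mod numKeys */
    return (int)(((unsigned long)g[u] + g[v]) % numKeys);
}
```

Because a perfect hash is only defined for the keys it was built from, a pattern that is not in the library still hashes to some slot, so a caller has to compare the signature stored at that index before trusting the name.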
3 How do I use MakeDsig?
------------------------
Run it with no arguments for a short usage message, or with -h for
more details.
Basically, you just give it the names of the files that it needs:
MakeDsig <libname> <signame>
It will ask you for a seed, which is used to seed the random number
generator before the hash tables are built; enter any number, e.g. 1.
You need the library file for the appropriate compiler. For example,
to analyse executable programs created from Turbo C 2.1 small model,
you need the cs.lib file that comes with that compiler.
You also need to know the correct name for the signature file, i.e.
<signame>. Dcc will detect certain compiler vendors and version
numbers, and will look for a signature file named like this:
dcc<vendor><version><model>.sig
Here are the current vendors:
    Vendor               Vendor letter
    Microsoft C/C++      m
    Borland C/C++        b
    Logitech (Modula)    l
    Turbo Pascal         t
Here are the model codes:
    small/tiny           s
    medium               m
    compact              c
    large                l
    Turbo Pascal         p
The version codes are fairly self-explanatory:
    Microsoft C 5.1      5
    Microsoft C 8        8
    Borland C 2.0        2
    Borland C 3.0        3
    Turbo Pascal 3.0     3    Note: currently no way to make dcct3p.sig
    Turbo Pascal 4.0     4    Use Makedstp, not makedsig
    Turbo Pascal 5.0     5    Use Makedstp, not makedsig
Some examples: the signature file for Borland C version 2.0, small
model, would be dccb2s.sig. To generate it, you would supply as the
library file cs.lib that came with that compiler. Suppose it was in
the \bc\lib directory. To generate the signature file required to
work with files produced by this compiler, you would type
makedsig \bc\lib\cs.lib dccb2s.sig
This will create dccb2s.sig in the current directory. For dcc to use
this file, place it in the same directory as dcc itself, or point the
environment variable DCC to the directory containing it.
Another example: to make the signature file for Microsoft Visual
C/C++ (C 8.0), large model, and assuming the libraries are in
the directory \msvc\lib, you would type
makedsig \msvc\lib\llibce.lib dccm8l.sig
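The naming rule can be expressed as a small C helper. sigFileName is a hypothetical name, not part of dcc; it simply concatenates the three single-character codes from the tables above.

```c
#include <stdio.h>

/* Hypothetical helper: builds "dcc<vendor><version><model>.sig" from the
 * single-character codes. Returns 0 on success, or -1 if the result does
 * not fit in out (snprintf truncates it in that case). */
static int sigFileName(char *out, size_t outSize,
                       char vendor, char version, char model)
{
    int n = snprintf(out, outSize, "dcc%c%c%c.sig", vendor, version, model);
    return (n < 0 || (size_t)n >= outSize) ? -1 : 0;
}
```

For example, vendor 'b', version '2' and model 's' give dccb2s.sig, and 'm', '8', 'l' give dccm8l.sig, matching the two examples above.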
Note that the signature files for Turbo Pascal from version 4 onwards
are generated by makedstp, not makedsig. Makedstp reads a special file
called turbo.tpl, as there are no normal .lib files for Turbo Pascal.
Dcc will recognise Turbo Pascal 3.0 files, and look for dcct3p.sig.
Because all the library routines are contained in every Turbo Pascal
3.0 executable, there are no library files or even a turbo.tpl file,
so the signature file would have to be constructed by guesswork. You
can still use dcc on these files; just ignore the warning about not
finding the signature file.
For executables that dcc does not recognise, it will look for the
signature file dccxxx.sig. That way, if you have a new compiler, you
can at least have dcc detect library calls, even if it attempts to
decompile them all, and has not identified the main program.
Logitech Modula V1.0 files are recognised, and the signature file
dccl1x.sig is looked for. This was experimental in nature, and is not
recommended for serious analysis at this stage.
4 What's in a signature file?
-----------------------------
The details of a signature file are best documented in the source for
makedsig; see the function saveFile(). Briefly:
1) a 4 byte pattern identifying the file as a signature file: "dccs".
2) a two byte integer containing the number of keys (signatures)
3) a two byte integer containing the number of vertices on the graph
used to generate the hash table. See the source code and/or the
Czech, Havas and Majewski articles for details
4) a two byte integer containing the pattern length
5) a two byte integer containing the symbolic name length
The next sections all have the following structure:
1) 2 char ID
2) a two byte integer containing the size of the body
3) the body.
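A reader for the fixed header and the section headers might look like this. It is a sketch that works on an in-memory copy of the file; the names rd16, SigHeader, parseSigHeader and sectionBody are hypothetical. All integers are 2 bytes, little-endian, matching writeFileShort() in makedsig.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Little-endian 2-byte integer, as written by writeFileShort() */
static unsigned rd16(const uint8_t *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

typedef struct {
    unsigned numKeys, numVert, patLen, symLen;
} SigHeader;

/* Parses the fixed 12-byte header ("dccs" + four 2-byte integers).
 * Returns 0 on success, -1 on a short or unrecognised header. */
static int parseSigHeader(const uint8_t *buf, size_t len, SigHeader *h)
{
    if (len < 12 || memcmp(buf, "dccs", 4) != 0)
        return -1;
    h->numKeys = rd16(buf + 4);
    h->numVert = rd16(buf + 6);
    h->patLen  = rd16(buf + 8);
    h->symLen  = rd16(buf + 10);
    return 0;
}

/* Checks a section header (2-char ID + 2-byte body size) at buf.
 * Returns a pointer to the body and stores its size, or NULL on mismatch. */
static const uint8_t *sectionBody(const uint8_t *buf, size_t len,
                                  const char *id, unsigned *size)
{
    if (len < 4 || memcmp(buf, id, 2) != 0)
        return NULL;
    *size = rd16(buf + 2);
    return buf + 4;
}
```

In files written by makedsig the sections appear in the order T1, T2, gg, ht; the T1 and T2 bodies are PATLEN * 256 * 2 bytes each, and the gg body is 2 bytes per vertex.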
There are 4 sections: "T1", "T2", "gg", and "ht". T1 and T2 are the
tables behind the hash function: it is built from two random
functions, each defined by a table with one 2-byte entry per (pattern
position, byte value) pair, i.e. PATLEN * 256 entries. "gg" is the
table g[] associated with the graph needed by the perfect hashing
algorithm; it has one entry per vertex. "ht" contains the actual hash
table. The body of this section is an array of records of this
structure:
typedef struct _hashEntry
{
char name[SYMLEN]; /* The symbol name */
byte pat [PATLEN]; /* The pattern */
word offset; /* Offset (needed temporarily) */
} HASHENTRY;
This part of the signature file can be browsed with a binary dump
program; a PATLEN length signature will follow the (null padded)
symbol name. There are tools for searching signature files, e.g.
srchsig, dispsig, and readsig. See below.
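The "ht" body can be walked as follows. One detail from the makedsig source matters here: saveFile() writes a size field of numKeys * (SYMLEN + PATLEN + 2), counting the HASHENTRY offset field, but then writes only SYMLEN + PATLEN bytes per entry; readsig checks the same formula and also reads SymLen + PatLen bytes per entry. The sketch below therefore steps by SYMLEN + PATLEN. htEntry and dumpHashTable are hypothetical names; the output format imitates readsig -a.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Start of entry i in the "ht" body. Entries are assumed to be packed
 * back to back, symLen + patLen bytes each, as saveFile() writes them
 * (the HASHENTRY offset field is not stored). */
static const uint8_t *htEntry(const uint8_t *body, unsigned i,
                              unsigned symLen, unsigned patLen)
{
    return body + (size_t)i * (symLen + patLen);
}

/* One line per entry: the symbol name (at most symLen chars, stopping at
 * the NUL padding), then the pattern in hex with a space after every
 * fourth byte. */
static void dumpHashTable(FILE *out, const uint8_t *body, unsigned numKeys,
                          unsigned symLen, unsigned patLen)
{
    for (unsigned i = 0; i < numKeys; i++) {
        const uint8_t *e = htEntry(body, i, symLen, patLen);
        fprintf(out, "%16.*s ", (int)symLen, (const char *)e);
        for (unsigned j = 0; j < patLen; j++) {
            fprintf(out, "%02X", (unsigned)e[symLen + j]);
            if (j % 4 == 3)
                fputc(' ', out);
        }
        fputc('\n', out);
    }
}
```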
5 What other tools are useful for signature work?
-------------------------------------------------
Makedstp - makes signature files from turbo.tpl. Needed to make
signature files for Turbo Pascal version 4.0 and later.
SrchSig - tells you whether a given pattern exists in a signature
file, and gives its name. You need a binary file with the signature
in it, exactly the right length. This can most easily be done with
debug (comes with MS-DOS).
DispSig - given the name of a function, displays its signature, and
stores the signature into a binary file as well. (You can use this
file with srchsig on another signature file, if you want).
ReadSig - reads a signature file, checking for correct structure, and
displaying duplicate signatures. With the -a switch, it will display
all signatures, with their symbols.
The file perfhlib.c is used by several of these tools to do the work
of the perfect hashing functions. It could also be used in other tools
that read signature files, or in any program that needs perfect
hashing.


117
tools/parsehdr/locident.h Normal file

@@ -0,0 +1,117 @@
/*$Log: locident.h,v $
* Revision 1.6 94/02/22 15:20:23 cifuente
* Code generation is done.
*
* Revision 1.5 93/12/10 09:38:20 cifuente
* New high-level types
*
* Revision 1.4 93/11/10 17:30:51 cifuente
* Procedure header, locals
*
* Revision 1.3 93/11/08 12:06:35 cifuente
* du1 analysis finished. Instantiates procedure arguments for user
* declared procedures.
*
* Revision 1.2 93/10/25 11:01:00 cifuente
* New SYNTHETIC instructions for d/u analysis
*
* Revision 1.1 93/10/11 11:47:39 cifuente
* Initial revision
*
* File: locIdent.h
* Purpose: High-level local identifier definitions
* Date: October 1993
*/
/* Type definition */
typedef struct {
Int csym; /* # symbols used */
Int alloc; /* # symbols allocated */
Int *idx; /* Array of integer indexes */
} IDX_ARRAY;
/* Type definitions used in the decompiled program */
typedef enum {
TYPE_UNKNOWN = 0, /* unknown so far */
TYPE_BYTE_SIGN, /* signed byte (8 bits) */
TYPE_BYTE_UNSIGN, /* unsigned byte */
TYPE_WORD_SIGN, /* signed word (16 bits) */
TYPE_WORD_UNSIGN, /* unsigned word (16 bits) */
TYPE_LONG_SIGN, /* signed long (32 bits) */
TYPE_LONG_UNSIGN, /* unsigned long (32 bits) */
TYPE_RECORD, /* record structure */
TYPE_PTR, /* pointer (32 bit ptr) */
TYPE_STR, /* string */
TYPE_CONST, /* constant (any type) */
TYPE_FLOAT, /* floating point */
TYPE_DOUBLE, /* double precision float */
} hlType;
static char *hlTypes[13] = {"", "char", "unsigned char", "int", "unsigned int",
"long", "unsigned long", "record", "int *", "char *",
"", "float", "double"};
typedef enum {
STK_FRAME, /* For stack vars */
REG_FRAME, /* For register variables */
GLB_FRAME, /* For globals */
} frameType;
/* Enumeration to determine whether pIcode points to the high or low part
* of a long number */
typedef enum {
HIGH_FIRST, /* High value is first */
LOW_FIRST, /* Low value is first */
} hlFirst;
/* LOCAL_ID */
typedef struct {
hlType type; /* Probable type */
boolT illegal;/* Boolean: not a valid field any more */
IDX_ARRAY idx; /* Index into icode array (REG_FRAME only) */
frameType loc; /* Frame location */
boolT hasMacro;/* Identifier requires a macro */
char macro[10];/* Macro for this identifier */
char name[20];/* Identifier's name */
union { /* Different types of identifiers */
byte regi; /* For TYPE_BYTE(WORD)_(UN)SIGN registers */
struct { /* For TYPE_BYTE(WORD)_(UN)SIGN on the stack */
byte regOff; /* register offset (if any) */
Int off; /* offset from BP */
} bwId;
struct _bwGlb { /* For TYPE_BYTE(WORD)_(UN)SIGN globals */
int16 seg; /* segment value */
int16 off; /* offset */
byte regi; /* optional indexed register */
} bwGlb;
struct _longId{ /* For TYPE_LONG_(UN)SIGN registers */
byte h; /* high register */
byte l; /* low register */
} longId;
struct _longStkId { /* For TYPE_LONG_(UN)SIGN on the stack */
Int offH; /* high offset from BP */
Int offL; /* low offset from BP */
} longStkId;
struct { /* For TYPE_LONG_(UN)SIGN globals */
int16 seg; /* segment value */
int16 offH; /* offset high */
int16 offL; /* offset low */
byte regi; /* optional indexed register */
} longGlb;
struct { /* For TYPE_LONG_(UN)SIGN constants */
dword h; /* high word */
dword l; /* low word */
} longKte;
} id;
} ID;
typedef struct {
Int csym; /* No. of symbols in the table */
Int alloc; /* No. of symbols allocated */
ID *id; /* Identifier */
} LOCAL_ID;

1538
tools/parsehdr/parsehdr.cpp Normal file

File diff suppressed because it is too large

98
tools/parsehdr/parsehdr.h Normal file

@@ -0,0 +1,98 @@
/*
*$Log: parsehdr.h,v $
*/
/* Header file for parsehdr.c */
typedef unsigned long dword; /* 32 bits */
typedef unsigned char byte; /* 8 bits */
typedef unsigned short word; /* 16 bits */
typedef unsigned char boolT; /* 8 bits */
#define TRUE 1
#define FALSE 0
#define BUFF_SIZE 8192 /* Holds a declaration */
#define FBUF_SIZE 32700 /* Holds part of a header file */
#define NARGS 15
#define NAMES_L 160
#define TYPES_L 160
#define FUNC_L 160
#define ERRF stdout
void phError(char *errmsg);
void phWarning(char *errmsg);
#define ERR(msg) phError(msg)
#ifdef DEBUG
#define DBG(str) printf(str);
#else
#define DBG(str) ;
#endif
#define WARN(msg) phWarning(msg)
#define OUT(str) fprintf(outfile, str)
#define PH_PARAMS 32
#define PH_NAMESZ 15
#define SYMLEN 16 /* Including the null */
#define Int long /* For locident.h */
#define int16 short int /* For locident.h */
#include "locident.h" /* For the hlType enum */
#define bool unsigned char /* For internal use */
#define TRUE 1
#define FALSE 0
typedef
struct ph_func_tag
{
char name[SYMLEN]; /* Name of function or arg */
hlType typ; /* Return type */
int numArg; /* Number of args */
int firstArg; /* Index of first arg in chain */
int next; /* Index of next function in chain */
bool bVararg; /* True if variable num args */
} PH_FUNC_STRUCT;
typedef
struct ph_arg_tag
{
char name[SYMLEN]; /* Name of function or arg */
hlType typ; /* Parameter type */
} PH_ARG_STRUCT;
#define DELTA_FUNC 32 /* Number to alloc at once */
#define PH_JUNK 0 /* LPSTR buffer, nothing happened */
#define PH_PROTO 1 /* LPPH_FUNC ret val, func name, args */
#define PH_FUNCTION 2 /* LPPH_FUNC ret val, func name, args */
#define PH_TYPEDEF 3 /* LPPH_DEF definer and definee */
#define PH_DEFINE 4 /* LPPH_DEF definer and definee */
#define PH_ERROR 5 /* LPSTR error string */
#define PH_WARNING 6 /* LPSTR warning string */
#define PH_MPROTO 7 /* ????? multi proto???? */
#define PH_VAR 8 /* ????? var decl */
/* PROTOS */
boolT phData(char *buff, int ndata);
boolT phPost(void);
boolT phFree(void);
void checkHeap(char *msg); /* For debugging only */
void phBuffToFunc(char *buff);
void phBuffToDef(char *buff);
#define TOK_TYPE 256 /* A type name (e.g. "int") */
#define TOK_NAME 257 /* A function or parameter name */
#define TOK_DOTS 258 /* "..." */
#define TOK_EOL 259 /* End of line */
typedef enum
{
BT_INT, BT_CHAR, BT_FLOAT, BT_DOUBLE, BT_STRUCT, BT_VOID, BT_UNKWN
} baseType;

217
tools/parsehdr/parsehdr.txt Normal file

@@ -0,0 +1,217 @@
PARSEHDR
1 What is ParseHdr?
2 What is dcclibs.dat?
3 How do I use ParseHdr?
4 What about languages other than C?
5 What is the structure of the dcclibs.dat file?
6 What are all these errors, and why do they happen?
1 What is ParseHdr?
-------------------
ParseHdr is a program that creates a special prototype file for DCC
from a set of include files (.h files). This allows DCC to be aware
of the type of library function arguments, and return types. The file
produced is called dcclibs.dat. ParseHdr is designed specifically for
C header files.
As an example, this is what allows DCC to recognise that printf has
(at least) a string argument, and so converts the first argument from
a numeric constant to a string. So you get
printf("Hello world")
instead of
printf(0x42).
2 What is dcclibs.dat?
----------------------
dcclibs.dat is the file created by the ParseHdr program. It contains
a list of function names and parameter and return types. See section
5 for details of the contents of the file.
3 How do I use ParseHdr?
------------------------
To use ParseHdr you need a file containing a list of header files,
like this:
\tc\include\alloc.h
\tc\include\assert.h
\tc\include\bios.h
...
\tc\include\time.h
There must be one file per line, no blank lines, and unless the
header files are in the current directory, a full path must be given.
The easiest way to create such a file is to redirect the output of a
dir command to a file, like this:
c>dir \tc\include\*.h > tcfiles.lst
and then edit the resultant file. Note that the path will not be
included in this, so you will have to add that manually. Remove
everything after the .h, such as file size, date, etc.
Once you have this file, you can run parsehdr:
parsehdr <listfile>
For example,
parsehdr tcfiles.lst
You will get some messages indicating which files are being
processed, but also some error messages. Just ignore the error
messages, see section 6 for why they occur.
4 What about languages other than C?
------------------------------------
ParseHdr will only work on C header files. It would be possible to
process files for other languages that contained type information, to
produce a dcclibs.dat file specific to that language. Ideally, DCC
should look for a different file for each language, but since only a
C version of dcclibs.dat has so far been created, this has not been
done.
Prototype information for Turbo Pascal exists in the file turbo.tpl,
at least for things like the graphics library, so it would be
possible for MakeDsTp to produce a dcclibs.dat file as well as the
signature file. However, the format of the turbo.tpl file is not
documented by Borland; for details see
W. L. Peavy, "Inside Turbo Pascal 6.0 Units", Public domain software
file tpu6doc.txt in tpu6.zip. Anonymous ftp from garbo.uwasa.fi and
mirrors, directory /pc/turbopas, 1991.
5 What is the structure of the dcclibs.dat file?
------------------------------------------------
The first 4 bytes are "dccp", identifying it as a DCC prototype file.
After this, there are two sections.
The first section begins with "FN", for Function Names. It is
followed by a two byte integer giving the number of function names
stored. The remainder of this section is an array of structures, one
per function name. Each has this structure:
char Name[SYMLEN]; /* Name of the function, NULL terminated */
int type; /* A 2 byte integer describing the return type */
int numArg; /* The number of arguments */
int firstArg; /* The index of the first arg, see below */
char bVarArg; /* 1 if variable arguments, 0 otherwise */
SYMLEN is 16, allowing 15 chars before the NULL. Therefore, the length
of this structure is 23 bytes (16 + 2 + 2 + 2 + 1).
The types are as defined in locident.h (actually a part of dcc), and
at present are as follows:
typedef enum {
TYPE_UNKNOWN = 0, /* unknown so far 00 */
TYPE_BYTE_SIGN, /* signed byte (8 bits) 01 */
TYPE_BYTE_UNSIGN, /* unsigned byte 02 */
TYPE_WORD_SIGN, /* signed word (16 bits) 03 */
TYPE_WORD_UNSIGN, /* unsigned word (16 bits) 04 */
TYPE_LONG_SIGN, /* signed long (32 bits) 05 */
TYPE_LONG_UNSIGN, /* unsigned long (32 bits) 06 */
TYPE_RECORD, /* record structure 07 */
TYPE_PTR, /* pointer (32 bit ptr) 08 */
TYPE_STR, /* string 09 */
TYPE_CONST, /* constant (any type) 0A */
TYPE_FLOAT, /* floating point 0B */
TYPE_DOUBLE, /* double precision float 0C */
} hlType;
firstArg is an index into the array in the second section.
The second section begins with "PM" (for Parameters). It is followed
by a 2 byte integer giving the number of parameter records. After
this is the array of parameter structures. Initially, the names of the
parameters were being stored, but this has been removed at present.
The parameter structure is therefore now just a single 2 byte
integer, representing the type of that argument.
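Because int is 2 bytes in the 16-bit compilers this format comes from, and the record has no padding, a modern program cannot simply overlay a C struct on the 23 bytes. A minimal decoding sketch, assuming little-endian fields at the offsets implied by the layout above (name 0-15, type 16-17, numArg 18-19, firstArg 20-21, bVarArg 22); FnRecord and decodeFnRecord are hypothetical names:

```c
#include <stdint.h>
#include <string.h>

#define SYMLEN 16          /* 15 chars + NUL, as documented above */
#define FN_RECORD_LEN 23   /* 16 + 2 + 2 + 2 + 1 */

typedef struct {
    char     name[SYMLEN];
    unsigned type;      /* hlType value, e.g. 3 = TYPE_WORD_SIGN */
    unsigned numArg;    /* number of (fixed) arguments */
    unsigned firstArg;  /* index into the "PM" array */
    int      bVarArg;   /* 1 if variable arguments */
} FnRecord;

/* Decodes one packed record byte by byte (no struct overlay). */
static void decodeFnRecord(const uint8_t rec[FN_RECORD_LEN], FnRecord *out)
{
    memcpy(out->name, rec, SYMLEN);
    out->name[SYMLEN - 1] = '\0';   /* guard against a missing NUL */
    out->type     = rec[16] | (rec[17] << 8);
    out->numArg   = rec[18] | (rec[19] << 8);
    out->firstArg = rec[20] | (rec[21] << 8);
    out->bVarArg  = rec[22];
}
```

Applied to the strcmp bytes in the example that follows, it yields name "strcmp", type 3, two arguments and firstArg 0x215.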
The way it all fits together is perhaps best described by an example.
Let's consider this entry in dcclibs.dat:
73 74 72 63 6D 70 00 ; "strcmp"
00 00 00 00 00 00 00 00 00 ; Padding to 16 bytes
03 00 ; Return type 3, TYPE_WORD_SIGN
02 00 ; 2 arguments
15 02 ; First arg is 0215
00 ; Not var args
If we now skip to the "PM" part of the file, skip the 2-byte count of
parameter records, then skip 215*2 = 42A bytes (all values hex), we
find this:
09 00 09 00 09 00 ...
The first 09 00 (TYPE_STR) refers to the type of the first parameter,
and the second to the second parameter. There are only 2 arguments,
so the third 09 00 refers to the first parameter of the next
function. So both parameters are strings, as is expected.
For functions with variable parameters, bVarArg is set to 01, and the
number of parameters reported is the number of fixed parameters. Here
is another example:
66 70 72 69 6E 74 66 00 ; "fprintf"
00 00 00 00 00 00 00 00 ; padding
03 00 ; return type 3, TYPE_WORD_SIGN
02 00 ; 2 fixed args
81 01 ; First arg at index 0181
01 ; Var args
and in the "PM" section at offset 181*2 = 0302, we find 08 00 09 00
03 00 meaning that the first parameter is a pointer (in fact, we know
it's a FILE *), and the second parameter is a string.
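The offset arithmetic in these two examples amounts to indexing an array of 2-byte entries. A sketch, assuming pmArray points just past the "PM" ID and its 2-byte count; paramType is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>

/* Type of parameter k (0-based) of a function whose FN record has the
 * given firstArg. Each PM entry is a little-endian 2-byte hlType, so
 * entry n starts at byte offset n * 2. */
static unsigned paramType(const uint8_t *pmArray, unsigned firstArg, unsigned k)
{
    size_t off = (size_t)(firstArg + k) * 2;
    return pmArray[off] | (pmArray[off + 1] << 8);
}
```

With the strcmp record, paramType(pm, 0x215, 0) reads the bytes at offset 0x42A; with fprintf, index 0x181 gives offset 0x302.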
6 What are all these errors, and why do they happen?
----------------------------------------------------
When you run ParseHdr, as well as the progress statements like
Processing \tc\include\alloc.h ...
you can get error messages. Basically, ignore these errors. They occur
for a variety of reasons, most of which are detailed below.
1)
Expected type: got ) (29)
void __emit__()
^
This include file contained a non-ANSI prototype. This is rare, and
__emit__ is not a standard function anyway. If it really bothers you,
you could add the word "void" to the empty parentheses in your
include file.
2)
Expected ',' between parameter defs: got ( (28)
void _Cdecl ctrlbrk (int _Cdecl (*handler)(void))
Here "handler" is a pointer to a function. Being a basically simple
program, ParseHdr does not expand all typedef and #define statements,
so it cannot distinguish between types and user defined function
names. Therefore, it is not possible in general to parse any
prototypes containing pointers to functions, so at this stage, any
such prototypes will produce an error of some sort. DCC cannot
currently make use of this type information anyway, so this is no
real loss. There are typically half a dozen such errors.
3)
Unknown type time_t
Types that are defined by typedef (such as time_t), including
structures and pointers to structures, are not handled by ParseHdr,
since typedef and #define statements are ignored. Again, there are
typically only about a dozen of these.


@@ -0,0 +1,8 @@
CFLAGS = -Zi -c -AS -W3 -D__MSDOS__
parselib.exe: parselib.obj
link /CO parselib;
parselib.obj: parselib.c
cl $(CFLAGS) $*.c


@@ -0,0 +1,24 @@
\tc\include\alloc.h
\tc\include\assert.h
\tc\include\bios.h
\tc\include\conio.h
\tc\include\ctype.h
\tc\include\dir.h
\tc\include\dos.h
\tc\include\errno.h
\tc\include\fcntl.h
\tc\include\float.h
\tc\include\io.h
\tc\include\limits.h
\tc\include\math.h
\tc\include\mem.h
\tc\include\process.h
\tc\include\setjmp.h
\tc\include\share.h
\tc\include\signal.h
\tc\include\stdarg.h
\tc\include\stddef.h
\tc\include\stdio.h
\tc\include\stdlib.h
\tc\include\string.h
\tc\include\time.h


239
tools/readsig/readsig.cpp Normal file

@@ -0,0 +1,239 @@
/* Quick program to read the output from makedsig */
#include <stdio.h>
#include <io.h>
#include <stdlib.h>
#include <memory.h>
#include <string.h>
#include "perfhlib.h"
/* statics */
byte buf[100];
int numKeys; /* Number of hash table entries (keys) */
int numVert; /* Number of vertices in the graph (also size of g[]) */
int PatLen; /* Size of the keys (pattern length) */
int SymLen; /* Max size of the symbols, including null */
FILE *f; /* File being read */
static word *T1base, *T2base; /* Pointers to start of T1, T2 */
static word *g; /* g[] */
/* prototypes */
void grab(int n);
word readFileShort(void);
void cleanup(void);
static bool bDispAll = FALSE;
int
main(int argc, char *argv[])
{
word w, len;
int h, i, j;
long filePos;
if (argc <= 1)
{
printf("Usage: readsig [-a] <SigFilename>\n");
printf("-a for all symbols (else just duplicates)\n");
exit(1);
}
i = 1;
if (strcmp(argv[i], "-a") == 0)
{
i++;
bDispAll = TRUE;
}
if ((f = fopen(argv[i], "rb")) == NULL)
{
printf("Cannot open %s\n", argv[i]);
exit(2);
}
/* Read the parameters */
grab(4);
if (memcmp("dccs", buf, 4) != 0)
{
printf("Not a dccs file!\n");
exit(3);
}
numKeys = readFileShort();
numVert = readFileShort();
PatLen = readFileShort();
SymLen = readFileShort();
/* Initialise the perfhlib stuff. Also allocates T1, T2, g, etc */
hashParams( /* Set the parameters for the hash table */
numKeys, /* The number of symbols */
PatLen, /* The length of the pattern to be hashed */
256, /* The character set of the pattern (0-FF) */
0, /* Minimum pattern character value */
numVert); /* Specifies C, the sparseness of the graph.
See Czech, Havas and Majewski for details
*/
T1base = readT1();
T2base = readT2();
g = readG();
/* Read T1 and T2 tables */
grab(2);
if (memcmp("T1", buf, 2) != 0)
{
printf("Expected 'T1'\n");
exit(3);
}
len = PatLen * 256 * sizeof(word);
w = readFileShort();
if (w != len)
{
printf("Problem with size of T1: file %d, calc %d\n", w, len);
exit(4);
}
if (fread(T1base, 1, len, f) != len)
{
printf("Could not read T1\n");
exit(5);
}
grab(2);
if (memcmp("T2", buf, 2) != 0)
{
printf("Expected 'T2'\n");
exit(3);
}
w = readFileShort();
if (w != len)
{
printf("Problem with size of T2: file %d, calc %d\n", w, len);
exit(4);
}
if (fread(T2base, 1, len, f) != len)
{
printf("Could not read T2\n");
exit(5);
}
/* Now read the function g[] */
grab(2);
if (memcmp("gg", buf, 2) != 0)
{
printf("Expected 'gg'\n");
exit(3);
}
len = numVert * sizeof(word);
w = readFileShort();
if (w != len)
{
printf("Problem with size of g[]: file %d, calc %d\n", w, len);
exit(4);
}
if (fread(g, 1, len, f) != len)
{
printf("Could not read g[]\n");
exit(5);
}
/* This is now the hash table */
grab(2);
if (memcmp("ht", buf, 2) != 0)
{
printf("Expected 'ht'\n");
exit(3);
}
w = readFileShort();
if (w != numKeys * (SymLen + PatLen + sizeof(word)))
{
printf("Problem with size of hash table: file %d, calc %d\n", w,
(int)(numKeys * (SymLen + PatLen + sizeof(word))));
exit(6);
}
if (bDispAll)
{
fseek(f, 0, SEEK_CUR); /* Needed due to bug in MS fread()! */
filePos = _lseek(fileno(f), 0, SEEK_CUR);
for (i=0; i < numKeys; i++)
{
grab(SymLen + PatLen);
printf("%16s ", buf);
for (j=0; j < PatLen; j++)
{
printf("%02X", buf[SymLen+j]);
if ((j%4) == 3) printf(" ");
}
printf("\n");
}
printf("\n\n\n");
fseek(f, filePos, SEEK_SET);
}
for (i=0; i < numKeys; i++)
{
grab(SymLen + PatLen);
h = hash(&buf[SymLen]);
if (h != i)
{
printf("Symbol %16s (index %3d) hashed to %d\n",
buf, i, h);
}
}
printf("Done!\n");
fclose(f);
}
void
cleanup(void)
{
/* Free the storage for variable sized tables etc */
if (T1base) free(T1base);
if (T2base) free(T2base);
if (g) free(g);
}
void grab(int n)
{
if (fread(buf, 1, n, f) != (size_t)n)
{
printf("Could not read\n");
exit(11);
}
}
word
readFileShort(void)
{
byte b1, b2;
if (fread(&b1, 1, 1, f) != 1)
{
printf("Could not read\n");
exit(11);
}
if (fread(&b2, 1, 1, f) != 1)
{
printf("Could not read\n");
exit(11);
}
return (b2 << 8) + b1;
}
/* The following two functions are only needed when creating tables */
void getKey(int i, byte **keys)
{
}
/* Display key i */
void
dispKey(int i)
{
}

11
tools/readsig/readsig.mak Normal file

@@ -0,0 +1,11 @@
CFLAGS = -Zi -c -AL -W3 -D__MSDOS__

readsig.exe: readsig.obj perfhlib.obj
	link /CO readsig perfhlib;

readsig.obj: readsig.c dcc.h perfhlib.h
	cl $(CFLAGS) $*.c

perfhlib.obj: perfhlib.c dcc.h perfhlib.h
	cl $(CFLAGS) $*.c

97
tools/readsig/readsig.txt Normal file

@@ -0,0 +1,97 @@
READSIG
1 What is ReadSig?
2 How do I use ReadSig?
3 What are duplicate signatures?
4 How can I make sense of the signatures?
1 What is ReadSig?
------------------
ReadSig is a quick and dirty program to read signatures from a DCC
signature file. It was originally written as an integrity checker for
signature files, but can now be used to see what's in a signature
file, and which functions have duplicate signatures.
2 How do I use ReadSig?
-----------------------
Just type
    readsig <sigfilename>
or
    readsig -a <sigfilename>
For example:
    readsig -a dcct2p.sig
Either way, you get a list of duplicate signatures, i.e. functions
whose first 23 bytes, after wildcarding and chopping (see section 3
for details), are identical.
With the -a switch, this list is preceded by a list of all the
symbolic names in the signature file, each followed by its signature
in hex. For large signature files this can run to a dozen or more
pages.
Currently, signatures are 23 bytes long, and the symbolic names are
truncated to 15 characters.
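ReadSig itself finds duplicates by re-hashing each stored pattern with the perfect hash function and reporting any symbol that does not hash back to its own index (the final loop in readsig.c). As a simpler illustration of what "duplicate" means, the sketch below swaps in a brute-force pairwise comparison. The SigEntry layout (a 16-byte name field followed by the 23 signature bytes) and the name report_duplicates are assumptions made for this example, not dcc's actual data structures.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SIG_LEN  23  /* signature length, as stated above */
#define NAME_LEN 16  /* assumed: 15 characters plus a terminating NUL */

/* Assumed in-memory form of one signature-file entry (hypothetical). */
typedef struct {
    char name[NAME_LEN];
    unsigned char sig[SIG_LEN];
} SigEntry;

/* Print every pair of entries whose (already wildcarded and chopped)
   signatures are byte-for-byte identical; return the number of pairs. */
int report_duplicates(const SigEntry *e, int n)
{
    int pairs = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            if (memcmp(e[i].sig, e[j].sig, SIG_LEN) == 0) {
                printf("%s and %s share a signature\n", e[i].name, e[j].name);
                pairs++;
            }
        }
    }
    return pairs;
}
```

The pairwise loop is quadratic in the number of entries, which is fine for a one-off check; a sorted or hashed approach would scale better for very large libraries.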
3 What are duplicate signatures?
--------------------------------
Duplicate signatures arise for 3 reasons. 1: length of the signature.
2: wildcards. 3: chopping of the signature.
1: Because signatures are only 23 bytes long, two distinct functions
may be identical in their first 23 bytes (the part of the binary
image that forms the signature) and only diverge later.
2: Because part of the binary image of a function depends on where it
is loaded, parts of the signature are replaced with wildcards. It is
possible that two functions are distinct only in places that are
replaced by the wildcard byte (F4).
3: Signatures are "chopped" (cut short, and the remainder filled with
binary zeroes) after an unconditional branch or subroutine return.
This copes with functions shorter than the 23-byte signature size:
without chopping, the bytes of whatever function happens to follow a
short function would become part of its signature, and dcc would then
fail to recognise that function wherever a different function was
loaded after it.
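Reasons 2 and 3 can be illustrated with a minimal sketch. The helpers wildcard and chop are hypothetical: here the caller passes the offsets explicitly, whereas dcc's signature generator has to decode instructions to find load-dependent operands and the end of the function. The demonstration data assumes a 16-bit near CALL (E8 followed by a 2-byte displacement) and then a RET (C3).

```c
#include <assert.h>
#include <string.h>

#define SIG_LEN  23
#define WILDCARD 0xF4  /* wildcard byte used in signatures */

/* Replace n bytes starting at off with the wildcard byte, e.g. the
   2-byte displacement of a near call whose target depends on load address. */
void wildcard(unsigned char *sig, int off, int n)
{
    assert(off >= 0 && n >= 0 && off + n <= SIG_LEN);
    memset(sig + off, WILDCARD, (size_t)n);
}

/* "Chop" the signature: zero every byte from offset end onwards, e.g. just
   after a RET, so the bytes of whatever follows the function are excluded. */
void chop(unsigned char *sig, int end)
{
    assert(end >= 0 && end <= SIG_LEN);
    memset(sig + end, 0, (size_t)(SIG_LEN - end));
}
```

For example, take two functions consisting of E8 10 00 C3 and E8 20 01 C3, each followed by unrelated bytes. After wildcard(sig, 1, 2) and chop(sig, 4) both become E8 F4 F4 C3 followed by zeroes, so two functions that call different targets end up with identical signatures.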
The effect of duplicate signatures is that only one of the functions
sharing a signature will be recognised. For example, suppose
that sin, cos, and tan were just one wildcarded instruction followed
by a jump to the same piece of code. Then all three would have the
same signature, and calls to sin, cos, or tan would all be reported
by dcc as just one of these, e.g. tan. If you suspect that this is
happening, ReadSig can at least confirm it by listing the duplicates.
In general, the number of duplicate signatures that would actually be
used in dcc is small, but it is possible that the above problem will
occur.
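Why only one name survives can be seen in a lookup sketch. dcc actually uses a perfect hash table rather than the linear search shown here, so which of the colliding functions wins depends on how that table was built. In this sketch the first matching entry always wins. The NamedSig type and lookup function are hypothetical, and code is assumed to have been wildcarded and chopped the same way as the stored signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIG_LEN 23

/* Hypothetical table entry: a function name and its signature. */
typedef struct {
    const char *name;
    unsigned char sig[SIG_LEN];
} NamedSig;

/* Return the name of the first entry whose signature matches the 23 bytes
   at code, or NULL if none does. A later entry with the same signature can
   never be returned, so its calls are reported under the earlier name. */
const char *lookup(const NamedSig *table, int n, const unsigned char *code)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        if (memcmp(table[i].sig, code, SIG_LEN) == 0)
            return table[i].name;
    return NULL;
}
```

With sin, cos and tan all stored with the same signature, every call matching that signature is reported as whichever entry the search reaches first.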
4 How can I make sense of the signatures?
-----------------------------------------
If you're one of those unfortunate individuals who can't decode hex
instructions in your head, you can always use DispSig to copy the
function's signature to a binary file, since you now know the name of
the function. Then you can use debug or some other debugger to
disassemble the binary file.
Generally, most entries in signature files will be executable code,
so it should disassemble readily.
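If all you have is the output of readsig -a, a small converter can turn the hex back into a binary file for debug. readsig.c prints each byte with %02X and adds a space after every fourth byte, so the sketch below skips whitespace and decodes hex pairs. hex_to_bytes and write_bin are hypothetical helpers, not part of the dcc tools.

```c
#include <assert.h>
#include <stdio.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode hex pairs such as "558BEC83 EC04F4F4" into out, skipping spaces,
   tabs and line breaks. Returns the byte count, or -1 if the text is
   malformed or would overflow out (which holds max bytes). */
int hex_to_bytes(const char *s, unsigned char *out, int max)
{
    int n = 0;
    assert(max >= 0);
    while (*s) {
        if (*s == ' ' || *s == '\t' || *s == '\n' || *s == '\r') {
            s++;
            continue;
        }
        int hi = nibble(s[0]);
        if (hi < 0) return -1;
        int lo = nibble(s[1]);   /* s[1] is at worst '\0', giving -1 */
        if (lo < 0 || n == max) return -1;
        out[n++] = (unsigned char)(hi * 16 + lo);
        s += 2;
    }
    return n;
}

/* Write n bytes to path so that debug (or another disassembler) can load
   them. Returns 0 on success, -1 on any I/O error. */
int write_bin(const char *path, const unsigned char *p, int n)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL) return -1;
    size_t written = fwrite(p, 1, (size_t)n, fp);
    if (fclose(fp) != 0 || written != (size_t)n) return -1;
    return 0;
}
```

Note that wildcard bytes (F4) will disassemble as HLT instructions.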
Be aware that signatures are wildcarded, so ignore the destinations
of jmp and call instructions (the 3- and 5-byte forms, anyway; 2-byte
jumps are not wildcarded) and any 16-bit immediate values. The latter
will always read F4F4 (two wildcard bytes), regardless of what they
were in the original function.
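When reading signatures by eye, it can help to render the wildcard bytes differently so they are not mistaken for real operands. The sketch below prints F4 as "??". format_sig is a hypothetical helper. Because it cannot tell a wildcard from a genuine F4 byte (the HLT opcode), it masks both.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define WILDCARD 0xF4

/* Format n signature bytes as hex into buf, showing wildcard bytes as "??".
   buf must hold at least 2*n + 1 characters. */
void format_sig(const unsigned char *sig, int n, char *buf)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        if (sig[i] == WILDCARD)
            memcpy(buf + 2 * i, "??", 2);
        else
            sprintf(buf + 2 * i, "%02X", (unsigned)sig[i]);
    }
    buf[2 * n] = '\0';
}
```

For example, a near call with a wildcarded target followed by a return, E8 F4 F4 C3, is shown as E8????C3.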